"A bad system will beat a good person every time" - Edwards Deming

This post is based on a tech talk I gave at eBay in early 2018. eBay had gone through a company-wide transformation to agile processes (where before this had been team-specific) and the main points I wanted to make where that it was important to make the hidden things the consumed people's time visible, explicit, and properly prioritized, if we want to improve throughput or flow. Note that this is not the same flow as Csíkszentmihályi's "being in the zone", but being in the zone can improve throughput.

It was a fun talk not least because of the inclusion of the "I Love Lucy" video. The original talk material was mostly bullet points; I have changed it to have a somewhat better narrative structure but haven't changed it too much.

We have too much Work in Progress (WIP)!

We take on too much work in progress (WIP) for many reasons, including:

  • It's hard to say no. We want to please people and be team players, and we defer to authority. So we say yes to random requests and our sprints are disrupted by unplanned work.
  • We frequently underestimate the effort involved in tasks, and we often have unclear priorities. So we end up having to scramble to meet deadlines because we have been working on the wrong things or gave bad dates that were communicated up and/or out and are now hard deadlines.
  • New things are often more fun to start than existing things are to finish, so we get to 90% done and then don't finish properly.

But too much WIP comes with a lot of associated problems:

  • delayed delivery of important features
  • throwaway work due to short-term hacks or unclear requirements that were worked on anyway
  • neglected work adding to technical debt and lower quality
  • increased costs
  • poor morale and burnout of personnel
  • context switching and multitasking, causing lack of flow (throughput)

So how can we improve flow? That is the topic for this post. You can probably guess that reducing WIP is key, but that's not all. To understand the problem better it is worth going on a short historical tour of operational optimization in the manufacturing world.

Henry Ford's Assembly Line (1908)

Henry Ford thought about the problem or improving throughput while lowering cost. He realized that if the primary objective of operations is to optimize throughput, then we want to produce at a constant (high) rate, and so he invented the assembly line, which is great for a large volume of homogenous parts.

VW Assembly Line

The problem with the assembly line is that work in progress cannot pile up or the entire line can come to a halt.

Or worse! If the line isn't halted, quality suffers badly.

So it is critical to determine when not to produce, and to abolish local efficiencies that can overload downstream stations.

Taichii Ohno, Toyota and the invention of Kanban (1948)

While Ford's assembly line was great for a limited variety of mass-produced items, in post-war Japan there was a demand for a more varied set of cars. Taichii Ohno of Toyota realized he could control the amount of work in progress by using containers of parts of fixed capacity. Upstream workers were not allowed to produce more than that many parts. This meant that inventory did not build up but that there was still a buffer that downstream workers could pull from, and that buffer was refilled "just in time" (so the system is often called JIT).

Every station had an "Andon cord" which could be used to signal an issue at that station or with the quality of parts arriving from upstream.

Furthermore, by reducing the size of the containers/buffers slowly, new issues would reveal themselves, and the lessons learned could be used to iterate and improve the system, allowing continuous improvement.

toyota production system andon cord example

This system was still sensitive to externalities like market demand: because it reduced idle inventory a sudden surge in orders could mean delays. But it became the state of the art post World War II and was studied and emulated world-wide.

Goldratt's Theory of Constraints (1984)

Eliyahu Goldratt was an Israeli academic who joined a software company, Creative Output, that made one of the first programs to do finite capacity scheduling for manufacturing. In the course of his work he realized a better way to limit WIP is to focus on time. He noticed that every system has one bottleneck tighter than all the others, which limits the throughput of the system as a whole, and that the only way to improve the overall throughput of the system is to improve the throughput at that constraint.

Ironically his "Theory of Constraints" or TOC, published in the book The Goal, was not well-received at the company and he was eventually fired along with a number of his collaborators. But The Goal has since been recognized as one of the 25 Most Influential Business Management Books by Time Magazine.

An easy way to understand the concept of the constraint, and how the optimal throughput of the overall system matches the optimal throughput of the system, is to think of a busy 3-lane highway with a partial lane closure. At the point where the lane restriction starts, there is typically a traffic jam caused by cars merging, and the remaining two open lanes end up underutilized because cars enter the area slower. The end result may be less efficient than a highway that is 2-lanes throughout (it certainly can't be more efficient).

An important lesson from this is that improvements in efficiency that are not at the limiting constraint can actually make things worse! Making the 3-lane highway 4-lanes while still closing down to just 2 can make the bottleneck even more severe and slow. So optimizing parts of the system without fixing the core constraint may give an illusion of improvement but that's all it is; if it is upstream of the bottleneck it will make things worse at the bottleneck, while if it is downstream it is simply going to be wasted excess capacity sitting idle.

Goldratt came up with the analogy of the drum-buffer-rope to help explain his theory:

  • the drum is the processing rate of the bottleneck that determines the pace at which the entire system should work
  • the rope is a signal that “pulls” a new item of work into the pipe only when an item is processed by the bottleneck
  • as every unit of lost time at the bottleneck is lost time for the entire system, the bottleneck should never go idle, so there needs to be a buffer of work just upstream ready to be processed

DBR Visualized

Goldratt suggested 5 steps to optimizing a system based on TOC:

  • identify the constraint
  • optimize the use of the constraint (implement the "buffer")
  • subordinate all non-constraints to the constraint (add the "rope")
  • elevate the constraint (i.e. add capacity or make changes to increase its capacity, to speed up the "drum")
  • rinse and repeat, because the constraint is now likely somewhere else

But This is Manufacturing, What about Knowledge Work?...

All of the above theories come from manufacturing, where tasks tend to be highly repetitive and take more-or-less fixed amounts of time. But we can still think of knowledge work as an assembly line!

Sample Kanban Board

The "Efficiency Paradox"

A company where everyone is busy working is terribly inefficient. The only way that’s possible is if everyone is optimizing their own productivity, at the expense of the bottleneck’s productivity!

For the bottleneck to be fully utilized, all other parts must have excess capacity, which directly contradicts the conventional wisdom of “everyone stay busy,” (using all available capacity). Instead, no team should take on more work than their bottleneck can process, and one of the tasks of management is to determine the capacity of the bottleneck, fill it, and then allow no more projects to begin until one is completed.

And then to fix the bottleneck :-)

As for Individual Contributors: if you don't have enough work to keep you busy, first try to help those who are overloaded, rather than coming up with new projects!

So how do you identify the bottleneck? In many cases, if you have a Kanban board it is fairly obvious, as work is piling up immediately upstream. If you keep track of agile metrics, the constraint will be the step with the longest average cycle time. In the case that it is not so obvious, you can hypothesize what it might be (by looking for scarce resources or asking people), and try adding capacity at that point. If overall throughput improves, it was the constraint; if the added capacity is idle, the constraint is upstream, else the constraint is downstream (and throughput may be worse!)

Note that constraints are often just business policies! You may have policies that block the completion of work items at some stage and that could be the bottleneck.

I should note here that if you have simple processes and largely interchangeable workers, then you may not need to worry about much of this optimization; it is mostly useful in complex situations with multiple roles and stages of work. I have worked in team like the former where optimization would have not changed anything, but also in teams where there are multiple stages of review blocked on the availability of various (non-interchangeable) people, where the places to focus attention are not always obvious.

Identifying WIP Limits at Non-Constraint Stages

If you're doing Kanban, you may want WIP limits in each of your swim lanes. It's usually pretty obvious what the WIP limit is at the constraint as it is the throughput of the whole system. What about determining WIP limits at other points? You can do this empirically for upstream swim lanes:

  • strictly enforce WIP limits and start reducing them
  • as WIP drops, the least constrained resources will start to run out of things to do, and will start sending less work to the constraint
  • the constraint throughput will stay the same or possibly even increase, as will overall throughput
  • eventually even the constraint will start having idle capacity and you will have gone too far; the sweet spot is just above this point (and then you may want to add a buffer).

Apart from reducing WIP coming from upstream and adding the buffer, there are other ways to optimize the constraint:

  • improve quality checks or other requirements on upstream work to reduce wasted time at the constraint
  • offload work from the constraint to others, if possible
  • add capacity if that is an option and other avenues have been exhausted.

It helps to identify sources of inefficiency at the constraint and improve them. For example:

  • too much partially completed or incomplete but neglected work (WIP)
  • unplanned work/randomizations
  • unnecessary work
  • unknown dependencies(software/hardware/people/resources/activities)
  • unclear and competing priorities

In order to understand what these are it is critical to make the work visible! You can't manage what you can't see. You should track anything that takes meaningful time, and even track small things that only one person knows how to do, or that impact other teams (unknown dependencies). Classifying these into different categories (e.g. bug fixes, feature work, deployments, meetings) and color-coding them is helpful.

Spend some time identifying pain points like too many meetings, conflicting priorities, interruptions, etc, and track the top ones - knowing how much time these suck up will help to make a case for reducing them or bringing them under control.

You can use horizontal swim lanes with WIP limits to limit different types of work if that is helpful.

Kanban with Horizontal WIP-Limited Swimlanes

Note that cycle time, lead time and throughput are all trailing indicators of problems but WIP is a leading indicator. Think of getting on a busy highway: you can tell when getting on if it is congested (high WIP) that it will take a long time to get to work (cycle time).

The Problem of Dependencies

Dependencies between teams can cause delays as coordination takes time and effort, and people may not be available when needed. Dependencies in code can cause delays or quality issues as changes in one place can break things in other places.

Every dependency you can eliminate, whether between people or systems, improves the chance you can deliver on time.

Often by the time you realize you have a dependency you are already in trouble, so it makes sense to identify dependencies as early as possible and try to reduce them.

If you can organize your teams around products rather than silos (whether component-based or discipline-based) that will help a lot. Otherwise implement processes to grease the wheels between teams with dependencies, and make sure the dependencies are visible across the teams. Use color-coding or dedicated swim lanes in your Kanban or Scrum boards to highlight issues with cross-team dependencies.

For dependencies in code, these can be identified with tools like dependency diagram generators. You can use the count of items dependencies to get approximation of risk. Refactor when possible to reduce dependencies.

Dealing with Unplanned Work

Unplanned work eats schedules and milestones for breakfast, by taking time away from planned work causing delays and decreased quality. Track planned vs. unplanned work over time and then allocate some WIP count to unplanned work (and/or capacity if time-boxing with Scrum), so that it becomes semi-planned. It may be helpful to have a designated person on a rotation to deal with unplanned work.

Dealing with Interruptions

Context-switching is expensive, and interruptions really break people's concentration. Have your individual contributors track interruptions over a sprint and get a sense of their cost, and their sources. Some tips for reducing interruptions and improving focus include:

  • consider using the Pomodoro technique
  • establish ground rules with your team and partners about when they can interrupt
  • plan your work, schedule it in Outlook, work your plan
  • aim for focused work in the morning and meetings etc. in the afternoon (see When by Daniel Pink)

The Importance of Prioritization

Without clear priorities, we often try to do things in parallel resulting in high WIP. Only one thing can be the most important thing, so it is important to be clear about what that is. With a properly prioritized backlog there should be much less temptation to take on multiple tasks.

To avoid arguments, it can be helpful to have a clear policy for determining priority. A useful one to use is Cost of Delay which combines both urgency and business value, but it is not always the easiest to measure. You can find a tutorial here.

Neglected Work and Technical Debt

We often start some work but never quite complete it. For example we may have varying level of test coverage depending on who did the work and how much time pressure was applied. We may have low priority bugs that fester for a long time. Eventually some of these neglected issues become more serious and costly to deal with, and now are harder to complete because the context and working memory is lost.

For incomplete features, it may well be better to kill/remove the code than to leave around in partially completed state. Avoid the sunk cost fallacy - only consider the incremental cost of completion to the expected return, not the already expended effort which is a non-recoverable cost.

To help avoid getting in this situation ion the first place, set SLAs on work in progress, and flag items that haven't moved and have exceeded the SLA. See if the flagged items can be killed or moved back to the backlog, or raise the priority and get it done.

Graham Wheeler on

Personality Patterns

Photo by Rex Pickar on Unsplash

The last post in this series covered the Five Factor Model of personality. In this post we'll dig into personality patterns that people can exhibit. Everyone has some combination of the five factors, but how does that combination manifest as a personality type?

There are many different models of personality types, but one used in psychology and psychoanalysis is the categorization in the DSM - the Diagnostic and Statistical Manual of Mental Disorders. This is a somewhat controversial publication that categorizes a number of maladaptive personality categories, and there are schools of thought in psychoanalysis who use similar categories in adaptive forms to describe similar personality types in less extreme forms; the most common forms here are captured in the Psychodynamic Diagnostic Manual.

We're treading on some slippery ground here in my opinion, but as long as you consider this as just a model it can offer some useful aggregate insights.

Read more…

Graham Wheeler on

The "Tyranny" of Metrics

Photo by Carlos Muza on Unsplash

Jerry Muller recently wrote a popular book titled "The Tyranny of Metrics". He makes a number of good arguments for why metrics, if not used properly, can have unintended consequences. For example, the body count metric that the US military optimized for in the Vietnam war caused enormous damage while losing the hearts and minds of the populace and resulting in an ignominious defeat. Muller argues that metrics are too often used as a substitute for good judgment. The book is an excellent read.

So should we be ignoring metrics? Clearly not, but we need to be cognizant of what metrics we choose and how we use them. We should also distinguish between things which can meaningfully be measured quantitatively versus things that are more suited to qualitative analyses. And we should be wary of metrics being used as an instrument of control by those far removed from the "trenches", so to speak.

Assuming that we have a problem that can meaningfully be measured in a quantitative way, we need to make sure our metrics meet a number of criteria to be useful. Here are some guidelines:

  • metrics should be actionable: they should tell you what you should be doing next. If you can't answer the question of what you would do if a metric changed then its probably not a good metric.
  • metrics should be clearly and consistently defined: changing the definition of a metric can invalidate all the historical record and is very costly. Do the work upfront to make sure the metric is well-defined and measuring what you want, and then don't change the definition unless you can do so retroactively. Ensure that the metric is defined and used consistently across the business.
  • metrics should be comparative over time (so it is useful to aggregate these over fixed periods like week-over-week or monthly - but be cognizant of seasonal effects).
  • ratios are often better than absolute values as they are less affected by exogenous factors. Similarly, trends are more important than absolute values.
  • metrics are most useful if they are leading indicators so you can take action early. For example, Work in Progress (WIP) is a leading indicator, while cycle time is a trailing indicator. Think of a highway: you can quickly tell if it is overcrowded, before you can tell how long your commute has been delayed.
  • good metrics make up a hierarchy: your team's metrics should roll up to your larger division's metrics which should roll up to top-level business metrics.
  • metrics should be in tension: you should try to find metrics that cannot be easily gamed without detrimentally affecting other metrics. Let's say I have a credit risk prediction model and my metric is the number of customers I predict are not credit risks but that default on their debt. I can just predict that every customer is high risk and my metric will look great, but that's bad for the business. So I need another metric that is in tension with this, such as the number of good customers I deny credit to, which must be minimized. More generally in prediction models we use the combination of precision and recall.
  • metrics should capture different classes of attributes such as quality and throughput.
  • you need to know when a deviation in a metric is a cause for concern. A good general guideline is that if a metric deviates more than two standard deviations from the mean over some defined period, you should start paying attention, and more than three standard deviations you should be alarmed.

Thanks to Saujanya Shrivastava for many fruitful discussions over our metrics from which these guidelines emerged.

Graham Wheeler on

Managing Engineering and Data Science Agile Teams

It is very common in modern software engineering organizations to use agile approaches to managing teamwork. At both Microsoft and eBay teams I have managed have used Scrum, which is a reasonably simple and effective approach that offers a number of benefits, such as timeboxing, regular deployments (not necessarily continuous but at least periodic), a buffer between the team and unplanned work, an iterative continuous improvement process through retrospectives, and metrics that can quickly show whether the team is on track or not.

Data science work does not fit quite as well into the Scrum approach. I've heard of people advocating for its use, and even at my current team we initially tried to use Scrum for data science, but there are significant challenges. In particular, I like my Scrum teams to break work down to user stories to a size where the effort involved is under two days (ideally closer to half a day). Yes, we use story points, but once the team is calibrated fairly well its still easy to aim for this. Trying to do this for data science work is much harder, especially when it is research work in building new models which is very open-ended.

The approach I have taken with my team is an interesting hybrid that seems to be working quite well and is worth sharing.

Read more…

Basic Machine Learning with SciKit-Learn

This is the fourth post in a series based off my Python for Data Science bootcamp I run at eBay occasionally. The other posts are:

In this post we will look into the basics of building ML models with Scikit-Learn. Scikit-Learn is the most widely used Python library for ML, especially outside of deep learning (where there are several contenders and I recommend using Keras, which is a package that provides a simple API on top of several underlying contenders like TensorFlow and PyTorch).

We'll proceed in this fashion:

  • give a brief overview of key terminology and the ML workflow
  • illustrate the typical use of SciKit-Learn API through some simple examples
  • discuss various metrics that can be used to evaluate ML models
  • dive deeper with some more complex examples
  • look at the various ways we can validate and improve our models
  • discuss the topic of feature engineering - ML models are good examples of "garbage in, garbage out", so cleaning our data and getting the right features is important
  • finally, summarize some of the main model techniques and their pros and cons

Read more…

Exploratory Data Analysis with NumPy and Pandas

This is the third post in a series based off my Python for Data Science bootcamp I run at eBay occasionally. The other posts are:

This is an introduction to the NumPy and Pandas libraries that form the foundation of data science in Python. These libraries, especially Pandas, have a large API surface and many powerful features. There is now way in a short amount of time to cover every topic; in many cases we will just scratch the surface. But after this you should understand the fundamentals, have an idea of the overall scope, and have some pointers for extending your learning as you need more functionality.


We'll start by importing the numpy and pandas packages. Note the "as" aliases; it is conventional to use "np" for numpy and "pd" for pandas. If you are using Anaconda Python distribution, as recommended for data science, these packages should already be available:

In [1]:
import numpy as np
import pandas as pd

We are going to do some plotting with the matplotlib and Seaborn packages. We want the plots to appear as cell outputs inline in Jupyter. To do that we need to run this next line:

Read more…

Using Jupyter

This is the second post in a series based off my Python for Data Science bootcamp I run at eBay occasionally. The other posts are:

Jupyter is an interactive computing environment that allows users to create heterogeneous documents called notebooks that can mix executable code, markdown text with MathJax, multimedia, static and interactive charts, and more. A notebook is typically a complete and self-contained record of a computation, and can be converted to various formats and shared with others. Jupyter thus supports a form of literate programming. Several of the posts on this blog, including this one, were written as Jupyter notebooks. Jupyter is an extremely popular tool for doing data science in Python due to its interactive nature, good support for iterative and experimental computation, and ability to create a finished artifact combining both scientific text (with math) and code. It's easiest to start to understand this by looking at an example of a finished notebook.

Jupyter the application combines three components:

Read more…

The 5-Factor Model of Personality

Shankar Vedantam has a great NPR show/podcast, "The Hidden Brain", and occasional appearances on NPR's All Things Considered. In December he had a show on Evaluating Personality Tests. It was enjoyable, especially the Harry Potter Sorting Hat references, but I felt it was a missed opportunity because of the focus on Myers-Briggs, and the fact that he mentioned the Big-5 model only in passing.

In fact, Myers-Briggs is not taken very seriously in the psychology world, and Vedantam surprised me with spending so much time on it, given his show's focus on research in psychology. On the other hand, the Big-5 model is taken quite seriously, with many studies and papers based on it and evaluating it in various contexts (take a look, for example, at the Oxford University Press book I link to at the end of this post).

In the short form NPR segment, this was the section on Big-5 in its entirety:

VEDANTAM: Many personality researchers put greater stock in a test known as the Big Five [vs Myers-Briggs]. Grant says the Big Five has lots of peer reviewed data to back it up.

GRANT: We can predict your job performance, your effectiveness in a team with different collaborators, your likelihood of sticking around in a job versus leaving as well as your probability of your marriage surviving, depending on the personality fit between you and your spouse.

Read more…

A Python Crash Course

I've been teaching a crash course in data science with Python, which starts off with learning Python itself. The target audience is Java programmers (generally senior level) so its assumed that things like classes and methods are well understood. The focus is mostly on what is different with Python. I teach it using Jupyter notebooks but the content is useful as a blog post too so here we go.

The other parts are:


Python's Origins

Python was conceived in the late 1980s, and its implementation began in December 1989 by Guido van Rossum at Centrum Wiskunde & Informatica (CWI) in the Netherlands as a successor to the ABC language. It takes its name from Monty Python's Flying Circus.

Python is a dynamic language but is strongly typed (i.e. variables are untyped but refer to objects of fixed type).

Read more…

Blogging again

Well, it's been quite a while since I last blogged. My Zite project is not dead; it's actually up and running well as a personal aggregator but not ready for multi-user access, and I'm not sure when it might be. But I've been feeling a bit of an itch to start blogging again so here goes.

I have some material already lined up: I've been teaching an introductory data science bootcamp at work and thought the notebooks from that could be useful blog posts in and of themselves. So I'll start slowly publishing those while I write some new material. I'm also going to expand the scope of this blog; I'll still cover some tech topics but I',m going to fold in the content from my dormant math blog and retire it; this may inspire me to do some math blogging again. And I'll be throwing in some stuff on management and psychology too. So this will be a mishmash living up to the random forest name. I'll use categories to make it more accessible for those only interested in specific topics.

More soon!

Graham Wheeler on