Software Project Management: Hall of Shame?
If one goes by the widely acclaimed Chaos reports released by the Standish Group, software projects have a dubious record. In 1995, the report highlighted that only 16% of IT projects were completed on time and within budget with the original scope. In 2006, 12 years after the first report was published, another report indicated that the share of successful projects had increased to 35%. In comparison, the number looks like a significant improvement, but in absolute terms it means that two out of every three projects take longer than expected, cost more than estimated, lack required functionality or are never completed. There has been some criticism of the Chaos report over the numbers published and the transparency of the sampling methods followed. But I am sure most would agree with the key message behind the report: a software project is more likely to fail on schedule and budget than to succeed.
One of the main reasons for failure is attributed to estimation failures. In his widely acclaimed book, The Mythical Man-Month, Frederick Brooks claims that more software projects have gone awry for lack of calendar time than for all other causes combined. It is thus no surprise that the two most common reasons one hears when a project fails are “The estimates were aggressive to start with” or “We messed up our estimates”. If we could lay hands on an estimation tool that estimates accurately, and we had the courage to stand by it, then our problems would be resolved. The solution looks simple and could be easily implemented, if only we did not have an impediment called the customer, who always forces us to accept aggressive timelines.
But wait. Before we decide to surrender to this conflict forever, how do we know for sure that the reason for a software project failure is “inaccurate” or “unrealistic” estimation? When I ask this question, I get an interesting answer: the fact that we are delayed clearly proves that the initial schedules or estimates were not correct. This is a trap of circular logic or, as logicians would say, a tautological argument. It could also be that the time WASTED in a chain of predecessor activities would have been enough (had it not been wasted) to absorb the extra scope discovered in a task during execution. So there is an alternative hypothesis: most projects are delayed because significant time is wasted and nothing is left to absorb the unavoidable uncertainties. If I ask a project manager, he will agree to the hypothesis of significant waste of time as long as we are referring to someone else’s project. In his own projects, he will talk about how his resources are stretched beyond all limits. So, he reasons, where could the time actually be wasted? It must be the erroneous estimations which lead to project schedule failures. I believe this assumption is the only reason for the unending quest for an accurate estimation tool.
In the last few decades, many sizing and effort estimation methodologies and models have been invented, each one promising to be more accurate and more objective than the last. Despite the many approaches, there seems to be a consensus that estimation based on past data is more objective than one based on an individual’s perception. After all this effort, we are still at a level where a significant number of projects are delayed. If we go back to the Chaos report again, we will notice that the improvement in performance is primarily due to the popularity of agile methodologies, and hence many projects in 2006 were smaller in size than the ones in 1994. Interestingly, the agile school of thought does not believe in upfront requirements freezing and predictability. Hence, I assume they are not so bothered with the problem of accurate estimation. They seem to have accepted that it is impossible to bring about upfront predictability, so the key is to make working software available to the customer as soon as possible and keep on improving it.
There are two conclusions we can draw:
- All efforts at improving estimation seem not to have helped
- Agile techniques provide a way out to improve the situation
The key question is: can we implement the transparency required for agile in all environments?
Agile requires a lot of trust between the development team and the customer. The customer has to be comfortable working without upfront predictability; in return, he gets some usable software much earlier than otherwise. But then how does the customer know that some features were pushed out of the release schedule because of real issues and not because of sloppy work by the developers? The distrust becomes significant when there is a supplier-client relationship governed by a contract. Such relationships require upfront predictability, particularly in fixed-price projects. So upfront predictability is required for software projects, and the harsh reality is that we have not seen any improvement despite continuous efforts to improve estimation techniques. We are still waiting for the silver bullet that will give us “accurate forecasting” and solve the problem of predictability.
Before we go further, let us analyse the inherent problems with estimations.
If we look at all estimation methods, they essentially follow these steps:
- Size the work (in function points or lines of code), factored with an assumed level of complexity
- Derive the effort required to complete the work (with an assumed rate of past productivity)
- Calculate the overall timeline based on available manpower
- Use personal judgement, negotiate and agree on a final number
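The first three steps are plain arithmetic, which a minimal sketch makes visible. The function name, the parameter names and every number below are hypothetical, chosen only for illustration; the fourth step (judgement and negotiation) is exactly the part that cannot be written down.

```python
import math

def estimate_timeline(function_points, complexity_factor,
                      fp_per_person_day, team_size):
    """Return (effort in person-days, calendar days) for a sized piece of work."""
    # Step 1: sizing, factored with an assumed level of complexity.
    adjusted_size = function_points * complexity_factor
    # Step 2: effort derived from an assumed rate of past productivity.
    effort_person_days = adjusted_size / fp_per_person_day
    # Step 3: timeline from available manpower (assumes the work divides evenly).
    calendar_days = math.ceil(effort_person_days / team_size)
    return effort_person_days, calendar_days

# Hypothetical inputs, for illustration only.
effort, days = estimate_timeline(
    function_points=200, complexity_factor=1.25,
    fp_per_person_day=0.5, team_size=5)
print(f"effort: {effort:.0f} person-days, timeline: {days} days")
```

Every input to this calculation is itself an assumption, so the output can be no more objective than the guesses fed into it.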
Let us analyse by answering a few fundamental questions.
Can estimations be objective?
Of late, there have been some voices against the claim of objectivity in estimation. Most managers would admit in person that software estimation is a matter of personal opinion. JP Lewis, in his article published in the July 2001 issue of ACM Software Engineering Notes, proved that it is impossible to objectively estimate the algorithmic complexity (defined as the length of the shortest program required to produce a string) of a software program. Software development time is heavily dependent on algorithmic complexity, so by definition, software estimation will remain a matter of personal opinion. He argued that there cannot be any objective method for arriving at a good estimate of the complexity of a software development task prior to completing the task.
Estimation also loses objectivity when it is used as a commitment, because other factors such as the developer’s level of paranoia, his negotiating ability, and so on get added as buffers into the estimate.
The fallacy: the better the estimation at task level, the better it is for the overall project
Most estimation methods ignore the impact of structural dependencies on the overall estimate of a project. When many tasks are in series, the variability of the estimate for the chain as a whole is lower than the variability at task level. The more tasks in series, the lower the relative variability of the estimate for the chain as a whole. We should see the gains nullifying the delays. (For the statistically inclined, let us simulate the impact. If there are 5 tasks performed as 5 handovers between resources and each task has a duration of 10±2 days (80% of runs fall in this range, a variability of 20%), then 80% of the runs for the chain as a whole fall within 50±5 days. The variability improves from 20% to 10%. In other words, the width of the distribution becomes narrower for the chain as a whole than for the individual tasks. This is a significant improvement in variability.) Well, we all know about it: fluctuations should average out.
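For readers who want to rerun the series case themselves, here is a minimal Monte Carlo sketch. It assumes normally distributed, independent task durations whose central 80% spans 10 +/- 2 days; the article does not name a distribution, so that choice, the function name and the seed are all assumptions.

```python
import random
from statistics import NormalDist, quantiles

def simulate_series_chain(n_tasks=5, mean=10.0, half_width=2.0,
                          runs=20_000, seed=1):
    """Return (p10, p90) of the total duration of n_tasks done in series."""
    # Assumption: normal durations whose central 80% spans mean +/- half_width.
    sigma = half_width / NormalDist().inv_cdf(0.9)
    rng = random.Random(seed)
    totals = [sum(rng.gauss(mean, sigma) for _ in range(n_tasks))
              for _ in range(runs)]
    deciles = quantiles(totals, n=10)   # 9 cut points: 10th to 90th percentile
    return deciles[0], deciles[-1]

p10, p90 = simulate_series_chain()
chain_half_width = (p90 - p10) / 2
print(f"80% of runs fall between {p10:.1f} and {p90:.1f} days")
print(f"relative spread: {chain_half_width / 50:.0%} (task level: 20%)")
```

Under these assumptions the half-width for the chain comes out a little under 5 days on a 50-day total, which is in line with the rounded figures quoted above.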
On the other hand, at each integration point, the worst delays are passed on; the variability deteriorates and becomes skewed away from the median value. (Again, for those who are statistically inclined and want the proof, let us simulate the impact.
If there are 5 tasks in parallel, each of duration 10±2 days, integrating into one task of duration 10±2 days, then for the chain as a whole 80% of simulated runs lie beyond 20 days, well beyond our expectation of an average of 20 days for the chain as a whole.) We all know probability works against us at integration points. But we tend to get amnesia about the devastating impact of probability at integration points while estimating: we just add up at integration points for the chain as a whole without considering the impact of integration.
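The integration case can be checked the same way. This is a sketch under the same assumptions as before (independent, normally distributed durations with 80% of runs inside 10 +/- 2 days); the function name and seed are hypothetical.

```python
import random
from statistics import NormalDist

def simulate_integration(n_parallel=5, mean=10.0, half_width=2.0,
                         runs=20_000, seed=1):
    """Return the fraction of runs in which the chain takes more than 2 * mean days."""
    sigma = half_width / NormalDist().inv_cdf(0.9)
    rng = random.Random(seed)
    late = 0
    for _ in range(runs):
        # The integration task can start only when the slowest feeder finishes.
        feeders_done = max(rng.gauss(mean, sigma) for _ in range(n_parallel))
        total = feeders_done + rng.gauss(mean, sigma)
        if total > 2 * mean:
            late += 1
    return late / runs

fraction_late = simulate_integration()
print(f"{fraction_late:.0%} of runs exceed the 20-day sum of averages")
```

The only difference from the series case is the `max` over the feeding tasks: gains on four fast feeders are thrown away, and the one slow feeder sets the start of the integration task.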
So, focusing on improving estimates at task level and then adding up does not help as the dependencies have a major impact on the overall variation. A project with too many integrations or integrations within integrations is very different from a project with a single integration. None of the estimation methods focus on this reality.
Accuracy of estimates: is it not an oxymoron?
The word accuracy is used in the context of measurements. It is defined as the deviation from the true value. The assumption is that there exists a true value, a universal truth which is independent of the act of measurement. If there is an observer effect (the act of observing changes what is observed), then the concept of a true value is nothing but an illusion. Do estimates suffer from the observer effect? Ask any software manager and he will talk about how work expands to fill the allotted time (Parkinson’s law) or how the start of a task is delayed until close to the milestone (student syndrome). When task estimates suffer from the observer effect, what is the meaning of a true value?
When managers talk about an accurate estimate, they expect a deterministic single-point estimate, as if we were counting chairs or measuring distance. When a statistician asks for an estimate, he expects a 50% probability estimate, while a manager expects a realistic estimate (a high-probability estimate). In an uncertain environment, the difference between a 90% probability estimate and a 50% probability estimate is significant (it can be defined as the buffer in the task). These realistic single-point estimates, used as commitments, turn into a self-fulfilling prophecy in execution. The past data bank of execution represents this self-fulfilling prophecy.
So we are reaching a conclusion:
- Estimates will remain a matter of personal opinion, particularly when they are used as commitments
- Efforts at improving estimates at task level and then just adding them up do not help, because structural dependencies have a dramatic effect on the variability of the overall estimate of the project
- Estimates and accuracy do not go together.
So we are back to square one: we need predictability, and the way forward is not improvement in the “accuracy” of estimations. This is so depressing!
Let us not give up so easily. Some more questioning of fundamentals might offer us a direction to the solution.
What controls project durations – elapsed time or efforts?
If we take a feature and analyse the time from the first task on the feature (design) until the end, when the feature is cleared by quality (call it the feature lead time), the actual value-add effort turns out to be a small percentage of the total lead time of the feature (around 30 to 40%). The rest of the feature lead time is split between waiting time and avoidable work expansion. The waiting time could include the period when the developer dropped work on the feature to attend to other high-priority work, or the developer waited for inputs or a decision to continue on the feature, or the feature waited in the queue in front of testing, and so on. Similarly, work expansion is seen when work is started without complete specs and later needs rework, when work is done without optimal resources, or when resources do unnecessary polishing until the committed date.
Our experience tells us that most project managers have no clue about such significant waste of time in the feature lead time. Their intuition is not developed to see the waste. The focus on resources, and project monitoring systems oriented towards effort tracking, blind them to the glaring waste. The resources have a voice and a living form, so we get to understand their utilization. Sadly, we do not have projects or features shouting from the rooftops that most of the time they are not being worked upon.
To imagine the amount of time wasted in a feature or project lead time, think of the situations when you spent hours waiting for the doctor and in the end, the actual value add time was 10 to 15 min. You were the feature and the doctor was the resource. The doctor feels he is very busy but the patients think otherwise. The primary reason why managers do not get a “feel” of wastages is because they view the projects from a resource point of view and from a resource perspective they have been very busy. In fact in an environment where resources are highly stretched, the time wasted on projects is very high.
So when the actual value-add effort is a small percentage of the total lead time, and elapsed time is what controls project durations, why are we so obsessed with improving the accuracy of estimates? If you have doubts about the ratio of actual value-add time to feature lead time, then collect data from your own environment to validate the hypothesis. (If you do venture out to collect data, do not include in the value-add time the time wasted due to the “treadmill” syndrome: too much rework without any meaningful progress. Work started without finalization of specs is one of the key reasons behind the treadmill syndrome.)
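If you do collect the data, the ratio itself is a one-line calculation. The sketch below uses a made-up feature log; the feature names, dates and touch-day counts are hypothetical, and lead time is counted in calendar days, which is an assumption you may want to replace with working days.

```python
from datetime import date

def value_add_ratio(started, cleared, value_add_days):
    """Share of a feature's lead time that was spent on value-adding work."""
    lead_time_days = (cleared - started).days
    if lead_time_days <= 0:
        raise ValueError("feature must be cleared after it was started")
    return value_add_days / lead_time_days

# Hypothetical log: (name, first design task, cleared by quality, value-add days).
# Rework caused by the treadmill syndrome is deliberately left out of the last column.
features = [
    ("login", date(2024, 3, 1), date(2024, 3, 21), 7),
    ("report export", date(2024, 3, 4), date(2024, 4, 3), 9),
]
for name, started, cleared, touch_days in features:
    ratio = value_add_ratio(started, cleared, touch_days)
    print(f"{name}: {ratio:.0%} of lead time was value-add")
```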
Are we barking up the wrong tree?
If actual value-add effort is a small percentage of lead time, then we can focus on reducing the wastage rather than improving the “accuracy” of the estimates. The best way to reduce the wastage is to focus on execution.
But before we start on the execution process, we still have to provide predictability to the client. Let us go back to what engineering designers did when they had to guarantee performance in an environment of uncertainty: they used a factor of safety. The factor of safety was explicit and nobody was defensive about it. How many of you would fly on a plane designed with zero factor of safety? The problem with projects is the same: someone wants a commitment on the project end date despite uncertainties. But strangely enough, we are defensive about using a factor of safety (also called a buffer) in projects. People consider it the ultimate sin. Since nobody likes the word buffer, buffers are hidden inside the task durations so that task-level commitments can be met despite the uncertainties of the task. Do not try to find out how big the buffer is; you will never get the answer. Admission of the sin is a greater sin. But you can be sure of one relation: the higher the level of uncertainty, the bigger the buffer hidden in the task. In most tasks, the buffer is around half of the committed task duration (the buffer can be defined as the difference between the unrealistic 50% probability duration and the realistic 80% probability estimate). Since what the customer is really interested in is the project end date (or getting the desired features on the release date), let us remove the buffers from the tasks (making the task durations unrealistic) and put them at the end as a project buffer. Similarly, we can take the buffers from the feeding paths and put a feeding buffer at each point where integration happens. The feeding buffers reduce the damaging effect of probability at integrations, while the project buffer protects the project as a whole from uncertainties. Aggregating the buffers provides the opportunity to work with much less buffer than when they are split and hidden in the tasks.
So now we have a plan with unrealistic estimates at the task level and buffers to protect the paths as a whole. We do not need to spend much time trying to estimate the unrealistic, failure-prone durations: we can take the realistic estimates and just cut them by half. For the realistic estimate, use whatever method suits you. As discussed, since the value-add time is very short, do not spend too much time struggling for the accurate estimate. If you have a debate on estimation numbers, toss a coin to settle it.
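To see why a pooled buffer can be smaller than the sum of the hidden ones, here is a minimal sketch of the plan just described. The task durations are hypothetical, and the sizing rule (root of the sum of squares of the removed safety, which holds for independent tasks) is one common heuristic that I am assuming here; the article itself does not prescribe a formula.

```python
import math

def critical_chain_plan(realistic_estimates):
    """Halve each task estimate and pool the removed safety into one project buffer."""
    aggressive = [d / 2 for d in realistic_estimates]
    removed_safety = [d - a for d, a in zip(realistic_estimates, aggressive)]
    # Assumption: independent tasks, so the pooled safety combines as the
    # root of the sum of squares rather than as a plain sum.
    project_buffer = math.sqrt(sum(s * s for s in removed_safety))
    return aggressive, project_buffer

realistic = [10, 8, 12, 6, 14]          # hypothetical task durations in days
aggressive, project_buffer = critical_chain_plan(realistic)
hidden_total = sum(realistic)
pooled_total = sum(aggressive) + project_buffer
print(f"buffers hidden in tasks: {hidden_total} days")
print(f"halved tasks plus project buffer: {pooled_total:.1f} days")
```

The same function could be applied to each feeding path to size its feeding buffer. If the tasks are strongly correlated, the independence assumption fails and a larger buffer would be needed.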
The focus is required in execution. The challenge in execution is as follows:
- How do we ensure we have gains being passed along a chain of tasks, nullifying the delays along the way?
- How do we reduce the structural integration risk, so that all the feeding paths are finished before their turn comes up for integration with the longest path? (In other words, how do we stop probability from working against us at the integration point?)
Or in other words, how do we prevent the spiking of efforts, observed close to the releases?
Execution Rule no 1: Eliminate bad multi-tasking
When there are too many features in progress simultaneously, bad multitasking creeps in. Bad multitasking leads not only to wasted resource capacity (in additional setups) but also to longer lead times, as features wait while resources switch across work fronts. Decision making is also delayed as management bandwidth is exhausted. Software environments are prone to bad multitasking because many features are worked on in parallel with few resources and with one assumption in mind: an early start leads to an early finish. (To understand a bad multitasking environment, let us go back to the analogy of the doctor. Imagine a situation where the doctor is not an expert and needs help, from time to time, from a senior doctor, who in turn has his own queue. Of course, the junior doctor may not get an immediate response, so he shifts to another patient to fill the potential waiting time. The frequent multitasking leads to additional setups and loss of the doctor’s capacity, while the lead time of each patient goes up.)
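A toy calculation shows the effect on lead time. The sketch assumes a single resource, features sized in whole days, and a round-robin of one-day slices as the multitasking pattern; the function name and the setup cost are hypothetical.

```python
def completion_days(work_days, multitask, switch_cost=0.0):
    """Return the day on which each feature finishes for a single resource.

    multitask=False: finish one feature before starting the next.
    multitask=True: work one day on each open feature in turn, paying
    switch_cost days of setup before every one-day slice.
    Assumes whole-day feature sizes.
    """
    remaining = list(work_days)
    finished = [0.0] * len(remaining)
    clock = 0.0
    if not multitask:
        for i, days in enumerate(remaining):
            clock += days
            finished[i] = clock
        return finished
    while any(r > 0 for r in remaining):
        for i in range(len(remaining)):
            if remaining[i] > 0:
                clock += switch_cost + 1
                remaining[i] -= 1
                if remaining[i] == 0:
                    finished[i] = clock
    return finished

features = [10, 10, 10]   # three hypothetical features of 10 days each
print(completion_days(features, multitask=False))                   # [10.0, 20.0, 30.0]
print(completion_days(features, multitask=True))                    # [28.0, 29.0, 30.0]
print(completion_days(features, multitask=True, switch_cost=0.25))  # [35.0, 36.25, 37.5]
```

Even with zero setup cost, the last feature finishes on the same day either way but the first two finish far later; once setups are counted, everything is late. Early start did not lead to early finish.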
The way out is to focus on flow. It does not matter how soon or how many work fronts we start. What matters is the rate of completions. Flow can be achieved if there is a control system which prevents the system from clogging. That means ensuring a constant WIP (work in progress) of features, at a level where the overall system does not deteriorate into bad multitasking. A constant-WIP paradigm can mean occasional starvation when issues are blocking flow. The focus in such situations is on resolving issues and improving flow rather than starting more work fronts. This is very different from the current paradigm, where work on features is started as soon as possible and many features are worked on simultaneously with too few resources. If you are suffering from bad multitasking, then the amount of WIP is very high. The best starting point, in most environments, is to cut the WIP by half.
Since we have to ensure closure of a feature to allow the next one in, it is important to run frequent system-level testing (daily builds) to detect failures much earlier (as opposed to running a big-bang test before release). This frequent testing is a necessary condition for following the rule of a constant-WIP environment. Frequent testing has the additional advantage of reducing the overall population of potential errors, which in turn prevents uncertainties piling up close to the release date and improves predictability.
Execution Rule no 2: Full Kit
Implementing rule no 1 allows a late start for many features as they wait for their turn to enter the system. The user specs for these features should be kept ready so that only features with a full kit enter the system. No work is allowed without a complete kit. The importance of a full kit is understood by many, but in most environments work is started without complete specs, leading to rework (the treadmill syndrome), interruptions and the associated multitasking. Ideally, the resources preparing the full kit should be separate from the resources working directly on the feature.
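Rules 1 and 2 together amount to a gate in front of the development system. The sketch below is one possible shape for such a gate, not a prescribed design; the class names, the WIP limit of 2 and the sample features are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Feature:
    name: str
    full_kit: bool   # True once specs and inputs are complete

class ReleaseGate:
    """Admit queued features only while WIP is below a fixed limit."""

    def __init__(self, wip_limit):
        self.wip_limit = wip_limit
        self.backlog = deque()
        self.in_progress = []

    def add(self, feature):
        self.backlog.append(feature)

    def pull(self):
        """Start the next ready features; those without a full kit keep waiting."""
        waiting = deque()
        while self.backlog:
            feature = self.backlog.popleft()
            if feature.full_kit and len(self.in_progress) < self.wip_limit:
                self.in_progress.append(feature)
            else:
                waiting.append(feature)
        self.backlog = waiting
        return [f.name for f in self.in_progress]

    def close(self, name):
        """Closing a feature frees a slot; call pull() to admit the next one."""
        self.in_progress = [f for f in self.in_progress if f.name != name]

gate = ReleaseGate(wip_limit=2)
for name, kit in [("A", True), ("B", False), ("C", True), ("D", True)]:
    gate.add(Feature(name, kit))
print(gate.pull())    # A and C start; B lacks a full kit, D waits for a slot
gate.close("A")
print(gate.pull())    # D takes the freed slot; B still waits for its kit
```

Note that a new feature enters only when another one closes, which is why frequent testing to confirm closure matters so much.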
Execution Rule no 3: Active Supervision
Since task durations in the plan are unrealistic, we cannot use intermediate task milestones as control points. Nobody will accept a commitment around these durations (unless one is a masochist). So managers do not have the comfort of taking their eyes off a task until close to its milestone. They have to check the flow of work daily and ensure that issues are resolved and the preparation of upcoming tasks is complete. A daily estimate (not a commitment) of the remaining duration is used to manage tasks. The same information is also used to check the impact on buffers and to prioritize tasks based on that impact. Implementing this rule brings about a cultural change, where priorities are not ad hoc and the focus shifts from finding a scapegoat to take the blame for a missed milestone to faster issue resolution.
Execution Rule no 4: Use the right project control measures
The above processes cannot be implemented along with conventional measures of schedule and effort variance (or their complicated cousins, the earned value measures). These measures will create resistance against the aggressive task durations. Wrong measures drive wrong behaviours. Under these erroneous measures, a person can focus on easy tasks to show “good progress” rather than working on issue resolution on the longest path. No wonder that in most projects, 90% of the work gets completed in X time and the remaining 10% also takes X time. Projects look fine, and then suddenly management detects the bad news close to the end, when nothing much can be done. If management detects delays so late, does it have any meaningful control over projects? These measures are like the emperor’s new clothes. (No one other than the child laughs at the emperor’s state of undress, for fear of being seen as stupid.)
The new measures should be work completion along the longest chain (calculated using the remaining duration estimate) as compared to the buffer penetration. This measure forces early detection of bad news and timely intervention before it is too late. Top management gets real control on projects as they can intervene before it is too late.
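A minimal sketch of this measure follows. It takes the daily remaining-duration estimate for the longest chain and compares chain completion with buffer penetration. The function name, the sample numbers and the alarm rule (buffer consumed faster than the chain completes) are assumptions for illustration, not a threshold taken from the method itself.

```python
def chain_status(planned_chain_days, remaining_chain_days,
                 project_buffer_days, elapsed_days):
    """Compare progress along the longest chain with project buffer consumption."""
    chain_complete = 1 - remaining_chain_days / planned_chain_days
    # Projected finish = time already spent + today's remaining-duration estimate.
    projected_finish = elapsed_days + remaining_chain_days
    overrun = max(0.0, projected_finish - planned_chain_days)
    buffer_penetration = overrun / project_buffer_days
    # Assumption: flag the project when the buffer is burning faster than
    # the chain is completing.
    needs_attention = buffer_penetration > chain_complete
    return chain_complete, buffer_penetration, needs_attention

# Hypothetical project: 40-day chain, 20-day buffer, 18 days in, 30 days still to go.
done, penetration, alarm = chain_status(
    planned_chain_days=40, remaining_chain_days=30,
    project_buffer_days=20, elapsed_days=18)
print(f"chain complete: {done:.0%}, buffer consumed: {penetration:.0%}, "
      f"intervene: {alarm}")
```

In this example only a quarter of the chain is done but 40% of the buffer is already gone, so the bad news surfaces on day 18 rather than in the last week.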
The method described is Critical Chain Project Management (CCPM), invented by Dr Eli Goldratt in the 90s. CCPM has been implemented in a wide range of industries with remarkable success. In the software domain, organizations have reported productivity jumps of more than 50% while improving the on-time delivery of software development projects. The released capacity helps in completing more features in the same release. The results are so dramatic that people either disbelieve them or explain them away as “they were so lousy to begin with; we can’t be so bad”.
Any software environment where the primary focus of management is resource utilization, and where projects are controlled with measures of task milestone adherence or effort variance, has the potential for such dramatic results.
Delivering much more than initially promised opens up a new set of problems. What do you do when the customer finds you repeatedly beating your own estimates so convincingly? What happens to time and material contracts with such a dramatic jump in productivity? It looks like we have solved one problem only to create another.
This article needs a sequel for sure.