Software Effort Mis-Under-Estimations (part 1)

Posted: March 12, 2011
Kids ask parents “Are we there yet? Are we there yet? …”
Programmers are asked “How long will it take? … Are you done yet? Are you done yet? …”
Alas for programmers it can be difficult to answer either question reliably or accurately. To paraphrase a recent US President …
Programmers tend to misunderestimate tasks.
Aside: I think that “misunderestimate” belongs in the English language. Apparently, I’m not alone.
Lexical promotion aside, there are both practical and behavioral dimensions that impact the process of estimation, resulting in misunderestimations and sometimes misoverestimations.
In part 1 of this post I consider the practical dimensions that can make estimation difficult. I look, in particular, at the unreliability introduced by complexity and unfamiliarity of tasks.
Having dealt with that, the next post delves into the behavioral dimensions: pride, anxiety about perception, adverse responses to pressure, lack of desire to work on tasks, and other drivers that lurk beneath.
(In)Accuracy of Estimates
Years ago I saw a graph that looked something like the one below. It compares an effort estimate for a software task to the time it actually takes — a probabilistic distribution. The graph captures the common experience:
- On average programming tasks take longer than expected.
- A blow-out in time is much more common than a major time saving. Also, whereas saved time is capped, there’s no limit to a blow-out.
- The chance of an estimate being accurate can be quite low.
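As a rough illustration (not taken from the original graph), the skewed shape described above can be simulated by modelling actual time as the estimate multiplied by a lognormal factor; the estimate value and distribution parameters here are made-up assumptions:

```python
import random

random.seed(42)

# Illustrative only: model "actual time = estimate * skew factor", where the
# skew factor is lognormal (assumed parameters). Savings are capped near zero
# while blow-outs have a long tail, matching the bullets above.
ESTIMATE_HOURS = 10.0
N = 100_000

actuals = [ESTIMATE_HOURS * random.lognormvariate(0.2, 0.5) for _ in range(N)]

mean_actual = sum(actuals) / N
blowouts = sum(1 for a in actuals if a > 2 * ESTIMATE_HOURS) / N
savings = sum(1 for a in actuals if a < 0.5 * ESTIMATE_HOURS) / N

print(f"estimate: {ESTIMATE_HOURS}h, mean actual: {mean_actual:.1f}h")
print(f"took more than 2x the estimate: {blowouts:.1%}")
print(f"took less than half the estimate: {savings:.1%}")
```

With these assumed parameters the average lands well above the estimate, and big blow-outs come out several times more common than big savings, echoing the asymmetry of the graph.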
As a response to this I’ve heard many programmers ask:
“What is the point of making an estimate if it’s usually wrong?”
(This question is especially pointed if the programmer expects to be chastised for being inaccurate.)
There are many good reasons for gathering and using estimates — even inaccurate ones. Here are a few that come to mind (no doubt missing many others).
- If you don’t provide an estimate for your work then somebody else will and you may be held to it no matter how unrealistic.
- Employee costs are the biggest cost for most software projects. Estimates are important in determining whether a project should progress.
- Estimates help coordinate teams of programmers.
- Estimates help coordinate people working with, or dependent upon, programmers: testers, users/customers, sales, marketing, etc.
Let’s divide estimates into three crude categories based on familiarity: how familiar the task is to the programmer(s) who will work on it.
- Very familiar: something very similar to work that you’ve done before
- Somewhat familiar: something roughly like work that you’ve done before (or perhaps similar to work that a colleague has done before)
- Very unfamiliar: something new.
The graph below shows roughly how decreasing familiarity makes it increasingly difficult to estimate a task.
- Very familiar is shown as the thin blue line. It should be possible to provide a reliable estimate since something similar has been done before. Blow-outs should be uncommon. Statistically, we should be accurate most of the time; the average gets close to the estimate with greater reliability (lower standard deviation).
- Very unfamiliar is shown as the thick blue line. There’s a greater chance of underestimating the effort (the average will tend to be above the estimate). Overall the estimates will be less reliable (higher standard deviation) and there’s a much greater chance of a blow-out. Alas, in my experience the variability of “very unfamiliar” estimates is much worse than the graph suggests.
There are two very different effects in play. Obviously, familiarity provides the opportunity to learn and improve estimates. But more importantly, the familiarity of a task reduces unknowns. Unknowns kill predictability in software. Unknowns might include lurking bugs, architectural limitations, ambiguous requirements and specs, changes in customer needs, and many others.
The more complex a task is the more unreliable the estimates are likely to be. While this seems pretty obvious it is often disregarded in making and combining estimates.
Again, let’s rank complexity on a three point scale of low, medium and high complexity. The estimate for a low complexity task is far more likely to be accurate than the estimate for a high complexity task.
The graph above applies equally to complexity: easy = thin line, complex = thick line. And once again, the graph understates the extent to which complex tasks are misunderestimated, creating the potential for huge blow-outs.
Sometimes it is worth stating the obvious. Complexity is in the eye of the beholder. The complexity of a task depends upon the developer(s) executing it. My observation is that matching tasks to programmers can make a huge difference — a switch between programmers could easily make a 10x difference in the time to complete.
The time a task takes is not the same as complexity but it’s easy to mix them up.
I’ve been arguing with myself about whether longer tasks are more prone to misunderestimation or whether it just seems that way. I lost the argument.
What’s worse? (a) a 2-hour task takes 3 hours; (b) a 2-day task takes 3 days; (c) a 2-week task takes 3 weeks. Well, it depends on the context … but usually more delay = more pain.
What’s worse? (a) a 2-week task taking 3 weeks or (b) five 2-day tasks each taking 3 days? The delay is the same but five misunderestimations seem worse.
Decomposition of long tasks into sub-tasks is generally good practice. My preference is to avoid sub-tasks longer than about 2 days but not less than half a day. (I can’t offer any objective support for this.)
With decomposition, beware over-confidence in estimates unless you have looked hard for all the sub-tasks, considered the familiarity and complexity of each of them, and looked for what could go wrong.
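One way to see why combined estimates stay risky is a quick Monte Carlo sketch. Here a 10-day task is decomposed into five 2-day sub-tasks, with each sub-task's actual time skewed by an assumed lognormal factor (the skew parameter is a made-up example, not a measured value):

```python
import random

random.seed(1)

# Hypothetical Monte Carlo: five 2-day sub-task estimates, each with a
# skewed (lognormal) actual time, so the total also skews above the estimate.
SUBTASK_ESTIMATES = [2.0] * 5  # days
N = 50_000

def simulate_total():
    # sigma=0.4 is an assumed skew reflecting per-task unknowns
    return sum(e * random.lognormvariate(0.0, 0.4) for e in SUBTASK_ESTIMATES)

totals = [simulate_total() for _ in range(N)]
estimate = sum(SUBTASK_ESTIMATES)
mean_total = sum(totals) / N
over = sum(1 for t in totals if t > estimate) / N

print(f"estimate: {estimate} days, mean actual: {mean_total:.1f} days")
print(f"ran over the estimate in {over:.0%} of simulations")
```

Even with a modest skew per sub-task, the total runs over the summed estimate more often than not; widening the per-task skew (the sigma parameter) makes the tail far worse.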
But decomposition can be taken too far. For example, I’ve seen managers trying to break down a month-long programming task into 1 hour chunks and then scheduling them. That’s 160 tasks to identify and estimate! This estimation takes a lot of effort and yet creates unrealistic schedules and pressures. What comes to mind is the engineer’s injunction:
It is better to be approximately right than precisely wrong.
Other Influences on Effort
In writing this post I’ve been reading through some of the formal studies on software effort estimation (there’s plenty of them). See, for example:
- Center for Systems and Software Engineering, COCOMO II (in case this abbrev is not obvious, that’s COnstructive COst MOdel II)
- The Mythical Man-Month (ISBN: 0-201-83595-9) is the book that got it started. See also the Wikipedia summary, a presentation, one of many reviews. [It's now sitting on my desk demanding to be read]
- A meta-review summarizing multiple studies: “A Review of Surveys on Software Effort Estimation” by Molokken and Jorgensen, and “A Review of Studies on Expert Estimation of Software Development Effort” by M. Jorgensen
- Wikipedia summary of software development effort estimation with numerous links
The research has some great observations and recommendations. As a quick take away, I’ll add the following factors to my top 3 of familiarity, complexity and time. Consider:
- Requirements for software quality, reliability, scale, performance etc.
- Quality of working environment: good equipment (esp. computers, screens, networks etc), appropriate tools (compilers, debuggers etc), lack of distractions and other practicalities
- Clarity of direction: requirements, coordination etc.
- Programmer competence, team effectiveness
- Team size. See “The Mythical Man-Month” [link] or Brooks’s law that “adding manpower to a late software project makes it later”.
- Scope creep
In the material I’ve read there’s reasonable attention to the factors that influence the time to complete software tasks.
What I find helps greatly is to complement an estimate with a statement of its degree of unreliability. Here’s a rough guideline. For each task in a project consider the following:
- How familiar is the task? Rank from 1 to 5 (very familiar to totally new)
- How complex is the task? Rank from 1 to 5 (easy to deeply complex)
- Add other factors that you suspect will introduce unknowns and add a penalty for each.
Now, write a program that adds these numbers. (Pessimists may prefer to multiply the numbers.) The higher the number the more likely that your estimate will be wrong. Next, do what you can do to reduce uncertainties.
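The scoring scheme above can be sketched in a few lines of Python; the task names and penalty values below are made-up examples, and only the 1-5 familiarity and complexity scales come from the text:

```python
def risk_score(familiarity, complexity, penalties=(), multiply=False):
    """Combine 1-5 rankings (plus optional extra penalty points) into a
    rough unreliability score: higher means a less trustworthy estimate."""
    factors = [familiarity, complexity, *penalties]
    if multiply:  # the pessimist's variant from the text
        score = 1
        for f in factors:
            score *= f
        return score
    return sum(factors)

tasks = {
    # name: (familiarity 1-5, complexity 1-5, other penalty points)
    "tweak existing report": (1, 1, ()),
    "new payment gateway": (4, 4, (2,)),  # 2 extra points for vague specs
}

for name, (fam, cpx, pen) in tasks.items():
    print(f"{name}: sum={risk_score(fam, cpx, pen)}, "
          f"product={risk_score(fam, cpx, pen, multiply=True)}")
```

The multiplicative variant spreads the scores much further apart, which is why it suits the pessimist: a single unfamiliar, complex task dwarfs a handful of routine ones.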
I have two concerns to address about estimating and communicating uncertainty.
First, higher risk numbers will be used as an excuse for delays. I think the challenge is to be realistic about the uncertainties and work to reduce the risks (a big topic for another time).
Second, be careful when communicating the uncertainty of schedules. Most people are not used to the uncertainty of complex software development and may be surprised by a wide estimated range. Communicating large uncertainty in a schedule can be mis-perceived as evasiveness, incompetence or worse.
For now, that’s enough discussion of the practicalities of software effort estimation. It is but the tip of the iceberg but it does pave the way for part 2 where I will explore the behavioral and emotional dimensions that I think add the “mis” to “misunderestimation”. After all, that’s what the psygrammer blog is about.
[Update: the follow-up post - Biases create Misunderestimations (part 2) - is now online. It looks at different emotional biases that affect software estimate accuracy: mostly misunderestimations, but a few misoverestimations too.]
Andrew H – Psygrammer