Hut 8 Labs

Coding, Fast and Slow: Developers and the Psychology of Overconfidence

I’m going to talk today about what goes on inside developers’ heads when they make estimates, why that’s so hard to fix, and how I personally figured out how to live and write software (for very happy business owners) even though my estimates are just as brutally unreliable as ever.

But first, a story.

It was the <insert time period that will not make me seem absurdly old>, and I was a young developer 1. In college, I had aced coding exercises; as a junior dev, I had cranked out code to solve whatever problems someone specified for me, quicker than anyone expected. I could learn a new language and get productive in it over a weekend (or so I believed).

And thus, in the natural course of things, I got to run my own project. The account manager explained, in rough form, what the client was looking for, we talked it out, and I said, “That should be about 3 weeks of work.” “Sounds good,” he said. And so I got to coding.

How long do you imagine this project took? Four weeks? Maybe five?

Um, actually: three months.

I have vivid memories of that time — my self-image had been wrapped up in being “a good programmer”, and here I was just hideously failing. I lost sleep. I had these little panic attack episodes. And it just Would Not End. I remember talking to that account manager, a pit in my stomach, explaining over and over that I still didn’t have something to show.

During one of those black periods, I resolved to Never Be That Wrong Again.

Unfortunately, over the course of my career, I’ve learned something pretty hard: I’m always that wrong.

Actually, I’ve learned something even better: we’re all that wrong.

Recently, I read Daniel Kahneman’s Thinking, Fast and Slow, a sprawling survey of what psychology has learned about human cognition, about its marvelous strengths and its (surprisingly predictable) failings.

My favorite section was on Overconfidence. There were, let us say, some connections to the ways developers make estimates.

Why You Suck at Making Estimates, Part I: Writing Software = Learning Something You Don’t Know When You Start

First off, there are, I believe, really two reasons why we’re so bad at making estimates. The first is the sort of irreducible one: writing software involves figuring out something in such incredibly precise detail that you can tell a computer how to do it. And the problem is that, hidden in the parts you don’t fully understand when you start, there are often these problems that will explode and just utterly screw you.

And this is genuinely irreducible. If you do “fully understand” something, you’ve got a library or existing piece of software that does that thing, and you’re not writing anything. Otherwise, there is uncertainty, and it will often blow up. And those blow ups can take anywhere from one day to one year to beyond the heat death of the universe to resolve.

E.g. connections to some key 3rd party service turn out to not be reliable… so you have to write an entire retry/failure tracking layer; or the db doesn’t understand some critical character set encoding… so you have to rebuild all your schemas from scratch; or, the real classic: when you show it to some customers, they don’t want exactly what they asked for, they want something just a tiny bit different… that is much harder to do.

When you first hit this pain, you think “We should just be more careful at the specification stage”. But this turns out to fail, badly. Why? The core reason is that, as you can see from the examples above, if you were to write a specification in such detail that it would capture those issues, you’d be writing the software. And there is really just no way around this. (If, as you read this, you’re trying to bargain this one away, I have to tell you: there is really really really no way around this. Full specifications are a terrible economic idea. Some ways below I’m going to lay out better economic choices.)

But here’s where it gets interesting. Every programmer who’s been working in the real world for more than a few months has run into the problems I’m describing above.

And yet… we keep on making these just spectacularly bad estimates.

And, worse yet, we believe our own estimates. I still believe my own, in the moment I make them.

So, wait, am I suggesting that all developers somehow fall prey to the same, predictable errors in thinking?

Yep, that’s exactly what I’m suggesting.

Why You Suck at Making Estimates, Part II: Overconfidence

Kahneman talks at some length about the problem of “experts” making predictions. In a shockingly wide variety of situations, those predictions turn out to be utterly useless. Specifically, in many, many situations, the following three things hold true:

1- “Expert” predictions about some future event are so completely unreliable as to be basically meaningless

2- Nonetheless, the experts in question are extremely confident about the accuracy of their predictions

3- And, best of all: absolutely nothing seems to be able to diminish the confidence that experts feel

The last one is truly remarkable: even if experts try to honestly face evidence of their own past failures, even if they deeply understand this flaw in human cognition… they will still feel a deep sense of confidence in the accuracy of their predictions.

As Kahneman explains it, after telling an amazing story about his own failing on this front:

“The confidence you will experience in your future judgments will not be diminished by what you just read, even if you believe every word.”

Interestingly, there are situations where expert prediction is quite good — I’m going to explore that below, and how to use it to hack your own dev process. But before I do that, I want to walk through some details of how the flawed overconfidence works, on the ground, so you can maybe recognize it in yourself.

What It Feels Like To Be Wrong: Systems I & II, and The 3 Weeks and 3 Months Problem

In Thinking, Fast and Slow, Kahneman explains a great deal of psychology as the interplay between two “systems” which govern our thoughts: System I and System II. My far-too-brief summary would be “System II does careful, rational, analytical thinking, and System I does quick, heuristic, pattern matching thinking”.

And, crucially, it’s as if evolution designed the whole thing with a key goal of keeping System II from having to do too much. Which makes plenty of sense from an evolutionary perspective: System II is slow as molasses and incredibly costly, so it should only be deployed in very, very rare situations. But you see the problem, no doubt: without thinking, how does your mind know when to invoke System II? From this perspective, many of the various “cognitive biases” of psychology make sense as elegant engineering solutions to a brutal real-world problem: how to apportion attention in real time.

To see how the interplay between Systems I & II can lead to truly awful, and yet honestly-believed estimates, I’m going to turn the mic briefly over to my friend (and Hut 8 Labs co-conspirator) Edmund Jorgensen. He described it to me in an email as follows:

“When I ask myself “how long will this project take” System I has no idea, but wants to have an idea, and translates the question. Into what? I suspect it’s into something like “how confident am I that I can do this thing,” and that gets translated into a time estimate, with some multiplier that’s fairly individual (e.g. when Bob has level of confidence X, he always says 3 weeks; when Suzy has level of confidence X, she always says 5 weeks).”

Raise your hand if you’ve gradually realized you have two “big” time estimates? E.g. for me it’s “3 weeks” and “3 months”. The former means “that seems complex, but I basically think I see how to do it”, and the latter means “Wow, that’s hard, I’m not sure what’s involved, but I bet I can figure it out.”

Aka, I think Edmund is totally right.

(For those playing along at home: my “3 week” projects seem to take 5-15 weeks, my “3 month” projects usually take 1-3 years, in the rare event that someone is willing to keep paying me).
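
To put rough numbers on that, here is a back-of-the-envelope sketch in Python of the overrun multipliers those ranges imply. The ranges are the ones quoted above; treating 3 months as 13 weeks and a year as 52 weeks is a rounding assumption, and `overrun_range` is just a name made up for this sketch.

```python
# (label, estimate, low actual, high actual), all in weeks.
# Assumed rounding: 3 months = 13 weeks, 1 year = 52 weeks.
estimates = [
    ("3 weeks", 3, 5, 15),
    ("3 months", 13, 52, 156),
]


def overrun_range(estimate, low, high):
    """Return the (min, max) ratio of actual duration to estimated duration."""
    return low / estimate, high / estimate


for label, estimate, low, high in estimates:
    low_x, high_x = overrun_range(estimate, low, high)
    print(f"{label}: {low_x:.1f}x to {high_x:.1f}x")
```

That prints 1.7x to 5.0x for the “3 week” bucket and 4.0x to 12.0x for the “3 month” one, so the bigger the gut-feel estimate, the worse the multiplier.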

Alright, So Let’s Stop Being So Overconfident!

You might be thinking at this point: “Okay, I see where Dan is going: we have to approach these estimation challenges in some manner that engages System II instead of System I. That way, our careful, analytical minds will produce much better estimates.”

Congratulations, you’ve just invented Waterfall.

That’s basically the promise of the “full specification before we start coding” approach: don’t allow the team to make intuitive estimates, force everyone to carefully engage their analytical minds and come up with a detailed spec with estimates broken down into smaller pieces.

But that totally fails. Like, always.

The real trouble here is the interplay between the two sources of estimation error: the human bias towards overconfidence, and the inherent uncertainty involved in any real software project. That uncertainty is severe enough that even the careful, rational System II is unable to come up with accurate predictions.

Fortunately, there is a way to both play to the strengths of your own cognition and also handle the intense variability of the real world.

First, how to play to your mind’s strengths.

When Experts Are Right, and How To Use That To Your Advantage

Kahneman and other researchers have been able to identify situations where expert judgment doesn’t completely suck. As he says:

“To know whether you can trust a particular intuitive judgment, there are two questions you should ask: Is the environment in which the judgment is made sufficiently regular to enable predictions from the available evidence? The answer is yes for diagnosticians, no for stock pickers. Do the professionals have an adequate opportunity to learn the cues and the regularities?”

An “adequate opportunity” means a lot of practice making predictions, and a tight feedback loop to learn their accuracy.

Now, 6-18 month software projects just miserably fail on all these criteria. As I’ve discussed above, the environment is just savagely not “regular”. Plus, experts don’t get the combo of making lots of predictions and getting rapid feedback. If something is going to take a year or more, the feedback loop is too long to train your intuition (plus you need a lot of instances).

However, there is a form of estimation in software dev that does fit that bill — 0-12 hour tasks, if they are then immediately executed. At that scale, things work differently:

  • Although there is still a lot of variability (more on that below), there is some real hope of “regularity in your environment”. Two four-hour tasks tend to have a lot more in common than two six-month projects.

  • You can expect to make hundreds of such estimates, in the course of a couple of years.

  • You get very quick feedback about your accuracy.

The highest-velocity team I’ve ever been on did one-week sprints, and broke everything down to, basically, 0, 2, 4, or 8 hours (and there was always some suspicion about the 8 hour ones; like, we’d try pretty hard to break those down to smaller chunks). We estimated those very quickly and somewhat casually; we didn’t even use a Planning Poker style formalism.

At that point, you’re using the strengths of System I — it has a chance to get trained, it sees plenty of examples, and there are meaningful patterns to be gleaned. And, thanks to the short sprint length, you get very rapid feedback on the quality of your estimates.
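
If you want to see that feedback loop on paper, here is a minimal sketch in Python of the bookkeeping. To be clear, this is not something that team ran: the function name `calibration_by_bucket` and the sample history are made up for illustration, and the buckets are simply the 2, 4, and 8 hour sizes mentioned above.

```python
from collections import defaultdict
from statistics import median


def calibration_by_bucket(records):
    """Group (estimated_hours, actual_hours) pairs by estimate and
    report the median actual time and its ratio to the estimate."""
    by_bucket = defaultdict(list)
    for estimated, actual in records:
        by_bucket[estimated].append(actual)

    report = {}
    for estimated, actuals in sorted(by_bucket.items()):
        typical = median(actuals)
        report[estimated] = {
            "tasks": len(actuals),
            "median_actual": typical,
            # The ratio is undefined for 0-hour estimates, so skip it there.
            "ratio": typical / estimated if estimated else None,
        }
    return report


# Hypothetical sprint history: (estimated hours, actual hours).
history = [(2, 2), (2, 3), (2, 1), (4, 4), (4, 6), (4, 5), (8, 8), (8, 20), (8, 12)]
report = calibration_by_bucket(history)
for bucket, stats in report.items():
    print(bucket, stats)
```

I used the median rather than the average here because a single blown-up task would swamp an average. With this made-up history, the 8 hour bucket comes out at a ratio of 1.5, which fits the suspicion about the 8 hour ones.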

Wait, Wait, Wait, Let’s Just Make a Thousand 4 Hour Estimates!

How can I both claim that you can make these micro-scale estimates, but somehow can’t roll them up into 6-18 months estimates? Won’t the errors average out?

Basically, although I think the estimates at that scale are often right, when they’re wrong, there’s simply no limit to how wrong they can be. In math-y terms, I suspect the actual times follow a power law distribution. And power law distributions with a heavy enough tail are notable for having no stable mean and infinite variance. Which, frankly, is exactly how those big waterfall project estimates feel to me.
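
If you want to poke at that intuition, here is a minimal simulation sketch in Python. It is an illustration, not a measurement: I am assuming task durations follow a Pareto distribution (one common power law) with a tail exponent `alpha` of 1.0, a value picked purely because it makes the theoretical mean infinite. `simulate_project` is a hypothetical helper; `random.paretovariate` comes from the standard library.

```python
import random


def simulate_project(num_tasks, alpha, seed):
    """Draw num_tasks task durations (in multiples of the estimate) from a
    Pareto distribution and return (total, largest single task)."""
    rng = random.Random(seed)  # seeded so each run is reproducible
    durations = [rng.paretovariate(alpha) for _ in range(num_tasks)]
    return sum(durations), max(durations)


# alpha is an assumed tail exponent; for alpha <= 1 the mean is infinite.
for seed in range(5):
    total, worst = simulate_project(num_tasks=1000, alpha=1.0, seed=seed)
    print(f"run {seed}: total = {total:10.0f}, worst task = {worst:10.0f}, "
          f"share of total = {worst / total:.0%}")
```

Under those assumptions, the totals should vary a lot between runs, and a single task tends to eat a visible share of the whole project. With a larger `alpha` (say 3.0) the tail thins out and the totals settle down, which is the world in which rolling up a thousand small estimates would actually work.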

You might be thinking: how on earth could something you expected to take 4 hours take a month or two?

This happens all the time: you go to take some final step in something and discover some hideous blocker which completely changes the scope. E.g. at a recent startup, in trying to eliminate single points of failure from our system, we went to put a load balancer in front of an IMAP server we had written. So that, when one server machine died, the load balancer would just smoothly fail over to another box, and customers would see no impact.

And that seemed like a 4-hour-ish task.

But when we went to actually do it, we realized/remembered that the IMAP server, unlike all the HTTP servers we were so used to, maintained connection state. So if we wanted to be able to transparently fail over to another server, we’d have to somehow maintain that state on two servers, or write some kind of state-aware proxying load balancer in front of the IMAP server.

Which felt like about a 3-month project to us.2

And there is the other reason that short sprints are an absolutely key piece of all this: they place a hard limit on the cost of a horrifically bad estimate.
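
Here is that hard limit as a tiny arithmetic sketch in Python, using the load balancer story above. The assumptions are mine: a one-week sprint is 40 developer-hours, a “3-month project” is roughly 480 hours, and the team notices the blowup and re-plans at the sprint boundary at the latest. `hours_sunk_before_replan` is a made-up name.

```python
def hours_sunk_before_replan(true_hours, sprint_hours):
    """Hours spent on a task before the sprint boundary forces a re-plan."""
    return min(true_hours, sprint_hours)


SPRINT_HOURS = 40       # assumed: one developer, one-week sprint
estimate = 4            # the "4-hour-ish" load balancer task
true_size = 3 * 4 * 40  # "about a 3-month project": assumed 4 weeks/month, 40 hours/week

sunk = hours_sunk_before_replan(true_size, SPRINT_HOURS)
print(f"estimated {estimate}h, really ~{true_size}h, sunk before re-plan: {sunk}h")
```

The estimate was off by a factor of more than a hundred, but the most you can burn before someone gets to decide whether it is still worth doing is one sprint, 40 hours in this sketch.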

Are We All Just Screwed?

So what do we do? Just accept that all our projects are doomed to failure? That we’ll have poisoned relationships with the rest of the business, because we’ll always be failing to meet our promises?

The key is that you first accept that making accurate long-term estimates is fundamentally impossible. Once you’ve done that, you can tackle a challenge which, though extremely difficult, can be met: how can your dev team generate a ton of value, even though you cannot make meaningful long-term estimates?

What we’ve arrived at is basically a first-principles explanation of why the various Agile approaches have taken over the world. I work through that in more detail in my next post: “No Deadlines For You! Software Dev Without Estimates, Specs or Other Lies”.

(Join in the conversation on Hacker News and Slashdot.)

  1. (the band <insert dated music reference> was on the radio, and everyone was talking about <some long-gone tv show>). 

  2. If you’re thinking “Wait, 3 months, like one of your 3 month estimates?”, I have no idea what you’re talking about.