
When it comes to errors in human judgment, we tend to focus on bias, but noise is also a cause of error. Bias is the average of errors committed by a group. You need to know the right answer to know what the bias is. Bias is systematic and predictable. Noise, on the other hand, is the variability among a group of judgements. You can tell there’s noise even if you don’t know the right answer. It’s unpredictable and not easily explained.
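To make the distinction concrete, here is a minimal sketch with numbers I made up. The function name, the five estimates, and the "right answer" of 10 are all hypothetical, and using the standard deviation as the measure of noise is my assumption about a reasonable way to quantify variability.

```python
from statistics import mean, pstdev

def bias_and_noise(judgments, true_value):
    """Return (bias, noise) for a group of judgments of the same case."""
    # Bias: the average error, which requires knowing the right answer.
    errors = [j - true_value for j in judgments]
    bias = mean(errors)
    # Noise: how much the judgments scatter around their own average.
    # The right answer is not used here at all.
    noise = pstdev(judgments)
    return bias, noise

# Hypothetical: five people estimate a quantity whose true value is 10.
estimates = [12, 8, 15, 9, 16]
bias, noise = bias_and_noise(estimates, true_value=10)
print(f"bias = {bias}, noise = {noise:.2f}")
```

Notice that the noise line never touches `true_value`, which is why you can spot noise even when nobody knows the right answer.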
Noise is apparent in many aspects of life. Medical decisions (especially psychiatry), child custody hearings, forecasting, asylum decisions, job interviews, bail decisions, areas of forensic science (such as fingerprint analysis), and decisions to grant patents are all highly subjective and have more to do with who is deciding than the case they’re judging. Professionals considering the same cases vary widely. Even the same person deciding the same case at different times will sometimes give different answers.
Judges routinely give wildly different sentences for the same crime. How many years a defendant spends in prison has more to do with which judge they get than with the crime committed or who the defendant is. Bias, such as racism, definitely plays a role, but noise does as well.
For example, judges give harsher sentences when they’re hungry. Judges also give harsher sentences the day after the local sports team loses. They’re more lenient to the defendant on the defendant’s birthday. Asylum seekers are less likely to get asylum on hot days. Female judges tend to be more lenient, as are judges appointed by Democrats. Judges from Southern states tend to be harsher.
Noise, differences in judgment between employees, can cost insurance companies millions of dollars. However, in order to avoid disagreements, companies prefer the illusion of agreement, assuming claims adjusters and underwriters broadly make similar judgments, even though they don’t.
When wine experts judge the same wine twice in a blind taste test, they usually give it a different score. Doctors presented with the same case and fingerprint analysts likewise often reach different conclusions on different days. We all do it. None of us is consistent.
Mood affects judgement. Whether you’re in a good or bad mood affects how you answer the classic trolley problem and how gullible you are. Stress and fatigue play a role as well. Doctors are more likely to prescribe opiates when they’re worn out at the end of the day than when they’re rested first thing in the morning.
Studies have found that college admission officers focus more on academic accomplishments on cloudy days and more on nonacademic attributes on sunny days. The stock market is also affected by the weather.
The order of cases matters as well. People try to avoid streaks, so judges are less likely to grant asylum to someone if they have already granted asylum in the previous two cases. Baseball umpires and loan officers try to avoid streaks as well.
Social influence causes us to like things that are popular (like songs and movies). This is true in politics as well. A referendum proposal in the UK that gets an initial burst of popularity becomes self-reinforcing. A proposal with little support on the first day is doomed.
Whoever speaks first in a meeting has a big influence on the group’s final decision, especially if they hold a powerful position in the company. The comment on a website that gets the first upvote will likely remain popular even hundreds of thousands of upvotes later. (Unfortunately, this book already had over 600 reviews on Goodreads by the time I reviewed it, so there’s pretty much zero chance my review will get enough likes to appear on the first page of reviews.)
There’s a phenomenon known as the wisdom of crowds effect. Individuals making their best guess (on a non-expert question like how many beans are in a jar) will vary wildly, but the average of their answers tends to be close to the right answer.
However, the wisdom of crowds effect disappears if people aren’t deciding independently of each other, but rather going with the crowd. In meetings, people will often go with the opinion of the first speaker. Those who disagree will silence themselves so they don’t appear stupid or disagreeable, which creates the illusion of consensus.
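Here is a rough simulation of both situations, just to illustrate the idea. Everything in it is an assumption of mine: the jar holding 1,000 beans, the size of the individual errors, the first speaker guessing 1,400, and the 80% weight people give to that first guess.

```python
import random
from statistics import mean

TRUE_COUNT = 1000  # hypothetical number of beans in the jar

def independent_guesses(n, rng):
    # Each person guesses alone: right on average, but individually far off.
    return [rng.gauss(TRUE_COUNT, 300) for _ in range(n)]

def herded_guesses(n, first_guess, pull, rng):
    # Each person mostly repeats the first speaker and adds a little
    # of their own opinion. `pull` is the weight given to the first guess.
    return [pull * first_guess + (1 - pull) * rng.gauss(TRUE_COUNT, 300)
            for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is repeatable
alone = independent_guesses(2000, rng)
herd = herded_guesses(2000, first_guess=1400, pull=0.8, rng=rng)

typical_individual_error = mean(abs(g - TRUE_COUNT) for g in alone)
independent_crowd_error = abs(mean(alone) - TRUE_COUNT)
herded_crowd_error = abs(mean(herd) - TRUE_COUNT)

print(f"typical individual error: {typical_individual_error:.0f}")
print(f"independent crowd error:  {independent_crowd_error:.0f}")
print(f"herded crowd error:       {herded_crowd_error:.0f}")
```

When the guesses are independent, the individual errors mostly cancel out in the average. When everyone leans on the first speaker, the group average inherits that one person's error, and adding more people doesn't fix it.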
Political pundits who are most confident about their ability to predict the future are the least accurate. Political experts in general do only slightly better than chance when predicting what will happen.
Medicine has a lot of noise in general, but psychiatry is especially bad. In several studies, two psychiatrists gave the same patient the same diagnosis only about half the time. This is due to diagnosis relying on the patient’s subjective symptoms, the clinician’s subjective interpretation, and the absence of objective measures.
Job interviews and performance reviews are also very noisy and don’t really work. Interviewers tend to hire people culturally similar to themselves or who have the same race, gender, or educational background. Interviewers will often disagree with each other over who is the best candidate. First impressions matter a lot. The first two or three minutes of the interview overshadow the rest. How firm your handshake is can matter more than you’d think.
So how can we solve noise? The best way to reduce noise is to let algorithms make decisions rather than people. Simple rules outperformed human bail judges by looking at just two factors: age of defendant (older people are a lower flight risk) and number of past court dates missed. AI with even more information did even better.
Of course, the algorithm is only as good as the data that goes into it, so we have to be careful not to perpetuate discrimination that’s baked into the past data. It’s also important to keep in mind that while algorithms are less error-prone than humans, they’re still not perfect.
For a variety of reasons, humans rebel against the idea of handing their decision making over to computers. So what’s the best way to reduce noise without using algorithms?
There are a number of things we can do to reduce error. The first step to fix noise is to recognize there’s a problem by having different people independently evaluate the same case and see how different their conclusions are.
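As a sketch of what that kind of check might look like in practice, the snippet below takes several people's independent assessments of the same cases and reports how far apart they are. The function name, the claim labels, and the dollar figures are all hypothetical, and the two spread measures (standard deviation and range) are my own choice, not necessarily what the book uses.

```python
from statistics import mean, pstdev

def noise_audit(assessments):
    """Summarize how much independent assessors disagree on each case.

    `assessments` maps a case label to the list of values that
    different people assigned to that same case.
    """
    report = {}
    for case, values in assessments.items():
        report[case] = {
            "mean": mean(values),
            "spread": pstdev(values),           # scatter around the mean
            "range": max(values) - min(values)  # gap between extremes
        }
    return report

# Hypothetical: three adjusters independently value the same two claims.
assessments = {
    "claim_A": [9500, 16000, 12000],
    "claim_B": [700, 720, 710],
}
report = noise_audit(assessments)
for case, stats in report.items():
    print(case, stats)
```

A large spread on a case like `claim_A` is the sign of noise: the answer depends heavily on who happened to look at it. No right answer is needed to see that.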
When making judgements in particular cases, it’s helpful to know what typically happens on average and use that as a reference point. If you don’t know what’s typical, your judgement will be wildly off.
It’s better to compare cases to each other rather than try to judge everything on a case-by-case basis. How much money it takes to be rich depends on who is being compared to whom. Asking if a particular person is rich leads to more error and variation in judgement than asking whether they are richer than someone else. There’s more consistency if a teacher ranks student essays against each other than if the teacher tries to grade them individually.
Fingerprint examiners shown the same fingerprints twice sometimes change their opinion on whether it’s a match. Confirmation bias has also been observed in blood pattern analysis, arson investigations, skeletal remains analysis, forensic pathology, and even in DNA analysis. To avoid bias, examiners should not know the details of the case beforehand, and a second examiner verifying the result shouldn’t know what the first concluded, so that their judgment stays truly independent.
We all tend to be more confident than we should be. We should always consider alternative explanations, not just focus on our preferred explanation. Revisiting a decision at a different time helps reduce noise related to mood, the weather, etc.
Whenever there’s more than one way to interpret something, people will come to different conclusions. When asking someone to make a judgement, be as clear and unambiguous as possible.
While all people make errors, some are better decision makers than others. What makes someone a better judge is intelligence, experience, and open-mindedness. People who are willing to change their mind make better decisions than those who stubbornly stick to their guns.
I’ll admit, the academic nature of the book went over my head at times. The authors spent a lot of time defining and explaining their jargon, but there’s so much of it that it’s hard to remember it all. I’m still even a bit fuzzy on the difference between bias and noise. I don’t think it was necessary to spend so much time on defining their terms when they could have used colloquial language to get the same points across. The book also felt longer than it needed to be, but maybe that’s just me. It’s common for humans to disagree with each other after all.