How/Why We Suck At Statistics

The Pen Of Darkness
7 min read · Feb 19, 2021

We suck at statistics. Statisticians suck at statistics. It’s been 50 years since Tversky and Kahneman’s paper on belief in the law of small numbers. 50 years. Another failure of statistical intuition, not to realize just how long we’ve been living with knowledge of these results. But why do we suck at statistical intuition? It seems like a poor adaptation, to draw incorrect and incomplete conclusions about the world. It’s probably more useful to ask what is different about statistical problems that leads our finely tuned minds into error, where elsewhere (or before) they rest on strong functional foundations. More generally, under what circumstances does our statistical intuition get better or worse? In our fake-news, post-truth world, it is tempting to say the answer is just trust vs misanthropy.

Take the individual heuristics that together contribute to failure of statistical intuition. We can assume they are either adaptive (positive) or maladaptive. If the latter, then our position must be either that the environment has changed so drastically as to turn an adaptive trait (since we’ve survived and evolved to get here) into a maladaptive one, or that it’s always been a flaw of the human brain on its journey to evolutionary perfection. In the spirit of Chesterton’s Fence, it’s interesting that conservative scientific institutions, European ones for instance, seem more likely than American ones to err on the side of these ‘biases’ being adaptive in some way, and not flaws of irrationality for behavioral economists to identify and fix (or manipulate).

1. Patterned Randomness: Consider two series of 6 coin-flips each,

HHHHHH

HTTHHT

We tend to choose the first as non-random and the second as random. They both have the same mathematical probability, but we can’t ignore patterns. We have more to lose by assuming randomness than by assuming intentionality and agency. If a psychology researcher, a street magician, or anyone I suspect has a purpose performs this coin-toss experiment, my statistical intuitions are not poor but instead overridden by my mistrust. The question instead is: if I get to control the circumstances of the coin-toss, am I still liable to statistical error? If it is my coin, I know it well, I toss it every day and have observed its track record of unbiased randomness, am I not less likely to be fooled by my intuition that HHHHHH is less probable?
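To make the equal-probability claim concrete, here is a minimal brute-force sketch (my own illustration, not anything from the original paper): every specific sequence of 6 fair flips has probability (1/2)^6 = 1/64, pattern or no pattern.

```python
from itertools import product

# Enumerate all 2**6 equally likely outcomes of 6 fair coin flips and
# check how often each exact target sequence occurs.
outcomes = ["".join(seq) for seq in product("HT", repeat=6)]

for target in ("HHHHHH", "HTTHHT"):
    p = outcomes.count(target) / len(outcomes)
    print(target, p)  # both print 0.015625, i.e. 1/64
```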

2. Mental Illusions: Given a 2D picture of three identically sized men placed on a conveyor belt, our intuition for depth perception uses the converging lines to infer the relative distances of the men, and therefore judges the one that appears farther away to be a much larger man (since his drawn size is equal to the others’).

There’s an alternative way to think about this though. Medieval art is strikingly different in its lack of perspective. The loss of knowledge and technique is not unlikely. But neither is the ideology of symbolism, depicting size not based on realism but on importance. It is equally striking that the formal theories on perspective in Art, whether Aristotle’s Poetics (skenographia, used to give the illusion of depth on stage) or Brunelleschi’s rules, were built around the creation of illusions. The illusion of reality. When a dear uncle produces a coin from behind my ear, I am delighted. When a con-man does it on the street, I check my pockets and change my locks. The problem with illusions is they are designed to trick. The rules of perspective drawing, or parallel projection, are all designed to trick you into thinking a 2D representation is 3D reality.

So when the illusion is executed inconsistently, like drawing three men the same size without foreshortening them by distance, we can either assume it is done poorly by an artist who doesn’t know what he’s doing, or assume he knows exactly what he’s doing (especially because the background perspective is done so well) and that the men really are very differently sized.

A famous example of intuition failure is the kidney cancer problem. Clusters of very low incidence of kidney cancer are found in rural areas of the Midwest and South. This must be, we reason, on account of a healthy active lifestyle, clean air, and good food. We search for differences in the rural Midwest, rank those with plausible health impacts, and arrive at a hypothesis that fits all the data. Except it doesn’t.

Clusters of very high incidence of kidney cancer also occur in rural areas of the Midwest and South. This must be, we reason, on account of poorly informed nutrition choices, excess alcohol and tobacco. The prime lesson researchers take from this problem is our inability to spot sampling biases. But I can’t help thinking about another lesson. The statisticians messed up, and any credulousness on my part isn’t a failure of intuition but a willingness to trust experts. If my stoner friend, budding rap artist and failed bio-protein entrepreneur, tells me the kidney cancer problem, I’m likely to assume he got something wrong and press him for details. If a data scientist tells me the same thing, I’m likely to accept the data as fact and search for explanatory hypotheses like nutrition, carcinogenic environments, and genetic clusters.

Cultural learning is a huge adaptive trait of ours, and the fact that we are led into statistical intuition failure is not evidence of a faultily designed brain, but instead a symptom of a world where the proportion of competent and trustworthy expert sources of information keeps shrinking.

3. Law Of Small Numbers: It remains a fact, though, that there is a powerful failure of statistical intuition in the kidney cancer problem: our inability to appreciate the power of large numbers and the fickleness of small ones. It is much easier to get extreme results from small samples. The chances of getting HHHH, a polarized result, are 16 times greater than the chances of getting HHHHHHHH. So from tiny rural samples it is no surprise that we find clusters of both high incidence and low incidence, whereas from large urban samples we get more moderate data. Is this adaptive?
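A quick simulation makes the mechanism concrete. The populations and incidence rate below are made-up illustrative assumptions, not real epidemiology: every county gets the same true rate, only its size varies, and the small counties still produce both the highest and the lowest observed rates.

```python
import numpy as np

# Every county has the SAME true incidence; only population differs.
rng = np.random.default_rng(0)
true_rate = 1e-4                                               # purely illustrative
rural = rng.binomial(1_000, true_rate, 2_000) / 1_000          # small counties
urban = rng.binomial(1_000_000, true_rate, 2_000) / 1_000_000  # large counties

print("rural:", rural.min(), rural.max())  # spans 0 up to several times the true rate
print("urban:", urban.min(), urban.max())  # clusters tightly around 0.0001
```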

Large-scale data is a recent phenomenon, and it isn’t clear to me how or why we could ever have developed statistical intuition about large samples versus small ones. How: we’ve never had access. Why: we’ve never had uniformity. Even when small tribes banded together into larger groups, there has never been perfect entropy and mixing; we’ve always remained an aggregation of small, manageable, uniform clusters, ethnic, religious, familial, linguistic, or favorite football team. It’s easier to think of this in terms of a jar of marbles, 50 red and 50 white. If I draw 4 at a time, it is more likely I get RRRR than if I draw 8 at a time. In a perfectly mixed jar, my intuition that an RRRR result represents a pattern will be wrong. But I’ve never had perfectly mixed jars, only jars with individual packets containing similarly colored marbles. My intuition that an RRRR result likely means a sub-packet containing only red marbles will be statistically borne out in that case.
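As a sketch of the first half of that claim, the probability of an all-red handful from a perfectly mixed jar can be computed directly (drawing without replacement, so this is the hypergeometric case):

```python
from math import comb

red, white = 50, 50
total = red + white

def p_all_red(draws):
    # Probability that every marble drawn is red, drawing without replacement.
    return comb(red, draws) / comb(total, draws)

print(p_all_red(4))  # ~0.059: all-red runs happen fairly often in small handfuls
print(p_all_red(8))  # ~0.003: roughly 20x rarer once the handful doubles
```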

Our heuristics and biases, therefore, are received artifacts from a world with much lower entropy and scale. The conventional patterns of our world are being dismantled or reordered faster than our co-evolved impressions of those patterns can keep up.

4. Stories Over Data: In a phone survey of 300 elderly people, 60% approved of the President.

There are only a thousand things wrong with this sentence. But we like stories, and the story isn’t the subject, it’s the predicate: 60% approved of the President. So we ignore the fact that 300 is effectively a non-number. The elderly aren’t representative. Phone surveys aren’t credible. People self-select when agreeing to a survey. Is 60% high, low, or average? If it’s gone up, was it the same 300 people or completely different random ones? We prefer to construct a representation of reality that is far too neat. Messiness upsets us.
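Even setting aside every one of those objections, the purely statistical uncertainty is easy to put a number on. A minimal sketch using the standard normal approximation (my own illustration, not a claim from any actual survey):

```python
from math import sqrt

n, p_hat = 300, 0.60
se = sqrt(p_hat * (1 - p_hat) / n)  # standard error of a sample proportion
margin = 1.96 * se                  # ~95% confidence interval half-width
print(f"60% +/- {margin:.1%}")      # roughly +/- 5.5 percentage points
```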

But this is very unsatisfying. It would be a different thing to say that our appreciation for the mess, our skepticism of the data, and our inspection of statistical intuition would all result in a more complete and useful representation of reality. As in this example, though, we are rarely in a position to say that. Our only two options are:

a) Identify the thousand things wrong with the data and correctly reject it: now I am no better off than before; I have no new information about the world.

b) Accept the new information: I now know more than I did.

My preference then stops depending on the accuracy and veracity of the information, and instead depends on the Type 1 and Type 2 errors of gullibility, i.e. how dangerous it is to me if I believed it (and it’s wrong) and how advantageous it is to me if I believed it (and it’s correct). I’ve already moved on from the ‘really?’ to the ‘so what?’, and I see that as hugely adaptive. Our primary reaction to any fact is its impact, not its cause. The analysis is a process that comes later, assuming that our knowledge about the cause of a forest fire, for instance, would help us re-engineer fire itself. But first, we run away, or approach with skillets. When an improv artist calls this Yes-And, deeming it superior to No-But, we nod along sagely. When a behavioral economist calls it cognitive failure, we also nod along sagely. Because they’re both experts.
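For what it’s worth, that ‘so what’ calculus can be written down directly; the numbers below are invented purely for illustration:

```python
def expected_payoff(p_true, gain_if_true, cost_if_false):
    # Believe (and act on) a claim when the expected upside outweighs the downside.
    return p_true * gain_if_true - (1 - p_true) * cost_if_false

# A claim I only half trust, but cheap to act on and valuable if right:
print(expected_payoff(0.5, gain_if_true=10, cost_if_false=1))   # 4.5 -> act on it
# A claim that is probably true but ruinous to act on if wrong:
print(expected_payoff(0.8, gain_if_true=1, cost_if_false=100))  # -19.2 -> don't
```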


The Pen Of Darkness

A novel insightful exercise to determine the pragmatic difference in intellectual payoff between a novel insight and an obvious fact mistaken for novel insight.