Suppose a study showed that, of the 3,141 counties in the United States, the incidence of kidney cancer was lowest in counties that are mostly rural, sparsely populated, and located in traditionally Republican states. In fact, this is true.9 What accounts for this interesting finding? Most people would be tempted to look for a causal explanation: some feature of the rural environment that accounts for the lower incidence of cancer. However, in this case they would be wrong to do so. It is easy to see why once we consider the counties that have the highest incidence of kidney cancer: they are counties that are mostly rural, sparsely populated, and located in traditionally Republican states! So whatever you thought might account for the lower cancer rates in rural counties can’t be the right explanation, since counties of the very same kind also have the highest cancer rates. It is important to understand that the counties with the highest rates are not the same counties as those with the lowest rates. For example, county X doesn’t have both a high and a low cancer rate (relative to other U.S. counties); that would be a contradiction, and so can’t possibly be true. Rather, the counties that have the highest kidney cancer rates are “mostly rural, sparsely populated, and located in traditionally Republican states,” and so are the counties that have the lowest kidney cancer rates. How could this be? Before giving you the explanation, I’ll give you a simpler example to see if you can figure it out from that.
Suppose that a jar contains equal numbers of red and white marbles. Jack and Jill take turns drawing marbles from the jar, but they draw at different rates: Jill draws 5 marbles at a time, while Jack draws 2 marbles at a time. Which of them will more often come up with a draw that is all red or all white, Jack or Jill?10
The answer here should be obvious: Jack will more often draw marbles that are all the same color, since he is drawing only 2 marbles at a time. Because Jill is drawing 5 marbles at a time, it is less likely that any given draw of hers will yield marbles that are all the same color. This is simply a fact of sampling and is related to the sampling errors discussed in section 3.1: a sample that is too small will tend not to be representative of the population. In the marbles case, if we view Jack’s draws as samples, then whenever a draw yields marbles that are all the same color, that sample is far from representative of the jar, since the jar is 50/50 red and white while his draw is 100% red or 100% white. Jill, on the other hand, will tend not to get such unrepresentative samples. Since she draws a larger number of marbles, it is less likely that her samples will be drastically off in the way Jack’s can be. The general point to be taken from this example is that smaller samples tend toward the extremes: they are more likely than larger samples both to overrepresent some feature and to underrepresent that same feature.
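The arithmetic behind Jack and Jill can be checked directly. The sketch below is a supplementary illustration, not part of Kahneman’s example. The first function assumes each marble is red or white with probability 1/2, independently of the others (as if the jar were very large, or marbles were put back after each draw), so a draw of n marbles is all one color with probability (1/2)^n + (1/2)^n. The second function handles a small jar drawn without replacement; the jar of 10 red and 10 white marbles is a hypothetical choice, and both function names are made up for this example.

```python
from math import comb


def p_same_color_independent(n, p_red=0.5):
    """Chance that n independent draws are all red or all white."""
    return p_red ** n + (1 - p_red) ** n


def p_same_color_jar(n, red=10, white=10):
    """Same chance for a finite jar drawn without replacement.

    The default jar (10 red, 10 white) is a hypothetical example.
    """
    total = comb(red + white, n)
    # Ways to pick n marbles that are all red, plus ways that are all white,
    # divided by all ways to pick n marbles from the jar.
    return (comb(red, n) + comb(white, n)) / total


for name, n in (("Jack", 2), ("Jill", 5)):
    print(name, p_same_color_independent(n), round(p_same_color_jar(n), 4))
```

Under the independence assumption, Jack’s draws come out all one color half the time (0.5), while Jill’s do so only about 1 time in 16 (0.0625). The finite jar gives somewhat different numbers, but Jack still gets one-color draws far more often than Jill.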
Can you see how this might apply to the case of kidney cancer rates in rural, sparsely populated counties? There is a national kidney cancer rate, which reflects the cases across all 3,141 counties in the U.S. taken together. Imagine ranking the counties by their cancer rates, from highest to lowest. The finding is that sparsely populated counties make up a disproportionately large share of the top of this list, but also a disproportionately large share of the bottom. Why would the more sparsely populated counties be overrepresented at both ends of the list? The reason is that these counties have smaller populations, so they will tend to have more extreme results (either higher or lower rates). Just as Jack is more likely to get either all white marbles or all red marbles (an extreme result), the less populated counties will tend to have cancer rates that are at the extremes, relative to the national average. And this is a purely statistical fact; it has nothing to do with features of those environments causing the cancer rate to be higher or lower. Just as Jack’s extreme draws have nothing to do with the way he is drawing (they are simply the result of statistical, mathematical facts), the extreme rates of the smaller counties have nothing to do with features of those counties, but only with the fact that they are smaller and so will tend to have more extreme results (i.e., cancer rates that are either higher or lower than the national average).
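The same point can be seen in a small simulation. The sketch below is hypothetical and uses made-up numbers rather than real cancer data. Every simulated county has exactly the same true rate (0.2%, an assumed figure), and the only difference between counties is population (500 versus 10,000 residents, also assumed). So any county that lands near the top or bottom of the ranking gets there by chance alone. The function names `simulate_counties` and `share_small` are invented for this illustration.

```python
import random


def simulate_counties(n_small=100, n_large=100, small_pop=500,
                      large_pop=10_000, true_rate=0.002, seed=1):
    """Rank hypothetical counties by observed rate.

    Every county shares the SAME true rate; only population differs.
    """
    rng = random.Random(seed)
    counties = []
    for size, pop, count in (("small", small_pop, n_small),
                             ("large", large_pop, n_large)):
        for _ in range(count):
            # Each resident independently gets the disease with
            # probability true_rate; count the resulting cases.
            cases = sum(rng.random() < true_rate for _ in range(pop))
            counties.append((cases / pop, size))
    # Rank counties from highest to lowest observed rate.
    counties.sort(key=lambda county: county[0], reverse=True)
    return counties


def share_small(ranked):
    """Fraction of the given counties that are small."""
    return sum(1 for _, size in ranked if size == "small") / len(ranked)


ranked = simulate_counties()
print("small share of top 10:   ", share_small(ranked[:10]))
print("small share of bottom 10:", share_small(ranked[-10:]))
```

With these assumed numbers, both the top and the bottom of the ranking tend to be filled by small counties, even though no simulated county is actually riskier or healthier than any other. The exact figures depend on the random seed.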
The first take-home lesson here is that smaller groups will tend towards the extremes in terms of their possession of some feature, relative to larger groups. We can call this the law of small numbers. The second take-home lesson is that our brains are wired to look for causal explanations rather than mathematical ones, and because of this we are prone to ignore the law of small numbers and to look for a causal explanation of a phenomenon instead. The small numbers fallacy is our tendency to seek a causal explanation for some phenomenon when the law of small numbers alone is enough to explain it.
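One standard way to make the law of small numbers precise is the standard error of a sample proportion; this is offered as a supplement rather than as the author’s own formulation. If a feature occurs at rate p in a population and a sample of n members is drawn independently, the observed rate in the sample typically strays from p by about the amount below. Applied to the jar of marbles (p = 1/2), Jack’s and Jill’s draw sizes give:

```latex
\[
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}
\]
\[
n = 2:\ \sqrt{\frac{(1/2)(1/2)}{2}} = \sqrt{0.125} \approx 0.354,
\qquad
n = 5:\ \sqrt{\frac{(1/2)(1/2)}{5}} = \sqrt{0.05} \approx 0.224
\]
```

Because the spread shrinks in proportion to 1/√n, a county of 10,000 people will show an observed rate whose typical spread is about √20 ≈ 4.5 times smaller than that of a county of 500 people, even if both share the same underlying rate.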
We will end this section with a somewhat humorous and incredible example of a small numbers bias that, presumably, wasted more than a billion dollars. This example, too, comes from Kahneman, who in turn heard the anecdote from some of his colleagues who are statisticians.11 Some time ago, the Gates Foundation (the charitable foundation of Microsoft co-founder Bill Gates) donated $1.7 billion to research a curious finding: smaller schools tend to be more successful than larger schools. That is, if you rank schools from most to least successful, the smaller schools will tend to be overrepresented near the top (i.e., a higher proportion of small schools than of large schools appears near the top of the list). This is the finding that the Gates Foundation invested $1.7 billion to help understand. To do so, it created smaller schools, sometimes by splitting larger schools in half. However, none of this was necessary. Had the Gates Foundation (or those advising it) looked at the characteristics of the worst schools, they would have found that those schools also tended to be smaller! The “finding” is merely a result of the law of small numbers: smaller groups tend towards the extremes (on both ends of a spectrum) more so than larger groups. In this case, the fact that smaller schools tend to be both more successful and less successful is explained in the same way as we explain why Jack gets either all red or all white marbles more often than Jill.
9 This example is taken from Kahneman (2011), op. cit., p. 109.
10 This example is also taken (with minor modifications) from Kahneman (2011), p. 110.
11 Kahneman (2011), pp. 117-118.