1.11: Statistics - Making Sense of the Numbers
Data do not give up their secrets easily. They must be tortured to confess.
—JEFF HOPPER1
What Numbers Can Tell Us
Here is a chart that seems to say quite a lot.
FIGURE 4. Civilian unemployment rate
Retrieved from https://www.bls.gov/charts/employment-situation/civilian-unemployment-rate.htm# .
After the disastrous recession of 2008, the story about jobs in the United States seems rosy indeed. The trend from July 2010 until July 2018 shows a dramatic decline in the national unemployment rate. As I write these words in the summer of 2018, “job openings hit record highs and the unemployment rate dipped to the lowest level in decades.”2 Ordinarily all this would have resulted in higher wages for working men and women and an increase in their standard of living. This is not, however, how it feels to many working Americans. Perhaps the following chart gives a more accurate account of what is really going on.
FIGURE 5. Median weekly earnings, 2004–2014
Retrieved from Bureau of Labor Statistics, The Economics Daily , https://www.bls.gov/opub/ted/2014/ted_20141028.htm .
All this stuff—“the seasonally adjusted civilian unemployment rate,” “inflation adjusted median weekly earnings,” and the like—really matters for a number of reasons. The biggest concern, of course, is that most of my readers have bills to pay, families to support, and financial plans to make for their futures; what their paycheck is, and what it buys them, are of paramount importance. In addition, politicians of all stripes demand their votes because the economy is doing so well or because it is doing so poorly. Finally, as good explanation seekers, we would all like to know what’s going on.
Were it my paycheck, my vote, or simply my intellectual curiosity, I’d probably take an economics course or two, read a bit more about where the parties and their candidates stand on all this, and as you may have guessed, apply the methods of inference to the best explanation to all this statistical data.
Samples and Populations
We will use the term population as jargon for any sort of a group—a group of people; a group of things, such as vehicles that get better than thirty miles per gallon; or groups of very abstract things, such as depictions of Santa Claus in primetime television. We can use the mathematician’s notion of a set to characterize a population. Similarly, we will use the term sample as jargon for any part of the group constituting the population. Thus samples are subsets of the set making up the population. In a familiar Venn diagram, the lighter, smaller oval constitutes the sample and the darker, larger oval the population.
Very often we are interested in samples because we assume that they can tell us something interesting about the population. You might well ask, If we are really interested in the population, why wouldn't we just look at it directly? And the simple answer is one of practicality. It would be too time consuming, too expensive, or otherwise too impractical to survey the entire population. Thus we use the sample, which can be examined and described, as a clue about the whole population, which cannot.
Inferences from samples to populations are classic examples of inferences to the best explanation. Our data are the discovery that some sample has an interesting feature or property, and we use this as evidence that the population also has this property. We ask the explanatory question—Why does the sample have P ? And our hypothesis answers that it has P because the population as a whole has P .
e 1 . Sample has property P .
t 0 . Population has property P .
Couldn’t It Just Be a Fluke?
I hope by now you are almost programmed when you see an argument such as the previous one to begin to think of rival explanations. Sure, if the population has P , that would be a good explanation of why the sample has P . But what else might explain the sample having P ?
I get home at 6:00 on a Tuesday evening and before I can finish looking at the mail and fixing a martini, the phone has rung three times, all from charitable organizations seeking contributions. I conclude that this Tuesday is a big push for getting money. My sample, those three phone calls, is pretty skimpy. After all, I’m offering a hypothesis about the whole country (or perhaps state or county). Isn’t the following rival explanation just as plausible, perhaps more plausible, than my charity full court press theory?
t 1 . It’s just a coincidence that those three calls were all from charitable organizations.
Or more generally,
t 1 . It’s just a coincidence that the sample has property P .
Modern probability theory has devoted a good deal of time and attention to developing some very sophisticated mathematical tests of how likely it is that a sample will have a given property simply as a matter of random chance. Some of you may be familiar with some of these tests for what is called statistical significance from other courses or computer software. Even those of you who hate numbers or math would be well advised, in my humble opinion, to learn a bit about all this by taking an introductory statistics course. But that is not my goal in the present context.
Even those of you with the least experience and confidence with mathematics know that the size of the sample matters in important ways. A sample of three calls tells us almost nothing, while a sample of three thousand can tell us quite a lot. We will confine our discussion to an informal treatment of what statisticians call the margin of error: How accurate are our measurements within samples of a given size? The contemporary philosopher of science Ronald Giere offers what he calls a rule of thumb for answering this question.3 He offers the following scale correlating the size of the sample with the accuracy of what is being measured:
| SAMPLE SIZE (PEOPLE) | ACCURACY |
|---|---|
| 100 | ±10 percent |
| 500 | ±5 percent |
| 2,000 | ±2 percent |
| 10,000 | ±1 percent |
You might note a couple of things about this little chart. One is how nicely the first digit in the sample size correlates with the accuracy measurement, thus making it pretty darn easy to remember. The other is what economists call “the law of diminishing returns.” Increasing the sample from one hundred to five hundred buys you a lot of increased accuracy; increasing it from two thousand to ten thousand buys you hardly any increased accuracy. You will find, I predict, that almost all the polls you read about in the newspapers will have sample sizes around five hundred. This is because an accuracy of about ±5 percent is all that is needed for most purposes, and it would be very expensive and time consuming to improve that accuracy significantly.
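Giere's table also tracks a shortcut many statistics texts use for a 95 percent margin of error: roughly 100 divided by the square root of the sample size, in percentage points. That formula is my gloss, not Giere's own presentation, but a few lines of Python show how closely it reproduces his rule of thumb:

```python
import math

def margin_of_error(n):
    """Approximate 95 percent margin of error, in percentage points,
    for a simple random sample of size n (the 100/sqrt(n) shortcut)."""
    return 100 / math.sqrt(n)

for n in (100, 500, 2000, 10000):
    print(f"sample of {n:>6}: about ±{margin_of_error(n):.1f} percent")
```

Running this gives ±10.0, ±4.5, ±2.2, and ±1.0 percent, which is Giere's scale to within a fraction of a point. The law of diminishing returns is visible in the square root: quadrupling the sample only halves the margin of error.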
Couldn’t the Sample Be Biased?
The notion of bias in colloquial speech often conveys a lack of openness or even prejudice, which counts as a kind of character defect—for example, “he’s really biased in his grading against student athletes.” I’m biased toward folk and rock music because it’s what I grew up with. Some of you, God forbid, are biased toward hip-hop for the same reason. All the notion really means is that people are not equally open—to giving good grades, appreciating a song as a good one, or noticing that the dishes need to be washed. We need to make sure that our samples are not biased but equally open to everyone or everything in the population.
Statisticians desire randomly selected samples. This is technical jargon that means every single individual in the population has an equal probability of being selected as a member of the sample. My computer can approximate random selection, so it would be relatively easy for me to feed in all my class rosters for the past five years, randomly select three students from each course, and then query this sample to discover things about my teaching, grading, and so on. Not a bad idea, actually.
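A minimal sketch of that roster idea, using only Python's standard library; the course names and students below are invented for illustration:

```python
import random

# Hypothetical rosters; the names and courses are made up for illustration.
rosters = {
    "PHIL 101 (Fall)": ["Ava", "Ben", "Carla", "Dev", "Elena", "Farid"],
    "PHIL 230 (Spring)": ["Gia", "Hank", "Iris", "Jon", "Kim", "Leo"],
}

random.seed(42)  # fixed seed so the draw can be reproduced

# random.sample gives every student on a roster the same chance of being
# chosen, which is exactly the statistician's definition of a random sample.
sample = {course: random.sample(students, 3)
          for course, students in rosters.items()}

for course, chosen in sample.items():
    print(course, "->", chosen)
```

Strictly speaking the computer's generator is only pseudorandom, but for a survey of one's own students it is as close to technical randomness as anyone needs.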
In the real world, however, technical randomness is often impossible. We only have a couple of days to find out voter sentiment in the upcoming election, and so we phone a sample of six hundred likely voters. Obviously, this is not a true random sample, since not every likely voter had an equal chance of being selected—some didn't have phones, some were away on vacation, and some screen their calls. But for practical purposes, if the phone numbers are randomly selected from a master list of likely voters who answer their phones, the information we gather approximates what could be gathered from a technically random sample, and our sample might be characterized as practically random . Technically random samples are the exception, while what we hope are practically random samples are the rule.
Consider a very famous poll that went spectacularly wrong. The Literary Digest had been conducting polls on presidential elections since 1920 and had gotten the winner right in four straight elections; indeed, in the 1932 election, they got the popular vote right within 1 percent. As the 1936 election approached, they once again conducted a massive poll. Take a look at the relevant data.
e 1 . The Literary Digest mailed out more than ten million straw vote ballots.
e 2 . Their sample was drawn primarily from automobile registration lists and phone books.4
e 3 . “Over 2.3 million ballots were returned.”5
e 4 . 55 percent planned to vote for Alf Landon, 41 percent for Roosevelt, and 4 percent for Lemke.
This led to their conclusion that voters overwhelmingly favored Landon and their cover story prediction that he would win the election. They made a classic inference from a sample to a population.
e 1 . Literary Digest sample strongly favors Landon.
t 0 . Voters, nationally, strongly favor Landon.
Bad luck for the Literary Digest ! You, of course, know that Alf Landon never became president. I’ll bet a good number of you have never even heard of him before. Roosevelt crushed Landon in the general election 61 percent to 37 percent. What went wrong?
The Digest ’s sample was horribly biased. Not because they were prejudiced or had some ax to grind but because the way they selected the names and addresses was far from random—not the technical randomness that we almost never find, but the practical randomness that good polling requires. The clue is in e 2 . This was, after all, the height of the Great Depression. Poor people were much less likely to own a car. And even phones were then considered not necessities but, in a sense, luxuries. Again, poor people were much less likely to have phones. What the Literary Digest had unintentionally done is measure the sentiments of relatively wealthy voters, not voters in general. This suggests the following rival explanation:
t 1 . Wealthy voters strongly favor[ed] Landon.
It is well known in political science that wealthier voters tend to vote for Republicans and less wealthy voters for Democrats. It’s hardly surprising, therefore, that a sample of voters biased toward the Republican Party tended to favor the Republican candidate.
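A toy simulation makes the mechanism vivid. Every number below is invented for illustration, not a real 1936 figure: suppose a minority of voters are wealthy, wealthy voters lean toward Landon, and the pollster only manages to sample the wealthy.

```python
import random

random.seed(1936)  # fixed seed so the simulation is repeatable

def simulated_voter():
    """Return (is_wealthy, votes_landon) for one invented voter."""
    wealthy = random.random() < 0.3          # a minority of voters are wealthy
    p_landon = 0.6 if wealthy else 0.3       # wealthy voters lean toward Landon
    return wealthy, random.random() < p_landon

population = [simulated_voter() for _ in range(100_000)]

# What the whole electorate actually thinks.
true_share = sum(landon for _, landon in population) / len(population)

# What a "sample" drawn only from the wealthy reports.
wealthy_only = [landon for wealthy, landon in population if wealthy]
biased_share = sum(wealthy_only) / len(wealthy_only)

print(f"whole population for Landon: {true_share:.0%}")
print(f"wealthy-only sample:         {biased_share:.0%}")
```

The wealthy-only sample reports Landon winning handily even though the full population rejects him, which is structurally just what happened to the Literary Digest.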
There was a second source of bias in the sample that is less well discussed in academic circles. The whole poll depended on what statisticians call the “response rate.” The Literary Digest sent out a truly amazing number of straw ballots—more than ten million. They got a pretty good response too—almost a quarter. But we should ask ourselves if there was anything special about those 2.3 million who took the trouble to mail their ballots back. It seems reasonable to suppose that they were more educated and politically concerned. So we have a second rival explanation:
t 2 . Better educated and politically concerned voters favored Landon.
And, indeed, t 1 and t 2 nicely complement one another and suggest a more comprehensive rival:
t 3 . Wealthy voters, as well as better educated and politically concerned voters, favored Landon.
Lest any of you think that all this concern with polling for presidential elections is a thing of the past, you might well reflect on the recent elections. Here’s what professional pollsters were worried about as the 2008 election approached:
“We were all scared to death in 2004, because we had a close race and the cell phone-only problem was already with us then,” says Scott Keeter, the head of surveys at the Pew Research Center . . .
“Pollsters have learned quite a bit about the cell phone-only users they do call. They are most likely to be under 30, unmarried, renters, making less than $30,000 a year, and are slightly more likely to be black or Hispanic,” says Keeter. . . .
He adds, “It suggests that if there are enough of them, and you are missing them in your landline surveys, then your polls will have a bias because of that.” 6
Naomi Oreskes’s Study
There is an interesting segment in Al Gore’s movie, An Inconvenient Truth , where he cites a scholarly study of peer-reviewed articles on climate change.
A University of California at San Diego scientist, Dr. Naomi Oreskes, published in Science magazine a massive study of every peer-reviewed science journal article on global warming from the previous 10 years. She and her team selected a large random sample of 928 articles representing almost 10% of that total, and carefully analyzed how many of the articles agreed or disagreed with the prevailing consensus view. About a quarter of the articles in the sample dealt with aspects of global warming that did not involve any discussion of the central elements of the consensus. Of the three-quarters that did address these main points, the percentage that disagreed with the consensus? Zero. 7
Here we have, a little bit secondhand, an incredibly interesting, and potentially quite important, sample. The argument leaves the conclusion unstated but still quite obvious—almost all natural scientists publishing on climate change endorse the consensus view about climate change.
e 1 . In a sample of 928 peer-reviewed articles dealing with climate change, 0 percent disagreed with the consensus view.
t 0 . Virtually all peer-reviewed research on climate change endorses the consensus view.
Mr. Gore is quite right that Dr. Oreskes published a short, but very influential, article, “Beyond the Ivory Tower: The Scientific Consensus on Climate Change,” in a prestigious journal, Science , in December of 2004.8 She begins by reminding her readers that policy makers and the mass media often suggest that there is great scientific uncertainty about “anthropogenic” climate change, but she states flatly, “This is not the case.”9
In defense of her thesis, she offers a fairly elaborate study she has conducted. She offers a working definition of what she will call “the consensus view,” from reports by the Intergovernmental Panel on Climate Change:
Human activities . . . are modifying the concentration of atmospheric constituents . . . that absorb or scatter radiant energy. . . . Most of the observed warming over the last 50 years is likely to have been due to the increase in greenhouse gas concentrations. 10
Notice the challenge she faces. She is making a claim about a very large, and not that well-defined, population—science (“great scientific uncertainty”). To make matters worse, policy makers and the media dispute her claim.
Her first move is to more carefully define the population she is interested in. She utilizes a standard reference tool in the natural sciences, the Institute for Scientific Information (ISI) database. In this database, authors are asked to identify certain “key words,” really topics, that their articles address. Professor Oreskes searched for the key word “climate change.” Her team then randomly selected 928 of the resulting articles.
Obviously not every article is going to explicitly endorse or disagree with the consensus view, so Oreskes and her team had to read and “code” the articles. They broke them down into six categories.
The 928 papers were divided into six categories: explicit endorsement of the consensus position, evaluation of impacts, mitigation proposals, methods, paleoclimate analysis, and rejection of the consensus position. Of all the papers, 75% fell into the first three categories, either explicitly or implicitly accepting the consensus view; 25% dealt with methods or paleoclimate, taking no position on current anthropogenic climate change. Remarkably, none of the papers disagreed with the consensus position. 11
She is also quite candid that a certain amount of judgment and editing of the sample was required.
Some abstracts were deleted from our analysis because, although the authors had put “climate change” in their key words, the paper was not about climate change. 12
So what do we (none of us trained climate scientists) think of Professor Oreskes’s evidence? We possess the tools to make some sort of evaluation.
We have a fair amount of data that is being offered as evidence:
e 1 . Definition of the “consensus view”
e 2 . ISI database
e 3 . Key word: climate change
e 4 . 928 articles
e 5 . Some articles did not really address climate change and were removed.
e 6 . Six potential categories
e 7 . 75 percent “implicitly or explicitly” endorsed the consensus view.
e 8 . 25 percent took no stand.
e 9 . Not one article disagreed with the consensus view.
t 0 . Almost all scientists working and publishing on climate change endorse the consensus view.
Rival Explanations of the Sample
We will begin with two different rival explanations that attribute the fact that no one challenged the consensus view to pure chance. Perhaps it was just a fluke that all 928 articles either endorsed the consensus view or took no position on it. Perhaps the study tells us something about the articles in the ISI database, but it is simply a fluke that the articles the database includes show no skepticism, while other peer-reviewed articles not included are skeptical. Either of the following sorts of mathematical coincidence is possible:
t 1 . It was a fluke that the 928 articles showed no skepticism about the consensus view; the ISI database contained many articles that were skeptical.
t 2 . Although the sample told us something significant about the ISI database, it was a coincidence that the articles they included showed no skepticism when in fact many peer-reviewed articles not included show plenty of skepticism.
I have already conceded that both of these rivals are logically possible. I want to insist, however, that they are very improbable. Remember Giere’s “rule of thumb”? He tells us that for random samples, the margin of error is a function of the size of the sample: the bigger the sample, the smaller the margin of error. Samples of five hundred are accurate to about ±5 percent, and samples of two thousand are accurate to about ±2 percent. That means that Professor Oreskes’s sample has an accuracy of, conservatively, ±4 percent. For a statistician adopting a 95 percent confidence level, there is only a 5 percent chance that the population falls outside of the ±4 percent margin of error. Could it happen? Yes. Is it likely at all? No.
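We can also put a rough number on the fluke rivals directly. If some fraction of all peer-reviewed articles really were skeptical, the chance that a random sample of 928 contains none of them is easy to compute. The skeptic fractions below are hypothetical, chosen only to show how fast the probability collapses:

```python
def prob_zero_skeptics(p, n=928):
    """Chance that a random sample of n articles contains no skeptics,
    assuming a fraction p of the whole population is skeptical."""
    return (1 - p) ** n

# Hypothetical skeptic fractions, for illustration only.
for p in (0.01, 0.03, 0.05):
    print(f"if {p:.0%} were skeptical: P(zero in 928) = {prob_zero_skeptics(p):.1e}")
```

Even if a mere 1 percent of the literature were skeptical, drawing 928 articles and finding zero skeptics would happen less than one time in ten thousand; at 5 percent the odds are astronomically small. The fluke rivals are possible but wildly improbable.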
Much more interesting rivals will have to do with the problem of bias, either intentional or, more likely, unintentional. I suspect that some of you have already wondered if there might be a bias in the ISI database. Maybe they only list “green” articles. Again, the following rival explanation is possible:
t 3 . The ISI database is biased in favor of the consensus view.
A very different sort of bias is possible because of Oreskes’s methodology. It is highly unlikely that most of the articles in the sample came right out and said where they stood on the consensus view. Indeed, she tells us that some of the endorsement was implicit. That must mean that her team had to “code,” or otherwise interpret, each article’s intention and subsequent endorsement or nonendorsement. Perhaps her team was so unconsciously wedded to the consensus view that they misinterpreted many of the articles as endorsing or taking no stand when in fact the authors of those articles intended a rejection of the consensus view. Thus another possible rival explanation focuses on the coding of the articles:
t 4 . Oreskes, because of her biases, misinterpreted many of the articles as favorable or neutral when in fact the authors were arguing against the consensus view.
A final rival explanation centers on the possible bias of the entire scientific community. One might argue, as some have in defense of “creation science,” that there is a kind of professional conspiracy that effectively censors articles that challenge the consensus view (not just of climate change but of any accepted scientific theory) from being published in peer-reviewed journals in the first place. Here, the rival does not really challenge the population of peer-reviewed publications, but rather the implied attitude of endorsement by working scientists.
t 5 . Respectable scientists arguing against the consensus view cannot get their articles published in peer-reviewed journals.
The Best Explanation?
In the case of the rivals focusing on a statistical fluke, I could argue against their plausibility by focusing on their mathematical improbability. No such technique exists for dealing with the rivals t 3 , t 4 , and t 5 . Nevertheless, I want to argue that they are all implausible, at least when compared to the original explanation that there exists practically universal endorsement of the consensus view regarding climate change among trained climate scientists.
Consider first the journal that Oreskes’s article appeared in, Science . The journal is one of the most highly respected academic journals in the world. They have a huge interest in policing themselves, since their name is on the cover of every article they publish.
Next, we must face the charge that the Institute for Scientific Information is somehow biased. Again, we are dealing with a very prestigious and widely used reference tool, which is now operated by a for-profit corporation. The ISI has a huge stake, in terms of both its reputation and its financial outlook, in being regarded as absolutely trustworthy. Thus they too can be expected to police themselves.
The same may be argued for Professor Oreskes herself. She is a highly respected scholar, educator, and university administrator. Her own professional reputation is on the line. She would be insane not to carefully ensure the accuracy of an article in a major journal that was guaranteed to be read and debated by a wide audience of scientists and indeed, those outside of the sciences.
Finally, we come to perhaps the most serious of the charges in our rivals. Perhaps all climate science is biased against critics of the consensus view. As I said in an earlier chapter, these sorts of conscious or unconscious conspiracy theories are offered by critics of natural selection. I want to concede that something like that can happen, and the history of science tells us that it has happened on occasion. In a way, the criticism of Semmelweis’s theory by skeptics of the entrenched generation had shades of this mechanism. But with all this conceded, I have to tell you that this sort of thing is very, very rare. Most natural scientists respect the need for skepticism from their peers. Studies challenging the consensus view, in one sense, have a better chance of being published, if for no other reason than that they are saying something new. Furthermore, we live in the age of information. Much more is being published, and many more venues for peer-reviewed academic publishing exist now. Thus the fact that the ISI database did not include even one article challenging the consensus view leads me to believe that there just aren’t many skeptics out there, at least not within mainstream climate science.
EXERCISES
- 1. For the fall quarter of 2008, Eastern had 3,666 students. When you break down that number on the basis of sex, you discover something a little surprising. 2,344 of those students were female, while only 1,322 were male. Why would it have been a bad idea to take the 2008 institutional data from Eastern as telling us anything significant about gender and college attendance nationally?
- 2. Why do we almost never see samples that are truly (technically) random?
- 3. Teaching evaluations for online courses have notoriously low response rates. Less than 10 percent of my online students return their course evaluations. What kinds of bias might infect the accuracy of these student evaluations? Is this sample close enough to practical randomness to tell us anything interesting about the quality of my online teaching?
QUIZ ELEVEN
A recent Gallup News story claims that “public concern about global warming is evident across all age groups in the U.S., with majorities of younger and older Americans saying they worry about the problem a great deal or fair amount. However, the extent to which Americans take global warming seriously and worry about it differs markedly by age, with adults under age 35 typically much more engaged with the problem than those 55 and older.”13
The following results were “based on aggregated telephone interviews from four separate Gallup polls conducted from 2015 through 2018 with a random sample of 4,103 adults, aged 18 and older, living in all 50 U.S. states and the District of Columbia. For results based on the total sample of national adults, the margin of sampling error is ±2 percentage points at the 95% confidence level. All reported margins of sampling error include computed design effects for weighting.”14
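Gallup's reported ±2 points can be checked against the rough 100/√n shortcut discussed earlier in the chapter (a crude approximation, not Gallup's actual design-effect computation):

```python
import math

n = 4103  # Gallup's aggregated sample size
moe = 100 / math.sqrt(n)  # rule-of-thumb 95 percent margin, in points
print(f"rule-of-thumb margin for n = {n}: about ±{moe:.1f} points")
```

The shortcut gives about ±1.6 points; Gallup's weighting adjustments ("computed design effects") inflate the simple figure somewhat, so their reported ±2 points is consistent with the rule of thumb.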
Here is a summary of their findings: 75 percent of respondents aged eighteen to thirty-four believed that “global warming is caused by human activities,” while only 55 percent of respondents aged fifty-five and over believed this. Apropos our earlier discussion, 73 percent of the younger cohort thought “most scientists believe global warming is occurring,” but only 58 percent in the older group thought this was true.15
Based on the information in the Gallup polls, use the techniques developed in this chapter to evaluate the quality of evidence we have for the author’s claim that “the extent to which Americans take global warming seriously and worry about it differs markedly by age.”
Here is the complete article from Gallup: https://news.gallup.com/poll/234314/global-warming-age-gap-younger-americans-worried.aspx .
Notes
1. Quoted in Sourav S. Bhowmick and Boon-Siew Seah, Summarizing Biological Networks (New York: Springer, 2017), vii.
2. Heather Long, “In U.S., Wage Growth Is Being Wiped Out Entirely by Inflation,” Washington Post , August 10, 2018, https://www.washingtonpost.com/business/2018/08/10/america-wage-growth-is-getting-wiped-out-entirely-by-inflation/?noredirect=on&utm_term=.65fd9f744116 .
3. Ronald Giere, Understanding Scientific Reasoning , 5th ed. (Belmont, CA: Wadsworth, 2005), 142–44.
4. Peverill Squire, “Why the 1936 Literary Digest Poll Failed,” Public Opinion Quarterly 52, no. 1 (Spring 1988): 128.
5. Squire, 128.
6. Audie Cornish, “Do Polls Miss Views of the Young & Mobile?,” NPR, October 1, 2007, http://www.npr.org/templates/story/s...oryId=14863373 .
7. Al Gore, An Inconvenient Truth (Emmaus: Rodale, 2006), 262.
8. Naomi Oreskes, “The Scientific Consensus on Climate Change,” Science 306, no. 5702 (2004): 1686, http://www.sciencemag.org/cgi/content/full/306/5702/1686 .
9. Oreskes, 1686.
10. Oreskes, 1686.
11. Oreskes, 1686.
12. Oreskes, 1686.
13. R. J. Reinhart, “Global Warming Age Gap: Younger Americans Most Worried,” Gallup, May 11, 2018, https://news.gallup.com/poll/234314/global-warming-age-gap-younger-americans-worried.aspx?version=print .
14. Reinhart.
15. Reinhart.