
1.12: Correlations and Causes


    Most of you will have heard the maxim “correlation does not imply causation.” Just because two variables have a statistical relationship with each other does not mean that one is responsible for the other. For instance, ice cream sales and forest fires are correlated because both occur more often in the summer heat. But there is no causation; you don’t light a patch of the Montana brush on fire when you buy a pint of Häagen-Dazs.

    —NATE SILVER1

    Correlations

    The Concise Oxford Dictionary offers two definitions of the term correlation:

    1. Mutual relationship between two or more things.
    2. Interdependence of variable quantities, quantity measuring extent of this.2

    The latter definition gets most of the attention in statistics courses. But the more generic definition is at the heart of reasoning from a cause to an effect. What is the relationship between two things, say a car accident on Tuesday and a backache on Wednesday morning? “Obviously” the crash caused the back injury. Well, maybe, but maybe not. Perhaps the back injury (caused by too strenuous a workout at the gym on Monday) resulted in the crash because of a muscle spasm as the driver was trying to hit the brakes. Or suppose some third thing (say, a small seizure) simultaneously caused the crash by distracting the driver and caused the back injury as the driver wrenched in surprise. And maybe the relationship is one of simple coincidence: the injury occurred at the gym and the crash resulted from foolishly texting while driving, so there was simply no causal relationship between the two occurrences. This all suggests four possible causal relationships between any two events, A and B.

    1. A caused B.
    2. B caused A.
    3. Some third “common cause,” C, independently caused both A and B.
    4. There is no causal relationship between A and B.

    We shall see, directly, that there is a fifth possible causal relationship between A and B, but I’m saving that as a surprise. Just what we have so far, though, allows us to explain the correlation between ice cream sales and forest fires. Nate Silver says “there is no causation,” but this is a little careless. He’s right, of course, that A is not the cause of B nor B the cause of A. But there is a causal relationship that best explains the correlation. C (the summer heat) is the common cause of the increased ice cream sales and greater number of forest fires.
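
    The common-cause pattern can be made concrete with a small sketch. The numbers below are invented for illustration (they are not real sales or fire data), and the helper names pearson and residuals are hypothetical. A and B are each built from C plus unrelated noise, so they correlate strongly with each other even though neither influences the other. Once the part of each that C accounts for is subtracted, nothing is left to correlate.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def residuals(ys, xs):
    """What remains of ys after subtracting its best straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


# Invented data (an assumption, not measurements): C is a heat index for
# eight periods; A and B each depend on C plus their own unrelated noise.
heat = [0, 1, 2, 3, 4, 5, 6, 7]                                     # C
ice_cream_sales = [3 * c + e
                   for c, e in zip(heat, [1, -1, -1, 1, 1, -1, -1, 1])]  # A
forest_fires = [2 * c + e
                for c, e in zip(heat, [1, 1, -1, -1, -1, -1, 1, 1])]     # B

# Correlation between A and B as observed.
raw = pearson(ice_cream_sales, forest_fires)

# Correlation between A and B after removing what C explains in each.
adjusted = pearson(residuals(ice_cream_sales, heat),
                   residuals(forest_fires, heat))

print(f"correlation of A and B: {raw:.2f}")
print(f"correlation of A and B with C removed: {abs(adjusted):.2f}")
```

    With these made-up figures the raw correlation is about 0.97, and it drops to zero once the heat index is accounted for. Real data would rarely be this tidy; the noise terms here were chosen so that the arithmetic comes out cleanly.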

    Explaining the Numbers

    Much of statistical reasoning in the social and natural sciences can easily be reconstructed as a related pair of inferences to the best explanation. In the first inference, the explanatory question focuses on a quantitative relationship. We typically have some study or sample that is asserted to tell us something about a larger group or population. Consider the extensive medical data that was uncovered over several decades in the famous Framingham study. Medical researchers were surprised to discover that 29 percent of the men in the forty- to forty-nine-year range suffered from coronary heart disease, while only 14 percent of the women in the same age range suffered from the disease. This tells us something potentially very important about gender and heart disease.

    e1. Of the 771 men in the forty- to forty-nine-year age group, 29 percent showed some signs of coronary heart disease.

    e2. Of the 954 women in the forty- to forty-nine-year age group, only 14 percent showed signs of coronary heart disease.


    t0. Coronary heart disease appears much more often in men than in women.

    One rival explanation that I believe current medical advances force us to take seriously is that coronary heart disease is much more prevalent in women than was recognized by medical experts at the time of the Framingham study: then-current diagnostic indicators failed to correctly identify all the signs of coronary heart disease in women. So the following may better explain some of the gender disparity:

    t1. Not all the clinical indicators of coronary heart disease in women were recognized at the time of the study.

    But let’s grant that the Framingham data truly indicated some real gender disparity and that the samples do suggest that coronary heart disease was more prevalent in men.
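
    Granting that the disparity is real also means setting aside a rival this section does not dwell on: that the gap is a fluke of sampling. A rough check is sketched below, using only the sample sizes and percentages quoted above. The two-proportion z-test is one conventional choice, assumed here for illustration rather than taken from the Framingham researchers, and the function name two_proportion_z is hypothetical. Because the percentages are rounded, the implied case counts are approximate.

```python
from math import erfc, sqrt

# Figures quoted in e1 and e2 (percentages are rounded in the source).
men_n, men_rate = 771, 0.29
women_n, women_rate = 954, 0.14


def two_proportion_z(n1, p1, n2, p2):
    """z statistic for the difference between two sample proportions,
    using the pooled proportion under the 'no real difference' rival."""
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


z = two_proportion_z(men_n, men_rate, women_n, women_rate)

# Two-sided tail probability of a standard normal variable beyond |z|.
p_value = erfc(abs(z) / sqrt(2))

# How many times more often the disease showed up among the men sampled.
risk_ratio = men_rate / women_rate

print(f"approximate cases: men {men_n * men_rate:.0f}, "
      f"women {women_n * women_rate:.0f}")
print(f"z = {z:.1f}, two-sided p = {p_value:.1e}, risk ratio = {risk_ratio:.2f}")
```

    On these figures z comes out at roughly 7.6, so chance alone is a very poor explanation of the gap. Note what the calculation does not do: it says nothing about whether t0 or t1 better explains the numbers, since under-diagnosis in women would produce the same arithmetic.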

    Explaining the Correlations

    Noticing this striking correlation between gender and heart disease is only the first step in figuring out what is going on here. We might think that there’s something deeply biological going on.

    e1. Of the 771 men in the forty- to forty-nine-year age group, 29 percent showed some signs of coronary heart disease.

    e2. Of the 954 women in the forty- to forty-nine-year age group, only 14 percent showed signs of coronary heart disease.


    t0. Coronary heart disease appears much more often in men than in women.


    t*0. The biological makeup of males, their hormones, physiology, and DNA, causes an increased danger of coronary heart disease.

    But certainly, the possibility of a cultural explanation must be taken seriously, particularly since the data was collected at a time in our history when gender roles were much more pronounced. Perhaps something regarding the differences in workforce stress between men and women accounts for the disparity in coronary heart disease. Or, perhaps, it’s as simple as diet and alcohol consumption. We are once again confronted with a serious rival explanation:

    t*1. The culturally defined differences in work and lifestyles between men and women cause the differences in coronary heart disease.

    Or this may well be one of those times when the best explanation combines the features identified in alternative explanations:

    t*2. The biological makeup of males as well as the culturally defined differences in lifestyles between men and women jointly cause an increased danger of coronary heart disease.

    I hope that it is obvious by now that I am not suggesting that statistical studies such as the Framingham study are too ambiguous to tell us anything important. The message I take from this is that explaining statistical data can be a difficult task and that carefully considering alternative accounts of statistical correlations may point to further studies that need to be conducted before we can fully understand the causal connections between gender and coronary heart disease.

    CO2 and Global Temperatures

    Consider the following data that played such a prominent role in Al Gore’s An Inconvenient Truth.

    Mr. Gore used these data as evidence that CO2 concentrations cause global temperature variations.

    FIGURE 6. Temperature variation from present-day values (blue), atmospheric carbon dioxide concentration (green), and dust (red) based on data from ice cores retrieved at the Vostok drilling site in Antarctica.

    Retrieved from Randy M. Russell, https://eo.ucar.edu/staff/rrussell/climate/paleoclimate/ice_core_proxy_records.html.

    e1. There is a strong correlation between CO2 levels and the Earth’s average temperature.


    t0. High CO2 concentrations cause global temperature variations.

    Given that the correlation is real and not simply a fluke or coincidence—for the modern social scientist, it is statistically significant—we must now determine whether t0 is the best explanation of the correlation. We must compare it to some rival explanations. Perhaps, as some skeptics have claimed, the direction of causation is reversed:

    t1. Global temperature variations cause varying CO2 concentrations.

    This rival is probably a better account of the historical data because, on many readings of the historical record, changes in temperature appear before changes in CO2 level. In addition, before the advent of the Industrial Revolution, it is hard to see what else could have initiated such large-scale changes in CO2 concentrations.

    It is likely that the temperature variations . . . drove the CO2 variations, not the reverse. That might have occurred, for example, when warmer temperatures increased the rate of bacterial breakdown of plant material, releasing CO2 to the atmosphere as it warmed. This historical relationship does not, however, refute the modern relationship of human additions of CO2 to the atmosphere driving increases in temperature.3

    Why, you may ask, doesn’t the “reverse cause” rival, t1, refute the anthropogenic hypothesis? Here comes the surprise possible causal relationship between two things, A and B, that I promised earlier.

    [One] potential explanation for the observed warming of the Earth is human activity. There are several reasons to think that this can account for some portion of the observed warming. We know that human activities have been increasing the concentration of CO2 and other greenhouse gases in the atmosphere for at least the past century or two. Measurements show the concentration of CO2 has increased about 30 percent over that time . . . while other greenhouse gases have increased by similar or larger amounts. Basic physics provides strong theoretical reasons to believe that such an increase in greenhouse gases should warm the Earth.4

    It now seems likely that the best explanation of the correlation is that the causal relationship between CO2 and global warming actually points in both directions; increased CO2 concentrations cause increased temperatures, and simultaneously, increased temperatures cause increases in CO2 concentrations. We probably have a kind of feedback loop.

    t2. Increased CO2 concentrations cause increased temperatures, while increased temperatures cause increases in CO2 concentrations.
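
    A toy model may help show why a record in which temperature leads is compatible with CO2 also driving temperature. The sketch below is not climate physics: the persistence and coupling values are arbitrary assumptions, the function name simulate is hypothetical, and both series are abstract deviations from a baseline. Each variable simply nudges the other at the next step, which is all that t2 claims.

```python
def simulate(temp0, co2_0, steps=5, persistence=0.5, coupling=0.3):
    """Step a two-way feedback loop forward from an initial disturbance.

    temp0 and co2_0 are starting deviations from baseline. At every step
    each variable keeps part of its own deviation (persistence) and picks
    up part of the other's (coupling). All values are illustrative.
    """
    temp, co2 = [temp0], [co2_0]
    for _ in range(steps):
        t, c = temp[-1], co2[-1]
        temp.append(persistence * t + coupling * c)  # CO2 pushes temperature
        co2.append(persistence * c + coupling * t)   # temperature pushes CO2
    return temp, co2


# Case 1: something other than CO2 disturbs temperature first.
temp_a, co2_a = simulate(temp0=1.0, co2_0=0.0)

# Case 2: something other than temperature adds CO2 first.
temp_b, co2_b = simulate(temp0=0.0, co2_0=1.0)

print("temperature disturbed first -> CO2 deviations:",
      [round(c, 3) for c in co2_a])
print("CO2 disturbed first -> temperature deviations:",
      [round(t, 3) for t in temp_b])
```

    In the first run CO2 starts at baseline and rises only after temperature has moved, so temperature leads. In the second run the same loop produces the opposite ordering. Which variable leads depends on where the disturbance enters, not on which causal arrow exists, because in a feedback loop both do.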

    In a way, of course, t2 does not really contradict Gore’s original causal hypothesis in t0; it merely offers more detail about the complicated causal relationship between CO2 and global temperatures. So in the sense that we are using the term in the inference-to-the-best-explanation (IBE) recipe, t2 does not even count as a rival explanation. Gore himself is very careful in how he articulates t0.

    It’s a complicated relationship, but the most important part of it is this: When there is more CO2 in the atmosphere, the temperature increases because more heat from the Sun is trapped inside.5

    Causation and Explanation

    It’s hard to write a chapter on causal inferences without noting that many philosophers of science believe that the notion of causation is the fundamental building block of any sort of explanation.

    According to the causal model of explanation, to explain a phenomenon is simply to give information about its causal history or, where the phenomenon itself is a causal regularity, to explain it is to give information about the mechanism linking cause to effect.6

    We should expect to see causal reasoning deeply involved in all inferences to the best explanation.

    Recall poor Connie. She noticed a correlation between two events—her boyfriend’s extended absence and the lipstick stain on his collar when he returned. Almost immediately thereafter she observed a second correlation—the all-too-obvious lipstick stain and Mary Jane’s lipstick being a mess. The heart and soul of Connie’s inference regarding what happened is a causal account of the lipstick stain as well as the causes of the absence and Mary Jane’s cosmetic disaster. The simple A-caused-B or B-caused-A accounts of the correlations all seem artificial or convoluted.

    t1. The extended absence caused the lipstick stain.

    t2. The lipstick stain caused the extended absence.

    t3. The lipstick stain caused Mary Jane’s lipstick to be all a mess.

    t4. Mary Jane’s lipstick being all a mess caused the lipstick stain.

    But of course, Connie knew exactly what had happened: there was a common cause of the lipstick stain, the extended absence, and Mary Jane’s messed-up lipstick.

    t0. Connie’s boyfriend had been smooching Mary Jane. The smooching caused the lipstick stain on his collar, as well as causing him to be gone for half an hour or more at the record hop and causing Mary Jane’s lipstick to get all messed up.

    Or consider Semmelweis’s predicament. He recognized a correlation between his colleague’s being cut while conducting an autopsy and his colleague dying with symptoms very similar to childbed fever. He was led to a straightforward causal explanation:

    t′0. The laceration introduced cadaveric particles into his colleague’s bloodstream, which then caused his colleague’s death.

    Almost simultaneously with this inference, he noticed the key correlation between the high death rate from childbed fever in the First Maternity Division and the fact that autopsies were routinely conducted by the physicians and medical students in the First Maternity Division. And once again, the causal diagnosis was immediately obvious to Semmelweis:

    t″0. Cadaveric particles from the hands of the physicians and medical students were being introduced into the bodies of pregnant women in the First Maternity Division during childbirth and gynecological examinations, and these particles were then causing the childbed fever.

    A Sad Story

    It’s late in the afternoon. Two young men in different cars are headed home. One is a thirty-year-old professional who works for the state; we’ll call him Tony. The other has just graduated from high school and is planning to attend college the coming fall; we’ll call him Corey. Corey is driving well within the speed limit and approaches a stop sign. He comes to a full stop. Although he sees Tony’s car coming, Corey incorrectly believes the intersection is a four-way stop, so he feels safe proceeding through the intersection. Tony is also driving well within the speed limit and, having no stop sign, proceeds through the intersection. The two cars collide at almost a perfect ninety-degree angle on their front ends. Corey is not hurt at all and leaves his car to check on Tony, who initially reports that he is fine too. Corey and Tony exchange contact and insurance information, and Corey heads home. Tony tries to drive home as well but discovers that the crumpled wheel well makes this impossible. After a long evening waiting for a tow truck, Tony is finally taken home by his fiancée.

    Our story now focuses on Tony. A day or two after the accident, he is stiff and sore and goes to see a chiropractor he has seen before. After hearing about the crash, the chiropractor diagnoses Tony’s complaints as a back injury and begins a treatment protocol based on this. Tony’s symptoms start to improve, but over the next few months, the pain in his hip and leg gradually increases, and he consults his regular doctor. She suspects that Tony is suffering from some sort of hip injury and even goes on to guess it might be a labral tear. After an MRI and a consultation with an orthopedic surgeon, the labral tear diagnosis is confirmed. After months of more treatment with mixed success, Tony decides to have surgery to repair the torn labrum. Tony almost dies during surgery because of complications with the anesthetic, but from an orthopedic perspective, the surgery seems to be a success. His symptoms disappear, and he is virtually pain-free. After just a few months, however, Tony’s symptoms begin to reappear and new surgery is planned.

    Tony decides to sue Corey for his expenses (almost $100,000) and asks for an additional $400,000 for his pain and suffering. I was chosen to serve on the jury for this civil suit. Although the story is indeed sad, sitting on this jury was something of a treat for me because I am a hopeless wannabe lawyer and because it gave me a chance to actually apply inference to the best explanation to a real-world case of legal evidence.

    Our jury was not asked to assign blame; Corey had already admitted he was at fault for the accident. The plaintiff, Tony, had therefore already established that Corey was what lawyers call negligent, and Tony was almost certainly going to get some damages. The question was what the amount of those damages should be. The defendant’s attorney conceded that his client was liable for some of Tony’s initial pain and suffering, that original trip to his chiropractor, and certainly the tow truck and body shop expenses. He argued vehemently, however, that Corey bore no responsibility, legally or morally, for extensive orthopedic surgery or the years of suffering that Tony had manifestly endured or his diminished lifestyle as a result of the labral tear because the car accident was in no way causally responsible for the injury. Tony’s whole case, of course, depended on the contrary assertion that the crash had caused the labral tear and that the ensuing three years of pain and psychological suffering were the direct result of Corey’s negligent driving.

    The basic evidence that got this civil suit going in the first place was a classic inference from a correlation—in the first sense defined above—to a cause.

    e1. Corey’s and Tony’s cars were involved in a collision, and shortly after (within three months), Tony was diagnosed with a labral tear.


    t0. The collision caused the labral tear.

    We can imagine reverse-cause and common-cause rival explanations:

    t1. The labral tear occurred three weeks earlier while Tony was skiing. Tony could easily have avoided the accident by timely braking, but the loss of mobility from the hip injury prevented him from getting to the brake pedal in time. Thus the tear caused the collision.

    t2. A loud crashing sound from a construction site nearby distracted Corey and led to his misreading of the stop signs. It also startled Tony, and as he wrenched to see where the crash came from, he tore his labrum, and because he was distracted, he was slow to apply the brakes. Thus the loud crashing sound caused both the labral tear and the collision.

    Corey’s attorney wisely refrained from suggesting accounts such as these and rested his case on the null hypothesis rival explanation that something completely independent of the car accident caused the hip injury.

    t3. The collision did not cause the labral tear; something else was its cause.

    You may think that t3 is a pretty vague rival theory, and indeed, it is. But it was probably a good trial strategy for two reasons. One is the rules for negligence suits. The plaintiff must “prove,” by a “preponderance of evidence,” that the defendant’s negligent action (remember, Corey had already admitted that he was at fault for the accident and thus legally negligent) caused the financial and psychological loss that needs to be compensated. The defense need not, therefore, explain what did cause the injury but simply show that the generic rival is better than (or even just as good as) the plaintiff’s account. The second reason for keeping things vague is that Corey’s lawyer could toss out hints as to what the outside cause of the tear might have been without being committed to any of these theories being a better explanation. The defense, for example, made a big deal out of Tony’s own admission that he had been a very avid skier for most of his life and of the fact that the hospital records from Tony’s first surgery showed that the surgeon had noted a slight physiological abnormality in Tony’s hip. Who knows whether a lifetime of skiing caused the tear or whether Tony was genetically predisposed to develop such a tear?

    Our jury had to decide between two causal accounts of Tony’s labral tear.

    t0. The collision caused the labral tear.

    t3. The collision did not cause the labral tear; something else was its cause.

    We had before us some “direct” evidence—the chiropractor’s notes, the records from Tony’s surgery, and the towing and body shop bills. The most important evidence, though, came from expert witnesses who could tell us about crashes of this sort, the causes of labral tears, and the like. As you might suspect, the experts for the plaintiff differed quite a bit in their testimony from the experts for the defense.

    e2. Records from Tony’s chiropractor, his surgery, and the bills from the towing company and body shop

    e3. Differing expert accounts of the accident—Was it a T-bone or sideswipe?—and the forces generated

    e4. Differing expert accounts of Tony’s chiropractic history and his visits to his chiropractor following the accident

    e5. Differing expert accounts of how such an accident could cause a labral tear

    For me, and I believe for many of my fellow jurors, the key discrepancy in the expert testimony concerned the etiology of labral tears. Tony’s expert witness was a former orthopedic surgeon who testified that labral tears almost always come from traumatic forces such as athletic injuries or car accidents and almost never from the general wear and tear of an active lifestyle such as Tony’s. Corey’s expert witness, also an orthopedic surgeon, testified to exactly the opposite. He told us that labral tears most commonly come from insidious causes and go undiagnosed for several years.

    e6. Differing expert accounts of the etiology of labral tears

    The entire jury was told in no uncertain terms by the judge that we were required to decide the case solely on the basis of the evidence presented in the trial and that under no circumstances were we permitted to Google anything concerned with the trial. I know that, except for that clear instruction, I would have done a little quick and dirty online research on labral tears. When I did that after the trial was over, I came to the conclusion that the truth was sort of halfway in between these two experts—labral tears often result from traumatic injuries but also occur from the slow degeneration of the hip.

    As I hope you will remember from chapter 9, accepting testimony, including the legal testimony of expert witnesses, involves a two-step inference to the best explanation. In our case, the evidence would look something like the following:

    e1. What was said in the testimony

    e2. Context—sworn testimony in a civil trial

    e3. Relevant biography—the professional credentials of the expert


    t′0. The expert genuinely believes what he or she said in the testimony.


    t″0. The expert believes what he or she said because what he or she said is true.

    I can only speak for myself, but I would be willing to grant the absolute sincerity of every expert we listened to; t′0 was always my best explanation of what each witness had to say. The most obvious rival explanation, t′1, is that the expert said it because he or she was paid to say it, but I never really felt this was what was going on.

    We know as a matter of simple logic, however, that t″0 cannot be the best explanation of what every expert testified to, since they explicitly contradicted each other in several instances. Labral tears can’t often be the result of insidious causes while at the same time almost never being the effect of them. For almost half of the expert witnesses, their sincere beliefs had to be mistaken. The key question, of course, was, Who was right and who was wrong?

    EXERCISES

    1. When two events, A and B, are correlated (in time and space or statistically), what are the five possible causal relationships between A and B (one of these relationships is actually not a causal one in the strict sense)?
    2. Use all the steps in the IBE recipe to assess the quality of evidence in the following causal argument.

      Obviously Sarah’s failure to attend the lectures caused her poor philosophy grade. She has had regular absences for the past month or so, and her grade has gone down from a B+ to a C- during that time period.

    QUIZ TWELVE

    Given what you know from the following online posting from Oregon Public Broadcasting, use all the steps in the IBE recipe to assess the quality of evidence for the claim that “Ms. Silva’s lung cancer was proximately and directly caused and its growth promoted by her exposures to the above contaminants from the Bullseye facility.”

    The complete article is available online at “Terminal Cancer Patient Sues Bullseye Glass in Portland.” (https://www.opb.org/news/series/portland-oregon-air-pollution-glass/oregon-portland-bullseye-glass-terminal-cancer-patient-sues/)7

    Notes

    1. Nate Silver, The Signal and the Noise (New York: Penguin, 2012), 187.

    2. Concise Oxford Dictionary, 6th ed. (1976), s.v. “correlation.”

    3. Andrew E. Dessler and Edward A. Parson, The Science and Politics of Global Climate Change (Cambridge: Cambridge University Press, 2007), 59.

    4. Dessler and Parson, Global Climate Change, 73.

    5. Al Gore, An Inconvenient Truth (Emmaus: Rodale, 2006), 67.

    6. Peter Lipton, Inference to the Best Explanation (London: Routledge, 1991), 32.

    7. Ryan Haas, “Terminal Cancer Patient Sues Bullseye Glass in Portland,” OPB, June 15, 2016, https://www.opb.org/news/series/portland-oregon-air-pollution-glass/oregon-portland-bullseye-glass-terminal-cancer-patient-sues/.


    This page titled 1.12: Correlations and Causes is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Jeffery L. Johnson (Portland State University Library) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.