
6.3: Probability and Belief - Bayesian Reasoning


    The great Scottish philosopher David Hume, in his An Enquiry Concerning Human Understanding, wrote, “In our reasonings concerning matter of fact, there are all imaginable degrees of assurance, from the highest certainty to the lowest species of moral evidence. A wise man, therefore, proportions his belief to the evidence.” Hume is making a very important point about a kind of reasoning that we engage in every day: the adjustment of beliefs in light of evidence. We believe things with varying degrees of certainty, and as we make observations or learn new things that bear on those beliefs, we make adjustments to our beliefs, becoming more or less certain accordingly. Or, at least, that’s what we ought to do. Hume’s point is an important one because too often people do not adjust their beliefs when confronted with evidence—especially evidence against their cherished opinions. One needn’t look far to see people behaving in this way: the persistence and ubiquity of the beliefs, for example, that vaccines cause autism, or that global warming is a myth, despite overwhelming evidence to the contrary, are a testament to the widespread failure of people to proportion their beliefs to the evidence, to a general lack of “wisdom”, as Hume puts it.

    Here we have a reasoning process—adjusting beliefs in light of evidence—which can be done well or badly. We need a way to distinguish good instances of this kind of reasoning from bad ones. We need a logic. As it happens, the tools for constructing such a logic are ready to hand: we can use the probability calculus to evaluate this kind of reasoning.

    Our logic will be simple: it will be a formula providing an abstract model of perfectly rational belief-revision. The formula will tell us how to compute a conditional probability. It’s named after the 18th century English reverend who first formulated it: Thomas Bayes. It is called “Bayes’ Law” and reasoning according to its strictures is called “Bayesian reasoning”.

    At this point, you will naturally be asking yourself something like this: “What on Earth does a theorem about probability have to do with adjusting beliefs based on evidence?” Excellent question; I’m glad you asked. As Hume mentioned in the quote we started with, our beliefs come with varying degrees of certainty. Here, for example, are three things I believe:

    a. 1 + 1 = 2;
    b. the earth is approximately 93 million miles from the sun (on average);
    c. I am related to Winston Churchill.

    I’ve listed them in descending order: I’m most confident in (a), least confident in (c). I’m more confident in (a) than (b), since I can figure out that 1 + 1 = 2 on my own, whereas I have to rely on the testimony of others for the Earth-to-Sun distance. Still, that testimony gives me a much stronger belief than does the testimony that is the source of (c). My relation to Churchill is apparently through my maternal grandmother; the details are hazy. Still, she and everybody else in the family always said we were related to him, so I believe it.

    “Fine,” you’re thinking, “but what does this have to do with probabilities?” Our degrees of belief in particular claims can vary between two extremes: complete doubt and absolute certainty. We could assign numbers to those states: complete doubt is 0; absolute certainty is 1. Probabilities also vary between 0 and 1! It’s natural to represent degrees of belief as probabilities. This is one of the philosophical interpretations of what probabilities really are. (There’s a whole literature on this. See this article for an overview: Hájek, Alan, "Interpretations of Probability", The Stanford Encyclopedia of Philosophy (Winter 2012 Edition), Edward N. Zalta (ed.), URL = <https://plato.stanford.edu/archives/...ity-interpret/>.) It’s the so-called “subjective” interpretation, since degrees of belief are subjective states of mind; we call these “personal probabilities”. Think of rolling a die. The probability that it will come up showing a one is 1/6. One way of understanding what that means is to say that, before the die was thrown, the degree to which you believed the proposition that the die will come up showing one—the amount of confidence you had in that claim—was 1/6. You would’ve had more confidence in the claim that it would come up showing an odd number—a degree of belief of 1/2.

    We’re talking about the process of revising our beliefs when we’re confronted with evidence. In terms of probabilities, that means raising or lowering our personal probabilities as warranted by the evidence. Suppose, for example, that I was visiting my grandmother’s hometown and ran into a friend of hers from way back. In the course of the conversation, I mention how grandma was related to Churchill. “That’s funny,” says the friend, “your grandmother always told me she was related to Mussolini.” I’ve just received some evidence that bears on my belief that I’m related to Churchill. I never heard this Mussolini claim before. I’m starting to suspect that my grandmother had an odd eccentricity: she enjoyed telling people that she was related to famous leaders during World War II. (I wonder if she ever claimed to be related to Stalin. FDR? Let’s pray Hitler was never invoked. And Hirohito would strain credulity; my grandma was clearly not Japanese.) In response to this evidence, if I’m being rational, I would revise my belief that I’m related to Winston Churchill: I would lower my personal probability for that belief; I would believe it less strongly. If, on the other hand, my visit to my grandma’s hometown produced a different bit of evidence— let’s say a relative had done the relevant research and produced a family genealogy tracing the relation to Churchill—then I would revise my belief in the other direction, increasing my personal probability, believing it more strongly.

    Since belief-revision in this sense just involves adjusting probabilities, our model for how it works is just a means of calculating the relevant probabilities. That’s why our logic can take the form of an equation. We want to know how strongly we should believe something, given some evidence about it. That’s a conditional probability. Let ‘\(H\)’ stand for a generic hypothesis—something we believe to some degree or other; let ‘\(E\)’ stand for some evidence we discover. What we want to know is how to calculate \(P(H | E)\)—the probability of \(H\) given \(E\), how strongly we should believe \(H\) in light of the discovery of \(E\).

    Bayes’ Law tells us how to perform this calculation. Here’s one version of the equation (It’s easy to derive this theorem, starting with the general product rule. We know

    \[\mathrm{P}(\mathrm{E} \bullet \mathrm{H})=\mathrm{P}(\mathrm{E}) \times \mathrm{P}(\mathrm{H} | \mathrm{E})\]

    no matter what ‘E’ and ‘H’ stand for. A little algebraic manipulation gives us

    \[P(H | E)=\dfrac{P(E \bullet H)}{P(E)}\]

    It’s a truth of logic that the expression ‘E • H’ is equivalent to ‘H • E’, so we can replace ‘P(E • H)’ with ‘P(H • E)’ in the numerator. And again, by the general product rule, P(H • E) = P(H) × P(E | H)—our final numerator.):

    \[\mathrm{P}(\mathrm{H} | \mathrm{E})=\frac{\mathrm{P}(\mathrm{H}) \times \mathrm{P}(\mathrm{E} | \mathrm{H})}{\mathrm{P}(\mathrm{E})}\]
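
    To make the arithmetic concrete, here is a minimal sketch in Python of the calculation this version of the formula prescribes. The function name and the sample numbers are my own invention, chosen only for illustration; nothing about them comes from the text.

```python
def posterior(prior_h, likelihood_e_given_h, prob_e):
    """Simple form of Bayes' Law: P(H | E) = P(H) * P(E | H) / P(E)."""
    return prior_h * likelihood_e_given_h / prob_e

# Hypothetical numbers: a hypothesis believed to degree 0.3, which predicts the
# evidence with probability 0.9, where the evidence itself was only expected
# with probability 0.5.
print(posterior(0.3, 0.9, 0.5))  # 0.54 -- belief rises from 0.3 to 0.54
```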

    This equation has some nice features. First of all, the presence of ‘P(H)’ in the numerator is intuitive. This is often referred to as the “prior probability” (or “prior” for short); it’s the degree to which the hypothesis was believed prior to the discovery of the evidence. It makes sense that this would be part of the calculation: how strongly I believe in something now ought to be (at least in part) a function of how strongly I used to believe it. Second, ‘P(E | H)’ is a useful item to have in the calculation, since it’s often a probability that can be known. Notice, this is the reverse of the conditional probability we’re trying to calculate: it’s the probability of the evidence, assuming that the hypothesis is true (it may not be, but we assume it is, as they say, “for the sake of argument”). Consider an example: as you may know, being sick in the morning can be a sign of pregnancy; if this were happening to you, the hypothesis you’d be entertaining would be that you’re pregnant, and the evidence would be vomiting in the morning. The conditional probability you’re interested in is P(pregnant | vomiting)—that is, the probability that you’re pregnant, given that you’ve been throwing up in the morning. Part of using Bayes’ Law to make this calculation involves the reverse of that conditional probability: P(vomiting | pregnant)—the probability that you’d be throwing up in the morning, assuming (for the sake of argument) that you are in fact pregnant. And that’s something we can just look up; studies have been done. It turns out that about 60% of women experience morning sickness (to the point of throwing up) during the first trimester of pregnancy. There are lots of facts like this available. Did you know that a craving for ice is a potential sign of anemia? Apparently it is: 44% of anemia patients have the desire to eat ice. Similar examples are not hard to find. It’s worth noting, in addition, that sometimes the reverse probability in question—\(P(E | H)\)—is 1. In the case of a prediction made by a scientific hypothesis, this is so. Isaac Newton’s theory of universal gravitation, for example, predicts that objects dropped from the same height will take the same amount of time to reach the ground, regardless of their weights (provided that air resistance is not a factor). This prediction is just a mathematical result of the equation governing gravitational attraction. So if \(H\) is Newton’s theory and \(E\) is a bowling ball and a feather taking the same amount of time to fall, then \(P(E | H) = 1\); if Newton’s theory is true, then it’s a mathematical certainty that the evidence will be observed. (Provided you set things up carefully. Check out this video: https://www.youtube.com/watch?v=E43-CfukEgs.)

    So this version of Bayes’ Law is attractive because of both probabilities in the numerator: \(P(H)\), the prior probability, is natural, since the adjusted degree of belief ought to depend on the prior degree of belief; and \(P(E | H)\) is useful, since it’s a probability that we can often know precisely. The formula is also nice in that it comports well with our intuitions about how belief-revision ought to work. It does this in three ways.

    First, we know that implausible hypotheses are hard to get people to believe; as Carl Sagan once put it, “Extraordinary claims require extraordinary evidence.” Putting this in terms of personal probabilities, an implausible hypothesis—an extraordinary claim—is just one with a low prior: \(P(H)\) is a small fraction. Consider an example. In the immediate aftermath of the 2016 U.S. presidential election, some people claimed that the election was rigged (possibly by Russia) in favor of Donald Trump by way of a massive computer hacking scheme that manipulated the vote totals in key precincts. (Note: this is separate from the highly plausible claim that the Russians hacked e-mails from the Democratic National Committee and released them to the media before the election.) I had very little confidence in this hypothesis—I gave it an extremely low prior probability—for lots of reasons, but two in particular: (a) Voting machines in individual precincts are not networked together, so any hacking scheme would have to be carried out on a machine-by-machine basis across hundreds—if not thousands—of precincts, an operation of almost impossible complexity; (b) An organization with practically unlimited financial resources and the strongest possible motivation for uncovering such a scheme—namely, the Clinton campaign—looked at the data and concluded there was nothing fishy going on. But none of this stopped wishful-thinking Clinton supporters from digging for evidence that in fact the fix had been in for Trump. (Here’s a representative rundown: http://www.dailykos.com/story/2016/1...mpaign-Please-challenge-the-vote-in-4-States-as-the-data-says-you-won-NC-PA-WI-FL) When people presented me with this kind of evidence—look at these suspiciously high turnout numbers from a handful of precincts in rural Wisconsin!—my degree of belief in the hypothesis—that the Russians had hacked the election—barely budged. This is proper; again, extraordinary claims require extraordinary evidence, and I wasn’t seeing it. This intuitive fact about how belief-revision is supposed to work is borne out by the equation for Bayes’ Law. Implausible hypotheses have a low prior—P(H) is a small fraction. It’s hard to increase our degree of belief in such propositions—P(H | E) doesn’t easily rise—simply because we’re multiplying by a low fraction in the numerator when calculating the new probability.

    The math mirrors the actual mechanics of belief-revision in two more ways. Here’s a truism: the more strongly a hypothesis predicts a piece of evidence, the more that evidence supports the hypothesis when we observe it. We saw above that women who are pregnant experience morning sickness about 60% of the time; also, patients suffering from anemia crave ice (for some reason) 44% of the time. In other words, throwing up in the morning is more strongly predictive of pregnancy than ice-craving is of anemia. Morning sickness would increase belief in the hypothesis of pregnancy more than ice-craving would increase belief in anemia. Again, this banal observation is borne out in the equation for Bayes’ Law. When we’re calculating how strongly we should believe in a hypothesis in light of evidence—P(H | E)—we always multiply in the numerator by the reverse conditional probability—P(E | H)—the probability that you’d observe the evidence, assuming the hypothesis is true. For pregnancy/sickness, this means multiplying by .6; for anemia/ice-craving, we multiply by .44. In the former case, we’re multiplying by a higher number, so our degree of belief increases more.
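
    A quick numerical sketch of this point, in Python: only the 0.6 and 0.44 figures come from the text; the shared prior and the shared value for P(E) are invented so that the comparison isolates the effect of P(E | H).

```python
# Hypothetical setup: both hypotheses start from the same prior (0.1) and the
# evidence is assumed to be equally probable overall in both cases (P(E) = 0.2);
# only the strength of prediction, P(E | H), differs.
prior, prob_e = 0.1, 0.2

pregnancy_posterior = prior * 0.60 / prob_e  # morning sickness: P(E | H) = 0.60
anemia_posterior = prior * 0.44 / prob_e     # ice craving:      P(E | H) = 0.44

print(pregnancy_posterior)  # 0.30 -- belief triples
print(anemia_posterior)     # 0.22 -- belief rises, but by less
```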

    A third intuitive fact about belief-revision that our equation correctly captures is this: surprising evidence provides strong confirmation of a hypothesis. Consider the example of Albert Einstein’s general theory of relativity, which provided a new way of understanding gravity: the presence of massive objects in a particular region of space affects the geometry of space itself, causing it to be curved in that vicinity. Einstein’s theory has a number of surprising consequences, one of which is that because space is warped around massive objects, light will not travel in a straight line in those places. (Or, it is travelling a straight line, just through a space that is curved. Same thing.) In this example, H is Einstein’s general theory of relativity, and E is an observation of light following a curvy path. When Einstein first put forward his theory in 1915, it was met with incredulity by the scientific community, not least because of this astonishing prediction. Light bending? Crazy! And yet, four years later, Arthur Eddington, an English astronomer, devised and executed an experiment in which just such an effect was observed. He took pictures of stars in the night sky, then kept his camera trained on the same spot and took another picture during an eclipse of the sun (the only time the stars would also be visible during the day). The new picture showed the stars in slightly different positions, because during the eclipse, their light had to pass near the sun, whose mass caused its path to be deflected slightly, just as Einstein predicted. As soon as Eddington made his results public, newspapers around the world announced the confirmation of general relativity and Einstein became a star. As we said, surprising results provide strong confirmation; hardly anything could be more surprising than light bending. We can put this in terms of personal probabilities. Bending light was the evidence, so P(E) represents the degree of belief someone would have in the proposition that light will travel a curvy path. This was a very low number before Eddington’s experiments. When we calculate how strongly we should believe in general relativity given the evidence that light in fact bends—P(H | E)—P(E) sits in the denominator of our equation. Dividing by a very small fraction means multiplying by its reciprocal, which is a very large number. This makes P(H | E) go up dramatically. Again, the math mirrors actual reasoning practice.
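
    The same kind of sketch shows the effect of a surprising piece of evidence, that is, a small P(E) in the denominator. All of the numbers here are invented for illustration.

```python
# Hypothetical numbers: a theory with a modest prior (0.05) that predicts the
# evidence outright (P(E | H) = 1). Compare unsurprising evidence (P(E) = 0.9)
# with very surprising evidence (P(E) = 0.06).
prior, likelihood = 0.05, 1.0

print(prior * likelihood / 0.9)   # ~0.056 -- unsurprising evidence barely moves belief
print(prior * likelihood / 0.06)  # ~0.833 -- surprising evidence confirms strongly
```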

    So, our initial formulation of Bayes’ Law has a number of attractive features; it comports well with our intuitions about how belief-revision actually works. But it is not the version of Bayes’ Law that we will settle on to make actual calculations. Instead, we will use a version that replaces the denominator—P(E)—with something else. This is because that term is a bit tricky. It’s the prior probability of the evidence. That’s another subjective state—how strongly you believed the evidence would be observed prior to its actual observation, or something like that. Subjectivity isn’t a bad thing in this context; we’re trying to figure out how to adjust subjective states (degrees of belief), after all. But the more of it we can remove from the calculation, the more reliable our results. As we discussed, the subjective prior probability for the hypothesis in question—P(H)—belongs in our equation: how strongly we believe in something now ought to be a function of how strongly we used to believe in it. The other item in the numerator—P(E | H)—is most welcome, since it’s something we can often just look up—an objective fact. But P(E) is problematic. It makes sense in the case of light bending and general relativity. But consider the example where I run into my grandma’s old acquaintance and she tells me about her claims to be related to Mussolini. What was my prior for that? It’s not clear there even was one; the possibility probably never even occurred to me. I’d like to get rid of the present denominator and replace it with the kinds of terms I like—those in the numerator.

    I can do this rather easily. To see how, it will be helpful to consider the fact that when we’re evaluating a hypothesis in light of some evidence, there are often alternative hypotheses that it’s competing with. Suppose I’ve got a funny looking rash on my skin; this is the evidence. I want to know what’s causing it. I may come up with a number of possible explanations. It’s winter, so maybe it’s just dry skin; that’s one hypothesis. Call it ‘H1’. Another possibility: we’ve just started using a new laundry detergent at my house; maybe I’m having a reaction. H2 = detergent. Maybe it’s more serious, though. I get on the Google and start searching. H3 = psoriasis (a kind of skin disease). Then my hypochondria gets out of control, and I get really scared: H4 = leprosy. That’s all I can think of, but it may not be any of those: H5 = some other cause.

    I’ve got five possible explanations for my rash—five hypotheses I might believe in to some degree in light of the evidence. Notice that the list is exhaustive: since I added H5 (something else), one of the five hypotheses will explain the rash. Since this is the case, we can say with certainty that I have a rash and it’s caused by the cold, or I have a rash and it’s caused by the detergent, or I have a rash and it’s caused by psoriasis, or I have a rash and it’s caused by leprosy, or I have a rash and it’s caused by something else. Generally speaking, when a list of hypotheses is exhaustive of the possibilities, the following is a truth of logic:

    \[E \equiv (E \bullet H_1) \vee (E \bullet H_2) \vee \ldots \vee (E \bullet H_n)\]

    For each of the conjunctions, it doesn’t matter what order you put the conjuncts, so this is true, too:

    \[E \equiv (H_1 \bullet E) \vee (H_2 \bullet E) \vee \ldots \vee (H_n \bullet E)\]

    Remember, we’re trying to replace P(E) in the denominator of our formula. Well, if E is equivalent to that long disjunction, then P(E) is equal to the probability of the disjunction:

    \[P(E) = P[(H_1 \bullet E) \vee (H_2 \bullet E) \vee \ldots \vee (H_n \bullet E)]\]

    We’re calculating a disjunctive probability. If we assume that the hypotheses are mutually exclusive (only one of them can be true), then we can use the Simple Addition Rule (I know. In the example, maybe it’s the cold weather and the new detergent causing my rash. Let’s set that possibility aside.):

    \[P(E) = P(H_1 \bullet E) + P(H_2 \bullet E) + \ldots + P(H_n \bullet E)\]

    Each item in the sum is a conjunctive probability calculation, for which we can use the General Product Rule:

    \[P(E) = P(H_1) \times P(E | H_1) + P(H_2) \times P(E | H_2) + \ldots + P(H_n) \times P(E | H_n)\]

    And look what we have there: each item in the sum is now a product of exactly the two types of terms that I like—a prior probability for a hypothesis, and the reverse conditional probability of the evidence assuming the hypothesis is true (the thing I can often just look up). I didn’t like my old denominator, but it’s equivalent to something I love. So I’ll replace it. This is our final version of Bayes’ Law:

    \[\mathrm{P}\left(\mathrm{H}_{\mathrm{k}} | \mathrm{E}\right)=\frac{\mathrm{P}\left(\mathrm{H}_{\mathrm{k}}\right) \times \mathrm{P}\left(\mathrm{E} | \mathrm{H}_{\mathrm{k}}\right)}{\mathrm{P}\left(\mathrm{H}_{1}\right) \times \mathrm{P}\left(\mathrm{E} | \mathrm{H}_{1}\right)+\mathrm{P}\left(\mathrm{H}_{2}\right) \times \mathrm{P}\left(\mathrm{E} | \mathrm{H}_{2}\right)+\ldots+\mathrm{P}\left(\mathrm{H}_{\mathrm{n}}\right) \times \mathrm{P}\left(\mathrm{E} | \mathrm{H}_{\mathrm{n}}\right)}\]

    with \(1 \leq \mathrm{k} \leq \mathrm{n}\).

    (We add the subscript ‘\(k\)’ to the hypothesis we’re entertaining, and stipulate that \(k\) is between 1 and \(n\) simply to ensure that the hypothesis in question is among the set of exhaustive, mutually exclusive possibilities \(H_1\), \(H_2\), ..., \(H_n\).)
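
    Here is a minimal sketch, in Python, of the calculation this final version of Bayes’ Law prescribes. The function and its argument names are my own, offered only to make the structure of the formula explicit: the denominator is the sum of prior-times-likelihood terms over all of the competing hypotheses.

```python
def bayes(priors, likelihoods, k):
    """P(H_k | E) for exhaustive, mutually exclusive hypotheses H_1, ..., H_n.

    priors[i]      -- P(H_i), the prior probability of hypothesis i
    likelihoods[i] -- P(E | H_i), the probability of the evidence if H_i is true
    k              -- index of the hypothesis we're entertaining (0-based here)
    """
    numerator = priors[k] * likelihoods[k]
    denominator = sum(p * l for p, l in zip(priors, likelihoods))
    return numerator / denominator
```

    Since the hypotheses are assumed to be exhaustive and mutually exclusive, the priors handed to such a function should sum to 1.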

    Let’s see how this works in practice. Consider the following scenario:

    Your mom does the grocery shopping at your house. She goes to two stores: Fairsley Foods and Gibbons’ Market. Gibbons’ is closer to home, so she goes there more often—80% of the time. Fairsley sometimes has great deals, though, so she drives the extra distance and shops there 20% of the time.

    You can’t stand Fairsley. First of all, they’ve got these annoying commercials with the crazy owner shouting into the camera and acting like a fool. Second, you got lost in there once when you were a little kid and you’ve still got emotional scars. Finally, their produce section is terrible: in particular, their peaches—your favorite fruit—are often mealy and bland, practically inedible. In fact, you’re so obsessed with good peaches that you made a study of it, collecting samples over a period of time from both stores, tasting and recording your data. It turns out that peaches from Fairsley are bad 40% of the time, while those from Gibbons’ are only bad 20% of the time. (Peaches are a fickle fruit; you’ve got to expect some bad ones no matter how much care you take.)

    Anyway, one fine day you walk into the kitchen and notice a heaping mound of peaches in the fruit basket; mom apparently just went shopping. Licking your lips, you grab a peach and take a bite. Ugh! Mealy, bland—horrible. “Stupid Fairsley,” you mutter as you spit out the fruit. Question: is your belief that the peach came from Fairsley rational? How strongly should you believe that it came from that store?

    This is the kind of question Bayes’ Law can help us answer. It’s asking us about how strongly we should believe in something; that’s just calculating a (conditional) probability. We want to know how strongly we should believe that the peach came from Fairsley; that’s our hypothesis. Let’s call it ‘F’. These types of calculations are always of conditional probabilities: we want the probability of the hypothesis given the evidence. In this case, the evidence is that the peach was bad; let’s call that ‘B’. So the probability we want to calculate is P(F | B)—the probability that the peach came from Fairsley given that it’s bad.

    At this point, we reference Bayes’ Law and plug things into the formula. In the numerator, we want the prior probability for our hypothesis, and the reverse conditional probability of the evidence assuming the hypothesis is true:

    \[\mathrm{P}(\mathrm{F} | \mathrm{B})=\frac{\mathrm{P}(\mathrm{F}) \times \mathrm{P}(\mathrm{B} | \mathrm{F})}{?}\]

    In the denominator, we need a sum, with each term in the sum having exactly the same form as our numerator: a prior probability for a hypothesis multiplied by the reverse conditional probability. The sum has to have one such term for each of our possible hypotheses. In our scenario, there are only two: that the fruit came from Fairsley, or that it came from Gibbons’. Let’s call the second hypothesis ‘G’. Our calculation looks like this:

    \[\mathrm{P}(\mathrm{F} | \mathrm{B})=\frac{\mathrm{P}(\mathrm{F}) \times \mathrm{P}(\mathrm{B} | \mathrm{F})}{\mathrm{P}(\mathrm{F}) \times \mathrm{P}(\mathrm{B} | \mathrm{F})+\mathrm{P}(\mathrm{G}) \times \mathrm{P}(\mathrm{B} | \mathrm{G})}\]

    Now we just have to find concrete numbers for these various probabilities in our little story. First, P(F) is the prior probability for the peach coming from Fairsley—that is, the probability that you would’ve assigned to it coming from Fairsley prior to discovering the evidence that it was bad—before you took a bite. Well, we know mom’s shopping habits: 80% of the time she goes to Gibbons’; 20% of the time she goes to Fairsley. So a random piece of food—our peach, for example—has a 20% probability of coming from Fairsley. P(F) = .2. And for that matter, the peach has an 80% probability of coming from Gibbons’, so the prior probability for that hypothesis—P(G)—is .8. What about P(B | F)? That’s the conditional probability that a peach will be bad assuming it came from Fairsley. We know that! You did a systematic study and concluded that 40% of Fairsley’s peaches are bad; P(B | F) = .4. Moreover, your study showed that 20% of peaches from Gibbons’ were bad, so P(B | G) = .2. We can now plug in the numbers and do the calculation:

    \[P(F | B)=\frac{0.2 \times 0.4}{(0.2 \times 0.4)+(0.8 \times 0.2)}=\frac{0.08}{0.08 + 0.16}=\frac{1}{3}\]

    As a matter of fact, the probability that the bad peach you tasted came from Fairsley—the conclusion to which you jumped as soon as you took a bite—is only 1/3. It’s twice as likely that the peach came from Gibbons’. Your belief is not rational. Despite the fact that Fairsley peaches are bad at twice the rate of Gibbons’, it’s far more likely that your peach came from Gibbons’, mainly because your mom does so much more of her shopping there.
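
    As a check on the arithmetic, here is the peach calculation carried out in Python; the variable names are mine, but the four numbers are exactly the ones from the story.

```python
p_f, p_g = 0.2, 0.8                  # priors: P(F), P(G) -- mom's shopping habits
p_b_given_f, p_b_given_g = 0.4, 0.2  # P(B | F), P(B | G) -- rate of bad peaches at each store

numerator = p_f * p_b_given_f
denominator = p_f * p_b_given_f + p_g * p_b_given_g
print(numerator / denominator)  # 0.333... -- only a 1/3 chance the bad peach was Fairsley's
```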

    So here we have an instance of Bayes’ Law performing the function of a logic—providing a method for distinguishing good from bad reasoning. Our little story, it turns out, depicted an instance of the latter, and Bayes’ Law showed that the reasoning was bad by providing a standard against which to measure it. Bayes’ Law, on this interpretation, is a model of perfectly rational belief-revision. Of course many real-life examples of that kind of reasoning can’t be subjected to the kind of rigorous analysis that the (made up) numbers in our scenario allowed. When we’re actually adjusting our beliefs in light of evidence, we often lack precise numbers; we don’t walk around with a calculator and an index card with Bayes’ Law on it, crunching the numbers every time we learn new things. Nevertheless, our actual practices ought to be informed by Bayesian principles; they ought to approximate the kind of rigorous process exemplified by the formula. We should keep in mind the need to be open to adjusting our prior convictions, the fact that alternative possibilities exist and ought to be taken into consideration, the significance of probability and uncertainty to our deliberations about what to believe and how strongly to believe it. Again, Hume: the wise person proportions belief according to the evidence.

    Exercises

    1. Women are twice as likely to suffer from anxiety disorders as men: 8% to 4%. They’re also more likely to attend college: these days, it’s about a 60/40 ratio of women to men. (Are these two phenomena related? That’s a question for another time.) If a random person is selected from my logic class, and that person suffers from an anxiety disorder, what’s the probability that it’s a woman?

    2. Suppose I’m a volunteer worker at my local polling place. It’s pretty conservative where I live: 75% of voters are Republicans; only 25% are Democrats (third-party voters are so rare they can be ignored). And they’re pretty loyal: voters who normally favor Republicans only cross the aisle and vote Democrat 10% of the time; normally Democratic voters only switch sides 20% of the time. On Election Day 2016 (it’s Democrat Hillary Clinton vs. Republican Donald Trump for president), my curiosity gets the best of me, and I’ve gotta peek—so I reach into the pile of ballots (pretend it’s not an electronic scanning machine counting the ballots, but an old-fashioned box with paper ballots in it) and pick one at random. It’s a vote for Hillary. What’s the probability that it was cast by a (normally) Republican voter?

    3. Among Wisconsin residents, 80% are Green Bay Packers fans, 10% are Chicago Bears fans, and 10% favor some other football team (we’re assuming every Wisconsinite has a favorite team). Packer fans aren’t afraid to show their spirit: 75% of them wear clothes featuring the team logo. Bears fans are quite reluctant to reveal their loyalties in such hostile territory, so only 25% of them are obnoxious enough to wear Bears clothes. Fans of other teams aren’t quite as scared: 50% of them wear their teams’ gear. I’ve got a neighbor who does not wear clothes with his favorite team’s logo. Suspicious (FIB?). What’s the probability he’s a Bears fan?

    4. In my logic class, 20% of students are deadbeats: on exams, they just guess randomly. 60% of the students are pretty good, but unspectacular: they get correct answers 80% of the time. The remaining 20% of the students are geniuses: they get correct answers 100% of the time. I give a true/false exam. Afterwards, I pick one of the completed exams at random; the student got the first two questions correct. What’s the probability that it’s one of the deadbeats?


    This page titled 6.3: Probability and Belief - Bayesian Reasoning is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Matthew Knachel via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.