14.1: Conditional Probabilities


As the world changes, probabilities change too. The probability of drawing an ace from a full deck of cards is 4/52. But if you draw two aces and don’t replace them, the probability of drawing an ace changes. We say that the conditional probability of drawing an ace, given that two aces have been removed, is 2/50.

The probability of something being the case given that something else is the case is called a conditional probability. We express the conditional probability of A on B by writing Pr(A|B). We read this as ‘the probability of A given B’. In the example above, we are interested in the probability of drawing an ace given that two aces have already been drawn.
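The ace example can be checked by simple counting. The following is a minimal sketch, assuming a deck represented as a plain list of ranks (suits are ignored because they do not matter here); the names `deck`, `remaining`, and `prob_ace` are illustrative choices, not part of the text.

```python
from fractions import Fraction

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]

# A full deck as a list of ranks: four cards of each rank, suits ignored.
deck = [rank for rank in RANKS for _ in range(4)]

def prob_ace(cards):
    """Proportion of the given cards that are aces."""
    return Fraction(cards.count("A"), len(cards))

p_ace_full = prob_ace(deck)  # 4/52, which Fraction reduces to 1/13

# Remove two aces without replacing them.
remaining = list(deck)
remaining.remove("A")
remaining.remove("A")

p_ace_given_two_gone = prob_ace(remaining)  # 2/50, which Fraction reduces to 1/25

print(p_ace_full, p_ace_given_two_gone)
```

Exact fractions are used instead of decimals so that the results can be compared directly with 4/52 and 2/50.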

Much learning involves conditionalization. As we acquire new information, our assessments of probabilities change. You always thought Wilbur was very honest, but now you learn that he stole someone’s wallet and then lied about it. This leads you to reassess your belief that he has probably been honest on other occasions. You conditionalize on the new information about Wilbur, updating your views about how probable things are in light of the new evidence.

Example 1: Your friend asks you to pick a card, any card, from a full deck. How likely is it that you drew a king? Now your friend looks at the card and declares that it’s a face card. This new information changes your estimate of the probability that you picked a king. You are now concerned with the probability that you drew a king, given that you drew a face card.

Example 2: The probability of getting lung cancer (C) is higher for smokers (S) than for nonsmokers. In our new notation, this means that Pr(C|S) is greater than Pr(C|~S).

Example 3: You are about to roll a fair die. The probability that you will roll a four is 1/6. You roll too hard and it tumbles off the table where you can’t see it, but Wilbur looks and announces that you rolled an even number. This thins the set of relevant outcomes by eliminating the three odd numbers. Figure 14.1.1 depicts the possibilities before and after Wilbur’s announcement. Before the announcement, the probability of rolling a four was 1/6. But once you thin out the relevant outcomes (by conditionalization), there are only three possibilities left, and only one way out of those three of rolling a four, so the probability of a four is now 1/3. When we restrict our attention in this way, now focusing only on the even numbers, we are said to conditionalize on the claim that the number is even.
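The thinning in Example 3 can be written out as a short computation. This is only a sketch, assuming that all six faces are equally likely; the helper `prob` and the predicate names are hypothetical, introduced for illustration.

```python
from fractions import Fraction

# Outcomes of one roll of a fair six-sided die, all equally likely.
outcomes = [1, 2, 3, 4, 5, 6]

def prob(event, space):
    """Proportion of the equally likely outcomes in `space` that satisfy `event`."""
    favorable = [o for o in space if event(o)]
    return Fraction(len(favorable), len(space))

def is_four(o):
    return o == 4

def is_even(o):
    return o % 2 == 0

# Before the announcement: all six outcomes are relevant.
p_four = prob(is_four, outcomes)

# After the announcement: keep only the even outcomes, then count again.
even_outcomes = [o for o in outcomes if is_even(o)]
p_four_given_even = prob(is_four, even_outcomes)

print(p_four)             # 1/6
print(p_four_given_even)  # 1/3
```

Conditionalizing here is nothing more than filtering the list of outcomes before counting.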

Figure 14.1.1: Thinning the Relevant Outcomes

Characterization of Conditional Probability

The next rule gives the definition for conditional probabilities.

Rule 7. (conditional probability): The probability of A given B is the probability of the conjunction of A & B, divided by the probability of B.

Pr(A|B) = Pr (A & B) / Pr (B)

In Rule 7, we must also require that the probability of B is not zero (because division by zero is undefined).

The idea behind Rule 7 is that conditional probabilities change the set of relevant outcomes. When your friend tells you that you selected a face card, the set of relevant possibilities shrinks from 52 (it might be any of the cards in the deck) down to 12 (we now know that it is one of the twelve face cards).
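Rule 7 can be applied directly to the face-card case from Example 1. The sketch below assumes a standard 52-card deck with twelve face cards, four of which are kings; the function name `conditional` is an illustrative choice.

```python
from fractions import Fraction

def conditional(p_a_and_b, p_b):
    """Rule 7: Pr(A|B) = Pr(A & B) / Pr(B). Undefined when Pr(B) is zero."""
    if p_b == 0:
        raise ValueError("Pr(B) must not be zero")
    return Fraction(p_a_and_b) / Fraction(p_b)

p_face = Fraction(12, 52)           # twelve face cards in the deck
p_king_and_face = Fraction(4, 52)   # every king is a face card

p_king_given_face = conditional(p_king_and_face, p_face)

print(p_king_given_face)  # 1/3
```

The result matches what counting gives: four kings among the twelve face cards.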

We put A & B in the numerator, because we have now restricted the range of relevant cases to those covered by B. This means that the only relevant part of the region for A is the part that overlaps B, which is just the part where the conjunction A & B is true. So, in terms of our diagrams, Pr(A|B) is the amount of B occupied by A.

And we put Pr(B) in the denominator because we want to restrict the range of relevant possibilities to those in which B is true. This is just what it means to talk about the probability of A given B. It may not be obvious that these numbers do the desired job, though, so we’ll work through an example to see exactly how things work.

How the Numbers Work

Suppose there are 100 students in your English class. There are 50 men (M), and 20 of them are Texans (T). We can use these probabilities and Rule 7 to determine the probability of someone being a Texan given that they are male, i.e., Pr(T|M). We have:

Pr(T & M) = 20 / 100 (the probability—or proportion—of people in the class who are male and Texans).

Pr(M) = 50 / 100 (the probability—or proportion—of males in the class).

We then plug these numbers into the formula given by Rule 7 on the left to get the actual values at the right (Figure 14.1.2).

Figure 14.1.2: Conditionalization Trims out a New Unit

So, the probability of someone in the class being a Texan if they are male is 20/100 x 100/50 = 20/50 (the two 100s cancel) = 2/5 = 0.4.
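Written out as a single chain, with T for Texan and M for male as above, the substitution into Rule 7 looks like this (a reconstruction from the numbers given in the text):

```latex
\[
\Pr(T \mid M)
  = \frac{\Pr(T \,\&\, M)}{\Pr(M)}
  = \frac{20/100}{50/100}
  = \frac{20}{100} \times \frac{100}{50}
  = \frac{20}{50}
  = \frac{2}{5}
  = 0.4
\]
```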

What the Numerator Does

We disregard everyone who is not male (some of whom may, but need not, be Texans). Figure 14.1.2 represents this by cutting out the circle of Males. We are then only interested in the percentage of Texans among males, which is given by the probability of someone in the class being both Texan and Male. We represent this as Pr(T & M). It’s just the overlap between the Texans and Males.

What the Denominator Does

M only had half of the probability before, but once we focus on Males, once we conditionalize on this, trimming away everything else, the probability of M should become 1. So, we need to increase the probability of M from 1/2 (what it was before) to 1 (what it is once we confine attention to males). Dividing by a fraction yields the same result as inverting and multiplying by it. So, things work out because dividing by 50/100 is the same as multiplying by 100/50, i.e., it’s the same as multiplying by 2. This ensures that we can treat M as now having the entire unit of probability (once we conditionalize on M).

In terms of mud, when we shear off everything outside M we must also throw away all the mud that was originally outside M. We then think of M as the new total area, and so we now view the amount of mud on it as one unit. Another way to see that M should now have a probability of 1 once we conditionalize on M is to note that Pr(M|M) = 1.
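One way to check that M really does carry the whole unit of probability after conditionalizing is to verify that the conditional probabilities inside M add up to 1. The sketch below uses the class counts from the text; the count of 30 non-Texan men is not stated there but follows from 50 men of whom 20 are Texans.

```python
from fractions import Fraction

CLASS_SIZE = 100
MEN = 50
TEXAN_MEN = 20
NON_TEXAN_MEN = MEN - TEXAN_MEN  # 30, derived from the two counts above

p_m = Fraction(MEN, CLASS_SIZE)
p_t_and_m = Fraction(TEXAN_MEN, CLASS_SIZE)
p_not_t_and_m = Fraction(NON_TEXAN_MEN, CLASS_SIZE)

# Pr(M|M): the conjunction M & M is just M, so this is Pr(M) / Pr(M).
p_m_given_m = p_m / p_m

# Dividing each piece of M by Pr(M) rescales it to the new unit.
p_t_given_m = p_t_and_m / p_m          # 2/5
p_not_t_given_m = p_not_t_and_m / p_m  # 3/5

total_within_m = p_t_given_m + p_not_t_given_m

print(p_m_given_m, p_t_given_m, p_not_t_given_m, total_within_m)
```

The two pieces of M (Texan and non-Texan men) together make up all of M, which is why their conditional probabilities sum to 1.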

In the formula

Pr(A|B) = Pr(A & B) / Pr(B)

the less probable B was before we conditionalized, the more we have to multiply Pr(A & B) by to inflate the new probability of B up to 1. If the probability of B was 1/2 we divide by 1/2, which has the effect of multiplying by 2. If the probability of B was 1/5 we divide by 1/5, which has the effect of multiplying by 5. Here 1/5 x 5/1 gets us back to 1 unit of probability. In short, division by the old Pr(B) makes the new (post conditionalization) Pr(B) = 1.
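The two cases just mentioned, Pr(B) = 1/2 and Pr(B) = 1/5, can be run through a tiny helper to see the multiplier and the rescaled probability of B side by side. This is an illustrative sketch; `rescale` is a hypothetical name.

```python
from fractions import Fraction

def rescale(p_b):
    """Return (multiplier, new Pr(B)) for conditionalizing on B."""
    multiplier = 1 / p_b        # dividing by Pr(B) equals multiplying by 1/Pr(B)
    new_p_b = p_b * multiplier  # Pr(B|B) = Pr(B & B) / Pr(B) = Pr(B) / Pr(B)
    return multiplier, new_p_b

results = {p_b: rescale(p_b) for p_b in (Fraction(1, 2), Fraction(1, 5))}

for p_b, (multiplier, new_p_b) in results.items():
    print(f"old Pr(B) = {p_b}: multiply by {multiplier}, new Pr(B) = {new_p_b}")
```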

In general, Pr(A|B) is not equal to Pr(B|A). The probability that someone is a male given that he plays for the New York Yankees is 1. But the probability that someone is a Yankee given that he is male is very small. We will see in a later chapter that, as long as Pr(A & B) is not zero, Pr(A|B) = Pr(B|A) just in case Pr(A) = Pr(B). More importantly, we will see that confusing these two probabilities is responsible for a good deal of bad reasoning.
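The same asymmetry shows up with a fair die, where exact numbers are available. The sketch below compares the probability of a four given an even number with the probability of an even number given a four; the variable names are illustrative.

```python
from fractions import Fraction

p_four = Fraction(1, 6)           # one face out of six
p_even = Fraction(3, 6)           # faces 2, 4, 6
p_four_and_even = Fraction(1, 6)  # a four is always even

# Rule 7 applied in both directions.
p_four_given_even = p_four_and_even / p_even  # 1/3
p_even_given_four = p_four_and_even / p_four  # 1

print(p_four_given_even, p_even_given_four)
```

The two conditional probabilities share a numerator and differ only in the denominator, so they come apart whenever Pr(A) and Pr(B) differ and the numerator is not zero.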

The General Conjunction Rule

By rearranging the terms in Rule 7, we obtain a general rule for conjunctions (multiply both sides of the equality in Rule 7 by Pr(B) to get Pr(A & B) = Pr(B) x Pr(A|B), then interchange the labels A and B).

Rule 8. (conjunctions): The probability of the conjunction A & B, where the conjuncts need not be independent, is the probability of A multiplied by the probability of B given A.

Pr(A & B) = Pr(A) x Pr(B|A)

This rule is more general than Rule 5. It applies to all conjunctions, whether their conjuncts are independent or not. Unlike Rule 7, Rule 8 is one we will often use in our calculations.

Example: You draw two cards from a full deck, and you don’t replace the first card before drawing the second. The probability of getting a king on both of your draws is the probability of getting a king on the first draw times the probability of getting a king on the second draw, given that you already got a king on the first. In symbols: Pr(K₁ & K₂) = Pr(K₁) x Pr(K₂|K₁).
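The two-kings example can be computed with Rule 8 and then cross-checked by brute-force counting. This is a sketch under the assumption of a standard 52-card deck in which every ordered pair of distinct cards is equally likely; the representation of cards as (rank, suit) tuples is an illustrative choice.

```python
from fractions import Fraction
from itertools import permutations

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in RANKS for suit in SUITS]

# Rule 8: Pr(K1 & K2) = Pr(K1) x Pr(K2|K1).
p_k1 = Fraction(4, 52)           # four kings among 52 cards
p_k2_given_k1 = Fraction(3, 51)  # three kings left among the remaining 51
p_both_rule8 = p_k1 * p_k2_given_k1

# Cross-check: enumerate every ordered draw of two different cards.
draws = list(permutations(deck, 2))
both_kings = [d for d in draws if d[0][0] == "K" and d[1][0] == "K"]
p_both_count = Fraction(len(both_kings), len(draws))

print(p_both_rule8, p_both_count)  # 1/221 1/221
```

Both routes agree: 12 of the 2652 ordered pairs consist of two kings.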

Now that we have conditional probabilities, we can define independence quite precisely. A and B are independent just in case the truth (or occurrence) of one has no influence or effect on the occurrence of the other.

Independence: A and B are independent just in case Pr(A) = Pr(A|B). Whether B occurs (or is true) or not has no effect on whether A occurs (or is true). If we learn that B is true (or false), that should do nothing to change our beliefs about the probability of A.

Rule 5 tells us that if A and B are independent, then Pr(A & B) = Pr(A) x Pr(B). This is just a special case of the more general Rule 8. It works because if A and B are independent, Pr(B) = Pr(B|A). So instead of writing Pr(B|A) in the special case (independent conjuncts) covered by Rule 5, we can get by with the simpler Pr(B).
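The independence test Pr(A) = Pr(A|B) is easy to try out on a deck of cards. In the sketch below, which assumes a standard deck with every card equally likely, drawing a king turns out to be independent of drawing a heart but not of drawing a face card. The helper names (`prob`, `conditional`, `independent`) are hypothetical.

```python
from fractions import Fraction

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in RANKS for suit in SUITS]

def prob(event):
    """Proportion of cards in the deck that satisfy `event`."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def conditional(a, b):
    """Rule 7: Pr(A|B) = Pr(A & B) / Pr(B), assuming Pr(B) is not zero."""
    return prob(lambda card: a(card) and b(card)) / prob(b)

def independent(a, b):
    """A and B are independent just in case Pr(A) = Pr(A|B)."""
    return prob(a) == conditional(a, b)

def is_king(card):
    return card[0] == "K"

def is_heart(card):
    return card[1] == "hearts"

def is_face(card):
    return card[0] in ("J", "Q", "K")

king_heart_independent = independent(is_king, is_heart)  # 1/13 equals 1/13
king_face_independent = independent(is_king, is_face)    # 1/13 differs from 1/3

# In the independent case, Rule 5 gives the same value as direct counting.
rule5_product = prob(is_king) * prob(is_heart)
p_king_and_heart = prob(lambda card: is_king(card) and is_heart(card))

print(king_heart_independent, king_face_independent)
print(rule5_product, p_king_and_heart)
```

Learning that the card is a heart leaves the chance of a king at 1/13, while learning that it is a face card raises it to 1/3.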

Rule 8 tells us that Pr(A & B) = Pr(A) x Pr(B|A). But we know that the order of the conjuncts in a conjunction doesn’t affect the meaning of the conjunction: A & B says the same thing as B & A. So Pr(A & B) = Pr(B & A). This means that Pr(A & B) = Pr(B & A) = Pr(B) x Pr(A|B). The value for this will be the same as the value we get when we use Rule 8, though in some cases one approach will be easier to calculate and in other cases the other one will be.
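As a quick check that both orders agree, take a single draw from a full deck, with K for drawing a king and F for drawing a face card (labels chosen here for illustration). Every king is a face card, so Pr(F|K) = 1, and four of the twelve face cards are kings, so Pr(K|F) = 1/3:

```latex
\[
\begin{aligned}
\Pr(K \,\&\, F) &= \Pr(K) \times \Pr(F \mid K) = \frac{4}{52} \times 1 = \frac{1}{13},\\
\Pr(F \,\&\, K) &= \Pr(F) \times \Pr(K \mid F) = \frac{12}{52} \times \frac{1}{3} = \frac{1}{13}.
\end{aligned}
\]
```

Here the first order is the easier one to calculate, since Pr(F|K) is obviously 1.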
