Humanities LibreTexts

17.4: Base-Rates

    A group of men in Belleville, Kansas, consists of 70 engineers and 30 lawyers. Suppose that we select Marcos at random from the group. The following is true of Marcos:

    Marcos is a 30-year-old man, married, no children. He has high ability and high motivation and promises to be quite successful in his field. He is well liked by his colleagues.

    Based on this: Is Marcos more likely to be an engineer or a lawyer, or are these equally likely? What’s relevant to deciding?

    Researchers Kahneman and Tversky told subjects that they were dealing with a pool of a hundred people, 70 of whom were engineers and 30 of whom were lawyers. If they were simply asked to estimate the likelihood that some person, Marcos, selected at random from this group was an engineer, most said 70%. Another group was given the above description of Marcos. The important thing about this description is that it is an equally accurate description of a lawyer or an engineer (and most subjects in pretests thought so).

    The information in the description is of no help in estimating whether someone is a lawyer or an engineer, so we should ignore it and (in the absence of any other relevant information) simply go by the base rates. This means that we should conclude that the probability that Marcos is an engineer is .7. In the absence of the irrelevant description, people did just this. But when they were given the irrelevant description, they concluded that the probability that Marcos was an engineer was .5 (fifty/fifty). The irrelevant information led them to disregard the base rates; they simply threw away information that is clearly relevant.
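    The reasoning here can be checked with Bayes' rule. The following is a minimal sketch, not part of the original study: the function name `posterior` and the value used for how well the description fits each profession are assumptions made for illustration. The only number taken from the text is the 70/30 split.

```python
from fractions import Fraction


def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' rule for a yes/no hypothesis (here: 'Marcos is an engineer')."""
    numerator = prior * likelihood_if_true
    return numerator / (numerator + (1 - prior) * likelihood_if_false)


# Base rate from the text: 70 of the 100 men are engineers.
prior_engineer = Fraction(70, 100)

# Assumption for illustration: the description fits engineers and lawyers
# equally well. The value 1/2 is hypothetical; any equal, nonzero value
# for both professions gives the same answer.
fit = Fraction(1, 2)

p_engineer = posterior(prior_engineer, fit, fit)
print(p_engineer)  # 7/10
```

    Because the description is equally likely for both professions, it cancels out of the calculation, and the answer stays at the base rate of .7 rather than moving to .5.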

    This is an instance of the so-called dilution effect, the capacity of irrelevant information to dilute or weaken relevant information. Sometimes relevant information is called diagnostic, because it can help us make accurate predictions or diagnoses, and irrelevant information is said to be non-diagnostic. Using these terms, the dilution effect is the tendency for non-diagnostic information (like the description of Marcos) to dilute diagnostic information (like the percentage of engineers vs. that of lawyers).

    In this case, the base rate of engineers is 70% and the base rate of lawyers is 30%. This information is highly relevant to the questions here. But descriptive information of marginal relevance can lead us to completely ignore highly relevant information about base rates. Remember Mike (7.3), the six-foot-two, muscular, aggressive college athlete? Why is it more likely that Mike is a banker than a pro football player? Because there are many more bankers than pro football players. The base rate for bankers is higher.

    The base rate for a characteristic (like being a banker, or being killed by a pig) is the frequency or proportion of things in the general population which have that characteristic. It is sometimes called the initial or prior probability of that trait. For example, if one out of every twelve hundred people is a banker, the base rate for bankers is 1/1,200. Often, we don’t know the exact base rate for something, but we still know that the base rate for one group is higher, or lower, than the base rate for another. We don’t know the base rate for farmers or for chimney sweeps in the United States, but there are clearly far more of the former than the latter.

    When we acquire information about someone or something (like our description of Mike) we need to integrate it with the old, prior information about base rates (many more people are bankers than pro football players). In the next section, we will see that in many cases this can be done quite precisely. But the important point now is that although both pieces of information are important, in cases where the size of the relevant group (or the difference in size between two relevant groups, e.g., bankers and pro football players) is large, the old, base-rate information can be much more important. Unfortunately, we often let the new information completely overshadow the prior information about base rates.

    The base-rate fallacy occurs when we neglect base rates in forming our judgments about the probabilities of things. We commit this fallacy if we judge it more likely that Mike is a pro football player than a banker (thus ignoring the fact that there are far more bankers than pro football players). Overreliance on the representativeness heuristic often leads us to underestimate the importance of base-rate information. In the present case, Mike resembles our picture of the typical pro football player, so we forget what we know about base rates and conclude that he probably is one.
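    To see how a large difference in base rates can outweigh a description that strongly resembles one group, here is a rough sketch with made-up numbers. Every figure in it is hypothetical (the text gives no actual counts for bankers or pro football players), and it assumes for simplicity that Mike is either a banker or a pro football player.

```python
from fractions import Fraction

# All three numbers below are hypothetical, chosen only for illustration.
bankers_per_player = 500            # bankers for every one pro football player
fit_given_player = Fraction(4, 5)   # share of pro players who fit Mike's profile
fit_given_banker = Fraction(1, 50)  # share of bankers who fit Mike's profile

# For every one pro player, count how many people in each group fit the profile.
players_fitting = 1 * fit_given_player
bankers_fitting = bankers_per_player * fit_given_banker

# Among the people who fit the profile, what share are pro players?
p_player_given_fit = players_fitting / (players_fitting + bankers_fitting)
print(p_player_given_fit)  # 2/27
```

    Under these assumed numbers, the profile is forty times more typical of pro players than of bankers, yet someone who fits it is still far more likely to be a banker, because there are so many more bankers to begin with.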

    Pigs vs. Sharks

    We conclude this section with a quick examination of your chances of being killed by a shark versus a pig, discussed earlier. The implication of the passage was that live pigs, not infected pork that people eat, kill substantially more people than sharks do. The only way to know for sure whether this is true is to check the statistics (if anyone keeps statistics on death by pig). That said, it seems probable that you are more likely to die from a pig than a shark, because the base rate for contact with pigs is much higher than the base rate for contact with sharks. (Of course, individual risks may vary. If you’re a shark hunter who comes into contact with sharks far more often than with pigs, your individual risk of death by shark would be higher.) Most contacts are uneventful, but common sense tells us that once in every several thousand, or hundred thousand, contacts something will go wrong. So, you probably are more likely to be killed by a pig, and it is much more likely that you will be injured by one. But a movie named Snout just wouldn’t have the cachet of a movie named Jaws.
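    The argument turns on simple arithmetic: expected deaths are roughly the number of contacts times the risk per contact. The sketch below uses invented figures only (as the text notes, the real statistics would have to be looked up) to show how a much higher contact rate can outweigh a much higher risk per contact.

```python
from fractions import Fraction

# Hypothetical figures for illustration only; none come from real statistics.
pig_contacts = 1_000_000                     # human-pig contacts per year
shark_contacts = 1_000                       # human-shark contacts per year
pig_risk_per_contact = Fraction(1, 100_000)  # chance one pig contact is fatal
shark_risk_per_contact = Fraction(1, 1_000)  # chance one shark contact is fatal

# Expected deaths = number of contacts * risk per contact.
expected_pig_deaths = pig_contacts * pig_risk_per_contact
expected_shark_deaths = shark_contacts * shark_risk_per_contact

print(expected_pig_deaths, expected_shark_deaths)  # 10 1
```

    In this made-up scenario each shark contact is a hundred times more dangerous than each pig contact, but pigs still account for more deaths because contact with them is a thousand times more common.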

    Confusions about Inverse Probabilities

    We know that a conditional probability like Pr(red|heart) may be quite different from its inverse, here Pr(heart|red). The first probability is 1 whereas the second is 1/2. But in many cases, it is easy to confuse a probability and its inverse. It is true that the probability of someone fitting Mike’s profile if they are a professional football player is reasonably high. By contrast, the probability of being a professional football player if they fit the profile is low (because the base rate of pro footballers is low, lower than the base rate of non-pros who fit the profile). Here it is easy to confuse a probability with its inverse. We will return to this problem in more detail in a later chapter.
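    The two card probabilities can be verified by counting cards in a standard 52-card deck. This is a small sketch; the helper name `pr` and the way the deck is represented are choices made here for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suit_colors = {"hearts": "red", "diamonds": "red",
               "clubs": "black", "spades": "black"}

# A standard deck: every rank paired with every suit (52 cards).
deck = list(product(ranks, suit_colors))


def is_heart(card):
    return card[1] == "hearts"


def is_red(card):
    return suit_colors[card[1]] == "red"


def pr(event, given):
    """Pr(event | given), found by counting cards that satisfy the condition."""
    pool = [card for card in deck if given(card)]
    return Fraction(sum(1 for card in pool if event(card)), len(pool))


pr_red_given_heart = pr(is_red, is_heart)
pr_heart_given_red = pr(is_heart, is_red)
print(pr_red_given_heart, pr_heart_given_red)  # 1 1/2
```

    All 13 hearts are red, so Pr(red|heart) is 1, but only 13 of the 26 red cards are hearts, so Pr(heart|red) is 1/2.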

    Safeguards

    1. Don’t be misled by highly detailed descriptions, profiles, or scenarios. The specificity makes them easier to imagine, but it also makes them less likely.
    2. Use base-rate information whenever possible. You often do not need any precise knowledge of base rates. Just knowing that there are a lot more of one sort of thing (e.g., bankers) than another (e.g., professional football players) is often enough.
    3. Be careful to distinguish conditional probabilities from their inverses.

    This page titled 17.4: Base-Rates is shared under a CC BY-NC 4.0 license and was authored, remixed, and/or curated by Jason Southworth & Chris Swoyer via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.