15.1: Descriptive Statistics


    In New York City each summer, some number of cats fall from open windows in high-rise apartment buildings. On August 22, 1989, The New York Times reported the startling fact that cats that fell further seemed to have a better chance of survival. When the paper checked with the Animal Medical Center, it found that 132 cats that had fallen were brought in for treatment. Seventeen of these were put to sleep by their owners (in most cases because the owners could not afford treatment, rather than because the cat was likely to die). Eight of the remaining 115 cats died. But the surprising thing is that the cats that fell the furthest seemed to have the highest probability of living. Only one of the 22 cats that fell from above 7 stories died, and there was but a single fracture among the 13 that fell more than 9 stories. What could account for this?

    We will begin with several basic concepts from descriptive statistics that are important for reasoning. We won’t be concerned with formulas for calculating them, but you will encounter these concepts outside of this class, so you need to learn what they mean.

    A population is a group of things (e.g., Florida voters, households, married couples, fruit flies). And a sample is a subgroup of the population. For example, we might conduct a poll of 1,000 college graduates and ask them to report their income. These 1,000 people would constitute our sample; the parent population would be all college graduates. In the next section, we will see that information about samples can be used to draw inferences about entire populations, but in this section, we will be concerned with description rather than inference.

    A parameter is some numerical characteristic of an entire population (e.g., average GPA of all freshmen, average income of all college graduates); as we will see in a moment, it could also be a measure of dispersion or a measure of correlation. For example, an average income of $18,525 in the population of adult U.S. citizens is a parameter.

    By contrast, a statistic is a corresponding numerical characteristic of a sample (e.g., the average GPA of college students contacted in a recent survey). One way to remember what goes with what is that the two p-words (population and parameter) go together, and the two s-words (sample and statistic) go together.
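
    The distinction between a parameter and a statistic can be sketched in a few lines of Python. The GPA values below are hypothetical and chosen only for illustration, and the sample is simply the first four entries rather than a properly drawn random sample.

```python
from statistics import mean

# Hypothetical GPAs for an entire (very small) population of freshmen.
population_gpas = [2.0, 2.5, 3.0, 3.5, 4.0, 3.0, 2.5, 3.5]

# A sample is a subgroup of the population (here, just the first four).
sample_gpas = population_gpas[:4]

parameter = mean(population_gpas)  # characteristic of the whole population
statistic = mean(sample_gpas)      # corresponding characteristic of the sample

print(parameter)  # 3.0
print(statistic)  # 2.75
```

    The two numbers need not agree: the statistic describes only the sample, while the parameter describes the whole population.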

    Features of Samples

    Properties or characteristics that come in degrees are called variables. For example, the age, weight, and income of people in the United States are variables. Each of them can take on many different values: Wilbur weighs 165 pounds, Martha 103, and Sam 321. We can also think of more abstract things as variables; for example, probability is a variable that can take any of the infinitely many values from 0 to 1. In the simplest case, a variable might have only two values; for example, your grade in a class you are taking pass/fail (such variables are important; they are called dichotomous variables).

    When the members of a population or sample are measured with respect to some variable like their score on the ACT test, the resulting set of all the numerical scores is a distribution of values for that variable. Thus, the set of all the ACT scores from a given year is a distribution of values for the variable of that year’s ACT scores. Similarly, the set of all the scores on the first exam in this class is a distribution of the variable of scores on the first exam.

    It can be difficult to see what a large distribution of values really amounts to; we get lost in a sea of numbers. So, it is often useful to condense the information in the distribution into simpler numbers. The most basic way of doing this is to calculate measures of central tendency. There are three common measures of this sort.

    Measures of Central Tendency

    The mean is what you already know under the name average. To find the mean of a distribution, you add all the numbers in the distribution together and divide by the number of items in the distribution. When the class gets an exam back, the first thing many people want to know is the average (i.e., mean) score on the test; this tells them how well the class did collectively. The mean is the most important measure of central tendency, but it has the weakness that just a few extreme values can pull it noticeably up or down.
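
    A small sketch of how one extreme value can shift the mean. The exam scores here are hypothetical and are not taken from any example in this chapter.

```python
from statistics import mean

# Hypothetical exam scores for five students.
scores = [70, 72, 75, 78, 80]

# The same class with one additional, very low score.
with_extreme = scores + [5]

mean_before = mean(scores)        # 375 / 5
mean_after = mean(with_extreme)   # 380 / 6

print(mean_before)           # 75
print(round(mean_after, 1))  # 63.3
```

    A single score of 5 lowers the class mean by more than eleven points, even though the other five scores are unchanged.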

    The median of a distribution is the number such that half the numbers in the distribution are less than it and half are greater. The median of the numbers 1, 2, 3, 4, 5 is 3, because two numbers are less than it and two are greater. What if no single number splits a distribution into two equal parts, as occurs in the distribution 1, 2, 3, 4? Here we will take the number halfway between 2 and 3, i.e., 2.5 as the median; clearly half the cases fall below it and half fall above.

    The mode of a distribution is the value that occurs most frequently in it. The mode of 1, 2, 3, 2, 4 is 2, because 2 occurs twice and all of the other numbers occur only once. A distribution may have more than one mode. For example, the distribution 1, 2, 3, 2, 4, 4, 2, 4 has two modes: 2 and 4.

    What are the mean, median, and mode of the following set of numbers: 179, 193, 99, 311, 194, 194, 179?

    1. Mean: Add the seven numbers together, which yields 1349. Then divide this by 7, which (rounding off) comes to 192.7.
    2. Median: The median is easiest to see if we list these numbers in order of magnitude, as 99, 179, 179, 193, 194, 194, 311. Here we find that 193 splits the distribution into two equal parts, so it is the median.
    3. Mode: This distribution has two numbers which occur twice, 179 and 194. So, it has two modes, 179 and 194.
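
    The three answers above can be checked with Python's standard `statistics` module. This is only one convenient way to verify the arithmetic; `multimode` requires Python 3.8 or later.

```python
from statistics import mean, median, multimode

values = [179, 193, 99, 311, 194, 194, 179]

avg = mean(values)                 # sum is 1349, divided by 7
mid = median(values)               # middle value once the list is sorted
modes = sorted(multimode(values))  # every value tied for most frequent

print(round(avg, 1))  # 192.7
print(mid)            # 193
print(modes)          # [179, 194]
```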

    Measures of Dispersal

    Measures of central tendency are often useful. For example, it will help you understand how you did on an exam to know the class average (the mean). And it will be easier to choose a major if you know the average number of people with that major who found jobs soon after they graduated. But measures of central tendency don’t tell us much about the relative position of any given item or about the extent to which values are spread out around a mean.

    For example, the distributions

    • 7, 8, 8, 9 and
    • 1, 3, 11, 17

    have the same mean, namely 8. But the items in the first distribution are clustered much more tightly around the mean than those of the second. If the values in a distribution are quite spread out, then the mean may not be very informative. Measures of dispersal provide additional information; they tell us how spread out (“dispersed”) the values in a distribution are.

    The range is the distance between the largest and the smallest value in the distribution. In the distribution 179, 193, 99, 311, 193, 194, 179, the range is the distance between 311 and 99, i.e., 311 - 99 = 212.
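
    As a quick check, the sketch below computes the range of this distribution and of the two distributions with mean 8 discussed earlier. The helper name `value_range` is our own and is not a standard library function.

```python
from statistics import mean

def value_range(values):
    """Distance between the largest and the smallest value."""
    return max(values) - min(values)

tight = [7, 8, 8, 9]
spread = [1, 3, 11, 17]
weights = [179, 193, 99, 311, 193, 194, 179]

print(mean(tight), mean(spread))  # both means are 8
print(value_range(tight))         # 2
print(value_range(spread))        # 16
print(value_range(weights))       # 212
```

    The two small distributions share a mean, but their ranges (2 versus 16) show how differently their values are spread out.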

    Percentiles

    Often a numerical value or score doesn’t tell you much in and of itself. If you learn that you scored a 685 on the math component of the ACT or that you got an 86 on the first exam in this course, that doesn’t really tell you how well you did. What you want to know is how well you did in comparison with those who took the same exam. Percentiles provide information about such relative positions. The percentile rank of a value or score is the percentage of values that fall below it. For example, if Sandra got an 86% on the first exam and 75% of the class got lower grades, then Sandra’s score has a percentile rank of 75%. And her score, 86, falls at, or is, the 75th percentile.

    Percentiles provide relative positions in percentage terms. For example, suppose that 100 people take the first exam and that Wilbur gets a 79%. If 60 (= 60%) students scored lower than 79, then Wilbur’s score of 79 falls at the 60th percentile.
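
    The definition of percentile rank used here (the percentage of values that fall below a score) can be sketched as a short function. The function name `percentile_rank` and the individual scores in the class list are hypothetical; only the counts (100 students, 60 of them below Wilbur's 79) come from the example above.

```python
def percentile_rank(score, scores):
    """Percentage of values in `scores` that are strictly below `score`."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# Hypothetical class of 100: 60 students below 79, Wilbur at 79, 39 above.
class_scores = [50] * 60 + [79] + [90] * 39

wilbur_rank = percentile_rank(79, class_scores)
print(wilbur_rank)  # 60.0
```

    Other definitions of percentile rank exist (for instance, ones that count ties differently); this sketch follows the "percentage below" wording used in this section.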

    Quartiles work like the median. The first quartile is the value such that 1/4 of the values are less than it, the second quartile the value such that half of the values are less than it (this number is also the median), the third quartile the value such that 3/4 of the values are less than it. The first quartile falls at the 25th percentile.

    The standard deviation is a very important measure of dispersal. We can’t calculate it without a formula (which we won’t worry about here), but the intuitive idea is that the standard deviation measures the average distance of all the values from the mean. It tells us how far, on average, the values deviate from the mean or average value in the distribution. The greater the standard deviation, the more spread out the values are. Hence, although the distributions 7, 8, 8, 9 and 1, 3, 11, 17 have the same mean, namely 8, the first will have a lower standard deviation than the second.
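
    Although we skip the formulas, a library can do the calculation for us. The sketch below uses Python's `statistics` module to find quartiles for the seven-number distribution from the earlier worked example and to compare the spread of the two distributions with mean 8. Two caveats: textbooks and software use slightly different conventions for quartiles, so other tools may report slightly different values, and the standard deviation is based on squared distances from the mean, so it is close to, but not the same as, the literal average distance. The helper `mean_abs_deviation` is our own name for that literal average.

```python
from statistics import mean, pstdev, quantiles

values = [179, 193, 99, 311, 194, 194, 179]

# Three cut points: first quartile, median, third quartile
# (using the module's default "exclusive" method).
quartiles = quantiles(values, n=4)
print(quartiles)  # [179.0, 193.0, 194.0]

def mean_abs_deviation(data):
    """Literal average distance of the values from their mean."""
    m = mean(data)
    return sum(abs(x - m) for x in data) / len(data)

tight = [7, 8, 8, 9]
spread = [1, 3, 11, 17]

# Population standard deviation of each distribution.
print(round(pstdev(tight), 2))   # 0.71
print(round(pstdev(spread), 2))  # 6.4

# Average distance from the mean, for comparison.
print(mean_abs_deviation(tight))   # 0.5
print(mean_abs_deviation(spread))  # 6.0
```

    By either measure, the second distribution is far more spread out than the first, even though both have a mean of 8.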

    Exercises

    1. Find the mean, median, mode, and range of each of the following distributions (which we may think of as measurements of people’s weight in pounds):
      1. 176, 132, 221, 187, 132, 194, 190
      2. 176, 193, 99.5, 321, 112, 200, 120

    2. Here is a list of people in a class, their score on their final, and the percentage of people who scored below them. In each case, give the percentile where their grade falls.
      1. Olivia got a 97%, 95% scored lower.
      2. Erik got a 46%, 5% scored lower.
      3. Wilbur got an 85%, 80% scored lower.
    3. Which distribution will have the greater standard deviation?
      • 10, 11, 14, 9
      • 6, 9.5, 10, 18.6

    This page titled 15.1: Descriptive Statistics is shared under a CC BY-NC 4.0 license and was authored, remixed, and/or curated by Jason Southworth & Chris Swoyer via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.