Humanities LibreTexts

13.1.1: Random Sample


    Statisticians have discovered several techniques for avoiding bias. The first is to obtain a random sample. When you sample at random, you don't favor any one member of the population over another. For example, when sampling tomato sauce cans, you don't pick the first three cans you see.

    Definition

    A random sample is any sample obtained by using a random sampling method.

    Definition

    A random sampling method is a way of taking a sample from a target population such that every member of the population has an equal chance of being chosen.
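    The definition can be made concrete with Python's standard `random` module. This is a minimal sketch that assumes the whole population can be listed in advance, which is often the hard part in practice. `simple_random_sample` and the student roster are hypothetical names used only for illustration. `random.Random.sample` draws without replacement, so each of the n members has the same k/n chance of ending up in a sample of size k.

```python
import random

def simple_random_sample(population, k, rng=None):
    """Hypothetical helper: draw k members without replacement so that
    every member has the same chance (k / len(population)) of being chosen."""
    rng = rng or random.Random()
    members = list(population)
    if not 0 <= k <= len(members):
        raise ValueError("sample size must be between 0 and the population size")
    return rng.sample(members, k)

# Hypothetical roster of 1,000 student IDs.
students = [f"S{i:04d}" for i in range(1000)]
sample = simple_random_sample(students, 50, random.Random(42))
print(len(sample))  # 50
```

    Passing a seeded `random.Random` makes a draw reproducible for checking; leaving `rng` out gives a fresh, unpredictable draw each time.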

    It is easy to recognize the value of obtaining a random sample, but achieving this goal can be difficult. If you want to poll students for their views on canceling the school's intercollegiate athletics program in the face of the latest school budget crisis, how do you give everybody an equal chance to be polled? Some students are less apt to want to talk with you when you walk up to them with your clipboard. If you ask all your questions in three spots on campus, you may not be giving an equal chance to students who are never at those spots. Then there are problems with the poll questions themselves. The way the questions are constructed might influence the answers you get, and so you won't be getting a random sample of students' views even if you do get a random sample of students.

    Purposely not using a random sample is perhaps the main way to lie with statistics. For one example, newspapers occasionally report that students in American middle schools and high schools are especially poor at math and science when compared to students in other countries. This surprising statistical generalization is probably based on a biased sample. It is quite true that those American students taking the international standardized tests of mathematics and science achievement do score worse than foreign students. The problem is that school administrators in other countries try too hard to do well on these tests. "In many countries, to look good is very good for international prestige. Some restrict the students taking the test to elite schools," says Harold Hodgkinson, the director of the Center for Demographic Policy in Washington and a former director of the National Institute of Education. For example, whereas the United States tests almost all of its students, Hong Kong does not. By the 12th grade, Hong Kong has eliminated all but the top 3 percent of its students from taking mathematics and thus from taking the standardized tests. In Japan, only 12 percent of 12th-grade students take any mathematics. Canada has especially good test results for the same reason. According to Hodgkinson, the United States doesn't look so bad when you take the above into account.
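    A small simulation shows why restricting who takes a test distorts a comparison even when the underlying students are identical. The scores below are invented for illustration (simply the integers 1 to 1,000), and `top_fraction` is a hypothetical helper; the 3 percent cutoff echoes the Hong Kong figure above.

```python
from statistics import mean

# Hypothetical, made-up scores: imagine two countries with identical students
# who score 1, 2, ..., 1000 on the same standardized test.
scores = list(range(1, 1001))

def top_fraction(values, fraction):
    """Keep only the highest-scoring `fraction` of values (an elite test pool)."""
    k = max(1, round(len(values) * fraction))
    return sorted(values, reverse=True)[:k]

tests_everyone = mean(scores)                           # tests almost all students
tests_top_3_percent = mean(top_fraction(scores, 0.03))  # tests only the top 3 percent

print(tests_everyone)       # 500.5
print(tests_top_3_percent)  # 985.5
```

    The students are the same in both cases; only the selection of test-takers differs, yet the second average is nearly twice the first.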

    The following passage describes a non-statistical generalization from a sample. Try to spot the conclusion, the population, the sample, and any bias.

    David went to the grocery store to get three cartons of strawberries. He briefly looked at the top layer of strawberries in each of the first three cartons in the strawberry section and noticed no fuzz on the berries. Confident that the berries in his three cartons were fuzz-free, he bought all three.

    David's conclusion was that the strawberries in his cartons were not fuzzy. His conclusion was about the population of all the strawberries in the three cartons. His sample was the top layer of strawberries in each one. David is a trusting soul, isn't he? Some grocers will hide all the bad berries on the bottom. Because shoppers are aware of this potential deception, they prefer their strawberries in see-through, webbed cartons. If David had wanted to be surer of his conclusion, he should have looked more carefully at the cartons and sampled equally among bottom, middle, and side berries, too. Looking at the top strawberries is better than looking at none, and looking randomly is better than looking non-randomly.
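    David's situation can be simulated. The carton below is hypothetical: three layers of 10 berries each, with the grocer having hidden 6 fuzzy berries in the bottom layer, so 20 percent of the population is fuzzy. Inspecting only the top layer finds no fuzz at all, while random samples of 10 berries drawn from the whole carton (using `random.Random.sample`) come out right on average, even though any single random sample can still miss.

```python
import random

# Hypothetical carton: three layers of 10 berries; True marks a fuzzy berry.
# All 6 fuzzy berries are hidden in the bottom layer.
carton = {
    "top": [False] * 10,
    "middle": [False] * 10,
    "bottom": [True] * 6 + [False] * 4,
}
all_berries = [berry for layer in carton.values() for berry in layer]

def fuzzy_share(berries):
    """Fraction of the given berries that are fuzzy."""
    return sum(berries) / len(berries)

top_layer_share = fuzzy_share(carton["top"])  # David's non-random look
true_share = fuzzy_share(all_berries)         # the whole population

# One random sample of 10 berries from anywhere in the carton.
random_share = fuzzy_share(random.Random(1).sample(all_berries, 10))

# Averaged over many random samples, the estimate centers on the true share.
rng = random.Random(0)
trials = [fuzzy_share(rng.sample(all_berries, 10)) for _ in range(5000)]
average_random_share = sum(trials) / len(trials)

print(f"top layer only: {top_layer_share:.0%}")  # 0%
print(f"whole carton:   {true_share:.0%}")       # 20%
print(f"one random sample: {random_share:.0%}")
print(f"average of 5,000 random samples: {average_random_share:.1%}")
```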

    When we sample instances of news reporting in order to draw a conclusion about the accuracy of news reports, we want our sample to be representative in regard to the characteristic of "containing a reporting error." When we sample voters about how they will vote in the next election, we want our sample to be representative in regard to the characteristic of "voting for the candidates." Here is a formal definition of the goal, which is representativeness:

    Definition

    A sample S is a (perfectly) representative sample from a population P with respect to characteristic C if the percentage of S that are C is exactly equal to the percentage of P that are C.

    A sample S is less representative of P according to the degree to which the percentage of S that are C deviates from the percentage of P that are C.
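    These definitions translate directly into a measure of how far a sample is from being representative. In this sketch, `percent_with` and `representativeness_gap` are hypothetical helper names, characteristic C is passed in as a yes/no test, and the deviation is measured in absolute percentage points, which is one reasonable way to quantify the degree of deviation described above. A gap of 0 means the sample is perfectly representative with respect to C.

```python
def percent_with(items, has_c):
    """Percentage of items that have characteristic C."""
    items = list(items)
    if not items:
        raise ValueError("cannot take a percentage of an empty collection")
    return 100 * sum(1 for x in items if has_c(x)) / len(items)

def representativeness_gap(sample, population, has_c):
    """Percentage-point deviation of sample S from population P with respect
    to C; 0 means S is perfectly representative of P."""
    return abs(percent_with(sample, has_c) - percent_with(population, has_c))

# Hypothetical population: 200 news reports, 10 of which contain a reporting
# error (True = "contains a reporting error"), so 5 percent are C.
population = [True] * 10 + [False] * 190
sample_a = [True] * 1 + [False] * 19   # 5 percent C
sample_b = [True] * 4 + [False] * 16   # 20 percent C

print(representativeness_gap(sample_a, population, bool))  # 0.0
print(representativeness_gap(sample_b, population, bool))  # 15.0
```

    Sample A is perfectly representative with respect to C; sample B overstates the error rate by 15 percentage points and so is less representative.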

    If you are about to do some sampling, what can you do to improve your chances of getting a representative sample? The answer is to follow these four procedures, if you can:

    1. Pick a random sample.

    2. Pick a large sample.

    3. Pick a diverse sample.

    4. Pick a stratified sample.

    We’ve already discussed how to obtain a random sample. After we explore the other three procedures, we’ll be in a better position to appreciate why it can sometimes be a mistake to pick a random sample.

    Exercise 1

    Which is the strongest and which is the weakest argument? The four arguments differ only in their use of the words "random" and "about."

    a. Twenty percent of a random sample of our university's students want library fines to be lower; so, 20 percent of our university's students want library fines to be lower.
    b. Twenty percent of a sample of our university's students want library fines to be lower; so, 20 percent of our university's students want library fines to be lower.
    c. Twenty percent of a random sample of our university's students want library fines to be lower; so, about 20 percent of our university's students want library fines to be lower.
    d. Twenty percent of a sample of our university's students want library fines to be lower; so, about 20 percent of our university's students want library fines to be lower.

    Answer

    Answer (c) is the strongest and (b) is the weakest. The word "about" in the conclusions of (c) and (d) makes their conclusions less precise and thus more likely to be true, all other things being equal. For this reason, arguments (c) and (d) are better than arguments (a) and (b). Within each of these pairs, the argument whose premises speak about a random sample is better than the one whose premises don't. So (c) is better than (d), and (b) is worse than (a). Answers (d) and (b) are worse because you lack information about whether the samples are random; however, not being told whether they are random does not permit you to conclude that they are not random.

    Exercise 2

    For the following statistical report, (a) identify the sample, (b) identify the population, (c) discuss the quality of the sampling method, and (d) find other problems either with the study or with your knowledge of the study.

    Voluntary tests of 25,000 drivers throughout the United States showed that 25 percent of them use some drug while driving and that 85 percent use no drugs at all while driving. The conclusion was that 25 percent of U.S. drivers do use drugs while driving. A remarkable conclusion. The tests were taken at random times of the day at randomly selected freeway restaurants.

    Answer

    (a) The sample is 25,000 U.S. drivers. (b) The population is U.S. drivers. (c) The sample size is large enough, but the sample is not random, for four reasons: (1) drivers who do not stop at roadside restaurants had no chance of being sampled, (2) the study overemphasized freeway drivers rather than other drivers, (3) it overemphasized volunteers, and (4) it overemphasized drivers who drive at 4 a.m., because testing at random times of day gives the few drivers out at that hour as much weight as the many drivers out at busy hours. (d) The most obvious error in the survey, or in the report of the survey, is that 25 percent plus 85 percent is greater than 100 percent. Even though the survey said these percentages are approximate, the 110 percent is still too high. Also, the reader would like more information in order to assess the quality of the study. In particular, how did the study decide what counts as a drug; that is, how did it operationalize the concept of a drug? Are these drugs: Aspirin? Caffeine? Vitamins? Alcohol? Only illegal drugs? Did the questionnaire ask whether the driver had ever used drugs while driving, or had ever used drugs, period? Did the pollster do the sampling on one day or over many days? Still, lack of information about the survey is not necessarily a sign of error in the survey itself.
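    The arithmetic behind the most obvious error can be checked mechanically. Drivers who use some drug while driving and drivers who use no drugs at all while driving are mutually exclusive groups that together cover every driver, so their percentages should total 100. `shares_are_consistent` is a hypothetical helper, and the 2-point tolerance for rounding is an assumption chosen to allow for "approximate" percentages, not a figure from the report.

```python
def shares_are_consistent(percentages, tolerance=0.0):
    """Percentages of mutually exclusive, exhaustive groups should sum to 100,
    give or take `tolerance` percentage points for rounding."""
    return abs(sum(percentages) - 100) <= tolerance

reported = {"use some drug while driving": 25, "use no drugs while driving": 85}
total = sum(reported.values())

print(total)                                                   # 110
print(shares_are_consistent(reported.values(), tolerance=2))   # False
```

    Even a generous allowance for rounding cannot absorb a 10-point excess, which is why the report cannot be right as stated.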


    This page titled 13.1.1: Random Sample is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Bradley H. Dowden.