Humanities LibreTexts

13.1.4: Stratified Samples


    In addition to seeking a large, random, diverse sample, you can improve your chances of getting a representative sample by stratifying the sample. In the example in the Concept Check about giving drug tests at random times, a mistake was made because many more drivers are on the road at 5 p.m. than at 5 a.m. Sampling times at random would therefore over-represent the 5 a.m. drivers relative to their share of all drivers. To remove this bias, the sampling method should use this knowledge of who drives when by stratifying according to time of day. For example, if you know that 30 percent of drivers are on the road from 5 p.m. to 6 p.m. and 3 percent are on the road from 5 a.m. to 6 a.m., then make sure that 30 percent of the sampled drivers are randomly picked from 5 p.m. to 6 p.m. and only 3 percent from 5 a.m. to 6 a.m. Do the same for the other driving times if you know their percentages.
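    The time-of-day rule above is what statisticians call proportional allocation: each stratum's share of the sample equals its share of the population. Below is a minimal Python sketch, assuming the shares are known. The function name `proportional_allocation` is hypothetical, and because the text gives only the 5 p.m. and 5 a.m. figures, the remaining 67 percent is lumped into a single made-up "all other hours" stratum.

```python
from math import floor

def proportional_allocation(shares, n):
    """Split a total sample size n across strata in proportion to each
    stratum's population share, using largest-remainder rounding so the
    counts always add up to n. (Hypothetical helper for illustration.)"""
    total = sum(shares.values())
    # Exact, possibly fractional, quota for each stratum.
    exact = {k: n * s / total for k, s in shares.items()}
    counts = {k: floor(q) for k, q in exact.items()}
    # Give the units lost to rounding down to the strata with the
    # largest fractional remainders.
    leftover = n - sum(counts.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts

# Percent of drivers on the road; "all other hours" is an assumed lump.
driver_shares = {"5-6 p.m.": 30, "5-6 a.m.": 3, "all other hours": 67}
print(proportional_allocation(driver_shares, 200))
```

    With 200 tests in total, this assigns 60 to the 5 p.m. hour, 6 to the 5 a.m. hour, and 134 to all other hours. Largest-remainder rounding is just one reasonable way to keep the counts summing to the total when the quotas are not whole numbers.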

    Suppose you are planning a poll to learn how Ohio citizens will vote in the next presidential election. You can use your knowledge of politics to help pick the best sample. You already know that a voter's race is apt to affect how he or she will vote. Suppose you also know that, even though Ohio citizens are 65 percent white and 30 percent black, the expected voters will be 70 percent white and 25 percent black.¹ You can use this information about the voting population to take a better sample by making sure that your random sample contains exactly 70 percent white voters and exactly 25 percent black voters. If your poll were to contain 73 percent white voters, you would be well advised to randomly throw away some of the white voters' responses until you get the number down to 70 percent. The resulting stratification on race will improve the chances that your sample is representative. Stratification on the voters' soft drink preference would not help, however, because there is no reason to think that soft drink preference affects how people vote.
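    Here is a minimal sketch of drawing the stratified poll sample, assuming a hypothetical list of 10,000 citizens whose race is known. The record layout, the helper `stratified_sample`, and the 5 percent "other" stratum (the text gives only the white and black figures) are illustrative assumptions. The sample is random within each stratum, but each stratum's size in the sample is fixed in advance, so the citizens' 65/30 mix cannot pull the sample away from the expected voters' 70/25 mix.

```python
import random
from collections import Counter

def stratified_sample(population, key, percents, n, rng=None):
    """Draw a random sample of size n in which each stratum (each value
    returned by `key`) makes up the given percentage of the sample."""
    rng = rng or random.Random()
    # Group the population into strata by the value of the variable.
    strata = {}
    for member in population:
        strata.setdefault(key(member), []).append(member)
    sample = []
    for value, pct in percents.items():
        # Integer quota; if n * pct is not divisible by 100 it is rounded
        # down and the sample comes out slightly short of n.
        quota = n * pct // 100
        # Simple random sampling without replacement *within* the stratum.
        sample.extend(rng.sample(strata[value], quota))
    return sample

# Hypothetical frame of 10,000 citizens: 65% white, 30% black, 5% other.
population = ([{"id": i, "race": "white"} for i in range(6500)]
              + [{"id": 6500 + i, "race": "black"} for i in range(3000)]
              + [{"id": 9500 + i, "race": "other"} for i in range(500)])

# Target the expected voter mix: 70% white, 25% black, 5% other (assumed).
sample = stratified_sample(population, key=lambda p: p["race"],
                           percents={"white": 70, "black": 25, "other": 5},
                           n=500, rng=random.Random(1))
print(Counter(p["race"] for p in sample))
```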

    The definition of stratification uses the helpful concept of a variable. Roughly speaking, a variable is anything that comes in various types or amounts. There are different races, so race is a variable; there are different amounts of salary, so salary is a variable; and so forth. Each type or amount of the variable is called a possible value of the variable. White and black are two values of the race variable. Suppose a population (say, of people) could be divided into different groups, or strata, according to some variable characteristic (such as race), so that all members of a group have the same value for that variable (for example, all the members of one group are black, all the members of another group are white, and so on). Suppose a sample is taken under the requirement that, for each value of the variable, the percentage of the sample having that value (such as black) must equal the known percentage of the whole population having that value. If so, then a stratified sample has been taken from that population, and the sample is said to be stratified on that variable.
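    The definition can be restated as a test: a sample is stratified on a variable when, for every value of that variable, the sample percentage equals the known population percentage. Below is a small sketch of that check. The function name `is_stratified_on` is hypothetical, and the 5 percent "other" share is an assumption. It compares exactly, as the definition does, and uses integer arithmetic to avoid floating-point round-off, although a small tolerance may be more practical when quotas cannot be met exactly.

```python
from collections import Counter

def is_stratified_on(sample_values, population_percent):
    """Return True if each value's share of the sample equals its known
    population percentage (exact comparison, as in the definition)."""
    counts = Counter(sample_values)
    n = len(sample_values)
    # An empty sample, or a sample value with no known population
    # percentage, fails the test.
    if n == 0 or set(counts) - set(population_percent):
        return False
    # counts[v] / n == pct / 100, cross-multiplied to stay in integers.
    return all(counts[v] * 100 == pct * n for v, pct in population_percent.items())

voter_mix = {"white": 70, "black": 25, "other": 5}  # "other" share is assumed
print(is_stratified_on(["white"] * 70 + ["black"] * 25 + ["other"] * 5, voter_mix))  # True
print(is_stratified_on(["white"] * 73 + ["black"] * 22 + ["other"] * 5, voter_mix))  # False
```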

    Stratification is a key to reducing sample size, thereby saving time and money. If you want to know how many people will vote for the Republican candidate in the next presidential election, talking to only one randomly selected voter would obviously be too small a sample. However, getting a big enough sample is usually less of a problem than you might expect, provided you pay careful attention to stratification on groups that are likely to vote similarly. Most nonprofessionals believe that tens of thousands of people would need to be sampled. I asked my next-door neighbor how many he thought would be needed, and he said, "Oh, at least a hundred thousand." Surprisingly, 500 would be enough if the sample were stratified on race, income, employment type, political party, and other important variables. This 500 figure assumes the pollster need only be 95 percent sure that the results aren't off by more than 2 percent. If you can live with a margin of error greater than 2 percent and confidence less than 95 percent, then you can use a much smaller sample size.
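    Why stratification saves on sample size can be sketched with the standard normal-approximation formula for estimating a proportion, n = z²·p(1−p)/e², where e is the margin of error, and its proportional-allocation counterpart, which replaces p(1−p) with the weighted within-group average Σ W·p_group(1−p_group). The stratum weights and support levels below are invented for illustration, and both formulas ignore the finite-population correction. The sketch shows only that the more the strata differ in how they vote, the smaller the stratified sample can be; how much smaller in a real poll depends on how well the stratifying variables actually predict the vote.

```python
from math import ceil

Z_95 = 1.96  # standard normal value for 95 percent confidence

def _round_up(x):
    # Remove float noise (e.g. 2400.9999999999995) before taking the ceiling.
    return ceil(round(x, 6))

def n_simple_random(p, margin, z=Z_95):
    """Sample size for a simple random sample estimating a proportion p
    to within +/- margin (normal approximation, large population)."""
    return _round_up(z**2 * p * (1 - p) / margin**2)

def n_stratified(strata, margin, z=Z_95):
    """Sample size for a stratified sample with proportional allocation.
    `strata` is a list of (population_weight, proportion_in_stratum)
    pairs whose weights sum to 1."""
    within = sum(w * p * (1 - p) for w, p in strata)
    return _round_up(z**2 * within / margin**2)

# Hypothetical electorate: two equal-sized groups, one 90 percent and one
# 10 percent in favor of the candidate, so overall support is 50 percent.
strata = [(0.5, 0.9), (0.5, 0.1)]
overall = sum(w * p for w, p in strata)
print(n_simple_random(overall, 0.02))  # 2401
print(n_stratified(strata, 0.02))      # 865
```

    In this made-up case a simple random sample needs 2,401 people for a 2 percent margin at 95 percent confidence, while the stratified design needs 865, because the stratified formula counts only the variation within each group. If the groups voted identically, the two sizes would be equal, which is why stratifying on a variable unrelated to voting, such as soft drink preference, gains nothing.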

    The first great triumph of stratified sampling came in 1936, when one unstratified poll, based on ballots mailed to 10,000,000 people, predicted that President Roosevelt would not be re-elected. A poll by George Gallup using a stratified sample of only 3,000 people correctly predicted that Roosevelt would be re-elected.

    The most important variables affecting voting are the voters' political party, race, sex, income, and age. If the pollster has no idea which variables will influence the vote, then the pollster cannot ensure that the sample is diverse with respect to them, so a much larger sample will be needed to achieve the same confidence in the results that a smaller stratified sample would provide.

    Exercise 1

    Your quality control engineer conducts a weekly inspection of your company's new beverage. He gathers a random sample of 100 bottles produced on Mondays or Tuesdays. Over several weeks, he finds at most one or two faulty bottles in each week's sample. So you conclude that your manufacturing process is doing well, on average, every week, since your goal was to have at least 98 percent of the beverage be OK.

    Suppose, however, that the quality control engineer knows that your plant produces beverages only on weekdays and produces an equal amount on each weekday. Which of the following is the best way for the engineer to improve the sampling by paying attention to stratification?

    a. Sample one beverage from each weekday.
    b. Pick a larger and more random sample.
    c. Take an equal number of samples on Saturdays and Sundays as well.
    d. Make sure that 20 percent of the sample comes from each weekday.
    e. Sample more of the bottles that will be delivered to your most valued customers.

    Answer

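    Answer (d). Because the plant produces an equal amount on each of the five weekdays, each weekday should contribute one-fifth, or 20 percent, of the sample, whereas the current sample draws only on Mondays and Tuesdays. The suggestion in (b) would be good to do, but it has nothing to do with stratification.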


    1 These numbers are not reliable.


    This page titled 13.1.4: Stratified Samples is shared under a not declared license and was authored, remixed, and/or curated by Bradley H. Dowden.
