13.1: Generalizing from a Sample

Bradley H. Dowden
California State University Sacramento

Scientists collect data not because they are in the business of gathering facts at random but because they hope to establish a generalization that goes beyond the individual facts. The scientist is in the business of sampling a part of nature and then looking for a pattern in the data that holds for nature as a whole. For example, a sociologist collects data about murders in order to draw a general conclusion, such as "Most murders involve guns used on acquaintances." A statistician would say that the scientist has sampled some cases of murder in order to draw a general conclusion about the whole population of murders. The terms sample and population are technical terms. The population need not be people; in our example it is the set of all murders. A sample is a subset of the population. The population is the set of things you are interested in generalizing about. The sample is examined to get a clue to what the whole population is like. We sample in order to discover a pattern that is likely to hold across the whole population.

The goal in drawing a generalization based on a sample is for the sample to be representative of the population, that is, to be just like it in the respects you care about. If your method of selecting the sample is likely to produce an unrepresentative sample, then you are using a biased method, and relying on it will cause you to commit the fallacy of biased generalization. If you conclude that the vast majority of philosophers write about the meaning of life because the web pages of all the philosophers at your university do, then you have used a biased method of sampling philosophers’ writings. You should use a more diverse sampling method; for example, sample philosophers at other universities as well.

Whenever a generalization is produced by generalizing on a sample, the reasoning process (or the general conclusion itself) is said to be an inductive generalization. It is also called an induction by enumeration or an empirical generalization. Inductive generalizations are a kind of argument by analogy with the implicit assumption that the sample is analogous to the population. The more analogous or representative the sample, the stronger the inductive argument.

Generalizations may be statistical or non-statistical. The generalization "Most murders involve guns" contains no statistics. Replacing the term "most" with the statistic "80 percent" would transform it into a statistical generalization. The statement "80 percent of murders involve guns" is called a simple statistical claim because it has the form

x percent of the group G has characteristic C.

In the example, x = 80, G = murders, and C = involving guns.

A general claim, whether statistical or not, is called an inductive generalization only if it is obtained by a process of generalizing from a sample. If the statistical claim about murders were obtained by looking at police records, it would be an inductive generalization, but if it were deduced from a more general principle of social psychology, then it would not be an inductive generalization, although it would still be a generalization.

Exercise 1

Is the generalization "Most emeralds are green" a statistical generalization? Is it an inductive generalization?

Answer: It is not statistical, but you cannot tell whether it is an inductive generalization just by looking. It all depends on where it came from. If it was the product of sampling, it's an inductive generalization. If not, then it's not an inductive generalization. Either way, however, it is a generalization.

Back from the grocery store with your three cans of tomato sauce for tonight's spaghetti dinner, you open the cans and notice that the sauce in two of the cans is spoiled. You generalize and say that two-thirds of all the cans of that brand of tomato sauce on the shelf in the store are bad. Here is the pattern of your inductive generalization:

x percent of sample S has characteristic C.
-------------------------------------------------------------
x percent of population P has characteristic C.

In this argument x = 66.7 (for two-thirds), P = all the tomato sauce cans of a particular brand from the shelf of the grocery store, S = three tomato sauce cans of that brand from the shelf of the grocery store, and C = spoiled. Alternatively, this is the pattern:

Sample S has characteristic C. So, population P has characteristic C.

where C is now not the property of being spoiled but instead is the property of being 66.7 percent spoiled. Either form is correct, but be sure you know what the C is.
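The arithmetic behind the first pattern can be sketched in a few lines of Python. This is only an illustration of the tomato sauce example; the function name `sample_percentage` and the list `cans` are made up for this sketch and are not part of any standard library.

```python
from fractions import Fraction

def sample_percentage(sample, has_characteristic):
    """Return the exact percentage of items in `sample` that have characteristic C."""
    if not sample:
        raise ValueError("sample must not be empty")
    hits = sum(1 for item in sample if has_characteristic(item))
    return 100 * Fraction(hits, len(sample))

# Sample S: the three cans brought home; two of them were spoiled.
cans = ["spoiled", "spoiled", "good"]

# Premise: x percent of sample S has characteristic C (C = spoiled).
x = sample_percentage(cans, lambda can: can == "spoiled")

# The conclusion reuses the same x for population P. The code can compute x,
# but it cannot guarantee that the conclusion about P is true.
print(f"{float(x):.1f} percent of the sample is spoiled")  # 66.7 percent
```

Notice that the computation only establishes the premise. Whether the same figure holds for the cans still on the shelf depends on how representative the three cans are.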

The more the sample represents the population, the more likely the inductive generalization is to be correct. By a representative sample we mean a sample that is perfectly analogous to the whole population in regard to the characteristics that are being investigated. If a population of 888 jelly beans in a jar is 50 percent black and 50 percent white, a representative sample could be just two jelly beans, one black and one white. A method of sampling that is likely to produce a non-representative sample is a biased sampling method. A biased sample is a non-representative sample.
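The jelly bean example can be checked mechanically. The following sketch treats "perfectly analogous" in the strictest way, as having exactly the same proportions of each color; the helper names `proportions` and `is_representative` are hypothetical and chosen only for this illustration.

```python
from collections import Counter
from fractions import Fraction

def proportions(items):
    """Map each kind of item to its exact share of `items`."""
    counts = Counter(items)
    return {kind: Fraction(n, len(items)) for kind, n in counts.items()}

def is_representative(sample, population):
    """True if every kind has exactly the same share in the sample and the population."""
    return proportions(sample) == proportions(population)

# The jar from the text: 888 jelly beans, half black and half white.
jar = ["black"] * 444 + ["white"] * 444

print(is_representative(["black", "white"], jar))           # True
print(is_representative(["black", "black"], jar))           # False
print(is_representative(["black", "white", "black"], jar))  # False
```

Under this strict reading, no sample with an odd number of beans can be representative of the jar, which is one reason that in practice a sample is usually called representative when its proportions are close to the population's rather than identical to them.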

The fallacy of hasty generalization occurs whenever a generalization is made too quickly, on insufficient evidence. Technically, it occurs whenever an inductive generalization is made with a sample that is unlikely to be representative. For instance, suppose Jessica says that most Americans own an electric hair dryer because most of her friends do. This would be a hasty generalization, since Jessica's friends are unlikely to represent everybody when it comes to owning hair dryers. Her sampling method shows too much bias toward her friends.
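To see how Jessica's method can mislead, here is a small simulation. Every number in it is invented for illustration and is not real survey data: it assumes a population of 1,000 people in which 41 percent own a hair dryer, and a circle of 20 friends in which 18 do.

```python
import random

# Hypothetical population of 1,000 people (all figures are made up).
population = (
    [{"friend": True, "owns_dryer": True}] * 18
    + [{"friend": True, "owns_dryer": False}] * 2
    + [{"friend": False, "owns_dryer": True}] * 392
    + [{"friend": False, "owns_dryer": False}] * 588
)

def ownership_rate(people):
    """Fraction of `people` who own an electric hair dryer."""
    return sum(p["owns_dryer"] for p in people) / len(people)

# What is actually true of the whole population: 410 owners out of 1,000.
true_rate = ownership_rate(population)

# Jessica's method: look only at her friends (a biased sampling method).
friends = [p for p in population if p["friend"]]
friends_rate = ownership_rate(friends)

# A less biased method: every person has the same chance of being chosen.
rng = random.Random(0)  # fixed seed so that the run is repeatable
random_sample = rng.sample(population, 100)
random_rate = ownership_rate(random_sample)

print(f"whole population:     {true_rate:.0%}")
print(f"Jessica's friends:    {friends_rate:.0%}")
print(f"random sample of 100: {random_rate:.0%}")
```

In this made-up population, Jessica's friends would lead her to say that most people own a hair dryer, although fewer than half do. The random sample is not guaranteed to match the true rate either, but the method that produced it does not favor any one group.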
