Measures of the Spread of Data

Learning Outcomes

  • Calculate and interpret z-scores

Comparing Values from Different Data Sets

The standard deviation is useful when comparing data values that come from different data sets. If the data sets have different means and standard deviations, then comparing the data values directly can be misleading.

  • For each data value, calculate how many standard deviations away from its mean the value is.
  • Use the formula: value = mean + (#ofSTDEVs)(standard deviation); solve for #ofSTDEVs.
  • #ofSTDEVs = [latex]\frac{\mathrm{value} - \mathrm{mean}}{\mathrm{standard \ deviation}}[/latex]
  • Compare the results of this calculation.

#ofSTDEVs is often called a “[latex]z[/latex]-score”; we can use the symbol [latex]z[/latex]. In symbols, the formulas become:

Sample: [latex]x = \overline{x} + zs[/latex], so [latex]z = \frac{x - \overline{x}}{s}[/latex]
Population: [latex]x = \mu + z\sigma[/latex], so [latex]z = \frac{x - \mu}{\sigma}[/latex]
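
The two forms of the formula can also be sketched in a few lines of Python. This is only an illustrative sketch: the function names z_score and value_from_z are made up here, and the numbers in the usage lines are invented, not taken from any data set in this section.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / std_dev


def value_from_z(z, mean, std_dev):
    """Inverse relationship: value = mean + z * (standard deviation)."""
    return mean + z * std_dev


# Invented illustration: a value of 85 in a data set with mean 70
# and standard deviation 10.
z = z_score(85, 70, 10)
print(z)                        # 1.5
print(value_from_z(z, 70, 10))  # 85.0
```

A positive result means the value is above the mean, and a negative result means it is below the mean.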

Example

Two students, John and Eric, attend different high schools and want to find out who has the higher GPA relative to his own school. Which student had the higher GPA when compared to his school?

Student | GPA | School Mean GPA | School Standard Deviation
John | [latex]2.85[/latex] | [latex]3.0[/latex] | [latex]0.7[/latex]
Eric | [latex]77[/latex] | [latex]80[/latex] | [latex]10[/latex]
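
One way to work this example is to apply the [latex]z[/latex]-score formula to each student, using that student's own school mean and standard deviation from the table:

John: [latex]z = \frac{2.85 - 3.0}{0.7} \approx -0.21[/latex]
Eric: [latex]z = \frac{77 - 80}{10} = -0.3[/latex]

Both GPAs are below their school means. John's GPA is about [latex]0.21[/latex] standard deviations below his school's mean, while Eric's is [latex]0.3[/latex] standard deviations below his. Because John's [latex]z[/latex]-score is the higher of the two, John has the higher GPA when compared to his school.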

Try It

Two swimmers, Angie and Beth, from different teams, want to find out who had the faster time in the 50-meter freestyle relative to her own team. Which swimmer had the faster time when compared to her team?

Swimmer | Time (seconds) | Team Mean Time (seconds) | Team Standard Deviation (seconds)
Angie | [latex]26.2[/latex] | [latex]27.2[/latex] | [latex]0.8[/latex]
Beth | [latex]27.3[/latex] | [latex]30.1[/latex] | [latex]1.4[/latex]

The following lists give a few facts that provide a little more insight into what the standard deviation tells us about the distribution of the data.

For ANY data set, no matter what the distribution of the data is:

  • At least [latex]75[/latex]% of the data is within two standard deviations of the mean.
  • At least [latex]89[/latex]% of the data is within three standard deviations of the mean.
  • At least [latex]95[/latex]% of the data is within [latex]4.5[/latex] standard deviations of the mean.
  • This is known as Chebyshev’s Rule.
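
The three percentages above are instances of the usual general statement of Chebyshev's inequality: at least [latex]1 - \frac{1}{k^2}[/latex] of the data lies within [latex]k[/latex] standard deviations of the mean, for any [latex]k > 1[/latex]. That general form is not stated in this section, so treat the sketch below as an illustration under that assumption. The function names and the skewed data set are invented for the example.

```python
import statistics


def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2


def fraction_within(data, k):
    """Observed fraction of values within k population standard deviations."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)


# Invented, strongly right-skewed data (not from the text).
skewed = [1, 1, 1, 2, 2, 3, 4, 5, 8, 40]

for k in (2, 3, 4.5):
    bound = chebyshev_lower_bound(k)
    observed = fraction_within(skewed, k)
    # The observed fraction should never fall below the bound.
    print(k, round(bound, 4), observed)
```

Even for data this lopsided, the observed fractions stay at or above the guaranteed minimums, which is the point of the rule: it makes no assumption about the shape of the distribution.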

For data having a distribution that is BELL-SHAPED and SYMMETRIC:

  • Approximately [latex]68[/latex]% of the data is within one standard deviation of the mean.
  • Approximately [latex]95[/latex]% of the data is within two standard deviations of the mean.
  • More than [latex]99[/latex]% of the data is within three standard deviations of the mean.
  • This is known as the Empirical Rule.
  • It is important to note that this rule only applies when the shape of the distribution of the data is bell-shaped and symmetric. We will learn more about this when studying the “Normal” or “Gaussian” probability distribution in later chapters.
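
As a rough check on these percentages, the sketch below computes the corresponding fractions for a Normal distribution, which is used here as the model of a bell-shaped, symmetric distribution. That modeling choice is an assumption of the example. It relies on NormalDist from Python's standard statistics module (Python 3.8 or later), and the function name normal_fraction_within is made up for illustration.

```python
from statistics import NormalDist


def normal_fraction_within(k):
    """Fraction of a Normal distribution within k standard deviations of its mean."""
    std_normal = NormalDist(mu=0, sigma=1)
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(normal_fraction_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

These values match the approximate figures of 68%, 95%, and more than 99% quoted in the list. Real data that is only roughly bell-shaped will give fractions close to these, not identical to them.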