Appropriate Measures of Center
In the previous example, we saw how the mean was not an accurate representation of the typical salary for a Texas NBA player due to the existence of outliers. Now, let’s take a look at other situations to determine whether it would be more appropriate to use the mean or median to describe a typical observation.
Consider the distribution of three different sets of data:
- Income in New York City
- GPA at a local college
- Body temperature
Situation 1: Data are collected on the income of residents in New York City.
Try It 8
Situation 2: Data are collected on the GPAs of students enrolled at a local college.
Try It 9
Situation 3: Data are collected on people’s body temperatures.
Try It 10
These examples illustrate some general guidelines for choosing numerical summaries:
- Use the mean and the standard deviation as measures of center and spread only for distributions that are reasonably symmetric with a central peak. When outliers or skew are present, the mean and standard deviation are not a good choice because both are pulled toward the extreme values.
- Use the median, the range, and the IQR for all other cases.
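The guidelines above can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the helper `summarize` and both data sets are hypothetical, invented only for illustration (they are not the data behind the three situations). It computes both sets of summaries so they can be compared side by side.

```python
import statistics


def summarize(values):
    """Return both sets of summaries for a list of numbers (hypothetical helper)."""
    # quantiles(..., n=4) returns the three quartile cut points: Q1, Q2, Q3
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),      # sample standard deviation
        "median": statistics.median(values),
        "iqr": q3 - q1,
        "range": max(values) - min(values),
    }


# Made-up body temperatures in degrees Fahrenheit: roughly symmetric
temps = [97.6, 98.0, 98.2, 98.4, 98.6, 98.8, 99.0, 99.2, 99.6]

# Made-up incomes in thousands of dollars: one extreme value on the right
incomes = [28, 35, 41, 47, 52, 58, 66, 75, 950]

for name, data in [("temps", temps), ("incomes", incomes)]:
    print(name, summarize(data))
```

For the symmetric data, the mean and median nearly coincide, so either describes a typical value. For the skewed data, the single extreme income pulls the mean far above the median, which is why the median and IQR are the safer summaries there.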
The three situations above also highlight another important principle: Always plot the data.
We need to see the distribution to determine its shape. From the shape, we can decide which measures of center and spread best describe the data.
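As a rough illustration of plotting before summarizing, the sketch below draws a text histogram with only the standard library. The function names `bin_counts` and `text_histogram` and the income values are assumptions made for this example; in practice a graphing tool or statistical software would be used instead.

```python
from collections import Counter


def bin_counts(values, bin_width):
    """Count integer values per bin, keeping empty bins so gaps stay visible."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    first, last = min(counts), max(counts)
    # A Counter returns 0 for bins that received no values
    return {start: counts[start]
            for start in range(first, last + bin_width, bin_width)}


def text_histogram(values, bin_width):
    """Render one row of asterisks per bin, labeled by the bin's lower edge."""
    rows = bin_counts(values, bin_width).items()
    return "\n".join(f"{start:>4} | {'*' * n}" for start, n in rows)


# Made-up incomes in thousands of dollars (hypothetical data)
incomes_k = [22, 28, 31, 35, 38, 41, 44, 47, 52, 58, 66, 75, 140]

print(text_histogram(incomes_k, bin_width=20))
```

Most rows cluster at the low end, followed by several empty bins and one isolated value far to the right. That long right tail is the visual cue that the median and IQR will describe these data better than the mean and standard deviation.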
video placement
[Wrap-up: Provide a transition from these particular examples to larger situations in which a quantitative variable would tend to be skewed or symmetric: if the data would tend toward a bunched-up group of values but contain some extreme values, what would the shape of the distribution look like? If data were distributed on the graph “as though it had fallen through a funnel onto a plane,” what would it look like? Then show and discuss the simulation at https://dcmathpathways.shinyapps.io/MeanvsMedian/. Finally, show some distributions and ask viewers to predict the relationship between mean and median.]