What to Know About Interpreting the Mean and Median of a Dataset: 4C – 23

When examining the distribution of a quantitative variable using a histogram or a dotplot, we often find that the distribution follows a bell shape, with a mound of observations in the middle and roughly equal amounts of data falling to the right and left. But sometimes a distribution’s values are bunched up on one side, with a few observations stretching far out to the other side. You may recall from What to Know About Applications of Histograms: 3D that we use specialized statistical terms for these distribution shapes: skewness and symmetry. In this section, you’ll learn how the mean of the data relates to the median under each of these shapes.

Using Skew to Describe Datasets

Recall that we say a quantitative variable has a right-skewed distribution or a positive skew if there is a “tail” of infrequent values on the right (upper) end of the distribution. We say a dataset has an approximately symmetric distribution if values are similarly distributed on either side of the mean/median. We say a dataset has a left-skewed distribution or a negative skew if there is a “tail” of infrequent values on the left (lower) end of the distribution.
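To make the three shapes concrete, here is a minimal Python sketch that prints a sideways text histogram for each one. The three datasets and the helper name `text_histogram` are hypothetical, invented only for illustration. Values increase from top to bottom, so a right (upper) tail shows up as a few short rows at the bottom and a left (lower) tail as a few short rows at the top.

```python
from collections import Counter

# Hypothetical datasets, invented purely to illustrate the three shapes.
right_skewed = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 6, 9]
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
left_skewed = [1, 4, 6, 7, 7, 8, 8, 8, 9, 9, 9, 9]


def text_histogram(values):
    """Return one line per integer from min to max, with one '*' per observation."""
    counts = Counter(values)
    lines = []
    for value in range(min(values), max(values) + 1):
        lines.append(f"{value:>2} | {'*' * counts[value]}")
    return "\n".join(lines)


for name, data in [
    ("right-skewed", right_skewed),
    ("roughly symmetric", symmetric),
    ("left-skewed", left_skewed),
]:
    print(name)
    print(text_histogram(data))
    print()
```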

skewed distributions

[Interactive animation: a simple dataset that moves from right skew to symmetry to left skew as students drag a slider. The label changes with the slider position: right skew / roughly symmetric / left skew.]

In the next activity, you’ll need to calculate and interpret the mean and median in skewed distributions. Let’s get some practice with these skills using viewership data for the TV show Friends.

Interpreting Mean and Median

Friends was a popular American television show that aired from 1994 to 2004. The show followed a group of six friends living in New York City and chronicled their relationships and day-to-day adventures. The show became known in popular culture for its comedy and for the closeness of its cast.[1]

The following table lists the number of U.S. viewers of each episode of the 10th and final season of Friends.[2]

Friends Final Season Viewers by Episode

| Episode Number | Episode Title | Air Date | U.S. Viewers (Millions) |
| --- | --- | --- | --- |
| 1 | The One After Joey and Rachel Kiss | 9/25/03 | 24.54 |
| 2 | The One Where Ross Is Fine | 10/2/03 | 22.38 |
| 3 | The One with Ross’s Tan | 10/9/03 | 21.87 |
| 4 | The One with the Cake | 10/23/03 | 18.77 |
| 5 | The One Where Rachel’s Sister Babysits | 10/30/03 | 19.37 |
| 6 | The One with Ross’s Grant | 11/6/03 | 20.38 |
| 7 | The One with the Home Study | 11/13/03 | 20.21 |
| 8 | The One with the Late Thanksgiving | 11/20/03 | 20.66 |
| 9 | The One with the Birth Mother | 1/8/04 | 25.49 |
| 10 | The One Where Chandler Gets Caught | 1/15/04 | 26.68 |
| 11 | The One Where the Stripper Cries | 2/5/04 | 24.91 |
| 12 | The One with Phoebe’s Wedding | 2/12/04 | 25.9 |
| 13 | The One Where Joey Speaks French | 2/19/04 | 24.27 |
| 14 | The One with Princess Consuela | 2/26/04 | 22.83 |
| 15 | The One Where Estelle Dies | 4/22/04 | 22.64 |
| 16 | The One with Rachel’s Going Away Party | 4/29/04 | 24.51 |
| 17 | The Last One* | 5/6/04 | 52.46 |
| 18 | The Last One* | 5/6/04 | 52.46 |

*Note: The final two episodes aired back-to-back on the same night.

We’ll use technology to analyze this dataset.

Go to the Describing and Exploring Quantitative Variables tool at https://dcmathpathways.shinyapps.io/EDA_quantitative/.

Step 1) Select the Single Group tab.

Step 2) Locate the drop-down menu under Enter Data and select Your Own.

Step 3) Under Do you have, select Individual Observations.

Step 4) Under Name of Variable, type “U.S. Viewers (Millions)” (without the quotation marks).

Step 5) Copy and paste or enter the values from the U.S. Viewers (Millions) column of the table above.
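If you would like to cross-check the tool, the same summary statistics can be computed in a few lines of Python. This is only a sketch using the standard-library `statistics` module, not a description of how the tool itself works; the variable name `viewers` is our own, and the values are typed in from the table above.

```python
import statistics

# U.S. viewers (millions) for episodes 1-18, copied from the table above.
viewers = [
    24.54, 22.38, 21.87, 18.77, 19.37, 20.38, 20.21, 20.66, 25.49,
    26.68, 24.91, 25.90, 24.27, 22.83, 22.64, 24.51, 52.46, 52.46,
]

# The table has 18 rows, so any other count signals a copy error.
print("Number of observations:", len(viewers))
print("Median:", round(statistics.median(viewers), 2))
print("Mean:", round(statistics.mean(viewers), 2))
```

Checking the number of observations first serves the same purpose as scrolling through the tool's entry box: it confirms that every value made it in before you trust the summaries.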

question 1

Use the tool to calculate the median episode viewership for Season 10 of Friends. You can scroll in the observations entry box to verify that you pasted the data correctly.

question 2

Which of the following does the median tell you about the number of people who watched episodes of Friends during Season 10?

a) Half the episodes in Season 10 of Friends had more than 23.5 million viewers, and half the episodes had fewer than 23.5 million viewers.

b) The most common episode viewership was 23.5 million viewers per episode during Season 10.

c) If we took the total number of viewers for the whole season and split them equally among all 18 episodes, each episode would have about 23.5 million viewers.

question 3

Use the tool to calculate the mean episode viewership for Season 10 of Friends.

question 4

Which of the following does the mean tell you about the number of people watching episodes of Friends during Season 10?

a) Half the episodes in Season 10 of Friends had more than 26.1 million viewers, and half the episodes had fewer than 26.1 million viewers.

b) The most common episode viewership was 26.1 million viewers per episode during Season 10.

c) If we took the total number of viewers for the whole season and split them equally among all 18 episodes, each episode would have about 26.1 million viewers.

question 5

The mean number of viewers is _______ the median number of viewers.

a) greater than

b) less than

c) roughly equal to

For Questions 6 and 7, use the following histogram of the Season 10 Friends viewership data.

A histogram labeled "US Viewers (Millions)" on the x-axis and "Count" on the y-axis. The x-axis is numbered in increments of five from 15 to 55 and the y-axis is numbered in increments of 1 from 0 to 4. For 18-19, the count is 1. For 19-20, the count is 1. For 20-21, the count is 3. For 21-22, the count is 1. For 22-23, the count is 3. For 24-25, the count is 4. For 25-26, the count is 2. For 26-27, the count is 1. For 52-53, the count is 2. For all other ranges, the count is 0.
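The bin counts in this histogram can be reproduced from the table with a short Python sketch. It assumes each bin is 1 million viewers wide and includes its left edge (so 24.54 lands in the 24-25 bin); the names `viewers` and `bin_counts` are our own.

```python
import math
from collections import Counter

# U.S. viewers (millions) for episodes 1-18, copied from the table above.
viewers = [
    24.54, 22.38, 21.87, 18.77, 19.37, 20.38, 20.21, 20.66, 25.49,
    26.68, 24.91, 25.90, 24.27, 22.83, 22.64, 24.51, 52.46, 52.46,
]

# Assumption: bins are 1 million wide and include their left edge,
# so rounding each value down gives the left edge of its bin.
bin_counts = Counter(math.floor(v) for v in viewers)

# Print every bin from the lowest to the highest, including empty ones,
# so that gaps in the data stay visible.
for left_edge in range(min(bin_counts), max(bin_counts) + 1):
    print(f"{left_edge}-{left_edge + 1}: {'#' * bin_counts[left_edge]}")
```

Printing the empty bins as well keeps the long stretch with no episodes visible, just as it is in the plotted histogram.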

question 6

Which of the following describes the distribution of the data?

a) Left-skewed

b) Symmetric

c) Right-skewed

question 7

Use what you see on the histogram to justify your answer to Question 5.

question 8

Which episodes have unusually high numbers of viewers?

question 9

The last two episodes of Friends aired in a row on the same night. Why do you think these episodes have such high numbers of viewers?

effects of skew on mean and median

[Perspective video — a 3-instructor video that shows how to think about the tail and the two outliers in the data above together with the fact that the mean is larger than the median to begin to understand that the mean tends to be pulled to the right of the median under a right skew.] 

Relating Mean and Median to the Skewness of a Dataset from a Histogram

For each of the plots below, choose the description that matches the shape of the data’s distribution, and then select the relationship between the mean and median for those data. Base your answers on what you learned in Questions 1 – 9 about the direction in which skew pulls the mean.

question 10

An unlabeled bar graph with seven bars. The bar on the far left is the highest. Moving to the right, each bar is progressively shorter than the last. In most places, this is by approximately the same amount, but it is a larger difference between the third and fourth bars.

Is the distribution Left-skewed, Symmetric, or Right-skewed?

Is the mean greater than, less than, or roughly equal to the median?

question 11

An unlabeled bar graph with seven bars. The bar on the far right is the highest. Moving from the left, each bar is progressively taller than the last. In most places, this is by approximately the same amount, but it is a larger difference between the fourth and fifth bars.

Is the distribution Left-skewed, Symmetric, or Right-skewed?

Is the mean greater than, less than, or roughly equal to the median?

question 12

An unlabeled bar graph. The bar in the center is the highest, and moving away from it in either direction, the bars get shorter by equal increments.

Is the distribution Left-skewed, Symmetric, or Right-skewed?

Is the mean greater than, less than, or roughly equal to the median?

Resistant and Nonresistant Measures of Center

[Worked example – a 3-instructor video showing a symmetric dataset with the mean and median identical, then, skewing the distribution to show what happens to the mean while the median remains in place.]

question 13

Look back at your answers to Questions 10, 11, and 12. Which measure of center, the mean or the median, appeared to be resistant to skew? That is, which of the two is not strongly affected by the skewness of a distribution?

a) The mean is resistant to skew. The median is sensitive to skew and/or the presence of outliers.

b) The median is resistant to skew. The mean is sensitive to skew and/or the presence of outliers.

You may have noticed that when a distribution is symmetric, the mean and median are roughly equal. Under a skew, however, the mean is “pulled” in the direction of the tail and any outliers: it is greater than the median under a positive (right) skew and less than the median under a negative (left) skew. The mean is sensitive to outliers, while the median is largely unaffected by them.
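One way to see this resistance in the Friends data is to recompute both measures with the two finale values set aside. The sketch below is an illustration only; the cutoff of 50 million is our own choice, used simply to separate the two 52.46 values from the rest, and the variable names are hypothetical.

```python
import statistics

# U.S. viewers (millions) for episodes 1-18, copied from the table above.
viewers = [
    24.54, 22.38, 21.87, 18.77, 19.37, 20.38, 20.21, 20.66, 25.49,
    26.68, 24.91, 25.90, 24.27, 22.83, 22.64, 24.51, 52.46, 52.46,
]

# Set aside the two finale values (52.46) to see how each measure responds.
# The cutoff of 50 is an assumption chosen only to isolate those two values.
without_finale = [v for v in viewers if v < 50]

for label, data in [("All 18 episodes", viewers), ("Without the finale", without_finale)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{label}: mean = {mean:.2f}, median = {median:.2f}")
```

Run on the table's values, this should show the mean falling by more than 3 million viewers once the finale is set aside, while the median shifts by less than 1 million.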

Looking ahead

Broadly speaking, we consider a value in a dataset to be an outlier if that value is unusual or extreme, given the other values in the dataset.

Suppose you have two groups of people:

      • Group 1 is made up of five professional basketball players. Group 2 is made up of four professional basketball players and one kindergartener.
      • Dataset 1 contains the number of three-pointers each person in Group 1 can make in one minute. Dataset 2 contains the number of three-pointers each person in Group 2 can make in one minute.

question 14

Which dataset do you think is more likely to contain an outlier?

a) Dataset 1

b) Dataset 2

Summary

In this section, you’ve learned about skewed distributions vs. symmetric distributions and how skew affects the mean of a data distribution. You also got some practice calculating and interpreting the mean and median of a dataset. Let’s summarize where these skills showed up in the material.

      • In Question 1, you calculated the median of a dataset, and in Question 2, you interpreted it.
      • In Question 3, you calculated the mean of a dataset, and in Question 4, you interpreted it.
      • In Question 5, you began to see how the mean and median relate in a distribution.
      • In Questions 6 and 10 – 13, you used statistical terms for skew and extreme values to describe the features of a dataset, and began to make connections between the mean and median under differently shaped distributions.
      • In Questions 7 – 9, you interpreted the mean and median to make connections between them and the data distribution.
      • In Question 13, you identified which of the mean or median is resistant to skew.

Being able to interpret the mean and median in light of a distribution’s shape and the presence of outliers is an essential skill for assessing claims about data that rely on measures of center. If you feel comfortable with these skills, please move on to the activity!


  1. Encyclopedia Britannica. (n.d.). Friends. In Encyclopedia Britannica.com. https://www.britannica.com/topic/Friends
  2. Mock, T. (2020). A weekly data project aimed at the R ecosystem. TidyTuesday. https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-09-08/readme.md#friends_infocsv