Forming Connections in Interpreting the Mean and Median of a Dataset: 4C – 24

objectives for this activity

During this activity, you will:

Identify misleading claims made using means
Given characteristics of a distribution including skew and outliers, identify under which conditions it is appropriate to use the mean as a measure of center.

Click on a skill above to jump to its location in this activity.

There are two LOs identified in DC and present in the content:
1) Identify misleading claims made using means <——————Add this one and an h3 tag that jumps to it in content.
2) Suggest the most appropriate measure of center to use in different situations

Is It Worth It?

Consider this scenario. A college basketball player is skilled enough to make an NBA roster and is thinking about dropping out of college this year.

question 1

From a financial perspective, would you encourage this player to drop out of college? Explain.

In this activity, you’ll use a distribution of professional basketball salaries to see that medians are resistant to influence from skew and outliers, while means are not. Importantly, means, in certain circumstances, can be misleading.

recall

Before beginning this activity, take a moment to recall the meanings of the terms left-skewed, right-skewed, symmetric, and outlier. You’ll need to be able to use those terms to describe features of a dataset.

Core skill:

Define skew and outlier

video placement

[Intro: Starting from a sentence or two discussing Question 1, remind students that they have recently been working to calculate and interpret the mean and median of a dataset. That is, the median is the value that splits the data in half, with half the observations above the mean and half below, regardless of the presence of skew or outliers. The median is fixed. But the mean is not; it gets pulled to the left or right of the mean under the presence of skew or outliers. The mean is sensitive to extreme values. So when we see that the mean is higher than the median, we say that it has been “pulled to the right,” and we understand the quantitative variable is skewed right. Likewise, if the mean is smaller, we’ll say it’s been “pulled to the left,” and we understand the quantitative variable is skewed left. If the mean and median are similar, though, we understand that the distribution is symmetric. In this activity, we’ll use a distribution of professional basketball salaries to explore how skew arises in a quantitative variable and why we must be careful to consider all the characteristics of a quantitative variable’s distribution before deciding if the mean or median would be more responsible to use as a measure of a “typical” value. ]

Below is a dotplot of NBA salaries^[1] for Texas players in the 2017–2018 season:

question 2

Describe the shape of the distribution in the dotplot. Comment on any visible skew or outliers.

Hint

question 3

Using the dotplot, make an estimate of the “typical” salary. Explain the reasoning for your estimate.

Hint

Identify misleading claims made using means

In fact, the median salary among Texas NBA players was $1,577,320. The mean salary was $5,262,279. Use this information to complete Questions 4-6.

question 4

question 5

There are 61 players in this dataset. Fill in the table below with your best estimates of the percentage of players who have salaries above the mean and the percentage of salaries above the median? Your answers do not have to be exact. Just use your own reasoning to estimate these.

Percentage of Salaries Above the Mean	Percentage of Salaries Above the Median

Hint

question 6

Explain briefly, in your own words, why the answers for the mean and median are different in Question 5.

Hint

video placement

[Guidance: “Consider your answers to Questions 3 – 6. [voice over images of the dotplot with the vertical lines drawn] What did you consider to be a “typical” salary? What characteristic of this variable’s distribution caused the mean to be different from the median?”]

Now consider the following scenario. An NBA recruiter for the Houston Rockets approaches a promising college basketball player and says, “the typical salary among Texas NBA players is $5,262,279.”

question 7

Is the statement made by the recruiter misleading? Why or why not?

Hint

video placement

[insert a sub-summary here. “How did you your answer the question, “is the recruiter’s statement misleading?” Did you consider the mean to be a “typical” salary among these NBA players? What could the recruiter have said instead? That it is likely a player would make $5.3 million by joining the team? That is is possible for some highly skilled and talented players? Or would it have been less misleading for the recruiter to have emphasized the median salary of $1.58 million? If you were in the prospective player’s position, would you have asked to see the distribution to make your own assessment? Which value would you have used, mean or median, if you were in the recruiter’s position?”]

You’ve seen that the mean, under certain conditions, can be a misleading indicator of a “typical” observation value, such as the salary of a professional basketball player. Now try to apply this understanding to some other types of data collections.

Identify under which conditions it is appropriate to use the mean as a measure of center

Three situations are given below in which data is collected on a quantitative variable. For each, visualize what the distribution might look like and make predictions about the shape of the distribution (skewed or symmetric?), the relationship between the mean and median (will they be similar or will the mean be smaller or greater than the median?), and whether or not it would be appropriate to use the mean to represent a “typical” observation. Use what you learned about resistance in the previous section, What to Know About Interpreting the Mean and Median of a Dataset: 4C, to guide you.

Situation 1: Data are collected on incomes in New York City.

Hint

question 8

What do you think the shape of the dataset’s distribution will be?

a) The distribution will be skewed right.
b) The distribution will be roughly symmetric
c) The distribution will be skewed left.

question 9

Will the mean be higher or lower than the median, or will they be similar?

a) The mean will be higher than median.
b) The mean will be lower than median.
c) The mean and median will be similar.

question 10

Considering the shape of the data, will it be appropriate to use the mean as a measure of center (representing a “typical” data value)?

a) Yes, the mean will be appropriate as measure of center.
b) No, the mean will be misleading as a measure of center; use the median instead.

Situation 2: Data are collected on GPAs at a local college.

Hint

question 11

What do you think the shape of the dataset’s distribution will be?

a) The distribution will be skewed right.
b) The distribution will be roughly symmetric
c) The distribution will be skewed left.

question 12

Will the mean be higher or lower than the median, or will they be similar?

a) The mean will be higher than median.
b) The mean will be lower than median.
c) The mean and median will be similar.

question 13

Considering the shape of the data, will it be appropriate to use the mean as a measure of center (representing a “typical” data value)?

a) Yes, the mean will be appropriate as measure of center.
b) No, the mean will be misleading as a measure of center; use the median instead.

Situation 3: Data are collected on peoples’ body temperatures.

Hint

question 14

What do you think the shape of the dataset’s distribution will be?

a) The distribution will be skewed right.
b) The distribution will be roughly symmetric
c) The distribution will be skewed left.

question 15

Will the mean be higher or lower than the median, or will they be similar?

a) The mean will be higher than median.
b) The mean will be lower than median.
c) The mean and median will be similar.

question 16

Considering the shape of the data, will it be appropriate to use the mean as a measure of center (representing a “typical” data value)?

a) Yes, the mean will be appropriate as measure of center.
b) No, the mean will be misleading as a measure of center; use the median instead.

video placement

[Wrap-up: Provide a transition from these particular examples to larger situations in which a quantitative variable would tend to be skewed or symmetric: if the data would tend toward a bunched-up group of values but contain some extreme values, what would the shape of the distribution look like? If data were distributed on the graph “as though it had fallen through a funnel onto a plane” what would it look like? Then show and discuss the simulation at https://dcmathpathways.shinyapps.io/MeanvsMedian/ .Finally, show some distributions and ask viewers to predict the relationship between mean and median. ]

NBA player salary dataset (2017-2018). (2018) Kaggle. Retrieved from https://www.kaggle.com/koki25ando/salary ↵

Alpha Module 2: Exploring Quantitative Variables Using Graphical Displays