Forming Connections in Comparing Quantitative Distributions: 3E – 15

objectives for this activity

During this activity, you will:


Decisions, Decisions, Decisions

A woman in a wheelchair holding a diploma in one hand and raising a graduation cap in the other.

Now that you’ve had a chance to practice using technology to create graphs and compare distributions of quantitative variables, let’s put it all together to see how histograms and dotplots can be used to compare distributions of a quantitative variable across groups.

To do so, we’ll consider median salary levels for recent college graduates. Before we get started, think for a moment about the reasons why a college student might choose a particular major. Some may choose a major based primarily on interests, and others choose a major based on its job prospects.

question 1

What is your major, or what major are you thinking about choosing? What do you think the job prospects are for your major?

video placement

[Intro: “What variables may play a role in a student’s choice of major? Maybe the percent of students who get a job in their major? Starting salary? Think about how we might be able to visualize data collected from students with different majors who answer these questions. In this activity, we’ll see how stacked histograms and side-by-side dotplots can be used to compare distributions of a quantitative variable, like median salary levels, across several groups, in this case college majors. You will be able to compare the center, shape, and spread of the quantitative variable across the groups using the graphical displays. Before we begin, let’s take a look at the dataset together. [display image of the dataset Salary Levels of College Majors as shown on the page below] These are a few lines from the dataset. You can see that each major category, like Business, contains several college majors, like Accounting, Actuarial science, Finance, and so on. And each of those majors has a median salary associated with it. What do you think the observational units are in this dataset? That is, on what entities are we collecting the information about the major category and the median salary? It may be tempting to put yourself in this picture and think the entities are college graduates who have received their first job. But the observational unit is not a person. You’ll give your answer to that question below.”]

In this activity, you will explore the distribution of median salary levels of college majors across different major categories for recent college graduates in 2011. For each college major in the dataset, the median salary and the major category are listed. For example, the Business major category includes college majors such as actuarial science, finance, and business economics. The major categories (Major_category) included in the complete dataset are: Agriculture & Natural Resources, Arts, Biology & Life Sciences, Business, Communications & Journalism, Computers & Mathematics, Education, Engineering, Health, Humanities & Liberal Arts, Industrial Arts & Consumer Services, Law & Public Policy, Physical Sciences, Psychology & Social Work, and Social Science.[1]

The following table displays a subset (a few rows) of the data.

Salary Levels of College Majors
Major | Major_category | Median_salary
ACCOUNTING | Business | 45000
ACTUARIAL SCIENCE | Business | 62000
FINANCE | Business | 47000
GENERAL BUSINESS | Business | 40000
HOSPITALITY MANAGEMENT | Business | 33000
MARKETING AND MARKETING RESEARCH | Business | 38000
MISCELLANEOUS BUSINESS AND MEDICAL ADMINISTRATION | Business | 40000
OPERATIONS LOGISTICS AND E-COMMERCE | Business | 50000
AEROSPACE ENGINEERING | Engineering | 60000
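
One way to picture the table above is as a list of records, one per row, with the three columns as fields. The Python sketch below uses only the nine rows shown (not the complete dataset) and counts how many rows fall in each major category. The variable names are illustrative assumptions, not part of the data analysis tool.

```python
from collections import Counter

# The nine rows shown in the table above: (Major, Major_category, Median_salary).
# This is only the displayed subset, not the complete dataset.
rows = [
    ("ACCOUNTING", "Business", 45000),
    ("ACTUARIAL SCIENCE", "Business", 62000),
    ("FINANCE", "Business", 47000),
    ("GENERAL BUSINESS", "Business", 40000),
    ("HOSPITALITY MANAGEMENT", "Business", 33000),
    ("MARKETING AND MARKETING RESEARCH", "Business", 38000),
    ("MISCELLANEOUS BUSINESS AND MEDICAL ADMINISTRATION", "Business", 40000),
    ("OPERATIONS LOGISTICS AND E-COMMERCE", "Business", 50000),
    ("AEROSPACE ENGINEERING", "Engineering", 60000),
]

# Count how many rows (records) belong to each major category.
rows_per_category = Counter(category for _, category, _ in rows)

for category, count in rows_per_category.items():
    print(f"{category}: {count} rows")
```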

recall

Recall from [Forming Connections: 1C] that we record information about variables of interest on each observational unit to form the dataset.


question 2

What are the observational units in the dataset above?

a) College students

b) College majors

c) Median salaries

d) College graduates

question 3

Which type of variable is median salary?

a) categorical
b) quantitative

question 4

Which type of variable is major category?

a) categorical
b) quantitative

Comparing Distributions Across Groups

Next, let’s go to the dataset in the data analysis tool and create side-by-side dotplots for all the median salaries for each of the major categories. We’ll start with a comparison of median salaries for just Business, Engineering, and Education.

Go to the Describing and Exploring Quantitative Variables tool at https://dcmathpathways.shinyapps.io/EDA_quantitative/. Note: the horizontal axis labels on the dotplot may display in scientific notation, for example “2e+04” (2×10^4) instead of 20,000.
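
An axis label written in scientific notation can be read by converting it back to an ordinary number: “2e+04” means 2×10^4, or 20,000. A minimal Python sketch of that conversion follows; the helper name `from_scientific` is hypothetical and is not part of the tool.

```python
def from_scientific(label: str) -> str:
    """Convert an axis label such as '2e+04' to a number with thousands separators."""
    value = float(label)       # float() understands scientific notation
    return f"{value:,.0f}"     # format with commas and no decimal places

print(from_scientific("2e+04"))  # prints 20,000
print(from_scientific("6e+04"))  # prints 60,000
```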

Step 1) Click the Several Groups tab at the top of the page.

Step 2) Select dataset “Recent Grads – Salary.”

Step 3) Select “Dotplot” and adjust the dot size appropriately.

This process will create a comparative dotplot of the median salaries for each of three major categories: Business, Engineering, and Education. Use the dotplot to approximate the typical median salary for majors in the Business major category, and the typical median salary for majors in the Engineering major category (we’ll look at the Education category a little later).
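
To see roughly how a dotplot is built, here is a text-only sketch in Python. It stacks one mark for each value among the Business rows of the table shown earlier. Because that table is only a subset of the data, the output will not match the tool's full dotplot. The function name `text_dotplot` is an assumption made for illustration.

```python
from collections import Counter

# Median salaries for the eight Business majors shown in the table (a subset only).
business = [45000, 62000, 47000, 40000, 33000, 38000, 40000, 50000]

def text_dotplot(values):
    """Return one line per distinct value, with one 'o' per occurrence of that value."""
    counts = Counter(values)
    return [f"{value:>6,} | {'o' * counts[value]}" for value in sorted(counts)]

for line in text_dotplot(business):
    print(line)
```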

For each of the plots, examine just the dotplot (not the descriptive statistics) to answer the questions below.

recall

What does a dot on a dotplot represent? In particular, what does each dot on the Business dotplot represent?


question 5

Which distribution (Business or Engineering) has the greater center?

question 6

Does either category (Business or Engineering) have any outliers in the distribution of median salaries? If so, identify the outlier(s).

question 7

Compare the shapes of just the distributions of median salaries for Business and Engineering. Which of these two major categories (Business or Engineering) has a distribution that is more symmetric?

question 8

Compare the spread in median salaries of the two distributions (Business and Engineering). Which major category has greater variability in median salaries?

Now let’s include Education majors in the comparison.

question 9

Which of the three major categories (Business, Engineering, or Education) has the least amount of spread in median salaries?

question 10

Compare the typical median salary of the three distributions of median salaries. Which of the three major categories (Business, Engineering, or Education) has the smallest typical value of median salary?

Now switch the view in the tool from dotplots to histograms.
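
A histogram groups values into intervals (bins) and shows how many values fall in each bin. The rough Python sketch below counts the Business values from the table shown earlier into bins. The bin start (30,000) and bin width (10,000) are assumptions chosen for illustration, and the tool may use different bins.

```python
# Median salaries for the eight Business majors shown in the table (a subset only).
business = [45000, 62000, 47000, 40000, 33000, 38000, 40000, 50000]

def histogram_counts(values, start, width, n_bins):
    """Count values in n_bins equal-width bins: [start, start + width), and so on."""
    counts = [0] * n_bins
    for v in values:
        index = (v - start) // width
        if 0 <= index < n_bins:   # ignore values outside the binned range
            counts[index] += 1
    return counts

counts = histogram_counts(business, start=30000, width=10000, n_bins=4)
for i, count in enumerate(counts):
    low = 30000 + i * 10000
    print(f"{low:,} to under {low + 10000:,}: {'#' * count}")
```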

question 11

What features of the distribution are easier to see with a dotplot?

question 12

What features of the distribution are easier to see with a histogram?

video placement

[insert a sub-summary: How are you doing so far? The types of questions you’ve been answering should feel familiar. We recently described distributions in histograms using shape, center, variability (spread), and outliers. The main difference is that now we are comparing the same variable for more than one group at a time. Displaying the groups side by side over the same scale makes it easy to make these quick comparisons.]

Now let’s look at the other major categories. Select the dataset “Recent Grads – Salary Many Majors.” This dataset includes the median salaries for recent graduates in Arts, Biology & Life Sciences, Computers & Mathematics, and Humanities & Liberal Arts.

question 13

Write a short paragraph comparing and contrasting the distributions of median salaries among the four major categories.

question 14

If you had to choose a major from among these four major categories based solely on median salary, which one would you choose and why?

video placement

[wrap-up: “What did you choose based solely on median salary in the last question? Of course, there are many other considerations that go into making a choice of major. You do want to have an interest in your field of choice and feel that you could persist in your future career! But answering questions like that helps you to realize how nicely the side-by-side graphical displays enable comparisons of a quantitative variable across groups. In the next part of the course, we’ll cover summary statistics for quantitative data. We’ll learn about numerical measures of center and spread, including the mean and median, and how they relate in differently shaped distributions. We’ll also learn how to use standard deviation as a measure of spread.”]


  1. American Community Survey 2010-2012 Public Use Microdata Series. (n.d.). College majors. GitHub. https://github.com/fivethirtyeight/data/tree/master/college-majors