Comparing Quantitative Distributions: Apply It 1

Learning Goals

Deepen your understanding and form connections within these skills:

  • Summarize a comparison of quantitative distributions across groups.

Decisions, Decisions, Decisions

A woman in a wheelchair holding a diploma in one hand and raising a graduation cap in the other.

Now that you’ve had a chance to practice using technology to create graphs and compare distributions of quantitative variables, let’s put it all together to see how histograms and dotplots can be used to compare distributions of a quantitative variable across groups.

To do so, we’ll consider median salary levels for recent college graduates. Before we get started, think for a moment about the reasons why a college student might choose a particular major. Some may choose a major based primarily on interests, and others choose a major based on its job prospects.

question 1

video placement

[Intro: “What variables may play a role in a student’s choice of major? Maybe the percentage of students who get a job in their major? Starting salary? Think about how we might be able to visualize data collected from students with different majors who answer these questions. In this activity, we’ll see how stacked histograms and side-by-side dotplots can be used to compare distributions of a quantitative variable like median salary levels across several groups, in this case, college major categories. You will be able to compare the center, shape, and spread of the quantitative variable across the groups using the graphical displays. Before we begin, let’s take a look at the data set together. [display image of the data set Salary Levels of College Majors as shown in the page below] These are a few lines from the data set. You can see that each major category, like Business, contains several college majors, like Accounting, Actuarial science, Finance, and so on. And each of those majors has a median salary associated with it. What do you think the observational units are in this data set? That is, on what entities are we collecting the information about the major category and the median salary? It may be tempting to put yourself in this picture and think the entities are college graduates who have received their first job. But the observational unit is not a person. You’ll give your answer to that question below.”]

In this activity, you will explore the distribution of median salary levels of college majors across different major categories for recent college graduates in 2011. For each college major in the table, the median salary and major category are listed. For example, the Business major category includes college majors such as actuarial science, finance, and business economics. The major categories (Major_category) included in the complete data set are: Agriculture & Natural Resources, Arts, Biology & Life Sciences, Business, Communications & Journalism, Computers & Mathematics, Education, Engineering, Health, Humanities & Liberal Arts, Industrial Arts & Consumer Services, Law & Public Policy, Physical Sciences, Psychology & Social Work, and Social Science.[1]

The following table displays a subset (a few rows) of the data.

Salary Levels of College Majors
Major | Major_category | Median_salary
ACCOUNTING | Business | [latex]45000[/latex]
ACTUARIAL SCIENCE | Business | [latex]62000[/latex]
FINANCE | Business | [latex]47000[/latex]
GENERAL BUSINESS | Business | [latex]40000[/latex]
HOSPITALITY MANAGEMENT | Business | [latex]33000[/latex]
MARKETING AND MARKETING RESEARCH | Business | [latex]38000[/latex]
MISCELLANEOUS BUSINESS AND MEDICAL ADMINISTRATION | Business | [latex]40000[/latex]
OPERATIONS LOGISTICS AND E-COMMERCE | Business | [latex]50000[/latex]
AEROSPACE ENGINEERING | Engineering | [latex]60000[/latex]
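
If you would like to see how a table like this can be organized for a comparison across groups, here is a minimal Python sketch. It uses only the nine rows shown above, not the complete data set, and the names ROWS and salaries_by_category are made up for this illustration. It simply collects the Median_salary values under each Major_category.

```python
from collections import defaultdict

# Rows copied from the subset table above: (Major, Major_category, Median_salary).
# This is only the small subset shown on the page, not the complete data set.
ROWS = [
    ("ACCOUNTING", "Business", 45000),
    ("ACTUARIAL SCIENCE", "Business", 62000),
    ("FINANCE", "Business", 47000),
    ("GENERAL BUSINESS", "Business", 40000),
    ("HOSPITALITY MANAGEMENT", "Business", 33000),
    ("MARKETING AND MARKETING RESEARCH", "Business", 38000),
    ("MISCELLANEOUS BUSINESS AND MEDICAL ADMINISTRATION", "Business", 40000),
    ("OPERATIONS LOGISTICS AND E-COMMERCE", "Business", 50000),
    ("AEROSPACE ENGINEERING", "Engineering", 60000),
]


def salaries_by_category(rows):
    """Collect the Median_salary values (quantitative) under each Major_category (group)."""
    groups = defaultdict(list)
    for _major, category, salary in rows:
        groups[category].append(salary)
    return dict(groups)


groups = salaries_by_category(ROWS)
for category, salaries in groups.items():
    # One line per group: group name, number of rows, sorted salary values.
    print(category, len(salaries), sorted(salaries))
```

Each list in the result holds one value per row of the table, which is the form a side-by-side graph needs: one set of values per group, all measured on the same scale.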

recall

Recall from [Forming Connections: 1C] that we record information about variables of interest on each observational unit to form the data set.

Core skill:

question 2

question 3

question 4

Comparing Distributions Across Groups

Next, let’s go to the data set in the data analysis tool and create side-by-side dotplots for all the median salaries for each of the major categories. We’ll start with a comparison of median salaries for just Business, Engineering, and Education.

Go to the Describing and Exploring Quantitative Variables tool at https://dcmathpathways.shinyapps.io/EDA_quantitative/. Note: the horizontal axis labels on the dotplot may display in scientific notation, for example “2e+04” (2×10^4) instead of 20,000.
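
If an axis label such as “2e+04” is hard to read, the short Python sketch below shows how to translate it into an ordinary number. The function name plain_label is made up for this illustration, and the labels other than “2e+04” are assumed examples rather than labels taken from the tool.

```python
def plain_label(tick_label):
    """Convert a scientific-notation label such as '2e+04' into '20,000'."""
    value = float(tick_label)   # '2e+04' means 2 x 10^4
    return f"{value:,.0f}"      # thousands separators, no decimal places


# '2e+04' comes from the note above; the other labels are assumed examples.
for label in ["2e+04", "4e+04", "6e+04"]:
    print(label, "=", plain_label(label))
```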

Step 1) Click the Several Groups tab at the top of the page.

Step 2) Select data set “Recent Grads – Salary.”

Step 3) Select “Dotplot” and adjust the dot size appropriately.

This process will create a comparative dotplot of the median salaries for each major category: Business, Engineering, and Education. Use the dotplot to approximate the typical median salary for majors in the Business major category, and the typical median salary for majors in the Engineering major category (we’ll look at the Education category a little later).
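
To see the idea behind a comparative dotplot outside the tool, here is a rough text-based sketch in Python. It is not the tool's own method. It uses only the salaries from the small subset table earlier on this page, so it will not match what the tool displays for the full data set, and the names SUBSET and text_dotplot are made up for this illustration. The key point is that every group is drawn over the same axis range.

```python
from collections import Counter

# Median salaries copied from the subset table earlier on this page
# (a few rows only, not the full data set used by the tool).
SUBSET = {
    "Business": [45000, 62000, 47000, 40000, 33000, 38000, 40000, 50000],
    "Engineering": [60000],
}


def text_dotplot(values, low, high, step):
    """Return the rows of a text dotplot, top row first, one 'o' per value.

    Each value is rounded to the nearest multiple of `step` (halves round up),
    and values sharing a position are stacked. Values outside [low, high]
    are not drawn.
    """
    positions = list(range(low, high + step, step))
    counts = Counter((v + step // 2) // step * step for v in values)
    tallest = max(counts.values())
    rows = []
    for level in range(tallest, 0, -1):
        rows.append("".join("o" if counts.get(p, 0) >= level else " " for p in positions))
    return rows


# The same axis settings are used for every group so the plots are comparable.
LOW, HIGH, STEP = 30000, 65000, 1000
for category, salaries in SUBSET.items():
    print(category)
    for row in text_dotplot(salaries, LOW, HIGH, STEP):
        print(row)
    print("-" * len(range(LOW, HIGH + STEP, STEP)))
print(f"axis runs from {LOW:,} to {HIGH:,} in steps of {STEP:,}")
```

Because both groups share one axis, you can compare where the dots cluster by looking straight up and down the page.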

For each of the plots, examine just the dotplot (not the descriptive statistics) to answer the questions below.

recall

What does a dot on a dotplot represent? In particular, what does each dot on the Business dotplot represent?

Core skill:

question 5

question 6

question 7

question 8

Now let’s include Education majors in the comparison.

question 9

question 10

Now switch the view in the tool from dotplots to histograms.
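
A histogram replaces individual dots with counts of values that fall into equal-width intervals (bins). The Python sketch below illustrates that counting step using only the eight Business salaries from the subset table earlier on this page. The bin width of 10,000 and the range of 30,000 to 70,000 are assumptions chosen for this illustration; the tool may use different bins, and the function name histogram_counts is made up.

```python
def histogram_counts(values, low, high, width):
    """Count values in the bins [low, low+width), [low+width, low+2*width), ... below high."""
    n_bins = (high - low) // width
    counts = [0] * n_bins
    for v in values:
        if low <= v < high:  # values outside the range are ignored
            counts[(v - low) // width] += 1
    return counts


# Business median salaries from the subset table (not the full data set).
business = [45000, 62000, 47000, 40000, 33000, 38000, 40000, 50000]

# Assumed bin settings for illustration only.
LOW, HIGH, WIDTH = 30000, 70000, 10000
counts = histogram_counts(business, LOW, HIGH, WIDTH)
for i, count in enumerate(counts):
    left = LOW + i * WIDTH
    print(f"{left:>6,} to {left + WIDTH:>6,} | " + "#" * count)
```

Changing WIDTH changes how much detail the bars show, which is why the same data can look somewhat different in a histogram than in a dotplot.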

question 11

question 12

video placement

 [insert a sub-summary: How are you doing so far? The types of questions you’ve been answering should feel familiar. We recently described distributions in histograms using shape, center, variability (spread) and outliers. The main difference is that now we are comparing the same variable for more than one group at a time. Displaying the groups side by side over the same scale makes it easy to make these quick comparisons. ]

Now let’s look at the other major categories. Select the data set “Recent Grads – Salary Many Majors.” This data set includes the median salaries for recent graduates in Arts, Biology & Life Sciences, Computers & Mathematics, and Humanities & Liberal Arts.

question 13

question 14

video placement

[wrap-up: “What did you choose based solely on median salary in the last question? Of course, there are many other considerations that go into making a choice of major. You do want to have an interest in your field of choice and feel that you could persist in your future career! But answering questions like that helps you to realize how nicely the side-by-side graphical displays enable comparisons of a quantitative variable across groups. In the next part of the course, we’ll cover summary statistics for quantitative data. We’ll learn about numerical measures for spread, mean and median, and how they relate in differently shaped distributions. We’ll also learn how to use standard deviation as a measure of spread.”]


  1. American Community Survey 2010-2012 Public Use Microdata Series. (n.d.). College majors. GitHub. https://github.com/fivethirtyeight/data/tree/master/college-majors.