Corequisite Support Activity for the Five-Number Summary in Boxplots and Datasets: 4D – 25

What you’ll need to know

In this support activity, you’ll become familiar with the five-number summary, the interquartile range, and a method for identifying outliers.

The upcoming section and the activity that follows it will introduce the boxplot, a new type of graph for displaying quantitative data. The image below shows a boxplot labeled with the five-number summary and the interquartile range. We’ll explore boxplots in detail soon. This support activity focuses on helping you become familiar with the characteristics of a boxplot: the minimum and maximum values, the median, the first quartile, the third quartile, and the interquartile range.

Figure: A general horizontal boxplot labeled, from left to right, with lower outliers, the minimum, Q1, the median, Q3, the maximum, and upper outliers. The interquartile range (IQR) is shown at the top of the boxplot.

A boxplot is a graphical display of a quantitative variable that shows the median, spread, skew, and outliers by illustrating a set of numbers called the five-number summary. In the next section of the course material, you will need to relate the features of a boxplot to the dataset it comes from. In the activity that follows, you will need to interpret and compare boxplots. This corequisite support activity is a chance to begin familiarizing yourself with boxplots: you’ll build an understanding of the parts of the five-number summary and learn how to determine whether a data value is “unusual enough” to qualify as an outlier.

To introduce this new quantitative graph, we’ll use a dataset that contains the gross domestic product per capita for the 10 most populous countries.

GDP of the World’s Most Populous Countries

The following table lists data for the 10 most populous countries in 2018. For each country, it gives the population rank (China had the largest population in 2018, India the second largest, and so on) and the gross domestic product (GDP) per capita. ^[1] A country’s GDP is the total monetary value of everything produced in that country over a year. GDP per capita is a country’s GDP divided by its population.

Country	Population Rank	GDP per Capita
China	1	$9,771
India	2	$2,016
United States	3	$62,641
Indonesia	4	$3,894
Pakistan	5	$1,473
Brazil	6	$8,921
Nigeria	7	$2,028
Bangladesh	8	$1,698
Russia	9	$11,289
Japan	10	$39,287

Question 1

Are there any observations that seem unusual compared to the other entries?

Did you identify one or two observations in Question 1 as being unusual? It can sometimes be difficult to decide whether a particular value really is an outlier. Keep this thought in mind as you work through this activity; we’ll return to this question at the end.

Identify minimum and maximum data values

Question 2

List the GDP per capita values of the 10 most populous countries in 2018 from least to greatest.

Question 3

What are the minimum and maximum values of this dataset?
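To check your answers to Questions 2 and 3 with software, here is a minimal Python sketch. It stores the table’s GDP per capita figures in a dictionary (the name `gdp_per_capita` is our own choice, not part of the course materials), sorts the values from least to greatest, and reads off the first and last entries as the minimum and maximum. The activity is meant to be done by hand; this is only one way to confirm your work.

```python
# GDP per capita (US dollars) for the 10 most populous countries in 2018,
# copied from the table above. The variable name is illustrative.
gdp_per_capita = {
    "China": 9771,
    "India": 2016,
    "United States": 62641,
    "Indonesia": 3894,
    "Pakistan": 1473,
    "Brazil": 8921,
    "Nigeria": 2028,
    "Bangladesh": 1698,
    "Russia": 11289,
    "Japan": 39287,
}

# Sort the values from least to greatest.
values = sorted(gdp_per_capita.values())
print(values)

# In a sorted list, the minimum is the first entry and the maximum is the last.
minimum, maximum = values[0], values[-1]
print(minimum, maximum)  # 1473 62641
```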

Calculate and interpret the median

Question 4

In 2018, what was the median GDP per capita among the 10 most populous countries?

Question 5

Interpret the median.

Calculate the first quartile (Q1)

Question 6

What values lie below the median? List them in order from least to greatest.

Question 7

What is the median of the list you generated in Question 6 above?

We call this value the first quartile, and we sometimes denote it as Q1. It is the median of the values that lie below the median for the whole dataset. It is also equal to the 25th percentile.
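The definition of Q1 can be sketched in Python with the standard library’s `statistics.median`. This is one illustrative approach, not the course’s official tool. Statistics packages use several slightly different quartile conventions; this sketch follows the one described here (take the median of the values that lie below the overall median), so other software may report a slightly different Q1 for some datasets.

```python
from statistics import median

# Sorted GDP per capita values (US dollars) from the table above.
values = [1473, 1698, 2016, 2028, 3894, 8921, 9771, 11289, 39287, 62641]

# Median of the whole dataset: with 10 values, the average of the 5th and 6th.
m = median(values)
print(m)  # 6407.5

# Q1 is the median of the values that lie below the overall median.
lower_half = [v for v in values if v < m]
q1 = median(lower_half)
print(lower_half, q1)  # [1473, 1698, 2016, 2028, 3894] 2016
```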

Calculate the third quartile (Q3)

Determine the values that lie above the median for the whole dataset.

Question 8

List the values above the median in order from least to greatest.

Question 9

What is the median of the list above?

We call this value the third quartile, and we sometimes denote it as Q3. It is the median of the values that lie above the median for the whole dataset. It is also equal to the 75th percentile.
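A matching Python sketch for Q3, again using `statistics.median` and the median-of-each-half convention described in this activity (other software may use a different quartile rule):

```python
from statistics import median

# Sorted GDP per capita values (US dollars) from the table above.
values = [1473, 1698, 2016, 2028, 3894, 8921, 9771, 11289, 39287, 62641]
m = median(values)  # overall median, 6407.5

# Q3 is the median of the values that lie above the overall median.
upper_half = [v for v in values if v > m]
q3 = median(upper_half)
print(upper_half, q3)  # [8921, 9771, 11289, 39287, 62641] 11289
```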

List the five-number summary

Question 10

We have identified the first quartile (Q1) and the third quartile (Q3). What do you think the second and fourth quartiles are? Why do you think we call these values “quartiles”?

Question 11

Together, the minimum, first quartile, median, third quartile, and maximum form the five-number summary of the data. Record the five-number summary for this dataset.
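The five-number summary can be bundled into one small Python function. The function name `five_number_summary` is hypothetical, and the sketch uses the median-of-each-half convention from this activity. It also assumes the dataset has enough distinct values that neither half is empty; for example, it would raise an error on a dataset in which every value is the same.

```python
from statistics import median

def five_number_summary(data):
    """Hypothetical helper: return (minimum, Q1, median, Q3, maximum).

    Q1 and Q3 are the medians of the values strictly below and strictly
    above the overall median, as described in this activity.
    """
    values = sorted(data)
    m = median(values)
    lower = [v for v in values if v < m]
    upper = [v for v in values if v > m]
    return values[0], median(lower), m, median(upper), values[-1]

# GDP per capita values (US dollars) in the table's original order.
gdp = [9771, 2016, 62641, 3894, 1473, 8921, 2028, 1698, 11289, 39287]
print(five_number_summary(gdp))  # (1473, 2016, 6407.5, 11289, 62641)
```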

Calculate the interquartile range (IQR)

The interquartile range (sometimes denoted as IQR) is the quantity Q3 – Q1.
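Once the quartiles are known, the IQR is a single subtraction. This minimal sketch recomputes Q1 and Q3 with the median-of-each-half convention used in this activity and then takes the difference.

```python
from statistics import median

# Sorted GDP per capita values (US dollars) from the table above.
values = [1473, 1698, 2016, 2028, 3894, 8921, 9771, 11289, 39287, 62641]
m = median(values)

# Quartiles as medians of the lower and upper halves.
q1 = median([v for v in values if v < m])
q3 = median([v for v in values if v > m])

# The interquartile range is the distance between Q3 and Q1.
iqr = q3 - q1
print(iqr)  # 9273
```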

Question 12

About how much of the data lie between Q3 and Q1?

Question 13

What is the IQR for the GDP per capita of the 10 most populous countries in 2018?

Determine if a value is an outlier

Some outliers are easy to spot (such as the GDP per capita of the United States), but others are harder to identify (such as Japan’s GDP per capita). If you were to make up a rule for testing whether a value is “unusual enough” to be called an outlier, what would it be? Apply your rule to Japan’s GDP per capita to decide whether it is an outlier. What did you decide?

Question 14

Return to your answer to Question 1. How can you decide whether an entry is unusual? Which entries did you decide are outliers in this dataset? Explain your reasoning.

In the next section, you’ll learn about an accepted method for determining whether a data value “qualifies” as an outlier in a skewed distribution like this one. It’s called the IQR method, and it states that if a data value lies more than 1.5 times the IQR to the left of Q1 or to the right of Q3, then that value is “unusual enough” to be called an outlier.

Note that while this method can be used to identify unusual observations in skewed distributions like this one, other methods, which you’ll learn about in an upcoming section, are well suited to symmetrical distributions. In certain applications, it may be useful to distinguish between “mild outliers” (using 1.5 times the IQR) and “extreme outliers” (using 3 times the IQR). In principle, we can set the threshold for “unusual” values as far away as we like, depending on the application. However, 1.5 times the IQR is commonly used, so we’ll use it here and in the upcoming section.
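Here is a minimal Python sketch of the IQR method as described above. The function name `iqr_outliers` and its parameter `k` (the multiplier on the IQR) are our own illustrative choices: `k=1.5` gives the common threshold, and `k=3` flags only the “extreme” outliers. Quartiles again use the median-of-each-half convention from this activity.

```python
from statistics import median

def iqr_outliers(data, k=1.5):
    """Hypothetical helper: return values more than k * IQR below Q1 or above Q3.

    With k = 1.5 this flags every value beyond the usual fences;
    with k = 3 it flags only the 'extreme' outliers.
    """
    values = sorted(data)
    m = median(values)
    q1 = median([v for v in values if v < m])
    q3 = median([v for v in values if v > m])
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# GDP per capita values (US dollars) in the table's original order.
gdp = [9771, 2016, 62641, 3894, 1473, 8921, 2028, 1698, 11289, 39287]
print(iqr_outliers(gdp))       # [39287, 62641]
print(iqr_outliers(gdp, k=3))  # [39287, 62641]
```

For this dataset, the stricter threshold does not change the result: with k = 3, the upper fence is 11,289 plus 3 times 9,273, or 39,108, and both flagged values still exceed it.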

Let’s apply the method to Japan’s GDP per capita in the interactive example below.

Interactive example

Recall that Japan’s GDP per capita from the dataset is $39,287. We would like to know how unusual this value really is in comparison to the rest of the data values. We’ll use the IQR method to make the determination.

Under this method, a data value is considered an outlier if it lies more than 1.5 [latex]\times[/latex] (IQR) above Q3 or below Q1. Since 39,287 is greater than the median, we’ll test whether it exceeds Q3 + 1.5 [latex]\times[/latex] (IQR). (If it were a very small value, we’d instead test whether it is lower than Q1 – 1.5 [latex]\times[/latex] (IQR).)

Recall that, for this dataset, Q3 = 11,289 and IQR = 9,273.

Step 1) Calculate 1.5 [latex]\times[/latex] (IQR).

Step 2) Calculate Q3 + 1.5 [latex]\times[/latex] (IQR).

Step 3) Compare Japan’s GDP per capita with the value from Step 2. If it exceeds Q3 + 1.5 [latex]\times[/latex] (IQR), then it is an outlier.

What did you discover? Is Japan’s GDP per capita an outlier in the dataset?
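If you’d like to check Steps 1 through 3, here is a minimal Python sketch that uses the values of Q3, the IQR, and Japan’s GDP per capita given above. The variable names are our own.

```python
# Values given in the text for this dataset (US dollars).
q3 = 11289
iqr = 9273
japan = 39287

# Step 1: 1.5 times the IQR.
step = 1.5 * iqr  # 13909.5

# Step 2: the upper fence, Q3 + 1.5 * IQR.
upper_fence = q3 + step  # 25198.5

# Step 3: Japan's value is an outlier if it exceeds the upper fence.
is_outlier = japan > upper_fence
print(step, upper_fence, is_outlier)  # 13909.5 25198.5 True
```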

In this support activity, you’ve seen how to calculate the five-number summary and interquartile range (IQR) of a dataset by hand, and you’ve learned a method for determining mathematically whether an observation is an outlier. These make up the features of a boxplot. It’s time to move on to the next section, where you’ll use these skills as you explore boxplots for visualizing the distribution of a quantitative variable.

Bevins, V. (2020). The Jakarta method: Washington’s anticommunist crusade and the mass murder program that shaped our world. PublicAffairs.

Alpha Module 2: Exploring Quantitative Variables Using Graphical Displays