What you’ll need to know
In this support activity you’ll become familiar with the following:
- Identify the minimum and maximum values of a dataset.
- Calculate and interpret the median.
- Calculate the first quartile (Q1).
- Calculate the third quartile (Q3).
- List the five-number summary for a quantitative variable.
- Calculate the interquartile range (IQR) for a quantitative variable.
- Determine whether or not a value is an outlier.
The upcoming section of material and the activity that follows will introduce a new graph for displaying quantitative data called a boxplot. The image below shows a boxplot labeled with the five-number summary and interquartile range. We’ll explore boxplots in detail soon. The focus of this support activity is to help you become familiar with the characteristics of a boxplot: minimum and maximum values, median, first quartile, third quartile, and interquartile range.

A boxplot is a graphical visualization of a quantitative variable that shows median, spread, skew, and outliers by illustrating a set of numbers called the five-number summary. In the next section of the course material, you will need to be able to relate the features of a boxplot to the dataset it comes from. In the following activity, you will need to be able to interpret and compare boxplots. This corequisite support activity helps you get started: you’ll build an understanding of the parts of the five-number summary and learn how to determine whether a data value is “unusual enough” to qualify as an outlier.
To introduce this new quantitative graph, we’ll use a dataset that contains the gross domestic product per capita for the 10 most populous countries.
GDP of the World’s Most Populous Countries

The following table lists data for the 10 most populous countries in 2018, and it includes each country’s population rank (we can see that China had the largest population in 2018, India had the second largest population, and so on) and each country’s gross domestic product (GDP) per capita. [1] A country’s GDP is the total monetary value of everything produced in that country over the year. The GDP per capita is a country’s GDP divided by its population.
| Country | Population Rank | GDP per Capita |
| --- | --- | --- |
| China | 1 | $9,771 |
| India | 2 | $2,016 |
| United States | 3 | $62,641 |
| Indonesia | 4 | $3,894 |
| Pakistan | 5 | $1,473 |
| Brazil | 6 | $8,921 |
| Nigeria | 7 | $2,028 |
| Bangladesh | 8 | $1,698 |
| Russia | 9 | $11,289 |
| Japan | 10 | $39,287 |
question 1
Are there any observations that seem unusual compared to the other entries?
Did you identify one or two observations in Question 1 as being unusual? It can be difficult sometimes to decide if a particular value really is an outlier. Keep this thought in mind as you work through this activity. We’ll come back to this question again at the end.
Identify minimum and maximum data values
question 2
List the 10 most populous countries’ GDP per capita in 2018 from least to greatest.
question 3
What are the minimum and maximum values of this dataset?
Calculate and interpret the median
question 4
In 2018, what was the median GDP per capita among the 10 most populous countries?
question 5
Interpret the median.
Calculate the first quartile (Q1)
question 6
What values lie below the median? List them in order from least to greatest.
question 7
What is the median of the list you generated in Question 6 above?
We call this value the first quartile, and we sometimes denote it as Q1. It is the median of the values that lie below the median for the whole dataset. It is also equal to the 25th percentile.
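As a minimal sketch of the procedure just described, the following Python snippet finds Q1 as the median of the values below the overall median. It uses the GDP per capita figures from the table; the variable names are illustrative and not part of the course material. Statistical software often computes quartiles with slightly different rules, so results from other tools may not match this by-hand method exactly.

```python
from statistics import median

# GDP per capita (USD) for the 10 most populous countries in 2018, from the table
gdp_per_capita = [9771, 2016, 62641, 3894, 1473, 8921, 2028, 1698, 11289, 39287]

ordered = sorted(gdp_per_capita)
overall_median = median(ordered)

# Keep only the values below the overall median.
# This simple filter works here because no data value equals the median.
lower_half = [value for value in ordered if value < overall_median]

# Q1 is the median of that lower half
q1 = median(lower_half)
print(lower_half)
print(q1)
```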
Calculate the third quartile (Q3)
Determine the values that lie above the median for the whole dataset.
question 8
List the values above the median in order from least to greatest.
question 9
What is the median of the list above?
We call this value the third quartile, and we sometimes denote it as Q3. It is the median of the values that lie above the median for the whole dataset. It is also equal to the 75th percentile.
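The same idea can be sketched in Python for Q3. The helper below, `upper_half`, is a hypothetical name introduced for illustration. It splits by position rather than by value, and it assumes that when a dataset has an odd number of values, the middle value is left out of both halves. That is one common convention, not the only one.

```python
from statistics import median

def upper_half(values):
    """Return the sorted values that sit above the median position.

    Assumption: with an odd number of values, the middle value is
    excluded from the upper half.
    """
    ordered = sorted(values)
    # For 10 values this starts at index 5; for 9 values it also starts
    # at index 5, which skips the middle value at index 4.
    start = (len(ordered) + 1) // 2
    return ordered[start:]

# GDP per capita (USD) from the table
gdp_per_capita = [9771, 2016, 62641, 3894, 1473, 8921, 2028, 1698, 11289, 39287]

# Q3 is the median of the upper half
q3 = median(upper_half(gdp_per_capita))
print(q3)
```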
List the five-number summary
question 10
We have identified the first quartile (Q1) and the third quartile (Q3). What do you think the second and fourth quartiles are? Why do you think we call these values “quartiles”?
question 11
Together, the minimum, first quartile, median, third quartile, and maximum form the five-number summary of the data. Record the five-number summary for this dataset.
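To check a hand-computed answer, the five values can be gathered in one place. The sketch below is a hypothetical helper, not a library function. It assumes the by-hand method used in this activity: quartiles are medians of the lower and upper halves, and the middle value is excluded from both halves when the count is odd.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]          # values below the median position
    upper = ordered[(n + 1) // 2 :]    # values above the median position
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# GDP per capita (USD) from the table
gdp_per_capita = [9771, 2016, 62641, 3894, 1473, 8921, 2028, 1698, 11289, 39287]

summary = five_number_summary(gdp_per_capita)
print(summary)
```

Compare the printed tuple with the values you recorded by hand.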
Calculate the interquartile range (IQR)
The interquartile range (sometimes denoted as IQR) is the quantity Q3 – Q1.
question 12
About how much of the data lie between Q1 and Q3?
question 13
What is the IQR for the GDP per capita of the 10 most populous countries in 2018?
Determine if a value is an outlier
Some outliers seem quite simple to spot (such as the GDP per capita of the United States), but others are harder to identify (such as Japan’s GDP per capita). If you were to make up a rule for testing whether a value is “unusual enough” to be called an outlier, what would it be? Use your rule on the value of Japan’s GDP per capita to decide whether or not it is an outlier. What did you decide?
question 14
Return to your answer to Question 1. How can you decide whether an entry is unusual or not? In other words, which entries did you decide are outliers in this dataset? Explain your reasoning.
In the next section, you’ll learn about an accepted method for deciding whether a data value “qualifies” as an outlier in a skewed distribution like this one. It’s called the IQR method: if a data value lies more than 1.5 times the IQR to the left of Q1 or to the right of Q3, then that value is “unusual enough” to be called an outlier. While this method can be used to identify unusual observations in skewed distributions like this one, other methods, which you’ll learn about in an upcoming section, are well suited for symmetrical distributions. In certain applications, it may be desirable to distinguish between “mild outliers” (more than 1.5 times the IQR beyond the quartiles) and “extreme outliers” (more than 3 times the IQR beyond the quartiles). The threshold for “unusual” values can be set as far away as the application calls for, but 1.5 times the IQR is commonly used, so we’ll use it here and in the upcoming section.
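The mild and extreme thresholds described above can be sketched as a small Python function. The name `classify_outlier` and its parameters are hypothetical, chosen for illustration; the default multipliers 1.5 and 3 come from the paragraph above. The example uses Q3 = 11,289 and Q1 = 2,016 (the value of Q1 implied by the Q3 and IQR stated in the interactive example).

```python
def classify_outlier(value, q1, q3, mild=1.5, extreme=3.0):
    """Label a value by how far it falls outside the interval from Q1 to Q3."""
    iqr = q3 - q1
    # Distance beyond the nearer quartile; negative when the value is between Q1 and Q3
    distance = max(q1 - value, value - q3)
    if distance > extreme * iqr:
        return "extreme outlier"
    if distance > mild * iqr:
        return "mild outlier"
    return "not an outlier"

q1, q3 = 2016, 11289
print(classify_outlier(62641, q1, q3))  # United States
print(classify_outlier(9771, q1, q3))   # China
```

Changing the `mild` or `extreme` arguments shows how the choice of threshold affects which values get flagged.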
Let’s apply the method to Japan’s GDP per capita in the interactive example below.
Interactive example
Recall that Japan’s GDP per capita from the dataset is $39,287. We would like to know how unusual this value really is in comparison to the rest of the data values. We’ll use the IQR method to make the determination.
Under this method, a data value is considered an outlier if it lies more than 1.5 [latex]\times[/latex] (IQR) above Q3 or more than 1.5 [latex]\times[/latex] (IQR) below Q1. Since 39,287 is greater than the median, we’ll test it to see if it exceeds Q3 + 1.5 [latex]\times[/latex] (IQR). (If it were a very small number, we’d test to see if it were lower than Q1 – 1.5 [latex]\times[/latex] (IQR).)
Recall, for this dataset: Q3 = 11,289 and IQR = 9,273.
Step 1) Calculate 1.5 [latex]\times[/latex] (IQR).
Step 2) Calculate Q3 + 1.5 [latex]\times[/latex] (IQR).
Step 3) Compare Japan’s GDP per capita. If it exceeds Q3 + 1.5 [latex]\times[/latex] (IQR), then it is an outlier.
What did you discover? Is Japan’s GDP per capita an outlier in the dataset?
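Once you have worked through the three steps by hand, you can check your arithmetic with a short Python sketch like the one below. The variable names are illustrative; the values of Q3, the IQR, and Japan’s GDP per capita are the ones given in this example.

```python
q3 = 11_289      # third quartile, from the example
iqr = 9_273      # interquartile range, from the example
japan = 39_287   # Japan's GDP per capita

# Step 1: calculate 1.5 x IQR
step_1 = 1.5 * iqr

# Step 2: calculate Q3 + 1.5 x IQR (the upper cutoff)
upper_cutoff = q3 + step_1

# Step 3: the value is an outlier if it exceeds the upper cutoff
is_outlier = japan > upper_cutoff

print(step_1, upper_cutoff, is_outlier)
```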
In this support activity, you’ve seen how to calculate the five-number summary and interquartile range (IQR) by hand for a dataset, and you’ve learned about a method to mathematically determine if an observation is an outlier. These values make up the features of a boxplot. It’s time to move on to the next section, where you’ll use these skills as you explore boxplots for visualizing the distribution of a quantitative variable.
[1] Bevins, V. (2020). The Jakarta method: Washington’s anticommunist crusade and the mass murder program that shaped our world. PublicAffairs.