## 2.3 Percentiles, Box Plots, and 5-Number Summary

The common measures of location are quartiles and percentiles.

Quartiles are special percentiles.

• The first quartile, Q1, is the same as the 25th percentile.
25% of the data will be less than the 25th percentile; 75% of the data will be more than the 25th percentile.
• The second quartile, Q2, is the same as the 50th percentile, or median.
50% of the data will be less than the 50th percentile; 50% of the data will be more than the 50th percentile.
• The third quartile, Q3, is the same as the 75th percentile.
75% of the data will be less than the 75th percentile; 25% of the data will be more than the 75th percentile.

The general form is:

### n% of data will be less than the nth percentile and (100% – n%) of data will be more than the nth percentile.

To calculate quartiles and percentiles, the data must be ordered from smallest to largest. Quartiles divide ordered data into quarters. Percentiles divide ordered data into hundredths. Scoring in the 90th percentile on an exam does not necessarily mean that you received 90% on the test. It means that 90% of test scores are the same as or less than your score and 10% of the test scores are the same as or greater than your score.

Percentiles are useful for comparing values. For this reason, universities and colleges use percentiles extensively. One instance in which colleges and universities use percentiles is when SAT results are used to determine a minimum testing score that will be used as an acceptance factor. For example, suppose Duke accepts SAT scores at or above the 75th percentile. That translates into a score of at least 1220. To be admitted as a Duke student, your SAT score has to be at least 1220.

Percentiles are mostly used with very large populations. Therefore, if you were to say that 90% of the test scores are less (and not the same or less) than your score, it would be acceptable because removing one particular data value is not significant.
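The difference between "less than" and "the same as or less than" can be checked numerically. The sketch below is only an illustration: the population of 10,000 distinct scores and the helper names `percent_below` and `percent_at_or_below` are made up for this example and do not come from the text.

```python
def percent_below(values, x):
    """Percent of values strictly less than x."""
    return 100 * sum(v < x for v in values) / len(values)


def percent_at_or_below(values, x):
    """Percent of values less than or equal to x."""
    return 100 * sum(v <= x for v in values) / len(values)


# Hypothetical population: 10,000 distinct scores, 1 through 10,000.
scores = list(range(1, 10_001))
my_score = 9_000

at_or_below = percent_at_or_below(scores, my_score)
below = percent_below(scores, my_score)
print(at_or_below, below)
```

With a population this large, the two percentages differ by only one score out of 10,000, which is why either wording is acceptable in practice.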

The median is a number that measures the “center” of the data. You can think of the median as the “middle value,” but it does not actually have to be one of the observed values. It is a number that separates ordered data into halves. Half the values are the same number or smaller than the median, and half the values are the same number or larger.

## Example 1

Consider the following data.

1; 11.5; 6; 7.2; 4; 8; 9; 10; 6.8; 8.3; 2; 2; 10; 1

Quartiles are numbers that separate the data into quarters. Quartiles may or may not be part of the data.

To find the quartiles, first find the median, or second quartile (Q2). The first quartile (Q1) is the median of the lower half of the data, and the third quartile (Q3) is the median of the upper half of the data.

## Example 2

To get the idea, consider the same data set:

1; 1; 2; 2; 4; 6; 6.8; 7.2; 8; 8.3; 9; 10; 10; 11.5
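The procedure of finding the median first and then the median of each half can be sketched in Python for this data set. The function name `quartiles` is an illustrative choice, and the sketch assumes that when the number of values is odd, the middle value is left out of both halves. Other conventions exist and can give slightly different quartiles.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)         # quartiles require ordered data
    n = len(data)
    lower = data[: n // 2]        # values below the middle position
    upper = data[(n + 1) // 2:]   # values above the middle position
    # Assumption: for odd n the middle value is excluded from both halves.
    return median(lower), median(data), median(upper)


data = [1, 11.5, 6, 7.2, 4, 8, 9, 10, 6.8, 8.3, 2, 2, 10, 1]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)
```

For these 14 values, the median is the average of the seventh and eighth ordered values, (6.8 + 7.2) / 2 = 7, and the medians of the two halves are Q1 = 2 and Q3 = 9.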

## Interpreting Percentiles, Quartiles, and Median

A percentile indicates the relative standing of a data value when data are sorted into numerical order from smallest to largest. In general, p percent of data values are less than or equal to the pth percentile. For example, 15% of data values are less than or equal to the 15th percentile.

• Low percentiles always correspond to lower data values.
• High percentiles always correspond to higher data values.

A percentile may or may not correspond to a value judgment about whether it is “good” or “bad.” The interpretation of whether a certain percentile is “good” or “bad” depends on the context of the situation to which the data apply. In some situations, a low percentile would be considered “good”; in other contexts, a high percentile might be considered “good.” In many situations, there is no value judgment that applies.

Understanding how to interpret percentiles properly is important not only when describing data, but also when calculating probabilities in later chapters of this text.

### Guideline

When writing the interpretation of a percentile in the context of the given data, the sentence should contain the following information.

• information about the context of the situation being considered
• the data value (value of the variable) that represents the percentile
• the percent of individuals or items with data values below the percentile
• the percent of individuals or items with data values above the percentile.

## Example 3

On a timed math test, the first quartile for time it took to finish the exam was 35 minutes.
Interpret the first quartile in the context of this situation.

### Try It

For the 100-meter dash, the third quartile for times for finishing the race was 11.5 seconds.
Interpret the third quartile in the context of the situation.

## Example 4

On a 20 question math test, the 70th percentile for number of correct answers was 16.
Interpret the 70th percentile in the context of this situation.

### Try It

On a 60 point written assignment, the 80th percentile for the number of points earned was 49.
Interpret the 80th percentile in the context of this situation.

## Example 5

At a community college, it was found that the 30th percentile of credit units that students are enrolled for is 7 units.
Interpret the 30th percentile in the context of this situation.

### Try It

During a season, the 40th percentile for points scored per player in a game is eight. Interpret the 40th percentile in the context of this situation.

## Box Plots

Box plots (also called box-and-whisker plots or box-whisker plots) give a good graphical image of the concentration of the data. They also show how far the extreme values are from most of the data. A box plot is constructed from five values: the minimum value, the first quartile (Q1), the median (Q2), the third quartile (Q3), and the maximum value. We use these values to compare how close other data values are to them.

To construct a box plot, use a horizontal or vertical number line and a rectangular box. The smallest and largest data values label the endpoints of the axis. The first quartile marks one end of the box and the third quartile marks the other end of the box. Approximately the middle 50 percent of the data fall inside the box. The “whiskers” extend from the ends of the box to the smallest and largest data values. The median or second quartile can be between the first and third quartiles, or it can be one, or the other, or both. The box plot gives a good, quick picture of the data.

#### Note:

You may encounter box-and-whisker plots that have dots marking outlier values. In those cases, the whiskers do not extend to the minimum and maximum values.

Consider, again, this dataset.

1 1 2 2 4 6 6.8 7.2 8 8.3 9 10 10 11.5

The first quartile, Q1, is 2.
The median, Q2, is 7.
The third quartile, Q3, is 9.
The smallest value is 1.
The largest value is 11.5.

The following image shows the constructed box plot.

The two whiskers extend from the first quartile to the smallest value and from the third quartile to the largest value. The median is shown with a dashed line.
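As a rough illustration of how the five values map onto a scaled number line, the sketch below draws a one-line text version of a box plot. The function `ascii_box_plot`, its `width` parameter, and the characters used for the whiskers, box, and median are all choices made for this example; it assumes the largest value is greater than the smallest.

```python
def ascii_box_plot(minimum, q1, median, q3, maximum, width=43):
    """Draw a one-line box plot on a line scaled to [minimum, maximum]."""
    span = maximum - minimum  # assumes maximum > minimum

    def column(value):
        # Map a data value to a character position on the scaled line.
        return round((value - minimum) / span * (width - 1))

    lo, a, m, b, hi = (column(v) for v in (minimum, q1, median, q3, maximum))
    line = ["-"] * width            # whiskers cover the whole line first
    for i in range(a, b + 1):
        line[i] = "="               # the box covers Q1 through Q3
    line[lo], line[hi] = "|", "|"   # whisker ends: smallest and largest values
    line[a], line[b] = "[", "]"     # box ends: Q1 and Q3
    line[m] = ":"                   # median marker
    return "".join(line)


plot = ascii_box_plot(1, 2, 7, 9, 11.5)
print(plot)
```

Because every position is computed from the same scale, distances along the line are proportional to distances between the data values, which is the property a scaled number line provides.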

#### Note:

It is important to start a box plot with a scaled number line. Otherwise the box plot may not be useful.

## Example 6

The following data are the heights of 40 students in a statistics class.

59 60 61 62 62 63 63 64 64 64 65 65 65 65 65 65 65 65 65 66 66 67 67 68 68 69 70 70 70 70 70 71 71 72 72 73 74 74 75 77

Construct a box plot with the following properties; the calculator instructions for the minimum and maximum values as well as the quartiles follow the example.

• Minimum value = 59
• Maximum value = 77
• Q1, First quartile = 64.5
• Q2, Second quartile or median = 66
• Q3, Third quartile = 70

1. Each quarter has approximately 25% of the data.
2. The spreads of the four quarters are 64.5 – 59 = 5.5 (first quarter), 66 – 64.5 = 1.5 (second quarter), 70 – 66 = 4 (third quarter), and 77 – 70 = 7 (fourth quarter). So, the second quarter has the smallest spread and the fourth quarter has the largest spread.
3. Range = maximum value – the minimum value = 77 – 59 = 18
4. Interquartile Range: IQR = Q3 – Q1 = 70 – 64.5 = 5.5.
5. The interval 59–65 has more than 25% of the data, so it has more data in it than the interval 66 through 70, which has about 25% of the data.
6. The middle 50% (middle half) of the data has a range of 5.5 inches.
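The arithmetic in items 2 through 4 can be reproduced directly from the five values listed above. This is a minimal sketch; the variable names are illustrative.

```python
# Five-number summary of the 40 heights, as given in the example.
minimum, q1, median, q3, maximum = 59, 64.5, 66, 70, 77

# Spread of each quarter: the distance between consecutive summary values.
quarter_spreads = [q1 - minimum, median - q1, q3 - median, maximum - q3]

data_range = maximum - minimum  # range = maximum value - minimum value
iqr = q3 - q1                   # interquartile range = Q3 - Q1

print(quarter_spreads, data_range, iqr)
```

The smallest entry in `quarter_spreads` belongs to the second quarter and the largest to the fourth quarter, matching item 2.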

### Try It

The following data are the number of pages in 40 books on a shelf.

136 140 178 190 205 215 217 218 232 234 240 255 270 275 290 301 303 315 317 318 326 333 343 349 360 369 377 388 391 392 398 400 402 405 408 422 429 450 475 512

Construct a box plot using a graphing calculator, and state the interquartile range.

For some sets of data, some of the largest value, smallest value, first quartile, median, and third quartile may be the same. For instance, you might have a data set in which the median and the third quartile are the same. In this case, the diagram would not have a dashed line inside the box displaying the median. The right side of the box would display both the third quartile and the median. For example, if the smallest value and the first quartile were both one, the median and the third quartile were both five, and the largest value was seven, the box plot would look like:

In this case, at least 25% of the values are equal to one. 25% of the values are between one and five, inclusive.
At least 25% of the values are equal to five. The top 25% of the values fall between five and seven, inclusive.

## Example 7

Test scores for a college statistics class held during the day are:

99 56 78 55.5 32 90 80 81 56 59 45 77 84.5 84 70 72 68 32 79 90

Test scores for a college statistics class held during the evening are:

98 78 68 83 81 89 88 76 65 45 98 90 80 84.5 85 79 78 98 90 79 81 25.5

1. Find the smallest and largest values, the median, and the first and third quartile for the day class.

2. Find the smallest and largest values, the median, and the first and third quartile for the evening class.

3. Create a box plot for each set of data. Use one number line for both box plots.

4. Which box plot has the wider spread for the middle 50% of the data (the data between the first and third quartiles)? What does this mean for that set of data in comparison to the other set of data?
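One way to check the values needed for parts 1, 2, and 4 is sketched below. The helper `five_number_summary` is a name chosen for this example, and it assumes the median-of-halves method for quartiles, with the middle value excluded from both halves when the number of values is odd. A calculator or software package that uses a different quartile convention may report slightly different values.

```python
from statistics import median


def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest) via median-of-halves."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])         # median of the lower half
    q3 = median(data[(n + 1) // 2:])    # median of the upper half
    return data[0], q1, median(data), q3, data[-1]


day = [99, 56, 78, 55.5, 32, 90, 80, 81, 56, 59,
       45, 77, 84.5, 84, 70, 72, 68, 32, 79, 90]
evening = [98, 78, 68, 83, 81, 89, 88, 76, 65, 45, 98,
           90, 80, 84.5, 85, 79, 78, 98, 90, 79, 81, 25.5]

for label, scores in (("day", day), ("evening", evening)):
    smallest, q1, med, q3, largest = five_number_summary(scores)
    print(label, smallest, q1, med, q3, largest, "IQR =", q3 - q1)
```

Under these assumptions the day class has the wider interquartile range (26.5 versus 11), so the middle 50% of its scores is more spread out than that of the evening class.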

### Try It

The following data set shows the heights in inches for the boys in a class of 40 students.

66; 66; 67; 67; 68; 68; 68; 68; 68; 69; 69; 69; 70; 71; 72; 72; 72; 73; 73; 74

The following data set shows the heights in inches for the girls in a class of 40 students.

61; 61; 62; 62; 63; 63; 63; 65; 65; 65; 66; 66; 66; 67; 68; 68; 68; 69; 69; 69

Construct a box plot using a graphing calculator for each data set, and state which box plot has the wider spread for the middle 50% of the data.

## Example 8

Graph a box-and-whisker plot for the data values shown.

10 10 10 15 35 75 90 95 100 175 420 490 515 515 790

### Try It

Graph a box-and-whisker plot for the data values shown.

0 5 5 15 30 30 45 50 50 60 75 110 140 240 330

## Concept Review

Box plots are a type of graph that can help visually organize data. To graph a box plot the following data points must be calculated: the minimum value, the first quartile, the median, the third quartile, and the maximum value. Once the box plot is graphed, you can display and compare distributions of data.