Histograms

Learning Outcomes

  • Construct a histogram

For most of the work you do in this book, you will use a histogram to display the data. One advantage of a histogram is that it can readily display large data sets. A rule of thumb is to use a histogram when the data set consists of [latex]100[/latex] values or more.

Recall: Ordering Integers

It is helpful to use a number line to order numbers. A number line has a center point, zero. To the left of zero are negative numbers and to the right of zero are positive numbers. Remember, zero is neither positive nor negative.

As you move from left to right on a number line, the numbers get bigger. As you move from right to left on a number line, the numbers get smaller.

This figure is a number line with 0 in the middle. Then, the scaling has positive numbers 1 to 4 to the right of 0 and negative numbers, negative 1 to negative 4 to the left of 0. There is also a green arrow on the bottom that goes to the left with words "Numbers Decrease" below the arrow. There is another green arrow at the top pointing to the right with the words "Numbers Increase" above the arrow.

A histogram consists of contiguous (adjoining) boxes. It has both a horizontal axis and a vertical axis. The horizontal axis is labeled with what the data represents (for instance, distance from your home to school).

Recall: Order of Operations

When simplifying mathematical expressions perform the operations in the following order:
1. Parentheses and other Grouping Symbols

  • Simplify all expressions inside the parentheses or other grouping symbols, working on the innermost parentheses first

2. Exponents

  • Simplify all expressions with exponents

3. Multiplication and Division

  • Perform all multiplication and division in order from left to right. These operations have equal priority.

4. Addition and Subtraction

  • Perform all addition and subtraction in order from left to right. These operations have equal priority.

Recall: Evaluating Expressions

We use letters, called variables, to represent unknown numerical values. Any variable in an algebraic expression may take on or be assigned different values. When that happens, the value of the algebraic expression changes. To evaluate an algebraic expression means to determine the value of the expression for a given value of each variable in the expression. Replace each variable in the expression with the given value, then simplify the resulting expression using the order of operations.

Example

Evaluate [latex]x+7[/latex] when

  1. [latex]x=3[/latex]
  2. [latex]x=12[/latex]

Solution:

1. To evaluate, substitute [latex]3[/latex] for [latex]x[/latex] in the expression, and then simplify.

[latex]x+7[/latex]
Substitute [latex]\color{red}{3}+7[/latex]
Add [latex]10[/latex]

When [latex]x=3[/latex], the expression [latex]x+7[/latex] has a value of [latex]10[/latex].
2. To evaluate, substitute [latex]12[/latex] for [latex]x[/latex] in the expression, and then simplify.

[latex]x+7[/latex]
Substitute [latex]\color{red}{12}+7[/latex]
Add [latex]19[/latex]

When [latex]x=12[/latex], the expression [latex]x+7[/latex] has a value of [latex]19[/latex].

Notice that we got different results for parts 1 and 2 even though we started with the same expression. This is because the values used for [latex]x[/latex] were different. When we evaluate an expression, the value varies depending on the value used for the variable.
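The same substitution can be sketched in a few lines of Python. The function name evaluate is our own choice for illustration; it is not part of the text.

```python
def evaluate(x):
    """Return the value of the expression x + 7 for a given value of x."""
    return x + 7


# Substitute each given value for x, then simplify.
for x in (3, 12):
    print(f"When x = {x}, x + 7 = {evaluate(x)}")
```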

The vertical axis is labeled either frequency or relative frequency (or percent frequency or probability). The graph will have the same shape with either label. The histogram (like the stemplot) can give you the shape of the data, the center, and the spread of the data.

The relative frequency is equal to the frequency for an observed value of the data divided by the total number of data values in the sample. (Remember, frequency is defined as the number of times an answer occurs.) If:

  • [latex]f[/latex] = frequency
  • [latex]n[/latex] = total number of data values (or the sum of the individual frequencies), and
  • [latex]RF[/latex] = relative frequency,

then [latex]\displaystyle{R}{F}=\frac{{f}}{{n}}[/latex]

For example, if three students in Mr. Ahab’s English class of [latex]40[/latex] students received from [latex]90[/latex]% to [latex]100[/latex]%, then [latex]\displaystyle{f}={3},{n}={40}[/latex], and [latex]{R}{F}=\frac{{f}}{{n}}=\frac{{3}}{{40}}={0.075}[/latex]. So [latex]7.5[/latex]% of the students received [latex]90–100[/latex]%. The scores [latex]90–100[/latex]% are quantitative measures.
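As a quick check of the arithmetic, the relative frequency for Mr. Ahab's class can be computed directly. This is a minimal sketch; the function name relative_frequency is an assumption made for illustration.

```python
from fractions import Fraction


def relative_frequency(f, n):
    """Return RF = f / n as an exact fraction."""
    return Fraction(f, n)


# Three of the 40 students scored from 90% to 100%.
rf = relative_frequency(3, 40)
print(rf)                   # exact fraction f/n
print(float(rf))            # decimal form
print(f"{float(rf):.1%}")   # percent form
```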

Recall: Add or Subtract Decimals

  1. Write the numbers vertically so the decimal points line up
  2. Use zeros as place holders, as needed
  3. Add or subtract the numbers as if they were whole numbers. Then, place the decimal in the answer under the decimal points in the given numbers.

To construct a histogram, first decide how many bars or intervals, also called classes, represent the data. Many histograms consist of five to [latex]15[/latex] bars or classes for clarity. Choose a starting point for the first interval that is less than the smallest data value. A convenient starting point is a lower value carried out to one more decimal place than the value with the most decimal places. For example, if the value with the most decimal places is [latex]6.1[/latex] and this is the smallest value, a convenient starting point is [latex]6.05[/latex] ([latex]6.1 - 0.05 = 6.05[/latex]). We say that [latex]6.05[/latex] has more precision. If the value with the most decimal places is [latex]2.23[/latex] and the lowest value is [latex]1.5[/latex], a convenient starting point is [latex]1.495[/latex] ([latex]1.5 - 0.005 = 1.495[/latex]). If the value with the most decimal places is [latex]3.234[/latex] and the lowest value is [latex]1.0[/latex], a convenient starting point is [latex]0.9995[/latex] ([latex]1.0 - 0.0005 = 0.9995[/latex]). If all the data happen to be integers and the smallest value is two, then a convenient starting point is [latex]1.5[/latex] ([latex]2 - 0.5 = 1.5[/latex]). Also, when the starting point and other boundaries are carried to one additional decimal place, no data value will fall on a boundary. The next two examples go into detail about how to construct a histogram using continuous data and how to create a histogram using discrete data.
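The rule for a convenient starting point can be sketched in Python as follows. The function name starting_point is hypothetical, and passing the data values as strings is our own assumption, made so that the number of decimal places is read exactly as written.

```python
from decimal import Decimal


def starting_point(values):
    """Subtract half a unit of the finest decimal place from the smallest value.

    `values` is a list of strings such as ["6.1", "7.3"].
    """
    decimals = [Decimal(v) for v in values]
    # Largest number of decimal places among the data values (0 for integers).
    places = max(0, max(-d.as_tuple().exponent for d in decimals))
    # 0.5 for integers, 0.05 for one decimal place, 0.005 for two, and so on.
    offset = Decimal(5).scaleb(-(places + 1))
    return min(decimals) - offset


print(starting_point(["6.1", "7.3"]))    # smallest value 6.1, one decimal place
print(starting_point(["2.23", "1.5"]))   # most decimal places: 2.23
print(starting_point(["3.234", "1.0"]))  # most decimal places: 3.234
print(starting_point(["2", "3", "5"]))   # all integers
```

The four calls reproduce the four cases in the paragraph above: [latex]6.05[/latex], [latex]1.495[/latex], [latex]0.9995[/latex], and [latex]1.5[/latex].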

Example

The following data are the heights (in inches to the nearest half inch) of [latex]100[/latex] male semiprofessional soccer players. The heights are continuous data, since height is measured.

[latex]60[/latex]; [latex]60.5[/latex]; [latex]61[/latex]; [latex]61[/latex]; [latex]61.5[/latex]

[latex]63.5[/latex]; [latex]63.5[/latex]; [latex]63.5[/latex]

[latex]64[/latex]; [latex]64[/latex]; [latex]64[/latex]; [latex]64[/latex]; [latex]64[/latex]; [latex]64[/latex]; [latex]64[/latex]; [latex]64.5[/latex]; [latex]64.5[/latex]; [latex]64.5[/latex]; [latex]64.5[/latex]; [latex]64.5[/latex]; [latex]64.5[/latex]; [latex]64.5[/latex]; [latex]64.566[/latex]; [latex]66[/latex]; [latex]66[/latex]; [latex]66[/latex]; [latex]66[/latex]; [latex]66[/latex]; [latex]66[/latex]; [latex]66[/latex]; [latex]66[/latex]; [latex]66[/latex]; [latex]66.5[/latex]; [latex]66.5[/latex]; [latex]66.5[/latex]; [latex]66.5[/latex]; [latex]66.5[/latex]; [latex]66.5[/latex]; [latex]66.5[/latex]; [latex]66.5[/latex]; [latex]66.5[/latex]; [latex]66.5[/latex]; [latex]66.5[/latex]; [latex]67[/latex]; [latex]67[/latex]; [latex]67[/latex]; [latex]67[/latex]; [latex]67[/latex]; [latex]67[/latex]; [latex]67[/latex]; [latex]67[/latex]; [latex]67[/latex]; [latex]67[/latex]; [latex]67[/latex]; [latex]67[/latex]; [latex]67.5[/latex]; [latex]67.5[/latex]; [latex]67.5[/latex]; [latex]67.5[/latex]; [latex]67.5[/latex]; [latex]67.5[/latex]; [latex]67.5[/latex]

[latex]68[/latex]; [latex]68[/latex]; [latex]69[/latex]; [latex]69[/latex]; [latex]69[/latex]; [latex]69[/latex]; [latex]69[/latex]; [latex]69[/latex]; [latex]69[/latex]; [latex]69[/latex]; [latex]69[/latex]; [latex]69[/latex]; [latex]69.5[/latex]; [latex]69.5[/latex]; [latex]69.5[/latex]; [latex]69.5[/latex]; [latex]69.5[/latex]

[latex]70[/latex]; [latex]70[/latex]; [latex]70[/latex]; [latex]70[/latex]; [latex]70[/latex]; [latex]70[/latex]; [latex]70.5[/latex]; [latex]70.5[/latex]; [latex]70.5[/latex]; [latex]71[/latex]; [latex]71[/latex]; [latex]71[/latex]

[latex]72[/latex]; [latex]72[/latex]; [latex]72[/latex]; [latex]72.5[/latex]; [latex]72.5[/latex]; [latex]73[/latex]; [latex]73.5[/latex]; [latex]74[/latex]

The smallest data value is [latex]60[/latex]. Since the data with the most decimal places has one decimal (for instance, [latex]61.5[/latex]), we want our starting point to have two decimal places. Since the numbers [latex]0.5[/latex], [latex]0.05[/latex], [latex]0.005[/latex], etc. are convenient numbers, use [latex]0.05[/latex] and subtract it from [latex]60[/latex], the smallest value, for the convenient starting point.

[latex]60 - 0.05 = 59.95[/latex], which is more precise than, say, [latex]61.5[/latex] by one decimal place. The starting point is, then, [latex]59.95[/latex].

The largest value is [latex]74[/latex], so [latex]74 + 0.05 = 74.05[/latex] is the ending value.

Next, calculate the width of each bar or class interval. To calculate this width, subtract the starting point from the ending value and divide by the number of bars (you must choose the number of bars you desire). Suppose you choose eight bars.

[latex]\displaystyle\frac{{{74.05}-{59.95}}}{{8}}\approx{1.76}[/latex]

Note

We will round up to two and make each bar or class interval two units wide. Rounding up to two is one way to prevent a value from falling on a boundary. Rounding to the next number is often necessary even if it goes against the standard rules of rounding. For this example, using [latex]1.76[/latex] as the width would also work. A guideline that is followed by some for the width of a bar or class interval is to take the square root of the number of data values and then round to the nearest whole number, if necessary. For example, if there are [latex]150[/latex] values of data, take the square root of [latex]150[/latex] and round to [latex]12[/latex] bars or intervals.
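The width calculation and the square-root guideline can be checked with a short Python sketch. The names class_width and suggested_bars are our own, chosen for illustration only.

```python
import math
from decimal import Decimal


def class_width(start, end, bars):
    """Return the exact width and the width rounded up to the next whole number."""
    raw = (Decimal(end) - Decimal(start)) / bars
    return raw, math.ceil(raw)


def suggested_bars(n):
    """Square-root guideline: round the square root of the number of data values."""
    return round(math.sqrt(n))


raw, rounded_up = class_width("59.95", "74.05", 8)
print(raw)                  # exact quotient before rounding
print(rounded_up)           # width after rounding up
print(suggested_bars(150))  # guideline for 150 data values
```

The exact quotient is [latex]1.7625[/latex], which the text reports as [latex]1.76[/latex] and then rounds up to two.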

The boundaries are:

  • [latex]59.95[/latex]
  • [latex]59.95 + 2 = 61.95[/latex]
  • [latex]61.95 + 2 = 63.95[/latex]
  • [latex]63.95 + 2 = 65.95[/latex]
  • [latex]65.95 + 2 = 67.95[/latex]
  • [latex]67.95 + 2 = 69.95[/latex]
  • [latex]69.95 + 2 = 71.95[/latex]
  • [latex]71.95 + 2 = 73.95[/latex]
  • [latex]73.95 + 2 = 75.95[/latex]
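The list of boundaries above comes from adding the width repeatedly to the starting point. A minimal sketch, assuming a hypothetical helper named class_boundaries:

```python
from decimal import Decimal


def class_boundaries(start, width, bars):
    """Return the bars + 1 boundaries obtained by stepping `width` from `start`."""
    start, width = Decimal(start), Decimal(width)
    return [start + i * width for i in range(bars + 1)]


# Starting point 59.95, width 2, eight bars.
boundaries = class_boundaries("59.95", "2", 8)
print([str(b) for b in boundaries])
```

Eight bars need nine boundaries, so the last boundary, [latex]75.95[/latex], lies above the largest data value.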

The heights [latex]60[/latex] through [latex]61.5[/latex] inches are in the interval [latex]59.95–61.95[/latex]. The heights that are [latex]63.5[/latex] are in the interval [latex]61.95–63.95[/latex]. The heights that are [latex]64[/latex] through [latex]64.5[/latex] are in the interval [latex]63.95–65.95[/latex]. The heights [latex]66[/latex] through [latex]67.5[/latex] are in the interval [latex]65.95–67.95[/latex]. The heights [latex]68[/latex] through [latex]69.5[/latex] are in the interval [latex]67.95–69.95[/latex]. The heights [latex]70[/latex] through [latex]71[/latex] are in the interval [latex]69.95–71.95[/latex]. The heights [latex]72[/latex] through [latex]73.5[/latex] are in the interval [latex]71.95–73.95[/latex]. The height [latex]74[/latex] is in the interval [latex]73.95–75.95[/latex].
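To see how the heights fall into these intervals, the sketch below tallies the data and converts each count to a relative frequency. The (height, count) pairs were tallied by hand from the data list above, so they are worth rechecking against that list; the variable names are our own.

```python
from bisect import bisect_right
from decimal import Decimal

# (height, count) pairs tallied by hand from the data list (an assumption to verify).
tally = [
    ("60", 1), ("60.5", 1), ("61", 2), ("61.5", 1), ("63.5", 3),
    ("64", 7), ("64.5", 8), ("66", 10), ("66.5", 11), ("67", 12),
    ("67.5", 7), ("68", 2), ("69", 10), ("69.5", 5), ("70", 6),
    ("70.5", 3), ("71", 3), ("72", 3), ("72.5", 2), ("73", 1),
    ("73.5", 1), ("74", 1),
]

# Nine boundaries for eight classes of width 2, starting at 59.95.
boundaries = [Decimal("59.95") + 2 * i for i in range(9)]

counts = [0] * 8
for height, count in tally:
    # bisect_right gives the position of the first boundary above the height.
    index = bisect_right(boundaries, Decimal(height)) - 1
    counts[index] += count

n = sum(counts)
relative = [Decimal(c) / n for c in counts]

for i in range(8):
    print(f"{boundaries[i]} to {boundaries[i + 1]}: f = {counts[i]}, RF = {relative[i]}")
```

Because every boundary has two decimal places and every height has at most one, no height can land exactly on a boundary.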

The following histogram displays the heights on the [latex]x[/latex]-axis and relative frequency on the [latex]y[/latex]-axis.

Histogram consists of 8 bars with the y-axis in increments of 0.05 from 0-0.4 and the x-axis in intervals of 2 from 59.95-75.95.

Try It

The following data are the shoe sizes of [latex]50[/latex] male students. The sizes are continuous data since shoe size is measured. Construct a histogram and calculate the width of each bar or class interval. Suppose you choose six bars.

[latex]9[/latex]; [latex]9[/latex]; [latex]9.5[/latex]; [latex]9.5[/latex]; [latex]10[/latex]; [latex]10[/latex]; [latex]10[/latex]; [latex]10[/latex]; [latex]10[/latex]; [latex]10[/latex]; [latex]10.5[/latex]; [latex]10.5[/latex]; [latex]10.5[/latex]; [latex]10.5[/latex]; [latex]10.5[/latex]; [latex]10.5[/latex]; [latex]10.5[/latex]; [latex]10.5[/latex]

[latex]11[/latex]; [latex]11[/latex]; [latex]11[/latex]; [latex]11[/latex]; [latex]11[/latex]; [latex]11[/latex]; [latex]11[/latex]; [latex]11[/latex]; [latex]11[/latex]; [latex]11[/latex]; [latex]11[/latex]; [latex]11[/latex]; [latex]11[/latex]; [latex]11.5[/latex]; [latex]11.5[/latex]; [latex]11.5[/latex]; [latex]11.5[/latex]; [latex]11.5[/latex]; [latex]11.5[/latex]; [latex]11.5[/latex]

[latex]12[/latex]; [latex]12[/latex]; [latex]12[/latex]; [latex]12[/latex]; [latex]12[/latex]; [latex]12[/latex]; [latex]12[/latex]; [latex]12.5[/latex]; [latex]12.5[/latex]; [latex]12.5[/latex]; [latex]12.5[/latex]; [latex]14[/latex]

Example

Create a histogram for the following data: the number of books bought by 50 part-time college students at ABC College. The number of books is discrete data, since books are counted.

[latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]

[latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]

[latex]3[/latex]; [latex]3[/latex]; [latex]3[/latex]; [latex]3[/latex]; [latex]3[/latex]; [latex]3[/latex]; [latex]3[/latex]; [latex]3[/latex]; [latex]3[/latex]; [latex]3[/latex]; [latex]3[/latex]; [latex]3[/latex]; [latex]3[/latex]; [latex]3[/latex]; [latex]3[/latex]; [latex]3[/latex]

[latex]4[/latex]; [latex]4[/latex]; [latex]4[/latex]; [latex]4[/latex]; [latex]4[/latex]; [latex]4[/latex]

[latex]5[/latex]; [latex]5[/latex]; [latex]5[/latex]; [latex]5[/latex]; [latex]5[/latex]

[latex]6[/latex]; [latex]6[/latex]

Eleven students buy one book. Ten students buy two books. Sixteen students buy three books. Six students buy four books. Five students buy five books. Two students buy six books.

Because the data are integers, subtract [latex]0.5[/latex] from [latex]1[/latex], the smallest data value, and add [latex]0.5[/latex] to [latex]6[/latex], the largest data value. Then the starting point is [latex]0.5[/latex] and the ending value is [latex]6.5[/latex].

Next, calculate the width of each bar or class interval. If the data are discrete and there are not too many different values, a width that places the data values in the middle of the bar or class interval is the most convenient. Since the data consist of the numbers [latex]1[/latex], [latex]2[/latex], [latex]3[/latex], [latex]4[/latex], [latex]5[/latex], [latex]6[/latex], and the starting point is [latex]0.5[/latex], a width of one places the [latex]1[/latex] in the middle of the interval from [latex]0.5[/latex] to [latex]1.5[/latex], the [latex]2[/latex] in the middle of the interval from [latex]1.5[/latex] to [latex]2.5[/latex], the [latex]3[/latex] in the middle of the interval from [latex]2.5[/latex] to [latex]3.5[/latex], the [latex]4[/latex] in the middle of the interval from _______ to _______, the [latex]5[/latex] in the middle of the interval from _______ to _______, and the _______ in the middle of the interval from _______ to _______ .

Calculate the number of bars as follows:

[latex]\displaystyle\frac{{{6.5}-{0.5}}}{{\text{number of bars}}}={1}[/latex]

where [latex]1[/latex] is the width of a bar. Therefore, the number of bars is [latex]6[/latex].

The following histogram displays the number of books on the [latex]x[/latex]-axis and the frequency on the [latex]y[/latex]-axis.

Histogram consists of 6 bars with the y-axis in increments of 2 from 0-16 and the x-axis in intervals of 1 from 0.5-6.5.
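For readers working without a graphing calculator, the frequencies and a rough text version of this histogram can be produced in Python. This is only a sketch: the list books is rebuilt from the counts stated in the example, and the row of # symbols is our own stand-in for the bars.

```python
from collections import Counter

# Rebuild the 50 data values from the counts given in the example.
books = [1] * 11 + [2] * 10 + [3] * 16 + [4] * 6 + [5] * 5 + [6] * 2

frequency = Counter(books)

# Number of bars = (ending value - starting point) / width.
start, end, width = 0.5, 6.5, 1
bars = int((end - start) / width)
print(f"number of bars: {bars}")

# One row per bar; each # stands for one student.
for value in sorted(frequency):
    print(f"{value} | {'#' * frequency[value]} ({frequency[value]})")
```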

USING THE TI-83, 83+, 84, 84+ CALCULATOR

Create the histogram for the previous example.

  • Press Y=. Press CLEAR to delete any equations.
  • Press STAT 1:EDIT. If L1 has data in it, arrow up into the name L1, press CLEAR and then arrow down. If necessary, do the same for L2.
  • Into L1, enter [latex]1[/latex], [latex]2[/latex], [latex]3[/latex], [latex]4[/latex], [latex]5[/latex], [latex]6[/latex].
  • Into L2, enter [latex]11[/latex], [latex]10[/latex], [latex]16[/latex], [latex]6[/latex], [latex]5[/latex], [latex]2[/latex].
  • Press WINDOW. Set Xmin = [latex].5[/latex], Xmax = [latex]6.5[/latex], Xscl = [latex](6.5 - .5)/6[/latex], Ymin = [latex]-1[/latex], Ymax = [latex]20[/latex], Yscl = [latex]1[/latex], Xres = [latex]1[/latex].
  • Press 2nd Y=. Start by pressing 4:Plotsoff ENTER.
  • Press 2nd Y=. Press 1:Plot1. Press ENTER. Arrow down to TYPE. Arrow to the 3rd picture (histogram). Press ENTER.
  • Arrow down to Xlist: Enter L1 (2nd 1). Arrow down to Freq. Enter L2 (2nd 2).
  • Press GRAPH.
  • Use the TRACE key and the arrow keys to examine the histogram.

Try It

The following data are the number of sports played by 50 student athletes. The number of sports is discrete data since sports are counted.

[latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]; [latex]1[/latex]

[latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]2[/latex]

[latex]3[/latex]; [latex]3[/latex]; [latex]3[/latex]; [latex]3[/latex]; [latex]3[/latex]; [latex]3[/latex]; [latex]3[/latex]; [latex]3[/latex]

Twenty student athletes play one sport. Twenty-two student athletes play two sports. Eight student athletes play three sports.

Fill in the blanks for the following sentence. Since the data consist of the numbers [latex]1[/latex], [latex]2[/latex], [latex]3[/latex], and the starting point is [latex]0.5[/latex], a width of one places the [latex]1[/latex] in the middle of the interval [latex]0.5[/latex] to _____, the [latex]2[/latex] in the middle of the interval from _____ to _____, and the [latex]3[/latex] in the middle of the interval from _____ to _____.

Example

Using this data set, construct a histogram.

Number of Hours My Classmates Spent Playing Video Games on Weekends
[latex]9.95[/latex] [latex]10[/latex] [latex]2.25[/latex] [latex]16.75[/latex] [latex]0[/latex]
[latex]19.5[/latex] [latex]22.5[/latex] [latex]7.5[/latex] [latex]15[/latex] [latex]12.75[/latex]
[latex]5.5[/latex] [latex]11[/latex] [latex]10[/latex] [latex]20.75[/latex] [latex]17.5[/latex]
[latex]23[/latex] [latex]21.9[/latex] [latex]24[/latex] [latex]23.75[/latex] [latex]18[/latex]
[latex]20[/latex] [latex]15[/latex] [latex]22.9[/latex] [latex]18.8[/latex] [latex]20.5[/latex]

Try It

The following data represent the number of employees at various restaurants in New York City. Using this data, create a histogram.
[latex]22[/latex]; [latex]35[/latex]; [latex]15[/latex]; [latex]26[/latex]; [latex]40[/latex]; [latex]28[/latex]; [latex]18[/latex]; [latex]20[/latex]; [latex]25[/latex]; [latex]34[/latex]; [latex]39[/latex]; [latex]42[/latex]; [latex]24[/latex]; [latex]22[/latex]; [latex]19[/latex]; [latex]27[/latex]; [latex]22[/latex]; [latex]34[/latex]; [latex]40[/latex]; [latex]20[/latex]; [latex]38[/latex]; and [latex]28[/latex]
Use [latex]10–19[/latex] as the first interval.

COLLABORATIVE EXERCISE

Count the money (bills and change) in your pocket or purse. Your instructor will record the amounts. As a class, construct a histogram displaying the data. Discuss how many intervals you think is appropriate. You may want to experiment with the number of intervals.