{"id":101,"date":"2017-04-15T03:16:38","date_gmt":"2017-04-15T03:16:38","guid":{"rendered":"https:\/\/courses.lumenlearning.com\/conceptstest1\/chapter\/interquartile-range-and-boxplots-3-of-3\/"},"modified":"2017-05-28T00:34:17","modified_gmt":"2017-05-28T00:34:17","slug":"interquartile-range-and-boxplots-3-of-3","status":"web-only","type":"chapter","link":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/chapter\/interquartile-range-and-boxplots-3-of-3\/","title":{"raw":"Interquartile Range and Boxplots (3 of 3)","rendered":"Interquartile Range and Boxplots (3 of 3)"},"content":{"raw":"&nbsp;\r\n<div class=\"textbox learning-objectives\">\r\n<h3>Learning Objectives<\/h3>\r\n<ul>\r\n \t<li>Use a five-number summary and a boxplot to describe a distribution.<\/li>\r\n<\/ul>\r\n<\/div>\r\n<h3>Comparing Distributions with Side-by-Side Boxplots<\/h3>\r\nIn the next two examples, we again use boxplots to compare two distributions. This time we focus on writing a description of the two distributions. We practiced writing descriptions in the earlier section, \"Distributions for Quantitative Data,\" using dotplots and histograms. Now we use boxplots. As before, we describe shape, center, spread, and outliers. But now we use the five-number summary to make our descriptions more precise.\r\n<div class=\"textbox examples\">\r\n<h3>Example<\/h3>\r\n<h2>Best Actor\/Actress Oscar Winners<\/h2>\r\nSo far we have examined the age distributions of Oscar winners for males and females separately.\r\n\r\nIt will be interesting to <em>compare<\/em> the age distributions of actors and actresses who won best acting Oscars. 
To do that, we look at side-by-side boxplots of the age distributions by gender.\r\n\r\n<img class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15031635\/m2_summarizing_data_topic_2_3_boxplot3_boxplot7.gif\" alt=\"Side by side comparative boxplots comparing age distributions of Oscar winning actors and actresses\" width=\"417\" height=\"347\" \/>\r\n<ul>\r\n \t<li>Actors: Min = 31, Q1 = 37.75, M = 42.5, Q3 = 48.75, Max = 76<\/li>\r\n \t<li>Actresses: Min = 21, Q1 = 32, M = 35, Q3 = 41.5, Max = 80<\/li>\r\n<\/ul>\r\nBased on the graph and numerical measures, we can make the following comparison between the two distributions:\r\n\r\nNote: A good summary compares the two distributions using shape, center, spread, and outliers. Let\u2019s begin with observations about these characteristics of the distributions.\r\n\r\n<strong>Shape<\/strong>: The shape of a distribution can be hard to determine from the boxplot, but we can compare the variability in the upper half of the data (Max \u2212 Median) to the variability in the lower half of the data (Median \u2212 Min) to get a sense of shape. For the men, the distribution appears skewed to the right because the lower half of the data has less variability than the upper half. The lower half of the data has a range of 11.5 years (42.5 \u2212 31), compared to the upper half of the data with a range of 33.5 years (76 \u2212 42.5). The distribution for women also appears right-skewed. The lower half of the data has a range of 14 years (35 \u2212 21), compared to a range of 45 years for the upper half of the data (80 \u2212 35). In both cases, the shape suggests that the Oscar is awarded to younger actors and actresses.\r\n\r\n<strong>Center<\/strong>: Actresses tend to win the Oscar at a younger age than do actors. The median age for females (35) is lower than for the males (42.5). 
Note also that the third quartile of the females\u2019 distribution (41.5) is lower than the median age for males. This tells us that only 25% of the actresses were 41.5 years old or older when they won the Oscar, compared to 50% of the actors who were 42.5 years old or older.\r\n\r\n<strong>Spread<\/strong>: Not only do actresses win at a younger age, but the Oscar is awarded more consistently to younger actresses, as we can see by comparing the interquartile ranges. There is less variability in the middle half of the actresses\u2019 ages (IQR = 9.5) than in the actors\u2019 ages (IQR = 11). On the other hand, the actresses have more variability in their overall ages (range = 59) compared to the actors (range = 45).\r\n\r\n<strong>Outliers<\/strong>: Both distributions have outliers. There is only one high outlier in the actors\u2019 distribution (76, Henry Fonda, On Golden Pond), compared with three high outliers in the actresses\u2019 distribution.\r\n\r\n<em>Now let\u2019s pull these observations together into a paragraph. A good paragraph compares the two distributions and uses observations about the distributions to support a central thesis.<\/em>\r\n\r\nIn general, actresses win a best acting Oscar at a younger age than do actors. The median age for actresses is 35, compared to 42.5 for actors. Not only do actresses win at a younger age, but the Oscar is also awarded more consistently to younger actresses, as seen when we compare the interquartile ranges. There is less variability in the middle half of the actresses\u2019 ages (IQR = 9.5) than in the actors\u2019 ages (IQR = 11). Both distributions have older winners that are outliers. These older winners are unusual and skew the distribution of ages to the right.\r\n\r\n<\/div>\r\n<div class=\"textbox examples\">\r\n<h3>Example<\/h3>\r\n<h2>Temperature of Pittsburgh vs. 
San Francisco<\/h2>\r\nTo compare the average high temperatures of Pittsburgh to those of San Francisco, we look at the following side-by-side boxplots and supplement the graph with the descriptive statistics of each of the two distributions.\r\n\r\n<img class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15031637\/m2_summarizing_data_topic_2_3_boxplot3_boxplot8.gif\" alt=\"Side by side boxplots of Pittsburgh's and San Francisco's average high temperatures over three quarters. Pittsburgh has a higher variable range of temperatures.\" width=\"344\" height=\"372\" \/>\r\n\r\nWhen looking at the graph, the similarities and differences between the two distributions are striking. Both distributions have roughly the same center (medians are 61.4 for Pittsburgh and 62.7 for San Francisco). However, the temperatures in Pittsburgh have a much larger variability than the temperatures in San Francisco (Range: 49 vs. 12; IQR: 36.5 vs. 5).\r\n\r\nThe practical interpretation of the results we obtained is that the weather in San Francisco is much more consistent than the weather in Pittsburgh, which varies a lot during the year. Also, because the temperatures in San Francisco vary so little during the year, knowing that the median temperature is around 63 is actually very informative. On the other hand, knowing that the median temperature in Pittsburgh is around 61 is practically useless, since temperatures vary so much during the year and can get much warmer or much colder than in San Francisco.\r\n\r\nNote that this example provides more intuition about variability by interpreting small variability as consistency and large variability as lack of consistency. Also, through this example, we learned that the center of the distribution is more meaningful as a typical value for the distribution when there is little variability (or, as statisticians say, little \u201cnoise\u201d) around it. 
When there is large variability, the center loses its practical meaning as a typical value.\r\n\r\n<\/div>\r\n&nbsp;\r\n<h3><strong>Let\u2019s Summarize<\/strong><\/h3>\r\n<ul>\r\n \t<li>The range measures the variability of a distribution by looking at the interval covered by <em>all<\/em> the data. The IQR measures the variability of a distribution by giving us the interval covered by the <em>middle<\/em> 50% of the data.<\/li>\r\n \t<li>The five-number summary of a distribution consists of the minimum, quartile 1, median, quartile 3, and maximum.<\/li>\r\n \t<li>The IQR is the measure of spread we should use when using the median to measure center.<\/li>\r\n \t<li>When using the median and IQR to measure center and spread, a data point is considered an outlier if it satisfies one of the following conditions.\r\n<ul>\r\n \t<li>More than 1.5 IQRs greater than Q3 (i.e., the value is greater than Q3 + 1.5 * IQR).<\/li>\r\n \t<li>More than 1.5 IQRs less than Q1 (i.e., the value is less than Q1 - 1.5 * IQR).<\/li>\r\n<\/ul>\r\n<\/li>\r\n<\/ul>\r\n<ul>\r\n \t<li>The boxplot is a graphical representation of a data set. It displays the five-number summary and highlights any points that are considered outliers (using the 1.5 * IQR rule described in the previous bullet).<\/li>\r\n \t<li>Side-by-side boxplots are commonly used to compare two data sets.<\/li>\r\n<\/ul>\r\n<h3><\/h3>","rendered":"<p>&nbsp;<\/p>\n<div class=\"textbox learning-objectives\">\n<h3>Learning Objectives<\/h3>\n<ul>\n<li>Use a five-number summary and a boxplot to describe a distribution.<\/li>\n<\/ul>\n<\/div>\n<h3>Comparing Distributions with Side-by-Side Boxplots<\/h3>\n<p>In the next two examples, we again use boxplots to compare two distributions. This time we focus on writing a description of the two distributions. We practiced writing descriptions in the earlier section, &#8220;Distributions for Quantitative Data,&#8221; using dotplots and histograms. Now we use boxplots. 
As before, we describe shape, center, spread, and outliers. But now we use the five-number summary to make our descriptions more precise.<\/p>\n<div class=\"textbox examples\">\n<h3>Example<\/h3>\n<h2>Best Actor\/Actress Oscar Winners<\/h2>\n<p>So far we have examined the age distributions of Oscar winners for males and females separately.<\/p>\n<p>It will be interesting to <em>compare<\/em> the age distributions of actors and actresses who won best acting Oscars. To do that, we look at side-by-side boxplots of the age distributions by gender.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15031635\/m2_summarizing_data_topic_2_3_boxplot3_boxplot7.gif\" alt=\"Side by side comparative boxplots comparing age distributions of Oscar winning actors and actresses\" width=\"417\" height=\"347\" \/><\/p>\n<ul>\n<li>Actors: Min = 31, Q1 = 37.75, M = 42.5, Q3 = 48.75, Max = 76<\/li>\n<li>Actresses: Min = 21, Q1 = 32, M = 35, Q3 = 41.5, Max = 80<\/li>\n<\/ul>\n<p>Based on the graph and numerical measures, we can make the following comparison between the two distributions:<\/p>\n<p>Note: A good summary compares the two distributions using shape, center, spread, and outliers. Let\u2019s begin with observations about these characteristics of the distributions.<\/p>\n<p><strong>Shape<\/strong>: The shape of a distribution can be hard to determine from the boxplot, but we can compare the variability in the upper half of the data (Max \u2212 Median) to the variability in the lower half of the data (Median \u2212 Min) to get a sense of shape. For the men, the distribution appears skewed to the right because the lower half of the data has less variability than the upper half. The lower half of the data has a range of 11.5 years (42.5 \u2212 31), compared to the upper half of the data with a range of 33.5 years (76 \u2212 42.5). 
The distribution for women also appears right-skewed. The lower half of the data has a range of 14 years (35 \u2212 21), compared to a range of 45 years for the upper half of the data (80 \u2212 35). In both cases, the shape suggests that the Oscar is awarded to younger actors and actresses.<\/p>\n<p><strong>Center<\/strong>: Actresses tend to win the Oscar at a younger age than do actors. The median age for females (35) is lower than for the males (42.5). Note also that the third quartile of the females\u2019 distribution (41.5) is lower than the median age for males. This tells us that only 25% of the actresses were 41.5 years old or older when they won the Oscar, compared to 50% of the actors who were 42.5 years old or older.<\/p>\n<p><strong>Spread<\/strong>: Not only do actresses win at a younger age, but the Oscar is awarded more consistently to younger actresses, as we can see by comparing the interquartile ranges. There is less variability in the middle half of the actresses\u2019 ages (IQR = 9.5) than in the actors\u2019 ages (IQR = 11). On the other hand, the actresses have more variability in their overall ages (range = 59) compared to the actors (range = 45).<\/p>\n<p><strong>Outliers<\/strong>: Both distributions have outliers. There is only one high outlier in the actors\u2019 distribution (76, Henry Fonda, On Golden Pond), compared with three high outliers in the actresses\u2019 distribution.<\/p>\n<p><em>Now let\u2019s pull these observations together into a paragraph. A good paragraph compares the two distributions and uses observations about the distributions to support a central thesis.<\/em><\/p>\n<p>In general, actresses win a best acting Oscar at a younger age than do actors. The median age for actresses is 35, compared to 42.5 for actors. Not only do actresses win at a younger age, but the Oscar is also awarded more consistently to younger actresses, as seen when we compare the interquartile ranges. 
There is less variability in the middle half of the actresses\u2019 ages (IQR = 9.5) than in the actors\u2019 ages (IQR = 11). Both distributions have older winners that are outliers. These older winners are unusual and skew the distribution of ages to the right.<\/p>\n<\/div>\n<div class=\"textbox examples\">\n<h3>Example<\/h3>\n<h2>Temperature of Pittsburgh vs. San Francisco<\/h2>\n<p>To compare the average high temperatures of Pittsburgh to those of San Francisco, we look at the following side-by-side boxplots and supplement the graph with the descriptive statistics of each of the two distributions.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15031637\/m2_summarizing_data_topic_2_3_boxplot3_boxplot8.gif\" alt=\"Side by side boxplots of Pittsburgh's and San Francisco's average high temperatures over three quarters. Pittsburgh has a higher variable range of temperatures.\" width=\"344\" height=\"372\" \/><\/p>\n<p>When looking at the graph, the similarities and differences between the two distributions are striking. Both distributions have roughly the same center (medians are 61.4 for Pittsburgh and 62.7 for San Francisco). However, the temperatures in Pittsburgh have a much larger variability than the temperatures in San Francisco (Range: 49 vs. 12; IQR: 36.5 vs. 5).<\/p>\n<p>The practical interpretation of the results we obtained is that the weather in San Francisco is much more consistent than the weather in Pittsburgh, which varies a lot during the year. Also, because the temperatures in San Francisco vary so little during the year, knowing that the median temperature is around 63 is actually very informative. 
On the other hand, knowing that the median temperature in Pittsburgh is around 61 is practically useless, since temperatures vary so much during the year and can get much warmer or much colder than in San Francisco.<\/p>\n<p>Note that this example provides more intuition about variability by interpreting small variability as consistency and large variability as lack of consistency. Also, through this example, we learned that the center of the distribution is more meaningful as a typical value for the distribution when there is little variability (or, as statisticians say, little \u201cnoise\u201d) around it. When there is large variability, the center loses its practical meaning as a typical value.<\/p>\n<\/div>\n<p>&nbsp;<\/p>\n<h3><strong>Let\u2019s Summarize<\/strong><\/h3>\n<ul>\n<li>The range measures the variability of a distribution by looking at the interval covered by <em>all<\/em> the data. The IQR measures the variability of a distribution by giving us the interval covered by the <em>middle<\/em> 50% of the data.<\/li>\n<li>The five-number summary of a distribution consists of the minimum, quartile 1, median, quartile 3, and maximum.<\/li>\n<li>The IQR is the measure of spread we should use when using the median to measure center.<\/li>\n<li>When using the median and IQR to measure center and spread, a data point is considered an outlier if it satisfies one of the following conditions.\n<ul>\n<li>More than 1.5 IQRs greater than Q3 (i.e., the value is greater than Q3 + 1.5 * IQR).<\/li>\n<li>More than 1.5 IQRs less than Q1 (i.e., the value is less than Q1 &#8211; 1.5 * IQR).<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<ul>\n<li>The boxplot is a graphical representation of a data set. 
It displays the five-number summary and highlights any points that are considered outliers (using the 1.5 * IQR rule described in the previous bullet).<\/li>\n<li>Side-by-side boxplots are commonly used to compare two data sets.<\/li>\n<\/ul>\n<h3><\/h3>\n\n\t\t\t <section class=\"citations-section\" role=\"contentinfo\">\n\t\t\t <h3>Candela Citations<\/h3>\n\t\t\t\t\t <div>\n\t\t\t\t\t\t <div id=\"citation-list-101\">\n\t\t\t\t\t\t\t <div class=\"licensing\"><div class=\"license-attribution-dropdown-subheading\">CC licensed content, Shared previously<\/div><ul class=\"citation-list\"><li>Concepts in Statistics. <strong>Provided by<\/strong>: Open Learning Initiative. <strong>Located at<\/strong>: <a target=\"_blank\" href=\"http:\/\/oli.cmu.edu\">http:\/\/oli.cmu.edu<\/a>. <strong>License<\/strong>: <em><a target=\"_blank\" rel=\"license\" href=\"https:\/\/creativecommons.org\/licenses\/by\/4.0\/\">CC BY: Attribution<\/a><\/em><\/li><\/ul><\/div>\n\t\t\t\t\t\t <\/div>\n\t\t\t\t\t <\/div>\n\t\t\t <\/section>","protected":false},"author":163,"menu_order":18,"template":"","meta":{"_candela_citation":"[{\"type\":\"cc\",\"description\":\"Concepts in Statistics\",\"author\":\"\",\"organization\":\"Open Learning 
Initiative\",\"url\":\"http:\/\/oli.cmu.edu\",\"project\":\"\",\"license\":\"cc-by\",\"license_terms\":\"\"}]","CANDELA_OUTCOMES_GUID":"8cdda8d8-5533-43a7-9f34-87a8300df5aa","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-101","chapter","type-chapter","status-web-only","hentry"],"part":43,"_links":{"self":[{"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters\/101","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/wp-json\/wp\/v2\/users\/163"}],"version-history":[{"count":6,"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters\/101\/revisions"}],"predecessor-version":[{"id":1326,"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters\/101\/revisions\/1326"}],"part":[{"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/wp-json\/pressbooks\/v2\/parts\/43"}],"metadata":[{"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters\/101\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/wp-json\/wp\/v2\/media?parent=101"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapter-type?post=101"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm
-concepts-statistics\/wp-json\/wp\/v2\/contributor?post=101"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/wp-json\/wp\/v2\/license?post=101"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}