{"id":74,"date":"2017-04-15T03:16:01","date_gmt":"2017-04-15T03:16:01","guid":{"rendered":"https:\/\/courses.lumenlearning.com\/conceptstest1\/chapter\/histograms-4-of-4\/"},"modified":"2017-05-28T00:13:48","modified_gmt":"2017-05-28T00:13:48","slug":"histograms-4-of-4","status":"web-only","type":"chapter","link":"https:\/\/courses.lumenlearning.com\/atd-herkimer-statisticssocsci\/chapter\/histograms-4-of-4\/","title":{"raw":"Histograms (4 of 4)","rendered":"Histograms (4 of 4)"},"content":{"raw":"&nbsp;\r\n<div class=\"textbox learning-objectives\">\r\n<h3>Learning Objectives<\/h3>\r\n<ul>\r\n \t<li>Describe the distribution of quantitative data using a histogram.<\/li>\r\n<\/ul>\r\n<\/div>\r\nWe now use histograms to compare the distributions of a quantitative variable for two groups of individuals. Previously, we did a similar comparison using dotplots. As before, our descriptions focus on the overall pattern (shape, center, and spread) as well as deviations from the pattern (outliers). We also use percentages to describe and compare different intervals of variable values, since histograms make it easy to do so.\r\n<div class=\"textbox examples\">\r\n<h3>Example<\/h3>\r\n<h2>Smoking and Birth Weight<\/h2>\r\nDoes smoking during pregnancy have an impact on birth weight? To investigate this question, doctors collected data on 189 new mothers who gave birth at a hospital in Massachusetts during the 1980s.\r\n\r\nHere we use histograms to compare the distribution of birth weights for mothers who smoked during pregnancy with mothers who did not smoke. The table shows the numbers of mothers with babies in each interval of birth weights. (Left endpoints are included in the bin, so a 1,000-gram baby is in the interval 1,000\u20131,500 grams.)\r\n\r\nNote: For easy and more accurate visual comparisons, both histograms have the same horizontal scale and bin width. Also, the scale on the vertical axis is the same. 
So we can directly compare the heights of the bars to compare the number of mothers with babies in each interval of birth weights.\r\n\r\n<img class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15031559\/m2_summarizing_data_topic_2_1_Topic2_1Histograms4of4_image1.png\" alt=\"Histograms showing birth weights of babies born to smoking and non-smoking mothers. Non smokers' columns skew to the left, and smokers' columns skew to the right\" width=\"484\" height=\"180\" \/>\r\n\r\nFollowing are some observations about the shape, center, and spread:\r\n\r\n<strong>Nonsmokers: <\/strong>The distribution of birth weights for the nonsmokers appears skewed slightly to the left. We estimate that birth weights for this group fall between approximately 1,000 and 5,000 grams for an overall range of approximately 4,000 grams. For nonsmokers, nearly half of the babies have a birth weight between 3,000 and 4,000 grams (29\u00a0+\u00a027\u00a0=\u00a056, 56\/115\u00a0=\u00a048.7%) with fewer babies in the lower weight ranges.\r\n\r\n<strong>Smokers: <\/strong>The distribution of birth weights for the smokers appears slightly skewed to the right. We estimate the birth weights for this group fall between approximately 500 and 4,500 grams for an overall range of approximately 4,000 grams. For smokers, nearly half of the babies have a birth weight between 2,000 and 3,000 grams (16\u00a0+\u00a022\u00a0=\u00a038, 38\u00a0\/\u00a074\u00a0=\u00a051%) with fewer babies in heavier weight ranges.\r\n\r\nComment: As we have seen, the choice of bin width can affect the shape of a histogram. We also cannot make precise statements about center and spread because our sense of \u201ctypical\u201d range is also affected by the choice of bin width.\r\n\r\nAnother strategy for comparing distributions is to use a <strong>benchmark<\/strong>. 
Here are some examples:\r\n<ol>\r\n \t<li>Doctors define <em>low birth weight<\/em> as a birth weight below 2,500 grams. Calculate and compare the percentage of smokers and nonsmokers with low-birth-weight babies by this definition. Nonsmokers: Of babies born to mothers who did not smoke, 3\u00a0+\u00a08\u00a0+\u00a018\u00a0=\u00a029 weighed less than 2,500 grams, so 25.2% (29 of 115) of the babies born to nonsmokers fit the definition of low birth weight. Smokers: Of babies born to mothers who smoked, 1\u00a0+\u00a01\u00a0+\u00a06\u00a0+\u00a022\u00a0=\u00a030 weighed less than 2,500 grams, so 40.5% (30 of 74) of the babies born to smokers fit the definition of low birth weight.<\/li>\r\n \t<li>A condition called <em>macrosomia<\/em> (also known as big baby syndrome) is defined as a birth weight of 4,000 grams or more. Calculate and compare the percentage of smokers and nonsmokers with babies that fit the definition of macrosomia. Nonsmokers: Of babies born to mothers who did not smoke, 6\u00a0+\u00a02\u00a0=\u00a08 weighed 4,000 grams or more, so 7.0% (8 of 115) of the babies born to nonsmokers fit the definition of macrosomia. Smokers: Of babies born to mothers who smoked, only 1 weighed 4,000 grams or more, so 1.4% (1 of 74) of the babies born to smokers fit the definition of macrosomia.<\/li>\r\n<\/ol>\r\n<strong>Now we synthesize these observations into a paragraph.<\/strong>\r\n\r\nTip: Be sure to emphasize the comparison of the groups. Develop a thesis statement if appropriate.\r\n\r\nIn this observational study, we compared mothers who smoked during pregnancy to mothers who did not smoke during pregnancy. The variable is the birth weights of their babies. Both groups had a lot of variability in birth weights, with identical overall range estimates of 4,000 grams.\r\n\r\nThere was also a lot of overlap in the distributions. Nonsmokers had babies that weighed between approximately 1,000 and 5,000 grams. 
Smokers had babies that weighed between approximately 500 and 4,500 grams.\r\n\r\nHowever, we also observe some important differences in the typical ranges of birth weights for the two groups. For nonsmokers, nearly half of the babies have a birth weight between 3,000 and 4,000 grams (56 of 115, 48.7%) with fewer babies in the lower weight ranges. For smokers, nearly half of the babies have a birth weight between 2,000 and 3,000 grams (38 of 74, 51%) with fewer babies in heavier weight ranges.\r\n\r\nIf we use the medical definition of low birth weight (under 2,500 grams), we see that smokers in this study have a much higher incidence of low birth weights: 25.2% (29 of 115) of the babies born to nonsmokers fit the definition of low birth weight, compared to 40.5% (30 of 74) of the babies born to smokers. In this study, smoking is associated with lower birth weights, though the variability in the data suggests that other variables also contribute to birth weight.\r\n\r\n<\/div>\r\n<div class=\"textbox exercises\">\r\n<h3>Learn By Doing<\/h3>\r\nhttps:\/\/assessments.lumenlearning.com\/assessments\/3838\r\n\r\n<\/div>\r\n&nbsp;\r\n\r\n&nbsp;\r\n<h3><strong>Let\u2019s Summarize<\/strong><\/h3>\r\nIn \"Distributions for Quantitative Data,\" we focused on describing the <em>distribution of a quantitative variable<\/em>.\r\n<ul>\r\n \t<li>In a graph that summarizes the distribution of a quantitative variable, we can see\r\n<ul>\r\n \t<li>the possible values of the variable.<\/li>\r\n \t<li>the number of individuals with each variable value or interval of values.<\/li>\r\n<\/ul>\r\n<\/li>\r\n \t<li>To analyze the distribution of a quantitative variable, we described the <em>overall pattern of the data<\/em> (shape, center, spread), and any <em>deviations from the pattern<\/em> (outliers).\r\n<ul>\r\n \t<li>We described the <em>shape <\/em>of a distribution as left-skewed, right-skewed, symmetric with a central peak (bell-shaped), or uniform. 
Not all distributions have a simple shape that fits into one of these categories.<\/li>\r\n \t<li>The <em>center <\/em>of a distribution is a typical value that represents the group. We discuss ways to identify the center of a distribution in \"Measures of Center.\"<\/li>\r\n \t<li>The <em>spread <\/em>of a distribution is a description of how the data varies. One measurement of spread is the overall range of the data (largest value \u2013 smallest value). We also looked at a typical range of values. We discuss ways to identify a typical range in \"Quantifying Variability Relative to the Median\" and \"Quantifying Variability Relative to the Mean.\"<\/li>\r\n \t<li><em>Outliers <\/em>are data points that fall outside the overall pattern of the distribution.<\/li>\r\n<\/ul>\r\n<\/li>\r\n \t<li>We used two types of graphs to analyze the distribution of a quantitative variable:\r\n<ul>\r\n \t<li>Dotplots<\/li>\r\n \t<li>Histograms<\/li>\r\n<\/ul>\r\n<\/li>\r\n \t<li>Following are some observations about <em>dotplots:<\/em>\r\n<ul>\r\n \t<li>Individual variable values are visible, particularly when the data set is small.<\/li>\r\n \t<li>Descriptions of shape, center, and spread are not affected by how the dotplot is constructed.<\/li>\r\n \t<li>We can accurately calculate the overall range (largest value \u2013 smallest value).<\/li>\r\n<\/ul>\r\n<\/li>\r\n \t<li>Following are some observations about <em>histograms:<\/em>\r\n<ul>\r\n \t<li>Individual variable values are not visible.<\/li>\r\n \t<li>Grouping individuals into bins of equal-sized intervals is particularly useful when analyzing large data sets.<\/li>\r\n \t<li>We can easily use percentages, also called relative frequencies, to describe the distribution.<\/li>\r\n \t<li>Descriptions of shape, center, and spread are affected by how the bins are defined.<\/li>\r\n<\/ul>\r\n<\/li>\r\n \t<li>How do we decide when to use a dotplot and when to use a histogram? There are no rules here. 
Each type of graph can highlight different aspects of the data.<\/li>\r\n<\/ul>\r\n<h3><\/h3>","rendered":"<p>&nbsp;<\/p>\n<div class=\"textbox learning-objectives\">\n<h3>Learning Objectives<\/h3>\n<ul>\n<li>Describe the distribution of quantitative data using a histogram.<\/li>\n<\/ul>\n<\/div>\n<p>We now use histograms to compare the distributions of a quantitative variable for two groups of individuals. Previously, we did a similar comparison using dotplots. As before, our descriptions focus on the overall pattern (shape, center, and spread) as well as deviations from the pattern (outliers). We also use percentages to describe and compare different intervals of variable values, since histograms make it easy to do so.<\/p>\n<div class=\"textbox examples\">\n<h3>Example<\/h3>\n<h2>Smoking and Birth Weight<\/h2>\n<p>Does smoking during pregnancy have an impact on birth weight? To investigate this question, doctors collected data on 189 new mothers who gave birth at a hospital in Massachusetts during the 1980s.<\/p>\n<p>Here we use histograms to compare the distribution of birth weights for mothers who smoked during pregnancy with mothers who did not smoke. The table shows the numbers of mothers with babies in each interval of birth weights. (Left endpoints are included in the bin, so a 1,000-gram baby is in the interval 1,000\u20131,500 grams.)<\/p>\n<p>Note: For easy and more accurate visual comparisons, both histograms have the same horizontal scale and bin width. Also, the scale on the vertical axis is the same. 
So we can directly compare the heights of the bars to compare the number of mothers with babies in each interval of birth weights.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15031559\/m2_summarizing_data_topic_2_1_Topic2_1Histograms4of4_image1.png\" alt=\"Histograms showing birth weights of babies born to smoking and non-smoking mothers. Non smokers' columns skew to the left, and smokers' columns skew to the right\" width=\"484\" height=\"180\" \/><\/p>\n<p>Following are some observations about the shape, center, and spread:<\/p>\n<p><strong>Nonsmokers: <\/strong>The distribution of birth weights for the nonsmokers appears skewed slightly to the left. We estimate that birth weights for this group fall between approximately 1,000 and 5,000 grams for an overall range of approximately 4,000 grams. For nonsmokers, nearly half of the babies have a birth weight between 3,000 and 4,000 grams (29\u00a0+\u00a027\u00a0=\u00a056, 56\/115\u00a0=\u00a048.7%) with fewer babies in the lower weight ranges.<\/p>\n<p><strong>Smokers: <\/strong>The distribution of birth weights for the smokers appears slightly skewed to the right. We estimate the birth weights for this group fall between approximately 500 and 4,500 grams for an overall range of approximately 4,000 grams. For smokers, nearly half of the babies have a birth weight between 2,000 and 3,000 grams (16\u00a0+\u00a022\u00a0=\u00a038, 38\u00a0\/\u00a074\u00a0=\u00a051%) with fewer babies in heavier weight ranges.<\/p>\n<p>Comment: As we have seen, the choice of bin width can affect the shape of a histogram. We also cannot make precise statements about center and spread because our sense of \u201ctypical\u201d range is also affected by the choice of bin width.<\/p>\n<p>Another strategy for comparing distributions is to use a <strong>benchmark<\/strong>. 
Here are some examples:<\/p>\n<ol>\n<li>Doctors define <em>low birth weight<\/em> as a birth weight below 2,500 grams. Calculate and compare the percentage of smokers and nonsmokers with low-birth-weight babies by this definition. Nonsmokers: Of babies born to mothers who did not smoke, 3\u00a0+\u00a08\u00a0+\u00a018\u00a0=\u00a029 weighed less than 2,500 grams, so 25.2% (29 of 115) of the babies born to nonsmokers fit the definition of low birth weight. Smokers: Of babies born to mothers who smoked, 1\u00a0+\u00a01\u00a0+\u00a06\u00a0+\u00a022\u00a0=\u00a030 weighed less than 2,500 grams, so 40.5% (30 of 74) of the babies born to smokers fit the definition of low birth weight.<\/li>\n<li>A condition called <em>macrosomia<\/em> (also known as big baby syndrome) is defined as a birth weight of 4,000 grams or more. Calculate and compare the percentage of smokers and nonsmokers with babies that fit the definition of macrosomia. Nonsmokers: Of babies born to mothers who did not smoke, 6\u00a0+\u00a02\u00a0=\u00a08 weighed 4,000 grams or more, so 7.0% (8 of 115) of the babies born to nonsmokers fit the definition of macrosomia. Smokers: Of babies born to mothers who smoked, only 1 weighed 4,000 grams or more, so 1.4% (1 of 74) of the babies born to smokers fit the definition of macrosomia.<\/li>\n<\/ol>\n<p><strong>Now we synthesize these observations into a paragraph.<\/strong><\/p>\n<p>Tip: Be sure to emphasize the comparison of the groups. Develop a thesis statement if appropriate.<\/p>\n<p>In this observational study, we compared mothers who smoked during pregnancy to mothers who did not smoke during pregnancy. The variable is the birth weights of their babies. Both groups had a lot of variability in birth weights, with identical overall range estimates of 4,000 grams.<\/p>\n<p>There was also a lot of overlap in the distributions. Nonsmokers had babies that weighed between approximately 1,000 and 5,000 grams. 
Smokers had babies that weighed between approximately 500 and 4,500 grams.<\/p>\n<p>However, we also observe some important differences in the typical ranges of birth weights for the two groups. For nonsmokers, nearly half of the babies have a birth weight between 3,000 and 4,000 grams (56 of 115, 48.7%) with fewer babies in the lower weight ranges. For smokers, nearly half of the babies have a birth weight between 2,000 and 3,000 grams (38 of 74, 51%) with fewer babies in heavier weight ranges.<\/p>\n<p>If we use the medical definition of low birth weight (under 2,500 grams), we see that smokers in this study have a much higher incidence of low birth weights: 25.2% (29 of 115) of the babies born to nonsmokers fit the definition of low birth weight, compared to 40.5% (30 of 74) of the babies born to smokers. In this study, smoking is associated with lower birth weights, though the variability in the data suggests that other variables also contribute to birth weight.<\/p>\n<\/div>\n<div class=\"textbox exercises\">\n<h3>Learn By Doing<\/h3>\n<p>\t<iframe id=\"lumen_assessment_3838\" class=\"resizable\" src=\"https:\/\/assessments.lumenlearning.com\/assessments\/load?assessment_id=3838&#38;embed=1&#38;external_user_id=&#38;external_context_id=&#38;iframe_resize_id=lumen_assessment_3838\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:400px;\"><br \/>\n\t<\/iframe><\/p>\n<\/div>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<h3><strong>Let\u2019s Summarize<\/strong><\/h3>\n<p>In &#8220;Distributions for Quantitative Data,&#8221; we focused on describing the <em>distribution of a quantitative variable<\/em>.<\/p>\n<ul>\n<li>In a graph that summarizes the distribution of a quantitative variable, we can see\n<ul>\n<li>the possible values of the variable.<\/li>\n<li>the number of individuals with each variable value or interval of values.<\/li>\n<\/ul>\n<\/li>\n<li>To analyze the distribution of a quantitative variable, we described the <em>overall pattern 
of the data<\/em> (shape, center, spread), and any <em>deviations from the pattern<\/em> (outliers).\n<ul>\n<li>We described the <em>shape <\/em>of a distribution as left-skewed, right-skewed, symmetric with a central peak (bell-shaped), or uniform. Not all distributions have a simple shape that fits into one of these categories.<\/li>\n<li>The <em>center <\/em>of a distribution is a typical value that represents the group. We discuss ways to identify the center of a distribution in &#8220;Measures of Center.&#8221;<\/li>\n<li>The <em>spread <\/em>of a distribution is a description of how the data varies. One measurement of spread is the overall range of the data (largest value \u2013 smallest value). We also looked at a typical range of values. We discuss ways to identify a typical range in &#8220;Quantifying Variability Relative to the Median&#8221; and &#8220;Quantifying Variability Relative to the Mean.&#8221;<\/li>\n<li><em>Outliers <\/em>are data points that fall outside the overall pattern of the distribution.<\/li>\n<\/ul>\n<\/li>\n<li>We used two types of graphs to analyze the distribution of a quantitative variable:\n<ul>\n<li>Dotplots<\/li>\n<li>Histograms<\/li>\n<\/ul>\n<\/li>\n<li>Following are some observations about <em>dotplots:<\/em>\n<ul>\n<li>Individual variable values are visible, particularly when the data set is small.<\/li>\n<li>Descriptions of shape, center, and spread are not affected by how the dotplot is constructed.<\/li>\n<li>We can accurately calculate the overall range (largest value \u2013 smallest value).<\/li>\n<\/ul>\n<\/li>\n<li>Following are some observations about <em>histograms:<\/em>\n<ul>\n<li>Individual variable values are not visible.<\/li>\n<li>Grouping individuals into bins of equal-sized intervals is particularly useful when analyzing large data sets.<\/li>\n<li>We can easily use percentages, also called relative frequencies, to describe the distribution.<\/li>\n<li>Descriptions of shape, center, and spread are affected 
by how the bins are defined.<\/li>\n<\/ul>\n<\/li>\n<li>How do we decide when to use a dotplot and when to use a histogram? There are no rules here. Each type of graph can highlight different aspects of the data.<\/li>\n<\/ul>\n<h3><\/h3>\n\n\t\t\t <section class=\"citations-section\" role=\"contentinfo\">\n\t\t\t <h3>Candela Citations<\/h3>\n\t\t\t\t\t <div>\n\t\t\t\t\t\t <div id=\"citation-list-74\">\n\t\t\t\t\t\t\t <div class=\"licensing\"><div class=\"license-attribution-dropdown-subheading\">CC licensed content, Shared previously<\/div><ul class=\"citation-list\"><li>Concepts in Statistics. <strong>Provided by<\/strong>: Open Learning Initiative. <strong>Located at<\/strong>: <a target=\"_blank\" href=\"http:\/\/oli.cmu.edu\">http:\/\/oli.cmu.edu<\/a>. <strong>License<\/strong>: <em><a target=\"_blank\" rel=\"license\" href=\"https:\/\/creativecommons.org\/licenses\/by\/4.0\/\">CC BY: Attribution<\/a><\/em><\/li><\/ul><\/div>\n\t\t\t\t\t\t <\/div>\n\t\t\t\t\t <\/div>\n\t\t\t <\/section>","protected":false},"author":163,"menu_order":11,"template":"","meta":{"_candela_citation":"[{\"type\":\"cc\",\"description\":\"Concepts in Statistics\",\"author\":\"\",\"organization\":\"Open Learning 
Initiative\",\"url\":\"http:\/\/oli.cmu.edu\",\"project\":\"\",\"license\":\"cc-by\",\"license_terms\":\"\"}]","CANDELA_OUTCOMES_GUID":"3880a53a-a158-489e-9e22-6ab866dde55f","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-74","chapter","type-chapter","status-web-only","hentry"],"part":43,"_links":{"self":[{"href":"https:\/\/courses.lumenlearning.com\/atd-herkimer-statisticssocsci\/wp-json\/pressbooks\/v2\/chapters\/74","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/courses.lumenlearning.com\/atd-herkimer-statisticssocsci\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/courses.lumenlearning.com\/atd-herkimer-statisticssocsci\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/atd-herkimer-statisticssocsci\/wp-json\/wp\/v2\/users\/163"}],"version-history":[{"count":5,"href":"https:\/\/courses.lumenlearning.com\/atd-herkimer-statisticssocsci\/wp-json\/pressbooks\/v2\/chapters\/74\/revisions"}],"predecessor-version":[{"id":1319,"href":"https:\/\/courses.lumenlearning.com\/atd-herkimer-statisticssocsci\/wp-json\/pressbooks\/v2\/chapters\/74\/revisions\/1319"}],"part":[{"href":"https:\/\/courses.lumenlearning.com\/atd-herkimer-statisticssocsci\/wp-json\/pressbooks\/v2\/parts\/43"}],"metadata":[{"href":"https:\/\/courses.lumenlearning.com\/atd-herkimer-statisticssocsci\/wp-json\/pressbooks\/v2\/chapters\/74\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/courses.lumenlearning.com\/atd-herkimer-statisticssocsci\/wp-json\/wp\/v2\/media?parent=74"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/atd-herkimer-statisticssocsci\/wp-json\/pressbooks\/v2\/chapter-type?post=74"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/atd-herkimer-statisticssocsci\/wp-json\/wp\/v2\/c
ontributor?post=74"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/atd-herkimer-statisticssocsci\/wp-json\/wp\/v2\/license?post=74"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}