{"id":218,"date":"2017-04-15T03:19:39","date_gmt":"2017-04-15T03:19:39","guid":{"rendered":"https:\/\/courses.lumenlearning.com\/conceptstest1\/chapter\/assessing-the-fit-of-a-line-3-of-4\/"},"modified":"2022-08-01T16:04:36","modified_gmt":"2022-08-01T16:04:36","slug":"assessing-the-fit-of-a-line-3-of-4","status":"publish","type":"chapter","link":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/chapter\/assessing-the-fit-of-a-line-3-of-4\/","title":{"raw":"Assessing the Fit of a Line (3 of 4)","rendered":"Assessing the Fit of a Line (3 of 4)"},"content":{"raw":"<div class=\"textbox learning-objectives\">\r\n<h3>Learning OUTCOMES<\/h3>\r\n<ul>\r\n \t<li>Use residuals, standard error, and <em>r<\/em><sup>2<\/sup> to assess the fit of a linear model.<\/li>\r\n<\/ul>\r\n<\/div>\r\n<h2>Introduction<\/h2>\r\nHere we continue our discussion of the question, <em>How good is the best-fit line?<\/em>\r\n\r\nLet\u2019s summarize what we have done so far to address this question. We began by looking at how the predictions from the least-squares regression line compare to observed data. We defined a residual to be the amount of error in a prediction. Next, we created residual plots. A residual plot with no pattern reassures us that our linear model is a good summary of the data.\r\n\r\nBut how do we know if the explanatory variable we chose is really the best predictor of the response variable?\r\n\r\nThe regression line does not take into account other variables that might also be good predictors. So let\u2019s investigate the question, <em>What proportion of the variation in the response variable does our regression line explain?<\/em>\r\n\r\nWe begin our investigation with a scatterplot of the daily high temperature (\u00b0F) in New York City from January 1 to June 1. We have 4 years of data (2002, 2003, 2005, and 2006). 
The least-squares regression line has the equation <em>y<\/em> = 36.29 + 0.25<em>x<\/em>, where <em>x<\/em> is the number of days after January 1. Therefore, January 1 corresponds to <em>x<\/em> = 0, and June 1 corresponds to <em>x<\/em> = 151.\r\n\r\n<img src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15031936\/m3_examining_relationships_topic_3_3_asses_fit_line_assessingfitline3a.gif\" alt=\"Scatterplot of New York City temperatures\" \/>\r\n\r\nTwo things stand out as we look at this picture. First, we see a clear, positive linear relationship tracked by the regression line. As the days progress, there is an associated increase in temperature. Second, we see a substantial scattering of points around the regression line. We are looking at 4 years of data, and we see a lot of variation in temperature, so the day of the year explains only part of that variation. Other variables also influence the temperature, but the line accounts only for the relationship between the day of the year and temperature.\r\n\r\nNow we ask the question, <em>Given the natural variation in temperature, what proportion of that variation does our linear model explain?<\/em>\r\n\r\nThe answer, which is surprisingly easy to calculate, is just the square of the correlation coefficient.\r\n\r\n<strong>The value of <em>r<\/em><sup>2<\/sup> is the proportion of the variation in the response variable that is explained by the least-squares regression line.<\/strong>\r\n\r\nIn the present case, we have <em>r<\/em> = 0.73; therefore, [latex]\\frac{\\text{explained variation}}{\\text{total variation}}=0.73^{2}\\approx 0.53[\/latex]. And so we say that our linear regression model explains 53% of the total variation in the response variable. 
Consequently, 47% of the total variation remains unexplained.\r\n<div class=\"textbox exercises\">\r\n<h3>Example<\/h3>\r\n<h2>Highway Sign Visibility<\/h2>\r\n<img class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15031938\/m3_examining_relationships_topic_3_3_asses_fit_line_linear19a.gif\" alt=\"Scatterplot of driver age and highway sign reading distance\" width=\"406\" height=\"308\" \/>\r\n\r\nRecall that the least-squares regression line is Distance = 576 \u2212 3 * Age. The correlation coefficient for the highway sign data set is \u22120.793, so <em>r<\/em><sup>2<\/sup> = (\u22120.793)<sup>2<\/sup> \u2248 0.63.\r\n\r\nOur linear model uses age to predict the maximum distance at which a driver can read a highway sign. Other variables may also influence reading distance. We can say that the linear relationship between age and maximum reading distance accounts for 63% of the variation in maximum reading distance.\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Try It<\/h3>\r\nhttps:\/\/assess.lumenlearning.com\/practice\/7bcccb8e-95bc-434a-bde5-1b290ed0536e\r\n\r\nhttps:\/\/assess.lumenlearning.com\/practice\/9307fd2d-543d-4ece-9429-84599150ba8d\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Try It<\/h3>\r\nhttps:\/\/assess.lumenlearning.com\/practice\/2af85796-a92c-4462-9d46-08a70f1a0dac\r\n\r\n<\/div>"
,"rendered":"<div class=\"textbox learning-objectives\">\n<h3>Learning OUTCOMES<\/h3>\n<ul>\n<li>Use residuals, standard error, and <em>r<\/em><sup>2<\/sup> to assess the fit of a linear model.<\/li>\n<\/ul>\n<\/div>\n<h2>Introduction<\/h2>\n<p>Here we continue our discussion of the question, <em>How good is the best-fit line?<\/em><\/p>\n<p>Let\u2019s summarize what we have done so far to address this question. We began by looking at how the predictions from the least-squares regression line compare to observed data. We defined a residual to be the amount of error in a prediction. Next, we created residual plots. A residual plot with no pattern reassures us that our linear model is a good summary of the data.<\/p>\n<p>But how do we know if the explanatory variable we chose is really the best predictor of the response variable?<\/p>\n<p>The regression line does not take into account other variables that might also be good predictors. So let\u2019s investigate the question, <em>What proportion of the variation in the response variable does our regression line explain?<\/em><\/p>\n<p>We begin our investigation with a scatterplot of the daily high temperature (\u00b0F) in New York City from January 1 to June 1. We have 4 years of data (2002, 2003, 2005, and 2006). The least-squares regression line has the equation <em>y<\/em> = 36.29 + 0.25<em>x<\/em>, where <em>x<\/em> is the number of days after January 1. 
Therefore, January 1 corresponds to <em>x<\/em> = 0, and June 1 corresponds to <em>x<\/em> = 151.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15031936\/m3_examining_relationships_topic_3_3_asses_fit_line_assessingfitline3a.gif\" alt=\"Scatterplot of New York City temperatures\" \/><\/p>\n<p>Two things stand out as we look at this picture. First, we see a clear, positive linear relationship tracked by the regression line. As the days progress, there is an associated increase in temperature. Second, we see a substantial scattering of points around the regression line. We are looking at 4 years of data, and we see a lot of variation in temperature, so the day of the year explains only part of that variation. Other variables also influence the temperature, but the line accounts only for the relationship between the day of the year and temperature.<\/p>\n<p>Now we ask the question, <em>Given the natural variation in temperature, what proportion of that variation does our linear model explain?<\/em><\/p>\n<p>The answer, which is surprisingly easy to calculate, is just the square of the correlation coefficient.<\/p>\n<p><strong>The value of <em>r<\/em><sup>2<\/sup> is the proportion of the variation in the response variable that is explained by the least-squares regression line.<\/strong><\/p>\n<p>In the present case, we have <em>r<\/em> = 0.73; therefore, [latex]\\frac{\\text{explained variation}}{\\text{total variation}}=0.73^{2}\\approx 0.53[\/latex]. And so we say that our linear regression model explains 53% of the total variation in the response variable. 
Consequently, 47% of the total variation remains unexplained.<\/p>\n<div class=\"textbox exercises\">\n<h3>Example<\/h3>\n<h2>Highway Sign Visibility<\/h2>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15031938\/m3_examining_relationships_topic_3_3_asses_fit_line_linear19a.gif\" alt=\"Scatterplot of driver age and highway sign reading distance\" width=\"406\" height=\"308\" \/><\/p>\n<p>Recall that the least-squares regression line is Distance = 576 \u2212 3 * Age. The correlation coefficient for the highway sign data set is \u22120.793, so <em>r<\/em><sup>2<\/sup> = (\u22120.793)<sup>2<\/sup> \u2248 0.63.<\/p>\n<p>Our linear model uses age to predict the maximum distance at which a driver can read a highway sign. Other variables may also influence reading distance. We can say that the linear relationship between age and maximum reading distance accounts for 63% of the variation in maximum reading distance.<\/p>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>Try It<\/h3>\n<p>\t<iframe id=\"assessment_practice_7bcccb8e-95bc-434a-bde5-1b290ed0536e\" class=\"resizable\" src=\"https:\/\/assess.lumenlearning.com\/practice\/7bcccb8e-95bc-434a-bde5-1b290ed0536e?iframe_resize_id=assessment_practice_id_7bcccb8e-95bc-434a-bde5-1b290ed0536e\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:300px;\"><br \/>\n\t<\/iframe><\/p>\n<p>\t<iframe id=\"assessment_practice_9307fd2d-543d-4ece-9429-84599150ba8d\" class=\"resizable\" src=\"https:\/\/assess.lumenlearning.com\/practice\/9307fd2d-543d-4ece-9429-84599150ba8d?iframe_resize_id=assessment_practice_id_9307fd2d-543d-4ece-9429-84599150ba8d\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:300px;\"><br \/>\n\t<\/iframe><\/p>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>Try It<\/h3>\n<p>\t<iframe id=\"assessment_practice_2af85796-a92c-4462-9d46-08a70f1a0dac\" 
class=\"resizable\" src=\"https:\/\/assess.lumenlearning.com\/practice\/2af85796-a92c-4462-9d46-08a70f1a0dac?iframe_resize_id=assessment_practice_id_2af85796-a92c-4462-9d46-08a70f1a0dac\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:300px;\"><br \/>\n\t<\/iframe><\/p>\n<\/div>\n\n\t\t\t <section class=\"citations-section\" role=\"contentinfo\">\n\t\t\t <h3>Candela Citations<\/h3>\n\t\t\t\t\t <div>\n\t\t\t\t\t\t <div id=\"citation-list-218\">\n\t\t\t\t\t\t\t <div class=\"licensing\"><div class=\"license-attribution-dropdown-subheading\">CC licensed content, Shared previously<\/div><ul class=\"citation-list\"><li>Concepts in Statistics. <strong>Provided by<\/strong>: Open Learning Initiative. <strong>Located at<\/strong>: <a target=\"_blank\" href=\"http:\/\/oli.cmu.edu\">http:\/\/oli.cmu.edu<\/a>. 
<strong>License<\/strong>: <em><a target=\"_blank\" rel=\"license\" href=\"https:\/\/creativecommons.org\/licenses\/by\/4.0\/\">CC BY: Attribution<\/a><\/em><\/li><\/ul><\/div>\n\t\t\t\t\t\t <\/div>\n\t\t\t\t\t <\/div>\n\t\t\t <\/section>","protected":false},"author":163,"menu_order":24,"template":"","meta":{"_candela_citation":"[{\"type\":\"cc\",\"description\":\"Concepts in Statistics\",\"author\":\"\",\"organization\":\"Open Learning Initiative\",\"url\":\"http:\/\/oli.cmu.edu\",\"project\":\"\",\"license\":\"cc-by\",\"license_terms\":\"\"}]","CANDELA_OUTCOMES_GUID":"1bbe606f-ad84-4a6e-ad4e-3bef550b5ba7, 597f0232-291a-44c0-ba94-3d78d316f2fe","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-218","chapter","type-chapter","status-publish","hentry"],"part":140,"_links":{"self":[{"href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters\/218","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/wp\/v2\/users\/163"}],"version-history":[{"count":7,"href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters\/218\/revisions"}],"predecessor-version":[{"id":2740,"href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters\/218\/revisions\/2740"}],"part":[{"href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/pressbooks\/v2\/parts\/140"}],"metadata":[{"href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters\/218\/metadata\/"}],"wp:attachment":[{"
href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/wp\/v2\/media?parent=218"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapter-type?post=218"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/wp\/v2\/contributor?post=218"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/wp\/v2\/license?post=218"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}