{"id":3866,"date":"2022-03-15T23:22:16","date_gmt":"2022-03-15T23:22:16","guid":{"rendered":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/?post_type=chapter&#038;p=3866"},"modified":"2022-06-02T00:17:43","modified_gmt":"2022-06-02T00:17:43","slug":"corequisite-support-activity-for-6-d","status":"publish","type":"chapter","link":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/chapter\/corequisite-support-activity-for-6-d\/","title":{"raw":"Corequisite Support Activity for 6.D: Using Residual Plots with a Linear Regression Model","rendered":"Corequisite Support Activity for 6.D: Using Residual Plots with a Linear Regression Model"},"content":{"raw":"<div class=\"textbox learning-objectives\">\r\n<h3>what you'll need to know<\/h3>\r\nIn this support activity, you\u2019ll become familiar with the following:\r\n<ul>\r\n \t<li>Explain how to determine whether a linear model is appropriate for a scatterplot of bivariate data.<\/li>\r\n \t<li>Explain what information is provided by [latex]r[\/latex].<\/li>\r\n \t<li>Explain what information is provided by [latex]R^{2}[\/latex].<\/li>\r\n \t<li>Use a scatterplot to relate a data point's residual to its location relative to, and distance from, the line of best fit.<\/li>\r\n \t<li>Interpret the sign and relative size of a residual based on the data point's location relative to the line of best fit.<\/li>\r\n<\/ul>\r\n<\/div>\r\nIn the next preview assignment and in the next class, you will need to compute and interpret residuals. Recall that a residual is the vertical difference between an individual data point in a scatterplot of bivariate data and the calculated line of best fit. 
In this corequisite support activity, you'll build a deeper understanding of how residuals are calculated and interpreted.\r\n<h2>Revisiting Residuals<\/h2>\r\nIn the activities that follow, we will consider when a line of best fit is an appropriate way to model the relationship between two variables. To make that decision, we\u2019ll need to examine the residuals, which you first saw mentioned in [WTK 6A]. Here, we\u2019ll look more closely at how to calculate residuals and interpret their meaning.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 1<\/h3>\r\nGiven a scatterplot of bivariate data, how could you tell whether a linear model is appropriate? Feel free to use sketches to illustrate your thought process.\r\n\r\n[reveal-answer q=\"347477\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"347477\"]What shape and spread do you usually see in data with a strong linear relationship?[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 2<\/h3>\r\nWhat information does [latex]r[\/latex] give you? What information does [latex]R^2[\/latex] give you?\r\n\r\n[reveal-answer q=\"254065\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"254065\"]Recall from the previous section what you learned about these. 
What two measures do [latex]r[\/latex] and [latex]R^{2}[\/latex] represent?[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 3<\/h3>\r\nWhat information is missing from the list you made in Question 2 that could help you decide whether the linear model is a good one?\r\n\r\n[reveal-answer q=\"874683\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"874683\"]Considering what [latex]r[\/latex] and [latex]R^{2}[\/latex] explicitly measure, what don't they tell you?[\/hidden-answer]\r\n\r\n<\/div>\r\nYou have learned about the correlation coefficient [latex]r[\/latex] and the coefficient of determination [latex]R^2[\/latex], which help us judge whether the line of best fit is a useful model and how well it fits the data. You've also seen that they don't give the full picture of a model's usefulness.\r\n\r\nAnother tool we have is the analysis of <strong>residuals<\/strong>. When we fit a line to the data, we want to know how closely the linear model\u2019s predictions match the observed data. 
The <strong>residual<\/strong> for a data point is the difference between the observed value of the response variable and the linear model\u2019s prediction.\r\n<p style=\"text-align: center;\">Residual = observed value \u2013 predicted value<\/p>\r\n<p style=\"text-align: center;\">Residual = [latex]y-\\hat{y}[\/latex]<\/p>\r\n\r\n<div class=\"textbox\">\r\n\r\n<strong>Vocabulary:<\/strong> The word \u201cresidual\u201d means \u201cleft over\u201d or \u201cremaining.\u201d In regression, you can think of the residual as the part of the observed value that is left over after the linear relationship between the explanatory variable and the response variable has been accounted for.\r\n\r\n<\/div>\r\nThe following statements express the same idea:\r\n<ul>\r\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">The residual is the <strong>difference between<\/strong> the observed value and the predicted value.<\/li>\r\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">The residual is the <strong>vertical distance<\/strong> between the observed value and the predicted value.<\/li>\r\n<\/ul>\r\nBoth statements tell you that to calculate the residual, you subtract the predicted value from the observed value.\r\n<div class=\"textbox exercises\">\r\n<h3>example<\/h3>\r\nThe goal of this example is to show how to calculate the residual for a data point from its location on a plot relative to the line of best fit. 
Use the questions below to gain familiarity with the process before answering Questions 4 - 8.\r\n\r\nRefer to the following scatterplot of bivariate data to answer the questions below.\r\n\r\n<img class=\"aligncenter wp-image-4809\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/03\/01233502\/Residuals_Plot2.jpg\" alt=\"A scatterplot showing 19 data points in a roughly linear configuration and a line of best fit with a y-intercept of approximately 10 and slope of approximately 3.5. A single data point is highlighted at approximately (8, 50).\" width=\"800\" height=\"439\" \/>\r\n\r\n&nbsp;\r\n\r\nLocate the highlighted data point, which lies above the line of best fit. It represents an actual observation in the data set. Use this point and the line of best fit to answer the following questions.\r\n<ol>\r\n \t<li>What is the value of the explanatory variable [latex]x[\/latex] for the data point highlighted on the plot?<\/li>\r\n \t<li>What is the value of the response variable [latex]y[\/latex] for the data point highlighted on the plot?<\/li>\r\n \t<li>Using the line of best fit, what appears visually to be the predicted value of the response variable\u00a0[latex]\\hat{y}[\/latex] for this value of the explanatory variable?<\/li>\r\n \t<li>If the equation of the line of best fit is [latex]\\hat{y}=10.02 + 3.48x[\/latex], what is the predicted value [latex]\\hat{y}[\/latex] for this value of the explanatory variable?<\/li>\r\n \t<li>Is the actual value of the response variable greater than or less than the predicted value?<\/li>\r\n \t<li>Locate a data point that lies below the line of best fit and estimate its [latex]\\left(x,y\\right)[\/latex] coordinates.<\/li>\r\n<\/ol>\r\nThe difference between the observed value of the response variable [latex]y[\/latex] and the predicted value of the response variable [latex]\\hat{y}[\/latex] is the\u00a0<strong>residual<\/strong> for that data point. 
Some data points lie above the line of best fit and some lie below it.\r\n<ol start=\"7\">\r\n \t<li>What is the residual for the highlighted data point in the scatterplot above?<\/li>\r\n \t<li>Calculate the residual for the point you chose below the line of best fit. What process did you follow?<\/li>\r\n \t<li>Was the residual for a point below the line of best fit negative or positive?<\/li>\r\n \t<li>Can you locate any points on the plot with a residual of zero?<\/li>\r\n<\/ol>\r\n[reveal-answer q=\"865170\"]Show Answer[\/reveal-answer]\r\n[hidden-answer a=\"865170\"]\r\n<ol>\r\n \t<li>8<\/li>\r\n \t<li>50<\/li>\r\n \t<li>38<\/li>\r\n \t<li>37.86<\/li>\r\n \t<li>50 is greater than the predicted value<\/li>\r\n \t<li>Some approximate examples include (0.5, 8), (1.2, 10.2), (1.8, 4), (4.5, 20), etc.<\/li>\r\n \t<li>About 12, since [latex]50 - 37.86 = 12.14[\/latex]<\/li>\r\n \t<li>Sample answer: From the point, I moved vertically to the line of best fit and then read across to the vertical axis to find the predicted value of the response variable [latex]\\hat{y}[\/latex]. Then I took the difference between the actual value [latex]y[\/latex] and the predicted value: [latex]y-\\hat{y}=\\text{residual}[\/latex]. For example, for the point (0.5, 8), the predicted value of the response variable appears to be about 11, so the residual is [latex]8 - 11 = -3[\/latex].<\/li>\r\n \t<li>The residual below the line of best fit was negative.<\/li>\r\n \t<li>Yes, approximately (6, 20) and (9, 42) are located on the line of best fit. In both cases, [latex]y-\\hat{y}\\approx 0[\/latex].<\/li>\r\n<\/ol>\r\n[\/hidden-answer]\r\n\r\n<\/div>\r\n<h3>Residuals and the Bad Drivers Dataset<\/h3>\r\nThroughout the rest of this corequisite support activity, we\u2019ll focus on the \u201cBad Drivers\u201d dataset again. This dataset reports information about car crashes and contains entries corresponding to all 50 states, as well as Washington, DC. 
In this case, we\u2019ll focus on two variables: losses (in dollars) incurred by insurance companies for collisions per insured driver and insurance premiums (in dollars).\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 4<\/h3>\r\nConsider this plot of a dataset that reports information about car crashes in the United States. Each data point represents one state (or Washington, DC), with losses incurred by insurance companies for collisions per insured driver on the horizontal axis and insurance premiums on the vertical axis. Both variables are in units of US dollars.\r\n\r\n<img class=\"alignnone wp-image-1284\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/12202542\/Picture168-300x146.png\" alt=\"A scatterplot with a regression line of best fit. The horizontal axis is labeled &quot;losses&quot; and the vertical axis is labeled &quot;insurance_premiums.&quot; One of the points is labeled &quot;state: New Jersey&quot; and is located at approximately (159, 1300). The equation for the line of best fit is also given as y = 285 + 4.47x.\" width=\"1276\" height=\"621\" \/>\r\n\r\nPart A: Using the plot above, what were the approximate losses incurred by insurance companies for collisions per insured driver in New Jersey?\r\n\r\n[reveal-answer q=\"505492\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"505492\"]Locate the labeled data point on the plot and its corresponding loss value.[\/hidden-answer]\r\n\r\n&nbsp;\r\n\r\nPart B: About how much do insurance premiums cost in New Jersey?\r\n\r\n[reveal-answer q=\"295037\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"295037\"]Locate the labeled data point on the plot and its corresponding insurance premium value.[\/hidden-answer]\r\n\r\n&nbsp;\r\n\r\nPart C: Based on the losses in New Jersey and the line of best fit, what is the predicted average cost of insurance premiums in New Jersey?\r\n\r\n[reveal-answer q=\"338074\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"338074\"]What premium does the line of best fit predict for New Jersey\u2019s losses?[\/hidden-answer]\r\n\r\n&nbsp;\r\n\r\nPart D: What is the residual for New Jersey?\r\n\r\n[reveal-answer q=\"995881\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"995881\"]What is the difference between the observed insurance premium and the predicted insurance premium?[\/hidden-answer]\r\n\r\n&nbsp;\r\n\r\nPart E: Fill in the blanks to interpret the residual for New Jersey.\r\n\r\nNew Jersey\u2019s actual ____________ is _________ (greater\/lower) than predicted.\r\n\r\n[reveal-answer q=\"715848\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"715848\"]What do <em>you<\/em> think? See the Example in the text above for guidance.[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 5<\/h3>\r\nIn words, describe the process for finding the residual.\r\n\r\n[reveal-answer q=\"21809\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"21809\"]Describe your process as you did in the Example above.[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 6<\/h3>\r\nThe following table lists information about four of the states in the dataset. Complete the table. Round to the nearest cent. Then, label the points corresponding to each of these states on the scatterplot that follows.\r\n\r\n[reveal-answer q=\"614119\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"614119\"]Use the process you described in the Example and in Question 5 above. 
Remember that some residuals can be negative.[\/hidden-answer]\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td><strong>State<\/strong><\/td>\r\n<td><strong>Losses ($)<\/strong><\/td>\r\n<td><strong>Observed Insurance Premiums ($)<\/strong><\/td>\r\n<td><strong>Predicted Insurance Premiums ($)<\/strong><\/td>\r\n<td><strong>Residual ($)<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Louisiana<\/td>\r\n<td>$194.78<\/td>\r\n<td>$1,281.50<\/td>\r\n<td><\/td>\r\n<td><\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Idaho<\/td>\r\n<td>$82.75<\/td>\r\n<td>$642.00<\/td>\r\n<td><\/td>\r\n<td><\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Montana<\/td>\r\n<td>$85.15<\/td>\r\n<td>$816.20<\/td>\r\n<td><\/td>\r\n<td><\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Oklahoma<\/td>\r\n<td>$178.86<\/td>\r\n<td>$881.50<\/td>\r\n<td><\/td>\r\n<td><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<img class=\"alignnone wp-image-1285\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/12202548\/Picture169-300x149.png\" alt=\"A scatterplot with a regression line whose equation is labeled as y = 285 + 4.47x. The horizontal axis is labeled &quot;losses&quot; and the vertical axis is labeled &quot;insurance_premiums.&quot;\" width=\"1069\" height=\"531\" \/>\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 7<\/h3>\r\nFor a given state, what is the relationship between the sign of the residual and how the observed insurance premium compares to the predicted insurance premium?\r\n\r\n[reveal-answer q=\"294647\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"294647\"]What do <em>you\u00a0<\/em>think?[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 8<\/h3>\r\nOf the states in the table, which state has its data point closest to the line of best fit? 
How can you tell from the residual?\r\n\r\n[reveal-answer q=\"22978\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"22978\"]What do <em>you\u00a0<\/em>think?[\/hidden-answer]\r\n\r\n<\/div>\r\n<h3>Interpreting Residuals<\/h3>\r\nThere are several equivalent ways to describe a residual mathematically. Questions 9 - 12 below explore them.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 9<\/h3>\r\nIf the observed data point lies above the line of best fit, then:\r\n<p style=\"padding-left: 30px;\">a. the residual is <span style=\"text-decoration: underline;\">positive\/negative<\/span> (circle one);<\/p>\r\n<p style=\"padding-left: 30px;\">b. [latex]y &lt; \\text{ or } &gt; \\text{ or } = \\hat{y}[\/latex] (circle one);<\/p>\r\n<p style=\"padding-left: 30px;\">c. the predicted value of the response variable is greater\/less (circle one) than the observed value of the response variable; and<\/p>\r\n<p style=\"padding-left: 30px;\">d. [latex]y-\\hat{y} &lt; \\text{ or } &gt; \\text{ or } = 0[\/latex] (circle one).<\/p>\r\n[reveal-answer q=\"588151\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"588151\"]See the example above along with Question 7 for guidance. 
Recall also that [latex]y[\/latex] represents the actual value of the response variable and [latex]\\hat{y}[\/latex] represents the predicted value.[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 10<\/h3>\r\nIf the observed data point lies below the line of best fit, then:\r\n<p style=\"padding-left: 30px;\">a. the residual is <span style=\"text-decoration: underline;\">positive\/negative<\/span> (circle one);<\/p>\r\n<p style=\"padding-left: 30px;\">b. [latex]y &lt; \\text{ or } &gt; \\text{ or } = \\hat{y}[\/latex] (circle one);<\/p>\r\n<p style=\"padding-left: 30px;\">c. the predicted value of the response variable is greater\/less (circle one) than the observed value of the response variable; and<\/p>\r\n<p style=\"padding-left: 30px;\">d. [latex]y-\\hat{y} &lt; \\text{ or } &gt; \\text{ or } = 0[\/latex] (circle one).<\/p>\r\n[reveal-answer q=\"921619\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"921619\"]See the example above along with Question 7 for guidance. 
Recall also that [latex]y[\/latex] represents the actual value of the response variable and [latex]\\hat{y}[\/latex] represents the predicted value.[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 11<\/h3>\r\nIf the observed data point lies close to the line of best fit, then the residual is _______ zero.\r\n<ol type=\"a\">\r\n \t<li>Close to<\/li>\r\n \t<li>Far from<\/li>\r\n<\/ol>\r\n[reveal-answer q=\"573663\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"573663\"]See Question 8 for guidance.[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 12<\/h3>\r\nIf the observed data point lies far from the line of best fit, then the residual is _______ zero.\r\n<ol type=\"a\">\r\n \t<li>Close to<\/li>\r\n \t<li>Far from<\/li>\r\n<\/ol>\r\n[reveal-answer q=\"250275\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"250275\"]See Question 8 for guidance.[\/hidden-answer]\r\n\r\n<\/div>\r\nYou've now spent some time developing an understanding of how residuals are calculated and interpreted. 
It's time to move to the course materials in this section to put your understanding to good use!","rendered":"<div class=\"textbox learning-objectives\">\n<h3>what you&#8217;ll need to know<\/h3>\n<p>In this support activity, you\u2019ll become familiar with the following:<\/p>\n<ul>\n<li>Explain how to determine whether a linear model is appropriate for a scatterplot of bivariate data.<\/li>\n<li>Explain what information is provided by [latex]r[\/latex].<\/li>\n<li>Explain what information is provided by [latex]R^{2}[\/latex].<\/li>\n<li>Use a scatterplot to relate a data point&#8217;s residual to its location relative to, and distance from, the line of best fit.<\/li>\n<li>Interpret the sign and relative size of a residual based on the data point&#8217;s location relative to the line of best fit.<\/li>\n<\/ul>\n<\/div>\n<p>In the next preview assignment and in the next class, you will need to compute and interpret residuals. Recall that a residual is the vertical difference between an individual data point in a scatterplot of bivariate data and the calculated line of best fit. In this corequisite support activity, you&#8217;ll build a deeper understanding of how residuals are calculated and interpreted.<\/p>\n<h2>Revisiting Residuals<\/h2>\n<p>In the activities that follow, we will consider when a line of best fit is an appropriate way to model the relationship between two variables. To make that decision, we\u2019ll need to examine the residuals, which you first saw mentioned in [WTK 6A]. Here, we\u2019ll look more closely at how to calculate residuals and interpret their meaning.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>question 1<\/h3>\n<p>Given a scatterplot of bivariate data, how could you tell whether a linear model is appropriate? 
Feel free to use sketches to illustrate your thought process.<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q347477\">Hint<\/span><\/p>\n<div id=\"q347477\" class=\"hidden-answer\" style=\"display: none\">What shape and spread do you usually see in data with a strong linear relationship?<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 2<\/h3>\n<p>What information does [latex]r[\/latex] give you? What information does [latex]R^2[\/latex] give you?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q254065\">Hint<\/span><\/p>\n<div id=\"q254065\" class=\"hidden-answer\" style=\"display: none\">Recall from the previous section what you learned about these. What two measures do [latex]r[\/latex] and [latex]R^{2}[\/latex] represent?<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 3<\/h3>\n<p>What information is missing from the list you made in Question 2 that could help you decide whether the linear model is a good one?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q874683\">Hint<\/span><\/p>\n<div id=\"q874683\" class=\"hidden-answer\" style=\"display: none\">Considering what [latex]r[\/latex] and [latex]R^{2}[\/latex] explicitly measure, what don&#8217;t they tell you?<\/div>\n<\/div>\n<\/div>\n<p>You have learned about the correlation coefficient [latex]r[\/latex] and the coefficient of determination [latex]R^2[\/latex], which help us judge whether the line of best fit is a useful model and how well it fits the data. You&#8217;ve also seen that they don&#8217;t give the full picture of a model&#8217;s usefulness.<\/p>\n<p>Another tool we have is the analysis of <strong>residuals<\/strong>. 
When we fit a line to the data, we want to know how closely the linear model\u2019s predictions match the observed data. The <strong>residual<\/strong> for a data point is the difference between the observed value of the response variable and the linear model\u2019s prediction.<\/p>\n<p style=\"text-align: center;\">Residual = observed value \u2013 predicted value<\/p>\n<p style=\"text-align: center;\">Residual = [latex]y-\\hat{y}[\/latex]<\/p>\n<div class=\"textbox\">\n<p><strong>Vocabulary:<\/strong> The word \u201cresidual\u201d means \u201cleft over\u201d or \u201cremaining.\u201d In regression, you can think of the residual as the part of the observed value that is left over after the linear relationship between the explanatory variable and the response variable has been accounted for.<\/p>\n<\/div>\n<p>The following statements express the same idea:<\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\">The residual is the <strong>difference between<\/strong> the observed value and the predicted value.<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\">The residual is the <strong>vertical distance<\/strong> between the observed value and the predicted value.<\/li>\n<\/ul>\n<p>Both statements tell you that to calculate the residual, you subtract the predicted value from the observed value.<\/p>\n<div class=\"textbox exercises\">\n<h3>example<\/h3>\n<p>The goal of this example is to show how to calculate the residual for a data point from its location on a plot relative to the line of best fit. 
Use the questions below to gain familiarity with the process before answering Questions 4 &#8211; 8.<\/p>\n<p>Refer to the following scatterplot of bivariate data to answer the questions below.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-4809\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/03\/01233502\/Residuals_Plot2.jpg\" alt=\"A scatterplot showing 19 data points in a roughly linear configuration and a line of best fit with a y-intercept of approximately 10 and slope of approximately 3.5. A single data point is highlighted at approximately (8, 50).\" width=\"800\" height=\"439\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>Locate the highlighted data point, which lies above the line of best fit. It represents an actual observation in the data set. Use this point and the line of best fit to answer the following questions.<\/p>\n<ol>\n<li>What is the value of the explanatory variable [latex]x[\/latex] for the data point highlighted on the plot?<\/li>\n<li>What is the value of the response variable [latex]y[\/latex] for the data point highlighted on the plot?<\/li>\n<li>Using the line of best fit, what appears visually to be the predicted value of the response variable\u00a0[latex]\\hat{y}[\/latex] for this value of the explanatory variable?<\/li>\n<li>If the equation of the line of best fit is [latex]\\hat{y}=10.02 + 3.48x[\/latex], what is the predicted value [latex]\\hat{y}[\/latex] for this value of the explanatory variable?<\/li>\n<li>Is the actual value of the response variable greater than or less than the predicted value?<\/li>\n<li>Locate a data point that lies below the line of best fit and estimate its [latex]\\left(x,y\\right)[\/latex] coordinates.<\/li>\n<\/ol>\n<p>The difference between the observed value of the response variable [latex]y[\/latex] and the predicted value of the response variable [latex]\\hat{y}[\/latex] is the\u00a0<strong>residual<\/strong> for that data point. Some data points lie above the line of best fit and some lie below it.<\/p>\n<ol start=\"7\">\n<li>What is the residual for the highlighted data point in the scatterplot above?<\/li>\n<li>Calculate the residual for the point you chose below the line of best fit. What process did you follow?<\/li>\n<li>Was the residual for a point below the line of best fit negative or positive?<\/li>\n<li>Can you locate any points on the plot with a residual of zero?<\/li>\n<\/ol>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q865170\">Show Answer<\/span><\/p>\n<div id=\"q865170\" class=\"hidden-answer\" style=\"display: none\">\n<ol>\n<li>8<\/li>\n<li>50<\/li>\n<li>38<\/li>\n<li>37.86<\/li>\n<li>50 is greater than the predicted value<\/li>\n<li>Some approximate examples include (0.5, 8), (1.2, 10.2), (1.8, 4), (4.5, 20), etc.<\/li>\n<li>About 12, since [latex]50 - 37.86 = 12.14[\/latex]<\/li>\n<li>Sample answer: From the point, I moved vertically to the line of best fit and then read across to the vertical axis to find the predicted value of the response variable [latex]\\hat{y}[\/latex]. Then I took the difference between the actual value [latex]y[\/latex] and the predicted value: [latex]y-\\hat{y}=\\text{residual}[\/latex]. For example, for the point (0.5, 8), the predicted value of the response variable appears to be about 11, so the residual is [latex]8 - 11 = -3[\/latex].<\/li>\n<li>The residual below the line of best fit was negative.<\/li>\n<li>Yes, approximately (6, 20) and (9, 42) are located on the line of best fit. In both cases, [latex]y-\\hat{y}\\approx 0[\/latex].<\/li>\n<\/ol>\n<\/div>\n<\/div>\n<\/div>\n<h3>Residuals and the Bad Drivers Dataset<\/h3>\n<p>Throughout the rest of this corequisite support activity, we\u2019ll focus on the \u201cBad Drivers\u201d dataset again. This dataset reports information about car crashes and contains entries corresponding to all 50 states, as well as Washington, DC. 
In this case, we\u2019ll focus on two variables: losses (in dollars) incurred by insurance companies for collisions per insured driver and insurance premiums (in dollars).<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>question 4<\/h3>\n<p>Consider this plot of a dataset that reports information about car crashes in the United States. Each data point represents one state (or Washington, DC), with losses incurred by insurance companies for collisions per insured driver on the horizontal axis and insurance premiums on the vertical axis. Both variables are in units of US dollars.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1284\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/12202542\/Picture168-300x146.png\" alt=\"A scatterplot with a regression line of best fit. The horizontal axis is labeled &quot;losses&quot; and the vertical axis is labeled &quot;insurance_premiums.&quot; One of the points is labeled &quot;state: New Jersey&quot; and is located at approximately (159, 1300). 
The equation for the line of best fit is also given as y = 285 + 4.47x.\" width=\"1276\" height=\"621\" \/><\/p>\n<p>Part A: Using the plot above, what were the approximate losses incurred by insurance companies for collisions per insured driver in New Jersey?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q505492\">Hint<\/span><\/p>\n<div id=\"q505492\" class=\"hidden-answer\" style=\"display: none\">Locate the labeled data point on the plot and its corresponding loss value.<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Part B: About how much do insurance premiums cost in New Jersey?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q295037\">Hint<\/span><\/p>\n<div id=\"q295037\" class=\"hidden-answer\" style=\"display: none\">Locate the labeled data point on the plot and its corresponding insurance premium value.<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Part C: Based on the losses in New Jersey and the line of best fit, what is the predicted average cost of insurance premiums in New Jersey?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q338074\">Hint<\/span><\/p>\n<div id=\"q338074\" class=\"hidden-answer\" style=\"display: none\">What premium does the line of best fit predict for New Jersey&#8217;s losses?<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Part D: What is the residual for New Jersey?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q995881\">Hint<\/span><\/p>\n<div id=\"q995881\" class=\"hidden-answer\" style=\"display: none\">What is the difference between the observed insurance premium and the predicted insurance premium?<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Part E: Fill in the blanks to interpret 
the residual for New Jersey.<\/p>\n<p>New Jersey\u2019s actual ____________ is _________ (greater\/lower) than predicted.<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q715848\">Hint<\/span><\/p>\n<div id=\"q715848\" class=\"hidden-answer\" style=\"display: none\">What do <em>you<\/em> think? See the Example in the text above for guidance.<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 5<\/h3>\n<p>In words, describe the process for finding the residual.<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q21809\">Hint<\/span><\/p>\n<div id=\"q21809\" class=\"hidden-answer\" style=\"display: none\">Describe your process as you did in the Example above.<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 6<\/h3>\n<p>The following table lists information about four of the states in the dataset. Complete the table. Round to the nearest cent. Then, label the points corresponding to each of these states on the scatterplot that follows.<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q614119\">Hint<\/span><\/p>\n<div id=\"q614119\" class=\"hidden-answer\" style=\"display: none\">Use the process you described in the Example and in Question 5 above. 
Remember that some residuals can be negative.<\/div>\n<\/div>\n<table>\n<tbody>\n<tr>\n<td><strong>State<\/strong><\/td>\n<td><strong>Losses ($)<\/strong><\/td>\n<td><strong>Observed Insurance Premiums ($)<\/strong><\/td>\n<td><strong>Predicted Insurance Premiums ($)<\/strong><\/td>\n<td><strong>Residual<br \/>\n($)<\/strong><\/td>\n<\/tr>\n<tr>\n<td>Louisiana<\/td>\n<td>$194.78<\/td>\n<td>$1,281.50<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>Idaho<\/td>\n<td>$82.75<\/td>\n<td>$642.00<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>Montana<\/td>\n<td>$85.15<\/td>\n<td>$816.20<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>Oklahoma<\/td>\n<td>$178.86<\/td>\n<td>$881.50<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1285\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/12202548\/Picture169-300x149.png\" alt=\"A scatterplot with a regression line whose equation is labeled as y = 285 + 4.47x. The horizontal axis is labeled &quot;losses&quot; and the vertical axis is labeled &quot;insurance_premiums.&quot;\" width=\"1069\" height=\"531\" \/><\/p>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 7<\/h3>\n<p>For a given state, what is the relationship between the sign of the residual and how the observed insurance premium value compares to the predicted insurance premium value?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q294647\">Hint<\/span><\/p>\n<div id=\"q294647\" class=\"hidden-answer\" style=\"display: none\">What do <em>you<\/em> think?<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 8<\/h3>\n<p>Of the states in the table, which state has its data point closest to the line of best fit? 
How can you tell from the residual?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q22978\">Hint<\/span><\/p>\n<div id=\"q22978\" class=\"hidden-answer\" style=\"display: none\">What do <em>you<\/em> think?<\/div>\n<\/div>\n<\/div>\n<h3>Interpreting Residuals<\/h3>\n<p>There are several ways to describe a residual mathematically. Let's look at them in Questions 9 through 12 below.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>question 9<\/h3>\n<p>If the observed data point lies above the line of best fit, then:<\/p>\n<p style=\"padding-left: 30px;\">a. the residual is <span style=\"text-decoration: underline;\">positive\/negative<\/span> (circle one);<\/p>\n<p style=\"padding-left: 30px;\">b. [latex]y < \\text{ or } > \\text{ or } = \\hat{y}[\/latex] (circle one);<\/p>\n<p style=\"padding-left: 30px;\">c. the predicted value of the response variable is greater\/less (circle one) than the observed value of the response variable; and<\/p>\n<p style=\"padding-left: 30px;\">d. [latex]y-\\hat{y} < \\text{ or } > \\text{ or } = 0[\/latex] (circle one).<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q588151\">Hint<\/span><\/p>\n<div id=\"q588151\" class=\"hidden-answer\" style=\"display: none\">See the example above along with Question 7 for guidance. 
Recall also that [latex]y[\/latex] represents the actual value of the response variable and [latex]\\hat{y}[\/latex] represents the predicted value.<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 10<\/h3>\n<p>If the observed data point lies below the line of best fit, then:<\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\">the residual is <span style=\"text-decoration: underline;\">positive\/negative<\/span> (circle one);<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\">[latex]y < \\text{ or } > \\text{ or } = \\hat{y}[\/latex] (circle one);<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\">the predicted value of the response variable is greater\/less (circle one) than the observed value of the response variable; and<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\">[latex]y-\\hat{y} < \\text{ or } > \\text{ or } = 0[\/latex] (circle one).<\/li>\n<\/ul>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q921619\">Hint<\/span><\/p>\n<div id=\"q921619\" class=\"hidden-answer\" style=\"display: none\">See the example above along with Question 7 for guidance. 
Recall also that [latex]y[\/latex] represents the actual value of the response variable and [latex]\\hat{y}[\/latex] represents the predicted value.<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 11<\/h3>\n<p>If the observed data point lies close to the line of best fit, then the residual is _______ zero.<\/p>\n<ol type=\"a\">\n<li>Close to<\/li>\n<li>Far from<\/li>\n<\/ol>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q573663\">Hint<\/span><\/p>\n<div id=\"q573663\" class=\"hidden-answer\" style=\"display: none\">See Question 8 for guidance.<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 12<\/h3>\n<p>If the observed data point lies far from the line of best fit, then the residual is _______ zero.<\/p>\n<ol type=\"a\">\n<li>Close to<\/li>\n<li>Far from<\/li>\n<\/ol>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q250275\">Hint<\/span><\/p>\n<div id=\"q250275\" class=\"hidden-answer\" style=\"display: none\">See Question 8 for guidance.<\/div>\n<\/div>\n<\/div>\n<p>You've spent some good time developing an understanding of how residuals are interpreted and calculated. 
It's time to move to the course materials in this section to put your understanding to good use!<\/p>\n","protected":false},"author":428269,"menu_order":17,"template":"","meta":{"_candela_citation":"[]","CANDELA_OUTCOMES_GUID":"","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-3866","chapter","type-chapter","status-publish","hentry"],"part":4241,"_links":{"self":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters\/3866","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/users\/428269"}],"version-history":[{"count":10,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters\/3866\/revisions"}],"predecessor-version":[{"id":4811,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters\/3866\/revisions\/4811"}],"part":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/parts\/4241"}],"metadata":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters\/3866\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/media?parent=3866"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapter-type?post=3866"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/courses.l
umenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/contributor?post=3866"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/license?post=3866"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}