{"id":3878,"date":"2022-03-15T23:25:07","date_gmt":"2022-03-15T23:25:07","guid":{"rendered":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/?post_type=chapter&#038;p=3878"},"modified":"2022-06-14T19:39:16","modified_gmt":"2022-06-14T19:39:16","slug":"forming-connections-in-6-e","status":"publish","type":"chapter","link":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/chapter\/forming-connections-in-6-e\/","title":{"raw":"Forming Connections in 6.E: Calculating Predicted Values of the Response Variable","rendered":"Forming Connections in 6.E: Calculating Predicted Values of the Response Variable"},"content":{"raw":"<div class=\"textbox learning-objectives\">\r\n<h3>Objectives for this activity<\/h3>\r\nDuring this activity you will:\r\n<ul>\r\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">Use the line of best fit for prediction.<\/li>\r\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">Identify for which range(s) of the explanatory variable the line should not be used to make predictions.<\/li>\r\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">Calculate a residual.<\/li>\r\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">Use a residual to determine if the line overpredicted or underpredicted the value of the response for a given observation.<\/li>\r\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">Calculate the standard error of the residuals.<\/li>\r\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">Use the standard error of the residuals to evaluate the accuracy of predictions from the line of best fit.<\/li>\r\n<\/ul>\r\n<\/div>\r\nIn the\u00a0<em>What to Know<\/em> assignment preceding this activity, you summarized everything you've learned so far about linear regression analysis by performing one to predict the price of a house based on its size. In this activity, we'll use the line of best fit to make predictions in a given scenario while learning about some new techniques. 
You will understand through this activity that a<span style=\"font-size: 1em;\">\u00a0line of best fit can be used to predict the value of the response variable for a given value of the explanatory variable but that s<\/span><span style=\"font-size: 1em;\">ometimes there are values of the explanatory variable in which the line of best fit should not be used for prediction, since predicting for these values would entail extrapolation. You'll see that t<\/span><span style=\"font-size: 1em;\">here is some error in each prediction; the line overpredicts for some observations and underpredicts for others. And you'll learn that the s<\/span><span style=\"font-size: 1em;\">tandard error of the residuals can be used to evaluate the accuracy of predictions from the line as part of the overall assessment of the usefulness of the line for the data.<\/span>\r\n<h2>Movie Ratings<\/h2>\r\n<img class=\"alignnone wp-image-1307\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/12203611\/Picture189-300x201.jpg\" alt=\"People smiling and laughing in a movie theater\" width=\"1019\" height=\"683\" \/>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 1<\/h3>\r\nWhen deciding if you want to watch a movie, do you rely more on what professional movie critics think about a movie or what other regular moviegoers think about a movie?\r\n\r\n[reveal-answer q=\"864252\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"864252\"]What do <em>you\u00a0<\/em>think?[\/hidden-answer]\r\n\r\n<\/div>\r\nIn this in-class activity, we will use data from the movie ratings website <em>Rotten Tomatoes<\/em> (rottentomatoes.com). On this website, movie critics write reviews and regular moviegoers submit ratings (1\u20135 stars) for movies and TV shows. In this activity, we'll focus on 125 movies from the website. 
We\u2019ll use the following variables during today\u2019s activity.\r\n<p style=\"padding-left: 30px;\"><em>tomatometer:<\/em> The \u201cTomatometer\u201d score calculated as the percentage of professional movie and TV critics who write a positive review for the movie; the original name of this variable is <em>rottentomatoes<\/em><\/p>\r\n<p style=\"padding-left: 30px;\"><em>audience_score:<\/em> The percentage of the general public (regular moviegoers) who rate the movie 3.5 stars or higher (out of 5 stars); the original name of this variable is <em>rottentomatos_user.<\/em><\/p>\r\n\r\n<div class=\"textbox tryit\">\r\n<h3>Guidance<\/h3>\r\n<span style=\"background-color: #e6daf7;\">[Intro: What was your answer to Question 1? Do you tend to care more about the technical qualities of a highly acclaimed film or do you just like what you like and want to hear what other people thought of a movie? Either way, a site like\u00a0<em>Rotten Tomatoes<\/em>\u00a0can help because it provides scores from both critics and regular moviegoers. You may want to try it out for yourself by going to rottentomatoes.com and searching for a movie you are interested in. If you do, what is the Tomatometer score? What is the audience score? For example, for the 2019 live-action remake of\u00a0<em>The Lion King<\/em>, the Tomatometer score is 52% while the audience score is 88%. Why do you think there is such a large discrepancy between the critics' score and the audience score? What types of factors do critics evaluate? How about audiences? Questions 2 and 3 below are a review of what you've already learned during this module. Answer them briefly among your group to assess your comfort level with these ideas. 
You should not spend much time on them.]<\/span>\r\n\r\n<\/div>\r\n<h3>\u00a0Line of Best Fit<\/h3>\r\nCritics often see and review a movie before it's released to the general public, so you want to use the line of best fit to predict how the general public (including you and your friends) will like a movie based on what the critics think.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 2<\/h3>\r\nWhat are the explanatory and response variables? Briefly explain how you made this determination.\r\n\r\n[reveal-answer q=\"465587\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"465587\"]Use what you know about these variable types to answer this question.[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 3<\/h3>\r\nMake a scatterplot of the two variables using the <em>DCMP Linear Regression<\/em> tool at <a href=\"https:\/\/dcmathpathways.shinyapps.io\/LinearRegression\/\">https:\/\/dcmathpathways.shinyapps.io\/LinearRegression\/<\/a> and select the \u201cMovie Ratings\u201d dataset.\r\n\r\n&nbsp;\r\n\r\nPart A: Use the scatterplot to describe the relationship between the Tomatometer and audience scores.\r\n\r\n[reveal-answer q=\"512984\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"512984\"]Use what you know about the shape and spread of a plot to answer this question.[\/hidden-answer]\r\n\r\n&nbsp;\r\n\r\nPart B: Use the tool to calculate the line of best fit. Write the equation of the line using customized variable names.\r\n\r\n[reveal-answer q=\"670338\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"670338\"]Use what you know from previous experience to answer this question.[\/hidden-answer]\r\n\r\n<\/div>\r\n<h3>Extrapolation<\/h3>\r\nRecall that you learned about extrapolation in\u00a0<em>Forming Connections [<span style=\"background-color: #ffff00;\">6B<\/span>]<\/em>, where it was defined as the prediction of a response value using an explanatory variable value that is outside the range of the original data. 
Use this idea to answer Question 4 below.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 4<\/h3>\r\nYou and your friends want to watch a movie, and you\u2019re considering six movies recommended by your peers. You have the Tomatometer score for each movie, but you want to get an idea of how a regular moviegoer might enjoy the movie to help you decide. To help figure this out, you decide to use your line of best fit to predict what the audience score is based on the Tomatometer score.\r\n\r\nWhen calculating predicted values using a line of best fit, we should use it to calculate the predicted response for values of the explanatory variable within the range of values that are in the dataset. Using the model to predict for values of the explanatory variable far outside the range in our data is called<strong> extrapolation<\/strong>. We were introduced to extrapolation in <em>Forming Connections [6B]\u00a0<\/em>when determining if it was reasonable to interpret the estimated y-intercept. We should avoid extrapolation in practice, since it is unreliable to assume the same line will best describe the relationship between the explanatory and response variables outside the range of our data.\r\n\r\nPart A: If we used the line of best fit from the previous question to calculate predicted audience scores, for which Tomatometer scores would the estimates be considered extrapolation?\r\n[reveal-answer q=\"9034\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"9034\"]Use the scatterplot to describe the range of the Tomatometer values. What is the smallest value given? The largest? [\/hidden-answer]\r\n\r\n&nbsp;\r\n\r\nPart B: The following table shows the six movies you and your friends are considering, along with their Tomatometer scores. 
For which movie would making a prediction be considered an extrapolation?\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td><strong>Movie<\/strong><\/td>\r\n<td><strong>Tomatometer<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Aladdin (2019)<\/td>\r\n<td>57<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Fantastic Four (2015)<\/td>\r\n<td>9<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Parasite<\/td>\r\n<td>98<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>The Grinch<\/td>\r\n<td>58<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Avengers: Age of Ultron<\/td>\r\n<td>75<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Chaos Walking<\/td>\r\n<td>22<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n[reveal-answer q=\"784032\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"784032\"]You may mark the scores of interest in the table above on the scatterplot. Are any of the scores of interest outside the range of Tomatometer scores shown on the scatterplot?[\/hidden-answer]\r\n\r\n<\/div>\r\n<h3>Prediction<\/h3>\r\nWithin the range of the explanatory variable, we can use the line of best fit to make predictions. Do this to answer Question 5. You'll see in Question 6 that the line of best fit may over- or under-predict the value of the response variable for a given observation.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 5<\/h3>\r\nYou are interested in calculating the predicted audience scores given the Tomatometer scores for the remaining five movies. In the following questions, you will calculate predictions and evaluate the accuracy of these predictions.\r\n\r\nYou can use the equation of the line directly to find the predicted values or allow the\u00a0<em>DCMP Linear Regression<\/em> tool to perform the calculation for you. Under Regression Options, click Find Predicted Value and enter the Tomatometer score as the x-Value.\r\n\r\n&nbsp;\r\n\r\nPart A: Complete the following table by calculating the predicted audience scores given the Tomatometer scores. 
Round your answer to 3 decimal places.\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td><strong>Movie<\/strong><\/td>\r\n<td><strong>Tomatometer<\/strong><\/td>\r\n<td><strong>Predicted audience score<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Aladdin (2019)<\/td>\r\n<td><\/td>\r\n<td><\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Parasite<\/td>\r\n<td><\/td>\r\n<td><\/td>\r\n<\/tr>\r\n<tr>\r\n<td>The Grinch<\/td>\r\n<td><\/td>\r\n<td><\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Avengers: Age of Ultron<\/td>\r\n<td><\/td>\r\n<td><\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Chaos Walking<\/td>\r\n<td><\/td>\r\n<td><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n[reveal-answer q=\"498201\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"498201\"]Retrieve the Tomatometer score for the movies from the previous question.[\/hidden-answer]\r\n\r\n&nbsp;\r\n\r\nPart B: Based on the predicted audience scores, which movie will you and your friends watch?\r\n\r\n[reveal-answer q=\"281756\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"281756\"]Use the data to answer this question, not your personal preference.[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 6<\/h3>\r\nThe actual audience scores for each movie are shown in the following table.\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td><strong>Movie<\/strong><\/td>\r\n<td><strong>Tomatometer<\/strong><\/td>\r\n<td><strong>Audience score<\/strong><\/td>\r\n<td><strong>Letter on scatterplot<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Aladdin (2019)<\/td>\r\n<td>57<\/td>\r\n<td>94<\/td>\r\n<td><\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Parasite<\/td>\r\n<td>98<\/td>\r\n<td>90<\/td>\r\n<td><\/td>\r\n<\/tr>\r\n<tr>\r\n<td>The Grinch<\/td>\r\n<td>58<\/td>\r\n<td>50<\/td>\r\n<td><\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Avengers: Age of Ultron<\/td>\r\n<td>75<\/td>\r\n<td>83<\/td>\r\n<td><\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Chaos Walking<\/td>\r\n<td>22<\/td>\r\n<td>72<\/td>\r\n<td><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\nPart A: The following is a scatterplot of the audience score 
versus the Tomatometer. The movies from the previous question are red and labeled A through E on the plot. Fill in the previous table with the letter corresponding to each movie.\r\n\r\n<img class=\"alignnone wp-image-1308\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/12203616\/Picture190-300x158.png\" alt=\"A scatterplot of &quot;Audience Score vs. Tomatometer with new observations.&quot; The horizontal axis is labeled &quot;Tomatometer&quot; and is labeled in increments of 20, starting at 20 and going up to 100. The vertical axis is labeled &quot;Audience Score&quot; and is also numbered in increments of 20, starting at 40 and going to 80. There are five points on the graph labeled with letters. Point A is at (58, 50), Point B is at (57, 94), Point C is at (22, 72), Point D is at (98, 90), and Point E is at (75, 83). There is a line of best fit that extends from approximately (20, 38) to approximately (100, 84). It travels above point A and below all the other labeled points.\" width=\"1108\" height=\"584\" \/>\r\n\r\n[reveal-answer q=\"794015\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"794015\"]Locate the point on the plot that corresponds to the information in the table for Tomatometer and actual Audience Score.[\/hidden-answer]\r\n\r\n&nbsp;\r\n\r\nPart B: Did the line of best fit overpredict or underpredict the audience score for your movie? Explain using the scatterplot.\r\n\r\n[reveal-answer q=\"677835\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"677835\"]Was the actual Audience Score higher (an underprediction) or lower (an overprediction)?[\/hidden-answer]\r\n\r\n&nbsp;\r\n\r\nPart C: Look at the movies where the line underpredicted the audience score versus those where the line overpredicted the audience score. Are these results surprising? 
Explain.\r\n\r\n[reveal-answer q=\"378397\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"378397\"]What do <em>you\u00a0<\/em>think?[\/hidden-answer]\r\n\r\n<\/div>\r\nWe\u2019ve seen how the line of best fit can be used to calculate predicted values, so now we want to make a general assessment of the accuracy of predictions from the line. To do so, we\u2019ll look at the distribution of residuals specifically focusing on the variability.\r\n<h3>Residual Standard Error<\/h3>\r\nThe<strong> residual standard error<\/strong>, [latex]s_e[\/latex], is a measure of the variability in the residuals. It is the typical error we expect in predictions using the line of best fit. It is a way to quantify the spread of the points around the line of best fit on the scatterplot.\r\n\r\nA large residual standard error indicates there is a lot of spread in the scatter of the points around the line of best fit and thus more variability in the residuals.\r\n\r\nIf all the data points fit perfectly on the line, the line is a perfect fit for the data and the residual standard error will be zero. This scenario almost never occurs in practice, since there is rarely data with observations that fall in a perfect line.\r\n\r\nOne thing to keep in mind is that the residual standard error has the same units as the response variable. Therefore, you want to keep the response variable, units, and context of the data in mind as you use the residual standard error to evaluate how well the line fits the data.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 7<\/h3>\r\nThe formula for the residual standard error is:\r\n\r\n[latex]s_e = \\sqrt{\\dfrac{1}{n-2}\\sum_{i=1}^{n}\\left(y_i-\\hat{y}_i\\right)^{2}}[\/latex]\r\n\r\nIn practice, you will use technology to calculate this value.\r\n\r\nPart A: Use the regression tool to calculate the residual standard error. 
This can be found in the \u201cModel Summary\u201d under the value of the coefficient of determination.\r\n\r\n[reveal-answer q=\"806179\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"806179\"]Locate the value of the Residual Standard Deviation in the data analysis tool.[\/hidden-answer]\r\n\r\n&nbsp;\r\n\r\nPart B: Based on the residual standard error, would you recommend using this line of best fit to predict the audience score for a movie based on the Tomatometer? Explain.\r\n\r\n[reveal-answer q=\"422645\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"422645\"]Consider the magnitude of the residual standard error. Do you think it is large with respect to the context of the data? Provide your reasoning. There is no right or wrong answer to this question. You must support your decision carefully.[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox tryit\">\r\n<h3>Guidance<\/h3>\r\n<span style=\"background-color: #e6daf7;\">[Wrap-up: Were you surprised by any of the predictions from the line of best fit? How do you think you can reasonably use the line to make predictions within a dataset such as this one? What does the line tell you about the relationship between how critics like a movie versus how general audiences feel about a movie? Again, we see that statistics provides us mathematical tools for analyzing and understanding a relationship present in the data, but it is up to us to make any decisions based on that relationship only after careful and thorough consideration. 
]<\/span>\r\n\r\n<\/div>\r\n&nbsp;","rendered":"<div class=\"textbox learning-objectives\">\n<h3>Objectives for this activity<\/h3>\n<p>During this activity you will:<\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\">Use the line of best fit for prediction.<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\">Identify for which range(s) of the explanatory variable the line should not be used to make predictions.<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\">Calculate a residual.<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\">Use a residual to determine if the line overpredicted or underpredicted the value of the response for a given observation.<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\">Calculate the standard error of the residuals.<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\">Use the standard error of the residuals to evaluate the accuracy of predictions from the line of best fit.<\/li>\n<\/ul>\n<\/div>\n<p>In the\u00a0<em>What to Know<\/em> assignment preceding this activity, you summarized everything you&#8217;ve learned so far about linear regression analysis by performing one to predict the price of a house based on its size. In this activity, we&#8217;ll use the line of best fit to make predictions in a given scenario while learning about some new techniques. You will understand through this activity that a<span style=\"font-size: 1em;\">\u00a0line of best fit can be used to predict the value of the response variable for a given value of the explanatory variable but that s<\/span><span style=\"font-size: 1em;\">ometimes there are values of the explanatory variable in which the line of best fit should not be used for prediction, since predicting for these values would entail extrapolation. You&#8217;ll see that t<\/span><span style=\"font-size: 1em;\">here is some error in each prediction; the line overpredicts for some observations and underpredicts for others. 
And you&#8217;ll learn that the s<\/span><span style=\"font-size: 1em;\">tandard error of the residuals can be used to evaluate the accuracy of predictions from the line as part of the overall assessment of the usefulness of the line for the data.<\/span><\/p>\n<h2>Movie Ratings<\/h2>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1307\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/12203611\/Picture189-300x201.jpg\" alt=\"People smiling and laughing in a movie theater\" width=\"1019\" height=\"683\" \/><\/p>\n<div class=\"textbox key-takeaways\">\n<h3>question 1<\/h3>\n<p>When deciding if you want to watch a movie, do you rely more on what professional movie critics think about a movie or what other regular moviegoers think about a movie?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q864252\">Hint<\/span><\/p>\n<div id=\"q864252\" class=\"hidden-answer\" style=\"display: none\">What do <em>you\u00a0<\/em>think?<\/div>\n<\/div>\n<\/div>\n<p>In this in-class activity, we will use data from the movie ratings website <em>Rotten Tomatoes<\/em> (rottentomatoes.com). On this website, movie critics write reviews and regular moviegoers submit ratings (1\u20135 stars) for movies and TV shows. In this activity, we&#8217;ll focus on 125 movies from the website. 
We\u2019ll use the following variables during today\u2019s activity.<\/p>\n<p style=\"padding-left: 30px;\"><em>tomatometer:<\/em> The \u201cTomatometer\u201d score calculated as the percentage of professional movie and TV critics who write a positive review for the movie; the original name of this variable is <em>rottentomatoes<\/em><\/p>\n<p style=\"padding-left: 30px;\"><em>audience_score:<\/em> The percentage of the general public (regular moviegoers) who rate the movie 3.5 stars or higher (out of 5 stars); the original name of this variable is <em>rottentomatos_user.<\/em><\/p>\n<div class=\"textbox tryit\">\n<h3>Guidance<\/h3>\n<p><span style=\"background-color: #e6daf7;\">[Intro: What was your answer to Question 1? Do you tend to care more about the technical qualities of a highly acclaimed film or do you just like what you like and want to hear what other people thought of a movie? Either way, a site like\u00a0<em>Rotten Tomatoes<\/em>\u00a0can help because it provides scores from both critics and regular moviegoers. You may want to try it out for yourself by going to rottentomatoes.com and searching for a movie you are interested in. If you do, what is the Tomatometer score? What is the audience score? For example, for the 2019 live-action remake of\u00a0<em>The Lion King<\/em>, the Tomatometer score is 52% while the audience score is 88%. Why do you think there is such a large discrepancy between the critics&#8217; score and the audience score? What types of factors do critics evaluate? How about audiences? Questions 2 and 3 below are a review of what you&#8217;ve already learned during this module. Answer them briefly among your group to assess your comfort level with these ideas. 
You should not spend much time on them.]<\/span><\/p>\n<\/div>\n<h3>\u00a0Line of Best Fit<\/h3>\n<p>Critics often see and review a movie before it&#8217;s released to the general public, so you want to use the line of best fit to predict how the general public (including you and your friends) will like a movie based on what the critics think.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>question 2<\/h3>\n<p>What are the explanatory and response variables? Briefly explain how you made this determination.<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q465587\">Hint<\/span><\/p>\n<div id=\"q465587\" class=\"hidden-answer\" style=\"display: none\">Use what you know about these variable types to answer this question.<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 3<\/h3>\n<p>Make a scatterplot of the two variables using the <em>DCMP Linear Regression<\/em> tool at <a href=\"https:\/\/dcmathpathways.shinyapps.io\/LinearRegression\/\">https:\/\/dcmathpathways.shinyapps.io\/LinearRegression\/<\/a> and select the \u201cMovie Ratings\u201d dataset.<\/p>\n<p>&nbsp;<\/p>\n<p>Part A: Use the scatterplot to describe the relationship between the Tomatometer and audience scores.<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q512984\">Hint<\/span><\/p>\n<div id=\"q512984\" class=\"hidden-answer\" style=\"display: none\">Use what you know about the shape and spread of a plot to answer this question.<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Part B: Use the tool to calculate the line of best fit. 
Write the equation of the line using customized variable names.<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q670338\">Hint<\/span><\/p>\n<div id=\"q670338\" class=\"hidden-answer\" style=\"display: none\">Use what you know from previous experience to answer this question.<\/div>\n<\/div>\n<\/div>\n<h3>Extrapolation<\/h3>\n<p>Recall that you learned about extrapolation in\u00a0<em>Forming Connections [<span style=\"background-color: #ffff00;\">6B<\/span>]<\/em>, where it was defined as the prediction of a response value using an explanatory variable value that is outside the range of the original data. Use this idea to answer Question 4 below.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>question 4<\/h3>\n<p>You and your friends want to watch a movie, and you\u2019re considering six movies recommended by your peers. You have the Tomatometer score for each movie, but you want to get an idea of how a regular moviegoer might enjoy the movie to help you decide. To help figure this out, you decide to use your line of best fit to predict what the audience score is based on the Tomatometer score.<\/p>\n<p>When calculating predicted values using a line of best fit, we should use it to calculate the predicted response for values of the explanatory variable within the range of values that are in the dataset. Using the model to predict for values of the explanatory variable far outside the range in our data is called<strong> extrapolation<\/strong>. We were introduced to extrapolation in <em>Forming Connections [6B]\u00a0<\/em>when determining if it was reasonable to interpret the estimated y-intercept. 
We should avoid extrapolation in practice, since it is unreliable to assume the same line will best describe the relationship between the explanatory and response variables outside the range of our data.<\/p>\n<p>Part A: If we used the line of best fit from the previous question to calculate predicted audience scores, for which Tomatometer scores would the estimates be considered extrapolation?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q9034\">Hint<\/span><\/p>\n<div id=\"q9034\" class=\"hidden-answer\" style=\"display: none\">Use the scatterplot to describe the range of the Tomatometer values. What is the smallest value given? The largest? <\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Part B: The following table shows the six movies you and your friends are considering, along with their Tomatometer scores. For which movie would making a prediction be considered an extrapolation?<\/p>\n<table>\n<tbody>\n<tr>\n<td><strong>Movie<\/strong><\/td>\n<td><strong>Tomatometer<\/strong><\/td>\n<\/tr>\n<tr>\n<td>Aladdin (2019)<\/td>\n<td>57<\/td>\n<\/tr>\n<tr>\n<td>Fantastic Four (2015)<\/td>\n<td>9<\/td>\n<\/tr>\n<tr>\n<td>Parasite<\/td>\n<td>98<\/td>\n<\/tr>\n<tr>\n<td>The Grinch<\/td>\n<td>58<\/td>\n<\/tr>\n<tr>\n<td>Avengers: Age of Ultron<\/td>\n<td>75<\/td>\n<\/tr>\n<tr>\n<td>Chaos Walking<\/td>\n<td>22<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q784032\">Hint<\/span><\/p>\n<div id=\"q784032\" class=\"hidden-answer\" style=\"display: none\">You may mark the scores of interest in the table above on the scatterplot. Are any of the scores of interest outside the range of Tomatometer scores shown on the scatterplot?<\/div>\n<\/div>\n<\/div>\n<h3>Prediction<\/h3>\n<p>Within the range of the explanatory variable, we can use the line of best fit to make predictions. 
Do this to answer Question 5. You&#8217;ll see in Question 6 that the line of best fit may over- or under-predict the value of the response variable for a given observation.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>question 5<\/h3>\n<p>You are interested in calculating the predicted audience scores given the Tomatometer scores for the remaining five movies. In the following questions, you will calculate predictions and evaluate the accuracy of these predictions.<\/p>\n<p>You can use the equation of the line directly to find the predicted values or allow the\u00a0<em>DCMP Linear Regression<\/em> tool to perform the calculation for you. Under Regression Options, click Find Predicted Value and enter the Tomatometer score as the x-Value.<\/p>\n<p>&nbsp;<\/p>\n<p>Part A: Complete the following table by calculating the predicted audience scores given the Tomatometer scores. Round your answer to 3 decimal places.<\/p>\n<table>\n<tbody>\n<tr>\n<td><strong>Movie<\/strong><\/td>\n<td><strong>Tomatometer<\/strong><\/td>\n<td><strong>Predicted audience score<\/strong><\/td>\n<\/tr>\n<tr>\n<td>Aladdin (2019)<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>Parasite<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>The Grinch<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>Avengers: Age of Ultron<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>Chaos Walking<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q498201\">Hint<\/span><\/p>\n<div id=\"q498201\" class=\"hidden-answer\" style=\"display: none\">Retrieve the Tomatometer score for the movies from the previous question.<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Part B: Based on the predicted audience scores, which movie will you and your friends watch?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" 
data-target=\"q281756\">Hint<\/span><\/p>\n<div id=\"q281756\" class=\"hidden-answer\" style=\"display: none\">Use the data to answer this question, not your personal preference.<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 6<\/h3>\n<p>The actual audience scores for each movie are shown in the following table.<\/p>\n<table>\n<tbody>\n<tr>\n<td><strong>Movie<\/strong><\/td>\n<td><strong>Tomatometer<\/strong><\/td>\n<td><strong>Audience score<\/strong><\/td>\n<td><strong>Letter on scatterplot<\/strong><\/td>\n<\/tr>\n<tr>\n<td>Aladdin (2019)<\/td>\n<td>57<\/td>\n<td>94<\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>Parasite<\/td>\n<td>98<\/td>\n<td>90<\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>The Grinch<\/td>\n<td>58<\/td>\n<td>50<\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>Avengers: Age of Ultron<\/td>\n<td>75<\/td>\n<td>83<\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>Chaos Walking<\/td>\n<td>22<\/td>\n<td>72<\/td>\n<td><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Part A: The following is a scatterplot of the audience score versus the Tomatometer. The movies from the previous question are red and labeled A through E on the plot. Fill in the previous table with the letter corresponding to each movie.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1308\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/12203616\/Picture190-300x158.png\" alt=\"A scatterplot of &quot;Audience Score vs. Tomatometer with new observations.&quot; The horizontal axis is labeled &quot;Tomatometer&quot; and is labeled in increments of 20, starting at 20 and going up to 100. The vertical axis is labeled &quot;Audience Score&quot; and is also numbered in increments of 20, starting at 40 and going to 80. There are five points on the graph labeled with letters. Point A is at (58, 50), Point B is at (57, 94), Point C is at (22, 72), Point D is at (98, 90), and Point E is at (75, 83). 
There is a line of best fit that extends from approximately (20, 38) to approximately (100, 84). It travels above point A and below all the other labeled points.\" width=\"1108\" height=\"584\" \/><\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q794015\">Hint<\/span><\/p>\n<div id=\"q794015\" class=\"hidden-answer\" style=\"display: none\">Locate the point on the plot that corresponds to the information in the table for Tomatometer and actual Audience Score.<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Part B: Did the line of best fit overpredict or underpredict the audience score for your movie? Explain using the scatterplot.<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q677835\">Hint<\/span><\/p>\n<div id=\"q677835\" class=\"hidden-answer\" style=\"display: none\">Was the actual Audience Score higher than the predicted score (an underprediction) or lower than the predicted score (an overprediction)?<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Part C: Look at the movies where the line underpredicted the audience score versus those where the line overpredicted the audience score. Are these results surprising? Explain.<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q378397\">Hint<\/span><\/p>\n<div id=\"q378397\" class=\"hidden-answer\" style=\"display: none\">What do <em>you\u00a0<\/em>think?<\/div>\n<\/div>\n<\/div>\n<p>We\u2019ve seen how the line of best fit can be used to calculate predicted values, so now we want to make a general assessment of the accuracy of predictions from the line. To do so, we\u2019ll look at the distribution of the residuals, focusing specifically on their variability.<\/p>\n<h3>Residual Standard Error<\/h3>\n<p>The <strong>residual standard error<\/strong>, [latex]s_e[\/latex], is a measure of the variability in the residuals. 
It is the typical error we expect in predictions using the line of best fit. It is a way to quantify the spread of the points around the line of best fit on the scatterplot.<\/p>\n<p>A large residual standard error indicates that the points are widely scattered around the line of best fit, and thus that there is more variability in the residuals.<\/p>\n<p>If all the data points fall exactly on the line, the line is a perfect fit for the data and the residual standard error will be zero. This scenario almost never occurs in practice, since observations rarely fall in a perfect line.<\/p>\n<p>One thing to keep in mind is that the residual standard error has the same units as the response variable. Therefore, you want to keep the response variable, units, and context of the data in mind as you use the residual standard error to evaluate how well the line fits the data.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>question 7<\/h3>\n<p>The formula for the residual standard error is:<\/p>\n<p>[latex]s_e = \\sqrt{\\dfrac{1}{n-2}\\sum_{i=1}^{n}\\left(y_i-\\hat{y}_i\\right)^{2}}[\/latex]<\/p>\n<p>Here [latex]y_i[\/latex] is the actual value of the response variable for observation [latex]i[\/latex], [latex]\\hat{y}_i[\/latex] is the value predicted by the line of best fit, and [latex]n[\/latex] is the number of observations.<\/p>\n<p>In practice, you will use technology to calculate this value.<\/p>\n<p>Part A: Use the regression tool to calculate the residual standard error. This can be found in the \u201cModel Summary\u201d under the value of the coefficient of determination.<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q806179\">Hint<\/span><\/p>\n<div id=\"q806179\" class=\"hidden-answer\" style=\"display: none\">Locate the value of the Residual Standard Deviation in the data analysis tool.<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Part B: Based on the residual standard error, would you recommend using this line of best fit to predict the audience score for a movie based on the Tomatometer? 
Explain.<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q422645\">Hint<\/span><\/p>\n<div id=\"q422645\" class=\"hidden-answer\" style=\"display: none\">Consider the magnitude of the residual standard error. Do you think it is large with respect to the context of the data? Provide your reasoning. There is no right or wrong answer to this question, but you must support your decision carefully.<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox tryit\">\n<h3>Guidance<\/h3>\n<p><span style=\"background-color: #e6daf7;\">[Wrap-up: Were you surprised by any of the predictions from the line of best fit? How do you think you can reasonably use the line to make predictions within a dataset such as this one? What does the line tell you about the relationship between how critics rate a movie and how general audiences feel about it? Again, we see that statistics provides us with mathematical tools for analyzing and understanding a relationship present in the data, but it is up to us to make any decisions based on that relationship, and only after careful and thorough consideration. 
]<\/span><\/p>\n<\/div>\n<p>&nbsp;<\/p>\n","protected":false},"author":428269,"menu_order":24,"template":"","meta":{"_candela_citation":"[]","CANDELA_OUTCOMES_GUID":"","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-3878","chapter","type-chapter","status-publish","hentry"],"part":4241,"_links":{"self":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters\/3878","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/users\/428269"}],"version-history":[{"count":12,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters\/3878\/revisions"}],"predecessor-version":[{"id":4869,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters\/3878\/revisions\/4869"}],"part":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/parts\/4241"}],"metadata":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters\/3878\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/media?parent=3878"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapter-type?post=3878"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v
2\/contributor?post=3878"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/license?post=3878"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}