Overview
- Students will use a line of best fit to calculate predicted values of a response variable given values of the explanatory variable. They will also use the standard error of the residuals to evaluate the expected accuracy of the model predictions and the usefulness of the line.
- This activity builds upon previous activities from the regression unit, in particular the activity that introduced residuals. Students will practice calculating predictions and identifying cases of extrapolation.
- Data from the movie rating website Rotten Tomatoes will be used. The dataset is pulled from the “Fandango” dataset in the FiveThirtyEight R package.[1] It contains ratings for 125 movies with a Tomatometer score of 20 or higher.
- This activity connects back to simple linear regression and residuals, and prepares students for multiple linear regression.
Prerequisite assumptions
Students should be able to do each of the following after completing the What to Know assignment.
- Use a scatterplot to describe bivariate relationships.
- Approximate predicted values from a scatterplot.
- Calculate predictions using the line of best fit.
- Assess reliability of a prediction calculated using the line of best fit.
Students should recall each of the following skills from previous areas in this course.
- Use technology to make a scatterplot. ([Section 5A])
- Identify the explanatory and response variables for a given scenario. ([Section 6A])
- Use technology to calculate a line of best fit. ([Section 6A])
- Interpret the slope and intercept. ([Section 6B])
Intended goals for this activity
After completing this activity, students should understand that a line of best fit can be used to predict the value of the response variable for a given value of the explanatory variable, but that some values of the explanatory variable should not be used for prediction because doing so would be extrapolation. They should understand that there is error in each prediction, since the line over- or under-predicts for some observations, and that the standard error of the residuals can be used to evaluate the accuracy of predictions from the line. They should be able to use the line of best fit for prediction and identify the range(s) of the explanatory variable for which the line should not be used to make predictions. They should be able to calculate a residual and determine whether the line over- or under-predicted the value of the response for a given observation, and they should be able to calculate the standard error of the residuals to evaluate the accuracy of predictions from the line of best fit.
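As a reference for the quantities named in these goals, the formulas below use notation that is an assumption of this sketch rather than something fixed by the activity: [latex]b_{0}[/latex] and [latex]b_{1}[/latex] for the intercept and slope of the line of best fit, and [latex]n[/latex] for the number of observations. The residual standard error is written with the [latex]n-2[/latex] divisor that is conventional for simple linear regression; the course software may label or round it differently.

[latex]\hat{y}_{i}=b_{0}+b_{1}x_{i}[/latex]

[latex]e_{i}=y_{i}-\hat{y}_{i}[/latex]

[latex]s_{e}=\sqrt{\frac{\sum_{i=1}^{n}e_{i}^{2}}{n-2}}[/latex]

A positive residual [latex]e_{i}[/latex] means the line under-predicted that observation, and a negative residual means the line over-predicted it.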
Synchronous Delivery and Activity Flow
The sample activity delivery below assumes a face-to-face class meeting but can be adapted to a fully online or hybrid delivery by using break-out rooms for pairs and small groups.
Frame the activity (7 minutes)
- Question 1 — Whole Class Discussion S4, C3, V1, O1, B2, B4
- Have students read Question 1 independently and ask a few to share their responses.
- Show students the Rotten Tomatoes website and explain the variables used in today’s activity.
- Ask the class for a movie suggestion and go to the Rotten Tomatoes page for that movie. Use that site to explain the Tomatometer and audience scores. Here is an example using the 2019 live-action remake of The Lion King:[2]
- The Tomatometer score is 52%. This means 52% of professional movie critics wrote a positive review.
- The audience score is 88%. This means 88% of regular moviegoers (who also rate movies on Rotten Tomatoes) gave the movie a score of 3.5 stars or higher (out of 5 stars).
- If time permits, ask the class why they think there is such a large discrepancy between the critics’ score and the audience score. Think about what types of factors critics consider when evaluating a movie versus what factors the general audience might consider.
- Transition to the in-class activity by briefly discussing the Objectives for the activity.
Activity Flow (15 minutes)
- Questions 1–3 — Working in Groups V1, V4, O3, S2, C6
- These questions are a review of work students have done throughout the unit. Students should only spend about five minutes on these questions to leave ample time for the questions about prediction and extrapolation.
- You can use a quick poll to make sure groups have correctly identified the explanatory and response variables.
- Question 4 — Working in groups with direct instruction as needed C5, O1, O2
- Students will revisit the idea of extrapolation. As you circulate around the room, ask groups which Tomatometer scores they think would require extrapolation.
- If students are having trouble figuring out the range of Tomatometer values in the dataset, ask them to describe the range of values using the scatterplot.
- If multiple groups seem to have trouble with extrapolation you can pause the class for instruction on the topic.
- Project the scatterplot from Question 2 (or ask students to view it on their devices) and ask students to describe the range of Tomatometer scores on the x-axis.
- Mark the Tomatometer score for the five movies in the worksheet.
- Ask students which movie has a Tomatometer score that is far away from the points on the scatterplot. This movie is Fantastic Four, with a Tomatometer score of 9. Predicting for this movie would be extrapolation.
- Explain to students that it would be unreliable to use the line of best fit to predict for this movie, since there were no data in the original dataset to inform the model about the relationship between the Tomatometer and audience scores for movies with very low Tomatometer scores.
- Question 5 — Working in groups with direct instruction as needed C5, O1, O2
- Students may calculate the predicted values using the equation of the line of best fit or obtain the predictions directly from software.
- If students are struggling with how to calculate the predicted value, you can use the following brief explanation:
- Given a value of the explanatory variable, the line of best fit can be used to calculate a predicted value of the response. To do so, substitute the value of the explanatory variable into the equation and evaluate it to get the predicted response. For example, suppose you have the following equation: [latex]\hat{y}=5+3.4x[/latex]
- Based on this model, when the explanatory variable [latex]x=6[/latex], the predicted value of the response variable [latex]y[/latex] will be [latex]5+3.4\times 6=25.4[/latex].
- Question 6 — Working in groups V1, V4, O3, S2, C6
- As you circulate around the room, ask groups to share the letters they assigned to one or two movies and whether the line overpredicted or underpredicted.
- You can also ask students how they can tell from the scatterplot whether the model overpredicted or underpredicted.
- Question 7 — Working in groups V1, V4, O3, S2, C6
- This question briefly introduces residual standard error, [latex]s_{e}[/latex]. The goal is for students to use this value to consider the general accuracy of the predictions produced by the line of best fit.
- As they answer Part B, they should be considering the magnitude of [latex]s_{e}[/latex] and whether they think this value is large given the context of the data. There is no single correct answer; the important part is their reasoning for their response.
- If you paused the activity for whole class instruction on extrapolation, there may not be enough time in class for students to work on these questions. If this is the case, you can assign these questions as part of the practice activity following class.
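For instructors who would like to demonstrate the calculations from Questions 4 through 7 with software, the following is a minimal Python sketch and not the activity's own tool. The data values are made up for illustration and are not taken from the Fandango dataset, and the function names (`fit_line`, `predict`, `is_extrapolation`, `residual`, `residual_standard_error`) are hypothetical. Extrapolation is flagged with a simple rule, any value outside the observed range of the explanatory variable, which is an assumption of this sketch; judging whether a value is far from the points on the scatterplot remains the approach used in class.

```python
import math

# Hypothetical (Tomatometer, audience score) pairs, made up for illustration.
# These are NOT values from the Fandango dataset.
tomatometer = [25, 40, 55, 70, 85, 95]
audience = [38, 45, 57, 66, 74, 86]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of best fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predicted response for one value of the explanatory variable."""
    return intercept + slope * x_new


def is_extrapolation(x, x_new):
    """Assumed rule: True when x_new lies outside the observed range of x."""
    return x_new < min(x) or x_new > max(x)


def residual(observed, predicted):
    """Observed minus predicted; positive means the line under-predicted."""
    return observed - predicted


def residual_standard_error(x, y, intercept, slope):
    """Square root of the sum of squared residuals divided by n - 2."""
    squared = [
        residual(yi, predict(intercept, slope, xi)) ** 2 for xi, yi in zip(x, y)
    ]
    return math.sqrt(sum(squared) / (len(x) - 2))


b0, b1 = fit_line(tomatometer, audience)
s_e = residual_standard_error(tomatometer, audience, b0, b1)
print(f"line of best fit: y-hat = {b0:.2f} + {b1:.2f}x, s_e = {s_e:.2f}")

# Check two Tomatometer scores: 9 is below every score in the made-up data.
for score in (9, 60):
    note = "extrapolation" if is_extrapolation(tomatometer, score) else "within range"
    print(f"Tomatometer {score}: predicted audience {predict(b0, b1, score):.1f} ({note})")
```

Calling `predict(5, 3.4, 6)` reproduces the worked example from Question 5, and comparing `s_e` with the 0 to 100 scale of the audience score can support the Part B discussion in Question 7.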
Wrap-up/transition (3 minutes)
- Close the activity by asking a few groups to share whether they were surprised by any of the predictions from the line. You can also ask groups to share what this line tells them about the relationship between how critics rate a movie and how general audiences rate it.
- If groups finished Question 7, you could ask them to share whether they think this model is a good fit and useful for predicting the audience score based on the Tomatometer.
- Have students refer back to the Objectives for the activity and check the ones they recognize.
- Assign the homework or Practice and any What to Know pages for the Forming Connections activities you plan to complete in the next class meeting. C2