Corequisite Support Activity for 6.E: Calculating Predicted Values of the Response Variable

What you’ll need to know

In this support activity, you will:

  • Use technology to explore a linear relationship in bivariate data.
  • Use technology to determine whether the line of best fit is an appropriate fit for the data.

In the upcoming material and activity, you will need to be able to use technology to make scatterplots that visualize bivariate relationships, calculate a line of best fit, and interpret the slope and intercept of the line. You should be familiar with these skills after completing the previous sections in this module. Use this corequisite activity to assess your own understanding and get help with the skills as needed.

In this activity, we’ll use data collected during a study of the use of a hiking trail. You’ll use a data analysis tool to practice making a scatterplot, calculating a line of best fit, and interpreting its slope and intercept.

Users of Massachusetts Trail

The objective of this analysis is to explore the relationship between the daily high temperature and the number of users on a trail in Florence, Massachusetts. The information in the dataset was collected over 90 days between April 5, 2005, and November 15, 2005, by the Pioneer Valley Planning Commission (PVPC). Trail users were counted by a laser sensor set up at a data collection station. A user was recorded each time there was a break in the laser beam.

Linear Relationships

The dataset is called “Rail Trail” and is available in the DCMP Linear Regression tool at https://dcmathpathways.shinyapps.io/LinearRegression/.

We will use the following variables in this corequisite support activity:

  • hightemp: Daily high temperature in degrees Fahrenheit
  • volume: Estimated number of trail users that day (calculated as the number of laser-beam breaks recorded)

We would like to find a line of best fit that can be used to predict the number of trail users on a given day based on the high temperature.

Question 1

What is the response variable? Explain.

Question 2

Let’s explore the relationship between the daily high temperature and the number of trail users.

Part A: Use the tool to make a scatterplot and describe the relationship between the two variables.

Part B: Use the tool to calculate a line of best fit for the data. Write the equation of the line using descriptive names for the variables (i.e., not x and y).

Part C: Interpret the slope in the context of the data.

Part D: Does the intercept have a meaningful interpretation? If so, interpret the intercept in the context of the data. Otherwise, explain why not.

Part E: Suppose the high temperature tomorrow is expected to be 60 degrees Fahrenheit. What would be your best guess for the number of users to expect on the trail tomorrow? Explain your response using the line of best fit.

Evaluate the Line of Best Fit

Let’s evaluate whether the line is an appropriate fit for the data by making a scatterplot of the residuals versus the predicted, or “fitted,” values.

Use the tab at the top of the tool to change your view to Fitted Values and Residual Analysis. Choose “Versus Fitted Values” under Plot Residuals. Use the resulting residual plot to determine whether the line is appropriate for the data.

Question 3

Is the line an appropriate fit for the data?

We hope this activity gave you a good opportunity to assess your skills in using technology to explore bivariate data.