Putting It Together: Linear Regression and Correlation

Let’s Summarize

  • If high values of [latex]x[/latex] (independent variable) are associated with high values of [latex]y[/latex] (dependent variable), there is a positive relationship. If high values of [latex]x[/latex] are associated with low values of [latex]y[/latex], there is a negative relationship.
  • The strength of a relationship can be judged by how closely the points on a scatterplot follow a line or a curve.
  • The line that makes the sum of the squared residuals as small as possible is known as the least-squares regression line.
  • CAUTION: A least-squares regression line should not be used to make predictions outside the original range of the values of the independent variable.
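  • The slope of the best-fit line tells us how much the dependent variable [latex](y)[/latex] changes, on average, for every one-unit increase in the independent variable [latex](x)[/latex].
  • The correlation coefficient, [latex]r[/latex], measures the strength of a linear relationship. Values close to [latex]-1[/latex] or [latex]1[/latex] indicate a strong linear relationship, and values close to [latex]0[/latex] indicate a weak one.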
  • The sign [latex](+ \ \mathrm{or} \ -)[/latex] of the correlation coefficient is the same as the sign of the slope of the least-squares regression line.
  • The coefficient of determination, [latex]r^2[/latex], is the square of the correlation coefficient. Expressed as a percentage, it is the share of the variation in the dependent variable that can be explained by the variation in the independent variable using the least-squares regression line.
  • The null hypothesis is that the population correlation coefficient is equal to zero. If the null hypothesis is not rejected, the sample does not provide evidence of a significant linear relationship (correlation) between [latex]x[/latex] and [latex]y[/latex] in the population, and the line should not be used for making predictions.
  • The alternative hypothesis is that the population correlation coefficient is not equal to zero. If the null hypothesis is rejected in favor of the alternative, there is a significant linear relationship (correlation) between [latex]x[/latex] and [latex]y[/latex] in the population, and the line can be used for making predictions within the original range of the independent variable.
  • To make a prediction for a given value of [latex]x[/latex], the value of [latex]x[/latex] is substituted into the least-squares regression equation.
  • The standard deviation of the residuals measures the typical prediction error. It tells us approximately how far off a prediction is likely to be when we use the least-squares regression line.
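
The quantities summarized above can be computed by hand for a small data set. The following is a minimal Python sketch, not a definitive implementation. The data values and the names `fit_least_squares` and `predict` are made up for illustration. The sketch assumes the residual standard deviation is computed with [latex]n - 2[/latex] in the denominator, and it assumes one common form of the test statistic for the hypothesis test, [latex]t = r\sqrt{n-2}/\sqrt{1-r^2}[/latex] with [latex]n - 2[/latex] degrees of freedom.

```python
import math


def fit_least_squares(xs, ys):
    """Fit y = intercept + slope * x by least squares and return summary values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Least-squares slope and intercept (minimize the sum of squared residuals).
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Correlation coefficient; its sign always matches the sign of the slope.
    r = sxy / math.sqrt(sxx * syy)

    # Residuals and their standard deviation (typical prediction error).
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)
    resid_sd = math.sqrt(sse / (n - 2))

    # Assumed test statistic for H0: population correlation = 0.
    # Compare it with a t critical value for n - 2 degrees of freedom.
    # (Undefined when r is exactly -1 or 1.)
    t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

    return {
        "slope": slope,
        "intercept": intercept,
        "r": r,
        "r_squared": r ** 2,
        "resid_sd": resid_sd,
        "t_stat": t_stat,
        "x_min": min(xs),
        "x_max": max(xs),
    }


def predict(fit, x):
    """Substitute x into the regression equation, refusing to extrapolate."""
    if not fit["x_min"] <= x <= fit["x_max"]:
        raise ValueError("x is outside the original range of the data")
    return fit["intercept"] + fit["slope"] * x


# Made-up illustrative data, not taken from the text.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

fit = fit_least_squares(xs, ys)
for name in ("slope", "intercept", "r", "r_squared", "resid_sd", "t_stat"):
    print(name, round(fit[name], 3))
print("prediction at x = 4:", round(predict(fit, 4), 3))
```

For this made-up data the slope is 0.6 and the intercept is 2.2, so the prediction at [latex]x = 4[/latex] is about 4.6. The correlation coefficient is about 0.775, the coefficient of determination is 0.6 (about 60% of the variation in [latex]y[/latex] is explained), and the residual standard deviation is about 0.894. Calling `predict(fit, 10)` raises an error because 10 lies outside the original range of [latex]x[/latex] values.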