Summary: The Regression Equation

Key Concepts

  • An estimated value of y for a given value of x is called y-hat, [latex]\hat{y}[/latex].
  • The line that makes the sum of the squared residuals as small as possible is known as the least-squares regression line.
  • CAUTION: A least-squares regression line should not be used to make predictions outside the original range of the values of the independent variable.
  • The slope of the best-fit line tells us how much the dependent variable (y) changes, on average, for every one-unit increase in the independent variable (x).
  • The correlation coefficient, r, measures the strength and direction of the linear relationship between x and y.
  • The sign (+ or -) of the correlation coefficient is the same as the sign of the slope of the least-squares regression line.
  • The coefficient of determination, [latex]r^2[/latex], is the square of the correlation coefficient. Expressed as a percentage, it is the share of the variation in the dependent variable that can be explained by variation in the independent variable using the least-squares regression line.
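
The least-squares idea in the list above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the data values are made up, the helper names (fit_line, sse, predict) are hypothetical, and the slope and intercept use the standard closed-form least-squares formulas, which this summary does not state.

```python
# Hypothetical data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Standard closed-form solution (an assumption; not given in the summary).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope: average change in y per one-unit increase in x
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b


def sse(a, b, xs, ys):
    """Sum of squared residuals (y - y-hat)^2 for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def predict(a, b, x, xs):
    """Return y-hat for x, refusing to extrapolate beyond the observed x range."""
    if not min(xs) <= x <= max(xs):
        raise ValueError("x is outside the range of the observed data")
    return a + b * x


a, b = fit_line(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x")
print(f"SSE of fitted line: {sse(a, b, xs, ys):.2f}")
print(f"SSE after nudging the slope: {sse(a, b + 0.1, xs, ys):.2f}")
print(f"Prediction at x = 2.5: {predict(a, b, 2.5, xs):.2f}")
```

For this data the fitted line is y-hat = 2.20 + 0.60x with a sum of squared residuals of 2.40. Nudging either coefficient gives a larger sum, which is what "least squares" means. The range check in predict mirrors the caution about predicting outside the original x values.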

Glossary

Coefficient of Correlation: a measure developed by Karl Pearson (early 1900s) that gives the strength of association between the independent variable and the dependent variable; the formula is

[latex]\LARGE r = \frac{n \sum{(xy)} - (\sum{x})(\sum{y})}{\sqrt{[n \sum{x^2} - (\sum{x})^2 ] [ n \sum{y^2} - (\sum{y})^2 ]}}[/latex]

where [latex]n[/latex] is the number of data points. The coefficient cannot be more than 1 or less than –1. The closer the coefficient is to ±1, the stronger the evidence of a significant linear relationship between x and y.
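
The formula can be checked numerically with a short Python sketch. The function name correlation and the data values are hypothetical, chosen only to illustrate the computation; each variable maps onto one of the sums in the formula.

```python
import math


def correlation(xs, ys):
    """Pearson's r computed from the raw sums in the formula above."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Hypothetical data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

r = correlation(xs, ys)
print(f"r = {r:.4f}")        # correlation coefficient
print(f"r^2 = {r * r:.2f}")  # coefficient of determination
```

For this data r is about 0.7746 and r squared is 0.60, so roughly 60% of the variation in y is explained by the linear relationship with x. The function divides by zero if every x value (or every y value) is identical, because r is undefined in that case.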

Residual: an observed value of y minus the predicted value. The residual for a given point is written as [latex]y- \hat{y}[/latex].
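
As a brief illustration with made-up numbers, suppose a fitted line is [latex]\hat{y} = 2.2 + 0.6x[/latex] and one observed point is (2, 4). Both the line and the point are hypothetical. Then

[latex]\hat{y} = 2.2 + 0.6(2) = 3.4, \qquad y - \hat{y} = 4 - 3.4 = 0.6[/latex]

The residual is positive, so the observed point lies above the line; a negative residual would place it below the line.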