The Coefficient of Determination

Learning Outcomes

  • Interpret the coefficient of determination in context

The Coefficient of Determination

The quantity [latex]\mathbf{r^2}[/latex], called the coefficient of determination, is the square of the correlation coefficient. It is usually stated as a percentage rather than as a decimal, and it has the following interpretation in the context of the data:

  • [latex]r^2[/latex], expressed as a percentage, is the percent of variation in the dependent (predicted) variable y that can be explained by variation in the independent (explanatory) variable x using the regression (best-fit) line.
  • [latex]1-r^2[/latex], expressed as a percentage, is the percent of variation in y that is NOT explained by variation in x using the regression line. It appears as the scatter of the observed data points about the regression line.
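The interpretation above can be checked numerically. The sketch below uses a small hypothetical data set (not from the text) and hypothetical helper names. It fits a least-squares line, then compares [latex]r^2[/latex] with 1 - SSE/SST, where SSE is the sum of squared residuals about the line (the unexplained variation) and SST is the total squared variation of y about its mean. For a least-squares line these two quantities agree.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r (hypothetical helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def explained_fraction(xs, ys):
    """1 - SSE/SST: share of the variation in y explained by the line."""
    a, b = fit_line(xs, ys)
    my = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    sst = sum((y - my) ** 2 for y in ys)  # total variation in y
    return 1 - sse / sst


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(f"r^2         = {r ** 2:.4f}")                       # r^2         = 0.6000
print(f"1 - SSE/SST = {explained_fraction(xs, ys):.4f}")   # 1 - SSE/SST = 0.6000
```

For these made-up points, 60% of the variation in y is explained by the line and the remaining 40% shows up as scatter about it.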

Consider the third exam/final exam example introduced in the previous section:

  • The line of best fit is [latex]\displaystyle\hat{{y}}=-{173.51}+{4.83}{x}[/latex]
  • The correlation coefficient is r = 0.6631
  • The coefficient of determination is [latex]r^2=0.6631^2\approx 0.4397[/latex]
  • Interpretation of [latex]r^2[/latex] in the context of this example:
      ◦ Approximately 44% of the variation (0.4397 is approximately 0.44) in the final exam grades can be explained by the variation in the grades on the third exam using the best-fit regression line.
      ◦ Therefore, approximately 56% of the variation (1 – 0.44 = 0.56) in the final exam grades can NOT be explained by the variation in the grades on the third exam using the best-fit regression line. This appears as the scatter of the points about the line.
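The arithmetic in this example can be reproduced from the correlation coefficient alone. The short sketch below starts from r = 0.6631 as given in the text, squares it, and formats both the explained and unexplained shares as whole percentages.

```python
r = 0.6631  # correlation coefficient from the third exam / final exam example

r_squared = r ** 2        # coefficient of determination
unexplained = 1 - r_squared

print(f"r^2 = {r_squared:.4f}")          # r^2 = 0.4397
print(f"explained:   {r_squared:.0%}")   # explained:   44%
print(f"unexplained: {unexplained:.0%}") # unexplained: 56%
```

Computing [latex]1-r^2[/latex] from the unrounded value (1 - 0.4397 = 0.5603) still rounds to 56%, matching the text.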