Learning Outcomes
- Interpret the coefficient of determination in context
The Coefficient of Determination
The quantity [latex]\mathbf{r^2}[/latex], called the coefficient of determination, is the square of the correlation coefficient [latex]r[/latex]. It is usually stated as a percent rather than as a decimal, and it has an interpretation in the context of the data:
- [latex]r^2[/latex], when expressed as a percent, represents the percent of variation in the dependent (predicted) variable y that can be explained by variation in the independent (explanatory) variable x using the regression (best-fit) line.
- [latex]1 - r^2[/latex], when expressed as a percent, represents the percent of variation in y that is NOT explained by variation in x using the regression line. This can be seen as the scattering of the observed data points about the regression line.
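The "explained variation" reading can be checked numerically. The sketch below is a minimal illustration, not part of the original example. It assumes the standard identity for a least-squares line with one explanatory variable, that the square of r equals 1 minus the ratio of the residual variation (SSE) to the total variation in y (SST). The data values and the function name `r_squared_two_ways` are hypothetical, invented only for this demonstration.

```python
# Hypothetical data, invented for illustration only (not from the text).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def r_squared_two_ways(x, y):
    """Return (r squared, 1 - SSE/SST) for the least-squares line of y on x."""
    x_bar, y_bar = mean(x), mean(y)

    # Sums of squares and cross-products about the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)  # total variation in y (SST)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Correlation coefficient r.
    r = sxy / (sxx * syy) ** 0.5

    # Least-squares slope b and intercept a of y-hat = a + b*x.
    b = sxy / sxx
    a = y_bar - b * x_bar

    # Variation left unexplained by the line (SSE): squared residuals.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

    return r ** 2, 1 - sse / syy


r2_from_r, r2_from_variation = r_squared_two_ways(x, y)
print(f"r squared:   {r2_from_r:.6f}")
print(f"1 - SSE/SST: {r2_from_variation:.6f}")
```

The two printed values should agree up to floating-point rounding, which is why squaring r gives the fraction of variation in y explained by the line.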
Consider the Third Exam vs. Final Exam example (Example 2) introduced in the previous section:
- The line of best fit is [latex]\displaystyle\hat{{y}}=-{173.51}+{4.83}{x}[/latex]
- The correlation coefficient is r = 0.6631
- The coefficient of determination is [latex]r^2 = 0.6631^2 = 0.4397[/latex]
- Interpretation of [latex]r^2[/latex] in the context of this example:
  - Approximately 44% of the variation (0.4397 is approximately 0.44) in the final-exam grades can be explained by the variation in the grades on the third exam using the best-fit regression line.
  - Therefore, approximately 56% of the variation ([latex]1 - 0.44 = 0.56[/latex]) in the final-exam grades can NOT be explained by the variation in the grades on the third exam using the best-fit regression line. (This is seen as the scattering of the points about the line.)
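The arithmetic in this example can be reproduced in a few lines. This is a small sketch that uses only the correlation coefficient reported above, r = 0.6631. The variable names are chosen here for illustration.

```python
# Correlation coefficient reported for the third-exam / final-exam example.
r = 0.6631

# Coefficient of determination: the square of r.
r_squared = r ** 2

# Express the explained and unexplained shares as whole-number percents.
explained_pct = round(100 * r_squared)
unexplained_pct = round(100 * (1 - r_squared))

print(f"r^2 = {r_squared:.4f}")               # r^2 = 0.4397
print(f"explained:   about {explained_pct}%")    # about 44%
print(f"unexplained: about {unexplained_pct}%")  # about 56%
```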