Summary: Testing the Significance of the Correlation Coefficient

Key Concepts

  • The null hypothesis is that the population correlation coefficient is zero ([latex]H_0: \rho = 0[/latex]). If it is not rejected, there is insufficient evidence of a significant linear relationship (correlation) between x and y in the population, and the line should not be used for making predictions.
  • The alternate hypothesis is that the population correlation coefficient is different from zero ([latex]H_a: \rho \neq 0[/latex]). If the null hypothesis is rejected in its favor, there is a significant linear relationship (correlation) between x and y in the population, and the line may be used for making predictions.
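  • The test statistic is [latex]t = \frac{r \sqrt{n-2}}{\sqrt{1-r^2}}[/latex], where [latex]r[/latex] is the sample correlation coefficient and [latex]n[/latex] is the number of data points. The p-value is calculated from the t-distribution with [latex]n - 2[/latex] degrees of freedom.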
  • The critical value approach is an alternative method for testing the significance of a correlation coefficient.
  • The following assumptions need to be verified before testing the significance of a correlation coefficient:
    • The underlying relationship is a linear relationship.
    • The y values for any particular x value are normally distributed about the line.
    • The standard deviations of the population y values about the line are equal for each value of x. In other words, there is no pattern in a plot of the residuals.
    • The data are produced from a well-designed, random sample or randomized experiment.
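
The test statistic and p-value described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names (`t_statistic`, `t_pdf`, `two_sided_p_value`) and the sample values `r = 0.6`, `n = 18` are made up for the example, and the p-value is approximated by numerically integrating the t density with Simpson's rule instead of using a table or calculator.

```python
from math import sqrt, gamma, pi


def t_statistic(r, n):
    """Test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2).

    Assumes -1 < r < 1 and n > 2; r = +/-1 would divide by zero.
    """
    return r * sqrt(n - 2) / sqrt(1 - r * r)


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=2000):
    """Approximate P(|T| >= |t|) by Simpson's rule on [0, |t|].

    This numerical integration is an assumption of the sketch; math.gamma
    overflows for very large df, so it only suits modest sample sizes.
    """
    if steps % 2:  # Simpson's rule needs an even number of intervals
        steps += 1
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)


# Hypothetical sample: r = 0.6 from n = 18 data points.
r, n = 0.6, 18
t = t_statistic(r, n)
p = two_sided_p_value(t, n - 2)
alpha = 0.05
print(f"t = {t:.3f}, df = {n - 2}, p-value = {p:.4f}")
print("reject H0" if p < alpha else "do not reject H0")
```

For these illustrative values the p-value falls below 0.05, so the null hypothesis would be rejected at the 5% significance level, provided the assumptions listed above hold.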

Glossary

Coefficient of Correlation: a measure developed by Karl Pearson (early 1900s) that gives the strength of the linear association between the independent variable and the dependent variable; the formula is:

[latex]\LARGE r = \frac{n \sum{(xy)} - (\sum{x})(\sum{y})}{\sqrt{[n \sum{x^2} - (\sum{x})^2 ] [ n \sum{y^2} - (\sum{y})^2 ]}} [/latex]

where [latex]n[/latex] is the number of data points. The coefficient cannot be more than 1 or less than –1. The closer the coefficient is to ±1, the stronger the evidence of a significant linear relationship between x and y.
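
As a rough illustration of the formula above, the following Python sketch computes r directly from the sums it contains. The function name `pearson_r` and the five data points are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient from the computational (sums) formula.

    Assumes xs and ys have the same length (at least 2) and that neither
    is constant; a constant list makes the denominator zero.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(f"r = {r:.4f}")  # prints "r = 0.7746"
```

Here the sums are Σx = 15, Σy = 20, Σxy = 66, Σx² = 55, and Σy² = 86, so r = 30 / √1500 ≈ 0.7746, which lies between –1 and 1 as required.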