Learning Outcomes
- Describe the strength and direction of a linear relationship from a correlation coefficient
Recall: Summation
The symbol Σ (Sigma) means to “add up,” or sum, everything that follows. For example, Σ(x) means to add up all of the values of the variable x.
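The formula for r uses both [latex]\sum (x^2)[/latex] and [latex](\sum x)^2[/latex], and the two are easy to confuse. The short Python sketch below uses a made-up list of values to show that squaring each value before adding gives a different result from adding first and then squaring the total.

```python
# Hypothetical values of a variable x, used only for illustration
x = [1, 2, 3, 4, 5]

sum_x = sum(x)                          # Σx: add up every value of x
sum_x_squared = sum(v ** 2 for v in x)  # Σ(x²): square each value first, then add
square_of_sum_x = sum(x) ** 2           # (Σx)²: add first, then square the total

print(sum_x, sum_x_squared, square_of_sum_x)  # 15 55 225
```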
The Correlation Coefficient r
Besides looking at the scatter plot and seeing that a line seems reasonable, how can you tell whether the line is a good predictor? The correlation coefficient gives a second indicator, in addition to the scatter plot, of the strength of the relationship between x and y.
The correlation coefficient, r, developed by Karl Pearson in the 1890s, is a single number that measures both the strength and the direction of the linear association between the independent variable x and the dependent variable y.
The correlation coefficient is calculated as
[latex]{r}=\dfrac{{ {n}\sum{({x}{y})}-{(\sum{x})}{(\sum{y})} }} {{ \sqrt{\left[{n}\sum{x}^{2}-(\sum{x})^2\right]\left[{n}\sum{y}^{2}-(\sum{y})^2\right]}}}[/latex]
where n = the number of data points.
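As a minimal sketch, the formula can be translated term by term into Python. The function name `pearson_r` and the sample data are hypothetical, chosen only for illustration; the function assumes x and y are equal-length lists of paired numbers.

```python
import math

def pearson_r(x, y):
    """Hypothetical helper: correlation coefficient r via the summation formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired lists with at least two points")
    n = len(x)                                  # number of data points
    sum_x = sum(x)                              # Σx
    sum_y = sum(y)                              # Σy
    sum_xy = sum(a * b for a, b in zip(x, y))   # Σ(xy)
    sum_x2 = sum(a ** 2 for a in x)             # Σ(x²)
    sum_y2 = sum(b ** 2 for b in y)             # Σ(y²)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # All x-values (or all y-values) are identical, so r is undefined
        raise ValueError("r is undefined when x or y has no variation")
    return numerator / denominator

print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.7746
```

This summation form mirrors the hand calculation. With very large or nearly constant floating-point data it can lose precision to round-off, which is one reason statistical software typically uses other, algebraically equivalent, arrangements.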
Recall: Order of Operations
Please | Excuse | My | Dear | Aunt | Sally |
parentheses | exponents | multiplication | division | addition | subtraction |
[latex]( \ )[/latex] | [latex]x^2[/latex] | [latex]\times[/latex] | [latex]\div[/latex] | [latex]+[/latex] | [latex]-[/latex] |
Multiplication and division have equal priority and are done from left to right; the same is true of addition and subtraction.
First, find the numerator: calculate [latex]n \sum (xy)[/latex] and [latex](\sum x)(\sum y)[/latex], then subtract the second from the first.
Step 1: To calculate [latex]n \sum (xy)[/latex], work inside the parentheses first: for each data point, multiply its [latex]x[/latex]-value by its [latex]y[/latex]-value (this is called the product). Then add up the products for all of the data points. Finally, multiply this sum by [latex]n[/latex], the number of data points.
Step 2: To calculate [latex](\sum x)(\sum y)[/latex], add up all of the values of the independent variable [latex](x)[/latex], then add up all of the values of the dependent variable [latex](y)[/latex], and multiply these two sums together.
Step 3: Subtract the result of Step 2 from the result of Step 1.
Second, find the denominator. The square root covers the entire denominator, so treat it like a set of parentheses: calculate everything under the root before taking the square root.
Step 4: To calculate [latex][n \sum (x^2) - (\sum x)^2][/latex], first find [latex]n \sum (x^2)[/latex]: square each value of the independent variable, add up the squares, and multiply the sum by [latex]n[/latex], the number of data points. You already found [latex]\sum x[/latex] in Step 2, so square it to get [latex](\sum x)^2[/latex], and subtract this from [latex]n \sum (x^2)[/latex].
Step 5: Calculate [latex][n \sum (y^2) - (\sum y)^2][/latex] by repeating the process of Step 4 with the values of the dependent variable [latex](y)[/latex].
Step 6: Multiply the results of Step 4 and Step 5.
Step 7: Take the square root of the result of Step 6.
Third, divide the numerator (Step 3) by the denominator (Step 7) to get [latex]r[/latex].
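The seven steps can be traced in Python on a small, made-up data set (the values of x and y below are hypothetical). Each variable holds the result of one step, so the intermediate numbers can be checked against a hand calculation.

```python
import math

# Hypothetical paired data, made up for this illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)  # number of data points

# Numerator
step1 = n * sum(a * b for a, b in zip(x, y))  # n Σ(xy)
step2 = sum(x) * sum(y)                       # (Σx)(Σy)
step3 = step1 - step2                         # Step 1 minus Step 2

# Denominator
step4 = n * sum(a ** 2 for a in x) - sum(x) ** 2  # n Σ(x²) - (Σx)²
step5 = n * sum(b ** 2 for b in y) - sum(y) ** 2  # n Σ(y²) - (Σy)²
step6 = step4 * step5                             # product of Steps 4 and 5
step7 = math.sqrt(step6)                          # square root of Step 6

r = step3 / step7  # numerator divided by denominator

print("Step 1:", step1)            # Step 1: 330
print("Step 2:", step2)            # Step 2: 300
print("Step 3:", step3)            # Step 3: 30
print("Step 4:", step4)            # Step 4: 50
print("Step 5:", step5)            # Step 5: 30
print("Step 6:", step6)            # Step 6: 1500
print("Step 7:", round(step7, 4))  # Step 7: 38.7298
print("r =", round(r, 4))          # r = 0.7746
```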
If you suspect a linear relationship between x and y, then r can measure how strong the linear relationship is.
What the VALUE of r tells us:
- The value of r is always between –1 and +1: –1 ≤ r ≤ 1.
- The magnitude of r (its absolute value, ignoring the sign) indicates the strength of the linear relationship between x and y. Values of r close to –1 or to +1 indicate a stronger linear relationship; values close to 0 indicate a weaker one.
- If r = 0, there is no linear relationship between x and y (no linear correlation).
- If r = 1, there is perfect positive correlation. If r = –1, there is perfect negative correlation. In both of these cases, all of the original data points lie on a straight line. Of course, in the real world, this will not generally happen.
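To see these properties numerically, the sketch below computes r with the summation formula for three hypothetical data sets that share the same x-values. Points on a rising line give r = 1 and points on a falling line give r = –1. The third data set, y = x², is completely determined by x yet gives r = 0, because r measures only linear association. The helper name `pearson_r` is illustrative.

```python
import math

def pearson_r(x, y):
    # Hypothetical helper: summation formula for r from this section
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

x = [-2, -1, 0, 1, 2]
print(pearson_r(x, [3 * v + 1 for v in x]))   # points on a rising line:  1.0
print(pearson_r(x, [-2 * v + 5 for v in x]))  # points on a falling line: -1.0
print(pearson_r(x, [v ** 2 for v in x]))      # U-shaped pattern:         0.0
```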
What the SIGN of r tells us:
- A positive value of r means that when x increases, y tends to increase, and when x decreases, y tends to decrease (positive correlation).
- A negative value of r means that when x increases, y tends to decrease, and when x decreases, y tends to increase (negative correlation).
- The sign of r is the same as the sign of the slope, b, of the best-fit line.
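One way to see why the signs agree, assuming the best-fit line is the least-squares line, is that its slope can be written [latex]b = \dfrac{n\sum(xy) - (\sum x)(\sum y)}{n\sum(x^2) - (\sum x)^2}[/latex]. This shares its numerator with r, and both denominators are positive whenever the x-values (and the y-values) are not all identical, so b and r must have the same sign. The Python sketch below checks this on two made-up data sets; the helper name `slope_and_r` is hypothetical.

```python
import math

def slope_and_r(x, y):
    # Hypothetical helper: least-squares slope b and correlation r
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = n * sxy - sx * sy                # shared by b and r
    b = numerator / (n * sxx - sx ** 2)          # positive denominator
    r = numerator / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return b, r

x = [1, 2, 3, 4, 5]
print(slope_and_r(x, [2, 4, 5, 4, 5]))  # b and r both positive
print(slope_and_r(x, [9, 7, 8, 4, 3]))  # b and r both negative
```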
Note
A strong correlation does not show that x causes y or that y causes x. We say “correlation does not imply causation.”
(a) A scatter plot showing data with a positive correlation. 0 < r < 1
(b) A scatter plot showing data with a negative correlation. –1 < r < 0
(c) A scatter plot showing data with zero correlation. r = 0
The formula for r looks formidable. However, computer spreadsheets, statistical software, and many calculators can quickly calculate r. The correlation coefficient r is the bottom item in the output screens for the LinRegTTest on the TI-83, TI-83+, or TI-84+ calculator (see the previous section for instructions).