Learning Outcomes
- Conduct a one-way ANOVA and interpret the conclusion in context
Here are some facts about the F distribution.
- The curve is not symmetrical but skewed to the right.
- There is a different curve for each set of dfs.
- The F statistic is greater than or equal to zero.
- As the degrees of freedom for the numerator and for the denominator get larger, the curve approximates the normal.
- Other uses for the F distribution include comparing two variances and two-way Analysis of Variance. Two-Way Analysis is beyond the scope of this chapter.
Figure: Three F distribution curves, [latex]F_{1,5}[/latex], [latex]F_{5,10}[/latex], and [latex]F_{100,100}[/latex], plotted for x from 0.00 to 4.00. The [latex]F_{1,5}[/latex] curve starts high at the far left and drops quickly; the [latex]F_{5,10}[/latex] curve has a low, broad peak just after 0.50; the [latex]F_{100,100}[/latex] curve has a distinct, nearly symmetric peak between 0.50 and 1.50.
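To make these facts concrete, the density of the F distribution can be evaluated directly from its formula. The following is a minimal sketch using only the Python standard library; the function name `f_pdf` and the chosen x values are our own illustrative choices, and the three pairs of degrees of freedom are the ones shown in the figure.

```python
from math import exp, lgamma, log

def f_pdf(x, dfn, dfd):
    """Density of the F distribution with dfn (numerator) and dfd (denominator) df."""
    if x <= 0:
        return 0.0  # simplification: treat the density as zero at and below 0
    # log of the beta function B(dfn/2, dfd/2)
    log_beta = lgamma(dfn / 2) + lgamma(dfd / 2) - lgamma((dfn + dfd) / 2)
    log_density = (
        (dfn / 2) * log(dfn / dfd)
        + (dfn / 2 - 1) * log(x)
        - ((dfn + dfd) / 2) * log(1 + dfn * x / dfd)
        - log_beta
    )
    return exp(log_density)

curves = [(1, 5), (5, 10), (100, 100)]   # the three curves in the figure
x_values = [0.25, 0.5, 1.0, 2.0, 4.0]    # illustrative grid, not from the text

print("x     " + "  ".join(f"F({dfn},{dfd})".rjust(10) for dfn, dfd in curves))
for x in x_values:
    row = "  ".join(f"{f_pdf(x, dfn, dfd):10.4f}" for dfn, dfd in curves)
    print(f"{x:<5} {row}")
```

In the printed table, the column for 100 and 100 degrees of freedom is close to zero except near x = 1, while the column for 1 and 5 degrees of freedom is largest at the left edge and falls off slowly, which matches the right-skewed shapes described above.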
Example 1
Let’s return to the slicing tomato exercise in the last section (try it 1). The means of the tomato yields under the five mulching conditions are represented by μ1, μ2, μ3, μ4, μ5. We will conduct a hypothesis test to determine if all means are the same or at least one is different. Using a significance level of 5%, test the null hypothesis that there is no difference in mean yields among the five groups against the alternative hypothesis that at least one mean is different from the rest.
Answer
The null and alternative hypotheses are:
H0: μ1 = μ2 = μ3 = μ4 = μ5
Ha: μi ≠ μj for some i ≠ j
The one-way ANOVA results are shown below.
Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F |
Factor (Between) | 36,648,561 | 5 – 1 = 4 | [latex]\displaystyle\frac{36{,}648{,}561}{4}=9{,}162{,}140[/latex] | [latex]\displaystyle\frac{9{,}162{,}140}{2{,}044{,}672.6}=4.4810[/latex] |
Error (Within) | 20,446,726 | 15 – 5 = 10 | [latex]\displaystyle\frac{20{,}446{,}726}{10}=2{,}044{,}672.6[/latex] | |
Total | 57,095,287 | 15 – 1 = 14 | | |
Distribution for the test: [latex]F_{4,10}[/latex]
df(num) = 5 – 1 = 4
df(denom) = 15 – 5 = 10
Test statistic: F = 4.4810
Graph:
Figure: A right-skewed F distribution curve. The horizontal axis extends from 0 to 5, and the vertical axis ranges from 0 to 0.7.
Probability Statement: p-value = P(F > 4.481) = 0.0248.
Compare α and the p-value: α = 0.05, p-value = 0.0248
Make a decision: Since α > p-value, we reject H0.
Conclusion: At the 5% significance level, we have reasonably strong evidence that differences in mean yields for slicing tomato plants grown under different mulching conditions are unlikely to be due to chance alone. We may conclude that at least some of the mulches led to different mean yields.
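The arithmetic behind the ANOVA table and the p-value can be sketched in Python. This is a minimal illustration, not the method a calculator uses: the helper name `f_upper_tail` is our own, and it approximates the tail area with Simpson's rule under the assumption that the test statistic is positive and the denominator degrees of freedom are at least 2.

```python
from math import exp, lgamma

def f_upper_tail(f_stat, dfn, dfd, steps=2000):
    """Approximate P(F > f_stat) with Simpson's rule (steps must be even).

    Substituting t = dfn*F / (dfn*F + dfd) turns F into a Beta(dfn/2, dfd/2)
    variable, so the tail area is an integral of the beta density over [t0, 1].
    Assumption: f_stat > 0 and dfd >= 2, so the integrand is finite on [t0, 1].
    """
    a, b = dfn / 2, dfd / 2
    beta = exp(lgamma(a) + lgamma(b) - lgamma(a + b))
    t0 = dfn * f_stat / (dfn * f_stat + dfd)

    def density(t):
        return t ** (a - 1) * (1 - t) ** (b - 1) / beta

    h = (1 - t0) / steps
    total = density(t0) + density(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(t0 + i * h)
    return total * h / 3

# Sums of squares taken from the ANOVA table in Example 1
ss_between, ss_within = 36_648_561, 20_446_726
k, n = 5, 15                       # five mulch groups, fifteen plants in total
df_between, df_within = k - 1, n - k

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within
p_value = f_upper_tail(f_stat, df_between, df_within)

alpha = 0.05
print(f"MS between = {ms_between:,.2f}, MS within = {ms_within:,.2f}")
print(f"F = {f_stat:.4f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

The printed F statistic and p-value should agree with the table and the probability statement above to about four decimal places.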
USING THE TI-83, 83+, 84, 84+ CALCULATOR
- Press STAT. Press 1:EDIT. Put the data into the lists L1, L2, L3, L4, L5.
- Press STAT, arrow over to TESTS, and arrow down to ANOVA. Press ENTER, and then enter (L1, L2, L3, L4, L5).
- Press ENTER. You will see that the values in the foregoing ANOVA table are easily produced by the calculator, including the test statistic and the p-value of the test.
- The calculator displays:
- F = 4.4810
- p = 0.0248 (p-value)
- Factor df = 4
- SS = 36648560.9
- MS = 9162140.23
- Error df = 10
- SS = 20446726
- MS = 2044672.6
try it 1
MRSA, or methicillin-resistant Staphylococcus aureus, can cause serious bacterial infections in hospital patients. This table shows colony counts from different patients who may or may not have MRSA, grouped by tryptone concentration (Conc).
Conc = 0.6 | Conc = 0.8 | Conc = 1.0 | Conc = 1.2 | Conc = 1.4 |
9 | 16 | 22 | 30 | 27 |
66 | 93 | 147 | 199 | 168 |
98 | 82 | 120 | 148 | 132 |
Plot of the data for the different concentrations:
Figure: Scatterplot of the data, with colony counts (0 to 200) on the horizontal axis and tryptone concentrations (0.6 to 1.4) on the vertical axis.
Test whether the mean number of colonies is the same or different. Construct the ANOVA table (by hand or by using a TI-83, 83+, or 84+ calculator), find the p-value, and state your conclusion. Use a 5% significance level.
Answer
While there are differences in the spreads between the groups, the differences do not appear to be big enough to cause concern.
We test for the equality of the mean number of colonies:
H0: μ1 = μ2 = μ3 = μ4 = μ5
Ha: μi ≠ μj for some i ≠ j
The one-way ANOVA table results are shown below.
Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F |
Factor (Between) | 10,233 | 5 – 1 = 4 | [latex]\displaystyle\frac{10{,}233}{4}=2{,}558.25[/latex] | [latex]\displaystyle\frac{2{,}558.25}{4{,}194.9}=0.6099[/latex] |
Error (Within) | 41,949 | 15 – 5 = 10 | [latex]\displaystyle\frac{41{,}949}{10}=4{,}194.9[/latex] | |
Total | 52,182 | 15 – 1 = 14 | | |
Graph:
Figure: A right-skewed F distribution curve. A vertical line just to the right of the peak marks the test statistic, and the region to the right of the line is shaded to represent the p-value.
Distribution for the test: [latex]F_{4,10}[/latex]
Probability Statement: p-value = P(F > 0.6099) = 0.6649.
Compare α and the p-value: α = 0.05, p-value = 0.6649, α < p-value
Make a decision: Since α < p-value, we do not reject H0.
Conclusion: At the 5% significance level, there is insufficient evidence from these data that different levels of tryptone will cause a significant difference in the mean number of bacterial colonies formed.
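For readers who want to check the sums of squares in the ANOVA table from the raw colony counts, here is a short sketch using only the Python standard library. The variable names are our own, and the dictionary keys are the tryptone concentrations from the data table.

```python
from statistics import mean

# Colony counts by tryptone concentration (columns of the data table)
groups = {
    0.6: [9, 66, 98],
    0.8: [16, 93, 82],
    1.0: [22, 147, 120],
    1.2: [30, 199, 148],
    1.4: [27, 168, 132],
}

all_values = [x for g in groups.values() for x in g]
grand_mean = mean(all_values)
k, n = len(groups), len(all_values)

# Between-group variation: group means around the grand mean
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
# Within-group variation: observations around their own group mean
ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
# Total variation: observations around the grand mean
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n - k)
f_stat = ms_between / ms_within

print(f"SS between = {ss_between:,.0f}, SS within = {ss_within:,.0f}, SS total = {ss_total:,.0f}")
print(f"MS between = {ms_between:,.2f}, MS within = {ms_within:,.2f}, F = {f_stat:.4f}")
```

The between and within sums of squares add up to the total, which is a useful check on hand calculations.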
Example 2
Four sororities took a random sample of sisters regarding their grade means for the past term. The results are shown in the table.
Mean Grades for Four Sororities
Sorority 1 | Sorority 2 | Sorority 3 | Sorority 4 |
2.17 | 2.63 | 2.63 | 3.79 |
1.85 | 1.77 | 3.78 | 3.45 |
2.83 | 3.25 | 4.00 | 3.08 |
1.69 | 1.86 | 2.55 | 2.26 |
3.33 | 2.21 | 2.45 | 3.18 |
Using a significance level of 1%, is there a difference in mean grades among the sororities?
Answer
Let μ1, μ2, μ3, μ4 be the population means of the sororities. Remember that the null hypothesis claims that the sorority groups are from the same normal distribution. The alternate hypothesis says that at least two of the sorority groups come from populations with different normal distributions. Notice that the four sample sizes are each five.
Note
This is an example of a balanced design because each level of the factor (i.e., each sorority) has the same number of observations.
H0: μ1 = μ2 = μ3 = μ4
Ha: Not all of the means μ1, μ2, μ3, μ4 are equal.
Distribution for the test: [latex]F_{3,16}[/latex]
where k = 4 groups and n = 20 observations in total
df(num)= k – 1 = 4 – 1 = 3
df(denom) = n – k = 20 – 4 = 16
Calculate the test statistic: F = 2.23
Graph:
Figure: A right-skewed F distribution curve with the values 0 and 2.23 marked on the x-axis. A vertical line extends from the test statistic 2.23 up to the curve, and the area to the right of the line is shaded to represent the p-value.
Probability statement: p-value = P(F > 2.23) = 0.1241
Compare α and the p-value: α = 0.01, p-value = 0.1241, α < p-value
Make a decision: Since α < p-value, you cannot reject H0.
Conclusion: There is not sufficient evidence to conclude that there is a difference among the mean grades for the sororities.
USING THE TI-83, 83+, 84, 84+ CALCULATOR
- Put the data into lists L1, L2, L3, and L4. Press STAT and arrow over to TESTS. Arrow down to F:ANOVA. Press ENTER and enter (L1,L2,L3,L4).
- The calculator displays the F statistic, the p-value and the values for the one-way ANOVA table:
- F = 2.2303
- p = 0.1241 (p-value)
- Factor df = 3
- SS = 2.88732
- MS = 0.96244
- Error df = 16
- SS = 6.9044
- MS = 0.431525
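The same quantities can be reproduced from the raw grades in software. The sketch below is one possible stdlib-only approach and does not describe how the calculator works internally: the function names `one_way_anova` and `f_upper_tail` are our own, and the p-value is approximated by Simpson's rule under the assumption that the F statistic is positive and the denominator degrees of freedom are at least 2.

```python
from math import exp, lgamma
from statistics import mean

def f_upper_tail(f_stat, dfn, dfd, steps=2000):
    """Approximate P(F > f_stat); assumes f_stat > 0, dfd >= 2, and even steps."""
    a, b = dfn / 2, dfd / 2
    beta = exp(lgamma(a) + lgamma(b) - lgamma(a + b))
    t0 = dfn * f_stat / (dfn * f_stat + dfd)   # beta-scale lower limit
    density = lambda t: t ** (a - 1) * (1 - t) ** (b - 1) / beta
    h = (1 - t0) / steps
    total = density(t0) + density(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(t0 + i * h)
    return total * h / 3

def one_way_anova(groups):
    """Return the one-way ANOVA table entries for a list of samples."""
    values = [x for g in groups for x in g]
    k, n = len(groups), len(values)
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between, "df_between": df_between,
        "ss_within": ss_within, "df_within": df_within,
        "F": f_stat, "p": f_upper_tail(f_stat, df_between, df_within),
    }

# Grades for the four sororities (columns of the data table in Example 2)
sororities = [
    [2.17, 1.85, 2.83, 1.69, 3.33],
    [2.63, 1.77, 3.25, 1.86, 2.21],
    [2.63, 3.78, 4.00, 2.55, 2.45],
    [3.79, 3.45, 3.08, 2.26, 3.18],
]

result = one_way_anova(sororities)
alpha = 0.01
for name, value in result.items():
    print(f"{name:>11} = {value:.4f}")
print("Reject H0" if result["p"] < alpha else "Cannot reject H0")
```

The output should agree with the calculator display above up to rounding.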
try it 2
Four sports teams took a random sample of players regarding their GPAs for the last year. The results are shown below:
GPAs for Four Sports Teams
Basketball | Baseball | Hockey | Lacrosse |
3.6 | 2.1 | 4.0 | 2.0 |
2.9 | 2.6 | 2.0 | 3.6 |
2.5 | 3.9 | 2.6 | 3.9 |
3.3 | 3.1 | 3.2 | 2.7 |
3.8 | 3.4 | 3.2 | 2.5 |
Use a significance level of 5%, and determine if there is a difference in GPA among the teams.
Answer
With a p-value of 0.9271, we decline to reject the null hypothesis. There is not sufficient evidence to conclude that there is a difference among the GPAs for the sports teams.
Example 3
A fourth-grade class is studying the environment. One of the assignments is to grow bean plants in different soils. Tommy chose to grow his bean plants in soil found outside his classroom mixed with dryer lint. Tara chose to grow her bean plants in potting soil bought at the local nursery. Nick chose to grow his bean plants in soil from his mother’s garden. No chemicals were used on the plants, only water. They were grown inside the classroom next to a large window. Each child grew five plants. At the end of the growing period, each plant was measured, producing the data (in inches) in this table.
Tommy’s Plants | Tara’s Plants | Nick’s Plants |
24 | 25 | 23 |
21 | 31 | 27 |
23 | 23 | 22 |
30 | 20 | 30 |
23 | 28 | 20 |
Does it appear that the three media in which the bean plants were grown produce the same mean height? Test at a 3% level of significance.
Answer
This time, we will perform the calculations that lead to the F′ statistic. Notice that each group has the same number of plants, so we will use the formula [latex]\displaystyle{F}'={\dfrac{n \cdot {s_{\overline x}}^{2}}{{s^2}_{pooled}}}[/latex].
First, calculate the sample mean and sample variance of each group.
| Tommy’s Plants | Tara’s Plants | Nick’s Plants |
Sample Mean | 24.2 | 25.4 | 24.4 |
Sample Variance | 11.7 | 18.3 | 16.3 |
Next, calculate the variance of the three group means (Calculate the variance of 24.2, 25.4, and 24.4). Variance of the group means = 0.413 = [latex]{s_{\overline x}}^{2}[/latex]
Then [latex]\displaystyle{M}{S}_{{\text{between}}}={n}{s_{\overline x}}^{2}={({5})}{({0.413})}[/latex] where [latex]n = {5}[/latex] is the sample size (number of plants each child grew).
Calculate the mean of the three sample variances (Calculate the mean of 11.7, 18.3, and 16.3). Mean of the sample variances = 15.433 = [latex]{s^2}_{pooled}[/latex]
Then [latex]\displaystyle{M}{S}_{{\text{within}}}[/latex] = [latex]{s^2}_{pooled}[/latex] = 15.433.
The F statistic (or F ratio) is [latex]{\text{F}}[/latex] = [latex]{\dfrac{MS_{between}}{MS_{within}}}[/latex] = [latex]\dfrac{n{s_{\overline x}}^2}{{s^2}_{pooled}}[/latex] = [latex]{\dfrac{(5)(0.413)}{15.433}}[/latex] = 0.134
The dfs for the numerator = the number of groups – 1 = 3 – 1 = 2.
The dfs for the denominator = the total number of observations – the number of groups = 15 – 3 = 12
The distribution for the test is [latex]F_{2,12}[/latex] and the F statistic is F = 0.134
The p-value is P(F > 0.134) = 0.8759.
Decision: Since α = 0.03 and the p-value = 0.8759, do not reject H0. (Why?)
Conclusion: With a 3% level of significance, from the sample data, the evidence is not sufficient to conclude that the mean heights of the bean plants are different.
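The steps of this example can be traced in a few lines of Python. This is a sketch under stated assumptions: the variable names are our own, the shortcut for the F statistic relies on every group having the same size, and the closed-form p-value in the last step is an identity that holds only when the numerator has exactly 2 degrees of freedom, as it does here.

```python
from statistics import mean, variance

# Plant heights in inches (columns of the data table in Example 3)
heights = {
    "Tommy": [24, 21, 23, 30, 23],
    "Tara": [25, 31, 23, 20, 28],
    "Nick": [23, 27, 22, 30, 20],
}
n = 5                 # plants per child (balanced design)
k = len(heights)      # number of groups

group_means = [mean(h) for h in heights.values()]
group_variances = [variance(h) for h in heights.values()]  # sample variances

ms_between = n * variance(group_means)   # n times the variance of the group means
ms_within = mean(group_variances)        # pooled variance for equal group sizes
f_stat = ms_between / ms_within

df_num = k - 1
df_den = n * k - k

# Special case: for 2 numerator df, P(F > f) = (1 + 2f/df_den) ** (-df_den / 2).
# This shortcut is NOT valid for other numerator degrees of freedom.
assert df_num == 2
p_value = (1 + 2 * f_stat / df_den) ** (-df_den / 2)

alpha = 0.03
print(f"Group means: {group_means}")
print(f"Group variances: {[round(v, 1) for v in group_variances]}")
print(f"F = {f_stat:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

Because the code carries full precision instead of the rounded values 0.413 and 15.433, its F statistic and p-value may differ from the worked values in the fourth decimal place.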
USING THE TI-83, 83+, 84, 84+ CALCULATOR
To calculate the p-value:
- Press 2nd DISTR.
- Arrow down to Fcdf( and press ENTER.
- Enter 0.134, E99, 2, 12)
- Press ENTER.
- The p-value is 0.8759.
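For readers without a calculator, a rough software analogue of the Fcdf( command can be written with the Python standard library. The function below is a hypothetical helper of our own, named `fcdf` only to mirror the calculator's argument order (lower bound, upper bound, numerator df, denominator df). It approximates the area with Simpson's rule and assumes the denominator degrees of freedom are at least 2, and that the lower bound is positive when the numerator has fewer than 2 degrees of freedom.

```python
from math import exp, isinf, lgamma

def fcdf(lower, upper, dfn, dfd, steps=2000):
    """Approximate P(lower < F < upper) for an F(dfn, dfd) distribution.

    Works on the beta scale: t = dfn*F / (dfn*F + dfd) follows Beta(dfn/2, dfd/2).
    Assumptions: dfd >= 2, steps is even, and lower > 0 whenever dfn < 2.
    """
    a, b = dfn / 2, dfd / 2
    beta = exp(lgamma(a) + lgamma(b) - lgamma(a + b))

    def to_unit(f):
        # Map F in [0, inf] to t in [0, 1]; huge values such as 1e99 round to 1.0
        return 1.0 if isinf(f) else dfn * f / (dfn * f + dfd)

    def density(t):
        return t ** (a - 1) * (1 - t) ** (b - 1) / beta

    t_lo, t_hi = to_unit(lower), to_unit(upper)
    h = (t_hi - t_lo) / steps
    total = density(t_lo) + density(t_hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(t_lo + i * h)
    return total * h / 3

# Same inputs as the calculator steps: Fcdf(0.134, E99, 2, 12)
p_value = fcdf(0.134, 1e99, 2, 12)
print(f"p-value = {p_value:.4f}")
```

Here 1e99 plays the role of the calculator's E99, a very large number standing in for positive infinity.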
try it 3
Another fourth-grader also grew bean plants, but this time in a jelly-like mass. The heights were (in inches) 24, 28, 25, 30, and 32. Do a one-way ANOVA test on the four groups. Are the heights of the bean plants different? Use the same method as shown in Example 3.
Answer
- F = 0.9496
- p-value = 0.4402
From the sample data, the evidence is not sufficient to conclude that the mean heights of the bean plants are different.
From the class, create four groups of the same size as follows: identified as men under 22, identified as men at least 22, identified as women under 22, identified as women at least 22. Have each member of each group record the number of states in the United States he or she has visited. Run an ANOVA test to determine if the average number of states visited in the four groups is the same. Test at a 1% level of significance.