Preparing for the next class
In the next in-class activity, you will need to be able to calculate the mean of a difference and identify the differences between independent and dependent samples.
Dependent Samples vs. Independent Samples
Previously, you learned how to create confidence intervals and conduct hypothesis tests with a single variable. You also learned how to compare means or proportions from two samples. Some statistical studies use samples from more than one population. To compare two populations, it is important to identify whether the samples are dependent (paired) or independent. Dependent and independent sample hypothesis tests are used to answer questions about the difference between two population means.
For dependent (paired) samples, the same variable is recorded for each sample, and there is a logical way to pair the observations from one sample with the observations in the other sample. In contrast, when samples are independently selected, the same variable is measured for both samples, but there is no logical way to pair an observation from one sample with a particular observation from the other sample.
For an example of paired samples, consider an investigation on the effectiveness of hypnosis in reducing pain. The variable could be the pain level of a patient, and it could be measured “before” hypnosis and then again “after” hypnosis for the same patient. This would result in two samples, one of “before” pain measurements and one of “after” pain measurements, and there would be a logical pairing of the “before” measurement with the “after” measurement for the same person. This form of pairing, often referred to as “pre/post,” is not the only situation where paired samples can be used. Other cases involve using “natural pairs,” such as twins, siblings, or couples. In either case, it is reasonable to expect the measurement from one sample to be related to the measurement in the second sample.
Questions 1–7: Use the previous information to determine if the following situations would result in dependent or independent samples.
Question 1
1) A company that creates fishing accessories is researching two of their most popular fishing rods. The company collects a random sample of the number of sales for each fishing rod from 100 of their stores.
Question 2
2) The North Carolina Zoo is researching whether their animals are more active in the morning or in the evening. An employee at the zoo visits each habitat in the zoo and collects information for the study. The employee counts how many of each species is visible in the morning and then visits a second time to count how many of each species is visible during the evening.
Question 3
3) A company that creates blood pressure medicine is researching the effectiveness of their new blood pressure medicine. The company conducts a study in which volunteers are randomly assigned to two groups. One group is given the new medication and the other group continues to take their current blood pressure medicine.
Question 4
4) The same company that creates blood pressure medicine is still researching the effectiveness of their new blood pressure medicine. The company conducts a second study in which volunteers are all given the new medication. The blood pressure of each patient is measured before the study begins. The patients are all given the new medication for six weeks. The blood pressure of each patient is measured after the six-week period.
Question 5
5) A psychologist wants to know if children’s levels of anxiety are different if their parents are divorced. The psychologist decides to study 100 children from divorced parents and 100 children from non-divorced parents.
Question 6
6) The quality control manager at a manufacturing plant is investigating the production rate of two machines that were built with the same materials and the same design but were manufactured at two different plants.
Question 7
7) A statistics teacher wants to know if a curriculum is effective. The teacher conducts a pre-test, implements the curriculum, and then conducts a post-test on the same group of students. The scores on the pre-tests and post-tests are used to compare the difference in understanding of statistics before and after students completed the curriculum.
Question 8
8) Suppose you want to study the effectiveness of a diet. Suppose that eight people were randomly selected to participate in your study. The weight (lb) of each of the eight participants is recorded before and after the diet in the following table. You know from past studies that body weight is approximately normally distributed.
| Patient | Before | After |
| 1 | 150 | 146 |
| 2 | 160 | 159 |
| 3 | 200 | 200 |
| 4 | 178 | 174 |
| 5 | 190 | 189 |
| 6 | 167 | 160 |
| 7 | 151 | 148 |
| 8 | 210 | 198 |
| Mean [latex](\bar{x})[/latex] | | |
a) What is the average weight before and after the diet? Fill in your answers in the table.
b) On average, how many pounds did the participants lose? In other words, what is the estimated difference between the mean weight before and after the diet?
c) Are the two samples independent or dependent?
Question 9
9) Consider the previous example using this new table:
| Patient | Before | After | Difference [latex](d)[/latex] |
| 1 | 150 | 146 | 146 − 150 = −4 |
| 2 | 160 | 159 | 159 − 160 = −1 |
| 3 | 200 | 200 | |
| 4 | 178 | 174 | |
| 5 | 190 | 189 | |
| 6 | 167 | 160 | |
| 7 | 151 | 148 | |
| 8 | 210 | 198 | |
| Mean [latex](\bar{x})[/latex] | | | |
a) How much weight did each individual lose? Complete the table by finding the difference in each participant’s weight (after−before).
b) Consider ONLY the difference variable. What is the average weight loss for the eight participants?
c) How does your answer to Question 9, Part B compare to your answer from Question 8, Part B?
Comparing Means from Two Dependent (Paired) Samples
We will use the individual differences, [latex]d[/latex], between each pair as our sample. A dependent or paired t-test compares the mean of the differences, [latex]\mu_{d}[/latex], to a hypothesized value, which is often 0. Thus, a dependent t-test is the same as a one sample t-test performed on the difference variable, [latex]d[/latex].
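The idea above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the `before` and `after` values are made up, the hypothesized value `k` is assumed to be 0, and the names `d_bar`, `s_d`, and `t_stat` are chosen here for readability.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired measurements (made up for illustration only).
before = [12.0, 15.0, 11.0, 14.0, 13.0]
after = [10.0, 14.0, 11.0, 11.0, 12.0]

# Difference variable d = after - before, one value per pair.
d = [a - b for a, b in zip(after, before)]

k = 0.0          # hypothesized value of mu_d under the null hypothesis (assumed)
n = len(d)       # number of pairs
d_bar = mean(d)  # sample mean of the differences
s_d = stdev(d)   # sample standard deviation of the differences

# One-sample t statistic computed on d, with n - 1 degrees of freedom.
t_stat = (d_bar - k) / (s_d / sqrt(n))
df = n - 1

print(f"d = {d}")
print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t_stat:.3f}, df = {df}")
```

Once the differences are formed, the original `before` and `after` lists are no longer needed; every later step uses only `d`.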
The difference variable requires its own standard deviation calculation. The standard deviation of the difference variable, denoted [latex]s_{d}[/latex], is computed from the individual paired differences. It is NOT the same as the standard deviation of the difference in the sample means, [latex]\bar{x}_{1}-\bar{x}_{2}[/latex], which is used for independent samples.
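To see that the two calculations really do give different numbers, here is a small sketch on made-up data. It assumes the usual unpooled formula for the standard error of the difference in independent sample means, the square root of s1²/n1 + s2²/n2; the formula used for independent samples elsewhere in the course may differ. The lists `x1` and `x2` are hypothetical.

```python
from math import sqrt
from statistics import stdev, variance

# Hypothetical measurements on the same five subjects (illustration only).
x1 = [20.0, 24.0, 22.0, 26.0, 28.0]
x2 = [18.0, 23.0, 21.0, 23.0, 27.0]
n = len(x1)

# Paired approach: work with the difference variable d.
d = [a - b for a, b in zip(x1, x2)]
s_d = stdev(d)                 # standard deviation of the differences
se_paired = s_d / sqrt(n)      # standard error of the mean difference

# Independent-samples approach (assumed unpooled formula): ignores the pairing.
se_independent = sqrt(variance(x1) / n + variance(x2) / n)

print(f"s_d = {s_d:.4f}")
print(f"paired standard error      = {se_paired:.4f}")
print(f"independent standard error = {se_independent:.4f}")
```

In this made-up data set the paired standard error is much smaller, because taking differences removes the subject-to-subject variation that both lists share. That will not hold for every data set, but it shows why the two quantities should not be used interchangeably.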
Since a dependent t-test is the same as a one-sample t-test on the mean of the difference variable, the assumptions for a paired t-test are the same as those discussed in In-Class Activity 13.B for a single sample hypothesis test for means.
Conditions for a One-Sample t-Test
- The sample is a random sample from the population of interest or it is reasonable to regard the sample as random. It is reasonable to regard the sample as a random sample if it was selected in a way that should result in a sample that is representative of the population.
- For each population, the distribution of the variable that was measured is approximately normal, or the sample size for the sample from that population is large. Usually, a sample of size 30 or more is considered to be “large.” If a sample size is less than 30, you should look at a plot of the data from that sample (a dotplot, a boxplot, or, if the sample size isn’t really small, a histogram) to make sure that the distribution looks approximately symmetric and that there are no outliers.
In summary, where [latex]k[/latex] is the value of the null hypothesis, we have:
| Null Hypothesis for Independent Samples | Null Hypothesis for Dependent Samples |
| [latex]H_{0}:\mu_{1}-\mu_{2}=k[/latex] | [latex]H_{0}:\mu_{d}=k[/latex] |
| Alternative Hypothesis for Independent Samples | Alternative Hypothesis for Dependent Samples |
| [latex]H_{A}:\mu_{1}-\mu_{2}>k[/latex] | [latex]H_{A}:\mu_{d}>k[/latex] |
| [latex]H_{A}:\mu_{1}-\mu_{2}<k[/latex] | [latex]H_{A}:\mu_{d}<k[/latex] |
| [latex]H_{A}:\mu_{1}-\mu_{2}\neq k[/latex] | [latex]H_{A}:\mu_{d}\neq k[/latex] |
The notation for the summary statistics used to compare paired populations/samples is shown in the following table. We will use [latex]d[/latex] to represent the difference variable.
| Summary Statistics | Notation |
| Population Mean of Difference | [latex]\mu_{d}[/latex] |
| Sample Mean of Difference | [latex]\bar{d}[/latex] |
| Population Standard Deviation of Difference | [latex]\sigma_{d}[/latex] |
| Sample Standard Deviation of Difference | [latex]s_{d}[/latex] |
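As a reference sketch, the two sample statistics in the table can be written out in terms of the individual differences. Here [latex]d_{i}[/latex] is the difference for the [latex]i[/latex]th pair and [latex]n[/latex] is the number of pairs; these two symbols are introduced here for illustration and do not appear in the table.

[latex]\bar{d}=\frac{1}{n}\sum_{i=1}^{n}d_{i}[/latex]

[latex]s_{d}=\sqrt{\frac{\sum_{i=1}^{n}\left(d_{i}-\bar{d}\right)^{2}}{n-1}}[/latex]

These are the ordinary sample mean and sample standard deviation formulas, applied to the difference variable instead of to the raw measurements.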
Question 10
10) It is a common belief that using higher-octane fuel will improve the gas mileage of a vehicle. In order to test this claim, a mechanic randomly selects 12 customers to participate in a study. The mechanic puts 10 gallons of fuel in each participant’s car and asks participants to circle a racetrack until they run out of gas. Each participant is asked to perform this action two times, once with 87-octane fuel and another time with 92-octane fuel. The differences in miles driven (miles driven with 87-octane fuel and miles driven with 92-octane fuel) are calculated and recorded. The participants do not know which fuels they are using while they are driving around the racetrack.
What are the appropriate null and alternative hypotheses for this scenario?