13C Preview

Preparing for the next class 

In the next in-class activity, you will need to be able to apply the steps for a hypothesis test to compare two population means and to compare the results of a hypothesis test to the corresponding confidence interval using appropriate notation.

The Maternal Smoking Study 

Many studies link maternal smoking to lower birth weights, premature births, and miscarriages. In 1961 and 1962, researchers collected birth weights, dates, and gestational periods as part of the Child Health and Development Studies. Information about the babies’ parents (age, education, height, weight, and whether the mother smoked) was also recorded. The variables included in the dataset are:

gestation: Length of gestation (in days)

wt: Weight (in ounces)

age: Mother’s age in years at termination of pregnancy

smoke: Does mother smoke? (never; smokes now; until current pregnancy; once did, not now)

smoke_now: Does mother currently smoke?

“Yes” includes “smokes now”

“No” includes responses of “until current pregnancy,” “once did, not now,” and “never.”
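The recoding from smoke to smoke_now described above can be sketched in a few lines of Python. This is only an illustration: the function name recode_smoke_now is hypothetical, and the sketch assumes the response for current smokers may appear either as "smokes now" or as the shorter "now" used in the data table.

```python
# Hypothetical helper: collapse the four-level `smoke` variable into Yes/No.
def recode_smoke_now(smoke: str) -> str:
    """Return "Yes" only for mothers who smoke now; any other response is "No"."""
    # Assumption: current smokers are labeled "smokes now" or just "now".
    return "Yes" if smoke.strip().lower() in {"now", "smokes now"} else "No"

# The four response categories as they appear in the data table.
responses = ["never", "now", "until current pregnancy", "once did, not now"]
recoded = {r: recode_smoke_now(r) for r in responses}

for response, value in recoded.items():
    print(f"{response!r} -> smoke_now = {value}")
```

Note that "until current pregnancy" is recoded as "No," so smoke_now describes smoking during the pregnancy rather than smoking history.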

Ten observations are presented in the following table. The full dataset is found in spreadsheet DCMP_STAT_13C_Maternal_Smoke.

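| gestation | smoke | smoke_now | age | wt |
| --- | --- | --- | --- | --- |
| 284 | never | No | 27 | 120 |
| 282 | never | No | 33 | 113 |
| 279 | now | Yes | 28 | 128 |
| 282 | now | Yes | 23 | 108 |
| 286 | until current pregnancy | No | 25 | 136 |
| 244 | never | No | 33 | 138 |
| 245 | never | No | 23 | 132 |
| 289 | never | No | 25 | 120 |
| 299 | now | Yes | 30 | 143 |
| 351 | once did, not now | No | 27 | 140 |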

Question 1

1) Suppose we wanted to study the difference in birth weight of babies born to mothers who smoked during pregnancy (smoke_now = Yes) and mothers who did not smoke during pregnancy (smoke_now = No).

a) Clearly define the two populations of interest.

Recall that when we are interested in estimating a difference in population means, we usually start with data from a sample from each of the populations of interest.

There are two different strategies for selecting the two samples. One strategy is to select a sample from one population and then independently select a sample from the second population. Using this strategy results in two samples where the individuals selected for the first sample do not influence the individuals selected for the second sample.

This would be the case if you take a random sample from each population. Samples selected in this way are said to be independent samples.

b) Can the samples defined in this study be considered independent? Explain.

Question 2

2) Using the DCMP Describing and Exploring Quantitative Variables – Several Groups tool at https://dcmathpathways.shinyapps.io/EDA_quantitative/, find the mean, standard deviation, and sample size of the birth weights for each group defined in Question 1.

Complete the following table, which represents notation that we can use to help us distinguish the sample mean, standard deviation, and sample size for the subjects in Group 1 vs. the subjects in Group 2.

| | Group 1: smoke_now = Yes (mothers who smoked during pregnancy) | Group 2: smoke_now = No (mothers who did not smoke during pregnancy) |
| --- | --- | --- |
| Sample Mean | [latex]\bar{x}_{1}=[/latex] _____ | [latex]\bar{x}_{2}=[/latex] _____ |
| Sample Standard Deviation | [latex]s_{1}=[/latex] _____ | [latex]s_{2}=[/latex] _____ |
| Sample Size | [latex]n_{1}=[/latex] _____ | [latex]n_{2}=[/latex] _____ |

Hint: Use spreadsheet DCMP_STAT_13C_Maternal_Smoke!
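If you would like to see how these three statistics are computed, the following Python sketch works through the calculation for the ten observations shown in the table earlier. It is an illustration only: the ten rows are a small excerpt, so the results will not match the values you get from the full spreadsheet, and the names rows and summarize are hypothetical.

```python
from statistics import mean, stdev

# (smoke_now, wt) pairs copied from the ten-row excerpt, NOT the full dataset.
rows = [
    ("No", 120), ("No", 113), ("Yes", 128), ("Yes", 108), ("No", 136),
    ("No", 138), ("No", 132), ("No", 120), ("Yes", 143), ("No", 140),
]

def summarize(rows, group):
    """Sample mean, sample standard deviation, and sample size for one group."""
    weights = [wt for smoke_now, wt in rows if smoke_now == group]
    # stdev() uses the n - 1 divisor, i.e., the sample standard deviation s.
    return {"mean": mean(weights), "sd": stdev(weights), "n": len(weights)}

group1 = summarize(rows, "Yes")  # x-bar_1, s_1, n_1
group2 = summarize(rows, "No")   # x-bar_2, s_2, n_2

for label, g in (("Group 1", group1), ("Group 2", group2)):
    print(f"{label}: mean = {g['mean']:.2f}, sd = {g['sd']:.2f}, n = {g['n']}")
```

The subscripts in the table (1 for smoke_now = Yes, 2 for smoke_now = No) correspond to group1 and group2 in the sketch.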

Question 3

3) One way to compare the means of two groups is by looking at the difference of the means.

a) Write an expression that represents the difference between the two sample means using the notation in the table you completed in Question 2.

b) Write an expression to represent the difference between the population means.

c) What would be the value of the difference between the population means if there were no difference between the groups?

Hint: If there were no difference between the population means, then [latex]\mu_{1}=\mu_{2}[/latex]. Think about the value you would get if you subtracted [latex]\mu_{2}[/latex] from [latex]\mu_{1}[/latex].

d) Describe, in the context of the study, what it would mean if there were no difference between the two groups.

When we are interested in estimating a difference in population means using data from independent samples, we will use a two-sample t confidence interval (In-Class Activity 12.D) or a two-sample t-test.

The conditions that you need to check for the two-sample t-test are the same as those for a two-sample t confidence interval, presented in Preview Assignment 12.D:

  1. The samples are independent.
  2. Each sample is a random sample from the corresponding population of interest or it is reasonable to regard the sample as random. It is reasonable to regard the sample as a random sample if it was selected in a way that should result in a sample that is representative of the population. If the data are from an experiment, we just need to check that there was random assignment to experimental groups. Random assignment substitutes for the random sample condition and also results in independent samples.
  3. For each population, the distribution of the variable that was measured is approximately normal, or the sample size for the sample from that population is large. Usually, a sample of size 30 or more is considered to be “large.” If a sample size is less than 30, you should look at a plot of the data from that sample (a dotplot, a boxplot, or, if the sample size isn’t really small, a histogram) to make sure that the distribution looks approximately symmetric and that there are no outliers.
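The sample-size part of condition 3 is the only part that can be checked mechanically. As a rough sketch, the Python below applies the "30 or more" rule of thumb to each group. The function name sample_size_note and the example sizes (taken from the ten-row excerpt, not the full dataset) are assumptions made for illustration. Conditions 1 and 2 depend on how the data were collected and cannot be checked by code.

```python
LARGE_SAMPLE = 30  # rule-of-thumb threshold quoted in condition 3

def sample_size_note(n: int) -> str:
    """Say whether a sample counts as large under the 30-or-more rule of thumb."""
    if n >= LARGE_SAMPLE:
        return "large sample"
    return "small sample: plot the data and check for symmetry and outliers"

# Hypothetical sizes: the group counts in the ten-row excerpt only.
example_sizes = {"Group 1": 3, "Group 2": 7}
notes = {group: sample_size_note(n) for group, n in example_sizes.items()}

for group, note in notes.items():
    print(f"{group} (n = {example_sizes[group]}): {note}")
```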

Question 4

4) Does the maternal smoking study satisfy the conditions for a two-sample t-test?

Question 5

5) We can use a hypothesis test to determine if the observed difference in sample means is consistent with a hypothesized difference in population means.

To do this, we use what we know about the sampling distribution of [latex]\bar{x}_{1}-\bar{x}_{2}[/latex] and, in particular, its estimated standard deviation (the standard error). Recall from In-Class Activity 12.D that the difference in the sample means, [latex]\bar{x}_{1}-\bar{x}_{2}[/latex], has an approximately normal distribution centered at the difference of the population means, [latex]\mu_{1}-\mu_{2}[/latex]. The standard deviation of this distribution is given by the following formula:

[latex]\sqrt{\frac{\sigma^{2}_{1}}{n_{1}}+\frac{\sigma^{2}_{2}}{n_{2}}}[/latex]

In practice, we will have to estimate the standard deviation because it depends on the unknown population standard deviations. Replacing [latex]\sigma_{1}[/latex] and [latex]\sigma_{2}[/latex] by the sample standard deviations [latex]s_{1}[/latex] and [latex]s_{2}[/latex], we get the standard error of the difference:

[latex]standard\;error\;of\;\bar{x}_{1}-\bar{x}_{2}=\sqrt{\frac{s^{2}_{1}}{n_{1}}+\frac{s^{2}_{2}}{n_{2}}}[/latex]
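The standard error formula translates directly into code. The Python sketch below is one way to write it. The function name standard_error is hypothetical, and the input values are the rounded sample standard deviations and sample sizes from the ten-row excerpt of the data, used only to show the arithmetic. Your own answer should use the statistics from the full dataset.

```python
from math import sqrt

def standard_error(s1: float, n1: int, s2: float, n2: int) -> float:
    """Standard error of x-bar_1 - x-bar_2: sqrt(s1^2 / n1 + s2^2 / n2)."""
    return sqrt(s1**2 / n1 + s2**2 / n2)

# Illustrative inputs only: rounded statistics from the ten-row excerpt.
s1, n1 = 17.56, 3   # Group 1 (smoke_now = Yes)
s2, n2 = 10.61, 7   # Group 2 (smoke_now = No)

se = standard_error(s1, n1, s2, n2)
print(f"standard error = {se:.2f}")
```

Each term under the square root is a sample variance divided by its own sample size, so the smaller group contributes more to the standard error when the standard deviations are similar.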

a) Calculate the estimated difference in means, [latex]\bar{x}_{1}-\bar{x}_{2}[/latex], using the statistics from Question 2.

b) Calculate the standard error for the distribution using the statistics from Question 2. Round your answer to the nearest hundredth.

Hint: Use the formula [latex]SE=\sqrt{\frac{s^{2}_{1}}{n_{1}}+\frac{s^{2}_{2}}{n_{2}}}[/latex]

c) Interpret the meaning of this value.

Question 6

6) Use the DCMP Describing and Exploring Quantitative Variables – Several Groups tool at https://dcmathpathways.shinyapps.io/EDA_quantitative/ to visualize the difference in means between the two groups defined in Question 2 using histograms.

Question 7

7) Briefly describe the difference (or lack thereof) between the two groups. Do you think there is a significant difference between the birth weights of babies born to mothers who smoked during pregnancy versus those who did not? Be prepared to share your conclusions in class.

This analysis uses descriptive statistics only. How can we make an inference about the difference when the population refers to all pregnant women? We will answer this question in the next in-class activity.