13C Preview

Preparing for the next class

In the next in-class activity, you will need to be able to apply the steps for a hypothesis test to compare two population means and compare results of a hypothesis test to the corresponding confidence interval using appropriate notation.

The Maternal Smoking Study

Many studies link maternal smoking to lower birth weights, premature births, and miscarriages. In 1961 and 1962, researchers collected birth weights, dates, and gestational periods as part of the Child Health and Development Studies. Information about the babies’ parents (age, education, height, weight, and whether the mother smoked) was also recorded. The variables included in the dataset are:

gestation: Length of gestation (in days)

wt: Weight (in ounces)

age: Mother’s age in years at termination of pregnancy

smoke: Does mother smoke? (never; smokes now; until current pregnancy; once did, not now)

smoke_now: Does mother currently smoke?

“Yes” includes “smokes now”

“No” includes responses of “until current pregnancy,” “once did, not now,” and “never.”

Ten observations are presented in the following table. The full dataset is found in spreadsheet DCMP_STAT_13C_Maternal_Smoke.

gestation	smoke	smoke_now	age	wt
284	never	No	27	120
282	never	No	33	113
279	now	Yes	28	128
282	now	Yes	23	108
286	until current pregnancy	No	25	136
244	never	No	33	138
245	never	No	23	132
289	never	No	25	120
299	now	Yes	30	143
351	once did, not now	No	27	140

Question 1

1) Suppose we wanted to study the difference in birth weight between babies born to mothers who smoked during pregnancy (smoke_now = Yes) and babies born to mothers who did not smoke during pregnancy (smoke_now = No).

a) Clearly define the two populations of interest.

Recall that when we are interested in estimating a difference in population means, we usually start with data from a sample from each of the populations of interest.

There are two different strategies for selecting the two samples. One strategy is to select a sample from one population and then independently select a sample from the second population. Using this strategy results in two samples where the individuals selected for the first sample do not influence the individuals selected for the second sample.

This would be the case if you take a random sample from each population. Samples selected in this way are said to be independent samples.

b) Can the samples defined in this study be considered independent? Explain.

Question 2

2) Using the DCMP Describing and Exploring Quantitative Variables – Several Groups tool at https://dcmathpathways.shinyapps.io/EDA_quantitative/, find the mean, standard deviation, and sample size of the birth weights for each group defined in Question 1.

Complete the following table, which represents notation that we can use to help us distinguish the sample mean, standard deviation, and sample size for the subjects in Group 1 vs. the subjects in Group 2.

	Group 1: smoke_now = Yes Mothers who smoked during pregnancy	Group 2: smoke_now = No Mothers who did not smoke during pregnancy
Sample Mean	[latex]\bar{x}_{1}=[/latex] _____	[latex]\bar{x}_{2}=[/latex] _____
Sample Standard Deviation	[latex]s_{1}=[/latex] ______	[latex]s_{2}=[/latex] ______
Sample Size	[latex]n_{1}=[/latex] _____	[latex]n_{2}=[/latex] _____

Hint: Use spreadsheet DCMP_STAT_13C_Maternal_Smoke!
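As a rough illustration of what the table asks for, the sketch below computes the mean, sample standard deviation, and sample size for each group using only the ten observations displayed earlier. It does not use the full spreadsheet, so its output will not match the values you should enter in the table. The names `rows` and `summarize` are made up for this example.

```python
from statistics import mean, stdev

# The ten observations shown in the table above: (smoke_now, wt in ounces).
# This is NOT the full dataset, so these summaries are for illustration only.
rows = [
    ("No", 120), ("No", 113), ("Yes", 128), ("Yes", 108), ("No", 136),
    ("No", 138), ("No", 132), ("No", 120), ("Yes", 143), ("No", 140),
]


def summarize(rows, group):
    """Return the sample mean, sample standard deviation, and size for one group."""
    weights = [wt for smoke_now, wt in rows if smoke_now == group]
    return {
        "mean": mean(weights),   # x-bar
        "sd": stdev(weights),    # s, computed with the n - 1 divisor
        "n": len(weights),       # n
    }


group1 = summarize(rows, "Yes")  # Group 1: smoke_now = Yes
group2 = summarize(rows, "No")   # Group 2: smoke_now = No

print("Group 1:", group1)
print("Group 2:", group2)
```

With only three and seven observations in the two groups, these summaries are far too rough for drawing conclusions. They only show how the notation maps to a calculation.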

Question 3

3) One way to compare the means of two groups is by looking at the difference of the means.

a) Write an expression that represents the difference between the two sample means using the notation in the table you completed in Question 2.

b) Write an expression to represent the difference between the population means.

c) What would be the value of the difference between the population means if there was no difference between the groups?

Hint: If there was no difference between the population means, [latex]\mu_{1}=\mu_{2}[/latex]. Think about the value you would get if you subtracted [latex]\mu_{2}[/latex] from [latex]\mu_{1}[/latex].

d) Describe, in the context of the study, what it means if there was no difference between the two groups.

When we are interested in estimating a difference in population means using data from independent samples, we will use a two-sample t confidence interval (In-Class Activity 12.D) or a two-sample t-test.

The conditions that you need to check for the two-sample t-test are the same as those for a two-sample t confidence interval, presented in Preview Assignment 12.D:

The samples are independent.
Each sample is a random sample from the corresponding population of interest or it is reasonable to regard the sample as random. It is reasonable to regard the sample as a random sample if it was selected in a way that should result in a sample that is representative of the population. If the data are from an experiment, we just need to check that there was random assignment to experimental groups—this substitutes for the random sample condition and also results in independent samples.
For each population, the distribution of the variable that was measured is approximately normal, or the sample size for the sample from that population is large. Usually, a sample of size 30 or more is considered to be “large.” If a sample size is less than 30, you should look at a plot of the data from that sample (a dotplot, a boxplot, or, if the sample size isn’t really small, a histogram) to make sure that the distribution looks approximately symmetric and that there are no outliers.

Question 4

4) Does the maternal smoking study satisfy the conditions for a two-sample t-test?

Question 5

5) We can use a hypothesis test to determine if the observed difference in sample means is consistent with a hypothesized difference in population means.

To do this, we use what we know about the sampling distribution of [latex]\bar{x}_{1}-\bar{x}_{2}[/latex] and, in particular, its estimated standard deviation (the standard error). Recall from In-Class Activity 12.D that the difference in the sample means, [latex]\bar{x}_{1}-\bar{x}_{2}[/latex], has an approximately normal distribution centered at the difference of the population means, [latex]\mu_{1}-\mu_{2}[/latex]. The standard deviation of this distribution is given by the following formula:

[latex]\sqrt{\frac{\sigma^{2}_{1}}{n_{1}}+\frac{\sigma^{2}_{2}}{n_{2}}}[/latex]

In practice, we will have to estimate the standard deviation because it depends on the unknown population standard deviations. Replacing [latex]\sigma_{1}[/latex] and [latex]\sigma_{2}[/latex] by the sample standard deviations [latex]s_{1}[/latex] and [latex]s_{2}[/latex], we get the standard error of the difference:

[latex]standard\;error\;of\;\bar{x}_{1}-\bar{x}_{2}=\sqrt{\frac{s^{2}_{1}}{n_{1}}+\frac{s^{2}_{2}}{n_{2}}}[/latex]
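The standard error formula above can be sketched as a small function. The function name `standard_error` and the numbers in the usage line are made up for illustration (chosen so the arithmetic is easy to check by hand). They are not values from the maternal smoking study.

```python
from math import sqrt


def standard_error(s1, n1, s2, n2):
    """Standard error of x-bar1 - x-bar2 from sample standard deviations and sizes."""
    return sqrt(s1**2 / n1 + s2**2 / n2)


# Hypothetical values, not from the study: s1 = 3, n1 = 9, s2 = 4, n2 = 16.
# Each term under the square root equals 1, so the result is sqrt(2).
se = standard_error(3, 9, 4, 16)
print(round(se, 2))
```

To apply this to the study, replace the made-up inputs with the sample standard deviations and sample sizes you recorded in Question 2.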

a) Calculate the estimated difference in means, [latex]\bar{x}_{1}-\bar{x}_{2}[/latex], using the sample means from Question 2.

b) Calculate the standard error for the distribution using the statistics from Question 2. Round your answer to the nearest hundredth.

Hint: Use the formula [latex]SE=\sqrt{\frac{s^{2}_{1}}{n_{1}}+\frac{s^{2}_{2}}{n_{2}}}[/latex]

c) Interpret the meaning of this value.

Question 6

6) Use the DCMP Describing and Exploring Quantitative Variables – Several Groups tool at https://dcmathpathways.shinyapps.io/EDA_quantitative/ to visualize the difference in means between the two groups defined in Question 2 using histograms.

Question 7

7) Briefly describe the difference (or lack thereof) between the two groups. Do you think there is a significant difference between the birth weights of babies born to mothers who smoked during pregnancy versus those who did not? Be prepared to share your conclusions in class.

This analysis uses descriptive statistics only. How can we make an inference about the difference when the population refers to all pregnant women? We will answer this question in the next in-class activity.