In the next preview assignment and in the next class, you will need to identify population parameters of interest and then determine whether a given sample statistic is evidence that the parameter actually does not have a particular assumed value.
Surprising Water Samples
The next in-class activity uses the context of the Flint, Michigan water crisis, which began in 2014. During the crisis, the city of Flint switched water sources, and residents began to suspect that their water was contaminated with lead. The Michigan Department of Environmental Quality (DEQ) claimed that the water complied with federal regulations and that there was no contamination problem. The federal guidelines for water safety state that a city is compliant if at least 90% of water samples obtained from residences contain lead at under 15 parts per billion (ppb). A water sample is “contaminated” if it contains lead at 15 ppb or above. In other words, no more than 10% of residences can be contaminated for the city to be compliant.
The residents of Flint suspected that more than 10% of the homes in the city had contaminated water, so they took their own sample of residences with support from scientists at Virginia Tech. They tested the water from 271 homes as part of the Flint Water Study (FWS), and they recorded the proportion of homes which had contaminated water.[1]
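The compliance rule described above is simple arithmetic, and a short sketch may help make it concrete. The function names and the example readings below are hypothetical and are not data from the Flint Water Study; only the 15 ppb threshold and the 90% requirement come from the text.

```python
CONTAMINATION_THRESHOLD_PPB = 15  # from the text: 15 ppb or above counts as contaminated
MAX_CONTAMINATED_SHARE = 0.10     # from the text: at least 90% of samples must be under 15 ppb


def contaminated_proportion(lead_ppb_readings):
    """Return the fraction of readings at or above the contamination threshold."""
    contaminated = sum(
        1 for ppb in lead_ppb_readings if ppb >= CONTAMINATION_THRESHOLD_PPB
    )
    return contaminated / len(lead_ppb_readings)


def is_compliant(lead_ppb_readings):
    """Readings are compliant when no more than 10% of them are contaminated."""
    return contaminated_proportion(lead_ppb_readings) <= MAX_CONTAMINATED_SHARE


# Hypothetical readings (ppb) for ten homes; NOT data from the Flint Water Study.
example_readings = [2, 0, 5, 15, 3, 27, 1, 8, 4, 6]
print(contaminated_proportion(example_readings))  # 2 of the 10 readings are at 15 ppb or above
print(is_compliant(example_readings))
```

Note that a reading of exactly 15 ppb counts as contaminated, matching the definition in the text.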
Question 1
What is the population of interest in this study?
- All of the residences in Flint, Michigan
- The 271 homes from which the water was tested
- All of the water in Flint, Michigan
- All of the residences in Flint, Michigan that had contaminated water
- All of the residences in the 271 sampled that had contaminated water
Question 2
What is the sample being considered in this study?
- All of the residences in Flint, Michigan
- The 271 homes from which the water was tested
- All of the water in Flint, Michigan
- All of the residences in Flint, Michigan that had contaminated water
- All of the residences in the 271 sampled that had contaminated water
Question 3
What is the parameter of interest in this study?
- The mean lead level from all homes in Flint
- The mean lead level from all homes in the sample
- The proportion of homes in Flint which had contaminated water
- The proportion of homes in the sample which had contaminated water
Question 4
Which sample statistic is being considered in the study?
- The mean lead level from all homes in Flint
- The mean lead level from all homes in the sample
- The proportion of homes in Flint which had contaminated water
- The proportion of homes in the sample which had contaminated water
Question 5
The logic behind hypothesis testing involves looking at a result from a sample and determining whether that result is enough evidence to reject a previous assumption.
In this context, the baseline assumption was that the DEQ was correct and the city of Flint was compliant with federal water safety regulations, so the proportion of residences with contaminated water was not over 10%. It was the FWS’s responsibility to provide evidence that the city was really not compliant and that there was actually a higher proportion of homes with contaminated water.
Since it was the FWS’s responsibility to provide evidence against the baseline assumption that Flint was compliant with federal regulations, they had to assume Flint was compliant and then see if their sample result was surprising. A surprising result would have been evidence that the assumption was wrong and that there was actually a higher proportion of homes with contaminated water.
This reasoning is conditional: we assume that a certain condition is true (in this case, that Flint was compliant with federal water safety regulations) and draw conclusions based on that assumption.
The question we want to ask here is, “If Flint was compliant with federal water safety regulations, was the FWS sample result surprising?”
- Part a: What does a “surprising event” refer to in the probability context?
  - The event has a high probability of occurring.
  - The event has a low probability of occurring.
  - There is a 50/50 chance the event will occur.
- Part b: If Flint was really compliant with federal water safety regulations, which would you most expect to be true?
  - There was a large proportion of houses in Flint with contaminated water.
  - There was a fairly large proportion of houses in Flint with contaminated water.
  - There was a moderate proportion of houses in Flint with contaminated water.
  - There was a very small proportion of houses in Flint with contaminated water.
- Part c: If Flint was really compliant with federal water safety regulations, which event would be most surprising?
  - A smaller proportion of houses in the sample having contaminated water
  - A larger proportion of houses in the sample having contaminated water
  - No houses in the sample having contaminated water
- Part d: Explain your answer to Part c.
Question 6
The FWS found that a large proportion of houses in their sample had contaminated water. What might you conclude from that?
Hint: Remember that the FWS was working under the assumption that Flint was compliant with federal water safety regulations, as found by the DEQ, so they were seeking evidence that this assumption was wrong.
- It is very unlikely that a large proportion of houses had contaminated water if the water source was actually contaminated, so the sample was strong evidence that the DEQ was incorrect.
- It is very likely that a large proportion of houses had contaminated water if the water source was actually contaminated, so the sample was not strong evidence that the DEQ was incorrect.
- It is very unlikely that a large proportion of houses had contaminated water if the water source was not contaminated, so the sample was strong evidence that the DEQ was incorrect.
- It is very likely that a large proportion of houses had contaminated water if the water source was not contaminated, so the sample was not strong evidence that the DEQ was incorrect.
Summary: When using statistical hypothesis testing to test an assumption:
- We start with a baseline assumption about a population parameter of interest (this is called the null hypothesis), and we assume this to be true.
- We collect a sample, and we ask whether that sample is surprising if our baseline assumption is actually true.
- If the sample result is very surprising (i.e., very unlikely), we then have strong evidence that our starting assumption was incorrect.
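The steps above can be sketched numerically for the Flint context. This is a minimal illustration, assuming the baseline proportion sits at the 10% boundary and that the number of contaminated homes in a random sample of 271 follows a binomial distribution. The function name and the counts tried below are hypothetical; the text does not report the actual count found by the FWS.

```python
from math import comb


def upper_tail_probability(n, p_null, observed_count):
    """P(X >= observed_count) when X ~ Binomial(n, p_null)."""
    return sum(
        comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        for k in range(observed_count, n + 1)
    )


N_HOMES = 271  # sample size reported in the text
P_NULL = 0.10  # assumed baseline: 10% of homes contaminated (boundary of compliance)

print(f"Expected contaminated homes under the baseline: {N_HOMES * P_NULL:.1f}")

# Hypothetical counts for illustration only; the text does not report the actual count.
for hypothetical_count in (27, 35, 45):
    prob = upper_tail_probability(N_HOMES, P_NULL, hypothetical_count)
    print(f"P(at least {hypothetical_count} contaminated homes) = {prob:.4f}")
```

The further a count lies above the roughly 27 homes expected under the baseline, the smaller this probability becomes, which is what "surprising" means here.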
Practice
The American Community Survey (ACS) is a survey of around 3.5 million homes in the United States and Puerto Rico conducted annually by the U.S. Census Bureau. Aggregate data from the ACS are released every five years. The overall ACS poverty rate from 2015 to 2019 was reported to be 13.4%.[2] Suppose that some new economic reforms were made in the years after 2019, and you want to test whether or not the poverty rate has decreased.
Question 7
If the poverty rate has not decreased, what poverty rate would you expect to find in a random sample of American adults?
Question 8
If the poverty rate has not decreased, what poverty rate would be very surprising to obtain in a sample?
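One way to explore Questions 7 and 8 is to simulate many samples under the assumption that the poverty rate is still 13.4%. The sketch below is illustrative only: the sample size of 1,000, the number of simulations, and the 5% cutoff are assumptions chosen for the example and do not come from the ACS.

```python
import random

P_NULL = 0.134        # ACS poverty rate for 2015 to 2019, from the text
SAMPLE_SIZE = 1000    # hypothetical sample size, not from the text
N_SIMULATIONS = 2000  # number of simulated samples, chosen for illustration


def simulate_sample_rates(p, sample_size, n_simulations, seed=0):
    """Simulate sample poverty rates assuming the true rate is p."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_simulations):
        in_poverty = sum(1 for _ in range(sample_size) if rng.random() < p)
        rates.append(in_poverty / sample_size)
    return rates


rates = sorted(simulate_sample_rates(P_NULL, SAMPLE_SIZE, N_SIMULATIONS))
mean_rate = sum(rates) / len(rates)
# Rate below which only about 5% of the simulated samples fall.
low_cutoff = rates[int(0.05 * len(rates))]

print(f"Average simulated sample rate: {mean_rate:.3f}")
print(f"About 5% of simulated samples fall below: {low_cutoff:.3f}")
```

Sample rates near the average are what you would expect if nothing has changed, while a sample rate well below the cutoff would be surprising under that assumption.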
[1] Barry-Jester, A. M. (2016, January 26). What went wrong in Flint. FiveThirtyEight. https://fivethirtyeight.com/features/what-went-wrong-in-flint-water-crisis-michigan/
[2] U.S. Census Bureau. (2020, December 10). Census Bureau releases new American community survey 5-year estimates. https://www.census.gov/newsroom/press-releases/2020/acs-5-year.html