11C Preview

Preparing for the next class

In the next in-class activity, you will need to be able to describe what a P-value measures and identify how a P-value is represented in a statistical distribution.

You recently learned that a test statistic is used in hypothesis testing to help measure evidence against a null hypothesis. A test statistic allows us to take standard error into account when considering the evidence provided by a sample statistic. However, recall that the test statistic comes from one observed sample of data. Based on your knowledge of sampling distributions, you know that there are lots of different samples that could have been obtained. Each sample would have its own test statistic.

In In-Class Activity 11.A, we learned that the evidence used in hypothesis testing is probability. The statistical evidence that we gather is always evidence in support of the alternative hypothesis and against the null hypothesis. We ask ourselves the question, “Do we have enough evidence to reject the null hypothesis?”

Question 1

According to a 2021 report published by the Federal Trade Commission (FTC), 29.4% of claims to the FTC in 2020 were due to identity theft.^[1] Suppose that in the state of Florida, a random sample of 500 claims to the FTC is observed.

Based on the published national percentage of 29.4%, about how many of the 500 Florida claims would you expect to be due to identity theft?
Hint: What is 29.4% of 500?
What is the null hypothesis value of [latex]p[/latex]?
In this example, we meet the following conditions for a one-sample z-test for proportions.
Conditions for One-Sample Z-Test for Proportions
1. Large Counts: Check that [latex]np \geq 10[/latex] and [latex]n(1-p) \geq 10[/latex].
2. Random Samples/Assignment: Check that the samples are random samples.
3. 10% Population Size: Check that the sample size, [latex]n[/latex], is less than 10% of the population size, [latex]N[/latex]: [latex]n < 0.10N[/latex].
Thus, we can use the normal distribution to model the values of the sample proportion that would occur if the null hypothesis is true.

What are the mean and standard deviation of the null distribution? Round to the nearest ten thousandth.
Hint: Recall that the mean of all sample proportions is calculated by [latex]\mu = p[/latex] and the standard deviation of the sample proportions is [latex]\sigma = \sqrt{\frac{p(1-p)}{n}}[/latex].
Consider the sampling distribution of the sample proportion. In In-Class Activity 11.B, we used the Empirical Rule to determine if a test statistic was “unusual.”
Using the Empirical Rule, what value(s) of the test statistic would be “unusual”?
Hint: When is a z-score considered “unusual”?
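The arithmetic behind the hints above can be sketched in a few lines of Python. This is only an illustration, not part of the original activity: the names `p0`, `n`, `null_distribution`, and `large_counts_ok` are made up for this sketch, and the 10% condition is not checked because the total number of Florida claims is not given.

```python
import math

p0 = 0.294  # null hypothesis value: national proportion from the FTC report
n = 500     # number of Florida claims in the sample


def null_distribution(p0, n):
    """Return (mean, standard deviation) of the sample proportion if the null is true."""
    mean = p0
    sd = math.sqrt(p0 * (1 - p0) / n)
    return mean, sd


def large_counts_ok(p0, n, threshold=10):
    """Large Counts condition: both n*p and n*(1-p) are at least the threshold."""
    return n * p0 >= threshold and n * (1 - p0) >= threshold


expected_count = n * p0            # 29.4% of 500
mean, sd = null_distribution(p0, n)

print("Expected identity theft claims:", round(expected_count))
print("Large Counts condition met:", large_counts_ok(p0, n))
print("Null distribution mean:", round(mean, 4))
print("Null distribution standard deviation:", round(sd, 4))
```

The rounded standard deviation from this sketch is the same value that appears in the denominator of the z formula in Question 7.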

Based on the answers to Parts A and B, we would expect about 29.4% (147) of the claims to the FTC in Florida to be due to identity theft. What if 29.6% (148) were due to identity theft? Or 29.8% (149)? At what point does the number seem unusual? That is, at what point does it appear that the percentage of identity theft complaints in Florida surpasses the national percentage? How do we measure that?

In statistical testing, we define the P-value as the probability of obtaining a test statistic at least as extreme (in the direction of the alternative hypothesis) as the one actually observed, assuming the null hypothesis is true. That is, a P-value answers the question: “How unlikely is the sample data given the null hypothesis is true?” It is important to remember that a P-value is a probability, which means that it is a number between 0 and 1.

The smaller the P-value is, the more unlikely it is to observe the sample data given the null hypothesis is true. Thus, the evidence against the null hypothesis is stronger and is in favor of the alternative hypothesis.
The larger the P-value is, the more likely it is to observe the sample data. Thus, the evidence against the null hypothesis is weaker.
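To make “at least as extreme” concrete, the sketch below adds up the probability of seeing a given count or more identity theft claims out of 500 when the true proportion is 0.294. It uses an exact binomial calculation, which is an assumption made for illustration; the lesson itself works with the normal model. The function name `upper_tail_probability` is hypothetical.

```python
from math import comb


def upper_tail_probability(k, n=500, p=0.294):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more successes."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))


# Counts taken from the Florida example: the larger the count,
# the smaller the probability of a result at least that extreme.
for count in (148, 150, 170):
    print(count, round(upper_tail_probability(count), 4))
```

Each result is a probability, so it always falls between 0 and 1, and it shrinks as the observed count moves farther above the 147 expected under the null hypothesis.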

Question 2

Which of the following values could not be a P-value? There may be more than one correct answer.

a) 0.02
b) -0.5
c) [latex]2 \times 10^{-3}[/latex]
d) 1.2
e) 0.88

Hint: Remember that the “P” in P-value stands for probability!

Question 3

Of the following options, which P-value would provide the strongest evidence against a null hypothesis?

a) 0.9
b) 0.5
c) 0.05
d) 0.01

Hint: Remember that the smaller a P-value is, the stronger the evidence is against the null hypothesis.

Once a test statistic has been calculated, we calculate the P-value by using what we know about the distribution of the test statistic. For the test of proportions that meets the sample conditions (like in Question 1), we use the standard normal curve to calculate the P-value as an area under the standard normal curve. Since the P-value provides evidence used in support of the alternative hypothesis, the area we measure depends on the alternative hypothesis. Ask yourself: “If the null is in fact true, how likely are the data that you’ve gathered?”[footnote]Lesson 7.4 - Introduction to hypothesis tests. (2021). Skew The Script. Retrieved from https://skewthescript.org/7-4[/footnote]

Questions 4–6: Go to https://dcmathpathways.shinyapps.io/NormalDist/ and select “Find Probability.”

Questions 4, 5, and 6

	When the alternative is:	The P-value is:	The P-value equals:
4)	Lower-tailed ([latex]p <[/latex] null value)	How “unlikely” it is that we observed the sample data that resulted in a test statistic of [latex]z=-1.2[/latex] or lower.	The area on the left of the test statistic under the standard normal curve.
5)	Upper-tailed ([latex]p >[/latex] null value)	How “unlikely” it is that we observed the sample data that resulted in a test statistic of [latex]z=-1.2[/latex] or higher.	The area on the right side of the test statistic under the standard normal curve.
6)	Two-tailed ([latex]p \neq[/latex] null value)	How “unlikely” it is that we observed the sample data that resulted in a test statistic of [latex]z=-1.2[/latex] or lower OR [latex]z=1.2[/latex] or higher.	The area on the right of the absolute value of the test statistic and the area on the left of the negative absolute value of the test statistic (i.e., more extreme).

Hint: Remember that a z-test statistic is from a standard normal distribution, which has a mean of 0 and a standard deviation of 1.
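The three areas described in the table can also be computed directly, as a cross-check on the values read from the online tool. The sketch below uses `NormalDist` from Python's standard library; the helper name `tail_areas` is made up for this illustration and is not part of the course materials.

```python
from statistics import NormalDist


def tail_areas(z):
    """Return the lower-tailed, upper-tailed, and two-tailed areas for a z statistic."""
    std_normal = NormalDist(mu=0, sigma=1)  # standard normal: mean 0, sd 1
    lower = std_normal.cdf(z)               # area to the left of z
    upper = 1 - std_normal.cdf(z)           # area to the right of z
    two = 2 * std_normal.cdf(-abs(z))       # area beyond -|z| plus area beyond |z|
    return {"lower": lower, "upper": upper, "two": two}


areas = tail_areas(-1.2)
for name, area in areas.items():
    print(f"{name}-tailed area: {area:.4f}")
```

The lower and upper areas always add to 1, and the two-tailed area is twice the smaller of the two, because the standard normal curve is symmetric about 0.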

Looking ahead

Question 7

In the scenario discussed in Question 1, suppose that a commission in Florida is asked to investigate the types of claims made to the FTC.

If Florida follows the national trend, we determined that in a sample of 500 claims, observing 147 (or 29.4%) due to identity theft would not be unusual. Would 148 (29.6%) be unusual? How about 150 (30%)? Or 170 (34%)?

At what point would you start to think that there is convincing evidence that Florida is exceeding the national trend?

Use the data analysis tool at https://dcmathpathways.shinyapps.io/NormalDist/ to complete the following table and identify a sample proportion which you think is unusual and provides convincing evidence that Florida is exceeding the national trend. Try at least three additional sample proportions.

Note that in this exercise, if Florida is exceeding the national trend, then we have an upper-tailed test.

Number of complaints due to identity theft (out of 500)	Value of [latex]\hat{p}[/latex], the sample proportion	[latex]z = \frac{\hat{p}-0.294}{0.0204}[/latex]	P-value	Do you think we have convincing evidence to suggest that Florida is exceeding the national trend? Why?
148	0.296	[latex]z = \frac{0.296-0.294}{0.0204} \approx 0.098[/latex]	0.461	No, because a sample proportion of 0.296 is not that unlikely given the national trend of 0.294.
150	0.3	[latex]z = \frac{0.3-0.294}{0.0204} \approx 0.29[/latex]	0.3859	No, because a sample proportion of 0.30 is not that unlikely given the national trend of 0.294.
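Additional rows of the table can be checked with a short script. The sketch below follows the table's own formula, including the rounded standard error of 0.0204, and treats the test as upper-tailed. The function name `upper_tailed_row` and the constant names are hypothetical, and the counts shown are only the ones mentioned in this preview.

```python
from statistics import NormalDist

P0 = 0.294               # null value: national proportion from the FTC report
N_CLAIMS = 500           # sample size
STANDARD_ERROR = 0.0204  # rounded value used in the table's z formula


def upper_tailed_row(count, n=N_CLAIMS, p0=P0, se=STANDARD_ERROR):
    """Return (p_hat, z, P-value) for an upper-tailed one-sample z-test."""
    p_hat = count / n
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z
    return p_hat, z, p_value


for count in (148, 150, 170):
    p_hat, z, p_value = upper_tailed_row(count)
    print(f"{count}\t{p_hat:.3f}\t{z:.3f}\t{p_value:.4f}")
```

Results may differ slightly from the table in later decimal places, since the table rounds z before looking up the area.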

[1] Federal Trade Commission. (2021, February). Consumer sentinel network, data book 2020. https://www.ftc.gov/system/files/documents/reports/consumer-sentinel-network-data-book-2020/csn_annual_data_book_2020.pdf
