{"id":405,"date":"2017-04-15T03:23:41","date_gmt":"2017-04-15T03:23:41","guid":{"rendered":"https:\/\/courses.lumenlearning.com\/conceptstest1\/chapter\/introduction-to-hypothesis-testing-5-of-5\/"},"modified":"2022-08-01T16:05:07","modified_gmt":"2022-08-01T16:05:07","slug":"introduction-to-hypothesis-testing-5-of-5","status":"publish","type":"chapter","link":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/chapter\/introduction-to-hypothesis-testing-5-of-5\/","title":{"raw":"Hypothesis Testing (5 of 5)","rendered":"Hypothesis Testing (5 of 5)"},"content":{"raw":"<div class=\"textbox learning-objectives\">\r\n<h3>Learning OUTCOMES<\/h3>\r\n<ul>\r\n \t<li>Recognize type I and type II errors.<\/li>\r\n<\/ul>\r\n<\/div>\r\n<h2>What Can Go Wrong: Two Types of Errors<\/h2>\r\nStatistical investigations involve making decisions in the face of uncertainty, so there is always some chance of making a wrong decision. In hypothesis testing, two types of wrong decisions can occur.\r\n\r\nIf the null hypothesis is true, but we reject it, the error is a <strong>type I <\/strong>error.\r\n\r\nIf the null hypothesis is false, but we fail to reject it, the error is a <strong>type II <\/strong>error.\r\n\r\nThe following table summarizes type I and II errors.\r\n\r\n<img class=\"aligncenter wp-image-2434 size-large\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1768\/2017\/04\/21220740\/SociologyAcceptReject-1024x473.png\" alt=\"Hypothesis testing matrices. If we reject H null and H null is false, when we have correctly rejected the null hypothesis. If we reject H null and H null is tue, we have made a Type I error. If we accept H null and H null is trie, we have correct accepted the null hypothesis. If we accept H null and H null is false, we have made a Type II error.\" width=\"1024\" height=\"473\" \/>\r\n<h2>Comment<\/h2>\r\nType I and type II errors are not caused by mistakes. These errors are the result of random chance. 
The data provide evidence for a conclusion that is false. It\u2019s no one\u2019s fault!\r\n<div class=\"textbox exercises\">\r\n<h3>Example<\/h3>\r\n<h2>Data Use on Smart Phones<\/h2>\r\n<img class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15032333\/m8_inference_one_proportion_topic_8_2_m8_intro_hypothesis_testing_5_teens_data_usage.jpg\" alt=\"Teens using smartphones\" width=\"425\" height=\"282\" \/>\r\n\r\nIn a previous example, we looked at a hypothesis test about data usage on smart phones. The researcher investigated the claim that the mean data usage for all teens is greater than 62 MBs. The sample mean was 75 MBs. The P-value was approximately 0.023. In this situation, the P-value is the probability that we will get a sample mean of 75 MBs or higher if the true mean is 62 MBs.\r\n\r\nNotice that the result (75 MBs) isn\u2019t impossible, only very unusual. The result is rare enough that we question whether the null hypothesis is true. This is why we reject the null hypothesis. But it is possible that the null hypothesis is true and the researcher happened to get a very unusual sample mean. In this case, the result is just due to chance, and the data have led to a type I error: rejecting the null hypothesis when it is actually true.\r\n\r\n<\/div>\r\n<div class=\"textbox exercises\">\r\n<h3>Example<\/h3>\r\n<h2>White Male Support for Obama in 2012<\/h2>\r\nIn a previous example, we conducted a hypothesis test using poll results to determine if white male support for Obama in 2012 will be less than 40%. Our poll of white males showed 35% planning to vote for Obama in 2012. Based on the sampling distribution, we estimated the P-value as 0.078. 
In this situation, the P-value is the probability that we will get a sample proportion of 0.35 or less if 0.40 of the population of white males support Obama.\r\n\r\nAt the 5% level, the poll did not give strong enough evidence for us to conclude that less than 40% of white males will vote for Obama in 2012.\r\n\r\nWhich type of error is possible in this situation? If, in fact, it is true that less than 40% of this population support Obama, then the data led to a type II error: failing to reject a null hypothesis that is false. In other words, we failed to accept an alternative hypothesis that is true.\r\n\r\nWe definitely did not make a type I error here because a type I error requires that we reject the null hypothesis.\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Try It<\/h3>\r\nhttps:\/\/assess.lumenlearning.com\/practice\/1c61b330-1a14-42ea-a1bf-9d610bdef199\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Try It<\/h3>\r\nhttps:\/\/assess.lumenlearning.com\/practice\/5d819fcd-a3df-4131-84f8-44d0822142e8\r\n\r\n<\/div>\r\n<h2>What Is the Probability That We Will Make a Type I Error?<\/h2>\r\nIf the significance level is 5% (\u03b1 = 0.05), then 5% of the time we will reject the null hypothesis (when it is true!). Of course we will not know if the null is true. But if it is, the natural variability that we expect in random samples will produce rare results 5% of the time. This makes sense because we assume the null hypothesis is true when we create the sampling distribution. We look at the variability in random samples selected from the population described by the null hypothesis.\r\n\r\nSimilarly, if the significance level is 1%, then 1% of the time sample results will be rare enough for us to reject the null hypothesis. So if the null hypothesis is actually true, then by chance alone, 1% of the time we will reject a true null hypothesis. 
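\r\n\r\nWe can watch this happen with a short simulation. (This sketch is ours, added for illustration; the sample size and population values are hypothetical and are not taken from the examples above.) We repeatedly test a null hypothesis that is actually true and count how often chance alone makes us reject it at the 1% level:\r\n<pre>import random\r\n\r\n# Illustrative sketch (hypothetical numbers): simulate many tests of a TRUE\r\n# null hypothesis. Each sample has n = 25 values from a population whose mean\r\n# really is the null value 0, with standard deviation 1. At the 1% level we\r\n# reject when the standardized sample mean is beyond plus or minus 2.576.\r\nrandom.seed(1)\r\nn, trials, rejections = 25, 20000, 0\r\nfor _ in range(trials):\r\n    xbar = sum(random.gauss(0, 1) for _ in range(n)) \/ n\r\n    z = xbar \/ (1 \/ n ** 0.5)   # standard error of the mean is 1 \/ sqrt(n)\r\n    if abs(z) &gt; 2.576:\r\n        rejections += 1\r\nprint(rejections \/ trials)      # close to 0.01, the significance level<\/pre>\r\n\r\n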
The probability of a type I error is therefore 1%.\r\n\r\n<strong>In general, the probability of a type I error is \u03b1.<\/strong>\r\n<h2>What Is the Probability That We Will Make a Type II Error?<\/h2>\r\nThe probability of a type I error, if the null hypothesis is true, is equal to the significance level. The probability of a type II error is much more complicated to calculate. We can reduce the risk of a type I error by using a lower significance level. The best way to reduce the risk of a type II error is by increasing the sample size. In theory, we could also increase the significance level, but doing so would increase the likelihood of a type I error at the same time. We discuss these ideas further in a later module.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Try It<\/h3>\r\n<h2>A Fair Coin<\/h2>\r\nIn the long run, a fair coin lands heads up half of the time. (For this reason, a weighted coin is not fair.) We conducted a simulation in which each sample consists of 40 flips of a fair coin. Here is a simulated sampling distribution for the proportion of heads in 2,000 samples. Results ranged from 0.25 to 0.75.\r\n\r\n<img class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15032335\/m8_inference_one_proportion_topic_8_2_m8_intro_hypothesis_testing_5_image2.png\" alt=\"A distribution bar graph with results ranging from 0.25 to 0.75. The center at 0.5 has the highest bar, and on either side the bars get lower. 
The graph is in the traditional bell curve shape, but with a slightly smaller slope on the left side of the peak.\" width=\"255\" height=\"150\" \/>\r\n\r\nhttps:\/\/assess.lumenlearning.com\/practice\/ceb8765e-911e-4c5b-a3fd-316dbb91b109\r\n\r\nhttps:\/\/assess.lumenlearning.com\/practice\/28882de6-87fb-4ac8-b9e8-8744603a1f6e\r\n\r\nhttps:\/\/assess.lumenlearning.com\/practice\/ab79ae17-e452-458d-b56d-f1055e566d0f\r\n\r\n<\/div>\r\n<h2>Comment<\/h2>\r\nIn general, if the null hypothesis is true, the significance level gives the probability of making a type I error. If we conduct a large number of hypothesis tests using the same null hypothesis, then a type I error will occur in a predictable percentage (\u03b1) of the hypothesis tests. This is a problem! If we run one hypothesis test and the data is significant at the 5% level, we have reasonably good evidence that the alternative hypothesis is true. If we run 20 hypothesis tests and the data in one of the tests is significant at the 5% level, it doesn\u2019t tell us anything! We expect 5% of the tests (1 in 20) to show significant results just due to chance.\r\n<div class=\"textbox exercises\">\r\n<h3>Example<\/h3>\r\n<h2>Cell Phones and Brain Cancer<\/h2>\r\n<img class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15032337\/m8_inference_one_proportion_topic_8_2_m8_intro_hypothesis_testing_5_cellphone_braincancer.jpg\" alt=\"A man using a cell phone\" width=\"400\" height=\"300\" \/>\r\n\r\nThe following is an excerpt from a 1999 <em>New York Times <\/em>article titled \u201cCell phones: questions but no answers,\u201d as referenced by David S. Moore in <em>Basic Practice of Statistics<\/em> (4th ed., New York: W. H. 
Freeman, 2007):\r\n<ul style=\"list-style-type: none;\">\r\n \t<li><em>A hospital study that compared brain cancer patients and a similar group without brain cancer found no statistically significant association between cell phone use and a group of brain cancers known as gliomas. But when 20 types of glioma were considered separately, an association was found between cell phone use and one rare form. Puzzlingly, however, this risk appeared to decrease rather than increase with greater mobile phone use.<\/em><\/li>\r\n<\/ul>\r\nThis is an example of a probable type I error. Suppose we conducted 20 hypothesis tests with the null hypothesis \u201cCell phone use is not associated with cancer\u201d at the 5% level. We expect 1 in 20 (5%) to give significant results by chance alone when there is no association between cell phone use and cancer. So the conclusion that this one type of cancer is related to cell phone use is probably just a result of random chance and not an indication of an association.\r\n\r\n<\/div>\r\n<a href=\"http:\/\/imgs.xkcd.com\/comics\/significant.png\" target=\"_blank\" rel=\"noopener noreferrer\">Click here<\/a> to see a fun cartoon that illustrates this same idea.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Try It<\/h3>\r\n<h2>How Many People Are Telepathic?<\/h2>\r\nTelepathy is the ability to read minds. Researchers used Zener cards in the early 1900s for experimental research into telepathy.\r\n\r\n<img class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15032340\/m8_inference_one_proportion_topic_8_2_m8_intro_hypothesis_testing_5_image3.png\" alt=\"5 Zener cards. The first has a circle, the second a +, the third three wavy lines, the fourth a square, and the fifth a star.\" width=\"223\" height=\"80\" \/>\r\n\r\n&nbsp;\r\n\r\nIn a telepathy experiment, the \u201csender\u201d looks at 1 of 5 Zener cards while the \u201creceiver\u201d guesses the symbol. 
This is repeated 40 times, and the proportion of correct responses is recorded. Because there are 5 cards, we expect random guesses to be right 20% of the time (1 out of 5) in the long run. So in 40 tries, 8 correct guesses, a proportion of 0.20, is common. But of course there will be variability even when someone is just guessing. Thirteen or more correct in 40 tries, a proportion of 0.325, is statistically significant at the 5% level. When people perform this well on the telepathy test, we conclude their performance is not due to chance and take it as an indication of the ability to read minds.\r\n\r\nhttps:\/\/assess.lumenlearning.com\/practice\/393afb9d-f49f-4c87-9fe0-adf1a9662ee9\r\n\r\n<\/div>\r\nIn the next section, \"Hypothesis Test for a Population Proportion,\" we learn the details of hypothesis testing for claims about a population proportion. Before we get into the details, we want to step back and think more generally about hypothesis testing. We close our introduction to hypothesis testing with a helpful analogy.\r\n<h2>Courtroom Analogy for Hypothesis Tests<\/h2>\r\nWhen a defendant stands trial for a crime, he or she is innocent until proven guilty. It is the job of the prosecution to present evidence showing that the defendant is guilty <em>beyond a reasonable doubt<\/em>. It is the job of the defense to challenge this evidence to establish a reasonable doubt. The jury weighs the evidence and makes a decision.\r\n\r\nWhen a jury makes a decision, it has only two possible verdicts:\r\n<ul>\r\n \t<li><strong>Guilty: <\/strong>The jury concludes that there is enough evidence to convict the defendant. The evidence is so strong that there is not a reasonable doubt that the defendant is guilty.<\/li>\r\n \t<li><strong>Not Guilty: <\/strong>The jury concludes that there is not enough evidence to conclude beyond a reasonable doubt that the person is guilty. Notice that they do not conclude that the person is innocent. 
This verdict says only that there is not enough evidence to return a guilty verdict.<\/li>\r\n<\/ul>\r\n<em>How is this example like a hypothesis test?<\/em>\r\n\r\nThe null hypothesis is \u201cThe person is innocent.\u201d The alternative hypothesis is \u201cThe person is guilty.\u201d The evidence is the data. In a courtroom, the person is assumed innocent until proven guilty. In a hypothesis test, we assume the null hypothesis is true until the data proves otherwise.\r\n\r\nThe two possible verdicts are similar to the two conclusions that are possible in a hypothesis test.\r\n\r\n<strong>Reject the null hypothesis: <\/strong>When we reject a null hypothesis, we accept the alternative hypothesis. This is like a guilty verdict. The evidence is strong enough for the jury to reject the assumption of innocence. In a hypothesis test, the data is strong enough for us to reject the assumption that the null hypothesis is true.\r\n\r\n<strong>Fail to reject the null hypothesis: <\/strong>When we fail to reject the null hypothesis, we are delivering a \u201cnot guilty\u201d verdict. The jury concludes that the evidence is not strong enough to reject the assumption of innocence, so the evidence is too weak to support a guilty verdict. We conclude the data is not strong enough to reject the null hypothesis, so the data is too weak to accept the alternative hypothesis.\r\n\r\n<em>How does the courtroom analogy relate to type I and type II errors?<\/em>\r\n\r\n<strong>Type I error: <\/strong>The jury convicts an innocent person. By analogy, we reject a true null hypothesis and accept a false alternative hypothesis.\r\n\r\n<strong>Type II error: <\/strong>The jury says a person is not guilty when he or she really is. By analogy, we fail to reject a null hypothesis that is false. 
In other words, we do not accept an alternative hypothesis when it is really true.\r\n<h2>Let\u2019s Summarize<\/h2>\r\nIn this section, we introduced the four-step process of hypothesis testing:\r\n\r\n<strong>Step 1: Determine the hypotheses.<\/strong>\r\n<ul>\r\n \t<li>The hypotheses are claims about the population(s).<\/li>\r\n \t<li>The null hypothesis is a hypothesis that the parameter equals a specific value.<\/li>\r\n \t<li>The alternative hypothesis is the competing claim that the parameter is less than, greater than, or not equal to the parameter value in the null. The claim that drives the statistical investigation is usually found in the alternative hypothesis.<\/li>\r\n<\/ul>\r\n<strong>Step 2: Collect the data.<\/strong>\r\n\r\nBecause the hypothesis test is based on probability, random selection or assignment is essential in data production.\r\n\r\n<strong>Step 3: Assess the evidence.<\/strong>\r\n<ul>\r\n \t<li>Use the data to find a P-value.<\/li>\r\n \t<li>The P-value is a probability statement about how unlikely the data is if the null hypothesis is true.<\/li>\r\n \t<li>More specifically, the P-value gives the probability of sample results at least as extreme as the data if the null hypothesis is true.<\/li>\r\n<\/ul>\r\n<strong>Step 4: Give the conclusion.<\/strong>\r\n<ul>\r\n \t<li>A small P-value says the data is unlikely to occur if the null hypothesis is true. We therefore conclude that the null hypothesis is probably not true and that the alternative hypothesis is true instead.<\/li>\r\n \t<li>We often choose a significance level as a benchmark for judging if the P-value is small enough. If the P-value is less than or equal to the significance level, we reject the null hypothesis and accept the alternative hypothesis instead.<\/li>\r\n \t<li>If the P-value is greater than the significance level, we say we \u201cfail to reject\u201d the null hypothesis. We never say that we \u201caccept\u201d the null hypothesis. 
We just say that we don\u2019t have enough evidence to reject it. This is equivalent to saying we don\u2019t have enough evidence to support the alternative hypothesis.<\/li>\r\n \t<li>Our conclusion will respond to the research question, so we often state the conclusion in terms of the alternative hypothesis.<\/li>\r\n<\/ul>\r\nInference is based on probability, so there is always uncertainty. Although we may have strong evidence against it, the null hypothesis may still be true. If this is the case, we have a <strong>type I <\/strong>error. Similarly, even if we fail to reject the null hypothesis, it does not mean the alternative hypothesis is false. In this case, we have a <strong>type II <\/strong>error. These errors are not the result of a mistake in conducting the hypothesis test. They occur because of random chance.\r\n<h2>Contribute!<\/h2>\r\n<div style=\"margin-bottom: 8px;\">Did you have an idea for improving this content? We\u2019d love your input.<\/div>\r\n<a style=\"font-size: 10pt; font-weight: 600; color: #077fab; text-decoration: none; border: 2px solid #077fab; border-radius: 7px; padding: 5px 25px; text-align: center; cursor: pointer; line-height: 1.5em;\" href=\"https:\/\/docs.google.com\/document\/d\/1O4qfLb1a7mCGO1SRTwobeltWOqosljei0vRf3yeq3Lc\" target=\"_blank\" rel=\"noopener\">Improve this page<\/a><a style=\"margin-left: 16px;\" href=\"https:\/\/docs.google.com\/document\/d\/1vy-T6DtTF-BbMfpVEI7VP_R7w2A4anzYZLXR8Pk4Fu4\" target=\"_blank\" rel=\"noopener\">Learn More<\/a>","rendered":"<div class=\"textbox learning-objectives\">\n<h3>Learning OUTCOMES<\/h3>\n<ul>\n<li>Recognize type I and type II errors.<\/li>\n<\/ul>\n<\/div>\n<h2>What Can Go Wrong: Two Types of Errors<\/h2>\n<p>Statistical investigations involve making decisions in the face of uncertainty, so there is always some chance of making a wrong decision. 
In hypothesis testing, two types of wrong decisions can occur.<\/p>\n<p>If the null hypothesis is true, but we reject it, the error is a <strong>type I <\/strong>error.<\/p>\n<p>If the null hypothesis is false, but we fail to reject it, the error is a <strong>type II <\/strong>error.<\/p>\n<p>The following table summarizes type I and II errors.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-2434 size-large\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1768\/2017\/04\/21220740\/SociologyAcceptReject-1024x473.png\" alt=\"Hypothesis testing matrices. If we reject H null and H null is false, when we have correctly rejected the null hypothesis. If we reject H null and H null is tue, we have made a Type I error. If we accept H null and H null is trie, we have correct accepted the null hypothesis. If we accept H null and H null is false, we have made a Type II error.\" width=\"1024\" height=\"473\" \/><\/p>\n<h2>Comment<\/h2>\n<p>Type I and type II errors are not caused by mistakes. These errors are the result of random chance. The data provide evidence for a conclusion that is false. It\u2019s no one\u2019s fault!<\/p>\n<div class=\"textbox exercises\">\n<h3>Example<\/h3>\n<h2>Data Use on Smart Phones<\/h2>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15032333\/m8_inference_one_proportion_topic_8_2_m8_intro_hypothesis_testing_5_teens_data_usage.jpg\" alt=\"Teens using smartphones\" width=\"425\" height=\"282\" \/><\/p>\n<p>In a previous example, we looked at a hypothesis test about data usage on smart phones. The researcher investigated the claim that the mean data usage for all teens is greater than 62 MBs. The sample mean was 75 MBs. The P-value was approximately 0.023. 
In this situation, the P-value is the probability that we will get a sample mean of 75 MBs or higher if the true mean is 62 MBs.<\/p>\n<p>Notice that the result (75 MBs) isn\u2019t impossible, only very unusual. The result is rare enough that we question whether the null hypothesis is true. This is why we reject the null hypothesis. But it is possible that the null hypothesis hypothesis is true and the researcher happened to get a very unusual sample mean. In this case, the result is just due to chance, and the data have led to a type I error: rejecting the null hypothesis when it is actually true.<\/p>\n<\/div>\n<div class=\"textbox exercises\">\n<h3>Example<\/h3>\n<h2>White Male Support for Obama in 2012<\/h2>\n<p>In a previous example, we conducted a hypothesis test using poll results to determine if white male support for Obama in 2012 will be less than 40%. Our poll of white males showed 35% planning to vote for Obama in 2012. Based on the sampling distribution, we estimated the P-value as 0.078. In this situation, the P-value is the probability that we will get a sample proportion of 0.35 or less if 0.40 of the population of white males support Obama.<\/p>\n<p>At the 5% level, the poll did not give strong enough evidence for us to conclude that less than 40% of white males will vote for Obama in 2012.<\/p>\n<p>Which type of error is possible in this situation? If, in fact, it is true that less than 40% of this population support Obama, then the data led to a type II error: failing to reject a null hypothesis that is false. 
In other words, we failed to accept an alternative hypothesis that is true.<\/p>\n<p>We definitely did not make a type I error here because a type I error requires that we reject the null hypothesis.<\/p>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>Try It<\/h3>\n<p>\t<iframe id=\"assessment_practice_1c61b330-1a14-42ea-a1bf-9d610bdef199\" class=\"resizable\" src=\"https:\/\/assess.lumenlearning.com\/practice\/1c61b330-1a14-42ea-a1bf-9d610bdef199?iframe_resize_id=assessment_practice_id_1c61b330-1a14-42ea-a1bf-9d610bdef199\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:300px;\"><br \/>\n\t<\/iframe><\/p>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>Try It<\/h3>\n<p>\t<iframe id=\"assessment_practice_5d819fcd-a3df-4131-84f8-44d0822142e8\" class=\"resizable\" src=\"https:\/\/assess.lumenlearning.com\/practice\/5d819fcd-a3df-4131-84f8-44d0822142e8?iframe_resize_id=assessment_practice_id_5d819fcd-a3df-4131-84f8-44d0822142e8\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:300px;\"><br \/>\n\t<\/iframe><\/p>\n<\/div>\n<h2>What Is the Probability That We Will Make a Type I Error?<\/h2>\n<p>If the significance level is 5% (\u03b1 = 0.05), then 5% of the time we will reject the null hypothesis (when it is true!). Of course we will not know if the null is true. But if it is, the natural variability that we expect in random samples will produce rare results 5% of the time. This makes sense because we assume the null hypothesis is true when we create the sampling distribution. We look at the variability in random samples selected from the population described by the null hypothesis.<\/p>\n<p>Similarly, if the significance level is 1%, then 1% of the time sample results will be rare enough for us to reject the null hypothesis hypothesis. So if the null hypothesis is actually true, then by chance alone, 1% of the time we will reject a true null hypothesis. 
The probability of a type I error is therefore 1%.<\/p>\n<p><strong>In general, the probability of a type I error is \u03b1.<\/strong><\/p>\n<h2>What Is the Probability That We Will Make a Type II Error?<\/h2>\n<p>The probability of a type I error, if the null hypothesis is true, is equal to the significance level. The probability of a type II error is much more complicated to calculate. We can reduce the risk of a type I error by using a lower significance level. The best way to reduce the risk of a type II error is by increasing the sample size. In theory, we could also increase the significance level, but doing so would increase the likelihood of a type I error at the same time. We discuss these ideas further in a later module.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>Try It<\/h3>\n<h2>A Fair Coin<\/h2>\n<p>In the long run, a fair coin lands heads up half of the time. (For this reason, a weighted coin is not fair.) We conducted a simulation in which each sample consists of 40 flips of a fair coin. Here is a simulated sampling distribution for the proportion of heads in 2,000 samples. Results ranged from 0.25 to 0.75.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15032335\/m8_inference_one_proportion_topic_8_2_m8_intro_hypothesis_testing_5_image2.png\" alt=\"A distribution bar graph with results ranging from 0.25 to 0.75. The center at 0.5 has the highest bar, and on either side the bars get lower. 
The graph is in the traditional bell curve shape, but with a slightly smaller slope on the left side of the peak.\" width=\"255\" height=\"150\" \/><\/p>\n<p>\t<iframe id=\"assessment_practice_ceb8765e-911e-4c5b-a3fd-316dbb91b109\" class=\"resizable\" src=\"https:\/\/assess.lumenlearning.com\/practice\/ceb8765e-911e-4c5b-a3fd-316dbb91b109?iframe_resize_id=assessment_practice_id_ceb8765e-911e-4c5b-a3fd-316dbb91b109\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:300px;\"><br \/>\n\t<\/iframe><\/p>\n<p>\t<iframe id=\"assessment_practice_28882de6-87fb-4ac8-b9e8-8744603a1f6e\" class=\"resizable\" src=\"https:\/\/assess.lumenlearning.com\/practice\/28882de6-87fb-4ac8-b9e8-8744603a1f6e?iframe_resize_id=assessment_practice_id_28882de6-87fb-4ac8-b9e8-8744603a1f6e\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:300px;\"><br \/>\n\t<\/iframe><\/p>\n<p>\t<iframe id=\"assessment_practice_ab79ae17-e452-458d-b56d-f1055e566d0f\" class=\"resizable\" src=\"https:\/\/assess.lumenlearning.com\/practice\/ab79ae17-e452-458d-b56d-f1055e566d0f?iframe_resize_id=assessment_practice_id_ab79ae17-e452-458d-b56d-f1055e566d0f\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:300px;\"><br \/>\n\t<\/iframe><\/p>\n<\/div>\n<h2>Comment<\/h2>\n<p>In general, if the null hypothesis is true, the significance level gives the probability of making a type I error. If we conduct a large number of hypothesis tests using the same null hypothesis, then, a type I error will occur in a predictable percentage (\u03b1) of the hypothesis tests. This is a problem! If we run one hypothesis test and the data is significant at the 5% level, we have reasonably good evidence that the alternative hypothesis is true. If we run 20 hypothesis tests and the data in one of the tests is significant at the 5% level, it doesn\u2019t tell us anything! 
We expect 5% of the tests (1 in 20) to show significant results just due to chance.<\/p>\n<div class=\"textbox exercises\">\n<h3>Example<\/h3>\n<h2>Cell Phones and Brain Cancer<\/h2>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15032337\/m8_inference_one_proportion_topic_8_2_m8_intro_hypothesis_testing_5_cellphone_braincancer.jpg\" alt=\"A man using a cell phone\" width=\"400\" height=\"300\" \/><\/p>\n<p>The following is an excerpt from a 1999 <em>New York Times <\/em>article titled \u201cCell phones: questions but no answers,\u201d as referenced by David S. Moore in <em>Basic Practice of Statistics<\/em> (4th ed., New York: W. H. Freeman, 2007):<\/p>\n<ul style=\"list-style-type: none;\">\n<li><em>A hospital study that compared brain cancer patients and a similar group without brain cancer found no statistically significant association between cell phone use and a group of brain cancers known as gliomas. But when 20 types of glioma were considered separately, an association was found between cell phone use and one rare form. Puzzlingly, however, this risk appeared to decrease rather than increase with greater mobile phone use.<\/em><\/li>\n<\/ul>\n<p>This is an example of a probable type I error. Suppose we conducted 20 hypotheses tests with the null hypothesis \u201cCell phone use is not associated with cancer\u201d at the 5% level. We expect 1 in 20 (5%) to give significant results by chance alone when there is no association between cell phone use and cancer. 
So the conclusion that this one type of cancer is related to cell phone use is probably just a result of random chance and not an indication of an association.<\/p>\n<\/div>\n<p><a href=\"http:\/\/imgs.xkcd.com\/comics\/significant.png\" target=\"_blank\" rel=\"noopener noreferrer\">Click here<\/a> to see a fun cartoon that illustrates this same idea.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>Try It<\/h3>\n<h2>How Many People Are Telepathic?<\/h2>\n<p>Telepathy is the ability to read minds. Researchers used Zener cards in the early 1900s for experimental research into telepathy.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15032340\/m8_inference_one_proportion_topic_8_2_m8_intro_hypothesis_testing_5_image3.png\" alt=\"5 Zener cards. The first has a circle, the second a +, the third three wavy lines, the fourth a square, and the fifth a star.\" width=\"223\" height=\"80\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>In a telepathy experiment, the \u201csender\u201d looks at 1 of 5 Zener cards while the \u201creceiver\u201d guesses the symbol. This is repeated 40 times, and the proportion of correct responses is recorded. Because there are 5 cards, we expect random guesses to be right 20% of the time (1 out of 5) in the long run. So in 40 tries, 8 correct guesses, a proportion of 0.20, is common. But of course there will be variability even when someone is just guessing. Thirteen or more correct in 40 tries, a proportion of 0.325, is statistically significant at the 5% level. 
When people perform this well on the telepathy test, we conclude their performance is not due to chance and take it as an indication of the ability to read minds.<\/p>\n<p>\t<iframe id=\"assessment_practice_393afb9d-f49f-4c87-9fe0-adf1a9662ee9\" class=\"resizable\" src=\"https:\/\/assess.lumenlearning.com\/practice\/393afb9d-f49f-4c87-9fe0-adf1a9662ee9?iframe_resize_id=assessment_practice_id_393afb9d-f49f-4c87-9fe0-adf1a9662ee9\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:300px;\"><br \/>\n\t<\/iframe><\/p>\n<\/div>\n<p>In the next section, &#8220;Hypothesis Test for a Population Proportion,&#8221; we learn the details of hypothesis testing for claims about a population proportion. Before we get into the details, we want to step back and think more generally about hypothesis testing. We close our introduction to hypothesis testing with a helpful analogy.<\/p>\n<h2>Courtroom Analogy for Hypothesis Tests<\/h2>\n<p>When a defendant stands trial for a crime, he or she is innocent until proven guilty. It is the job of the prosecution to present evidence showing that the defendant is guilty <em>beyond a reasonable doubt<\/em>. It is the job of the defense to challenge this evidence to establish a reasonable doubt. The jury weighs the evidence and makes a decision.<\/p>\n<p>When a jury makes a decision, it has only two possible verdicts:<\/p>\n<ul>\n<li><strong>Guilty: <\/strong>The jury concludes that there is enough evidence to convict the defendant. The evidence is so strong that there is not a reasonable doubt that the defendant is guilty.<\/li>\n<li><strong>Not Guilty: <\/strong>The jury concludes that there is not enough evidence to conclude beyond a reasonable doubt that the person is guilty. Notice that they do not conclude that the person is innocent. 
This verdict says only that there is not enough evidence to return a guilty verdict.<\/li>\n<\/ul>\n<p><em>How is this example like a hypothesis test?<\/em><\/p>\n<p>The null hypothesis is \u201cThe person is innocent.\u201d The alternative hypothesis is \u201cThe person is guilty.\u201d The evidence is the data. In a courtroom, the person is assumed innocent until proven guilty. In a hypothesis test, we assume the null hypothesis is true until the data provides convincing evidence otherwise.<\/p>\n<p>The two possible verdicts are similar to the two conclusions that are possible in a hypothesis test.<\/p>\n<p><strong>Reject the null hypothesis: <\/strong>When we reject a null hypothesis, we accept the alternative hypothesis. This is like a guilty verdict. The evidence is strong enough for the jury to reject the assumption of innocence. In a hypothesis test, the data is strong enough for us to reject the assumption that the null hypothesis is true.<\/p>\n<p><strong>Fail to reject the null hypothesis: <\/strong>When we fail to reject the null hypothesis, we are delivering a \u201cnot guilty\u201d verdict. The jury concludes that the evidence is not strong enough to reject the assumption of innocence, so the evidence is too weak to support a guilty verdict. We conclude the data is not strong enough to reject the null hypothesis, so the data is too weak to accept the alternative hypothesis.<\/p>\n<p><em>How does the courtroom analogy relate to type I and type II errors?<\/em><\/p>\n<p><strong>Type I error: <\/strong>The jury convicts an innocent person. By analogy, we reject a true null hypothesis and accept a false alternative hypothesis.<\/p>\n<p><strong>Type II error: <\/strong>The jury says a person is not guilty when he or she really is. By analogy, we fail to reject a null hypothesis that is false. 
In other words, we do not accept an alternative hypothesis when it is really true.<\/p>\n<h2>Let\u2019s Summarize<\/h2>\n<p>In this section, we introduced the four-step process of hypothesis testing:<\/p>\n<p><strong>Step 1: Determine the hypotheses.<\/strong><\/p>\n<ul>\n<li>The hypotheses are claims about the population(s).<\/li>\n<li>The null hypothesis is the claim that the parameter equals a specific value.<\/li>\n<li>The alternative hypothesis is the competing claim that the parameter is less than, greater than, or not equal to the parameter value in the null. The claim that drives the statistical investigation is usually found in the alternative hypothesis.<\/li>\n<\/ul>\n<p><strong>Step 2: Collect the data.<\/strong><\/p>\n<p>Because the hypothesis test is based on probability, random selection or assignment is essential in data production.<\/p>\n<p><strong>Step 3: Assess the evidence.<\/strong><\/p>\n<ul>\n<li>Use the data to find a P-value.<\/li>\n<li>The P-value is a probability statement about how unlikely the data is if the null hypothesis is true.<\/li>\n<li>More specifically, the P-value gives the probability of sample results at least as extreme as the data if the null hypothesis is true.<\/li>\n<\/ul>\n<p><strong>Step 4: Give the conclusion.<\/strong><\/p>\n<ul>\n<li>A small P-value says the data is unlikely to occur if the null hypothesis is true. We therefore conclude that the null hypothesis is probably not true and that the alternative hypothesis is true instead.<\/li>\n<li>We often choose a significance level as a benchmark for judging if the P-value is small enough. If the P-value is less than or equal to the significance level, we reject the null hypothesis and accept the alternative hypothesis instead.<\/li>\n<li>If the P-value is greater than the significance level, we say we \u201cfail to reject\u201d the null hypothesis. We never say that we \u201caccept\u201d the null hypothesis. 
We just say that we don\u2019t have enough evidence to reject it. This is equivalent to saying we don\u2019t have enough evidence to support the alternative hypothesis.<\/li>\n<li>Our conclusion will respond to the research question, so we often state the conclusion in terms of the alternative hypothesis.<\/li>\n<\/ul>\n<p>Inference is based on probability, so there is always uncertainty. Although we may have strong evidence against it, the null hypothesis may still be true. If this is the case, we have a <strong>type I <\/strong>error. Similarly, even if we fail to reject the null hypothesis, it does not mean the alternative hypothesis is false. In this case, we have a <strong>type II <\/strong>error. These errors are not the result of a mistake in conducting the hypothesis test. They occur because of random chance.<\/p>\n<h2>Contribute!<\/h2>\n<div style=\"margin-bottom: 8px;\">Did you have an idea for improving this content? We\u2019d love your input.<\/div>\n<p><a style=\"font-size: 10pt; font-weight: 600; color: #077fab; text-decoration: none; border: 2px solid #077fab; border-radius: 7px; padding: 5px 25px; text-align: center; cursor: pointer; line-height: 1.5em;\" href=\"https:\/\/docs.google.com\/document\/d\/1O4qfLb1a7mCGO1SRTwobeltWOqosljei0vRf3yeq3Lc\" target=\"_blank\" rel=\"noopener\">Improve this page<\/a><a style=\"margin-left: 16px;\" href=\"https:\/\/docs.google.com\/document\/d\/1vy-T6DtTF-BbMfpVEI7VP_R7w2A4anzYZLXR8Pk4Fu4\" target=\"_blank\" rel=\"noopener\">Learn More<\/a><\/p>\n\n\t\t\t <section class=\"citations-section\" role=\"contentinfo\">\n\t\t\t <h3>Candela Citations<\/h3>\n\t\t\t\t\t <div>\n\t\t\t\t\t\t <div id=\"citation-list-405\">\n\t\t\t\t\t\t\t <div class=\"licensing\"><div class=\"license-attribution-dropdown-subheading\">CC licensed content, Shared previously<\/div><ul class=\"citation-list\"><li>Concepts in Statistics. <strong>Provided by<\/strong>: Open Learning Initiative. 
<strong>Located at<\/strong>: <a target=\"_blank\" href=\"http:\/\/oli.cmu.edu\">http:\/\/oli.cmu.edu<\/a>. <strong>License<\/strong>: <em><a target=\"_blank\" rel=\"license\" href=\"https:\/\/creativecommons.org\/licenses\/by\/4.0\/\">CC BY: Attribution<\/a><\/em><\/li><li>Inferential Statistics Decision Making Table. <strong>Authored by<\/strong>: Wikimedia Commons: Adapted by Lumen Learning. <strong>Located at<\/strong>: <a target=\"_blank\" href=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/e\/e2\/Inferential_Statistics_Decision_Making_Table.png\/120px-Inferential_Statistics_Decision_Making_Table.png\">https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/e\/e2\/Inferential_Statistics_Decision_Making_Table.png\/120px-Inferential_Statistics_Decision_Making_Table.png<\/a>. <strong>License<\/strong>: <em><a target=\"_blank\" rel=\"license\" href=\"https:\/\/creativecommons.org\/licenses\/by\/4.0\/\">CC BY: Attribution<\/a><\/em><\/li><\/ul><\/div>\n\t\t\t\t\t\t <\/div>\n\t\t\t\t\t <\/div>\n\t\t\t <\/section>","protected":false},"author":163,"menu_order":11,"template":"","meta":{"_candela_citation":"[{\"type\":\"cc\",\"description\":\"Concepts in Statistics\",\"author\":\"\",\"organization\":\"Open Learning Initiative\",\"url\":\"http:\/\/oli.cmu.edu\",\"project\":\"\",\"license\":\"cc-by\",\"license_terms\":\"\"},{\"type\":\"cc\",\"description\":\"Inferential Statistics Decision Making Table\",\"author\":\"Wikimedia Commons: Adapted by Lumen Learning\",\"organization\":\"\",\"url\":\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/e\/e2\/Inferential_Statistics_Decision_Making_Table.png\/120px-Inferential_Statistics_Decision_Making_Table.png\",\"project\":\"\",\"license\":\"cc-by\",\"license_terms\":\"\"}]","CANDELA_OUTCOMES_GUID":"31868dad-3626-4d5d-a84c-d394d8c0fd7d, 
22e0784c-0d4b-4ddb-8b77-74e65f4595a7","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-405","chapter","type-chapter","status-publish","hentry"],"part":381,"_links":{"self":[{"href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters\/405","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/wp\/v2\/users\/163"}],"version-history":[{"count":14,"href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters\/405\/revisions"}],"predecessor-version":[{"id":2783,"href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters\/405\/revisions\/2783"}],"part":[{"href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/pressbooks\/v2\/parts\/381"}],"metadata":[{"href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters\/405\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/wp\/v2\/media?parent=405"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapter-type?post=405"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/wp\/v2\/contributor?post=405"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/wm-concepts-statistics\/wp-json\/wp\/v2\/license?post=405"}],"curies":[{"name":"wp","href":"https
:\/\/api.w.org\/{rel}","templated":true}]}}