{"id":616,"date":"2017-04-15T03:28:57","date_gmt":"2017-04-15T03:28:57","guid":{"rendered":"https:\/\/courses.lumenlearning.com\/conceptstest1\/chapter\/test-of-homogeneity\/"},"modified":"2017-06-06T20:46:03","modified_gmt":"2017-06-06T20:46:03","slug":"test-of-homogeneity","status":"web-only","type":"chapter","link":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/chapter\/test-of-homogeneity\/","title":{"raw":"Test of Homogeneity","rendered":"Test of Homogeneity"},"content":{"raw":"&nbsp;\n<div class=\"textbox learning-objectives\">\n<h3>Learning Objectives<\/h3>\n<ul>\n \t<li>Conduct a chi-square test of homogeneity. Interpret the conclusion in context.<\/li>\n<\/ul>\n<\/div>\nWe have learned the details for two chi-square tests, the goodness-of-fit test, and the test of independence. Now we focus on the third and last chi-square test that we will learn, the <strong>test for homogeneity<\/strong>. This test determines if two or more populations (or subgroups of a population) have the same distribution of a single categorical variable.\n\nThe test of homogeneity expands the test for a difference in two population proportions, which is the two-proportion Z-test we learned in <em>Inference for Two Proportions<\/em>. We use the two-proportion Z-test when the response variable has only two outcome categories and we are comparing two populations (or two subgroups.) 
We use the test of homogeneity if the response variable has two or more categories and we wish to compare two or more populations (or subgroups).\n\nWe can answer the following research questions with a chi-square test of homogeneity:\n<ul>\n \t<li>Does the use of steroids in collegiate athletics differ across the three NCAA divisions?<\/li>\n \t<li>Was the distribution of political views (liberal, moderate, conservative) different for the last three presidential elections in the United States?<\/li>\n<\/ul>\nThe null hypothesis states that the distribution of the categorical variable is the same for the populations (or subgroups). In other words, the proportion with a given response is the same in all of the populations, and this is true for all response categories. The alternative hypothesis says that the distributions differ.\n\nNote: <em>Homogeneous<\/em> means the same in structure or composition. This test gets its name from the null hypothesis, where we claim that the distribution of the responses is the same (homogeneous) across groups.\n\nTo test our hypotheses, we select a random sample from each population and gather data on one categorical variable. As with all chi-square tests, the expected counts reflect the null hypothesis. We must determine what we expect to see in each sample if the distributions are identical. As before, the chi-square test statistic measures the amount that the observed counts in the samples deviate from the expected counts.\n<div class=\"textbox examples\">\n<h3>Example<\/h3>\n<h2>Steroid Use in Collegiate Sports<\/h2>\nIn 2006, the NCAA published a report called \u201cSubstance Use: NCAA Study of Substance Use of College Student-Athletes.\u201d We use data from this report to investigate the following question: <em>Does steroid use by student athletes differ for the three NCAA divisions?<\/em>\n\nThe data comes from a random selection of teams in each NCAA division. 
The sampling plan was somewhat complex, but we can view the data as though it came from a random sample of athletes in each division. The surveys are anonymous to encourage truthful responses.\n\nTo see the NCAA report on substance use, <a href=\"https:\/\/s3-us-west-2.amazonaws.com\/oerfiles\/Concepts+in+Statistics\/datasets\/ncaa_2006_substance_use_report.pdf\" target=\"self\">click here<\/a>.\n<div class=\"textbox shaded\">A note on NCAA divisions:\u00a0The National Collegiate Athletic Association (NCAA) is divided into three divisions and oversees a wide range of collegiate sports. Division I schools have to sponsor more sports teams. These schools tend to be large universities with large athletic budgets supplemented by revenue from the games. They must offer athletic scholarships. Division II schools tend to be the smaller public universities and many private institutions. They have much smaller budgets that come solely from the college. The NCAA limits the amount Division II colleges can spend on athletic scholarships. Division III consists of colleges and universities that treat athletics as an extracurricular activity for students, instead of a source of revenue. These institutions do not offer athletic scholarships.<\/div>\n<strong>Step 1: State the hypotheses.<\/strong>\n\nIn the test of homogeneity, the null hypothesis says that the distribution of a categorical response variable is the same in each population. In this example, the categorical response variable is <em>steroid use<\/em> (yes or no). 
The populations are the three NCAA divisions.\n<ul style=\"list-style-type: none\">\n \t<li>H<sub>0<\/sub>: The proportion of athletes using steroids is the same in each of the three NCAA divisions.<\/li>\n \t<li>H<sub>a<\/sub>: The proportion of athletes using steroids is not the same in each of the three NCAA divisions.<\/li>\n<\/ul>\nNote: These hypotheses imply that the proportion of athletes not using steroids is also the same in each of the three NCAA divisions, so we don\u2019t need to state this explicitly. For example, if 2% of the athletes in each division are using steroids, then 98% are not.\n\nHere is an alternative way we could state the hypotheses for a test of homogeneity.\n<ul style=\"list-style-type: none\">\n \t<li>H<sub>0<\/sub>: For each of the three NCAA divisions, the distribution of \u201cyes\u201d and \u201cno\u201d responses to the question about steroid use is the same.<\/li>\n \t<li>H<sub>a<\/sub>: The distribution of responses is not the same.<\/li>\n<\/ul>\n<strong>Step 2: Collect and analyze the data.<\/strong>\n\nWe summarized the data from these three samples in a two-way table.\n\n<img class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15032846\/m11_chi_square_tests_topic_11_2_m11_test_homogeneity_1_image5.png\" alt=\"Observed data for the amount of athletes in each division (I, II, and III) who do and do not admit to steroid use\" width=\"263\" height=\"99\" \/>&nbsp;\n\nWe use percentages to compare the distributions of yes and no responses in the three samples. 
This step is similar to our data analysis for the test of independence.\n\n<img class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15032847\/m11_chi_square_tests_topic_11_2_m11_test_homogeneity_1_image6.png\" alt=\"Conditional percentages for the number of athletes in each division (I,II, and III) who do and do not admit to steroid use.\" width=\"371\" height=\"224\" \/>&nbsp;\n\nWe can see that Division I and Division II schools have essentially the same percentage of athletes who admit steroid use (about 1.2%). Not surprisingly, the least competitive division, Division III, has a slightly lower percentage (about 1.0%). Do these results suggest that the proportion of athletes using steroids is the same for the three divisions? Or is the difference seen in the sample of Division III schools large enough to suggest differences in the divisions? After all, the sample sizes are very large. We know that for large samples, a small difference can be statistically significant. Of course, we have to conduct the test of homogeneity to find out.\n\nNote: We decided not to use ribbon charts for visual comparison of the three distributions because the percentage admitting steroid use is too small in each sample to be visible.\n\n<strong>Step 3: Assess the evidence.<\/strong>\n\nWe need to determine the expected values and the chi-square test statistic so that we can find the P-value.\n\n<em>Calculating Expected Values for a Test of Homogeneity<\/em>\n\nExpected counts always describe what we expect to see in a sample if the null hypothesis is true. In this situation, we expect the percentage using steroids to be the same for each division. What percentage do we use? We find the percentage using steroids in the combined samples. 
This calculation is the same as we did when finding expected counts for a test of independence, though the logic of the calculation is subtly different.\n\n<img src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15032850\/m11_chi_square_tests_topic_11_2_m11_test_homogeneity_1_image7.png\" alt=\"Expected counts\" \/>&nbsp;\n\nHere are the calculations for the response \u201cyes\u201d:\n<ul>\n \t<li>Percentage using steroids in combined samples: 220\/19,377 = 0.01135 = 1.135%<\/li>\n<\/ul>\nExpected count of steroid users for Division I is 1.135% of Division I sample:\n<ul>\n \t<li>0.01135(8,543) = 96.96<\/li>\n<\/ul>\nExpected count of steroid users for Division II is 1.135% of Division II sample:\n<ul>\n \t<li>0.01135(4,341) = 49.27<\/li>\n<\/ul>\nExpected count of steroid users for Division III is 1.135% of Division III sample:\n<ul>\n \t<li>0.01135(6,493) = 73.70<\/li>\n<\/ul>\n<em>Checking Conditions<\/em>\n\nThe conditions for use of the chi-square distribution are the same as we learned previously:\n<ul>\n \t<li>A sample is randomly selected from each population.<\/li>\n \t<li>All of the expected counts are 5 or greater.<\/li>\n<\/ul>\nSince this data meets the conditions, we can proceed with calculating the \u03c7<sup>2<\/sup> test statistic.\n\n<em>Calculating the Chi-Square Test Statistic<\/em>\n\nThere are no changes in the way we calculate the chi-square test statistic.\n<p style=\"text-align: center\">[latex]{\\chi }^{2}\\text{}=\\text{}\u2211\\frac{{(\\mathrm{observed}-\\mathrm{expected})}^{2}}{\\mathrm{expected}}[\/latex]<\/p>\nWe use technology to calculate the chi-square value. For this example, we show the calculation. There are six terms, one for each cell in the 3 \u00d7 2 table. 
(We ignore the totals, as always.)\n\n<img class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15032852\/m11_chi_square_tests_topic_11_2_m11_test_homogeneity_1_image9.png\" alt=\"The chi-square value is 1.57\" width=\"680\" height=\"108\" \/>&nbsp;\n\n<em>Finding Degrees of Freedom and the P-Value<\/em>\n\nFor chi-square tests based on two-way tables (both the test of independence and the test of homogeneity), the degrees of freedom are (<em>r<\/em> \u2212 1)(<em>c<\/em> \u2212 1), where <em>r<\/em> is the number of rows and <em>c<\/em> is the number of columns in the two-way table (not counting row and column totals). In this case, the degrees of freedom are (3 \u2212 1)(2 \u2212 1) = 2.\n\nWe use the chi-square distribution with <em>df<\/em> = 2 to find the P-value. The P-value is large (0.4561), so we fail to reject the null hypothesis.\n\n<img class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15032854\/m11_chi_square_tests_topic_11_2_m11_test_homogeneity_1_image10.png\" alt=\"For 2 degrees of freedom the P-value is 0.4561\" width=\"375\" height=\"305\" \/>&nbsp;\n\n<strong>Step 4: Conclusion.<\/strong>\n\nThe data does not provide strong enough evidence to conclude that steroid use differs in the three NCAA divisions (P-value = 0.4561).\n\n<\/div>\n<div class=\"textbox exercises\">\n<h3>Learn By Doing<\/h3>\n<h2>First Use of Anabolic Steroids by NCAA Athletes<\/h2>\nThe NCAA survey includes this question: \u201cWhen, if ever, did you start using anabolic steroids?\u201d The response options are: have never used, before junior high, junior high, high school, freshman year of college, after freshman year of college. We focused on those who admitted use of steroids and compared the distribution of their responses for the years 1997, 2001, and 2005. (These are the years that the NCAA conducted the survey. 
Counts are estimates from reported percentages and sample size.) Recall that the NCAA uses random sampling in its sampling design.\n\n<img class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15032855\/m11_chi_square_tests_topic_11_2_m11_test_homogeneity_1_image11.png\" alt=\"Table of collected data that shows age of initial use of anabolic steroids\" width=\"387\" height=\"121\" \/>&nbsp;\n\nPlease <a href=\"https:\/\/s3-us-west-2.amazonaws.com\/oerfiles\/Concepts+in+Statistics\/interactives\/chi_squared_cal\/chisquared1.html\" target=\"new\">click here to open the simulation<\/a> for use in the following activity.\n\nhttps:\/\/assessments.lumenlearning.com\/assessments\/3798\n\nhttps:\/\/assessments.lumenlearning.com\/assessments\/3799\n\nhttps:\/\/assessments.lumenlearning.com\/assessments\/3730\n\nhttps:\/\/assessments.lumenlearning.com\/assessments\/3731\n\nhttps:\/\/assessments.lumenlearning.com\/assessments\/3800\n\nhttps:\/\/assessments.lumenlearning.com\/assessments\/3801\n\nhttps:\/\/assessments.lumenlearning.com\/assessments\/3802\n\nhttps:\/\/assessments.lumenlearning.com\/assessments\/3803\n\n<\/div>\nWe now know the details for the chi-square test for homogeneity. We conclude with two activities that will give you practice recognizing when to use this test.\n<div class=\"textbox exercises\">\n<h3>Learn By Doing<\/h3>\n<h2>Gender and Politics<\/h2>\nConsider these two situations:\n<ul style=\"list-style-type: none\">\n \t<li>A: Liberal, moderate, or conservative: Are there differences in political views of men and women in the United States? We survey a random sample of 100 U.S. men and 100 U.S. women.<\/li>\n \t<li>B: Do you plan to vote in the next presidential election? We ask a random sample of 100 U.S. men and 100 U.S. women. 
We look for differences in the proportion of men and women planning to vote.<\/li>\n<\/ul>\nhttps:\/\/assessments.lumenlearning.com\/assessments\/3732\n\nhttps:\/\/assessments.lumenlearning.com\/assessments\/3733\n\n<\/div>\n<div class=\"textbox exercises\">\n<h3>Learn By Doing<\/h3>\n<h2>Steroid Use for Male Athletes in NCAA Sports<\/h2>\nWe plan to compare steroid use for male athletes in NCAA baseball, basketball, and football. We design two different sampling plans.\n<ul style=\"list-style-type: none\">\n \t<li>A: Survey distinct random samples of NCAA athletes from each sport: 500 baseball players, 400 basketball players, 900 football players.<\/li>\n \t<li>B. Survey a random sample of 1,800 NCAA male athletes and categorize players by sport and admitted steroid use. Responses are anonymous.<\/li>\n<\/ul>\nhttps:\/\/assessments.lumenlearning.com\/assessments\/3734\n\n<\/div>\n<h2>Let\u2019s Summarize<\/h2>\nIn \"Chi-Square Tests for Two-Way Tables,\" we discussed two different hypothesis tests using the chi-square test statistic:\n<ul>\n \t<li>Test of independence for a two-way table<\/li>\n \t<li>Test of homogeneity for a two-way table<\/li>\n<\/ul>\n<h3>Test of Independence for a Two-Way Table<\/h3>\n<ul>\n \t<li>In the test of independence, we consider one population and two categorical variables.<\/li>\n \t<li>In <em>Probability and Probability Distribution<\/em>, we learned that two events are independent if <em>P<\/em>(<em>A<\/em>|<em>B<\/em>) = <em>P<\/em>(<em>A<\/em>), but we did not pay attention to variability in the sample. With the chi-square test of independence, we have a method for deciding whether our observed <em>P<\/em>(<em>A<\/em>|<em>B<\/em>) is \u201ctoo far\u201d from our observed <em>P<\/em>(<em>A<\/em>) to infer independence in the population.<\/li>\n \t<li>The null hypothesis says the two variables are independent (or not associated). 
The alternative hypothesis says the two variables are dependent (or associated).<\/li>\n \t<li>To test our hypotheses, we select a single random sample and gather data for two different categorical variables.<\/li>\n \t<li>Example: Do men and women differ in their perception of their weight? Select a random sample of adults. Ask them two questions: (1) Are you male or female? (2) Do you feel that you are overweight, underweight, or about right in weight?<\/li>\n<\/ul>\n<h3>Test of Homogeneity for a Two-Way Table<\/h3>\n<ul>\n \t<li>In the test of homogeneity, we consider two or more populations (or two or more subgroups of a population) and a single categorical variable.<\/li>\n \t<li>The test of homogeneity expands on the test for a difference in two population proportions that we learned in <em>Inference for Two Proportions<\/em> by comparing the distribution of the categorical variable across multiple groups or populations.<\/li>\n \t<li>The null hypothesis says that the distribution of proportions for all categories is the same in each group or population. The alternative hypothesis says that the distributions differ.<\/li>\n \t<li>To test our hypotheses, we select a random sample from each population or subgroup independently. We gather data for one categorical variable.<\/li>\n \t<li>Example: Is the rate of steroid use different for different men\u2019s collegiate sports (baseball, basketball, football, tennis, track\/field)? Randomly select a sample of athletes from each sport and ask them anonymously if they use steroids.<\/li>\n<\/ul>\nThe difference between these two tests is subtle. They differ primarily in study design. In the test of independence, we select individuals at random from a population and record data for two categorical variables. The null hypothesis says that the variables are independent. In the test of homogeneity, we select random samples from each subgroup or population separately and collect data on a single categorical variable. 
The null hypothesis says that the distribution of the categorical variable is the same for each subgroup or population.\n\nBoth tests use the same chi-square test statistic.\n<h3>The Chi-Square Test Statistic and Distribution<\/h3>\nFor all chi-square tests, the chi-square test statistic \u03c7<sup>2<\/sup> is the same. It measures how far the observed data are from the null hypothesis by comparing observed counts and expected counts. <em>Expected counts<\/em> are the counts we expect to see if the null hypothesis is true.\n<p style=\"text-align: center\">[latex]{\\chi }^{2}\\text{}=\\text{}\u2211\\frac{{(\\mathrm{observed}-\\mathrm{expected})}^{2}}{\\mathrm{expected}}[\/latex]<\/p>\nThe chi-square model is a family of curves that depend on degrees of freedom. For a two-way table, the degrees of freedom equals (<em>r<\/em> \u2212 1)(<em>c<\/em> \u2212 1). All chi-square curves are skewed to the right with a mean equal to the degrees of freedom.\n\nA chi-square model is a good fit for the distribution of the chi-square test statistic only if the following conditions are met:\n<ul>\n \t<li>The sample is randomly selected.<\/li>\n \t<li>All expected counts are 5 or greater.<\/li>\n<\/ul>\nIf these conditions are met, we use the chi-square distribution to find the P-value. We use the same logic that we have used in all hypothesis tests to draw a conclusion based on the P-value. If the P-value is at least as small as the significance level, we reject the null hypothesis and accept the alternative hypothesis. The P-value is the likelihood that results from random samples have a \u03c7<sup>2<\/sup> value equal to or greater than that calculated from the data if the null hypothesis is true.\n<h3><\/h3>","rendered":"<p>&nbsp;<\/p>\n<div class=\"textbox learning-objectives\">\n<h3>Learning Objectives<\/h3>\n<ul>\n<li>Conduct a chi-square test of homogeneity. 
Interpret the conclusion in context.<\/li>\n<\/ul>\n<\/div>\n<p>We have learned the details for two chi-square tests: the goodness-of-fit test and the test of independence. Now we focus on the third and last chi-square test that we will learn, the <strong>test for homogeneity<\/strong>. This test determines if two or more populations (or subgroups of a population) have the same distribution of a single categorical variable.<\/p>\n<p>The test of homogeneity expands the test for a difference in two population proportions, which is the two-proportion Z-test we learned in <em>Inference for Two Proportions<\/em>. We use the two-proportion Z-test when the response variable has only two outcome categories and we are comparing two populations (or two subgroups). We use the test of homogeneity if the response variable has two or more categories and we wish to compare two or more populations (or subgroups).<\/p>\n<p>We can answer the following research questions with a chi-square test of homogeneity:<\/p>\n<ul>\n<li>Does the use of steroids in collegiate athletics differ across the three NCAA divisions?<\/li>\n<li>Was the distribution of political views (liberal, moderate, conservative) different for the last three presidential elections in the United States?<\/li>\n<\/ul>\n<p>The null hypothesis states that the distribution of the categorical variable is the same for the populations (or subgroups). In other words, the proportion with a given response is the same in all of the populations, and this is true for all response categories. The alternative hypothesis says that the distributions differ.<\/p>\n<p>Note: <em>Homogeneous<\/em> means the same in structure or composition. This test gets its name from the null hypothesis, where we claim that the distribution of the responses is the same (homogeneous) across groups.<\/p>\n<p>To test our hypotheses, we select a random sample from each population and gather data on one categorical variable. 
As with all chi-square tests, the expected counts reflect the null hypothesis. We must determine what we expect to see in each sample if the distributions are identical. As before, the chi-square test statistic measures the amount that the observed counts in the samples deviate from the expected counts.<\/p>\n<div class=\"textbox examples\">\n<h3>Example<\/h3>\n<h2>Steroid Use in Collegiate Sports<\/h2>\n<p>In 2006, the NCAA published a report called \u201cSubstance Use: NCAA Study of Substance Use of College Student-Athletes.\u201d We use data from this report to investigate the following question: <em>Does steroid use by student athletes differ for the three NCAA divisions?<\/em><\/p>\n<p>The data comes from a random selection of teams in each NCAA division. The sampling plan was somewhat complex, but we can view the data as though it came from a random sample of athletes in each division. The surveys are anonymous to encourage truthful responses.<\/p>\n<p>To see the NCAA report on substance use, <a href=\"https:\/\/s3-us-west-2.amazonaws.com\/oerfiles\/Concepts+in+Statistics\/datasets\/ncaa_2006_substance_use_report.pdf\" target=\"self\">click here<\/a>.<\/p>\n<div class=\"textbox shaded\">A note on NCAA divisions:\u00a0The National Collegiate Athletic Association (NCAA) is divided into three divisions and oversees a wide range of collegiate sports. Division I schools have to sponsor more sports teams. These schools tend to be large universities with large athletic budgets supplemented by revenue from the games. They must offer athletic scholarships. Division II schools tend to be the smaller public universities and many private institutions. They have much smaller budgets that come solely from the college. The NCAA limits the amount Division II colleges can spend on athletic scholarships. Division III consists of colleges and universities that treat athletics as an extracurricular activity for students, instead of a source of revenue. 
These institutions do not offer athletic scholarships.<\/div>\n<p><strong>Step 1: State the hypotheses.<\/strong><\/p>\n<p>In the test of homogeneity, the null hypothesis says that the distribution of a categorical response variable is the same in each population. In this example, the categorical response variable is <em>steroid use<\/em> (yes or no). The populations are the three NCAA divisions.<\/p>\n<ul style=\"list-style-type: none\">\n<li>H<sub>0<\/sub>: The proportion of athletes using steroids is the same in each of the three NCAA divisions.<\/li>\n<li>H<sub>a<\/sub>: The proportion of athletes using steroids is not the same in each of the three NCAA divisions.<\/li>\n<\/ul>\n<p>Note: These hypotheses imply that the proportion of athletes not using steroids is also the same in each of the three NCAA divisions, so we don\u2019t need to state this explicitly. For example, if 2% of the athletes in each division are using steroids, then 98% are not.<\/p>\n<p>Here is an alternative way we could state the hypotheses for a test of homogeneity.<\/p>\n<ul style=\"list-style-type: none\">\n<li>H<sub>0<\/sub>: For each of the three NCAA divisions, the distribution of \u201cyes\u201d and \u201cno\u201d responses to the question about steroid use is the same.<\/li>\n<li>H<sub>a<\/sub>: The distribution of responses is not the same.<\/li>\n<\/ul>\n<p><strong>Step 2: Collect and analyze the data.<\/strong><\/p>\n<p>We summarized the data from these three samples in a two-way table.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15032846\/m11_chi_square_tests_topic_11_2_m11_test_homogeneity_1_image5.png\" alt=\"Observed data for the amount of athletes in each division (I, II, and III) who do and do not admit to steroid use\" width=\"263\" height=\"99\" \/>&nbsp;<\/p>\n<p>We use percentages to compare the distributions of yes and no responses in the 
three samples. This step is similar to our data analysis for the test of independence.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15032847\/m11_chi_square_tests_topic_11_2_m11_test_homogeneity_1_image6.png\" alt=\"Conditional percentages for the number of athletes in each division (I,II, and III) who do and do not admit to steroid use.\" width=\"371\" height=\"224\" \/>&nbsp;<\/p>\n<p>We can see that Division I and Division II schools have essentially the same percentage of athletes who admit steroid use (about 1.2%). Not surprisingly, the least competitive division, Division III, has a slightly lower percentage (about 1.0%). Do these results suggest that the proportion of athletes using steroids is the same for the three divisions? Or is the difference seen in the sample of Division III schools large enough to suggest differences in the divisions? After all, the sample sizes are very large. We know that for large samples, a small difference can be statistically significant. Of course, we have to conduct the test of homogeneity to find out.<\/p>\n<p>Note: We decided not to use ribbon charts for visual comparison of the three distributions because the percentage admitting steroid use is too small in each sample to be visible.<\/p>\n<p><strong>Step 3: Assess the evidence.<\/strong><\/p>\n<p>We need to determine the expected values and the chi-square test statistic so that we can find the P-value.<\/p>\n<p><em>Calculating Expected Values for a Test of Homogeneity<\/em><\/p>\n<p>Expected counts always describe what we expect to see in a sample if the null hypothesis is true. In this situation, we expect the percentage using steroids to be the same for each division. What percentage do we use? We find the percentage using steroids in the combined samples. 
This calculation is the same as we did when finding expected counts for a test of independence, though the logic of the calculation is subtly different.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15032850\/m11_chi_square_tests_topic_11_2_m11_test_homogeneity_1_image7.png\" alt=\"Expected counts\" \/>&nbsp;<\/p>\n<p>Here are the calculations for the response \u201cyes\u201d:<\/p>\n<ul>\n<li>Percentage using steroids in combined samples: 220\/19,377 = 0.01135 = 1.135%<\/li>\n<\/ul>\n<p>Expected count of steroid users for Division I is 1.135% of Division I sample:<\/p>\n<ul>\n<li>0.01135(8,543) = 96.96<\/li>\n<\/ul>\n<p>Expected count of steroid users for Division II is 1.135% of Division II sample:<\/p>\n<ul>\n<li>0.01135(4,341) = 49.27<\/li>\n<\/ul>\n<p>Expected count of steroid users for Division III is 1.135% of Division III sample:<\/p>\n<ul>\n<li>0.01135(6,493) = 73.70<\/li>\n<\/ul>\n<p><em>Checking Conditions<\/em><\/p>\n<p>The conditions for use of the chi-square distribution are the same as we learned previously:<\/p>\n<ul>\n<li>A sample is randomly selected from each population.<\/li>\n<li>All of the expected counts are 5 or greater.<\/li>\n<\/ul>\n<p>Since this data meets the conditions, we can proceed with calculating the \u03c7<sup>2<\/sup> test statistic.<\/p>\n<p><em>Calculating the Chi-Square Test Statistic<\/em><\/p>\n<p>There are no changes in the way we calculate the chi-square test statistic.<\/p>\n<p style=\"text-align: center\">[latex]{\\chi }^{2}\\text{}=\\text{}\u2211\\frac{{(\\mathrm{observed}-\\mathrm{expected})}^{2}}{\\mathrm{expected}}[\/latex]<\/p>\n<p>We use technology to calculate the chi-square value. For this example, we show the calculation. There are six terms, one for each cell in the 3 \u00d7 2 table. 
(We ignore the totals, as always.)<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15032852\/m11_chi_square_tests_topic_11_2_m11_test_homogeneity_1_image9.png\" alt=\"The chi-square value is 1.57\" width=\"680\" height=\"108\" \/>&nbsp;<\/p>\n<p><em>Finding Degrees of Freedom and the P-Value<\/em><\/p>\n<p>For chi-square tests based on two-way tables (both the test of independence and the test of homogeneity), the degrees of freedom are (<em>r<\/em> \u2212 1)(<em>c<\/em> \u2212 1), where <em>r<\/em> is the number of rows and <em>c<\/em> is the number of columns in the two-way table (not counting row and column totals). In this case, the degrees of freedom are (3 \u2212 1)(2 \u2212 1) = 2.<\/p>\n<p>We use the chi-square distribution with <em>df<\/em> = 2 to find the P-value. The P-value is large (0.4561), so we fail to reject the null hypothesis.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15032854\/m11_chi_square_tests_topic_11_2_m11_test_homogeneity_1_image10.png\" alt=\"For 2 degrees of freedom the P-value is 0.4561\" width=\"375\" height=\"305\" \/>&nbsp;<\/p>\n<p><strong>Step 4: Conclusion.<\/strong><\/p>\n<p>The data does not provide strong enough evidence to conclude that steroid use differs in the three NCAA divisions (P-value = 0.4561).<\/p>\n<\/div>\n<div class=\"textbox exercises\">\n<h3>Learn By Doing<\/h3>\n<h2>First Use of Anabolic Steroids by NCAA Athletes<\/h2>\n<p>The NCAA survey includes this question: \u201cWhen, if ever, did you start using anabolic steroids?\u201d The response options are: have never used, before junior high, junior high, high school, freshman year of college, after freshman year of college. 
We focused on those who admitted use of steroids and compared the distribution of their responses for the years 1997, 2001, and 2005. (These are the years that the NCAA conducted the survey. Counts are estimates from reported percentages and sample size.) Recall that the NCAA uses random sampling in its sampling design.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15032855\/m11_chi_square_tests_topic_11_2_m11_test_homogeneity_1_image11.png\" alt=\"Table of collected data that shows age of initial use of anabolic steroids\" width=\"387\" height=\"121\" \/>&nbsp;<\/p>\n<p>Please <a href=\"https:\/\/s3-us-west-2.amazonaws.com\/oerfiles\/Concepts+in+Statistics\/interactives\/chi_squared_cal\/chisquared1.html\" target=\"new\">click here to open the simulation<\/a> for use in the following activity.<\/p>\n<p>\t<iframe id=\"lumen_assessment_3798\" class=\"resizable\" src=\"https:\/\/assessments.lumenlearning.com\/assessments\/load?assessment_id=3798&#38;embed=1&#38;external_user_id=&#38;external_context_id=&#38;iframe_resize_id=lumen_assessment_3798\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:400px;\"><br \/>\n\t<\/iframe><\/p>\n<p>\t<iframe id=\"lumen_assessment_3799\" class=\"resizable\" src=\"https:\/\/assessments.lumenlearning.com\/assessments\/load?assessment_id=3799&#38;embed=1&#38;external_user_id=&#38;external_context_id=&#38;iframe_resize_id=lumen_assessment_3799\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:400px;\"><br \/>\n\t<\/iframe><\/p>\n<p>\t<iframe id=\"lumen_assessment_3730\" class=\"resizable\" src=\"https:\/\/assessments.lumenlearning.com\/assessments\/load?assessment_id=3730&#38;embed=1&#38;external_user_id=&#38;external_context_id=&#38;iframe_resize_id=lumen_assessment_3730\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:400px;\"><br 
\/>\n\t<\/iframe><\/p>\n<p>\t<iframe id=\"lumen_assessment_3731\" class=\"resizable\" src=\"https:\/\/assessments.lumenlearning.com\/assessments\/load?assessment_id=3731&#38;embed=1&#38;external_user_id=&#38;external_context_id=&#38;iframe_resize_id=lumen_assessment_3731\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:400px;\"><br \/>\n\t<\/iframe><\/p>\n<p>\t<iframe id=\"lumen_assessment_3800\" class=\"resizable\" src=\"https:\/\/assessments.lumenlearning.com\/assessments\/load?assessment_id=3800&#38;embed=1&#38;external_user_id=&#38;external_context_id=&#38;iframe_resize_id=lumen_assessment_3800\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:400px;\"><br \/>\n\t<\/iframe><\/p>\n<p>\t<iframe id=\"lumen_assessment_3801\" class=\"resizable\" src=\"https:\/\/assessments.lumenlearning.com\/assessments\/load?assessment_id=3801&#38;embed=1&#38;external_user_id=&#38;external_context_id=&#38;iframe_resize_id=lumen_assessment_3801\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:400px;\"><br \/>\n\t<\/iframe><\/p>\n<p>\t<iframe id=\"lumen_assessment_3802\" class=\"resizable\" src=\"https:\/\/assessments.lumenlearning.com\/assessments\/load?assessment_id=3802&#38;embed=1&#38;external_user_id=&#38;external_context_id=&#38;iframe_resize_id=lumen_assessment_3802\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:400px;\"><br \/>\n\t<\/iframe><\/p>\n<p>\t<iframe id=\"lumen_assessment_3803\" class=\"resizable\" src=\"https:\/\/assessments.lumenlearning.com\/assessments\/load?assessment_id=3803&#38;embed=1&#38;external_user_id=&#38;external_context_id=&#38;iframe_resize_id=lumen_assessment_3803\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:400px;\"><br \/>\n\t<\/iframe><\/p>\n<\/div>\n<p>We now know the details for the chi-square test for homogeneity. 
We conclude with two activities that will give you practice recognizing when to use this test.<\/p>\n<div class=\"textbox exercises\">\n<h3>Learn By Doing<\/h3>\n<h2>Gender and Politics<\/h2>\n<p>Consider these two situations:<\/p>\n<ul style=\"list-style-type: none\">\n<li>A: Liberal, moderate, or conservative: Are there differences in political views of men and women in the United States? We survey a random sample of 100 U.S. men and 100 U.S. women.<\/li>\n<li>B: Do you plan to vote in the next presidential election? We ask a random sample of 100 U.S. men and 100 U.S. women. We look for differences in the proportion of men and women planning to vote.<\/li>\n<\/ul>\n<p>\t<iframe id=\"lumen_assessment_3732\" class=\"resizable\" src=\"https:\/\/assessments.lumenlearning.com\/assessments\/load?assessment_id=3732&#38;embed=1&#38;external_user_id=&#38;external_context_id=&#38;iframe_resize_id=lumen_assessment_3732\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:400px;\"><br \/>\n\t<\/iframe><\/p>\n<p>\t<iframe id=\"lumen_assessment_3733\" class=\"resizable\" src=\"https:\/\/assessments.lumenlearning.com\/assessments\/load?assessment_id=3733&#38;embed=1&#38;external_user_id=&#38;external_context_id=&#38;iframe_resize_id=lumen_assessment_3733\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:400px;\"><br \/>\n\t<\/iframe><\/p>\n<\/div>\n<div class=\"textbox exercises\">\n<h3>Learn By Doing<\/h3>\n<h2>Steroid Use for Male Athletes in NCAA Sports<\/h2>\n<p>We plan to compare steroid use for male athletes in NCAA baseball, basketball, and football. We design two different sampling plans.<\/p>\n<ul style=\"list-style-type: none\">\n<li>A: Survey distinct random samples of NCAA athletes from each sport: 500 baseball players, 400 basketball players, 900 football players.<\/li>\n<li>B: Survey a random sample of 1,800 NCAA male athletes and categorize players by sport and admitted steroid use. 
Responses are anonymous.<\/li>\n<\/ul>\n<p>\t<iframe id=\"lumen_assessment_3734\" class=\"resizable\" src=\"https:\/\/assessments.lumenlearning.com\/assessments\/load?assessment_id=3734&#38;embed=1&#38;external_user_id=&#38;external_context_id=&#38;iframe_resize_id=lumen_assessment_3734\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:400px;\"><br \/>\n\t<\/iframe><\/p>\n<\/div>\n<h2>Let\u2019s Summarize<\/h2>\n<p>In &#8220;Chi-Square Tests for Two-Way Tables,&#8221; we discussed two different hypothesis tests using the chi-square test statistic:<\/p>\n<ul>\n<li>Test of independence for a two-way table<\/li>\n<li>Test of homogeneity for a two-way table<\/li>\n<\/ul>\n<h3>Test of Independence for a Two-Way Table<\/h3>\n<ul>\n<li>In the test of independence, we consider one population and two categorical variables.<\/li>\n<li>In <em>Probability and Probability Distribution<\/em>, we learned that two events are independent if <em>P<\/em>(<em>A<\/em>|<em>B<\/em>) = <em>P<\/em>(<em>A<\/em>), but we did not pay attention to variability in the sample. With the chi-square test of independence, we have a method for deciding whether our observed <em>P<\/em>(<em>A<\/em>|<em>B<\/em>) is \u201ctoo far\u201d from our observed <em>P<\/em>(<em>A<\/em>) to infer independence in the population.<\/li>\n<li>The null hypothesis says the two variables are independent (or not associated). The alternative hypothesis says the two variables are dependent (or associated).<\/li>\n<li>To test our hypotheses, we select a single random sample and gather data for two different categorical variables.<\/li>\n<li>Example: Do men and women differ in their perception of their weight? Select a random sample of adults. Ask them two questions: (1) Are you male or female? 
(2) Do you feel that you are overweight, underweight, or about right in weight?<\/li>\n<\/ul>\n<h3>Test of Homogeneity for a Two-Way Table<\/h3>\n<ul>\n<li>In the test of homogeneity, we consider two or more populations (or two or more subgroups of a population) and a single categorical variable.<\/li>\n<li>The test of homogeneity expands on the test for a difference in two population proportions that we learned in <em>Inference for Two Proportions<\/em> by comparing the distribution of the categorical variable across multiple groups or populations.<\/li>\n<li>The null hypothesis says that the distribution of proportions for all categories is the same in each group or population. The alternative hypothesis says that the distributions differ.<\/li>\n<li>To test our hypotheses, we select a random sample from each population or subgroup independently. We gather data for one categorical variable.<\/li>\n<li>Example: Is the rate of steroid use different for different men\u2019s collegiate sports (baseball, basketball, football, tennis, track\/field)? Randomly select a sample of athletes from each sport and ask them anonymously if they use steroids.<\/li>\n<\/ul>\n<p>The difference between these two tests is subtle. They differ primarily in study design. In the test of independence, we select individuals at random from a population and record data for two categorical variables. The null hypothesis says that the variables are independent. In the test of homogeneity, we select random samples from each subgroup or population separately and collect data on a single categorical variable. The null hypothesis says that the distribution of the categorical variable is the same for each subgroup or population.<\/p>\n<p>Both tests use the same chi-square test statistic.<\/p>\n<h3>The Chi-Square Test Statistic and Distribution<\/h3>\n<p>For all chi-square tests, the chi-square test statistic \u03c7<sup>2<\/sup> is the same. 
It measures how far the observed data are from the null hypothesis by comparing observed counts and expected counts. <em>Expected counts<\/em> are the counts we expect to see if the null hypothesis is true.<\/p>\n<p style=\"text-align: center\">[latex]{\\chi}^{2}=\\sum \\frac{{(\\mathrm{observed}-\\mathrm{expected})}^{2}}{\\mathrm{expected}}[\/latex]<\/p>\n<p>The chi-square model is a family of curves that depend on degrees of freedom. For a two-way table, the degrees of freedom equal (<em>r<\/em> \u2212 1)(<em>c<\/em> \u2212 1). All chi-square curves are skewed to the right with a mean equal to the degrees of freedom.<\/p>\n<p>A chi-square model is a good fit for the distribution of the chi-square test statistic only if the following conditions are met:<\/p>\n<ul>\n<li>The sample is randomly selected.<\/li>\n<li>All expected counts are 5 or greater.<\/li>\n<\/ul>\n<p>If these conditions are met, we use the chi-square distribution to find the P-value. We use the same logic that we have used in all hypothesis tests to draw a conclusion based on the P-value. If the P-value is less than or equal to the significance level, we reject the null hypothesis and accept the alternative hypothesis. The P-value is the probability that a random sample produces a \u03c7<sup>2<\/sup> value equal to or greater than the one calculated from the data, assuming the null hypothesis is true.<\/p>\n\n\t\t\t <section class=\"citations-section\" role=\"contentinfo\">\n\t\t\t <h3>Candela Citations<\/h3>\n\t\t\t\t\t <div>\n\t\t\t\t\t\t <div id=\"citation-list-616\">\n\t\t\t\t\t\t\t <div class=\"licensing\"><div class=\"license-attribution-dropdown-subheading\">CC licensed content, Shared previously<\/div><ul class=\"citation-list\"><li>Concepts in Statistics. <strong>Provided by<\/strong>: Open Learning Initiative. <strong>Located at<\/strong>: <a target=\"_blank\" href=\"http:\/\/oli.cmu.edu\">http:\/\/oli.cmu.edu<\/a>. 
<strong>License<\/strong>: <em><a target=\"_blank\" rel=\"license\" href=\"https:\/\/creativecommons.org\/licenses\/by\/4.0\/\">CC BY: Attribution<\/a><\/em><\/li><\/ul><\/div>\n\t\t\t\t\t\t <\/div>\n\t\t\t\t\t <\/div>\n\t\t\t <\/section>","protected":false},"author":163,"menu_order":9,"template":"","meta":{"_candela_citation":"[{\"type\":\"cc\",\"description\":\"Concepts in Statistics\",\"author\":\"\",\"organization\":\"Open Learning Initiative\",\"url\":\"http:\/\/oli.cmu.edu\",\"project\":\"\",\"license\":\"cc-by\",\"license_terms\":\"\"}]","CANDELA_OUTCOMES_GUID":"","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-616","chapter","type-chapter","status-web-only","hentry"],"part":570,"_links":{"self":[{"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters\/616","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/wp-json\/wp\/v2\/users\/163"}],"version-history":[{"count":8,"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters\/616\/revisions"}],"predecessor-version":[{"id":1543,"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters\/616\/revisions\/1543"}],"part":[{"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/wp-json\/pressbooks\/v2\/parts\/570"}],"metadata":[{"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters\/616\/metadata\/"}],"wp:attachmen
t":[{"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/wp-json\/wp\/v2\/media?parent=616"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/wp-json\/pressbooks\/v2\/chapter-type?post=616"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/wp-json\/wp\/v2\/contributor?post=616"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-wm-concepts-statistics\/wp-json\/wp\/v2\/license?post=616"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}