Why It Matters: The Chi-Square Distribution

Why understand the application of chi-square tests using categorical variables?

In this module, The Chi-Square Distribution, we again focus on inference with categorical variables. We will discuss three new hypothesis tests, two of which extend the hypothesis tests about proportions that we learned in the two previous modules. This module does not focus on estimating a parameter, so it does not cover confidence intervals.

Here is the Big Picture of Statistics with the new material for The Chi-Square Distribution highlighted in purple.

Figure: The Big Picture of Statistics. The diagram shows Step 1: Producing Data, Step 2: Exploratory Data Analysis, Step 3: Probability, and Step 4: Inference, with Step 4: Inference highlighted. New in this module: a curve called the chi-square distribution.

Following are examples of research questions that procedures in this module can address:

Goodness-of-Fit Test: Test a claim about the distribution of a categorical variable in a population.

  • During the presidential election of 2008, the Pew Research Center collected survey data that suggested that 24% of registered voters were liberal, 38% were moderate, and 38% were conservative. Is the distribution of political views different this year?
  • The distribution of blood types for whites in the United States is 45% type O, 41% type A, 10% type B, and 4% type AB. Is the distribution of blood types different for Asian Americans?
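To make the goodness-of-fit idea concrete, here is a minimal Python sketch built on the blood-type question above. The claimed percentages come from that question. The observed counts are hypothetical, invented only for illustration, and the variable names are our own. The sketch computes the count we would expect in each category if the claimed distribution held, then the usual chi-square statistic, which sums (observed - expected)^2 / expected over the categories.

```python
# Claimed distribution from the blood-type question above, as proportions.
claimed = {"O": 0.45, "A": 0.41, "B": 0.10, "AB": 0.04}

# HYPOTHETICAL observed counts, invented purely for illustration.
observed = {"O": 80, "A": 56, "B": 50, "AB": 14}

# Sample size is the total of the observed counts.
n = sum(observed.values())

# Expected count in each category if the claimed distribution holds.
expected = {blood_type: n * p for blood_type, p in claimed.items()}

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum(
    (observed[t] - expected[t]) ** 2 / expected[t] for t in claimed
)

# Degrees of freedom for a goodness-of-fit test: categories minus one.
degrees_of_freedom = len(claimed) - 1

print("expected counts:", expected)
print("chi-square statistic:", round(chi_square, 3))
print("degrees of freedom:", degrees_of_freedom)
```

A large statistic means the observed counts sit far from what the claimed distribution predicts. How large is "large" depends on the chi-square distribution with the degrees of freedom shown.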

Test of Independence: Test a claim about the relationship between two categorical variables in a population.

  • For young adults in the United States, is gender related to body image?
  • Is alcohol abuse by New York firefighters dependent on participation in the 9/11 rescue operation?
  • In the United States, is race associated with political views (conservative, moderate, liberal)?

Test of Homogeneity: Test a claim about the distribution of a categorical variable in several populations.

  • Does the use of steroids in collegiate athletics differ across the three NCAA divisions?
  • Was the distribution of political views (liberal, moderate, conservative) different for the last three presidential elections in the United States?
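Tests of independence and homogeneity both start from a two-way table of counts. The sketch below is one hedged illustration, loosely modeled on the steroid question above. All counts are hypothetical and the names are our own. Under the hypothesis of no difference across groups, the expected count in each cell is (row total × column total) / grand total, and the chi-square statistic compares the observed counts to those expected counts.

```python
# HYPOTHETICAL counts, invented for illustration only.
# Rows: NCAA division. Columns: [uses steroids, does not use steroids].
table = {
    "Division I": [15, 185],
    "Division II": [10, 190],
    "Division III": [5, 195],
}
num_columns = 2

# Marginal totals for rows, columns, and the whole table.
row_totals = {division: sum(counts) for division, counts in table.items()}
column_totals = [
    sum(counts[j] for counts in table.values()) for j in range(num_columns)
]
grand_total = sum(row_totals.values())

# Expected count per cell: (row total * column total) / grand total.
expected = {
    division: [
        row_totals[division] * column_totals[j] / grand_total
        for j in range(num_columns)
    ]
    for division in table
}

# Chi-square statistic summed over every cell of the table.
chi_square = sum(
    (table[d][j] - expected[d][j]) ** 2 / expected[d][j]
    for d in table
    for j in range(num_columns)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
degrees_of_freedom = (len(table) - 1) * (num_columns - 1)

print("expected counts:", expected)
print("chi-square statistic:", round(chi_square, 3))
print("degrees of freedom:", degrees_of_freedom)
```

The arithmetic is the same for both tests. What differs is how the data are collected: one sample classified by two variables for independence, or separate samples from several populations for homogeneity.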