## Categorical Data and the Multinomial Experiment

The multinomial experiment is the test of the null hypothesis that the parameters of a multinomial distribution equal specified values.

### Learning Objectives

Explain the multinomial experiment for testing a null hypothesis

### Key Takeaways

#### Key Points

- The multinomial experiment is really an extension of the binomial experiment, in which there were only two categories: success or failure.
- The multinomial experiment consists of [latex]\text{n}[/latex] identical and independent trials with [latex]\text{k}[/latex] possible outcomes for each trial.
- For n independent trials each of which leads to a success for exactly one of [latex]\text{k}[/latex] categories, with each category having a given fixed success probability, the multinomial distribution gives the probability of any particular combination of numbers of successes for the various categories.

#### Key Terms

**binomial distribution**: the discrete probability distribution of the number of successes in a sequence of [latex]\text{n}[/latex] independent yes/no experiments, each of which yields success with probability [latex]\text{p}[/latex]

**multinomial distribution**: a generalization of the binomial distribution; gives the probability of any particular combination of numbers of successes for the various categories

### The Multinomial Distribution

In probability theory, the multinomial distribution is a generalization of the binomial distribution. For [latex]\text{n}[/latex] independent trials, each of which leads to a success for exactly one of [latex]\text{k}[/latex] categories and with each category having a given fixed success probability, the multinomial distribution gives the probability of any particular combination of numbers of successes for the various categories.

The binomial distribution is the probability distribution of the number of successes for one of just two categories in [latex]\text{n}[/latex] independent Bernoulli trials, with the same probability of success on each trial. In a multinomial distribution, the analog of the Bernoulli distribution is the categorical distribution, where each trial results in exactly one of some fixed finite number [latex]\text{k}[/latex] of possible outcomes, with probabilities [latex]\text{p}_1, \cdots, \text{p}_\text{k}[/latex] (so that [latex]\text{p}_\text{i} \geq 0[/latex] for [latex]\text{i} = 1, \cdots, \text{k}[/latex] and the sum is [latex]1[/latex]), and there are [latex]\text{n}[/latex] independent trials. Then if the random variables [latex]\text{X}_\text{i}[/latex] indicate the number of times outcome number [latex]\text{i}[/latex] is observed over the [latex]\text{n}[/latex] trials, the vector [latex]\text{X} = (\text{X}_1, \cdots, \text{X}_\text{k})[/latex] follows a multinomial distribution with parameters [latex]\text{n}[/latex] and [latex]\text{p}[/latex], where [latex]\text{p} = (\text{p}_1, \cdots, \text{p}_\text{k})[/latex].
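As a concrete check of this definition, the multinomial probability mass function can be computed directly from factorials. Below is a minimal plain-Python sketch; the function name `multinomial_pmf` is ours, not from any particular library:

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(X_1 = x_1, ..., X_k = x_k) for n = sum(counts) independent trials,
    where probs are the fixed category probabilities p_1, ..., p_k."""
    n = sum(counts)
    coef = factorial(n)                  # multinomial coefficient n!/(x_1! ... x_k!)
    for x in counts:
        coef //= factorial(x)
    return coef * prod(p ** x for p, x in zip(probs, counts))

# With k = 2 this reduces to the binomial pmf: one head and one tail in
# two fair coin flips has probability multinomial_pmf([1, 1], [0.5, 0.5]) = 0.5.
```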

### The Multinomial Experiment

In statistics, the multinomial experiment is the test of the null hypothesis that the parameters of a multinomial distribution equal specified values. It is used for categorical data. It is really an extension of the binomial experiment, where there were only two categories: success or failure. One example of a multinomial experiment is asking which of six candidates a voter preferred in an election.

### Properties for the Multinomial Experiment

- The experiment consists of [latex]\text{n}[/latex] identical trials.
- There are [latex]\text{k}[/latex] possible outcomes for each trial. These outcomes are sometimes called classes, categories, or cells.
- The probabilities of the [latex]\text{k}[/latex] outcomes, denoted by [latex]\text{p}_1[/latex], [latex]\text{p}_2[/latex], [latex]\cdots[/latex], [latex]\text{p}_\text{k}[/latex], remain the same from trial to trial, and they sum to one.
- The trials are independent.
- The random variables of interest are the cell counts [latex]\text{n}_1[/latex], [latex]\text{n}_2[/latex], [latex]\cdots[/latex], [latex]\text{n}_\text{k}[/latex], which refer to the number of observations that fall into each of the [latex]\text{k}[/latex] categories.
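The properties above can be simulated directly: draw [latex]\text{n}[/latex] independent trials from fixed cell probabilities and tally the cell counts. A plain-Python sketch (the helper name is ours):

```python
import random
from collections import Counter

def run_multinomial_experiment(n, probs, seed=0):
    """n identical, independent trials, each falling into one of k cells
    with fixed probabilities; returns the cell counts n_1, ..., n_k."""
    rng = random.Random(seed)            # seeded so results are reproducible
    outcomes = rng.choices(range(len(probs)), weights=probs, k=n)
    tally = Counter(outcomes)
    return [tally.get(i, 0) for i in range(len(probs))]

# e.g. 600 rolls of a fair die: six cell counts that sum to 600
counts = run_multinomial_experiment(600, [1 / 6] * 6)
```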

## Structure of the Chi-Squared Test

The chi-square test is used to determine if a distribution of observed frequencies differs from the theoretical expected frequencies.

### Learning Objectives

Apply the chi-square test to approximate the probability of an event, distinguishing the different sample conditions in which it can be applied

### Key Takeaways

#### Key Points

- A chi-square test statistic is a measure of how different the data we observe are from what we would expect to observe if the variables were truly independent.
- The higher the test-statistic, the more likely that the data we observe did not come from independent variables.
- The chi-square distribution shows us how likely it is that the test statistic value was due to chance.
- If the difference between what we observe and what we expect from independent variables is large (and not just by chance), then we reject the null hypothesis that the two variables are independent and conclude that there is a relationship between the variables.
- Two types of chi-square tests include the test for goodness of fit and the test for independence.
- Certain assumptions must be made when conducting a goodness of fit test, including a simple random sample, a large enough sample size, independence, and adequate expected cell count.

#### Key Terms

**degrees of freedom**: any unrestricted variable in a frequency distribution

**Fisher’s exact test**: a statistical significance test used in the analysis of contingency tables, in which the significance of the deviation from a null hypothesis can be calculated exactly, rather than relying on an approximation that becomes exact in the limit as the sample size grows to infinity

The chi-square ([latex]\chi^2[/latex]) test is a nonparametric statistical technique used to determine if a distribution of observed frequencies differs from the theoretical expected frequencies. Chi-square statistics use nominal (categorical) or ordinal level data. Thus, instead of using means and variances, this test uses frequencies.

Generally, the chi-squared statistic summarizes the discrepancies between the expected number of times each outcome occurs (assuming that the model is true) and the observed number of times each outcome occurs, by summing the squares of the discrepancies, normalized by the expected numbers, over all the categories.

Data used in a chi-square analysis has to satisfy the following conditions:

- *Simple random sample* – The sample data is a random sampling from a fixed distribution or population where each member of the population has an equal probability of selection. Variants of the test have been developed for complex samples, such as those where the data is weighted.
- *Sample size (whole table)* – A sufficiently large sample size is assumed. If a chi-squared test is conducted on a sample with a smaller size, it will yield an inaccurate inference, and the researcher might end up committing a Type II error.
- *Expected cell count* – Adequate expected cell counts. Some require 5 or more, and others require 10 or more. A common rule is 5 or more in all cells of a 2-by-2 table, and 5 or more in 80% of cells in larger tables, but no cells with zero expected count.
- *Independence* – The observations are always assumed to be independent of each other. This means chi-squared cannot be used to test correlated data (like matched pairs or panel data).
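The expected-cell-count rule quoted above is easy to automate. A sketch in plain Python; the 5-or-more thresholds encode the common rule of thumb, not a universal standard:

```python
def expected_counts_ok(expected):
    """Common rule of thumb: every expected count >= 5 in a 2-by-2 table;
    in larger tables, >= 5 in at least 80% of cells and no zero cells."""
    cells = [e for row in expected for e in row]
    if len(expected) == 2 and len(expected[0]) == 2:
        return all(e >= 5 for e in cells)
    if any(e == 0 for e in cells):
        return False
    return sum(e >= 5 for e in cells) / len(cells) >= 0.8
```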

There are two types of chi-square test:

- The Chi-square test for goodness of fit, which compares the expected and observed values to determine how well an experimenter’s predictions fit the data.
- The Chi-square test for independence, which compares two sets of categories to determine whether the two groups are distributed differently among the categories.

### How Do We Perform a Chi-Square Test?

First, we calculate a chi-square test statistic. The higher the test-statistic, the more likely that the data we observe did not come from independent variables.

Second, we use the chi-square distribution. We may observe data that give us a high test-statistic just by chance, but the chi-square distribution shows us how likely that is. The chi-square distribution takes slightly different shapes depending on how many categories (degrees of freedom) our variables have. Interestingly, when the degrees of freedom get very large, the shape begins to look like the bell curve we know and love. This is a property shared by the [latex]\text{t}[/latex]-distribution.

If the difference between what we observe and what we expect from independent variables is large (that is, the chi-square distribution tells us it is unlikely to be that large just by chance) then we reject the null hypothesis that the two variables are independent. Instead, we favor the alternative that there is a relationship between the variables. Therefore, chi-square can help us discover that there is a relationship but cannot look too deeply into what that relationship is.

### Problems

The approximation to the chi-squared distribution breaks down if expected frequencies are too low. It will normally be acceptable so long as no more than 20% of the events have expected frequencies below 5. Where there is only 1 degree of freedom, the approximation is not reliable if expected frequencies are below 10. In this case, a better approximation can be obtained by reducing the absolute value of each difference between observed and expected frequencies by 0.5 before squaring. This is called Yates’s correction for continuity.
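Yates’s correction is a one-line change to the statistic: shrink each absolute difference by 0.5 before squaring. A plain-Python sketch (the function name is ours):

```python
def chi_square(observed, expected, yates=False):
    """Sum of (O - E)^2 / E over all categories; with yates=True each
    |O - E| is reduced by 0.5 before squaring (continuity correction)."""
    total = 0.0
    for o, e in zip(observed, expected):
        d = abs(o - e)
        if yates:
            d = max(d - 0.5, 0.0)
        total += d * d / e
    return total

# For 47 heads / 53 tails against an expected 50/50 split:
# chi_square([47, 53], [50, 50]) gives 0.36; with yates=True, 0.25.
```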

In cases where the expected value, [latex]\text{E}[/latex], is found to be small (indicating a small underlying population probability, and/or a small number of observations), the normal approximation of the multinomial distribution can fail. In such cases it is found to be more appropriate to use the [latex]\text{G}[/latex]-test, a likelihood ratio-based test statistic. Where the total sample size is small, it is necessary to use an appropriate exact test, typically either the binomial test or (for contingency tables) Fisher’s exact test. However, note that this test assumes fixed and known totals in all margins, an assumption which is typically false.

## How Fisher Used the Chi-Squared Test

Fisher’s exact test is preferable to a chi-square test when sample sizes are small, or the data are very unequally distributed.

### Learning Objectives

Calculate statistical significance by employing Fisher’s exact test

### Key Takeaways

#### Key Points

- Fisher’s exact test is a statistical significance test used in the analysis of contingency tables.
- Fisher’s exact test is useful for categorical data that result from classifying objects in two different ways.
- It is used to examine the significance of the association (contingency) between the two kinds of classification.
- The usual rule of thumb for deciding whether the chi-squared approximation is good enough is that the chi-squared test is not suitable when the expected values in any of the cells of a contingency table are below 5, or below 10 when there is only one degree of freedom.
- Fisher’s exact test becomes difficult to calculate with large samples or well-balanced tables, but fortunately these are exactly the conditions where the chi-squared test is appropriate.

#### Key Terms

**contingency table**: a table presenting the joint distribution of two categorical variables

**hypergeometric distribution**: a discrete probability distribution that describes the number of successes in a sequence of [latex]\text{n}[/latex] draws from a finite population without replacement

**p-value**: the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true

Fisher’s exact test is a statistical significance test used in the analysis of contingency tables. Although in practice it is employed when sample sizes are small, it is valid for all sample sizes. It is named after its inventor, R. A. Fisher. Fisher’s exact test is one of a class of exact tests, so called because the significance of the deviation from a null hypothesis can be calculated exactly, rather than relying on an approximation that becomes exact in the limit as the sample size grows to infinity. Fisher is said to have devised the test following a comment from Dr. Muriel Bristol, who claimed to be able to detect whether the tea or the milk was added first to her cup.

### Purpose and Scope

The test is useful for categorical data that result from classifying objects in two different ways. It is used to examine the significance of the association (contingency) between the two kinds of classification. In Fisher’s original example, one criterion of classification could be whether milk or tea was put in the cup first, and the other could be whether Dr. Bristol thinks that the milk or tea was put in first. We want to know whether these two classifications are associated—that is, whether Dr. Bristol really can tell whether milk or tea was poured in first. Most uses of the Fisher test involve, like this example, a [latex]2 \times 2[/latex] contingency table. The [latex]\text{p}[/latex]-value from the test is computed as if the margins of the table are fixed (i.e., as if, in the tea-tasting example, Dr. Bristol knows the number of cups with each treatment [milk or tea first] and will, therefore, provide guesses with the correct number in each category). As pointed out by Fisher, under a null hypothesis of independence, this leads to a hypergeometric distribution of the numbers in the cells of the table.

With large samples, a chi-squared test can be used in this situation. However, the significance value it provides is only an approximation, because the sampling distribution of the test statistic that is calculated is only approximately equal to the theoretical chi-squared distribution. The approximation is inadequate when sample sizes are small, or the data are very unequally distributed among the cells of the table, resulting in the cell counts predicted on the null hypothesis (the “expected values”) being low. The usual rule of thumb for deciding whether the chi-squared approximation is good enough is that the chi-squared test is not suitable when the expected values in any of the cells of a contingency table are below 5, or below 10 when there is only one degree of freedom. In fact, for small, sparse, or unbalanced data, the exact and asymptotic [latex]\text{p}[/latex]-values can be quite different and may lead to opposite conclusions concerning the hypothesis of interest. In contrast, the Fisher test is, as its name states, exact as long as the experimental procedure keeps the row and column totals fixed. Therefore, it can be used regardless of the sample characteristics. It becomes difficult to calculate with large samples or well-balanced tables, but fortunately these are exactly the conditions where the chi-squared test is appropriate.

For hand calculations, the test is only feasible in the case of a [latex]2 \times 2[/latex] contingency table. However, the principle of the test can be extended to the general case of an [latex]\text{m} \times \text{n}[/latex] table, and some statistical packages provide a calculation for the more general case.
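For a [latex]2 \times 2[/latex] table with fixed margins, the hypergeometric tail probability Fisher described can be computed exactly with binomial coefficients. A one-sided sketch in plain Python (the function name is ours):

```python
from math import comb

def fisher_exact_one_sided(a, b, c, d):
    """One-sided exact p-value for the 2x2 table [[a, b], [c, d]]:
    with all margins fixed, the probability of a top-left count at
    least as large as the one observed (hypergeometric upper tail)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    return sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1)) / denom

# Tea-tasting example: 8 cups, 4 of each kind, all 4 identified correctly.
# fisher_exact_one_sided(4, 0, 0, 4) is 1/70, about 0.014.
```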

## Goodness of Fit

The goodness of fit test determines whether the data “fit” a particular distribution or not.

### Learning Objectives

Outline the procedure for the goodness of fit test

### Key Takeaways

#### Key Points

- The test statistic for a goodness-of-fit test is: [latex]\chi ^{2}=\sum_{\text{i}=1}^{\text{k}}\frac{(O-\text{E})^{2}}{\text{E}}[/latex], where [latex]O[/latex] is the observed values ( data ), [latex]\text{E}[/latex] is the expected values (from theory), and [latex]\text{k}[/latex] is the number of different data cells or categories.
- The goodness-of-fit test is almost always right tailed. If the observed values and the corresponding expected values are not close to each other, then the test statistic can get very large and will be way out in the right tail of the chi-square curve.
- The null hypothesis for a chi-square test is that the observed values are close to the predicted values.
- The alternative hypothesis is that they are not close to the predicted values.

#### Key Terms

**binomial distribution**: the discrete probability distribution of the number of successes in a sequence of [latex]\text{n}[/latex] independent yes/no experiments, each of which yields success with probability [latex]\text{p}[/latex]

**goodness of fit**: how well a statistical model fits a set of observations

### Procedure for the Goodness of Fit Test

Goodness of fit means how well a statistical model fits a set of observations. A measure of goodness of fit typically summarizes the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e.g., to test for normality of residuals or to test whether two samples are drawn from identical distributions.

In this type of hypothesis test, we determine whether the data “fit” a particular distribution or not. For example, we may suspect that our unknown data fits a binomial distribution. We use a chi-square test (meaning the distribution for the hypothesis test is chi-square) to determine if there is a fit or not. The null and the alternate hypotheses for this test may be written in sentences or may be stated as equations or inequalities.

The test statistic for a goodness-of-fit test is:

[latex]\displaystyle{\chi ^{2}=\sum_{\text{i}=1}^{\text{k}}\dfrac{(O-\text{E})^{2}}{\text{E}}}[/latex]

where [latex]O[/latex] is the observed values (data), [latex]\text{E}[/latex] is the expected values (from theory), and [latex]\text{k}[/latex] is the number of different data cells or categories.

The observed values are the data values and the expected values are the values we would expect to get if the null hypothesis was true. The degrees of freedom are found as follows:

[latex]\text{df} = \text{n}-1[/latex]

where [latex]\text{n}[/latex] is the number of categories. The goodness-of-fit test is almost always right-tailed. If the observed values and the corresponding expected values are not close to each other, then the test statistic can get very large and will be way out in the right tail of the chi-square curve.

As an example, suppose a coin is tossed 100 times. The outcomes would be expected to be 50 heads and 50 tails. If 47 heads and 53 tails are observed instead, does this deviation occur because the coin is biased, or is it by chance?

The null hypothesis for the above experiment is that the observed values are close to the predicted values. The alternative hypothesis is that they are not close to the predicted values. These hypotheses hold for all chi-square goodness of fit tests. Thus, in this case, the null and alternative hypotheses correspond to:

Null hypothesis: The coin is fair.

Alternative hypothesis: The coin is biased.

We calculate chi-square by substituting values for [latex]O[/latex] and [latex]\text{E}[/latex].

For heads:

[latex]\dfrac{(47-50)^2}{50}=.18[/latex]

For tails:

[latex]\dfrac{(53-50)^2}{50}=.18[/latex]

The sum of these categories is:

[latex]0.18 + 0.18 = 0.36[/latex]

Significance of the chi-square goodness-of-fit value is established by calculating the degrees of freedom [latex]\nu[/latex] (the Greek letter *nu*) and by using the chi-square distribution table. The [latex]\nu[/latex] in a chi-square goodness of fit test is equal to the number of categories, [latex]\text{c}[/latex], minus one ([latex]\nu=\text{c}-1[/latex]). This is done in order to check whether the null hypothesis is valid, by looking up the critical chi-square value in the table that corresponds to the calculated [latex]\nu[/latex]. If the calculated chi-square is greater than the value in the table, then the null hypothesis is rejected, and it is concluded that the predictions made were incorrect. In the above experiment, [latex]\nu = 2-1 = 1[/latex]. The critical chi-square value for this example at [latex]\alpha = 0.05[/latex] and [latex]\nu=1[/latex] is [latex]3.84[/latex], which is greater than [latex]\chi ^ 2 = 0.36[/latex]. Therefore the null hypothesis is not rejected, and we conclude that the coin is fair.
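The coin example above can be reproduced in a few lines of plain Python (the critical value 3.84 is the tabulated value quoted above):

```python
observed = {"heads": 47, "tails": 53}
expected = {"heads": 50, "tails": 50}

chi2 = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
nu = len(observed) - 1        # degrees of freedom: c - 1 = 1
critical = 3.84               # chi-square table value at alpha = 0.05, nu = 1

# chi2 is 0.36, below the critical value, so the null hypothesis
# is not rejected: the coin looks fair.
```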

## Inferences of Correlation and Regression

The chi-square test of association allows us to evaluate associations (or correlations) between categorical data.

### Learning Objectives

Calculate the adjusted standardized residuals for a chi-square test

### Key Takeaways

#### Key Points

- The chi-square test indicates whether there is an association between two categorical variables, but unlike the correlation coefficient between two quantitative variables, it does not in itself give an indication of the strength of the association.
- In order to describe the association more fully, it is necessary to identify the cells that have large differences between the observed and expected frequencies. These differences are referred to as residuals, and they can be standardized and adjusted to follow a Normal distribution.
- The larger the absolute value of the residual, the larger the difference between the observed and expected frequencies, and therefore the more significant the association between the two variables.

#### Key Terms

**residuals**: the difference between the observed value and the estimated function value

**correlation coefficient**: any of the several measures indicating the strength and direction of a linear relationship between two random variables

The chi-square test of association allows us to evaluate associations (or correlations) between categorical data. It indicates whether there is an association between two categorical variables, but unlike the correlation coefficient between two quantitative variables, it does not in itself give an indication of the strength of the association.

In order to describe the association more fully, it is necessary to identify the cells that have large differences between the observed and expected frequencies. These differences are referred to as residuals, and they can be standardized and adjusted to follow a normal distribution with mean [latex]0[/latex] and standard deviation [latex]1[/latex]. The adjusted standardized residuals, [latex]\text{d}_{\text{ij}}[/latex], are given by:

[latex]\displaystyle{\text{d}_{\text{ij}}=\dfrac{\text{O}_{\text{ij}}-\text{E}_{\text{ij}}}{\sqrt{\text{E}_{\text{ij}}\left(1-\dfrac{\text{n}_{\text{i}}}{\text{N}}\right)\left(1-\dfrac{\text{n}_{\text{j}}}{\text{N}}\right)}}}[/latex]

where [latex]\text{n}_\text{i}[/latex] is the total frequency for row [latex]\text{i}[/latex], [latex]\text{n}_\text{j}[/latex] is the total frequency for column [latex]\text{j}[/latex], and [latex]\text{N}[/latex] is the overall total frequency. The larger the absolute value of the residual, the larger the difference between the observed and expected frequencies, and therefore the more significant the association between the two variables.
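The formula translates directly into code. A plain-Python sketch operating on a two-way table of counts; the example table is hypothetical:

```python
from math import sqrt

def adjusted_residuals(observed):
    """Adjusted standardized residuals d_ij for a two-way table of counts;
    under independence these are approximately standard normal."""
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(col) for col in zip(*observed)]
    N = sum(row_tot)
    d = []
    for i, row in enumerate(observed):
        d.append([])
        for j, o in enumerate(row):
            e = row_tot[i] * col_tot[j] / N                         # expected count
            scale = sqrt(e * (1 - row_tot[i] / N) * (1 - col_tot[j] / N))
            d[i].append((o - e) / scale)
    return d

# Hypothetical 2x2 table: residuals well inside +/-2 suggest no
# strongly deviating cell.
res = adjusted_residuals([[10, 20], [30, 40]])
```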

## Example: Test for Goodness of Fit

The Chi-square test for goodness of fit compares the expected and observed values to determine how well an experimenter’s predictions fit the data.

### Learning Objectives

Support the use of Pearson’s chi-squared test to measure goodness of fit

### Key Takeaways

#### Key Points

- Pearson’s chi-squared test uses a measure of goodness of fit, which is the sum of differences between observed and expected outcome frequencies, each squared and divided by the expectation.
- If the value of the chi-square test statistic is greater than the value in the chi-square table, then the null hypothesis is rejected.
- In this text, we examine a goodness of fit test as follows: for a population of employees, do the days for the highest number of absences occur with equal frequencies during a five day work week?

#### Key Terms

**null hypothesis**: A hypothesis set up to be refuted in order to support an alternative hypothesis; presumed true until statistical evidence in the form of a hypothesis test indicates otherwise.

Pearson’s chi-squared test uses a measure of goodness of fit, which is the sum of differences between observed and expected outcome frequencies (that is, counts of observations), each squared and divided by the expectation:

[latex]\displaystyle{{ \chi }^{ 2 }=\sum _{ \text{i}=1 }^{ \text{n} }{ \dfrac { { \left( { \text{O} }_{ \text{i} }-{ \text{E} }_{ \text{i} } \right) }^{ 2 } }{ { \text{E} }_{ \text{i} } } }}[/latex]

where [latex]\text{O}_\text{i}[/latex] is an observed frequency (i.e. count) for bin [latex]\text{i}[/latex] and [latex]\text{E}_\text{i}[/latex] = an expected (theoretical) frequency for bin [latex]\text{i}[/latex], asserted by the null hypothesis.

The expected frequency is calculated by:

[latex]\text{E}_\text{i} = [\text{F}(\text{Y}_\text{u})-\text{F}(\text{Y}_\text{l})] \cdot \text{N}[/latex]

where [latex]\text{F}[/latex] is the cumulative distribution function for the distribution being tested, [latex]\text{Y}_\text{u}[/latex] is the upper limit for class [latex]\text{i}[/latex], [latex]\text{Y}_\text{l}[/latex] is the lower limit for class [latex]\text{i}[/latex], and [latex]\text{N}[/latex] is the sample size.
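For a continuous reference distribution this is one subtraction of CDF values. A sketch for testing against a standard normal, built from `math.erf` so no external packages are needed (the function names are ours):

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function F of a normal distribution."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def expected_frequency(y_lower, y_upper, n, cdf=normal_cdf):
    """E_i = [F(Y_u) - F(Y_l)] * N for one class (bin) of the histogram."""
    return (cdf(y_upper) - cdf(y_lower)) * n

# Under a standard normal, about 68.3 of 100 observations are
# expected to fall in the class [-1, 1].
```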

### Example

Employers want to know which days of the week employees are absent in a five-day work week. Most employers would like to believe that employees are absent equally during the week. Suppose a random sample of 60 managers was asked on which day of the week they had the highest number of employee absences. The results were distributed as follows:

- Monday: 15
- Tuesday: 12
- Wednesday: 9
- Thursday: 9
- Friday: 15

#### Solution

The null and alternate hypotheses are:

[latex]\text{H}_0[/latex]: The absent days occur with equal frequencies—that is, they fit a uniform distribution.

[latex]\text{H}_\text{a}[/latex]: The absent days occur with unequal frequencies—that is, they do not fit a uniform distribution.

If the absent days occur with equal frequencies then, out of [latex]60[/latex] absent days (the total in the sample: [latex]15 + 12 + 9 + 9 + 15 = 60[/latex]), there would be [latex]12[/latex] absences on Monday, [latex]12[/latex] on Tuesday, [latex]12[/latex] on Wednesday, [latex]12[/latex] on Thursday, and [latex]12[/latex] on Friday. These numbers are the expected ([latex]\text{E}[/latex]) values. The values listed above are the observed ([latex]\text{O}[/latex]) values, or data.

Calculate the [latex]\chi^2[/latex] test statistic. Make a chart with the following column headings and fill in the cells:

- Expected ([latex]\text{E}[/latex]) values ([latex]12[/latex], [latex]12[/latex], [latex]12[/latex], [latex]12[/latex], [latex]12[/latex])
- Observed ([latex]\text{O}[/latex]) values ([latex]15[/latex], [latex]12[/latex], [latex]9[/latex], [latex]9[/latex], [latex]15[/latex])
- [latex]\left( \text{O}-\text{E} \right)[/latex]
- [latex]{ \left( \text{O}-\text{E} \right) }^{ 2 }[/latex]
- [latex]\dfrac { { \left( \text{O}-\text{E} \right) }^{ 2 } }{ \text{E} }[/latex]

Now add (sum) the values of the last column. Verify that this sum is [latex]3[/latex]. This is the [latex]\chi^2[/latex] test statistic.

To find the [latex]\text{p}[/latex]-value, calculate [latex]\text{P}[/latex]([latex]\chi^2>3[/latex]). This test is right-tailed. ([latex]\text{p}=0.5578[/latex])

The degrees of freedom are one fewer than the number of cells: [latex]\text{df} = 5-1 = 4[/latex].
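The five-day example can be verified in code; for even degrees of freedom the chi-square tail probability has a closed form, so the quoted [latex]\text{p}[/latex]-value of [latex]0.5578[/latex] can be reproduced without tables (a plain-Python sketch):

```python
from math import exp, factorial

observed = [15, 12, 9, 9, 15]   # Mon..Fri absences
expected = [12] * 5             # uniform over the 5-day week

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1          # 4

# For even df: P(X > x) = exp(-x/2) * sum_{m=0}^{df/2 - 1} (x/2)^m / m!
p_value = exp(-chi2 / 2) * sum((chi2 / 2) ** m / factorial(m)
                               for m in range(df // 2))
# chi2 is 3.0 and p_value is about 0.5578, matching the text.
```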

### Conclusion

The decision is to not reject the null hypothesis. At a [latex]5\%[/latex] level of significance, from the sample data, there is not sufficient evidence to conclude that the absent days do not occur with equal frequencies.

## Example: Test for Independence

The chi-square test for independence is used to determine the relationship between two variables of a sample.

### Learning Objectives

Explain how to calculate chi-square test for independence

### Key Takeaways

#### Key Points

- As with the goodness of fit example in the previous section, the key idea of the chi-square test for independence is a comparison of observed and expected values.
- It is important to keep in mind that the chi-square test for independence only tests whether two variables are independent or not, it cannot address questions of which is greater or less.
- In the example presented in this text, we examine whether boys or girls get into trouble more often in school.
- The null hypothesis is that the likelihood of getting in trouble is the same for boys and girls.
- We calculate a chi-square statistic of [latex]1.87[/latex] and find a [latex]\text{p}[/latex]-value of about [latex]0.2[/latex]. Therefore, we fail to reject the null hypothesis.

#### Key Terms

**alternative hypothesis**: a rival hypothesis to the null hypothesis, whose likelihoods are compared by a statistical hypothesis test

**null hypothesis**: a hypothesis set up to be refuted in order to support an alternative hypothesis; presumed true until statistical evidence in the form of a hypothesis test indicates otherwise

The chi-square test for independence is used to determine the relationship between two variables of a sample. In this context, independence means that the two factors are not related. Typically in social science research, researchers are interested in finding factors which are related (e.g., education and income, occupation and prestige, age and voting behavior).

Suppose we want to know whether boys or girls get into trouble more often in school. The table below documents the frequency of boys and girls who got into trouble in school, with the expected count for each cell shown in parentheses:

|  | Boys | Girls | Total |
| --- | --- | --- | --- |
| Got in trouble | 46 (***40.97***) | 37 (***42.03***) | 83 |
| Did not get in trouble | 71 (***76.03***) | 83 (***77.97***) | 154 |
| Total | 117 | 120 | 237 |

To examine statistically whether boys got in trouble more often in school, we need to establish hypotheses for the question. The null hypothesis is that the two variables are independent. In this particular case, it is that the likelihood of getting in trouble is the same for boys and girls. The alternative hypothesis to be tested is that the likelihood of getting in trouble is not the same for boys and girls.

It is important to keep in mind that the chi-square test for independence only tests whether two variables are independent or not. It cannot address questions of which is greater or less. Thus, the chi-square test for independence cannot tell us directly whether boys or girls get into more trouble.

As with the goodness of fit example seen previously, the key idea of the chi-square test for independence is a comparison of observed and expected values. In the case of tabular data, however, we usually do not know what the distribution should look like (as we did with tossing the coin). Rather, expected values are calculated based on the row and column totals from the table using the following equation:

expected value = (row total × column total) / total for table

[latex]\text{E}=\dfrac{\sigma_\text{r} \cdot \sigma_\text{c}}{\sigma_\text{t}}[/latex]

where [latex]\sigma_\text{r}[/latex] is the sum over that row, [latex]\sigma_\text{c}[/latex] is the sum over that column, and [latex]\sigma_\text{t}[/latex] is the sum over the entire table. The expected values (in parentheses, italics and bold) for each cell are also presented in the table above.

With the values in the table, the chi-square statistic can be calculated as follows:

[latex]\begin{align}\chi^2 &= \dfrac{(46-40.97)^2}{40.97} + \dfrac{(37-42.03)^2}{42.03} + \dfrac{(71-76.03)^2}{76.03} +\dfrac{(83-77.97)^2}{77.97} \\ &= 1.87\end{align}[/latex]

In the chi-square test for independence, the degrees of freedom are found as follows:

[latex]\text{df}=(\text{r}-1)(\text{c}-1)[/latex]

where [latex]\text{r}[/latex] is the number of rows in the table and [latex]\text{c}[/latex] is the number of columns in the table. Substituting in the proper values yields:

[latex]\text{df}=(2-1)(2-1)=1[/latex]

Finally, the value calculated from the formula above is compared with values in the chi-square distribution table. The value returned from the table is [latex]\text{p}<0.2[/latex] ([latex]20\%[/latex]). Therefore, the null hypothesis is not rejected. Hence, boys are not significantly more likely to get in trouble in school than girls.
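The full calculation, from the observed counts through the expected values to the statistic, can be carried out in a few lines (a plain-Python sketch using the counts from the worked sum above):

```python
observed = [[46, 37],   # got in trouble:        boys, girls
            [71, 83]]   # did not get in trouble: boys, girls

row_tot = [sum(row) for row in observed]
col_tot = [sum(col) for col in zip(*observed)]
total = sum(row_tot)

chi2 = 0.0
for i in range(2):
    for j in range(2):
        e = row_tot[i] * col_tot[j] / total   # (row total * column total) / table total
        chi2 += (observed[i][j] - e) ** 2 / e

df = (2 - 1) * (2 - 1)  # 1
# chi2 is about 1.87 with df = 1; the tabulated p exceeds 0.05,
# so independence is not rejected.
```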