The Chi-Square Distribution

Learning Outcomes

  • Describe the characteristics of a chi-square distribution

The notation for the chi-square distribution is:

[latex]\displaystyle\chi^2\sim\chi^2_{df}[/latex]

where df = degrees of freedom, which depend on how chi-square is being used. (If you want to practice calculating chi-square probabilities, use [latex]\displaystyle{df}=n-1[/latex]. The degrees of freedom for the three major uses are each calculated differently.)

For the χ² distribution, the population mean is μ = df and the population standard deviation is [latex]\displaystyle\sigma=\sqrt{2(df)}[/latex].
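The two formulas above can be checked with a few lines of Python. This is a minimal sketch; the function name chi_square_mean_sd is chosen here for illustration and is not part of any library.

```python
import math


def chi_square_mean_sd(df):
    """Return (mean, standard deviation) for a chi-square distribution.

    Uses the formulas from the text: mean = df and sd = sqrt(2 * df).
    The function name is illustrative, not a library API.
    """
    if df <= 0:
        raise ValueError("df must be positive")
    mean = df
    sd = math.sqrt(2 * df)
    return mean, sd


# A few degrees of freedom that appear later in this section.
for df in (2, 24, 1000):
    mean, sd = chi_square_mean_sd(df)
    print(f"df = {df}: mean = {mean}, sd = {sd:.3f}")
```

Note that the standard deviation grows more slowly than the mean, so for large df the spread becomes small relative to the center.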

The random variable is shown as χ², but may be any uppercase letter.

Recall: Skewness

In a perfectly symmetrical distribution, the mean and the median are the same. The histogram for the data: 4; 5; 6; 6; 6; 7; 7; 7; 7; 8 (shown in the Figure below) is not symmetrical. A distribution of this type is called skewed to the left. Most of the data is around 6 or 7, so the mean is strongly affected by a value farther from the middle, in this case 4. In other words, the value to the left, 4, is pulling the mean in that direction. Here the mean is 6.3, the median is 6.5, and the mode is seven.

This histogram matches the supplied data. It consists of 5 adjacent bars with the x-axis split into intervals of 1 from 4 to 8. The peak is to the right, and the heights of the bars taper down to the left.

The histogram for the data: 6; 7; 7; 7; 7; 8; 8; 8; 9; 10 (shown in the Figure below) is also not symmetrical. It is skewed to the right.

This histogram matches the supplied data. It consists of 5 adjacent bars with the x-axis split into intervals of 1 from 6 to 10. The peak is to the left, and the heights of the bars taper down to the right.

For this second data set, the mean is 7.7, the median is 7.5, and the mode is seven. The value to the right, 10, is pulling the mean in that direction. In general, if the distribution of data is skewed to the left, the mean is less than the median, which is often less than the mode. If the distribution of data is skewed to the right, the mode is often less than the median, which is less than the mean.
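The ordering of mean, median, and mode can be verified for both data sets with Python's standard statistics module. This is a small sketch; the variable names and the summarize helper are chosen here for illustration.

```python
import statistics

# The two data sets from the histograms described above.
left_skewed = [4, 5, 6, 6, 6, 7, 7, 7, 7, 8]
right_skewed = [6, 7, 7, 7, 7, 8, 8, 8, 9, 10]


def summarize(data):
    """Return (mean, median, mode) of a list of numbers."""
    return (
        statistics.mean(data),
        statistics.median(data),
        statistics.mode(data),
    )


for name, data in (("left-skewed", left_skewed), ("right-skewed", right_skewed)):
    mean, median, mode = summarize(data)
    print(f"{name}: mean = {mean}, median = {median}, mode = {mode}")
```

For the left-skewed data the mean falls below the median, and for the right-skewed data the mean falls above it, which matches the summary above.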

The random variable for a chi-square distribution with k degrees of freedom is the sum of k independent, squared standard normal variables.

[latex]\displaystyle\chi^2=(Z_1)^2+(Z_2)^2+\dots+(Z_k)^2[/latex]
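The definition above suggests a simple simulation: draw k standard normal values, square them, and add them up. The sketch below does this many times and compares the sample mean and standard deviation with the theoretical values df and the square root of 2(df). The function name simulate_chi_square, the choice k = 5, the sample size, and the seed are all assumptions made for illustration.

```python
import math
import random
import statistics


def simulate_chi_square(k, n_samples, seed=0):
    """Simulate chi-square values as sums of k squared standard normals.

    The seed is fixed only to make the run repeatable.
    """
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
        for _ in range(n_samples)
    ]


k = 5
samples = simulate_chi_square(k, n_samples=20000)

print("sample mean:", round(statistics.mean(samples), 3), "theory:", k)
print(
    "sample sd:",
    round(statistics.stdev(samples), 3),
    "theory:",
    round(math.sqrt(2 * k), 3),
)
print("smallest simulated value:", round(min(samples), 4))
```

Because every term in the sum is a square, no simulated value can be negative. The sample mean and standard deviation should land close to, but not exactly on, the theoretical values.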

  1. The curve is nonsymmetrical and skewed to the right.
  2. There is a different chi-square curve for each df.
    Part (a) shows a chi-square curve with 2 degrees of freedom. It is nonsymmetrical and slopes downward continually. Part (b) shows a chi-square curve with 24 df. This nonsymmetrical curve does have a peak and is skewed to the right. The graphs illustrate that different degrees of freedom produce different chi-square curves.
  3. The chi-square test statistic for any test is always greater than or equal to zero, because it is built from squared quantities.
  4. When df > 90, the chi-square curve approximates the normal distribution. For [latex]\displaystyle{X}\sim\chi^2_{1,000}[/latex], the mean is [latex]\displaystyle\mu=df=1,000[/latex] and the standard deviation is [latex]\displaystyle\sigma=\sqrt{2(1,000)}\approx44.7[/latex]. Therefore, [latex]\displaystyle{X}\sim{N}(1,000, 44.7)[/latex], approximately.
  5. The mean, μ, is located just to the right of the peak.
    This is a nonsymmetrical chi-square curve which is skewed to the right. The mean, μ, is labeled on the horizontal axis and is located to the right of the curve's peak.
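Properties 4 and 5 in the list above can be explored numerically. The sketch below evaluates the standard chi-square density formula using only Python's standard library, locates the peak for df = 24 by a simple grid search, and compares the chi-square density for df = 1,000 with the approximating normal density. The function names chi_square_pdf and grid_peak, the grid step, and the comparison points are assumptions chosen for illustration. For df of at least 2, the peak is known to sit at df − 2, which is why the mean lies just to its right.

```python
import math
from statistics import NormalDist


def chi_square_pdf(x, df):
    """Chi-square density at x, computed on the log scale for stability.

    The density is taken as 0 for x <= 0; this sketch only evaluates x > 0.
    """
    if x <= 0:
        return 0.0
    log_pdf = (
        (df / 2 - 1) * math.log(x)
        - x / 2
        - (df / 2) * math.log(2)
        - math.lgamma(df / 2)
    )
    return math.exp(log_pdf)


def grid_peak(df, step=0.01):
    """Approximate the location of the curve's peak by a grid search."""
    upper = df + 10 * math.sqrt(2 * df)  # far enough into the right tail
    n_points = int(upper / step)
    xs = [(i + 1) * step for i in range(n_points)]
    return max(xs, key=lambda x: chi_square_pdf(x, df))


# Property 5: the mean (df) lies to the right of the peak.
df_small = 24
peak = grid_peak(df_small)
print(f"df = {df_small}: peak near {peak:.2f}, mean = {df_small}")

# Property 4: for large df the curve is close to a normal curve.
df_large = 1000
normal = NormalDist(mu=df_large, sigma=math.sqrt(2 * df_large))
for x in (950, 1000, 1050):
    print(
        f"x = {x}: chi-square pdf = {chi_square_pdf(x, df_large):.6f}, "
        f"normal pdf = {normal.pdf(x):.6f}"
    )
```

The two densities agree closely near the mean. Small differences remain in the tails because the chi-square curve is still slightly skewed to the right, even for large df.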