Key Concepts
- Even if a population's distribution is not normal, a normal distribution can be used to approximate probabilities involving sample means and sample sums, provided the sample size is sufficiently large. This holds even for exponential and uniform distributions.
- As the sample size gets larger, the sample mean tends to get closer to the population mean. This is the law of large numbers.
- The central limit theorem (CLT) describes sample means and sample sums; it is not used to calculate probabilities involving an individual value.
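The first key concept can be checked by simulation. The following is a minimal Python sketch using only the standard library; the two populations (an exponential with m = 0.25 and a uniform on (0, 12)), the sample size n = 40, the 5,000 repetitions, the seed, and the helper names `sample_means` and `share_within` are all assumptions chosen for illustration. If the normal approximation is reasonable, the sample means should center on μ, spread out by about σ/√n, and fall within μ ± 1.96σ/√n roughly 95% of the time.

```python
import random
import statistics

def sample_means(draw, n, reps, rng):
    """Return `reps` sample means, each the average of n values from draw(rng)."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

def share_within(values, center, width):
    """Fraction of values that lie within center +/- width."""
    return sum(abs(v - center) <= width for v in values) / len(values)

rng = random.Random(42)  # fixed seed so the run is reproducible
n, reps = 40, 5000       # assumed sample size and number of samples

m = 0.25  # exponential decay parameter: mu = sigma = 1/m = 4
exp_means = sample_means(lambda r: r.expovariate(m), n, reps, rng)

a, b = 0.0, 12.0  # uniform on (0, 12): mu = 6, sigma = 12 / sqrt(12)
uni_means = sample_means(lambda r: r.uniform(a, b), n, reps, rng)

results = {}
for name, means, mu, sigma in [
    ("exponential", exp_means, 1 / m, 1 / m),
    ("uniform", uni_means, (a + b) / 2, (b - a) / 12 ** 0.5),
]:
    se = sigma / n ** 0.5  # standard error of the mean predicted by the CLT
    results[name] = (
        statistics.fmean(means),               # mean of the sample means
        statistics.stdev(means),               # spread of the sample means
        share_within(means, mu, 1.96 * se),    # about 0.95 if roughly normal
    )
    print(f"{name}: mean of sample means {results[name][0]:.3f} vs mu {mu:.3f}; "
          f"sd of sample means {results[name][1]:.3f} vs sigma/sqrt(n) {se:.3f}; "
          f"share within mu +/- 1.96 SE {results[name][2]:.3f} (normal: 0.95)")
```

Even though neither population is bell-shaped, both sets of sample means behave close to the normal prediction at this sample size.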
Glossary
central limit theorem (for means and sums): given a random variable (RV) with known mean μ and known standard deviation σ, we draw samples of size n and are interested in two new RVs: the sample mean, [latex]\overline{X}[/latex], and the sample sum, [latex]\sum X[/latex]. If the sample size n is sufficiently large, then, approximately, [latex]\overline{X} \sim N (\mu, \frac{\sigma}{\sqrt{n}})[/latex] and [latex]\sum X \sim N (n \mu, \sqrt{n} \sigma)[/latex]; that is, the distribution of the sample means and the distribution of the sample sums approximate a normal distribution regardless of the shape of the population. The mean of the sample means equals the population mean, and the mean of the sample sums equals n times the population mean. The standard deviation of the distribution of the sample means, [latex]\frac{\sigma}{\sqrt{n}}[/latex], is called the standard error of the mean.
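As a sketch of how these two normal approximations are used, the Python snippet below builds them with the standard library's `statistics.NormalDist`. The population values (μ = 90, σ = 15) and sample size (n = 25) are hypothetical, as are the helper names `sample_mean_dist` and `sample_sum_dist`; the snippet assumes n is large enough for the CLT approximation to be reasonable.

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_dist(mu, sigma, n):
    """CLT approximation for the sample mean: N(mu, sigma / sqrt(n))."""
    return NormalDist(mu, sigma / sqrt(n))

def sample_sum_dist(mu, sigma, n):
    """CLT approximation for the sample sum: N(n * mu, sqrt(n) * sigma)."""
    return NormalDist(n * mu, sqrt(n) * sigma)

# Hypothetical population and sample size.
mu, sigma, n = 90, 15, 25
xbar = sample_mean_dist(mu, sigma, n)   # N(90, 3): 3 is the standard error of the mean
total = sample_sum_dist(mu, sigma, n)   # N(2250, 75)

p_mean_above_85 = 1 - xbar.cdf(85)      # P(sample mean > 85)
p_sum_below_2200 = total.cdf(2200)      # P(sample sum < 2200)

print(f"standard error of the mean = {xbar.stdev}")
print(f"P(sample mean > 85) = {p_mean_above_85:.4f}")
print(f"P(sample sum < 2200) = {p_sum_below_2200:.4f}")
```

Note that both probabilities concern a whole sample of 25 values; the CLT does not give the probability that a single observation exceeds 85.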
exponential distribution: a continuous random variable (RV) that appears when we are interested in the intervals of time between random events, for example, the length of time between emergency arrivals at a hospital. Notation: [latex]X \sim Exp(m)[/latex], where [latex]m > 0[/latex] is the decay parameter. The mean is [latex]\mu = \frac{1}{m}[/latex] and the standard deviation is [latex]\sigma = \frac{1}{m}[/latex]. The probability density function is [latex]f(x)=me^{-mx}, x \geq 0[/latex] and the cumulative distribution function is [latex]P(X \leq x) = 1-e^{-mx}[/latex] for [latex]x \geq 0[/latex].
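The exponential formulas above translate directly into code. This is a minimal Python sketch; the function names are hypothetical, and the example rate m = 1/10 (a mean of 10 minutes between arrivals) is an assumed value for illustration only.

```python
import math

def exp_mean(m):
    """Mean of Exp(m): 1/m."""
    return 1 / m

def exp_sd(m):
    """Standard deviation of Exp(m): also 1/m."""
    return 1 / m

def exp_pdf(x, m):
    """Density f(x) = m * e^(-m x) for x >= 0, and 0 for x < 0."""
    return m * math.exp(-m * x) if x >= 0 else 0.0

def exp_cdf(x, m):
    """P(X <= x) = 1 - e^(-m x) for x >= 0, and 0 for x < 0."""
    return 1 - math.exp(-m * x) if x >= 0 else 0.0

# Hypothetical example: mean time between arrivals of 10 minutes, so m = 1/10.
m = 1 / 10
print(exp_mean(m), exp_sd(m))
print(exp_cdf(5, m))        # P(X <= 5) = 1 - e^(-0.5)
print(1 - exp_cdf(15, m))   # P(X > 15) = e^(-1.5)
```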
uniform distribution: a continuous random variable (RV) that has equally likely outcomes over the domain [latex]a<x<b[/latex]. Notation: [latex]X \sim U(a,b)[/latex]. The mean is [latex]\mu = \frac{a+b}{2}[/latex] and the standard deviation is [latex]\sigma = \sqrt{\frac{(b-a)^2}{12}}[/latex]. The probability density function is [latex]f(x)=\frac{1}{b-a}[/latex] for [latex]a<x<b[/latex] or [latex]a \leq x \leq b[/latex]. The cumulative distribution function is [latex]P(X \leq x) = \frac{x-a}{b-a}[/latex] for [latex]a \leq x \leq b[/latex].
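Likewise, a minimal Python sketch of the uniform formulas follows. The function names are hypothetical, and the example X ~ U(0, 12) is an assumed distribution chosen so the arithmetic is easy to follow; outside the interval the sketch returns density 0 and clamps the cumulative probability to 0 or 1.

```python
import math

def uniform_mean(a, b):
    """Mean of U(a, b): (a + b) / 2."""
    return (a + b) / 2

def uniform_sd(a, b):
    """Standard deviation of U(a, b): sqrt((b - a)^2 / 12)."""
    return math.sqrt((b - a) ** 2 / 12)

def uniform_pdf(x, a, b):
    """Density 1 / (b - a) inside [a, b], 0 outside."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """P(X <= x) = (x - a) / (b - a) on [a, b], clamped to 0 below a and 1 above b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

# Hypothetical example: X ~ U(0, 12).
a, b = 0, 12
print(uniform_mean(a, b), uniform_sd(a, b))
print(uniform_cdf(9, a, b) - uniform_cdf(3, a, b))   # P(3 < X < 9)
```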