## The Binomial Formula

The binomial distribution is a discrete probability distribution of the successes in a sequence of [latex]\text{n}[/latex] independent yes/no experiments.

### Learning Objectives

Employ the probability mass function to determine the probability of success in a given number of trials

### Key Takeaways

#### Key Points

- The probability of getting exactly [latex]\text{k}[/latex] successes in [latex]\text{n}[/latex] trials is given by the Probability Mass Function.
- The binomial distribution is frequently used to model the number of successes in a sample of size [latex]\text{n}[/latex] drawn with replacement from a population of size [latex]\text{N}[/latex].
- The binomial distribution is the discrete probability distribution of the number of successes in a sequence of [latex]\text{n}[/latex] independent yes/no experiments, each of which yields success with probability [latex]\text{p}[/latex].

#### Key Terms

**central limit theorem**: a theorem which states that, given certain conditions, the mean of a sufficiently large number of independent random variables, each with a well-defined mean and well-defined variance, will be approximately normally distributed

**probability mass function**: a function that gives the probability that a discrete random variable is exactly equal to some value

In probability theory and statistics, the binomial distribution is the discrete probability distribution of the number of successes in a sequence of [latex]\text{n}[/latex] independent yes/no experiments, each of which yields success with probability [latex]\text{p}[/latex]. The binomial distribution is the basis for the popular binomial test of statistical significance.

The binomial distribution is frequently used to model the number of successes in a sample of size [latex]\text{n}[/latex] drawn with replacement from a population of size [latex]\text{N}[/latex]. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution, not a binomial one. However, for [latex]\text{N}[/latex] much larger than [latex]\text{n}[/latex], the binomial distribution is a good approximation, and widely used.

In general, if the random variable [latex]\text{X}[/latex] follows the binomial distribution with parameters [latex]\text{n}[/latex] and [latex]\text{p}[/latex], we write [latex]\text{X} \sim \text{B}(\text{n}, \text{p})[/latex]. The probability of getting exactly [latex]\text{k}[/latex] successes in [latex]\text{n}[/latex] trials is given by the Probability Mass Function:

[latex]\displaystyle \text{f}(\text{k}; \text{n}, \text{p}) = \text{P}(\text{X}=\text{k}) = {{\text{n}}\choose{\text{k}}}\text{p}^\text{k}(1-\text{p})^{\text{n}-\text{k}}[/latex]

For [latex]\text{k} = 0, 1, 2, \dots, \text{n}[/latex] where:

[latex]\displaystyle {{\text{n}}\choose{\text{k}}} = \frac{\text{n}!}{\text{k}!(\text{n}-\text{k})!}[/latex]

is the binomial coefficient (hence the name of the distribution), read "*n* choose *k*," and also denoted [latex]\text{C}(\text{n}, \text{k})[/latex] or [latex]_\text{n}\text{C}_\text{k}[/latex]. The formula can be understood as follows: we want [latex]\text{k}[/latex] successes ([latex]\text{p}^\text{k}[/latex]) and [latex]\text{n}-\text{k}[/latex] failures ([latex](1-\text{p})^{\text{n}-\text{k}}[/latex]); however, the [latex]\text{k}[/latex] successes can occur anywhere among the [latex]\text{n}[/latex] trials, and there are [latex]\text{C}(\text{n}, \text{k})[/latex] different ways of distributing [latex]\text{k}[/latex] successes in a sequence of [latex]\text{n}[/latex] trials.
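The probability mass function translates directly into code. A minimal Python sketch using only the standard library (the function name is our own):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 3 heads in 5 fair coin tosses:
# C(5, 3) * 0.5^3 * 0.5^2 = 10 / 32 = 0.3125
print(binomial_pmf(3, 5, 0.5))
```

Summing the function over all [latex]\text{k}[/latex] from 0 to [latex]\text{n}[/latex] gives 1, as any probability mass function must.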

One straightforward way to simulate a binomial random variable [latex]\text{X}[/latex] is to compute the sum of [latex]\text{n}[/latex] independent 0-1 random variables, each of which takes on the value 1 with probability [latex]\text{p}[/latex]. This method requires [latex]\text{n}[/latex] calls to a random number generator to obtain one value of the random variable. When [latex]\text{n}[/latex] is relatively large (say at least 30), the Central Limit Theorem implies that the binomial distribution is well-approximated by the corresponding normal density function with parameters [latex]\mu = \text{np}[/latex] and [latex]\sigma = \sqrt{\text{npq}}[/latex].
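The simulation method just described can be sketched as follows; the helper name and sample size are illustrative choices:

```python
import random

def binomial_draw(n, p, rng=random):
    """One draw from B(n, p) as a sum of n independent 0-1 variables,
    each equal to 1 with probability p (n calls to the generator)."""
    return sum(1 for _ in range(n) if rng.random() < p)

random.seed(0)
n, p = 30, 0.4
samples = [binomial_draw(n, p) for _ in range(10_000)]
mean = sum(samples) / len(samples)  # should be close to n*p = 12
```

With [latex]\text{n}=30[/latex], the histogram of such samples is already close to a normal curve with [latex]\mu = \text{np} = 12[/latex] and [latex]\sigma = \sqrt{\text{npq}} \approx 2.68[/latex].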


## Binomial Probability Distributions

This chapter explores Bernoulli experiments and the probability distributions of binomial random variables.

### Learning Objectives

Apply the Bernoulli distribution to determine the success of an experiment

### Key Takeaways

#### Key Points

- A Bernoulli (success-failure) experiment is performed [latex]\text{n}[/latex] times, and the trials are independent.
- The probability of success on each trial is a constant [latex]\text{p}[/latex]; the probability of failure is [latex]\text{q}=1-\text{p}[/latex].
- The random variable [latex]\text{X}[/latex] counts the number of successes in the [latex]\text{n}[/latex] trials.

#### Key Terms

**Bernoulli Trial**: an experiment whose outcome is random and can be either of two possible outcomes, “success” or “failure”

Many random experiments involve counting the number of successes in a fixed number of independently repeated trials, each of which results in either success or failure. The distribution of the number of successes is a binomial distribution. It is a discrete probability distribution with two parameters, traditionally indicated by [latex]\text{n}[/latex], the number of trials, and [latex]\text{p}[/latex], the probability of success. Such a success/failure experiment is also called a Bernoulli experiment, or Bernoulli trial; when [latex]\text{n}=1[/latex], the binomial distribution is a Bernoulli distribution.

Bernoulli trials are named after Jacob Bernoulli, who studied them extensively in the 1600s. A well-known example of such an experiment is the repeated tossing of a coin and counting the number of times “heads” comes up.

In a sequence of Bernoulli trials, we are often interested in the total number of successes and not in the order of their occurrence. If we let the random variable [latex]\text{X}[/latex] equal the number of observed successes in [latex]\text{n}[/latex] Bernoulli trials, the possible values of [latex]\text{X}[/latex] are [latex]0, 1, 2, \dots, \text{n}[/latex]. If [latex]\text{x}[/latex] successes occur, where [latex]\text{x}=0, 1, 2, \dots, \text{n}[/latex], then [latex]\text{n}-\text{x}[/latex] failures occur. The number of ways of selecting [latex]\text{x}[/latex] positions for the [latex]\text{x}[/latex] successes in the [latex]\text{n}[/latex] trials is:

[latex]\displaystyle {{\text{n}}\choose{\text{x}}} = \frac{\text{n}!}{\text{x}!(\text{n}-\text{x})!}[/latex]

Since the trials are independent and since the probabilities of success and failure on each trial are, respectively, [latex]\text{p}[/latex] and [latex]\text{q}=1-\text{p}[/latex], the probability of each of these ways is [latex]\text{p}^\text{x}(1-\text{p})^{\text{n}-\text{x}}[/latex]. Thus, the *p.m.f.* of [latex]\text{X}[/latex], say [latex]\text{f}(\text{x})[/latex], is the sum of the probabilities of these [latex]{{\text{n}}\choose{\text{x}}}[/latex] mutually exclusive events; that is,

[latex]\displaystyle \text{f}(\text{x})={{\text{n}}\choose{\text{x}}}\text{p}^\text{x}(1-\text{p})^{\text{n}-\text{x}}, \quad \text{x}=0, 1, 2, \dots, \text{n}[/latex]

These probabilities are called binomial probabilities, and the random variable [latex]\text{X}[/latex] is said to have a binomial distribution.
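As a concrete illustration (our own example, not from the text), the full set of binomial probabilities for four tosses of a fair coin can be tabulated and checked to sum to 1:

```python
from math import comb

# All binomial probabilities f(x) for n = 4 fair coin tosses (p = 0.5)
n, p = 4, 0.5
pmf = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}
# x:    0       1     2      3     4
# f(x): 0.0625  0.25  0.375  0.25  0.0625

# The mutually exclusive events exhaust the sample space:
assert abs(sum(pmf.values()) - 1.0) < 1e-12
```

The symmetry of the values around [latex]\text{x}=2[/latex] reflects [latex]\text{p} = \text{q} = \frac{1}{2}[/latex].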

## Mean, Variance, and Standard Deviation of the Binomial Distribution

In this section, we’ll examine the mean, variance, and standard deviation of the binomial distribution.

### Learning Objectives

Examine the different properties of binomial distributions

### Key Takeaways

#### Key Points

- The mean of a binomial distribution with parameters [latex]\text{N}[/latex] (the number of trials) and [latex]\text{p}[/latex] (the probability of success for each trial) is [latex]\text{m}=\text{Np}[/latex].
- The variance of the binomial distribution is [latex]\text{s}^2 = \text{Np}(1-\text{p})[/latex].
- The standard deviation ([latex]\text{s}[/latex]) is the square root of the variance ([latex]\text{s}^2[/latex]).

#### Key Terms

**variance**: a measure of how far a set of numbers is spread out

**mean**: one measure of the central tendency either of a probability distribution or of the random variable characterized by that distribution

**standard deviation**: shows how much variation or dispersion exists from the average (mean), or expected value

As with most probability distributions, examining the different properties of binomial distributions is important to truly understanding their implications. The mean, variance, and standard deviation are three of the most useful and informative properties to explore. In this next section we’ll take a look at these properties and how they help establish the usefulness of statistical distributions. The easiest way to understand the mean, variance, and standard deviation of the binomial distribution is to use a real-life example.

Consider a coin-tossing experiment in which you tossed a coin 12 times and recorded the number of heads. If you performed this experiment over and over again, what would the mean number of heads be? On average, you would expect half the coin tosses to come up heads. Therefore, the mean number of heads would be 6. In general, the mean of a binomial distribution with parameters [latex]\text{N}[/latex] (the number of trials) and [latex]\text{p}[/latex] (the probability of success for each trial) is:

[latex]\text{m}=\text{Np}[/latex]

Where [latex]\text{m}[/latex] is the mean of the binomial distribution.

The variance of the binomial distribution is:

[latex]\text{s}^2 = \text{Np}(1-\text{p})[/latex]

The coin was tossed 12 times, so [latex]\text{N}=12[/latex]. A coin has a probability of 0.5 of coming up heads. Therefore, [latex]\text{p}=0.5[/latex]. The mean and variance can therefore be computed as follows:

[latex]\text{m}=\text{Np}=12\cdot0.5 = 6[/latex]

[latex]\text{s}^2=\text{Np}(1-\text{p})=12\cdot0.5\cdot(1.0-0.5)=3.0[/latex]

Naturally, the standard deviation ([latex]\text{s}[/latex]) is the square root of the variance ([latex]\text{s}^2[/latex]); here [latex]\text{s}=\sqrt{3}\approx 1.73[/latex].
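The coin-tossing computation above can be reproduced in a few lines of Python (a sketch of the arithmetic, nothing more):

```python
from math import sqrt

N, p = 12, 0.5           # 12 tosses of a fair coin
m = N * p                # mean number of heads: 6.0
var = N * p * (1 - p)    # variance: 3.0
s = sqrt(var)            # standard deviation: about 1.73
```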

## Additional Properties of the Binomial Distribution

In this section, we’ll look at the median, mode, and covariance of the binomial distribution.

### Learning Objectives

Explain some results for finding the median of a binomial distribution

### Key Takeaways

#### Key Points

- There is no single formula for finding the median of a binomial distribution.
- The mode of a binomial [latex]\text{B}(\text{n}, \text{p})[/latex] distribution is usually equal to [latex]\lfloor (\text{n}+1)\text{p} \rfloor[/latex].
- If two binomially distributed random variables [latex]\text{X}[/latex] and [latex]\text{Y}[/latex] are observed together, estimating their covariance can be useful.

#### Key Terms

**floor function**: maps a real number to the largest integer less than or equal to it

**covariance**: a measure of how much two random variables change together

**median**: the numerical value separating the higher half of a data sample, a population, or a probability distribution, from the lower half

**mode**: the value that appears most often in a set of data

In general, there is no single formula for finding the median of a binomial distribution, and it may even be non-unique. However, several special results have been established:

If [latex]\text{np}[/latex] is an integer, then the mean, median, and mode coincide and equal [latex]\text{np}[/latex].

Any median [latex]\text{m}[/latex] must lie within the interval [latex]\lfloor \text{np}\rfloor \leq \text{m} \leq \lceil \text{np}\rceil [/latex].

A median [latex]\text{m}[/latex] cannot lie too far away from the mean: [latex]|\text{m} - \text{np}| \leq \min\{\ln 2, \max(\text{p}, 1-\text{p})\}[/latex].

The median is unique and equal to [latex]\text{m} = \operatorname{round}(\text{np})[/latex] in cases where either [latex]\text{p} \leq 1 - \ln 2[/latex] or [latex]\text{p} \geq \ln 2[/latex] or [latex]|\text{m} - \text{np}| \leq \min(\text{p}, 1-\text{p})[/latex] (except for the case when [latex]\text{p} = \frac{1}{2}[/latex] and [latex]\text{n}[/latex] is odd).

When [latex]\text{p} = \frac{1}{2}[/latex] and [latex]\text{n}[/latex] is odd, any number [latex]\text{m}[/latex] in the interval [latex]\frac{1}{2}(\text{n}-1) \leq \text{m} \leq \frac{1}{2}(\text{n}+1)[/latex] is a median of the binomial distribution. If [latex]\text{p} = \frac{1}{2}[/latex] and [latex]\text{n}[/latex] is even, then [latex]\text{m} = \frac{\text{n}}{2}[/latex] is the unique median.
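These median results can be spot-checked numerically. The sketch below (helper names are our own) finds the smallest [latex]\text{m}[/latex] with [latex]\text{P}(\text{X} \leq \text{m}) \geq \frac{1}{2}[/latex], which is always a median, and confirms it lies in the stated interval:

```python
from math import comb, floor, ceil

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def binomial_median(n, p):
    """Smallest m with P(X <= m) >= 1/2 (always a median of B(n, p))."""
    return next(m for m in range(n + 1) if binomial_cdf(m, n, p) >= 0.5)

n, p = 10, 0.3
m = binomial_median(n, p)
# Any median must lie within [floor(np), ceil(np)]:
assert floor(n * p) <= m <= ceil(n * p)
```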

There are also conditional binomials. If [latex]\text{X} \sim \text{B}(\text{n}, \text{p})[/latex] and, conditional on [latex]\text{X}, \text{Y} \sim \text{B}(\text{X}, \text{q})[/latex], then [latex]\text{Y}[/latex] is a simple binomial variable with distribution [latex]\text{Y} \sim \text{B}(\text{n}, \text{pq})[/latex].

The binomial distribution is a special case of the Poisson binomial distribution, which is the distribution of a sum of [latex]\text{n}[/latex] independent non-identical Bernoulli trials [latex]\text{Bern}(\text{p}_\text{i})[/latex]. If [latex]\text{X}[/latex] has the Poisson binomial distribution with [latex]\text{p}_1 = \dots = \text{p}_\text{n} = \text{p}[/latex], then [latex]\text{X} \sim \text{B}(\text{n}, \text{p})[/latex].

Usually the mode of a binomial [latex]\text{B}(\text{n}, \text{p})[/latex] distribution is equal to [latex]\lfloor (\text{n}+1)\text{p} \rfloor[/latex], where [latex]\lfloor \cdot \rfloor[/latex] is the floor function. However, when [latex](\text{n}+1)\text{p}[/latex] is an integer and [latex]\text{p}[/latex] is neither 0 nor 1, the distribution has two modes: [latex](\text{n}+1)\text{p}[/latex] and [latex](\text{n}+1)\text{p}-1[/latex]. When [latex]\text{p}[/latex] is equal to 0 or 1, the mode will be 0 and [latex]\text{n}[/latex], respectively. These cases can be summarized as follows:

[latex]\displaystyle \text{mode} = \begin{cases} \lfloor (\text{n}+1)\text{p} \rfloor & \text{if } (\text{n}+1)\text{p} \text{ is 0 or a noninteger} \\ (\text{n}+1)\text{p} \text{ and } (\text{n}+1)\text{p}-1 & \text{if } (\text{n}+1)\text{p} \in \{1, \dots, \text{n}\} \\ \text{n} & \text{if } (\text{n}+1)\text{p} = \text{n}+1 \end{cases}[/latex]
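The mode rules can be verified by direct enumeration of the probability mass function; a minimal sketch (function names are our own):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def modes(n, p):
    """All values of k at which the pmf attains its maximum."""
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    peak = max(probs)
    return [k for k, q in enumerate(probs) if abs(q - peak) < 1e-12]

# (n+1)p = 11 * 0.5 = 5.5 is not an integer: single mode floor(5.5) = 5
assert modes(10, 0.5) == [5]
# (n+1)p = 10 * 0.5 = 5 is an integer: two modes, 4 and 5
assert modes(9, 0.5) == [4, 5]
```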

If two binomially distributed random variables [latex]\text{X}[/latex] and [latex]\text{Y}[/latex] are observed together, estimating their covariance can be useful. Using the definition of covariance, in the case [latex]\text{n} = 1[/latex] (thus being Bernoulli trials) we have:

[latex]\text{Cov}(\text{X}, \text{Y}) = \text{E}(\text{XY}) - \mu_\text{X}\mu_\text{Y}[/latex]

The first term is non-zero only when both [latex]\text{X}[/latex] and [latex]\text{Y}[/latex] are one, and [latex]\mu_\text{X}[/latex] and [latex]\mu_\text{Y}[/latex] are equal to the two probabilities. Defining [latex]\text{p}_\text{B}[/latex] as the probability of both happening at the same time, this gives:

[latex]\text{Cov}(\text{X}, \text{Y}) = \text{p}_\text{B} - \text{p}_\text{X}\text{p}_\text{Y}[/latex]

and for [latex]\text{n}[/latex] independent pairwise trials:

[latex]\text{Cov}(\text{X}, \text{Y}) = \text{n}(\text{p}_\text{B} - \text{p}_\text{X}\text{p}_\text{Y})[/latex]

If *X* and *Y* are the same variable, this reduces to the variance formula given above.
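A simulation can illustrate the covariance estimate for the Bernoulli case; the joint probabilities below are hypothetical choices made for this sketch:

```python
import random

# Hypothetical joint distribution for one Bernoulli pair:
# P(X=1) = pX, P(Y=1) = pY, P(X=1 and Y=1) = pB.
pX, pY, pB = 0.5, 0.4, 0.3

def draw_pair(rng):
    """Draw one (X, Y) pair from the joint table implied by pX, pY, pB."""
    u = rng.random()
    if u < pB:            return 1, 1   # both succeed (prob pB)
    if u < pX:            return 1, 0   # X only (prob pX - pB)
    if u < pX + pY - pB:  return 0, 1   # Y only (prob pY - pB)
    return 0, 0                         # neither

rng = random.Random(42)
pairs = [draw_pair(rng) for _ in range(100_000)]
ex = sum(x for x, _ in pairs) / len(pairs)
ey = sum(y for _, y in pairs) / len(pairs)
exy = sum(x * y for x, y in pairs) / len(pairs)
cov_hat = exy - ex * ey   # should be close to pB - pX*pY = 0.1
```

With these values the theoretical covariance is [latex]\text{p}_\text{B} - \text{p}_\text{X}\text{p}_\text{Y} = 0.3 - 0.2 = 0.1[/latex], and the sample estimate converges to it as the number of pairs grows.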