## Distribution-Free Tests

Distribution-free tests are hypothesis tests that make no assumptions about the probability distributions of the variables being assessed.

### Learning Objectives

Distinguish distribution-free tests for testing statistical hypotheses

### Key Takeaways

#### Key Points

• The first meaning of non-parametric covers techniques that do not rely on data belonging to any particular distribution.
• The second meaning of non-parametric covers techniques that do not assume that the structure of a model is fixed.
• Non-parametric methods are widely used for studying populations that take on a ranked order (such as movie reviews receiving one to four stars).

#### Key Terms

• ordinal: Of a number, indicating position in a sequence.
• parametric: of, relating to, or defined using parameters

### Non-Parametric Statistics

The term “non-parametric statistics” has, at least, two different meanings.

1. The first meaning of non-parametric covers techniques that do not rely on data belonging to any particular distribution. These include, among others:

• distribution-free methods, which do not rely on assumptions that the data are drawn from a given probability distribution. (As such, it is the opposite of parametric statistics. It includes non-parametric descriptive statistics, statistical models, inference, and statistical tests.)
• non-parametric statistics (in the sense of a statistic over data, which is defined to be a function on a sample that has no dependency on a parameter), whose interpretation does not depend on the population fitting any parameterized distributions. Order statistics, which are based on the ranks of observations, are one example of such statistics. These play a central role in many non-parametric approaches.

2. The second meaning of non-parametric covers techniques that do not assume that the structure of a model is fixed. Typically, the model grows in size to accommodate the complexity of the data. In these techniques, individual variables are typically assumed to belong to parametric distributions. Assumptions are also made about the types of connections among variables.

Non-parametric methods are widely used for studying populations that take on a ranked order (such as movie reviews receiving one to four stars). The use of non-parametric methods may be necessary when data have a ranking but no clear numerical interpretation, such as assessing preferences. In terms of levels of measurement, non-parametric methods result in “ordinal” data.

### Distribution-Free Tests

Distribution-free statistical methods are mathematical procedures for testing statistical hypotheses which, unlike parametric statistics, make no assumptions about the probability distributions of the variables being assessed. The most frequently used tests include the following:

Anderson–Darling test: tests whether a sample is drawn from a given distribution.

Statistical Bootstrap Methods: estimates the accuracy/sampling distribution of a statistic.

Cochran’s $\text{Q}$: tests whether $\text{k}$ treatments in randomized block designs with $0/1$ outcomes have identical effects.

Cohen’s kappa: measures inter-rater agreement for categorical items.

Friedman two-way analysis of variance by ranks: tests whether $\text{k}$ treatments in randomized block designs have identical effects.

Kaplan–Meier: estimates the survival function from lifetime data, modeling censoring.

Kendall’s tau: measures statistical dependence between two variables.

Kendall’s W: a measure between $0$ and $1$ of inter-rater agreement.

Kolmogorov–Smirnov test: tests whether a sample is drawn from a given distribution, or whether two samples are drawn from the same distribution.

Kruskal-Wallis one-way analysis of variance by ranks: tests whether more than 2 independent samples are drawn from the same distribution.

Kuiper’s test: tests whether a sample is drawn from a given distribution; it is sensitive to cyclic variations such as day of the week.

Logrank Test: compares survival distributions of two right-skewed, censored samples.

Mann–Whitney $\text{U}$ or Wilcoxon rank sum test: tests whether two samples are drawn from the same distribution, as compared to a given alternative hypothesis.

McNemar’s test: tests whether, in $2 \times 2$ contingency tables with a dichotomous trait and matched pairs of subjects, row and column marginal frequencies are equal.

Median test: tests whether two samples are drawn from distributions with equal medians.

Pitman’s permutation test: a statistical significance test that yields exact $\text{p}$ values by examining all possible rearrangements of labels.

Rank products: detects differentially expressed genes in replicated microarray experiments.

Siegel–Tukey test: tests for differences in scale between two groups.

Sign test: tests whether matched pair samples are drawn from distributions with equal medians.

Spearman’s rank correlation coefficient: measures statistical dependence between two variables using a monotonic function.

Squared ranks test: tests equality of variances in two or more samples.

Wald–Wolfowitz runs test: tests whether the elements of a sequence are mutually independent/random.

Wilcoxon signed-rank test: tests whether matched pair samples are drawn from populations with different mean ranks.

Best Cars of 2010: This image shows a graphical representation of a ranked list of the highest rated cars in 2010. Non-parametric statistics is widely used for studying populations that take on a ranked order.

## Sign Test

The sign test can be used to test the hypothesis that there is “no difference in medians” between the continuous distributions of two random variables.

### Learning Objectives

Discover the nonparametric statistical sign test and outline its method.

### Key Takeaways

#### Key Points

• Non-parametric statistical tests tend to be more general, and easier to explain and apply, due to the lack of assumptions about the distribution of the population or population parameters.
• In order to perform the sign test, we must be able to draw paired samples from the distributions of two random variables, $\text{X}$ and $\text{Y}$.
• The sign test has very general applicability but may lack the statistical power of other tests.
• When performing a sign test, we count the number of values in the sample that are above the median and denote them by the sign $+$ and the ones falling below the median by the symbol $-$.

#### Key Terms

• sign test: a statistical test concerning the median of a continuous population with the idea that the probability of getting a value below the median or a value above the median is $\frac{1}{2}$

Non-parametric statistical tests tend to be more general, and easier to explain and apply, due to the lack of assumptions about the distribution of the population or population parameters. One such statistical method is known as the sign test.

The sign test can be used to test the hypothesis that there is “no difference in medians” between the continuous distributions of two random variables $\text{X}$ and $\text{Y}$, in the situation when we can draw paired samples from $\text{X}$ and $\text{Y}$. As outlined above, the sign test is a non-parametric test which makes very few assumptions about the nature of the distributions under examination. Because of this fact, it has very general applicability but may lack the statistical power of other tests.

### The One-Sample Sign Test

This test concerns the median $\tilde { \mu }$ of a continuous population. The idea is that the probability of getting a value below the median or a value above the median is $\frac{1}{2}$. We test the null hypothesis:

${ \text{H} }_{ 0 }:\tilde { \mu } ={ \tilde { \mu } }_{ 0 }$

against an appropriate alternative hypothesis:

${ \text{H} }_{ 1 }:\tilde { \mu } \neq,>,<{ \tilde { \mu } }_{ 0 }$

We count the number of values in the sample that are above ${ \tilde { \mu } }_{ 0 }$ and represent them with the $+$ sign and the ones falling below ${ \tilde { \mu } }_{ 0 }$ with the $-$.

For example, suppose that in a sample of students from a class the ages of the students are:

$\{ 23.5, 24.2, 19.2, 21, 34.5, 23.5, 27.7, 22, 38, 21.8, 25, 23 \}$

Test the claim that the median is less than $24$ years of age with a significance level of $\alpha = 0.05$. The hypothesis is then written as:

${ \text{H} }_{ 0 }:\tilde { \mu } =24$

${ \text{H} }_{ 1 }:\tilde { \mu } <24$

The test statistic $\text{x}$ is then the number of plus signs. In this case we get:

$\{ -,+,-,-,+,-,+,-,+,-,+,- \}$

Therefore, $\text{x}=5$.

The variable $\text{X}$ follows a binomial distribution with $\text{n}=12$ (number of values) and $\text{p}=\frac{1}{2}$. Therefore:

\begin{align}\text{P}\left\{ \text{X}\le 5 \right\} &\approx 0.0002+0.0029+0.0161+0.0537+0.1208+0.1934\\ &\approx 0.3872\end{align}

Since the $\text{p}$-value of $0.3872$ is larger than the significance level $\alpha = 0.05$, the null hypothesis cannot be rejected. Therefore, the data do not provide sufficient evidence that the median age of the population is less than $24$ years. In this particular class, the median age was in fact $24$, so failing to reject the null hypothesis is the correct decision.
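The calculation in this example can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `sign_test_lower_tail` is illustrative, and dropping values equal to the hypothesized median is a common convention assumed here rather than something stated in the text.

```python
from math import comb

def sign_test_lower_tail(values, median0):
    """One-sided sign test for H1: median < median0.

    Returns (x, n, p_value), where x is the number of plus signs and
    p_value = P(X <= x) for X ~ Binomial(n, 1/2).
    """
    # Assumption: values equal to median0 carry no sign and are dropped.
    signs = [v > median0 for v in values if v != median0]
    n = len(signs)
    x = sum(signs)  # number of values above median0 (plus signs)
    p_value = sum(comb(n, k) for k in range(x + 1)) / 2 ** n
    return x, n, p_value

ages = [23.5, 24.2, 19.2, 21, 34.5, 23.5, 27.7, 22, 38, 21.8, 25, 23]
x, n, p_value = sign_test_lower_tail(ages, 24)
print(x, n, round(p_value, 4))  # 5 12 0.3872
```

Summing exact binomial probabilities avoids the small rounding differences that arise when the individual terms are rounded to four decimals first.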

The Sign Test: The sign test involves denoting values above the median of a continuous population with a plus sign and the ones falling below the median with a minus sign in order to test the hypothesis that there is no difference in medians.

## Single-Population Inferences

Two notable nonparametric methods of making inferences about single populations are bootstrapping and the Anderson–Darling test.

### Learning Objectives

Contrast bootstrapping and the Anderson–Darling test for making inferences about single populations

### Key Takeaways

#### Key Points

• Bootstrapping is a method for assigning measures of accuracy to sample estimates.
• More specifically, bootstrapping is the practice of estimating properties of an estimator (such as its variance) by measuring those properties when sampling from an approximating distribution.
• The bootstrap works by treating inference of the true probability distribution $\text{J}$, given the original data, as being analogous to inference of the empirical distribution $\hat{\text{J}}$, given the resampled data.
• The Anderson–Darling test is a statistical test of whether a given sample of data is drawn from a given probability distribution.
• In its basic form, the test assumes that there are no parameters to be estimated in the distribution being tested, in which case the test and its set of critical values is distribution-free.
• $\text{K}$-sample Anderson–Darling tests are available for testing whether several collections of observations can be modeled as coming from a single population.

#### Key Terms

• bootstrap: any method or instance of estimating properties of an estimator (such as its variance) by measuring those properties when sampling from an approximating distribution
• uniform distribution: a family of symmetric probability distributions such that, for each member of the family, all intervals of the same length on the distribution’s support are equally probable

Two notable nonparametric methods of making inferences about single populations are bootstrapping and the Anderson–Darling test.

### Bootstrapping

Bootstrapping is a method for assigning measures of accuracy to sample estimates. This technique allows estimation of the sampling distribution of almost any statistic using only very simple methods.

More specifically, bootstrapping is the practice of estimating properties of an estimator (such as its variance) by measuring those properties when sampling from an approximating distribution. One standard choice for an approximating distribution is the empirical distribution of the observed data. In the case where a set of observations can be assumed to be from an independent and identically distributed population, this can be implemented by constructing a number of resamples of the observed dataset (and of equal size to the observed dataset), each of which is obtained by random sampling with replacement from the original dataset.

Bootstrapping may also be used for constructing hypothesis tests. It is often used as an alternative to inference based on parametric assumptions when those assumptions are in doubt, or where parametric inference is impossible or requires very complicated formulas for the calculation of standard errors.

### Approach

The bootstrap works by treating inference of the true probability distribution $\text{J}$, given the original data, as being analogous to inference of the empirical distribution $\hat{\text{J}}$, given the resampled data. The accuracy of inferences regarding $\hat{\text{J}}$ using the resampled data can be assessed because we know $\hat{\text{J}}$. If $\hat{\text{J}}$ is a reasonable approximation to $\text{J}$, then the quality of inference on $\text{J}$ can, in turn, be inferred.

As an example, assume we are interested in the average (or mean) height of people worldwide. We cannot measure all the people in the global population, so instead we sample only a tiny part of it, and measure that. Assume the sample is of size $\text{N}$; that is, we measure the heights of $\text{N}$ individuals. From that single sample, only one value of the mean can be obtained. In order to reason about the population, we need some sense of the variability of the mean that we have computed.

The simplest bootstrap method involves taking the original data set of $\text{N}$ heights, and, using a computer, sampling from it to form a new sample (called a ‘resample’ or bootstrap sample) that is also of size $\text{N}$. The bootstrap sample is taken from the original using sampling with replacement so it is not identical with the original “real” sample. This process is repeated a large number of times, and for each of these bootstrap samples we compute its mean. We now have a histogram of bootstrap means. This provides an estimate of the shape of the distribution of the mean, from which we can answer questions about how much the mean varies.
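The resampling procedure just described can be sketched in a few lines of Python. The height values, the number of resamples, and the seed below are hypothetical choices made for illustration; the percentile interval at the end is one simple way to summarize the bootstrap means, not the only one.

```python
import random
import statistics

def bootstrap_means(data, n_resamples=2000, seed=0):
    """Return the means of n_resamples bootstrap samples drawn from data."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=n)  # size N, sampled with replacement
        means.append(statistics.fmean(resample))
    return means

# Hypothetical sample of N = 10 heights in centimetres.
heights_cm = [162.0, 175.5, 168.2, 181.0, 159.4,
              170.3, 177.8, 165.1, 172.6, 169.9]

means = bootstrap_means(heights_cm)
std_error = statistics.stdev(means)  # bootstrap estimate of the standard error

# Simple 95% percentile interval from the sorted bootstrap means.
means_sorted = sorted(means)
ci_low = means_sorted[int(0.025 * len(means_sorted))]
ci_high = means_sorted[int(0.975 * len(means_sorted)) - 1]
print(f"SE = {std_error:.2f}, 95% interval = ({ci_low:.1f}, {ci_high:.1f})")
```

The list `means` plays the role of the histogram of bootstrap means: its spread estimates how much the sample mean would vary from sample to sample.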

Situations where bootstrapping is useful include:

• When the theoretical distribution of a statistic of interest is complicated or unknown.
• When the sample size is insufficient for straightforward statistical inference.
• When power calculations have to be performed, and a small pilot sample is available.

A great advantage of bootstrap is its simplicity. It is a straightforward way to derive estimates of standard errors and confidence intervals for complex estimators of complex parameters of the distribution, such as percentile points, proportions, odds ratio, and correlation coefficients. Moreover, it is an appropriate way to control and check the stability of the results.

However, although bootstrapping is (under some conditions) asymptotically consistent, it does not provide general finite-sample guarantees. The apparent simplicity may conceal the fact that important assumptions are being made when undertaking the bootstrap analysis (e.g. independence of samples) where these would be more formally stated in other approaches.

### The Anderson–Darling Test

The Anderson–Darling test is a statistical test of whether a given sample of data is drawn from a given probability distribution. In its basic form, the test assumes that there are no parameters to be estimated in the distribution being tested, in which case the test and its set of critical values is distribution-free. $\text{K}$-sample Anderson–Darling tests are available for testing whether several collections of observations can be modeled as coming from a single population, where the distribution function does not have to be specified.

The Anderson–Darling test assesses whether a sample comes from a specified distribution. It makes use of the fact that, when given a hypothesized underlying distribution and assuming the data does arise from this distribution, the data can be transformed to a uniform distribution. The transformed sample data can be then tested for uniformity with a distance test. The formula for the test statistic $\text{A}^2$ to assess if the ordered data $\{ \text{Y}_1 < \dots < \text{Y}_\text{n} \}$ come from a distribution with cumulative distribution function (CDF) $\text{F}$ is:

$\text{A}^2 = -\text{n} - \text{S}$

where

$\displaystyle{\text{S}= \sum_{\text{k}=1}^\text{n} \frac{2\text{k}-1}{\text{n}} \left[ \ln (\text{F} (\text{Y}_\text{k}) ) + \ln ( 1- \text{F} ( \text{Y}_{\text{n}+1-\text{k}})) \right]}$

The test statistic can then be compared against the critical values of the theoretical distribution. Note that in this case no parameters are estimated in relation to the distribution function $\text{F}$.
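As a rough illustration, the statistic $\text{A}^2 = -\text{n} - \text{S}$ can be computed directly from the formula above once the data are sorted. The sketch below assumes a fully specified CDF (no estimated parameters); the uniform CDF and the five sample values are hypothetical, and comparing the result against critical values is left out.

```python
import math

def anderson_darling(sample, cdf):
    """Compute A^2 = -n - S for a sample and a fully specified CDF."""
    y = sorted(sample)  # the formula requires Y_1 < ... < Y_n
    n = len(y)
    s = sum(
        (2 * k - 1) / n
        * (math.log(cdf(y[k - 1])) + math.log(1 - cdf(y[n - k])))
        for k in range(1, n + 1)  # y[n - k] is Y_{n+1-k} in 1-based notation
    )
    return -n - s

def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1] (hypothesized distribution)."""
    return min(max(x, 0.0), 1.0)

sample = [0.7, 0.1, 0.9, 0.3, 0.5]  # hypothetical data strictly inside (0, 1)
a2 = anderson_darling(sample, uniform_cdf)
print(round(a2, 4))
```

Observations for which the CDF evaluates to exactly 0 or 1 would make a logarithm undefined, so the sketch assumes all values lie strictly inside the support.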

## Comparing Two Populations: Independent Samples

Nonparametric independent samples tests include Spearman’s and the Kendall tau rank correlation coefficients, the Kruskal–Wallis ANOVA, and the runs test.

### Learning Objectives

Contrast Spearman, Kendall, Kruskal–Wallis, and Wald–Wolfowitz methods for examining the independence of samples

### Key Takeaways

#### Key Points

• Spearman’s rank correlation coefficient assesses how well the relationship between two variables can be described using a monotonic function.
• Kendall’s tau ($\tau$) coefficient is a statistic used to measure the association between two measured quantities.
• The Kruskal–Wallis one-way ANOVA by ranks is a nonparametric method for testing whether samples originate from the same distribution.
• The Wald–Wolfowitz runs test is a non-parametric statistical test for the hypothesis that the elements of a sequence are mutually independent.

#### Key Terms

• monotonic function: a function that either never decreases or never increases as its independent variable increases

Nonparametric methods for testing the independence of samples include Spearman’s rank correlation coefficient, the Kendall tau rank correlation coefficient, the Kruskal–Wallis one-way analysis of variance, and the Wald–Wolfowitz runs test.

### Spearman’s Rank Correlation Coefficient

Spearman’s rank correlation coefficient, often denoted by the Greek letter $\rho$ (rho), is a nonparametric measure of statistical dependence between two variables. It assesses how well the relationship between two variables can be described using a monotonic function. If there are no repeated data values, a perfect Spearman correlation of $1$ or $-1$ occurs when each of the variables is a perfect monotone function of the other.

For a sample of size $\text{n}$, the $\text{n}$ raw scores $\text{X}_\text{i}$, $\text{Y}_\text{i}$ are converted to ranks $\text{x}_\text{i}$, $\text{y}_\text{i}$, and $\rho$ is computed from these:

$\displaystyle{\rho = \frac{\sum_\text{i} (\text{x}_\text{i} - \bar{\text{x}}) (\text{y}_\text{i} - \bar{\text{y}})}{\sqrt{\sum_\text{i}(\text{x}_\text{i} - \bar{\text{x}})^2 \sum_\text{i}(\text{y}_\text{i} - \bar{\text{y}})^2}}}$

The sign of the Spearman correlation indicates the direction of association between $\text{X}$ (the independent variable) and $\text{Y}$ (the dependent variable). If $\text{Y}$ tends to increase when $\text{X}$ increases, the Spearman correlation coefficient is positive. If $\text{Y}$ tends to decrease when $\text{X}$ increases, the Spearman correlation coefficient is negative. A Spearman correlation of zero indicates that there is no tendency for $\text{Y}$ to either increase or decrease when $\text{X}$ increases.
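The definition above amounts to a Pearson correlation computed on ranks, which the following sketch makes explicit. The helper `average_ranks` (tied values share the average of their ranks) and the small data set are illustrative assumptions, not part of the source.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation of the ranks of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx)
                    * sum((b - my) ** 2 for b in ry))
    return num / den

x = [10, 20, 30, 40, 50]   # hypothetical raw scores
y = [2, 1, 4, 3, 5]
rho = spearman_rho(x, y)
print(round(rho, 3))  # 0.8

# A monotone but non-linear relationship still gives a perfect correlation.
rho_cubic = spearman_rho(x, [v ** 3 for v in x])
print(round(rho_cubic, 3))  # 1.0
```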

### The Kendall Tau Rank Correlation Coefficient

Kendall’s tau ($\tau$) coefficient is a statistic used to measure the association between two measured quantities. A tau test is a non-parametric hypothesis test for statistical dependence based on the tau coefficient.

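Let $(\text{x}_1, \text{y}_1), (\text{x}_2, \text{y}_2), \cdots, (\text{x}_\text{n}, \text{y}_\text{n})$ be a set of observations of the joint random variables $\text{X}$ and $\text{Y}$ respectively, such that all the values of ($\text{x}_\text{i}$) and ($\text{y}_\text{i}$) are unique. Any pair of observations $(\text{x}_\text{i}, \text{y}_\text{i})$ and $(\text{x}_\text{j}, \text{y}_\text{j})$ is said to be concordant if the ranks for both elements agree, that is, if both $\text{x}_\text{i} > \text{x}_\text{j}$ and $\text{y}_\text{i} > \text{y}_\text{j}$, or both $\text{x}_\text{i} < \text{x}_\text{j}$ and $\text{y}_\text{i} < \text{y}_\text{j}$. The pair is said to be discordant if the ranks disagree. The Kendall $\tau$ coefficient is defined as: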

$\displaystyle{\tau = \frac{(\text{number of concordant pairs}) - (\text{number of discordant pairs})}{\frac{1}{2} \text{n} (\text{n}-1)}}$

The denominator is the total number of pair combinations, so the coefficient must be in the range $-1 \leq \tau \leq 1$. If the agreement between the two rankings is perfect (i.e., the two rankings are the same) the coefficient has value $1$. If the disagreement between the two rankings is perfect (i.e., one ranking is the reverse of the other) the coefficient has value $-1$. If $\text{X}$ and $\text{Y}$ are independent, then we would expect the coefficient to be approximately zero.
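Counting concordant and discordant pairs directly gives a simple, if quadratic-time, way to evaluate the coefficient. The sketch below assumes no tied values, as in the definition above; the data are hypothetical.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall's tau for paired observations without ties."""
    n = len(xs)
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:      # both coordinates ordered the same way
            concordant += 1
        elif product < 0:    # coordinates ordered in opposite ways
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

x = [10, 20, 30, 40, 50]   # hypothetical observations
y = [2, 1, 4, 3, 5]
tau = kendall_tau(x, y)
print(tau)  # 0.6
```

Here 8 of the 10 pairs are concordant and 2 are discordant, so $\tau = (8 - 2)/10 = 0.6$.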

### The Kruskal–Wallis One-Way Analysis of Variance

The Kruskal–Wallis one-way ANOVA by ranks is a nonparametric method for testing whether samples originate from the same distribution. It is used for comparing more than two samples that are independent, or not related. When the Kruskal–Wallis test leads to significant results, then at least one of the samples is different from the other samples. The test does not identify where the differences occur or how many differences actually occur.

Since it is a non-parametric method, the Kruskal–Wallis test does not assume a normal distribution, unlike the analogous one-way analysis of variance. However, the test does assume an identically shaped and scaled distribution for each group, except for any difference in medians.
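The text above does not state the test statistic. As an illustration, the sketch below uses the standard Kruskal–Wallis statistic $\text{H} = \frac{12}{\text{N}(\text{N}+1)} \sum_\text{i} \frac{\text{R}_\text{i}^2}{\text{n}_\text{i}} - 3(\text{N}+1)$, where $\text{R}_\text{i}$ is the sum of the pooled ranks in group $\text{i}$, $\text{n}_\text{i}$ is the group size, and $\text{N}$ is the total number of observations. It assumes there are no tied values; the three groups are hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, assuming all pooled values are distinct."""
    pooled = sorted(v for group in groups for v in group)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # valid only without ties
    n_total = len(pooled)
    weighted = sum(
        sum(rank[v] for v in group) ** 2 / len(group) for group in groups
    )
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

# Hypothetical measurements for three independent samples.
groups = [[27, 31, 24], [35, 39, 42], [45, 50, 48]]
h = kruskal_wallis_h(groups)
print(round(h, 2))  # 7.2
```

For moderately large groups, $\text{H}$ is usually compared with a chi-squared distribution with one fewer degree of freedom than the number of groups.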

### The Wald–Wolfowitz Runs Test

The Wald–Wolfowitz runs test is a non-parametric statistical test that checks a randomness hypothesis for a two-valued data sequence. More precisely, it can be used to test the hypothesis that the elements of the sequence are mutually independent.

A “run” of a sequence is a maximal non-empty segment of the sequence consisting of adjacent equal elements. For example, the 16-element-long sequence

$++++-+++-++++++-$

consists of 6 runs, 3 of which consist of $+$ and the others of $-$. The run test is based on the null hypothesis that the two elements $+$ and $-$ are independently drawn from the same distribution.

The mean and variance parameters of the run do not assume that the positive and negative elements have equal probabilities of occurring, but only assume that the elements are independent and identically distributed. If the number of runs is significantly higher or lower than expected, the hypothesis of statistical independence of the elements may be rejected.
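To make this concrete, the sketch below counts the runs in the example sequence shown above (written with ASCII `+` and `-`) and evaluates the standard mean and variance of the number of runs under the null hypothesis: $\mu = \frac{2 \text{N}_+ \text{N}_-}{\text{N}} + 1$ and $\sigma^2 = \frac{(\mu - 1)(\mu - 2)}{\text{N} - 1}$, where $\text{N}_+$ and $\text{N}_-$ are the counts of each symbol and $\text{N}$ is their sum. These formulas are not given in the text; using the resulting $z$-score with a normal reference distribution is a large-sample approximation.

```python
import math
from itertools import groupby

def runs_test(sequence):
    """Return (runs, mean, variance, z) for a sequence of '+' and '-'."""
    runs = sum(1 for _ in groupby(sequence))  # each group is one run
    n_pos = sequence.count("+")
    n_neg = sequence.count("-")
    n = n_pos + n_neg
    mean = 2 * n_pos * n_neg / n + 1
    variance = (mean - 1) * (mean - 2) / (n - 1)
    z = (runs - mean) / math.sqrt(variance)
    return runs, mean, variance, z

runs, mean, variance, z = runs_test("++++-+++-++++++-")
print(runs, mean, round(variance, 4), round(z, 3))
```

A large positive or negative `z` would indicate more or fewer runs than expected for independent elements.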

## Comparing Two Populations: Paired Difference Experiment

McNemar’s test is applied to $2 \times 2$ contingency tables with matched pairs of subjects to determine whether the row and column marginal frequencies are equal.

### Learning Objectives

Model the normal approximation of nominal data using McNemar’s test

### Key Takeaways

#### Key Points

• A contingency table used in McNemar’s test tabulates the outcomes of two tests on a sample of $\text{n}$ subjects.
• The null hypothesis of marginal homogeneity states that the two marginal probabilities for each outcome are the same.
• The McNemar test statistic is: ${ \chi }^{ 2 }=\frac { { \left( \text{b}-\text{c} \right) }^{ 2 } }{ \text{b}+\text{c} }$.
• If the ${ \chi }^{ 2 }$ result is significant, this provides sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis that $\text{p}_\text{b} \neq \text{p}_\text{c}$, which would mean that the marginal proportions are significantly different from each other.

#### Key Terms

• binomial distribution: the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability $\text{p}$
• chi-squared distribution: A distribution with $\text{k}$ degrees of freedom is the distribution of a sum of the squares of $\text{k}$ independent standard normal random variables.

McNemar’s test is a normal approximation used on nominal data. It is applied to $2 \times 2$ contingency tables with a dichotomous trait, with matched pairs of subjects, to determine whether the row and column marginal frequencies are equal (“marginal homogeneity”).

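A contingency table used in McNemar’s test tabulates the outcomes of two tests on a sample of $\text{n}$ subjects, with cell counts $\text{a}$, $\text{b}$, $\text{c}$, and $\text{d}$ arranged as follows:

$\displaystyle{\begin{array}{l|cc|c} & \text{Test 2 positive} & \text{Test 2 negative} & \text{Row total} \\ \hline \text{Test 1 positive} & \text{a} & \text{b} & \text{a}+\text{b} \\ \text{Test 1 negative} & \text{c} & \text{d} & \text{c}+\text{d} \\ \hline \text{Column total} & \text{a}+\text{c} & \text{b}+\text{d} & \text{n} \end{array}}$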

$2 \times 2$ Contingency Table: A contingency table used in McNemar’s test tabulates the outcomes of two tests on a sample of $\text{n}$ subjects.

The null hypothesis of marginal homogeneity states that the two marginal probabilities for each outcome are the same, i.e. $\text{p}_\text{a} + \text{p}_\text{b} = \text{p}_\text{a} + \text{p}_\text{c}$ and $\text{p}_\text{c} + \text{p}_\text{d} = \text{p}_\text{b} + \text{p}_\text{d}$. Thus, the null and alternative hypotheses are:

${ \text{H} }_{ 0 }:{ \text{p} }_{ \text{b} }={ \text{p} }_{ \text{c} }$

${ \text{H} }_{ 1 }:{ \text{p} }_{ \text{b} }\neq { \text{p} }_{ \text{c} }$

Here $\text{p}_\text{a}$, etc., denote the theoretical probability of occurrences in cells with the corresponding label. The McNemar test statistic is:

$\displaystyle{{ \chi }^{ 2 }=\frac { { \left( \text{b}-\text{c} \right) }^{ 2 } }{ \text{b}+\text{c} }}$

Under the null hypothesis, with a sufficiently large number of discordants, ${ \chi }^{ 2 }$ has a chi-squared distribution with $1$ degree of freedom. If either $\text{b}$ or $\text{c}$ is small ($\text{b}+\text{c}<25$) then ${ \chi }^{ 2 }$ is not well-approximated by the chi-squared distribution. The binomial distribution can be used to obtain the exact distribution for an equivalent to the uncorrected form of McNemar’s test statistic. In this formulation, $\text{b}$ is compared to a binomial distribution with size parameter equal to $\text{b}+\text{c}$ and “probability of success” of $\frac{1}{2}$, which is essentially the same as the binomial sign test. For $\text{b}+\text{c}<25$, the binomial calculation should be performed. Indeed, most software packages simply perform the binomial calculation in all cases, since the result then is an exact test in all cases. When comparing the resulting ${ \chi }^{ 2 }$ statistic to the right tail of the chi-squared distribution, the $\text{p}$-value that is found is two-sided, whereas to achieve a two-sided $\text{p}$-value in the case of the exact binomial test, the $\text{p}$-value of the extreme tail should be multiplied by $2$.

If the ${ \chi }^{ 2 }$ result is significant, this provides sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis that $\text{p}_\text{b} \neq \text{p}_\text{c}$, which would mean that the marginal proportions are significantly different from each other.
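Both versions of the test described above depend only on the discordant counts $\text{b}$ and $\text{c}$. The following sketch computes the chi-squared statistic and the two-sided exact binomial $\text{p}$-value (the extreme tail multiplied by $2$, capped at $1$); the counts $\text{b}=3$ and $\text{c}=9$ are hypothetical.

```python
from math import comb

def mcnemar(b, c):
    """Return (chi2, exact_p) for discordant cell counts b and c."""
    chi2 = (b - c) ** 2 / (b + c)  # uncorrected McNemar statistic
    n = b + c                      # size parameter of the binomial
    k = min(b, c)
    # One tail of Binomial(n, 1/2), then doubled for a two-sided p-value.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    exact_p = min(1.0, 2 * tail)
    return chi2, exact_p

chi2, exact_p = mcnemar(3, 9)
print(chi2, round(exact_p, 4))  # 3.0 0.146
```

Because $\text{b}+\text{c}=12<25$ in this example, the exact binomial result is the one to report.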

## Comparing Three or More Populations: Randomized Block Design

Nonparametric methods using randomized block design include Cochran’s $\text{Q}$ test and Friedman’s test.

### Learning Objectives

Use the Friedman test to detect differences in treatments across multiple test attempts; use the Cochran’s Q test to verify if k treatments have identical effects

### Key Takeaways

#### Key Points

• In the analysis of two-way randomized block designs, where the response variable can take only two possible outcomes (coded as $0$ and $1$), Cochran’s $\text{Q}$ test is a non-parametric statistical test to verify if $\text{k}$ treatments have identical effects.
• If the Cochran test rejects the null hypothesis of equally effective treatments, pairwise multiple comparisons can be made by applying Cochran’s $\text{Q}$ test on the two treatments of interest.
• Similar to the parametric repeated measures ANOVA, Friedman’s test is used to detect differences in treatments across multiple test attempts.
• The procedure involves ranking each row (or block) together, then considering the values of ranks by columns.

#### Key Terms

• block: experimental units in groups that are similar to one another

### Cochran’s $\text{Q}$ Test

In the analysis of two-way randomized block designs, where the response variable can take only two possible outcomes (coded as $0$ and $1$), Cochran’s $\text{Q}$ test is a non-parametric statistical test to verify if $\text{k}$ treatments have identical effects. Cochran’s $\text{Q}$ test assumes that there are $\text{k} > 2$ experimental treatments and that the observations are arranged in $\text{b}$ blocks; that is, the data form a table with $\text{b}$ rows (the blocks) and $\text{k}$ columns (the treatments), in which the entry $\text{X}_{\text{ij}}$ is the $0/1$ outcome of treatment $\text{j}$ in block $\text{i}$.

Cochran’s $\text{Q}$: Cochran’s $\text{Q}$ test assumes that there are $\text{k} > 2$ experimental treatments and that the observations are arranged in $\text{b}$ blocks.

The hypotheses for Cochran’s $\text{Q}$ test are:

$\text{H}_0$: The treatments are equally effective.

$\text{H}_\text{a}$: There is a difference in effectiveness among treatments.

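The Cochran’s $\text{Q}$ test statistic is:

$\displaystyle{\text{T} = \text{k}(\text{k}-1) \frac{\sum_{\text{j}=1}^\text{k} \left( \text{X}_{\bullet \text{j}} - \frac{\text{N}}{\text{k}} \right)^2}{\sum_{\text{i}=1}^\text{b} \text{X}_{\text{i} \bullet} \left( \text{k} - \text{X}_{\text{i} \bullet} \right)}}$

where

• $\text{k}$ is the number of treatments
• $\text{X}_{\bullet \text{j}}$ is the column total for the $\text{j}$th treatment
• $\text{b}$ is the number of blocks
• $\text{X}_{\text{i} \bullet}$ is the row total for the $\text{i}$th block
• $\text{N}$ is the grand total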

For significance level $\alpha$, the critical region is:

$\text{T}>{ \chi }_{ 1-\alpha,\text{k}-1 }^{ 2 }$

where ${ \chi }_{ 1-\alpha,\text{k}-1 }^{ 2 }$ is the $(1-\alpha)$-quantile of the chi-squared distribution with $\text{k}-1$ degrees of freedom. The null hypothesis is rejected if the test statistic is in the critical region. If the Cochran test rejects the null hypothesis of equally effective treatments, pairwise multiple comparisons can be made by applying Cochran’s $\text{Q}$ test on the two treatments of interest.

Cochran’s $\text{Q}$ test is based on the following assumptions:

1. A large sample approximation; in particular, it assumes that $\text{b}$ is “large.”
2. The blocks were randomly selected from the population of all possible blocks.
3. The outcomes of the treatments can be coded as binary responses (i.e., a $0$ or $1$) in a way that is common to all treatments within each block.
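As a sketch of the computation, the statistic can be evaluated from the column totals, row totals, and grand total using the standard form $\text{T} = \text{k}(\text{k}-1) \sum_\text{j} (\text{X}_{\bullet \text{j}} - \text{N}/\text{k})^2 / \sum_\text{i} \text{X}_{\text{i} \bullet}(\text{k} - \text{X}_{\text{i} \bullet})$. The $6 \times 3$ table of binary outcomes below is hypothetical and far too small for the large-sample approximation; it is meant only to show the arithmetic.

```python
def cochran_q(table):
    """Cochran's Q statistic for a list of blocks (rows) of 0/1 outcomes."""
    k = len(table[0])                                   # number of treatments
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    grand_total = sum(row_totals)
    numerator = sum((cj - grand_total / k) ** 2 for cj in col_totals)
    denominator = sum(ri * (k - ri) for ri in row_totals)
    return k * (k - 1) * numerator / denominator

# Hypothetical outcomes: 6 blocks (rows) by 3 treatments (columns).
table = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
t_stat = cochran_q(table)
print(t_stat)  # 6.0
```

Blocks in which every treatment has the same outcome (all $0$ or all $1$) contribute nothing to the denominator, so the function would divide by zero if every block were of that kind.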

### The Friedman Test

The Friedman test is a non-parametric statistical test developed by the U.S. economist Milton Friedman. Similar to the parametric repeated measures ANOVA, it is used to detect differences in treatments across multiple test attempts. The procedure involves ranking each row (or block) together, then considering the values of ranks by columns.

Examples of use could include:

• $\text{n}$ wine judges each rate $\text{k}$ different wines. Are any wines ranked consistently higher or lower than the others?
• $\text{n}$ welders each use $\text{k}$ welding torches, and the ensuing welds were rated on quality. Do any of the torches produce consistently better or worse welds?

### Method

1. Given data $\{ \text{x}_{\text{ij}} \} _{\text{n} \times \text{k}}$, that is, a matrix with $\text{n}$ rows (the blocks), $\text{k}$ columns (the treatments) and a single observation at the intersection of each block and treatment, calculate the ranks within each block. If there are tied values, assign to each tied value the average of the ranks that would have been assigned without ties. Replace the data with a new matrix $\{ \text{r}_{\text{ij}} \} _{\text{n} \times \text{k}}$ where the entry $\text{r}_{\text{ij}}$ is the rank of $\text{x}_{\text{ij}}$ within block $\text{i}$.

2. Find the values:

\begin{align}\bar{\text{r}}_{\cdot \text{j}} &= \frac{1}{\text{n}} \sum_{\text{i}=1}^\text{n} \text{r}_{\text{ij}}\\ \bar{\text{r}} &= \frac{1}{\text{nk}} \sum_{\text{i}=1}^\text{n} \sum_{\text{j}=1}^\text{k} \text{r}_{\text{ij}}\\ \text{SS}_\text{t} &= \text{n}\sum_{\text{j}=1}^\text{k}(\bar{\text{r}}_{\cdot \text{j}}-\bar{\text{r}})^2\\ \text{SS}_\text{e} &= \frac{1}{\text{n}(\text{k}-1)} \sum_{\text{i}=1}^\text{n} \sum_{\text{j}=1}^\text{k} (\text{r}_{\text{ij}} - \bar{\text{r}})^2\end{align}

3. The test statistic is given by $\text{Q}=\frac { { \text{SS} }_{ \text{t} } }{ { \text{SS} }_{ \text{e} } }$. Note that the value of $\text{Q}$ as computed above does not need to be adjusted for tied values in the data.

4. Finally, when $\text{n}$ or $\text{k}$ is large (i.e., $\text{n}>15$ or $\text{k} > 4$), the probability distribution of $\text{Q}$ can be approximated by a chi-squared distribution with $\text{k}-1$ degrees of freedom. In this case the $\text{p}$-value is given by $\text{P}\left( { \chi }_{ \text{k}-1 }^{ 2 }\ge \text{Q} \right)$. If $\text{n}$ or $\text{k}$ is small, the chi-squared approximation becomes poor and the $\text{p}$-value should be obtained from tables of $\text{Q}$ specially prepared for the Friedman test. If the $\text{p}$-value is significant, appropriate post-hoc multiple comparison tests can then be performed.
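Steps 1 through 3 can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the function names `average_ranks` and `friedman_q` and the small ratings matrix are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def average_ranks(values):
    """Rank values 1..k within one block; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for idx in order[pos:end + 1]:
            ranks[idx] = avg
        pos = end + 1
    return ranks


def friedman_q(data):
    """Friedman statistic Q = SS_t / SS_e for an n-by-k list of lists."""
    n, k = len(data), len(data[0])
    ranks = [average_ranks(row) for row in data]               # step 1
    col_means = [sum(r[j] for r in ranks) / n for j in range(k)]
    grand_mean = sum(sum(r) for r in ranks) / (n * k)
    ss_t = n * sum((m - grand_mean) ** 2 for m in col_means)   # step 2
    ss_e = sum((r_ij - grand_mean) ** 2
               for r in ranks for r_ij in r) / (n * (k - 1))
    return ss_t / ss_e                                         # step 3


# Hypothetical example: 4 judges (blocks) each rate 3 wines (treatments).
ratings = [
    [7.0, 8.0, 9.0],
    [5.0, 6.5, 8.0],
    [6.0, 9.0, 7.5],
    [7.0, 6.0, 8.5],
]
q = friedman_q(ratings)
print(q)
```

Here the column rank sums are 5, 8, and 11, so $\text{SS}_\text{t} = 4.5$, $\text{SS}_\text{e} = 1$, and $\text{Q} = 4.5$. With only four blocks and three treatments, the chi-squared approximation in step 4 would be poor, so the sketch stops at $\text{Q}$ rather than reporting a $\text{p}$-value.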

## Rank Correlation

A rank correlation is any of several statistics that measure the relationship between rankings.

### Learning Objectives

Evaluate the relationship between rankings of different ordinal variables using rank correlation

### Key Takeaways

#### Key Points

• A rank correlation coefficient measures the degree of similarity between two rankings, and can be used to assess the significance of the relation between them.
• Kendall’s tau ($\tau$) and Spearman’s rho ($\rho$) are particular (and frequently used) cases of a general correlation coefficient.
• The measure of significance of the rank correlation coefficient can show whether the measured relationship is small enough to be likely to be a coincidence.

#### Key Terms

• concordant: Agreeing; correspondent; in keeping with; agreeable with.
• rank correlation: Any of several statistics that measure the relationship between rankings of different ordinal variables or different rankings of the same variable.

### Rank Correlation

In statistics, a rank correlation is any of several statistics that measure the relationship between rankings of different ordinal variables or different rankings of the same variable, where a “ranking” is the assignment of the labels (e.g., first, second, third, etc.) to different observations of a particular variable. A rank correlation coefficient measures the degree of similarity between two rankings, and can be used to assess the significance of the relation between them.

If, for example, one variable is the identity of a college basketball program and another variable is the identity of a college football program, one could test for a relationship between the poll rankings of the two types of program: do colleges with a higher-ranked basketball program tend to have a higher-ranked football program? A rank correlation coefficient can measure that relationship, and the measure of significance of the rank correlation coefficient can show whether the measured relationship is small enough to be likely to be a coincidence.

If there is only one variable, the identity of a college football program, but it is subject to two different poll rankings (say, one by coaches and one by sportswriters), then the similarity of the two different polls’ rankings can be measured with a rank correlation coefficient.

Some of the more popular rank correlation statistics include Spearman’s rho ($\rho$) and Kendall’s tau ($\tau$).

### Spearman’s $\rho$

Spearman developed a method of measuring rank correlation known as Spearman’s rank correlation coefficient. It is generally denoted by $\text{r}_\text{s}$. There are three cases when calculating Spearman’s rank correlation coefficient:

1. When ranks are given
2. When ranks are not given
3. Repeated ranks

The formula for calculating Spearman’s rank correlation coefficient is:

$\displaystyle{\text{r}_\text{s} = 1- \frac{6 \sum \text{d}^2}{\text{n}(\text{n}^2-1)}}$

where $\text{n}$ is the number of items or individuals being ranked and $\text{d}$ is $\text{R}_1 - \text{R}_2$ (where $\text{R}_1$ is the rank of an item with respect to the first variable and $\text{R}_2$ is the rank of the same item with respect to the second variable). The formula in this form applies when all ranks are distinct; repeated ranks require a correction.
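As a small worked illustration of the formula, the following Python sketch computes $\text{r}_\text{s}$ for the first case (ranks are given and none are repeated). The function name `spearman_rs` and the two example rankings are hypothetical.

```python
def spearman_rs(rank1, rank2):
    """Spearman's r_s from two lists of distinct ranks (no repeated ranks)."""
    if len(rank1) != len(rank2):
        raise ValueError("both rankings must have the same length")
    n = len(rank1)
    # d = R1 - R2 for each item; only the sum of squared differences is needed.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank1, rank2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical example: five teams ranked by two different polls.
coaches_poll = [1, 2, 3, 4, 5]
writers_poll = [2, 1, 4, 3, 5]
r_s = spearman_rs(coaches_poll, writers_poll)
print(r_s)
```

In this example the differences are $-1, 1, -1, 1, 0$, so $\sum \text{d}^2 = 4$ and $\text{r}_\text{s} = 1 - \frac{24}{120} = 0.8$.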

### Kendall’s $\tau$

The definition of the Kendall coefficient is as follows:

Let $(\text{x}_1, \text{y}_1), (\text{x}_2, \text{y}_2), \cdots, (\text{x}_\text{n}, \text{y}_\text{n})$ be a set of observations of the joint random variables $\text{X}$ and $\text{Y}$, respectively, such that all the values of $\text{x}_\text{i}$ and $\text{y}_\text{i}$ are unique. Any pair of observations $(\text{x}_\text{i},\text{y}_\text{i})$ and $(\text{x}_\text{j},\text{y}_\text{j})$ follows these rules:

• The observations are said to be concordant if the ranks for both elements agree, that is, if both $\text{x}_\text{i} > \text{x}_\text{j}$ and $\text{y}_\text{i} > \text{y}_\text{j}$, or if both $\text{x}_\text{i} < \text{x}_\text{j}$ and $\text{y}_\text{i} < \text{y}_\text{j}$.
• The observations are said to be discordant if $\text{x}_\text{i} > \text{x}_\text{j}$ and $\text{y}_\text{i} < \text{y}_\text{j}$, or if $\text{x}_\text{i} < \text{x}_\text{j}$ and $\text{y}_\text{i} > \text{y}_\text{j}$.
• The observations are neither concordant nor discordant if $\text{x}_\text{i} = \text{x}_\text{j}$ or $\text{y}_\text{i} = \text{y}_\text{j}$.

The Kendall $\tau$ coefficient is defined as follows:

$\displaystyle{\tau = \frac{(\text{number of concordant pairs}) - (\text{number of discordant pairs})}{\frac{1}{2} \text{n} (\text{n}-1)}}$

and has the following properties:

• The denominator is the total number of pair combinations, so the coefficient must be in the range $-1 \leq \tau \leq 1$.
• If the agreement between the two rankings is perfect (i.e., the two rankings are the same) the coefficient has value $1$.
• If the disagreement between the two rankings is perfect (i.e., one ranking is the reverse of the other) the coefficient has value $-1$.
• If $\text{X}$ and $\text{Y}$ are independent, then we would expect the coefficient to be approximately zero.
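The definition above can be sketched directly in Python by counting concordant and discordant pairs. This is a minimal illustration that checks every pair, so it is only suitable for small samples; the function name `kendall_tau` and the example data are hypothetical.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau: (concordant - discordant) / (n(n-1)/2)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    concordant = discordant = 0
    for (x_i, y_i), (x_j, y_j) in combinations(zip(xs, ys), 2):
        # The sign of the product tells whether x and y move in the same direction.
        product = (x_i - x_j) * (y_i - y_j)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0: a tie in x or y, neither concordant nor discordant
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical example: five teams ranked by two different polls.
coaches_poll = [1, 2, 3, 4, 5]
writers_poll = [2, 1, 4, 3, 5]
tau = kendall_tau(coaches_poll, writers_poll)
print(tau)
```

Of the 10 possible pairs in this example, 8 are concordant and 2 are discordant, so $\tau = \frac{8 - 2}{10} = 0.6$. Identical rankings give $1$ and fully reversed rankings give $-1$, matching the properties listed above.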

Kendall’s $\tau$ and Spearman’s $\rho$ are particular cases of a general correlation coefficient.