Which Test?

Descriptive or Inferential Statistics?

Descriptive statistics and inferential statistics are both important tools for learning about a population.

Learning Objectives

Contrast descriptive and inferential statistics

Key Takeaways

Key Points

  • Descriptive statistics are distinguished from inferential statistics in that descriptive statistics aim to summarize a sample, rather than use the data to learn about the population that the sample of data is thought to represent.
  • Descriptive statistics provides simple summaries about the sample. These summaries may either form the basis of the initial description of the data as part of a more extensive statistical analysis, or they may be sufficient in and of themselves for a particular investigation.
  • Statistical inference makes propositions about populations, using data drawn from the population of interest via some form of random sampling. This involves hypothesis testing using a variety of statistical tests.

Key Terms

  • descriptive statistics: A branch of mathematics dealing with the summarization and description of collections of data, including such concepts as the arithmetic mean, median, and mode.
  • inferential statistics: A branch of mathematics that involves drawing conclusions about a population based on sample data drawn from it.

Descriptive Statistics vs. Inferential Statistics

Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data, or the quantitative description itself. Descriptive statistics are distinguished from inferential statistics in that descriptive statistics aim to summarize a sample, rather than use the data to learn about the population that the sample of data is thought to represent. This generally means that descriptive statistics, unlike inferential statistics, are not developed on the basis of probability theory. Even when a data analysis draws its main conclusions using inferential statistics, descriptive statistics are generally also presented. For example, in a paper reporting on a study involving human subjects, there typically appears a table giving the overall sample size, sample sizes in important subgroups (e.g., for each treatment or exposure group), and demographic or clinical characteristics such as the average age and the proportion of subjects of each sex.

Descriptive Statistics

Descriptive statistics provides simple summaries about the sample and about the observations that have been made. Such summaries may be either quantitative, i.e. summary statistics, or visual, i.e. simple-to-understand graphs. These summaries may either form the basis of the initial description of the data as part of a more extensive statistical analysis, or they may be sufficient in and of themselves for a particular investigation.

For example, the shooting percentage in basketball is a descriptive statistic that summarizes the performance of a player or a team. This number is the number of shots made divided by the number of shots taken. For example, a player who shoots 33% is making approximately one shot in every three. The percentage summarizes or describes multiple discrete events. Consider also the grade point average. This single number describes the general performance of a student across the range of their course experiences.
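
To make this concrete, here is a minimal Python sketch, using only the standard library, of how such summbers might be computed; the shot record and ages below are made-up numbers for illustration:

    import statistics

    # Hypothetical shot record: 1 = made, 0 = missed.
    shots = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
    shooting_pct = 100 * sum(shots) / len(shots)
    print(f"Shooting percentage: {shooting_pct:.0f}%")  # 40%

    # Hypothetical sample of subject ages.
    ages = [23, 25, 25, 28, 31, 34, 40]
    print("mean:", statistics.mean(ages))      # arithmetic mean
    print("median:", statistics.median(ages))  # middle value
    print("mode:", statistics.mode(ages))      # most frequent value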

The use of descriptive and summary statistics has an extensive history and, indeed, the simple tabulation of populations and of economic data was the first way the topic of statistics appeared. More recently, a collection of summary techniques has been formulated under the heading of exploratory data analysis: an example of such a technique is the box plot.

Box Plot: The box plot is a graphical depiction of descriptive statistics.

In the business world, descriptive statistics provide a useful summary of security returns when researchers perform empirical and analytical studies, as they give a historical account of return behavior.

Inferential Statistics

For the most part, statistical inference makes propositions about populations, using data drawn from the population of interest via some form of random sampling. More generally, data about a random process is obtained from its observed behavior during a finite period of time. Given a parameter or hypothesis about which one wishes to make inference, statistical inference most often uses a statistical model of the random process that is supposed to generate the data and a particular realization of the random process.

The conclusion of a statistical inference is a statistical proposition. Some common forms of statistical proposition are:

  • an estimate; i.e., a particular value that best approximates some parameter of interest
  • a confidence interval (or set estimate); i.e., an interval constructed using a data set drawn from a population so that, under repeated sampling of such data sets, such intervals would contain the true parameter value with a frequency equal to the stated confidence level (see the sketch following this list)
  • a credible interval; i.e., a set of values containing, for example, 95% of the posterior probability
  • rejection of a hypothesis
  • clustering or classification of data points into groups
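
As a brief sketch of the first two forms, the following Python snippet (standard library only; the measurements are hypothetical) computes a point estimate and an approximate confidence interval:

    import math
    import statistics

    data = [4.8, 5.1, 5.5, 4.9, 5.3, 5.0, 5.2, 4.7]  # hypothetical measurements
    n = len(data)
    xbar = statistics.mean(data)  # point estimate of the population mean
    s = statistics.stdev(data)    # sample standard deviation

    # Approximate 95% confidence interval using the normal critical value 1.96;
    # for a sample this small, a t critical value would be more accurate.
    half_width = 1.96 * s / math.sqrt(n)
    print(f"estimate: {xbar:.3f}")
    print(f"approximate 95% CI: ({xbar - half_width:.3f}, {xbar + half_width:.3f})")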

Hypothesis Tests or Confidence Intervals?

Hypothesis tests and confidence intervals are related, but have some important differences.

Learning Objectives

Explain how confidence intervals are used to estimate parameters of interest

Key Takeaways

Key Points

  • When we conduct a hypothesis test, we assume a value for the parameter of interest (specified by the null hypothesis).
  • When we use confidence intervals, we are estimating the parameters of interest.
  • The confidence interval for a parameter is not the same as the acceptance region of a test for this parameter, as is sometimes thought.
  • The confidence interval is part of the parameter space, whereas the acceptance region is part of the sample space.

Key Terms

  • hypothesis test: A test that defines a procedure that controls the probability of incorrectly deciding that a default position (null hypothesis) is incorrect based on how likely it would be for a set of observations to occur if the null hypothesis were true.
  • confidence interval: A type of interval estimate of a population parameter used to indicate the reliability of an estimate.

What is the difference between hypothesis testing and confidence intervals? When we conduct a hypothesis test, we assume a value for the parameter of interest (specified by the null hypothesis). When we use confidence intervals, we are estimating the parameters of interest.

Explanation of the Difference

Confidence intervals are closely related to statistical significance testing. For example, if for some estimated parameter [latex]\theta[/latex] one wants to test the null hypothesis that [latex]\theta=0[/latex] against the alternative that [latex]\theta \neq 0[/latex], then this test can be performed by determining whether the confidence interval for [latex]\theta[/latex] contains [latex]0[/latex].

More generally, given the availability of a hypothesis testing procedure that can test the null hypothesis [latex]\theta = \theta_0[/latex] against the alternative that [latex]\theta \neq \theta_0[/latex] for any value of [latex]\theta_0[/latex], then a confidence interval with confidence level [latex]\gamma = 1-\alpha[/latex] can be defined as containing any number [latex]\theta_0[/latex] for which the corresponding null hypothesis is not rejected at significance level [latex]\alpha[/latex].
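
The following sketch demonstrates this duality numerically for a one-sample t-test. It assumes SciPy is available, and the data are randomly generated purely for illustration:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    sample = rng.normal(loc=0.4, scale=1.0, size=30)  # hypothetical data

    # 95% confidence interval for the mean, built from the t distribution.
    n, xbar, s = len(sample), sample.mean(), sample.std(ddof=1)
    alpha = 0.05
    tcrit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    lo, hi = xbar - tcrit * s / np.sqrt(n), xbar + tcrit * s / np.sqrt(n)

    # The two-sided t-test of H0: theta = theta0 rejects at level alpha
    # exactly when theta0 falls outside this confidence interval.
    for theta0 in (0.0, xbar):
        p = stats.ttest_1samp(sample, popmean=theta0).pvalue
        print(f"theta0={theta0:.2f}  p={p:.3f}  inside CI: {lo <= theta0 <= hi}")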

In consequence, if the estimates of two parameters (for example, the mean values of a variable in two independent groups) have confidence intervals at a given [latex]\gamma[/latex] value that do not overlap, then the difference between the two values is significant at the corresponding value of [latex]\alpha[/latex]. However, this test is too conservative: even if two confidence intervals overlap, the difference between the two means may still be statistically significant.

While the formulations of the notions of confidence intervals and of statistical hypothesis testing are distinct, they are related in some senses and complementary to some extent. While not all confidence intervals are constructed in this way, one general-purpose approach is to define a [latex]100(1-\alpha)[/latex]% confidence interval to consist of all those values [latex]\theta_0[/latex] for which a test of the hypothesis [latex]\theta = \theta_0[/latex] is not rejected at a significance level of [latex]100\alpha[/latex]%. Such an approach may not always be an option, since it presupposes the practical availability of an appropriate significance test. Naturally, any assumptions required for the significance test would carry over to the confidence intervals.

It may be convenient to say that parameter values within a confidence interval are equivalent to those values that would not be rejected by a hypothesis test, but this would be dangerous. In many instances the confidence intervals that are quoted are only approximately valid, perhaps derived from “plus or minus twice the standard error,” and the implications of this for the supposedly corresponding hypothesis tests are usually unknown.

It is worth noting that the confidence interval for a parameter is not the same as the acceptance region of a test for this parameter, as is sometimes assumed. The confidence interval is part of the parameter space, whereas the acceptance region is part of the sample space. For the same reason, the confidence level is not the same as the complementary probability of the level of significance.

Confidence Interval: This graph illustrates a 90% confidence interval on a standard normal curve.

Quantitative or Qualitative Data?

Different statistical tests are used to test quantitative and qualitative data.

Learning Objectives

Contrast quantitative and qualitative data

Key Takeaways

Key Points

  • Quantitative (numerical) data is any data that is in numerical form, such as statistics and percentages.
  • Qualitative (categorical) data deals with descriptions with words, such as gender or nationality.
  • Paired and unpaired t-tests and z-tests are just some of the statistical tests that can be used to test quantitative data.
  • One of the most common statistical tests for qualitative data is the chi-square test (both the goodness of fit test and the test of independence).

Key Terms

  • qualitative: of descriptions or distinctions based on some quality rather than on some quantity
  • quantitative: of a measurement based on some quantity or number rather than on some quality
  • central limit theorem: The theorem stating that the sum of a large number of independent, identically distributed random variables with finite variance is approximately normally distributed.

Quantitative Data vs. Qualitative Data

Recall the differences between quantitative and qualitative data.

Quantitative (numerical) data is any data that is in numerical form, such as statistics, percentages, et cetera. In layman’s terms, a researcher studying quantitative data asks a specific, narrow question and collects a sample of numerical data from participants to answer the question. The researcher analyzes the data with the help of statistics and hopes the numbers will yield an unbiased result that can be generalized to some larger population.

Qualitative (categorical) research, on the other hand, asks broad questions and collects word data from participants. The researcher looks for themes and describes the information in themes and patterns exclusive to that set of participants. Examples of qualitative variables are male/female, nationality, color, et cetera.

Quantitative Data Tests

Paired and unpaired t-tests and z-tests are just some of the statistical tests that can be used to test quantitative data. We will give a brief overview of these tests here.

A t-test is any statistical hypothesis test in which the test statistic follows a t distribution if the null hypothesis is supported. It can be used to determine if two sets of data are significantly different from each other and is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When the scaling term is unknown and is replaced by an estimate based on the data, the test statistic (under certain conditions) follows a t distribution.
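
As a minimal sketch (assuming SciPy is available; the scores are hypothetical), an unpaired two-sample t-test might look like this:

    from scipy import stats

    # Hypothetical test scores for two independent groups.
    group_a = [88, 92, 79, 85, 90, 84, 87]
    group_b = [82, 80, 85, 78, 76, 81, 83]

    # Unpaired (independent) two-sample t-test assuming equal variances.
    t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=True)
    print(f"t = {t_stat:.3f}, p = {p_value:.4f}")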

t Distribution: Plots of the t distribution for several different degrees of freedom.

A z-test is any statistical test for which the distribution of the test statistic under the null hypothesis can be approximated by a normal distribution. Because of the central limit theorem, many test statistics are approximately normally distributed for large samples. For each significance level, the z-test has a single critical value. This fact makes it more convenient than the t-test, which has separate critical values for each sample size. Therefore, many statistical tests can be conveniently performed as approximate z-tests if the sample size is large or the population variance is known.
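
A minimal sketch of a one-sample z-test, assuming the population standard deviation is known (SciPy assumed; the numbers are hypothetical):

    import math
    from scipy import stats

    # Hypothetical setting with a known population standard deviation.
    xbar, mu0, sigma, n = 52.1, 50.0, 6.0, 100

    z = (xbar - mu0) / (sigma / math.sqrt(n))  # z = 3.5
    p_two_sided = 2 * stats.norm.sf(abs(z))    # two-sided p-value
    print(f"z = {z:.2f}, p = {p_two_sided:.5f}")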

Qualitative Data Tests

One of the most common statistical tests for qualitative data is the chi-square test (both the goodness of fit test and test of independence).

The chi-square test tests a null hypothesis stating that the frequency distribution of certain events observed in a sample is consistent with a particular theoretical distribution. The events considered must be mutually exclusive and have total probability 1. A common case for this test is where the events each cover an outcome of a categorical variable. A test of goodness of fit establishes whether or not an observed frequency distribution differs from a theoretical distribution, and a test of independence assesses whether paired observations on two variables, expressed in a contingency table, are independent of each other (e.g., polling responses from people of different nationalities to see if one's nationality is related to the response).
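
Both variants can be sketched briefly as follows (SciPy assumed; all counts are hypothetical):

    from scipy import stats

    # Goodness of fit: are 60 die rolls consistent with a fair die?
    observed = [5, 8, 9, 8, 10, 20]  # hypothetical roll counts
    expected = [10] * 6              # fair-die expectation for 60 rolls
    chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
    print(f"goodness of fit: chi2 = {chi2:.2f}, p = {p:.4f}")

    # Test of independence: nationality (rows) vs. poll response (columns).
    table = [[30, 10],  # hypothetical contingency table
             [20, 40]]
    chi2, p, dof, expected_counts = stats.chi2_contingency(table)
    print(f"independence: chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")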

One, Two, or More Groups?

Different statistical tests are required when there are different numbers of groups (or samples).

Learning Objectives

Identify the appropriate statistical test required for a group of samples

Key Takeaways

Key Points

  • One-sample tests are appropriate when a sample is being compared to the population from a hypothesis. The population characteristics are known from theory or are calculated from the population.
  • Two-sample tests are appropriate for comparing two samples, typically experimental and control samples from a scientifically controlled experiment.
  • Paired tests are appropriate for comparing two samples where it is impossible to control important variables.
  • [latex]\text{F}[/latex]-tests (analysis of variance, also called ANOVA) are used when there are more than two groups. They are commonly used when deciding whether groupings of data by category are meaningful.

Key Terms

  • t-test: Any statistical hypothesis test in which the test statistic follows a Student’s [latex]\text{t}[/latex]-distribution if the null hypothesis is supported.
  • z-test: Any statistical test for which the distribution of the test statistic under the null hypothesis can be approximated by a normal distribution.

Depending on how many groups (or samples) we are working with, different statistical tests are required.

One-sample tests are appropriate when a sample is being compared to the population from a hypothesis. The population characteristics are known from theory, or are calculated from the population. Two-sample tests are appropriate for comparing two samples, typically experimental and control samples from a scientifically controlled experiment. Paired tests are appropriate for comparing two samples where it is impossible to control important variables. Rather than comparing two sets, members are paired between samples so the difference between the members becomes the sample. Typically the mean of the differences is then compared to zero.
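
The paired case can be made concrete with a short sketch (SciPy assumed; the before/after measurements are hypothetical), showing that a paired test is equivalent to a one-sample test on the member-wise differences:

    from scipy import stats

    # Hypothetical before/after measurements on the same subjects.
    before = [72, 75, 68, 80, 77, 74]
    after = [70, 71, 69, 74, 72, 73]

    # The member-wise differences become the sample...
    diffs = [b - a for b, a in zip(before, after)]
    t1, p1 = stats.ttest_1samp(diffs, popmean=0)

    # ...which is equivalent to SciPy's paired (related-samples) t-test.
    t2, p2 = stats.ttest_rel(before, after)
    print(f"one-sample test on differences: t = {t1:.3f}, p = {p1:.4f}")
    print(f"paired t-test:                  t = {t2:.3f}, p = {p2:.4f}")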

The number of groups or samples is also an important deciding factor when determining which test statistic is appropriate for a particular hypothesis test. A test statistic is a numerical summary of a data set that reduces the data to one value that can be used to perform a hypothesis test. Examples of test statistics include the [latex]\text{z}[/latex]-statistic, [latex]\text{t}[/latex]-statistic, chi-square statistic, and [latex]\text{F}[/latex]-statistic.

A [latex]\text{z}[/latex]-statistic may be used for comparing one or two samples or proportions. When comparing two proportions, it is necessary to use a pooled standard deviation for the [latex]\text{z}[/latex]-test. The formula to calculate a [latex]\text{z}[/latex]-statistic for use in a one-sample [latex]\text{z}[/latex]-test is as follows:

[latex]\text{z} = \dfrac{\bar{\text{x}} - \mu_0}{\sigma/\sqrt{\text{n}}}[/latex]

where [latex]\bar{\text{x}}[/latex] is the sample mean, [latex]\mu_0[/latex] is the hypothesized population mean under the null hypothesis, [latex]\sigma[/latex] is the population standard deviation, and [latex]\text{n}[/latex] is the sample size.
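
As a hypothetical worked example: suppose a sample of [latex]\text{n} = 36[/latex] observations has mean [latex]\bar{\text{x}} = 105[/latex], the hypothesized population mean is [latex]\mu_0 = 100[/latex], and the population standard deviation is known to be [latex]\sigma = 15[/latex]. Then

[latex]\text{z} = \dfrac{105 - 100}{15/\sqrt{36}} = \dfrac{5}{2.5} = 2[/latex]

which exceeds the two-sided critical value of approximately [latex]1.96[/latex] at the [latex]5\%[/latex] significance level, so the null hypothesis would be rejected.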

A [latex]\text{t}[/latex]-statistic may be used for one sample, two samples (with a pooled or unpooled standard deviation), or for a regression [latex]\text{t}[/latex]-test. The formula to calculate a [latex]\text{t}[/latex]-statistic for a one-sample [latex]\text{t}[/latex]-test is as follows:

[latex]\text{t} = \dfrac{\bar{\text{x}} - \mu_0}{\text{s}/\sqrt{\text{n}}}[/latex]

where [latex]\bar{\text{x}}[/latex] is the sample mean, [latex]\mu_0[/latex] is the hypothesized population mean under the null hypothesis, [latex]\text{s}[/latex] is the sample standard deviation, and [latex]\text{n}[/latex] is the sample size.
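
As a hypothetical worked example: a sample of [latex]\text{n} = 9[/latex] observations with mean [latex]\bar{\text{x}} = 5.2[/latex] and sample standard deviation [latex]\text{s} = 0.6[/latex], tested against [latex]\mu_0 = 5.0[/latex], gives

[latex]\text{t} = \dfrac{5.2 - 5.0}{0.6/\sqrt{9}} = \dfrac{0.2}{0.2} = 1[/latex]

which is compared against a [latex]\text{t}[/latex] distribution with [latex]\text{n} - 1 = 8[/latex] degrees of freedom. Since [latex]1[/latex] is well below the two-sided [latex]5\%[/latex] critical value of about [latex]2.31[/latex], the null hypothesis would not be rejected.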

[latex]\text{F}[/latex]-tests (analysis of variance, also called ANOVA) are used when there are more than two groups. They are commonly used when deciding whether groupings of data by category are meaningful. For example, if the variance of test scores of the left-handed students in a class is much smaller than the variance of the whole class, then it may be useful to study left-handed students as a group. The null hypothesis is that the two variances are the same, meaning that the proposed grouping is not meaningful.
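
A minimal sketch of the variance comparison described above (SciPy assumed; the scores are hypothetical):

    import statistics
    from scipy import stats

    # Hypothetical test scores: the whole class vs. the left-handed students.
    class_scores = [55, 62, 70, 74, 78, 81, 85, 90, 93, 97]
    lefty_scores = [71, 73, 74, 76, 77, 78]

    # F is the ratio of the two sample variances.
    f = statistics.variance(class_scores) / statistics.variance(lefty_scores)
    df1, df2 = len(class_scores) - 1, len(lefty_scores) - 1
    p = stats.f.sf(f, df1, df2)  # one-sided: is the class variance larger?
    print(f"F = {f:.2f}, p = {p:.4f}")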