Why It Matters: Types of Statistical Studies and Producing Data

We organized this course around the Big Picture of Statistics. As we learn new material, we will always look at how the new ideas relate to the Big Picture. In this way, the Big Picture serves as a diagram that helps us organize and understand the material throughout the course.

The Big Picture summarizes the steps in a statistical investigation.

We begin a statistical investigation with a research question. The research question is frequently something we want to know about a population. The population can be people or other things, such as animals or objects. For example, we might want to know the answer to questions such as:

  • What percentage of U.S. adults supports the death penalty? (Population: U.S. adults)
  • Do cell phones affect bees? (Population: bees)
  • Do cars get better gas mileage with a new gasoline additive? (Population: cars)

The population is the entire group that we want to know something about:

The Big Picture of statistics. Shown on the diagram are Step 1: Producing Data, Step 2: Exploratory Data Analysis, Step 3: Probability, and Step 4: Inference. This diagram represents the population as randomly placed black dots in a circle.

In most cases, the population is a large group. Often, the population is so large that we cannot collect information from every individual in the population. So we select a sample from the population. Then we collect data from this sample. This is the first step in the statistical investigation. We call this step producing data.

Shown on the diagram are Step 1: Producing Data, Step 2: Exploratory Data Analysis, Step 3: Probability, and Step 4: Inference. Highlighted in this diagram is Step 1: Producing Data.

Of course, we need a sample that represents the population well. This involves careful planning but also involves chance. For example, if our goal is to determine the percentage of U.S. adults who favor the death penalty, we do not want our sample to contain only Democrats or only Republicans. So we give everyone the same opportunity to be in the sample, and then we let chance select the sample.
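
Letting chance select the sample can be sketched in a few lines of Python. This is only an illustration: the list of 10,000 ID numbers, the sample size of 100, and the function name `simple_random_sample` are assumptions made for the example, not part of any real poll.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n distinct individuals, each equally likely to be chosen."""
    rng = random.Random(seed)          # a seed makes the example repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: ID numbers for 10,000 adults (an assumption).
population = list(range(1, 10001))

# Chance, not the researcher, decides who ends up in the sample.
sample = simple_random_sample(population, 100, seed=1)
print(sample[:10])
```

Because every ID has the same chance of being picked, no group is favored by the selection process itself.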

At this step of the investigation we also carefully define what kind of information we plan to gather. Then we collect the data.

Data is often a long list of information. To make sense of the data, we explore it and summarize it using graphs and different numerical measures, such as percentages or averages. We call this step exploratory data analysis.

Shown on the diagram are Step 1: Producing Data, Step 2: Exploratory Data Analysis, Step 3: Probability, and Step 4: Inference. Highlighted in this diagram is Step 2: Exploratory Data Analysis.

Remember, our goal is to answer a question about a population based on a sample. Of course, samples will vary due to chance, and we will need to answer our question in spite of this variability. So we need to understand how sample results will vary and how sample results relate to the population as a whole when chance is involved. This is where probability comes in.
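
To see what "samples will vary due to chance" looks like, here is a small simulation sketch. It assumes, purely for illustration, a population in which exactly 65% favor some position; in a real investigation that population value is unknown. The function name and the number of repeated samples are also assumptions.

```python
import random

def sample_percentages(true_proportion, sample_size, num_samples, seed=None):
    """Draw repeated random samples and return the percentage in favor in each."""
    rng = random.Random(seed)
    results = []
    for _ in range(num_samples):
        # Each sampled individual favors the position with the given probability.
        favor = sum(1 for _ in range(sample_size) if rng.random() < true_proportion)
        results.append(100 * favor / sample_size)
    return results

# Assumed population value of 65% (hypothetical), 200 samples of 1,082 people each.
percentages = sample_percentages(0.65, 1082, 200, seed=0)
print("lowest sample percentage: ", min(percentages))
print("highest sample percentage:", max(percentages))
```

The sample percentages differ from one sample to the next, but they cluster near the population value. Probability is the tool that describes how tight that clustering is.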

Shown on the diagram are Step 1: Producing Data, Step 2: Exploratory Data Analysis, Step 3: Probability, and Step 4: Inference. Highlighted in this diagram is Step 3: Probability.

Probability is the “machinery” behind the last step in the process, which is called inference. We infer something about a population based on a sample. This inference is the conclusion we reach from our sample data that answers our original question about the population.

Shown on the diagram are Step 1: Producing Data, Step 2: Exploratory Data Analysis, Step 3: Probability, and Step 4: Inference. Highlighted in this diagram is Step 4: Inference.

Example – The big picture of statistics

At the end of April 2005, ABC News and the Washington Post conducted a poll to determine the percentage of U.S. adults who support the death penalty.

Research question: What percentage of U.S. adults support the death penalty?

Steps in the statistical investigation:

  1. Produce Data: Determine what to measure, then collect the data.
    The poll selected 1,082 U.S. adults at random. Each adult answered this question: “Do you favor or oppose the death penalty for a person convicted of murder?”
  2. Explore the Data: Analyze and summarize the data.
    In the sample, 65% favored the death penalty.
  3. Draw a Conclusion: Use the data, probability, and statistical inference to draw a conclusion about the population.
    Our goal is to determine the percentage of the U.S. adult population that supports the death penalty. We know that different samples will give different results. What are the chances that our sample reflects the opinions of the population within 3%? Probability describes the likelihood that our sample is this accurate. For a random sample of this size, the sample percentage falls within about 3 percentage points of the population percentage roughly 95% of the time. So we can say with 95% confidence that between 62% and 68% of the population favors the death penalty.

Shown on the diagram are Step 1: Producing Data, Step 2: Exploratory Data Analysis, Step 3: Probability, and Step 4: Inference. This diagram includes an example of how the four steps answer the question of what percentage of U.S. adults support the death penalty.
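
The 3% figure in the example can be checked with a short calculation. The sketch below uses the standard normal-approximation formula for a sample proportion, with 1.96 as the multiplier for 95% confidence. That formula is an assumption here: the source does not say how the pollsters computed their margin of error, and real polls often apply additional adjustments. The function name `proportion_interval` is hypothetical.

```python
import math

def proportion_interval(p_hat, n, z=1.96):
    """Approximate confidence interval for a population proportion.

    p_hat: sample proportion, n: sample size,
    z: multiplier for the confidence level (1.96 is the usual value for 95%).
    """
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin, margin

# Values from the example: 65% of 1,082 sampled adults favored the death penalty.
low, high, margin = proportion_interval(0.65, 1082)
print(f"margin of error: {100 * margin:.1f} percentage points")
print(f"interval: {100 * low:.0f}% to {100 * high:.0f}%")
```

Under these assumptions the margin of error comes out a little under 3 percentage points, which is consistent with the 62% to 68% range reported in the example.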

Let’s Summarize

A statistical investigation begins with a research question. Then the investigation proceeds with the following steps:

  • Produce Data: Determine what to measure, then collect the data.
  • Explore the Data: Analyze and summarize the data (also called exploratory data analysis).
  • Draw a Conclusion: Use the data, probability, and statistical inference to draw a conclusion about the population.

Types of Statistical Studies and Producing Data

In this first module, we focus on the produce data step in a statistical investigation. We discuss two types of statistical investigations: the observational study and the experiment. Each type of investigation involves a different approach to collecting data. We will also see that our approach to collecting data determines what we can conclude from the data.

Step 1: Produce Data