Learning Goals
Deepen your understanding and form connections within these skills:
- Organize data in a spreadsheet.
- Distinguish between observational units and variables in a dataset.
- Distinguish between categorical and quantitative variables.
- Distinguish between quantitative variables that are discrete or continuous.
- Identify variables that can be used to collect data.
In What to Know [1C], you learned to distinguish between statistical investigative questions and survey questions. You also began to see that some data could be numerical or non-numerical. In this activity, we’ll extend your understanding of statistical problem-solving by learning some key terms and organizational strategies associated with data collection.
Recall the four steps of the statistical problem-solving process: (1) forming a statistical question, (2) collecting data, (3) analyzing the data, and (4) interpreting the results. Today we’ll consider the connection between the first two steps. That is, how do we get from the statistical investigative question to a data collection plan? Along the way, you’ll see that multiple data collection and organization strategies may be considered for a single statistical question. You’ll also consider ethical obligations related to data collection and storage.
Data Collection and Organization
In practice, there are often multiple data collection options to consider. For example, if we were interested in the relationship between phone use in class and grades, there are many ways to define the relevant variables and collect and organize the information.

Consider Question 1 below individually, then compare your answer with a partner and discuss the similarities and differences in your answers.
Question 1
Do you think there is a relationship between a student’s phone use in class and their grades? Are there any details about “phone use” that are important to consider?
Data Organization
A dataset contains information about a group of individuals or observational units. The characteristics of these observational units are recorded as variables. For example, a researcher collecting data on student phone use might ask individual students to report the number of times they checked their messages during class. In this case, the variable is the number of times messages were checked during class, and the observational unit is an individual student. Before the data can be analyzed, it needs to be organized into the rows and columns of a spreadsheet. See the example below for a demonstration.
Example
Picture yourself as the researcher collecting responses for many survey questions (variables) from each individual (observational unit) you survey. The data will be organized into a spreadsheet, which consists of rows and columns. Naturally, there are only two possibilities for arranging the variable responses for each individual surveyed.
Which of the following two options do you think represents the way observational units and variables are usually organized in a spreadsheet?
Option A: Each row is a variable and each column is an observational unit
| Variables | Individual 1 | Individual 2 |
| Variable 1 | response 1 | response 1 |
| Variable 2 | response 2 | response 2 |
| Variable 3 | response 3 | response 3 |
Option B: Each row is an observational unit and each column is a variable
| Individual surveyed | Variable 1 | Variable 2 | Variable 3 |
| Individual 1 | response 1 | response 2 | response 3 |
| Individual 2 | response 1 | response 2 | response 3 |
For more on this topic, see the open access article “Data Organization in Spreadsheets” by Broman and Woo (2018), published in The American Statistician and available at Taylor & Francis Online.
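The layout in Option B, where each row is an observational unit and each column is a variable, can also be sketched in code. The short Python example below is only an illustration: the variable names (`messages_checked`, `phone_type`) and the responses are made up, and the helper function `column` is hypothetical rather than part of any library.

```python
# Hypothetical survey responses; names and values are invented for illustration.
rows = [
    {"student": "Individual 1", "messages_checked": 3, "phone_type": "iPhone"},
    {"student": "Individual 2", "messages_checked": 0, "phone_type": "Android"},
]

# Each dictionary is one row (one observational unit).
# Each key is one column (one variable).
variables = list(rows[0].keys())


def column(rows, variable):
    """Collect every individual's response for a single variable."""
    return [row[variable] for row in rows]


print(variables)
print(column(rows, "messages_checked"))
```

Keeping one individual per row means that adding a new survey response only requires adding a new row, while the list of variables stays the same.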
Are you beginning to develop an image of how data can be organized in a spreadsheet? Answer Question 2 below to check your understanding.
Question 2
A dataset contains information about a group of individuals or observational units. The characteristics of these observational units are recorded as variables. How are observational units and variables usually organized in a spreadsheet?
Types of Variables
A variable is classified as categorical if it places an individual into one of several groups; it is classified as quantitative if it takes numerical values that can be used in arithmetic.
There are two types of quantitative variables. A discrete variable takes values from a set of separate, countable possibilities (often whole numbers), and it is not possible to get any value in between. In contrast, a continuous variable can take any value within a range, so there are infinitely many possible values between any two outcomes. The discussion below provides a demonstration and examples of these types of variables. Try the question given in the Example before moving to Question 3.
Example
Categorical Variables
These variables place an individual into one of several groups. Categorical survey questions are often encountered when completing forms that ask for information such as gender and race.
Quantitative Variables
Quantitative variables may be discrete or continuous.
Discrete variables often take non-negative whole numbers as responses. For example, an automobile insurance applicant may be asked how many accidents they have been found at fault for. Responses would necessarily be whole numbers like [latex]0[/latex], [latex]1[/latex], or [latex]2[/latex].
Continuous variables can take any value within a range, including fractional values, such as weight in pounds ([latex]155[/latex], [latex]187.2[/latex], or [latex]221.9[/latex]).
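The three variable types described above can be illustrated with a rough Python sketch. The function `classify` and the variable names are hypothetical, and the rule it uses is only a first guess based on how the values were recorded. It cannot tell when numbers are acting as labels rather than quantities, or when a continuous measurement happens to have been rounded to whole numbers, so the final classification still depends on what the variable means.

```python
def classify(values):
    """Guess a variable's type from its recorded values (a rough heuristic)."""
    # Text responses place an individual into a group.
    if any(isinstance(v, str) for v in values):
        return "categorical"
    # Whole-number counts have no possible values in between.
    if all(isinstance(v, int) for v in values):
        return "quantitative (discrete)"
    # Anything else is treated as a measurement that can take fractional values.
    return "quantitative (continuous)"


# Hypothetical responses based on the examples in this section.
phone_type = ["iPhone", "Android", "other"]
at_fault_accidents = [0, 1, 2]
weight_lb = [155, 187.2, 221.9]

print(classify(phone_type))
print(classify(at_fault_accidents))
print(classify(weight_lb))
```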
Ex. Imagine that you have been selected as a statistics intern in a veterinary clinic. The veterinarian wants to collect data about the dogs seen in her office. You’ve been asked to record information from the patient files to answer the survey questions listed below. For each, state whether the associated variable is categorical, discrete quantitative, or continuous quantitative and explain how you know.
- What zip code does the dog’s owner live in?
- What is the dog’s weight in pounds?
- How many times has the dog been seen in the office?
- Does the owner have an outstanding balance due?
- How many pets are in the household in addition to the dog? ([latex]0[/latex], [latex]1-2[/latex], [latex]3-5[/latex], more than [latex]5[/latex])
Now it’s your turn. Work with a partner to identify the types of variables in the survey questions listed in Question 3.
Question 3
Consider the survey questions below. If you used these questions to collect data, would the resulting variables be categorical or quantitative? For variables that are quantitative, classify them as discrete or continuous.
- What type of mobile phone do you have? (iPhone, Android, other)
- What is your area code?
- How many devices capable of connecting to the Internet do you bring with you to class on a typical day?
- How much time did you spend on your phone yesterday? (less than 2 hours, 2–5 hours, more than 5 hours)
- Approximately how much time do you spend on your phone in a typical day?
- Do you usually spend more time on your phone on weekdays or on weekends?