Learning Goals
- Determine which variables are categorical from raw data.
- Read and interpret a frequency table.
- Use a data analysis tool to create a frequency table from a dataset.
- Read and interpret a bar graph.
- Use technology to create a bar graph from a dataset.
- Read and interpret a pie chart.
- Use technology to create a pie chart from a dataset.
When describing data using graphical representations, certain variables in the data are appropriate for certain visualizations. To prepare for the upcoming activity, you will need to identify variables that would be appropriate for bar graphs and pie charts and answer research questions by reading bar graphs and pie charts. To do this, you will need to be able to determine which variables in a dataset are categorical, understand how frequency tables are formed, and understand how bar graphs and pie charts are formed.
Categorical Variables
In Module 1, Data Collection and Organization, you learned the difference between categorical and quantitative variables. Let’s take a moment to refresh that information before diving into a deep exploration of categorical variables.
Recall
What is the distinguishing feature of a categorical variable? That is, how can we tell a categorical variable apart from a quantitative variable?
Core Skill:
Practice identifying categorical variables in a list in the example. Then, for the data table that follows, identify the categorical variables to answer Question 1.
Example
Which variable(s) in the list below are categorical?
Age, marital status, number of children in the household, zip code, income, education level
Now it’s your turn to practice what you know by looking at a real dataset obtained from a survey and displayed in the table below. Read the information given about the dataset and its variables, then answer the questions that follow.
Variables in a dataset
identifying characteristics of a categorical variable
[Perspective Video: a 3-instructors video illustrating the identifying characteristics of a categorical variable in raw data.]
In 2013, students in the statistics class at FSEV UK, a Slovakian university, were asked to invite their friends to participate in a survey[1]. Data for the first [latex]15[/latex] of the [latex]1,007[/latex] young people who completed the survey are displayed below.
Young People Survey
| Alcohol | Age (years) | Height (cm) | Punctuality | Number of siblings | Enjoy music (1 = strongly disagree, 5 = strongly agree) | Internet usage | Left – right handed |
|---|---|---|---|---|---|---|---|
| drink a lot | [latex]20[/latex] | [latex]163[/latex] | I am always on time | [latex]1[/latex] | [latex]5[/latex] | few hours a day | right handed |
| drink a lot | [latex]19[/latex] | [latex]163[/latex] | I am often early | [latex]2[/latex] | [latex]4[/latex] | few hours a day | right handed |
| drink a lot | [latex]20[/latex] | [latex]176[/latex] | I am often running late | [latex]2[/latex] | [latex]5[/latex] | few hours a day | right handed |
| drink a lot | [latex]22[/latex] | [latex]172[/latex] | I am often early | [latex]1[/latex] | [latex]5[/latex] | most of the day | right handed |
| social drinker | [latex]20[/latex] | [latex]170[/latex] | I am always on time | [latex]1[/latex] | [latex]5[/latex] | few hours a day | right handed |
| never | [latex]20[/latex] | [latex]186[/latex] | I am often early | [latex]1[/latex] | [latex]5[/latex] | few hours a day | right handed |
| social drinker | [latex]20[/latex] | [latex]177[/latex] | I am often early | [latex]1[/latex] | [latex]5[/latex] | less than an hour a day | right handed |
| drink a lot | [latex]19[/latex] | [latex]184[/latex] | I am always on time | [latex]1[/latex] | [latex]5[/latex] | few hours a day | right handed |
| social drinker | [latex]18[/latex] | [latex]166[/latex] | I am often early | [latex]1[/latex] | [latex]5[/latex] | few hours a day | right handed |
| drink a lot | [latex]19[/latex] | [latex]174[/latex] | I am often running late | [latex]3[/latex] | [latex]5[/latex] | few hours a day | right handed |
| social drinker | [latex]19[/latex] | [latex]175[/latex] | I am often early | [latex]2[/latex] | [latex]5[/latex] | less than an hour a day | left handed |
| never | [latex]17[/latex] | [latex]176[/latex] | I am often running late | [latex]1[/latex] | [latex]5[/latex] | few hours a day | right handed |
| social drinker | [latex]24[/latex] | [latex]168[/latex] | I am often running late | [latex]10[/latex] | [latex]5[/latex] | few hours a day | right handed |
| social drinker | [latex]19[/latex] | [latex]165[/latex] | I am often early | [latex]1[/latex] | [latex]5[/latex] | few hours a day | right handed |
| social drinker | [latex]22[/latex] | [latex]175[/latex] | I am often early | [latex]1[/latex] | [latex]5[/latex] | most of the day | right handed |
The following eight variables are included in the dataset. Each variable name is followed by a brief description. You may recall from Forming Connections in 1D that a list like this is often called a data dictionary.
- Alcohol: “never,” “social drinker,” or “drink a lot”
- Age: Years
- Height: Height in centimeters (cm)
- Punctuality: “I am often early,” “I am always on time,” or “I am often running late”
- Number of siblings
- Enjoy music: Participants were asked “Do you enjoy music?” and reported on a 5-point Likert scale: strongly disagree = 1, strongly agree = 5
- Internet usage: “no time at all,” “less than an hour a day,” “few hours a day,” or “most of the day”
- Left – right handed
Question 1
Frequency Tables
A frequency table lists the number of observations (the frequency) of each unique value of a categorical variable. The frequency is commonly referred to as the count. See the example below, which illustrates a partially complete, simple frequency table for a survey in which [latex]20[/latex] people were asked to answer a question by choosing one of four responses: strongly disagree, disagree, agree, or strongly agree.
Example
| Response | Frequency (number of times each response appears in the dataset) |
|---|---|
| Strongly disagree | [latex]2[/latex] |
| Disagree | [latex]5[/latex] |
| Agree | [latex]9[/latex] |
| Strongly agree | |
For the frequency table above, answer the following questions:
a) If [latex]20[/latex] responses were collected in the survey, how many people responded with “Strongly agree”?
b) What is the total frequency for the table?
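A data analysis tool builds a frequency table by tallying each unique value of a categorical variable. The sketch below is one possible way to do this in Python, using the Internet usage column for the first [latex]15[/latex] respondents in the Young People Survey table. The variable names `internet_usage` and `frequency_table` are chosen here for illustration and do not come from the survey itself.

```python
from collections import Counter

# Internet usage responses for the first 15 respondents, copied in row order
# from the Young People Survey table.
internet_usage = [
    "few hours a day", "few hours a day", "few hours a day", "most of the day",
    "few hours a day", "few hours a day", "less than an hour a day",
    "few hours a day", "few hours a day", "few hours a day",
    "less than an hour a day", "few hours a day", "few hours a day",
    "few hours a day", "most of the day",
]

# Counter tallies how many times each unique value appears.
frequency_table = Counter(internet_usage)

# Print one row per category, most frequent first.
for category, count in frequency_table.most_common():
    print(f"{category}: {count}")

# The frequencies add up to the number of observations.
total = sum(frequency_table.values())
print(f"Total: {total}")
```

Note that a category no one selected, such as “no time at all” in these [latex]15[/latex] rows, does not appear in the tally unless it is added by hand with a frequency of [latex]0[/latex].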
Interpreting frequency tables
Frequency tables often include a column for the relative frequency. The relative frequency represents the proportion of observations that are in a particular category and can be expressed as a decimal or a percentage. To find relative frequency, divide the count of a particular category by the sum of all the counts. See the recall box below for a refresher on how to convert ratios to proportions and percentages. Also see the Student Resources: Fractions, Decimals, Percentages and Ratios and Fractions.
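The division described above can be sketched in a few lines of Python. This is a minimal illustration, not a required tool: the counts are the Internet usage frequencies tallied from the first [latex]15[/latex] rows of the Young People Survey table, and the names `counts` and `relative_frequency` are assumptions made for this example.

```python
# Frequencies of Internet usage for the first 15 respondents,
# tallied from the Young People Survey table.
counts = {
    "few hours a day": 11,
    "less than an hour a day": 2,
    "most of the day": 2,
}

# The denominator is the sum of all the counts (15 here).
total = sum(counts.values())

relative_frequency = {}
for category, count in counts.items():
    # Relative frequency = count of the category / sum of all counts.
    proportion = count / total
    relative_frequency[category] = proportion
    # Show the result as a decimal (4 places) and as a percentage (2 places).
    print(f"{category}: {count}/{total} = {proportion:.4f} = {proportion * 100:.2f}%")
```

A quick check on any relative frequency column is that the proportions add up to [latex]1[/latex] (or [latex]100\%[/latex]), apart from small rounding differences.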
Recall
When calculating relative frequencies, you’ll need to convert a proportion to a percent. You’ve probably done this before, so this should be a refresher.
Click the link below to see how to convert the fraction [latex]\dfrac{2}{15}[/latex] to a proportion and then to a percentage.
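As a worked version of that conversion, rounding the decimal to four places:

[latex]\dfrac{2}{15} = 2 \div 15 \approx 0.1333[/latex]

[latex]0.1333 \times 100\% = 13.33\%[/latex]

Dividing the numerator by the denominator gives the proportion as a decimal, and multiplying the decimal by [latex]100[/latex] gives the percentage.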
Core Skill:
creating a frequency table
[Worked Example: a 3-instructors worked example of creating a frequency table for a categorical variable identified from a data table.]
The following frequency table displays the frequency and relative frequency (as a decimal and a percentage) of the categorical variable Alcohol for the first [latex]15[/latex] young people who responded to the survey.
| Alcohol | Frequency | Relative Frequency | Percent (%) |
|---|---|---|---|
| Never | [latex]2[/latex] | [latex]\dfrac{2}{15} = 0.1333[/latex] | [latex]13.33[/latex] |
| Social drinker | | | |
| Drink a lot | | | |
Question 2
[1] Young people survey. (2016, December 6). Kaggle. Retrieved from https://www.kaggle.com/miroslavsabo/young-people-survey