Displaying Categorical Data: What to Know 1

Learning Goals

  • Determine which variables are categorical from raw data.
  • Read and interpret a frequency table.
  • Use a data analysis tool to create a frequency table from a dataset.
  • Read and interpret a bar graph.
  • Use technology to create a bar graph from a dataset.
  • Read and interpret a pie chart.
  • Use technology to create a pie chart from a dataset.

When describing data using graphical representations, certain variables in the data are appropriate for certain visualizations. To prepare for the upcoming activity, you will need to identify variables that would be appropriate for bar graphs and pie charts and answer research questions by reading bar graphs and pie charts. To do this, you will need to be able to determine which variables in a dataset are categorical, understand how frequency tables are formed, and understand how bar graphs and pie charts are formed.

Categorical Variables

In Module 1, Data Collection and Organization, you learned the difference between categorical and quantitative variables. Let’s take a moment to refresh that information before diving into a deep exploration of categorical variables.

Recall

What is the distinguishing feature of a categorical variable? That is, how can we tell a categorical variable apart from a quantitative variable?

Core Skill:

Practice identifying categorical variables in a list in the example. Then, for the data table that follows, identify the categorical variables to answer Question 1.

Example

Which variable(s) in the list below are categorical?

Age, marital status, number of children in the household, zip code, income, education level

Now it’s your turn to practice what you know by looking at a real dataset obtained from a survey and displayed in the table below. Read the information given about the dataset and its variables, then answer the questions that follow.

Variables in a dataset

Identifying characteristics of a categorical variable

[Perspective Video: a 3-instructors video illustrating the identifying characteristics of a categorical variable in raw data.]

In 2013, students in the statistics class at FSEV UK, a Slovakian university, were asked to invite their friends to participate in a survey[1]. Data for the first [latex]15[/latex] of the [latex]1,007[/latex] young people who completed the survey are displayed below.

Young People Survey
Alcohol | Age | Height | Punctuality | Number of siblings | Enjoy Music (1 = strongly disagree, 5 = strongly agree) | Internet usage | Left – right handed
drink a lot | [latex]20[/latex] | [latex]163[/latex] | I am always on time | [latex]1[/latex] | [latex]5[/latex] | few hours a day | right handed
drink a lot | [latex]19[/latex] | [latex]163[/latex] | I am often early | [latex]2[/latex] | [latex]4[/latex] | few hours a day | right handed
drink a lot | [latex]20[/latex] | [latex]176[/latex] | I am often running late | [latex]2[/latex] | [latex]5[/latex] | few hours a day | right handed
drink a lot | [latex]22[/latex] | [latex]172[/latex] | I am often early | [latex]1[/latex] | [latex]5[/latex] | most of the day | right handed
social drinker | [latex]20[/latex] | [latex]170[/latex] | I am always on time | [latex]1[/latex] | [latex]5[/latex] | few hours a day | right handed
never | [latex]20[/latex] | [latex]186[/latex] | I am often early | [latex]1[/latex] | [latex]5[/latex] | few hours a day | right handed
social drinker | [latex]20[/latex] | [latex]177[/latex] | I am often early | [latex]1[/latex] | [latex]5[/latex] | less than an hour a day | right handed
drink a lot | [latex]19[/latex] | [latex]184[/latex] | I am always on time | [latex]1[/latex] | [latex]5[/latex] | few hours a day | right handed
social drinker | [latex]18[/latex] | [latex]166[/latex] | I am often early | [latex]1[/latex] | [latex]5[/latex] | few hours a day | right handed
drink a lot | [latex]19[/latex] | [latex]174[/latex] | I am often running late | [latex]3[/latex] | [latex]5[/latex] | few hours a day | right handed
social drinker | [latex]19[/latex] | [latex]175[/latex] | I am often early | [latex]2[/latex] | [latex]5[/latex] | less than an hour a day | left handed
never | [latex]17[/latex] | [latex]176[/latex] | I am often running late | [latex]1[/latex] | [latex]5[/latex] | few hours a day | right handed
social drinker | [latex]24[/latex] | [latex]168[/latex] | I am often running late | [latex]10[/latex] | [latex]5[/latex] | few hours a day | right handed
social drinker | [latex]19[/latex] | [latex]165[/latex] | I am often early | [latex]1[/latex] | [latex]5[/latex] | few hours a day | right handed
social drinker | [latex]22[/latex] | [latex]175[/latex] | I am often early | [latex]1[/latex] | [latex]5[/latex] | most of the day | right handed

The following eight variables are included in the dataset. The variable names are presented in italics, followed by a brief description. You may recall from Forming Connections in (1D) that this is often called a data dictionary.

  • Alcohol: “never,” “social drinker,” or “drink a lot”
  • Age: Years
  • Height: Height in centimeters (cm)
  • Punctuality: “I am often early,” “I am always on time,” or “I am often running late”
  • Number of siblings
  • Enjoy music: Participants were asked “Do you enjoy music?” and reported on a 5-point Likert scale: strongly disagree = 1, strongly agree = 5
  • Internet usage: “no time at all,” “less than an hour a day,” “few hours a day,” or “most of the day”
  • Left – right handed: “left handed” or “right handed”

Question 1

Frequency Tables

A frequency table lists the number of observations (the frequency) of each unique value of a categorical variable. The frequency is commonly referred to as the count. See the example below, which illustrates a partially complete, simple frequency table for a survey in which [latex]20[/latex] people were asked to answer a question by choosing one of four responses: strongly disagree, disagree, agree, or strongly agree.

Example

Variable Name | Frequency (number of times each response appears in the dataset)
Strongly disagree | [latex]2[/latex]
Disagree | [latex]5[/latex]
Agree | [latex]9[/latex]
Strongly agree |

For the frequency table above, answer the following questions:

a) If [latex]20[/latex] responses were collected in the survey, how many people responded with “Strongly agree”?

b) What is the total frequency for the table?
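The counting behind a frequency table can be sketched in a few lines of Python. This is only an illustration, not the data analysis tool used in the course. It tallies the Internet usage column for the [latex]15[/latex] respondents shown in the Young People Survey table, with the values typed in by hand. The names `internet_usage` and `frequency_table` are hypothetical.

```python
from collections import Counter

# Internet usage responses for the 15 respondents shown in the survey table,
# entered by hand in row order (assumption: no data file is being read here).
internet_usage = (
    ["few hours a day"] * 3
    + ["most of the day"]
    + ["few hours a day"] * 2
    + ["less than an hour a day"]
    + ["few hours a day"] * 3
    + ["less than an hour a day"]
    + ["few hours a day"] * 3
    + ["most of the day"]
)


def frequency_table(values):
    """Return a dict mapping each unique value to the number of times it occurs."""
    return dict(Counter(values))


table = frequency_table(internet_usage)
for category, count in table.items():
    print(f"{category}: {count}")

# The counts in a frequency table always add up to the number of observations.
print("Total:", sum(table.values()))
```

Each unique response becomes one row of the table, and the total of the counts equals the number of observations, which is a quick way to check a table built by hand.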

Interpreting frequency tables

Frequency tables often include a column for the relative frequency. The relative frequency represents the proportion of observations that are in a particular category and can be expressed as a decimal or a percentage. To find relative frequency, divide the count of a particular category by the sum of all the counts. See the recall box below for a refresher on how to convert ratios to proportions and percentages. Also see the Student Resources: Fractions, Decimals, Percentages and Ratios and Fractions.
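As a rough sketch of this calculation, the Python function below divides each count by the sum of all the counts and reports the result as both a proportion and a percentage. The function name `relative_frequencies` and the example counts are made up for illustration and are not taken from the survey.

```python
def relative_frequencies(counts):
    """Map each category to a (proportion, percentage) pair computed from its count."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one observation is required")
    return {
        category: (count / total, 100 * count / total)
        for category, count in counts.items()
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
example_counts = {"Yes": 12, "No": 8}

result = relative_frequencies(example_counts)
for category, (proportion, percent) in result.items():
    print(f"{category}: {proportion:.4f} ({percent:.2f}%)")
```

The proportions across all categories add up to 1, and the percentages add up to 100, apart from small rounding differences.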

Recall

When calculating relative frequencies, you’ll need to convert a proportion to a percent. You’ve probably done this before, so this should be a refresher.

Click the link below to see how to convert the fraction [latex]\dfrac{2}{15}[/latex] to a proportion and then to a percentage.
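As a brief worked version of that conversion, with results rounded to four decimal places and two decimal places respectively: divide the numerator by the denominator to get the proportion, then multiply the proportion by [latex]100[/latex] to get the percentage.

[latex]\dfrac{2}{15} = 2 \div 15 \approx 0.1333[/latex]

[latex]0.1333 \times 100 = 13.33\%[/latex]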

Core Skill:

Creating a frequency table

[Worked Example: a 3-instructors worked example of creating a frequency table for a categorical variable identified from a data table.]

The following frequency table displays the frequency and relative frequency (as a decimal and a percentage) of the categorical variable Alcohol for the first [latex]15[/latex] young people who responded to the survey.

Alcohol | Frequency | Relative Frequency | Percent (%)
Never | [latex]2[/latex] | [latex]\dfrac{2}{15} = 0.1333[/latex] | [latex]13.33[/latex]
Social drinker | | |
Drink a lot | | |

Question 2

 


  1. Young people survey. (2016, December 6). Kaggle. Retrieved from https://www.kaggle.com/miroslavsabo/young-people-survey