6.2 Data Collection Basics

Introduction

What you’ll learn to do: Identify the various aspects of data collection

Everyone should have a general understanding of data collection, especially now. The United States census is an example of a huge data-collection undertaking. The following video gives some background and information about the US census.

 

In this lesson we will introduce some important terminology related to collecting data. When you are finished, you will be able to distinguish between paired terms such as census and sample, parameter and statistic, and quantitative and categorical data. In the following lessons we will rely on your understanding of these terms, so study them well!

Learning Outcomes

  • Identify the population of a study
  • Identify the difference between a census and a sample
  • Determine whether a value calculated from a group is a statistic or a parameter
  • Determine whether a measurement is quantitative or qualitative (categorical)

 

Learning New Math Vocabulary and Notation

You’ve seen before that learning mathematics is similar to learning a new language: it takes repetition and practice to acquire new vocabulary and symbols. Statistics has a rich, well-defined set of vocabulary and symbols of its own. The symbols and terms in this section may be completely unfamiliar to you, or you may have seen them before in other contexts. Either way, you’ll need to spend time with them to become fluent in their use in statistics.

Populations and Samples

Before we begin gathering and analyzing data we need to characterize the population we are studying. If we want to study the amount of money spent on textbooks by a typical first-year college student, our population might be all first-year students at your college.  Or it might be:

  • All first-year community college students in the state of Washington.
  • All first-year students at public colleges and universities in the state of Washington.
  • All first-year students at all colleges and universities in the state of Washington.
  • All first-year students at all colleges and universities in the entire United States.
  • And so on.

Population

The population of a study is the group the collected data is intended to describe.

Note: Sometimes the intended population is called the target population, since if we design our study badly, the collected data might not actually be representative of the intended population.


Why is it important to specify the population? We might get different answers to our question as we vary the population we are studying. First-year students at the University of Washington might take slightly more diverse courses than those at your college, and some of these courses may require less popular textbooks that cost more; on the other hand, the University Bookstore might have a larger pool of used textbooks, reducing the cost of these books to the students. Whichever the case (and it is likely that some combination of these and other factors is in play), the data we gather from your college will probably not be the same as that from the University of Washington. Particularly when conveying our results to others, we want to be clear about the population we are describing with our data.

 

 

Example

A newspaper website contains a poll asking people their opinion on a recent news article.

What is the population?

 

If we were able to gather data on every member of our population, a number calculated from that data would be called a parameter. For example, the average amount of money spent on textbooks by each and every first-year student at your college during the 2009-2010 academic year would be a parameter for that population.

Parameter

A parameter is a value which describes the entire population.

We seldom see parameters, however, since surveying an entire population is usually very time-consuming and expensive, unless the population is very small or we already have the data collected.

Census

A survey of an entire population is called a census.

You are probably familiar with two common censuses: the official government Census that attempts to count the population of the U.S. every ten years, and voting, which asks the opinion of all eligible voters in a district. The first of these demonstrates one additional problem with a census: the difficulty in finding and getting participation from everyone in a large population, which can bias, or skew, the results.

There are occasionally times when a census is appropriate, usually when the population is fairly small. For example, if the manager of a Starbucks store wanted to know the average number of hours her employees worked last week, she should be able to pull up payroll records or ask each employee directly.
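The payroll scenario above can be sketched in a few lines of Python. This is only an illustration: the employee names and hours are invented, and `hours_worked` is a hypothetical stand-in for real payroll records. Because the data covers every employee, the resulting average describes the whole population.

```python
from statistics import mean

# Hypothetical payroll records: hours worked last week by EVERY employee
# at one store. Names and numbers are invented for illustration.
hours_worked = {
    "Ana": 32,
    "Ben": 40,
    "Chloe": 24,
    "Dev": 36,
    "Eli": 28,
}

# The data covers the entire population (all employees), so this is a
# census, and the average is a parameter rather than an estimate.
average_hours = mean(list(hours_worked.values()))
print(f"Average hours worked last week: {average_hours}")
```

With a population this small, collecting every value is practical, which is why a census is reasonable here.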

Since surveying an entire population is often impractical, we usually select a sample to study.

Sample

A sample is a smaller subset of the entire population, ideally one that is fairly representative of the whole population.

We will discuss sampling methods in greater detail in a later section.  For now, let us assume that samples are chosen in an appropriate manner.  If we survey a sample, say 100 first-year students at your college, and find the average amount of money spent by these students on textbooks, the resulting number is called a statistic.

Statistic

A statistic is a value which describes a sample.

Example

The average weight of all adult Labradors is a parameter, while the average weight of a sample of 7 adult Labradors is a statistic.
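To make the distinction concrete, here is a minimal Python sketch. It assumes a made-up population of 1,000 simulated Labrador weights (the mean of 70 pounds and spread of 8 pounds are arbitrary choices, not real data), then compares the average of the whole population with the average of a random sample of 7.

```python
import random
from statistics import mean

# Hypothetical population: weights (in pounds) of 1,000 adult Labradors.
# These values are simulated for illustration, not real measurements.
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.gauss(70, 8) for _ in range(1000)]

# Parameter: calculated from every member of the population.
parameter = mean(population)

# Statistic: calculated from a random sample of 7 dogs.
sample = rng.sample(population, 7)
statistic = mean(sample)

print(f"Population mean (parameter): {parameter:.1f} lb")
print(f"Sample mean (statistic):     {statistic:.1f} lb")
```

Drawing a different sample from the same population would generally give a different statistic, while the parameter stays fixed.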

Example

A researcher wanted to know how citizens of Tacoma felt about a voter initiative. To study this, she went to the Tacoma Mall, randomly selected 500 shoppers, and asked them their opinion. Of the shoppers surveyed, 60% indicated they were supportive of the initiative. What is the population and the sample? Is the 60% value a parameter or a statistic?

 

The solutions for two of the examples on this page are detailed in the following video:

 

Examples

Example 1: To determine the average length of fish in a lake, researchers catch 20 fish and measure them. What is the sample and population in this study?

 

Example 2: A college reports that the average age of its students is 28 years. Is this a statistic or a parameter?
