Summary: Data, Sampling and Variation in Data and Sampling

Key Concepts

  • Data can be categorical or quantitative (numerical).
  • Graphs that can be used to represent categorical data include bar graphs, pie charts, and Pareto charts.
  • Samples can be selected in several ways, including simple random sampling, cluster sampling, systematic sampling, stratified sampling, and convenience sampling.
  • When drawing conclusions from data, one needs to take into account sampling error, sampling bias, and other problems that can arise while gathering data.

Glossary

bar graph: a graph in which the length of the bar for each category is proportional to the number or percent of individuals in that category. Bars may be vertical or horizontal.
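
A bar graph starts from a count or percent for each category. The Python sketch below is a minimal illustration, assuming a hypothetical list of survey responses. It tallies the category labels with `collections.Counter`, converts the counts to percents, and prints a crude text bar whose length is proportional to each percent. In practice a plotting library would draw the actual graph.

```python
from collections import Counter

def category_percents(observations):
    """Return {category: percent of individuals} for a list of category labels."""
    counts = Counter(observations)
    total = len(observations)
    # Each bar's length is proportional to this percent (or, equivalently, to the raw count).
    return {category: 100 * count / total for category, count in counts.items()}

# Hypothetical survey responses ("How do you get to campus?"), for illustration only.
responses = ["bus", "car", "car", "bike", "car", "bus", "walk", "car"]
for category, pct in category_percents(responses).items():
    # A crude horizontal bar: one '#' per 5 percentage points.
    print(f"{category:>5} {'#' * round(pct / 5)} {pct:.1f}%")
```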

cluster sampling: a method for selecting a random sample in which the population is divided into groups (clusters) and simple random sampling is used to select a set of clusters. Every individual in the chosen clusters is included in the sample.
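
The two steps of cluster sampling (randomly choose whole clusters, then keep everyone in them) can be sketched in Python as follows. The function name `cluster_sample` and the department data are hypothetical. The clusters are assumed to be given as a dictionary that maps each cluster label to its members.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose num_clusters whole clusters and return every member of them."""
    rng = random.Random(seed)
    # Simple random sample of cluster labels (no cluster can be chosen twice).
    chosen = rng.sample(list(clusters), num_clusters)
    # Every individual in each chosen cluster is included in the sample.
    return [member for label in chosen for member in clusters[label]]

# Hypothetical population grouped into clusters (departments of a college).
departments = {
    "Art": ["A1", "A2", "A3"],
    "Biology": ["B1", "B2"],
    "Chemistry": ["C1", "C2", "C3", "C4"],
    "History": ["H1", "H2"],
}
print(cluster_sample(departments, num_clusters=2, seed=1))
```

Note that the sample size depends on which clusters are drawn, because clusters can differ in size.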

continuous random variable: a random variable (RV) whose outcomes are measured. The height of a tree in a forest is a continuous RV.

convenience sampling: a nonrandom method of selecting a sample. This method selects individuals that are easily accessible and may result in biased data.

data: a set of observations (a set of recorded outcomes). Most data can be put into one of two groups: qualitative (an attribute whose value is indicated by a label) or quantitative (an attribute whose value is indicated by a number).

qualitative data: the result of categorizing or describing attributes of a population. Qualitative data are also often called categorical data.

quantitative data: data that are the result of counting or measuring attributes of a population. Quantitative data can be separated into two subgroups, discrete and continuous. Data are discrete if they are the result of counting (such as the number of students of a given ethnic group in a class or the number of books on a shelf). Data are continuous if they are the result of measuring (such as distance traveled or weight of luggage).

discrete random variable: a random variable (RV) whose outcomes are counted.

nonsampling error: any issue, other than natural variation, that affects the reliability of sample data. Nonsampling errors include a variety of human errors, such as poor study design, biased sampling methods, inaccurate information provided by study participants, data entry errors, and poor analysis.

pie chart: a graph in which categories of data are represented by wedges of a circle, each wedge proportional in size to the percent of individuals in its category.
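
Because each wedge is proportional to its category's percent, a category that holds p percent of the individuals gets p percent of the 360 degrees of the circle. A short Python sketch of that arithmetic follows. The function name and the counts are hypothetical.

```python
def wedge_angles(counts):
    """Convert category counts to pie-chart wedge angles in degrees."""
    total = sum(counts.values())
    # A category holding p percent of the individuals gets p percent of 360 degrees.
    return {category: 360 * count / total for category, count in counts.items()}

# Hypothetical poll counts (50 respondents), for illustration only.
print(wedge_angles({"Yes": 30, "No": 15, "Undecided": 5}))
# → {'Yes': 216.0, 'No': 108.0, 'Undecided': 36.0}
```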

Pareto chart: a bar graph in which the bars are sorted by category size, from largest to smallest.
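
The only step that distinguishes a Pareto chart from an ordinary bar graph is sorting the categories by size. The Python sketch below does that sort and prints text bars in Pareto order. The complaint categories and counts are hypothetical.

```python
def pareto_order(counts):
    """Return (category, count) pairs sorted from the largest count to the smallest."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Hypothetical complaint tallies, for illustration only.
complaints = {"Late delivery": 12, "Damaged item": 5, "Wrong size": 20, "Other": 3}
for category, count in pareto_order(complaints):
    # One '#' per complaint; the longest bar comes first.
    print(f"{category:<14} {'#' * count}")
```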

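proportionate: corresponding in size to the whole. In stratified sampling, a proportionate number of individuals from a stratum means that the stratum makes up the same share of the sample as it does of the population.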

random sampling: a method of selecting a sample that gives every member of the population an equal chance of being selected.

sampling bias: a problem that occurs when not all members of the population are equally likely to be selected.

sampling error: the natural variation that results from selecting a sample to represent a larger population. This variation tends to decrease as the sample size increases, so larger samples generally yield smaller sampling error.
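
One way to see sampling error shrink with sample size is to simulate it. The Python sketch below assumes a hypothetical population of 10,000 normally distributed heights. It draws many simple random samples of each size n and reports the standard deviation of the resulting sample means, one common measure of sampling error for the mean.

```python
import random
import statistics

def spread_of_sample_means(population, n, trials=2000, seed=0):
    """Standard deviation of the means of many simple random samples of size n."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        sample = rng.sample(population, n)   # sampling without replacement
        means.append(sum(sample) / n)
    return statistics.stdev(means)

# Hypothetical population: 10,000 heights (cm) from a normal distribution, mean 170, sd 10.
pop_rng = random.Random(42)
population = [pop_rng.gauss(170, 10) for _ in range(10_000)]

for n in (5, 25, 100, 400):
    print(f"n = {n:3d}: spread of sample means = {spread_of_sample_means(population, n):.2f}")
```

With a population standard deviation near 10, the printed spreads should come out roughly 10/√n (about 4.5, 2, 1, and 0.5). The exact digits depend on the random seeds. Any single large sample can still miss by chance, which is why the trend holds on average rather than for every sample.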

sampling with replacement: once a member of the population is selected for inclusion in a sample, that member is returned to the population for the selection of the next individual.

sampling without replacement: a member of the population may be chosen for inclusion in a sample only once. If chosen, the member is not returned to the population before the next selection.
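
Python's standard `random` module provides both behaviors. `Random.choices` draws with replacement, so a member can appear more than once. `Random.sample` draws without replacement, so every selected member is distinct. The names in the population below are hypothetical.

```python
import random

rng = random.Random(7)
population = ["Ana", "Ben", "Cai", "Dee", "Eli"]  # hypothetical population

# With replacement: each draw is made from the full population, so repeats are possible.
with_replacement = rng.choices(population, k=5)

# Without replacement: a chosen member is not returned, so every draw is distinct.
without_replacement = rng.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

Only sampling with replacement can produce a sample larger than the population. Asking `sample` for more members than the population holds raises `ValueError`.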

simple random sampling: a straightforward method for selecting a random sample: give each member of the population a number, then use a random number generator to select a set of numbers (labels). These randomly selected labels identify the members of your sample.
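
The numbering-and-drawing procedure can be sketched in Python with the standard `random` module, using `random.Random` as the random number generator. The function name `simple_random_sample` and the class roster are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Number the members 1..N, draw n distinct labels at random, and return those members."""
    rng = random.Random(seed)
    labels = range(1, len(population) + 1)   # give each member of the population a number
    chosen = rng.sample(labels, n)           # randomly selected labels, with no repeats
    return [population[label - 1] for label in sorted(chosen)]

# Hypothetical class of 30 students, for illustration only.
students = [f"student{i}" for i in range(1, 31)]
print(simple_random_sample(students, 5, seed=3))
```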

stratified sampling: a method for selecting a random sample that ensures subgroups of the population are adequately represented: divide the population into groups (strata), then use simple random sampling to select a proportionate number of individuals from each stratum.
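
The following Python sketch takes a simple random sample from each stratum, sized in proportion to that stratum's share of the population. The function name and the class-year strata are hypothetical. Rounding each share to the nearest whole number is an assumption made for illustration, and with uneven strata it can make the total differ slightly from n. Other allocation rules exist.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample from each stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportionate share of the sample; rounding is an illustrative assumption.
        k = round(n * len(members) / total)
        sample[name] = rng.sample(members, k)   # simple random sample within the stratum
    return sample

# Hypothetical student body of 100, split into strata by class year.
strata = {
    "freshman": [f"F{i}" for i in range(40)],
    "sophomore": [f"S{i}" for i in range(30)],
    "junior": [f"J{i}" for i in range(20)],
    "senior": [f"R{i}" for i in range(10)],
}
sample = stratified_sample(strata, n=20, seed=4)
print({name: len(members) for name, members in sample.items()})
# → {'freshman': 8, 'sophomore': 6, 'junior': 4, 'senior': 2}
```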

systematic sampling: a method for selecting a random sample: list the members of the population, then use simple random sampling to select a starting point in the list. Let k = (number of individuals in the population)/(number of individuals needed in the sample). Choose every kth individual in the list, starting with the one that was randomly selected. If necessary, return to the beginning of the list to complete your sample.
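
The systematic procedure translates almost line for line into Python. The sketch below picks a random starting index, steps forward k positions at a time, and uses the modulo operator to wrap around to the beginning of the list. The function name and household list are hypothetical. The definition does not say what to do when N/n is not a whole number, so rounding k down with integer division is an assumption. Because k × n never exceeds N under that rule, the n selections never repeat.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start, then take every k-th member, wrapping to the start of the list."""
    rng = random.Random(seed)
    N = len(population)
    k = N // n                       # assumption: round N/n down to a whole number
    start = rng.randrange(N)         # randomly selected starting point
    # The modulo wraps back to the beginning of the list when the end is reached.
    return [population[(start + i * k) % N] for i in range(n)]

# Hypothetical list of 100 households, for illustration only.
houses = [f"house{i}" for i in range(1, 101)]
print(systematic_sample(houses, 10, seed=2))
```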