Putting It Together: Descriptive Statistics

Let’s Summarize

In Descriptive Statistics, we focused on describing the distribution of a variable.

  • To analyze the distribution of a quantitative variable, we describe the overall pattern of the data (shape, center, spread) and any deviations from the pattern (outliers). We use three types of graphs to analyze the distribution of a quantitative variable: stem-and-leaf plots, histograms, and box plots.
  • Other graphs that can be used to analyze data include line graphs (to show changes over time) and bar graphs (to show relationships in categorical data).
  • We described the shape of a distribution as left-skewed, right-skewed, or symmetric with a central peak (bell-shaped).
  • The center of a distribution is a typical value that represents the group. We have two different measurements for determining the center of a distribution: mean and median.
    • The mean is the average. We calculate the mean by adding the data values and dividing by the number of individual data points.
    • The median is the middle of the data when the values are listed in order. It has the same number of values above it as below it.
    • General Guidelines for Choosing a Measure of Center
      • Always plot the data. We need to use a graph to determine the shape of the distribution. By looking at the shape, we can determine which measure of center best describes the data.
      • Use the mean as a measure of center only for distributions that are reasonably symmetric with a central peak.
      • If a distribution has extreme values or outliers, the median is a better measure of center.
  • The spread of a distribution describes how much the data vary. We studied three ways to measure spread: the range (max – min), the interquartile range (Q3 – Q1), and the standard deviation. When we use the median as the center, the interval from Q1 to Q3 gives a typical range of values: the middle 50% of the data. When we use the mean as the center, Mean ± SD gives a typical range of values.
    • The interquartile range (IQR) measures the variability in the middle half of the data.
    • The standard deviation measures, roughly, the average distance of the data values from the mean.
  • Outliers are data points that fall outside the overall pattern of the distribution.
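
The measures of center and spread summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two data sets are made up, the helper name summarize is hypothetical, the quartiles come from the standard library's "inclusive" method (one of several conventions, so Q1 and Q3 may differ slightly from values found by hand), and the standard deviation is the sample version that divides by n – 1.

```python
import statistics


def summarize(data):
    """Return common measures of center and spread for a list of numbers.

    Assumptions: quartiles use the "inclusive" convention, and the
    standard deviation is the sample SD (divides by n - 1).
    """
    values = sorted(data)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),      # sum of values / number of values
        "median": statistics.median(values),  # middle of the ordered list
        "range": values[-1] - values[0],      # max - min
        "iqr": q3 - q1,                       # Q3 - Q1
        "sd": statistics.stdev(values),       # sample standard deviation
    }


# Made-up data: the second list replaces the largest value with an outlier.
balanced = summarize([2, 4, 4, 5, 6, 6, 8])
with_outlier = summarize([2, 4, 4, 5, 6, 6, 40])

for label, summary in [("balanced", balanced), ("with outlier", with_outlier)]:
    print(label, summary)
```

Comparing the two summaries shows why the guidelines recommend the median when outliers are present: the single extreme value pulls the mean, the range, and the standard deviation upward, while the median and the IQR stay the same for these two lists.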