{"id":438,"date":"2021-12-20T14:31:45","date_gmt":"2021-12-20T14:31:45","guid":{"rendered":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/?post_type=chapter&#038;p=438"},"modified":"2022-02-17T20:10:00","modified_gmt":"2022-02-17T20:10:00","slug":"what-to-know-about-4b","status":"publish","type":"chapter","link":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/chapter\/what-to-know-about-4b\/","title":{"raw":"What to Know About Comparing Variability of Datasets: 4B - 20","rendered":"What to Know About Comparing Variability of Datasets: 4B &#8211; 20"},"content":{"raw":"<div class=\"textbox learning-objectives\">\r\n<h3>goals for this section<\/h3>\r\nAfter completing this section, you should feel comfortable performing these skills.\r\n<ul>\r\n \t<li><a href=\"#VarHist\">Compare the variability of multiple datasets visually using histograms.<\/a><\/li>\r\n \t<li><a href=\"#VarDot\">Compare the variability of multiple datasets visually using dotplots.<\/a><\/li>\r\n<\/ul>\r\n<ul>\r\n \t<li><a href=\"#StdDev\">Use a data analysis tool to identify the standard deviation of a dataset.<\/a><\/li>\r\n \t<li><a href=\"#Variance\">Calculate the variance of a dataset given standard deviation<\/a><\/li>\r\n \t<li><a href=\"#Range\">Use a data analysis tool to calculate variability by identifying the range of a dataset.<\/a><\/li>\r\n<\/ul>\r\nClick on a skill above to jump to its location in this section.\r\n\r\n<\/div>\r\nIn the next activity, you will be exploring data and using the measures of center and measures of spread to describe the data. This section will introduce you to <strong>variability<\/strong>, which is a\u00a0measure of how dispersed (spread out) the data are.\u00a0You'll learn to recognize variability in histograms and dotplots by using visual clues. 
You'll also learn how to calculate measures of variability including standard deviation, variance, and range.\r\n<h2>Comparing Variability<\/h2>\r\nThe <strong>variability<\/strong> of a dataset is often referred to as the spread of a dataset. We can visually assess variability using graphical displays such as histograms and dotplots. When answering Questions 1 - 3 below, consider whether the data <em>appears<\/em> to be more spread out from the center (greater variability) or more clustered toward the center (less variability).\r\n<h3 id=\"VarHist\">Comparing Variability Using Histograms<\/h3>\r\nRecall that a histogram visualizes the distribution of a quantitative variable by displaying rectangular bars representing the frequencies (height of the bar) for intervals of data values called bins (width of the bar). Variability can be judged from a histogram by examining the distance of the bars from the statistical center (mean or median) of the graph. If the variability is high, equally sized or taller bars will appear away from the center of the graph. If the variability is low, the data will appear clustered around the center.\r\n\r\nThe images below show distributions of two different datasets using histograms. The first histogram displays the distribution of responses given by parents of thirteen-year-old children to the question, \"how much allowance do you give weekly?\" The second is a distribution of the heights in inches of 31 thirteen-year-old boys attending the same middle school. 
Use these histograms to answer Question 1 below.\r\n\r\n<img class=\"alignnone wp-image-1962 \" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2021\/12\/26180049\/WeeklyAllowance_Hist.png\" alt=\"a histogram showing weekly allowance ($) ranging from 0 to 20 dollars.\" width=\"450\" height=\"199\" \/>\u00a0<img class=\"alignnone wp-image-1963 \" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2021\/12\/26184147\/Hist_HeightMales13yr.png\" alt=\"a histogram labeled Height Age 13 Male (inches) which ranges from 56 to 66.\" width=\"450\" height=\"199\" \/>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 1<\/h3>\r\nWhich of the two histograms appears to have <em>less<\/em> variability? Explain.\r\n\r\n[reveal-answer q=\"136179\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"136179\"]In which of the graphs do the values appear to be clustered closer together?[\/hidden-answer]\r\n\r\n<\/div>\r\n<h3 id=\"VarDot\">Using Dotplots to Visually Compare Variability<\/h3>\r\nA dotplot indicates the variability of the data or the extent to which each observation\u00a0differs from other observations. It can be easier to visualize variability using a dotplot than using a histogram because of the individual observations visible in the dotplot. Use the side-by-side dotplots in the image below to answer Questions 2 and 3.\r\n\r\nTen customers rated four different smartphone apps. The customer\u00a0ratings for the four different apps are shown in the following dotplots.\u00a0The mean for each app is equal to 3. 
Even though the mean, [latex]\\bar{x}[\/latex], is the same for each app, the dotplots for each app look very different.\r\n\r\n<strong><img class=\"alignnone wp-image-1004\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/11193655\/Picture37-300x279.png\" alt=\"Four side by side dot plots with the horizontal axis labeled &quot;Rating,&quot; numbered in increments of 1 from 1 to 5. The first plot is labeled App 1. For rating 1, there is 1 dot. For rating 2, there are 2 dots. For rating 3, there are 3 dots. For rating 4, there are 2 dots. For rating 5, there is 1 dot. The next plot is titled App 2. For rating 3, there are 10 dots. The next graph is titled App 3. For rating 1, there are 5 dots. For rating 5, there are 5 dots. The next plot is titled App 4. For every rating, there are two dots.\" width=\"648\" height=\"603\" \/><\/strong>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 2<\/h3>\r\nWhich app has the smallest variability? In other words, in which app are the observations really close together?\r\n<p style=\"padding-left: 30px;\">a) App 1\r\nb) App 2\r\nc) App 3\r\nd) App 4<\/p>\r\n[reveal-answer q=\"506703\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"506703\"]What do <em>you<\/em> think?[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 3<\/h3>\r\nWhich app has the largest variability? 
In other words, in which app are the observations the furthest apart?\r\n<p style=\"padding-left: 30px;\">a) App 1\r\nb) App 2\r\nc) App 3\r\nd) App 4<\/p>\r\n[reveal-answer q=\"950930\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"950930\"]What do <em>you<\/em> think?[\/hidden-answer]\r\n\r\n<\/div>\r\n<h2>Using Technology to Obtain Descriptive Statistics<\/h2>\r\nLet's go to the technology now and recall how to load a dataset in order to describe and explore it.\r\n\r\nFor Questions 4 and 5, recall the sleep study in which you investigated whether college students' chronotypes tend to be larks (morning people) or owls (night people) by examining graphical representations of the data. Let's use the dataset from that study again here.\r\n<div class=\"textbox\">\r\n\r\nGo to the <em>Describing and Exploring Quantitative Variables<\/em> tool at <a href=\"https:\/\/dcmathpathways.shinyapps.io\/EDA_quantitative\/\" target=\"_blank\" rel=\"noopener\">https:\/\/dcmathpathways.shinyapps.io\/EDA_quantitative\/<\/a>.\r\n<p style=\"padding-left: 30px;\">Step 1) Select the <strong>Single Group<\/strong> tab.<\/p>\r\n<p style=\"padding-left: 30px;\">Step 2) Locate the drop-down menu under <strong>Enter Data<\/strong> and select <strong>From Textbook<\/strong>.<\/p>\r\n<p style=\"padding-left: 30px;\">Step 3) Locate the drop-down menu under <strong>Dataset<\/strong> and select <strong>Sleep Study: Average Sleep<\/strong>.<\/p>\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 4<\/h3>\r\nThe descriptive statistics are displayed in a table at the top of the webpage. Would the observations in this dataset be classified as a sample or a population?\r\n\r\n[reveal-answer q=\"559868\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"559868\"]If a dataset represents an entire population, there are probably not many good unanswered statistical questions about the population. What would you consider to be the population for the Sleep Study? 
[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 5<\/h3>\r\nWhat is the average amount of sleep (rounded to the nearest whole number) of the college students in the study?\r\n<p style=\"padding-left: 30px;\">a) 8 hours\r\nb) 9 hours\r\nc) 4 hours\r\nd) 7 hours<\/p>\r\n[reveal-answer q=\"341628\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"341628\"]Obtain measures of center from \"Descriptive Statistics.\"[\/hidden-answer]\r\n\r\n<\/div>\r\n<h2>Measures of Variability<\/h2>\r\nIn statistics, we are particularly interested in understanding how data are distributed and where each observation is in reference to the mean. How spread out a set of observations are is called <strong>variability<\/strong> (also called spread or dispersion). In the remainder of this section, we will focus on three measures of spread: standard deviation, variance, and range.\r\n<div class=\"textbox examples\">\r\n<h3>Recall<\/h3>\r\nWe'll be using statistical formulas and symbols to discuss measures of variability. Take a moment to recall the formula you learned to calculate the mean of a sample. What symbols do we use to represent sample mean, summation, and sample size?\r\n\r\nCore skill: [reveal-answer q=\"450894\"]Express the formula for calculating the mean of a sample[\/reveal-answer]\r\n[hidden-answer a=\"450894\"]\r\n\r\n[latex]\\bar{x}=\\dfrac{\\sum{x}}{n}[\/latex] where [latex]\\bar{x}[\/latex] denotes the sample mean, [latex]\\sum{x}[\/latex] indicates to take a sum of all the sample values [latex]x[\/latex],\u00a0and [latex]n[\/latex] indicates the sample size.\r\n\r\n[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox tryit\">\r\n<h3>Standard Deviation<\/h3>\r\n<span style=\"background-color: #99cc00;\">[perspective video -- a 3-instructor video showing how to think about standard deviation as a measure of variability. 
Cover the parts of the formula (go into\u00a0why squaring, why\u00a0<em>df<\/em> if desired) but emphasize the concept of variability from std dev and variance more so than the technical use of the formula.]<\/span>\r\n\r\n<\/div>\r\n<strong>Standard deviation<\/strong> is a measure of how spread out observations are from the mean. The symbol we use to denote standard deviation differs depending on whether we are discussing a sample or a population. We use the Greek letter [latex]\\sigma[\/latex] (sigma) to denote the standard deviation of a population of observations.\u00a0We use the Latin letter\u00a0[latex]s[\/latex]\u00a0to denote the standard deviation of a sample of observations.\r\n\r\nThe following formulas are used to calculate the standard deviation of a population and a sample:\r\n<p style=\"padding-left: 30px;\">Standard deviation of a population: [latex]\\sigma = \\sqrt{\\dfrac{\\sum \\left(x-\\mu\\right)^2}{n}}[\/latex]<\/p>\r\n<p style=\"padding-left: 30px;\">Standard deviation of a sample: [latex]s=\\sqrt{\\dfrac{\\sum \\left(x-\\bar{x}\\right)^2}{n-1}} [\/latex]<\/p>\r\nThe following steps can be applied to calculate a standard deviation by hand.\r\n<ol>\r\n \t<li>Calculate the mean of the population or sample.<\/li>\r\n \t<li>Take the difference between each data value and the mean. Then square each difference.<\/li>\r\n \t<li>Add up all the squared differences.<\/li>\r\n \t<li>Divide by either the total number of observations in the case of a population or by 1 fewer than the total in the case of a sample.<\/li>\r\n \t<li>Take the square root of the result of the division in step 4.<\/li>\r\n<\/ol>\r\n<div class=\"textbox exercises\">\r\n<h3>Interactive example<\/h3>\r\nA sample of observations is listed below. 
Find its standard deviation.\r\n<p style=\"padding-left: 30px;\">8, 7, 13, 15, 23, 18<\/p>\r\n[reveal-answer q=\"911375\"]Show Answer[\/reveal-answer]\r\n[hidden-answer a=\"911375\"]\r\n\r\nFirst, calculate the mean of the sample observations.\r\n\r\n[latex]\\bar{x}=\\frac{84}{6}=14[\/latex]\r\n\r\nIdentify all the squared differences.\r\n\r\n[latex]\\left(8-14\\right)^{2}=36\\text{, } \\left(7-14\\right)^{2}=49\\text{, } \\left(13-14\\right)^{2}=1\\text{, } \\left(15-14\\right)^{2}=1\\text{, } \\left(23-14\\right)^{2}=81\\text{, } \\left(18-14\\right)^{2}=16\\\\ [\/latex]\r\n\r\nThen, take the square root of the sum of the squared differences divided by 1 less than the sample size.\r\n\r\n[latex]\\sqrt{\\dfrac{36+49+1+1+81+16}{6-1}}=\\sqrt{\\dfrac{184}{5}}\\approx{6.07}[\/latex]\r\n\r\nThis process would be too tedious for large samples or populations! We'll use technology to calculate standard deviation from now on. Try it now using the\u00a0<em>Describing and Exploring Quantitative Variables<\/em> tool at\u00a0<a href=\"https:\/\/dcmathpathways.shinyapps.io\/EDA_quantitative\/\">https:\/\/dcmathpathways.shinyapps.io\/EDA_quantitative\/<\/a>.\u00a0 Choose \"Your Own\" under \"Enter Data\" and enter the observations\u00a08, 7, 13, 15, 23, 18 to read the Std. Dev. from Descriptive Statistics.\r\n\r\nDid you obtain the same result? 
That was much better than calculating it by hand!\r\n\r\n[\/hidden-answer]\r\n\r\n<\/div>\r\nHere is a breakdown of the formula for standard deviation of a sample, [latex]s[\/latex].\r\n<p style=\"text-align: center;\">[latex]s=\\sqrt{\\dfrac{\\sum \\left(x-\\bar{x}\\right)^2}{n-1}} [\/latex]<\/p>\r\n\r\n<ul>\r\n \t<li style=\"text-align: left;\">The distance from each observation to the mean is known as a <strong>deviation from the mean<\/strong> and is expressed as [latex]\\left(x-\\bar{x}\\right)[\/latex]<\/li>\r\n \t<li style=\"text-align: left;\"><strong>The deviations from the mean are squared<\/strong> in the formula because some observations are above the mean, thus [latex]\\left(x-\\bar{x}\\right)&gt;0[\/latex] (the difference is positive), and some observations are below the mean, thus [latex]\\left(x-\\bar{x}\\right)&lt;0[\/latex] (the difference is negative). Squaring ensures the differences will each be expressed as positive distances and won't cancel each other out when summed up.<\/li>\r\n \t<li style=\"text-align: left;\"><strong>The [latex]\\sum[\/latex] symbol sums up<\/strong> the squared deviations for all [latex]n[\/latex] observations.<span style=\"background-color: #00ffff;\">\r\n<\/span><\/li>\r\n \t<li><strong>The denominator in the formula for a sample standard deviation is [latex]\\left(n-1\\right)[\/latex]<\/strong> rather than [latex]n[\/latex] as in the formula for the population standard deviation.\r\n<ul>\r\n \t<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-size: 1rem; orphans: 1; text-align: initial;\">Why do we divide by 1 fewer than the sample size, <strong>[latex]\\left(n-1\\right)[\/latex]<\/strong>?<\/span>\r\n<ul>\r\n \t<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-size: 1rem; orphans: 1; text-align: initial;\"><span style=\"font-size: 1rem; orphans: 1; text-align: initial;\">[reveal-answer q=\"329088\"]The answer to that is complicated but here are some ideas that may 
help[\/reveal-answer]<\/span><\/span><\/li>\r\n<\/ul>\r\n<\/li>\r\n<\/ul>\r\n<\/li>\r\n<\/ul>\r\n<p style=\"padding-left: 30px;\">[hidden-answer a=\"329088\"]<\/p>\r\n\r\n<div class=\"textbox shaded\">\r\n<p style=\"padding-left: 30px;\"><strong>Why do we divide by [latex]\\left(n-1\\right)[\/latex]?\u00a0<\/strong><\/p>\r\n\r\n<ul>\r\n \t<li><strong>Because dividing by [latex]n[\/latex] would give an underestimate.\u00a0<\/strong>Recall that a sample is representative of a population if the characteristics of the sample tend to be similar to the characteristics of the population from which it was obtained. But a standard deviation computed from a sample by dividing by [latex]n[\/latex] tends to underestimate the population standard deviation\u00a0(this can be shown mathematically but it's beyond the scope of what we need here). We can compensate by dividing by [latex]\\left(n-1\\right)[\/latex] in the sample standard deviation formula rather than by [latex]n[\/latex], which makes the result slightly larger.<\/li>\r\n \t<li><strong>Because we are using\u00a0<em>degrees of freedom<\/em> in the denominator.\u00a0<\/strong>You may have heard that the denominator in the standard deviation formula is called the\u00a0<em>degrees of freedom<\/em>, abbreviated <em>df<\/em>. That's true, and it helps us to compensate for the underestimation that crops up when we divide strictly by sample size. There's a lot going on here mathematically, but we can think of it this way: dividing by [latex]\\left(n-1\\right)[\/latex] instead of [latex]n[\/latex]\u00a0helps our sample standard deviation more closely resemble the true (usually unknowable) population standard deviation. This will help make our statistical analysis more reasonable.<\/li>\r\n \t<li><strong>What are degrees of freedom, anyway?<\/strong> A nice way to think of degrees of freedom, [latex]\\left(n-1\\right)[\/latex], is to imagine a set of three numbers whose mean is 5: say, 4, 5, and 6. 
If those three numbers were written on pieces of paper in a hat, and you chose two of them, say 4 and 5, first, then the only way to get a mean of 5 from the numbers on three scraps of paper would be that the next choice must have a 6 on it. We could say that the first two scraps were <em>free to vary<\/em>; they could have been 4 or 5 or 6 as they pleased. But the third pick couldn't vary. After choosing the 4 and the 5 freely first, there was no freedom for the choice of the third in order to obtain the desired mean. Only two of our choices had a degree of freedom, so we say that the degrees of freedom of a sample size of 3 is [latex]\\left(3-1\\right)=2[\/latex].<\/li>\r\n \t<li>For a more detailed discussion, see <a href=\"https:\/\/www.khanacademy.org\/math\/ap-statistics\/summarizing-quantitative-data-ap\/more-standard-deviation\/v\/review-and-intuition-why-we-divide-by-n-1-for-the-unbiased-sample-variance\">https:\/\/www.khanacademy.org\/math\/ap-statistics\/summarizing-quantitative-data-ap\/more-standard-deviation\/v\/review-and-intuition-why-we-divide-by-n-1-for-the-unbiased-sample-variance<\/a><\/li>\r\n<\/ul>\r\n<\/div>\r\n<span style=\"font-size: 1rem; orphans: 1; text-align: initial;\">[\/hidden-answer]<\/span>\r\n<ul>\r\n \t<li><span style=\"font-size: 1em;\"><strong>The square root is taken<\/strong> in order to express the spread in terms of the units of the observations.\u00a0<\/span>Recall that we squared the differences to express them as positive distances, which resulted in squared observation units. Taking the square root can be thought of as \"undoing\" the earlier squaring.\u00a0For example, assume that within the context in which you are working, the data are in terms of dollars. 
If we do not take the square root, the result will be\u00a0in terms of dollars squared, a unit that is not commonly used.<\/li>\r\n \t<li><strong>The standard deviation, [latex]s[\/latex], represents the \u201ctypical\u201d distance of an observation from the mean of the dataset.<\/strong><\/li>\r\n<\/ul>\r\nDon\u2019t worry. We will be using the <em>DCMP Statistical Analysis Tools<\/em> to calculate standard deviation for us!\r\n\r\nLet's practice using the tool by finding the standard deviation of the variable Average Sleep in the Sleep Study dataset.\r\n<div class=\"textbox tryit\">\r\n<h3>Use a data analysis tool to identify the standard deviation of a dataset<\/h3>\r\n<span style=\"background-color: #99cc00;\">[Worked example video - a 3-instructor video showing how to use the tool as in Questions 6 - 8 to calculate standard deviation, variance, and range with commentary on what these values imply for there being \"more\" or \"less\" variability in the data.]<\/span>\r\n\r\n<\/div>\r\n<div class=\"textbox\">\r\n\r\nGo to the <em>Describing and Exploring Quantitative Variables<\/em> tool at <a href=\"https:\/\/dcmathpathways.shinyapps.io\/EDA_quantitative\/\" target=\"_blank\" rel=\"noopener\">https:\/\/dcmathpathways.shinyapps.io\/EDA_quantitative\/<\/a>.\r\n<p style=\"padding-left: 30px;\">Step 1) Select the <strong>Single Group<\/strong> tab.<\/p>\r\n<p style=\"padding-left: 30px;\">Step 2) Locate the drop-down menu under <strong>Enter Data<\/strong> and select <strong>From Textbook<\/strong>.<\/p>\r\n<p style=\"padding-left: 30px;\">Step 3) Locate the drop-down menu under <strong>Dataset<\/strong> and select <strong>Sleep Study: Average Sleep<\/strong>.<\/p>\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 6<\/h3>\r\nWhat is the standard deviation of the average number of hours a college student in the study sleeps per week? 
Make sure to include units in your answer.\r\n\r\n[reveal-answer q=\"805189\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"805189\"]In the tool, look for \u201cStd. Dev.\u201d in the table under Descriptive Statistics.[\/hidden-answer]\r\n\r\n<\/div>\r\n<h3 id=\"Variance\">Variance<\/h3>\r\n<strong>Variance<\/strong> is the standard deviation squared. We use the Greek letter [latex]\\sigma^{2}[\/latex] (sigma squared) to denote the variance of a population of observations, and we use [latex]s^{2}[\/latex] to denote the variance of a sample of observations. The following formulas are used to calculate the variance of a population and a sample:\r\n<p style=\"padding-left: 30px;\">Variance of a population: [latex]\\sigma^{2}=\\dfrac{\\sum\\left(x-\\mu\\right)^{2}}{n}[\/latex]<\/p>\r\n<p style=\"padding-left: 30px;\">Variance of a sample: [latex]s^{2}=\\dfrac{\\sum\\left(x-\\bar{x}\\right)^{2}}{n-1}[\/latex]<\/p>\r\n<strong>Important<\/strong>: The <em>Describing and Exploring Quantitative Variables<\/em> tool does not calculate the variance, so you will need to use the tool to calculate the standard deviation and then square it by hand in order to get the variance.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 7<\/h3>\r\nWhat is the variance of the average number of hours college students in the study sleep? Round to 3 decimal places. Make sure to include units in your answer.\r\n\r\n[reveal-answer q=\"280916\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"280916\"]First identify the Std. Dev. using the tool, then square that value by hand. 
Round to 3 decimal places.[\/hidden-answer]\r\n\r\n<\/div>\r\n<h3 id=\"Range\">Range<\/h3>\r\nThe simplest way to measure the variability of a dataset is with the <strong>range<\/strong>:\r\n<p style=\"text-align: center;\">Range = maximum value \u2013 minimum value<\/p>\r\n<p style=\"text-align: center;\">or<\/p>\r\n<p style=\"text-align: center;\">Range = largest value \u2013 smallest value<\/p>\r\nLarger values of range indicate more variability in the data. However, the range uses only two observations in the entire dataset to measure variability. This is not an ideal measure of spread, but when used in combination with other measures of spread, it can help us gain a clearer understanding of the spread of a distribution.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 8<\/h3>\r\nWhat is the range of the average number of hours a college student in the study sleeps per week?\r\n\r\n[reveal-answer q=\"236844\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"236844\"]Look for \u201cMax.\u201d and \u201cMin.\u201d in the summary statistics within the Describing and Exploring Quantitative Variables tool.[\/hidden-answer]\r\n\r\n<\/div>\r\n<h2>Summary<\/h2>\r\nIn this section, you've learned about variability in a dataset in preparation for exploring data via the measures of center and spread. Let's summarize where these skills showed up in the material.\r\n<ul>\r\n \t<li>In Questions 1, 2, and 3, you visually assessed the differences in variability, given comparative histograms or dotplots.<\/li>\r\n \t<li>In Questions 4 and 5, you gained experience using the summary statistics feature of the\u00a0<em>Describing and Exploring Quantitative\u00a0Variables\u00a0<\/em>tool.<\/li>\r\n \t<li>In Questions 6 - 8, you used technology to calculate measures of variability: standard deviation, variance, and range.<\/li>\r\n<\/ul>\r\nExploring the measures of center and spread to describe data is a necessary skill for completing the next activity. 
If you feel comfortable with these skills, it's time to move on!\r\n\r\n&nbsp;","rendered":"<div class=\"textbox learning-objectives\">\n<h3>goals for this section<\/h3>\n<p>After completing this section, you should feel comfortable performing these skills.<\/p>\n<ul>\n<li><a href=\"#VarHist\">Compare the variability of multiple datasets visually using histograms.<\/a><\/li>\n<li><a href=\"#VarDot\">Compare the variability of multiple datasets visually using dotplots.<\/a><\/li>\n<\/ul>\n<ul>\n<li><a href=\"#StdDev\">Use a data analysis tool to identify the standard deviation of a dataset.<\/a><\/li>\n<li><a href=\"#Variance\">Calculate the variance of a dataset given standard deviation<\/a><\/li>\n<li><a href=\"#Range\">Use a data analysis tool to calculate variability by identifying the range of a dataset.<\/a><\/li>\n<\/ul>\n<p>Click on a skill above to jump to its location in this section.<\/p>\n<\/div>\n<p>In the next activity, you will be exploring data and using the measures of center and measures of spread to describe the data. This section will introduce you to <strong>variability<\/strong>, which is a\u00a0measure of how dispersed (spread out) the data are.\u00a0You&#8217;ll learn to recognize variability in histograms and dotplots by using visual clues. You&#8217;ll also learn how to calculate measures of variability including standard deviation, variance, and range.<\/p>\n<h2>Comparing Variability<\/h2>\n<p>The <strong>variability<\/strong> of a dataset is often referred to as the spread of a dataset. We can visually assess variability using graphical displays such as histograms and dotplots. 
When answering Questions 1 &#8211; 3 below, consider whether the data <em>appears<\/em> to be more spread out from the center (greater variability) or more clustered toward the center (less variability).<\/p>\n<h3 id=\"VarHist\">Comparing Variability Using Histograms<\/h3>\n<p>Recall that a histogram visualizes the distribution of a quantitative variable by displaying rectangular bars representing the frequencies (height of the bar) for intervals of data values called bins (width of the bar). Variability can be judged from a histogram by examining the distance of the bars from the statistical center (mean or median) of the graph. If the variability is high, equally sized or taller bars will appear away from the center of the graph. If the variability is low, the data will appear clustered around the center.<\/p>\n<p>The images below show distributions of two different datasets using histograms. The first histogram displays the distribution of responses given by parents of thirteen-year-old children to the question, &#8220;how much allowance do you give weekly?&#8221; The second is a distribution of the heights in inches of 31 thirteen-year-old boys attending the same middle school. 
Use these histograms to answer Question 1 below.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1962\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2021\/12\/26180049\/WeeklyAllowance_Hist.png\" alt=\"a histogram showing weekly allowance ($) ranging from 0 to 20 dollars.\" width=\"450\" height=\"199\" \/>\u00a0<img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1963\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2021\/12\/26184147\/Hist_HeightMales13yr.png\" alt=\"a histogram labeled Height Age 13 Male (inches) which ranges from 56 to 66.\" width=\"450\" height=\"199\" \/><\/p>\n<div class=\"textbox key-takeaways\">\n<h3>question 1<\/h3>\n<p>Which of the two histograms appears to have <em>less<\/em> variability? Explain.<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q136179\">Hint<\/span><\/p>\n<div id=\"q136179\" class=\"hidden-answer\" style=\"display: none\">In which of the graphs do the values appear to be clustered closer together?<\/div>\n<\/div>\n<\/div>\n<h3 id=\"VarDot\">Using Dotplots to Visually Compare Variability<\/h3>\n<p>A dotplot indicates the variability of the data or the extent to which each observation\u00a0differs from other observations. It can be easier to visualize variability using a dotplot than using a histogram because of the individual observations visible in the dotplot. Use the side-by-side dotplots in the image below to answer Questions 2 and 3.<\/p>\n<p>Ten customers rated four different smartphone apps. The customer\u00a0ratings for the four different apps are shown in the following dotplots.\u00a0The mean for each app is equal to 3. 
Even though the mean, [latex]\\bar{x}[\/latex], is the same for each app, the dotplots for each app look very different.<\/p>\n<p><strong><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1004\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/11193655\/Picture37-300x279.png\" alt=\"Four side by side dot plots with the horizontal axis labeled &quot;Rating,&quot; numbered in increments of 1 from 1 to 5. The first plot is labeled App 1. For rating 1, there is 1 dot. For rating 2, there are 2 dots. For rating 3, there are 3 dots. For rating 4, there are 2 dots. For rating 5, there is 1 dot. The next plot is titled App 2. For rating 3, there are 10 dots. The next graph is titled App 3. For rating 1, there are 5 dots. For rating 5, there are 5 dots. The next plot is titled App 4. For every rating, there are two dots.\" width=\"648\" height=\"603\" \/><\/strong><\/p>\n<div class=\"textbox key-takeaways\">\n<h3>question 2<\/h3>\n<p>Which app has the smallest variability? In other words, in which app are the observations really close together?<\/p>\n<p style=\"padding-left: 30px;\">a) App 1<br \/>\nb) App 2<br \/>\nc) App 3<br \/>\nd) App 4<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q506703\">Hint<\/span><\/p>\n<div id=\"q506703\" class=\"hidden-answer\" style=\"display: none\">What do <em>you<\/em> think?<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 3<\/h3>\n<p>Which app has the largest variability? 
In other words, in which app are the observations the furthest apart?<\/p>\n<p style=\"padding-left: 30px;\">a) App 1<br \/>\nb) App 2<br \/>\nc) App 3<br \/>\nd) App 4<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q950930\">Hint<\/span><\/p>\n<div id=\"q950930\" class=\"hidden-answer\" style=\"display: none\">What do <em>you<\/em> think?<\/div>\n<\/div>\n<\/div>\n<h2>Using Technology to Obtain Descriptive Statistics<\/h2>\n<p>Let&#8217;s go to the technology now and recall how to load a dataset in order to describe and explore it.<\/p>\n<p>For Questions 4 and 5, recall the sleep study in which you investigated whether college students&#8217; chronotypes tend to be larks (morning people) or owls (night people) by examining graphical representations of the data. Let&#8217;s use the dataset from that study again here.<\/p>\n<div class=\"textbox\">\n<p>Go to the <em>Describing and Exploring Quantitative Variables<\/em> tool at <a href=\"https:\/\/dcmathpathways.shinyapps.io\/EDA_quantitative\/\" target=\"_blank\" rel=\"noopener\">https:\/\/dcmathpathways.shinyapps.io\/EDA_quantitative\/<\/a>.<\/p>\n<p style=\"padding-left: 30px;\">Step 1) Select the <strong>Single Group<\/strong> tab.<\/p>\n<p style=\"padding-left: 30px;\">Step 2) Locate the drop-down menu under <strong>Enter Data<\/strong> and select <strong>From Textbook<\/strong>.<\/p>\n<p style=\"padding-left: 30px;\">Step 3) Locate the drop-down menu under <strong>Dataset<\/strong> and select <strong>Sleep Study: Average Sleep<\/strong>.<\/p>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 4<\/h3>\n<p>The descriptive statistics are displayed in a table at the top of the webpage. 
Would the observations in this dataset be classified as a sample or a population?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q559868\">Hint<\/span><\/p>\n<div id=\"q559868\" class=\"hidden-answer\" style=\"display: none\">If a dataset represents an entire population, there are probably not many good unanswered statistical questions about the population. What would you consider to be the population for the Sleep Study? <\/div>\n<\/div>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 5<\/h3>\n<p>What is the average amount of sleep (rounded to the nearest whole number) of the college students in the study?<\/p>\n<p style=\"padding-left: 30px;\">a) 8 hours<br \/>\nb) 9 hours<br \/>\nc) 4 hours<br \/>\nd) 7 hours<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q341628\">Hint<\/span><\/p>\n<div id=\"q341628\" class=\"hidden-answer\" style=\"display: none\">Obtain measures of center from &#8220;Descriptive Statistics.&#8221;<\/div>\n<\/div>\n<\/div>\n<h2>Measures of Variability<\/h2>\n<p>In statistics, we are particularly interested in understanding how data are distributed and where each observation is in reference to the mean. The extent to which a set of observations is spread out is called <strong>variability<\/strong> (also called spread or dispersion). In the remainder of this section, we will focus on three measures of spread: standard deviation, variance, and range.<\/p>\n<div class=\"textbox examples\">\n<h3>Recall<\/h3>\n<p>We&#8217;ll be using statistical formulas and symbols to discuss measures of variability. Take a moment to recall the formula you learned to calculate the mean of a sample. 
What symbols do we use to represent sample mean, summation, and sample size?<\/p>\n<p>Core skill: <\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q450894\">Express the formula for calculating the mean of a sample<\/span><\/p>\n<div id=\"q450894\" class=\"hidden-answer\" style=\"display: none\">\n<p>[latex]\\bar{x}=\\dfrac{\\sum{x}}{n}[\/latex] where [latex]\\bar{x}[\/latex] denotes the sample mean, [latex]\\sum{x}[\/latex] indicates to take a sum of all the sample values [latex]x[\/latex],\u00a0and [latex]n[\/latex] indicates the sample size.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox tryit\">\n<h3>Standard Deviation<\/h3>\n<p><span style=\"background-color: #99cc00;\">[perspective video &#8212; a 3-instructor video showing how to think about standard deviation as a measure of variability. Cover the parts of the formula (go into\u00a0why squaring, why\u00a0<em>df<\/em> if desired) but emphasize the concept of variability from std dev and variance more so than the technical use of the formula.]<\/span><\/p>\n<\/div>\n<p><strong>Standard deviation<\/strong> is a measure of how spread out observations are from the mean. The symbol we use to denote standard deviation differs depending on whether we are discussing a sample or a population. 
We use the Greek letter [latex]\\sigma[\/latex] (sigma) to denote the standard deviation of a population of observations.\u00a0We use the Latin letter\u00a0[latex]s[\/latex]\u00a0to denote the standard deviation of a sample of observations.<\/p>\n<p>The following formulas are used to calculate the standard deviation of a population and a sample:<\/p>\n<p style=\"padding-left: 30px;\">Standard deviation of a population: [latex]\\sigma = \\sqrt{\\dfrac{\\sum \\left(x-\\mu\\right)^2}{n}}[\/latex]<\/p>\n<p style=\"padding-left: 30px;\">Standard deviation of a sample: [latex]s=\\sqrt{\\dfrac{\\sum \\left(x-\\bar{x}\\right)^2}{n-1}}[\/latex]<\/p>\n<p>The following steps can be applied to calculate a standard deviation by hand.<\/p>\n<ol>\n<li>Calculate the mean of the population or sample.<\/li>\n<li>Take the difference between each data value and the mean. Then square each difference.<\/li>\n<li>Add up all the squared differences.<\/li>\n<li>Divide by the total number of observations in the case of a population, or by 1 fewer than the total in the case of a sample.<\/li>\n<li>Take the square root of the result of the division in step 4.<\/li>\n<\/ol>\n<div class=\"textbox exercises\">\n<h3>Interactive example<\/h3>\n<p>A sample of observations is listed below. 
Find its standard deviation.<\/p>\n<p style=\"padding-left: 30px;\">8, 7, 13, 15, 23, 18<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q911375\">Show Answer<\/span><\/p>\n<div id=\"q911375\" class=\"hidden-answer\" style=\"display: none\">\n<p>First, calculate the mean of the sample observations.<\/p>\n<p>[latex]\\bar{x}=\\frac{84}{6}=14[\/latex]<\/p>\n<p>Identify all the squared differences.<\/p>\n<p>[latex]\\left(8-14\\right)^{2}=36\\text{, } \\left(7-14\\right)^{2}=49\\text{, } \\left(13-14\\right)^{2}=1\\text{, } \\left(15-14\\right)^{2}=1\\text{, } \\left(23-14\\right)^{2}=81\\text{, } \\left(18-14\\right)^{2}=16\\\\[\/latex]<\/p>\n<p>Then, take the square root of the sum of the squared differences divided by 1 less than the sample size.<\/p>\n<p>[latex]\\sqrt{\\dfrac{36+49+1+1+81+16}{6-1}}=\\sqrt{\\dfrac{184}{5}}\\approx{6.07}[\/latex]<\/p>\n<p>This process would be too tedious for large samples or populations! We&#8217;ll use technology to calculate standard deviation from now on. Try it now using the\u00a0<em>Describing and Exploring Quantitative Variables<\/em> tool at\u00a0<a href=\"https:\/\/dcmathpathways.shinyapps.io\/EDA_quantitative\/\">https:\/\/dcmathpathways.shinyapps.io\/EDA_quantitative\/<\/a>.\u00a0 Choose &#8220;Your Own&#8221; under &#8220;Enter Data&#8221; and enter the observations\u00a08, 7, 13, 15, 23, 18 to read the Std. Dev. from Descriptive Statistics.<\/p>\n<p>Did you obtain the same result? 
That was much better than calculating it by hand!<\/p>\n<\/div>\n<\/div>\n<\/div>\n<p>Here is a breakdown of the formula for standard deviation of a sample, [latex]s[\/latex].<\/p>\n<p style=\"text-align: center;\">[latex]s=\\sqrt{\\dfrac{\\sum \\left(x-\\bar{x}\\right)^2}{n-1}}[\/latex]<\/p>\n<ul>\n<li style=\"text-align: left;\">The distance from each observation to the mean is known as a <strong>deviation from the mean<\/strong> and is expressed as [latex]\\left(x-\\bar{x}\\right)[\/latex]<\/li>\n<li style=\"text-align: left;\"><strong>The deviations from the mean are squared<\/strong> in the formula because some observations are above the mean, thus [latex]\\left(x-\\bar{x}\\right)>0[\/latex] (the difference is positive), and some observations are below the mean, thus [latex]\\left(x-\\bar{x}\\right)<0[\/latex] (the difference is negative). Squaring ensures the differences will each be expressed as positive distances and won&#8217;t cancel each other out when summed up.<\/li>\n<li style=\"text-align: left;\"><strong>The [latex]\\sum[\/latex] symbol sums up<\/strong> the squared deviations for all [latex]n[\/latex] observations.<span style=\"background-color: #00ffff;\"><br \/>\n<\/span><\/li>\n<li><strong>The denominator in the formula for a sample standard deviation is [latex]\\left(n-1\\right)[\/latex]<\/strong> rather than [latex]n[\/latex] as in the formula for the population standard deviation.\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-size: 1rem; orphans: 1; text-align: initial;\">Why do we divide by 1 fewer than the sample size, <strong>[latex]\\left(n-1\\right)[\/latex]<\/strong>?<\/span>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-size: 1rem; orphans: 1; text-align: initial;\"><span style=\"font-size: 1rem; orphans: 1; text-align: initial;\">\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q329088\">The 
answer to that is complicated, but here are some ideas that may help<\/span><\/span><\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p style=\"padding-left: 30px;\">\n<div id=\"q329088\" class=\"hidden-answer\" style=\"display: none\">\n<div class=\"textbox shaded\">\n<p style=\"padding-left: 30px;\"><strong>Why do we divide by [latex]\\left(n-1\\right)[\/latex]?\u00a0<\/strong><\/p>\n<ul>\n<li><strong>Because dividing by [latex]n[\/latex] would give an underestimate.\u00a0<\/strong>Recall that a sample is representative of a population if the characteristics of the sample tend to be similar to the characteristics of the population from which it was obtained. But a sample standard deviation computed by dividing by [latex]n[\/latex] tends to underestimate the population standard deviation\u00a0(this can be shown mathematically, but it&#8217;s beyond the scope of what we need here). We can correct for that by dividing by [latex]\\left(n-1\\right)[\/latex] rather than by [latex]n[\/latex] in the sample standard deviation formula, which makes the result slightly larger.<\/li>\n<li><strong>Because we are using\u00a0<em>degrees of freedom<\/em> in the denominator.\u00a0<\/strong>You may have heard that the denominator in the standard deviation formula is called the\u00a0<em>degrees of freedom<\/em>, abbreviated <em>df<\/em>. That&#8217;s true, and it helps us to compensate for the underestimation that crops up when we divide strictly by sample size. There&#8217;s a lot going on here mathematically, but we can think of it this way: dividing by [latex]\\left(n-1\\right)[\/latex] instead of [latex]n[\/latex]\u00a0helps our sample standard deviation more closely resemble the true (usually unknowable) population standard deviation. This will help make our statistical analysis more reasonable.<\/li>\n<li><strong>What are degrees of freedom, anyway?<\/strong> A nice way to think of degrees of freedom, [latex]\\left(n-1\\right)[\/latex], is to imagine a set of three numbers whose mean is 5: say, 4, 5, and 6. 
If those three numbers were written on pieces of paper in a hat, and you chose two of them first, say 4 and 5, then the only way to get a mean of 5 from the three scraps of paper would be for the last one to have a 6 on it. We could say that the first two scraps were <em>free to vary<\/em>; they could have been 4 or 5 or 6 as they pleased. But the third pick couldn&#8217;t vary. After choosing the 4 and the 5 freely first, there was no freedom for the choice of the third in order to obtain the desired mean. Only two of our choices had a degree of freedom, so we say that the degrees of freedom of a sample size of 3 is [latex]\\left(3-1\\right)=2[\/latex].<\/li>\n<li>For a more detailed discussion, see <a href=\"https:\/\/www.khanacademy.org\/math\/ap-statistics\/summarizing-quantitative-data-ap\/more-standard-deviation\/v\/review-and-intuition-why-we-divide-by-n-1-for-the-unbiased-sample-variance\">https:\/\/www.khanacademy.org\/math\/ap-statistics\/summarizing-quantitative-data-ap\/more-standard-deviation\/v\/review-and-intuition-why-we-divide-by-n-1-for-the-unbiased-sample-variance<\/a><\/li>\n<\/ul>\n<\/div>\n<p><span style=\"font-size: 1rem; orphans: 1; text-align: initial;\"><\/div>\n<\/div>\n<p><\/span><\/p>\n<ul>\n<li><span style=\"font-size: 1em;\"><strong>The square root is taken<\/strong> in order to express the spread in terms of the units of the observations.\u00a0<\/span>Recall that we squared the differences to express them as positive distances, which resulted in squared observation units. Taking the square root can be thought of as &#8220;undoing&#8221; the earlier squaring.\u00a0For example, assume that within the context in which you are working, the data are in terms of dollars. 
If we do not take the square root, the standard deviation will be\u00a0in terms of dollars squared, which is not something commonly used.<\/li>\n<li><strong>The standard deviation, [latex]s[\/latex], represents the \u201ctypical\u201d distance of an observation from the mean of the dataset.<\/strong><\/li>\n<\/ul>\n<p>Don\u2019t worry. We will be using the <em>DCMP Statistical Analysis Tools<\/em> to calculate standard deviation for us!<\/p>\n<p>Let&#8217;s practice using the tool by finding the standard deviation of the variable Average Sleep in the Sleep Study dataset.<\/p>\n<div class=\"textbox tryit\">\n<h3>Use a data analysis tool to identify the standard deviation of a dataset<\/h3>\n<p><span style=\"background-color: #99cc00;\">[Worked example video &#8211; a 3-instructor video showing how to use the tool as in Questions 6 &#8211; 8 to calculate standard deviation, variance, and range with commentary on what these values imply for there being &#8220;more&#8221; or &#8220;less&#8221; variability in the data. <\/span><\/p>\n<\/div>\n<div class=\"textbox\">\n<p>Go to the <em>Describing and Exploring Quantitative Variables<\/em> tool at <a href=\"https:\/\/dcmathpathways.shinyapps.io\/EDA_quantitative\/\" target=\"_blank\" rel=\"noopener\">https:\/\/dcmathpathways.shinyapps.io\/EDA_quantitative\/<\/a>.<\/p>\n<p style=\"padding-left: 30px;\">Step 1) Select the <strong>Single Group<\/strong> tab.<\/p>\n<p style=\"padding-left: 30px;\">Step 2) Locate the drop-down menu under <strong>Enter Data<\/strong> and select <strong>From Textbook<\/strong>.<\/p>\n<p style=\"padding-left: 30px;\">Step 3) Locate the drop-down menu under <strong>Dataset<\/strong> and select <strong>Sleep Study: Average Sleep<\/strong>.<\/p>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 6<\/h3>\n<p>What is the standard deviation of the average number of hours a college student in the study sleeps per week? 
Make sure to include units in your answer.<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q805189\">Hint<\/span><\/p>\n<div id=\"q805189\" class=\"hidden-answer\" style=\"display: none\">In the tool, look for \u201cStd. Dev.\u201d in the table under Descriptive Statistics.<\/div>\n<\/div>\n<\/div>\n<h3 id=\"Variance\">Variance<\/h3>\n<p><strong>Variance<\/strong> is the standard deviation squared. We use the Greek letter [latex]\\sigma^{2}[\/latex] (sigma squared) to denote the variance of a population of observations, and we use [latex]s^{2}[\/latex] to denote the variance of a sample of observations. The following formulas are used to calculate the variance of a population and a sample:<\/p>\n<p style=\"padding-left: 30px;\">Variance of a population: [latex]\\sigma^{2}=\\dfrac{\\sum\\left(x-\\mu\\right)^{2}}{n}[\/latex]<\/p>\n<p style=\"padding-left: 30px;\">Variance of a sample: [latex]s^{2}=\\dfrac{\\sum\\left(x-\\bar{x}\\right)^{2}}{n-1}[\/latex]<\/p>\n<p><strong>Important<\/strong>: The <em>Describing and Exploring Quantitative Variables<\/em> tool does not calculate the variance, so you will need to use the tool to calculate the standard deviation and then square it by hand in order to get the variance.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>question 7<\/h3>\n<p>What is the variance of the average number of hours college students in the study sleep? Round to 3 decimal places. Make sure to include units in your answer.<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q280916\">Hint<\/span><\/p>\n<div id=\"q280916\" class=\"hidden-answer\" style=\"display: none\">First identify the Std. Dev. using the tool, then square that value by hand. 
Round to 3 decimal places.<\/div>\n<\/div>\n<\/div>\n<h3 id=\"Range\">Range<\/h3>\n<p>The simplest way to calculate the variability of a dataset is with the <strong>range<\/strong>:<\/p>\n<p style=\"text-align: center;\">Range = maximum value \u2013 minimum value<\/p>\n<p style=\"text-align: center;\">or<\/p>\n<p style=\"text-align: center;\">Range = largest value \u2013 smallest value<\/p>\n<p>Larger values of range indicate more variability in the data. However, the range value only utilizes two observations in the entire dataset to measure variability. This is not an ideal measure of spread, but when used in combination with other measures of spread, it can help us gain a clearer understanding of the spread of a distribution.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>question 8<\/h3>\n<p>What is the range of the average number of hours a college student in the study sleeps per week?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q236844\">Hint<\/span><\/p>\n<div id=\"q236844\" class=\"hidden-answer\" style=\"display: none\">Look for \u201cMax.\u201d and \u201cMin.\u201d in the summary statistics within the Describing and Exploring Quantitative Variables tool.<\/div>\n<\/div>\n<\/div>\n<h2>Summary<\/h2>\n<p>In this section, you&#8217;ve learned about variability in a dataset in preparation for exploring data via the measures of center and spread. 
Let&#8217;s summarize where these skills showed up in the material.<\/p>\n<ul>\n<li>In Questions 1, 2, and 3, you visually assessed the differences in variability, given comparative histograms or dotplots.<\/li>\n<li>In Questions 4 and 5, you gained experience using the summary statistics feature of the\u00a0<em>Describing and Exploring Quantitative\u00a0Variables\u00a0<\/em>tool.<\/li>\n<li>In Questions 6 &#8211; 8, you used technology to calculate measures of variability: standard deviation, variance, and range.<\/li>\n<\/ul>\n<p>Exploring the measures of center and spread to describe data is a necessary skill for completing the next activity. If you feel comfortable with these skills, it&#8217;s time to move on!<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"author":25777,"menu_order":10,"template":"","meta":{"_candela_citation":"[]","CANDELA_OUTCOMES_GUID":"","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-438","chapter","type-chapter","status-publish","hentry"],"part":621,"_links":{"self":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters\/438","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/users\/25777"}],"version-history":[{"count":81,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters\/438\/revisions"}],"predecessor-version":[{"id":3311,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters\/438\/revisions\/3311"}],"part":[{"hre
f":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/parts\/621"}],"metadata":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters\/438\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/media?parent=438"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapter-type?post=438"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/contributor?post=438"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/license?post=438"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}