{"id":263,"date":"2022-02-18T23:48:54","date_gmt":"2022-02-18T23:48:54","guid":{"rendered":"https:\/\/courses.lumenlearning.com\/exemplarstatistics\/?post_type=chapter&#038;p=263"},"modified":"2022-03-06T18:05:04","modified_gmt":"2022-03-06T18:05:04","slug":"applications-of-histograms-forming-connections","status":"publish","type":"chapter","link":"https:\/\/courses.lumenlearning.com\/exemplarstatistics\/chapter\/applications-of-histograms-forming-connections\/","title":{"raw":"Applications of Histograms: Forming Connections","rendered":"Applications of Histograms: Forming Connections"},"content":{"raw":"<div class=\"textbox learning-objectives\">\r\n<h3>objectives for this activity<\/h3>\r\nDuring this activity, you will:\r\n<ul>\r\n \t<li><a href=\"#SummDistr\">Summarize the description of a distribution of a quantitative variable using the shape, center, spread, and presence of outliers.<\/a><\/li>\r\n \t<li><a href=\"#ReprSpread\">Determine the appropriate representation of the spread based on the shape of the distribution and presence of outliers.<\/a><\/li>\r\n<\/ul>\r\nClick on a skill above to jump to its location in this activity.\r\n\r\n<\/div>\r\nIn the previous section, <a href=\"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/chapter\/what-to-know-about-applications-of-histograms-3d\/\"><em>What to Know About Applications of Histograms: 3D<\/em><\/a>, you practiced using histograms to describe a quantitative data set. You described the shape, estimated the center and spread, and identified any\u00a0outliers in the distribution. 
Now it's time to use the skills you learned on a data set involving information collected from student evaluations of their classes at a Texas University.\r\n<h2>What Do Students Think?<\/h2>\r\nBefore you begin this activity, take a moment to think about a scenario in which only a low percentage (e.g., fewer than\u00a0[latex]10[\/latex]%) of students in your class completes the course evaluation at the end of the semester.\r\n\r\n<img class=\"aligncenter wp-image-971\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/11184039\/Picture111-300x185.jpg\" alt=\"A woman working on a laptop and checking her phone. On the left, there are three check boxes displayed: a frowning face, a neutral face, and a smiling face. The smiling face check box is marked with a check, and the other two are marked with an X. \" width=\"456\" height=\"281\" \/>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 1<\/h3>\r\n[ohm_question hide_question_numbers=1]240874[\/ohm_question]\r\n\r\n[reveal-answer q=\"854955\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"854955\"]What do <em>you<\/em> think? Consider whether or not the results would be different if a large percentage (e.g., more than\u00a0[latex]80[\/latex]%) of students had responded.[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox tryit\">\r\n<h3>video placement<\/h3>\r\n<span style=\"background-color: #e6daf7;\">[Intro: \"If you are a freshman in your first term in college, you may not have heard about course evaluations yet. These are surveys that students in a class fill out anonymously at the end of the course term to provide feedback about the course and instructor. If only a few students in the course complete the survey, it would be natural to question if those students' responses accurately represented the experience of the class in general or if they were just the students who had the strongest opinions (negative or positive) about the course. 
If so, the sample of responses would not be an accurate representation of the general experience. In this activity, we'll use a data set of course evaluations to investigate the percentage of students who tend to complete course evaluations and to learn how statistical language can be used to describe a distribution based on its graphical display. Our descriptions will include the shape, center, spread, and presence of any outliers in the distribution. We'll also see that we can identify a representation of the spread of a distribution based on its shape and outliers.\"]<\/span>\r\n\r\n<\/div>\r\nIn this activity, you'll see common statistical language used to describe a distribution based on what is observed from a graphical display, which you'll describe by identifying its shape, center, spread, and any outliers present. You'll also see that the range (the difference between the maximum and minimum values) of a distribution that contains outliers or is skewed can be a misleading representation of spread.\r\n\r\nWe will investigate the question:\r\n<p style=\"text-align: center;\"><em>In general, what percentage of students completes course evaluations?<\/em><\/p>\r\nTo do so, we will use the <em>evals<\/em> data set[footnote]Professor evaluations and beauty. (n.d.). OpenIntro. Retrieved from https:\/\/www.openintro.org\/data\/indes.php?data=evals[\/footnote], which contains information collected from student evaluations for a sample of\u00a0[latex]463[\/latex] courses taught by\u00a0[latex]94[\/latex] professors at The University of Texas at Austin. Each row represents a different course, and the columns contain information about the professor and summaries from the evaluations. 
The first\u00a0[latex]10[\/latex] observations of the selected variables within the \u201cTeaching Evaluations\u201d data set are displayed in the following table.\r\n<div align=\"center\">\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td style=\"text-align: center;\" colspan=\"5\"><strong>Teaching Evaluations<\/strong><em>\r\n<\/em><\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"text-align: center;\"><strong><em>cls_did_eval<\/em><\/strong><\/td>\r\n<td style=\"text-align: center;\"><strong><em>cls_perc_eval<\/em><\/strong><\/td>\r\n<td style=\"text-align: center;\"><strong><em>age<\/em><\/strong><\/td>\r\n<td style=\"text-align: center;\"><strong><em>cls_students<\/em><\/strong><\/td>\r\n<td style=\"text-align: center;\"><strong><em>score<\/em><\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"text-align: center;\">[latex]24[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]55.81395[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]36[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]43[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]4.7[\/latex]<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"text-align: center;\">[latex]86[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]68.8[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]36[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]125[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]4.1[\/latex]<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"text-align: center;\">[latex]76[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]60.8[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]36[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]125[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]3.9[\/latex]<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"text-align: center;\">[latex]77[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]62.60163[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]36[\/latex]<\/td>\r\n<td style=\"text-align: 
center;\">[latex]123[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]4.8[\/latex]<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"text-align: center;\">[latex]17[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]85[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]59[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]20[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]4.6[\/latex]<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"text-align: center;\">[latex]35[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]87.5[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]59[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]40[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]4.3[\/latex]<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"text-align: center;\">[latex]39[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]88.63636[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]59[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]44[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]2.8[\/latex]<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"text-align: center;\">[latex]55[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]100[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]51[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]55[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]4.1[\/latex]<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"text-align: center;\">[latex]111[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]56.92308[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]51[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]195[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]3.4[\/latex]<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"text-align: center;\">[latex]40[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]86.95652[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]40[\/latex]<\/td>\r\n<td style=\"text-align: 
center;\">[latex]46[\/latex]<\/td>\r\n<td style=\"text-align: center;\">[latex]4.5[\/latex]<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\nThe following variables are used in this analysis:\r\n<ul>\r\n \t<li><strong><em>cls_did_eval<\/em><\/strong>: Number of students who completed evaluations<\/li>\r\n \t<li><strong><em>cls_perc_eval<\/em><\/strong>: Percentage of students who completed evaluations<\/li>\r\n \t<li><strong><em>age<\/em><\/strong>: Age of professor in years<\/li>\r\n \t<li><strong><em>cls_students<\/em><\/strong>: Total number of students in the course<\/li>\r\n \t<li><strong><em>score<\/em><\/strong>: Average professor evaluation score (1 to 5, where 1 is the lowest and 5 is the highest)<\/li>\r\n<\/ul>\r\nIf we are interested in course evaluation completion, we are naturally curious about how many students completed the evaluation.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 2<\/h3>\r\n[ohm_question hide_question_numbers=1]240875[\/ohm_question]\r\n\r\n[reveal-answer q=\"541378\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"541378\"]Recall <span style=\"background-color: #ffff00;\">[from 2A]<\/span> that a sample is representative if it tends to have the characteristics of the population from which it was drawn. Which would better help us understand whether the sample of responses accurately reflects the general class experience: the number of students who completed the evaluation or the percentage of students who did? 
[\/hidden-answer]\r\n\r\n<\/div>\r\n<h3>Examine the data<\/h3>\r\nNow, let's create a graph to visualize the distribution of the variable of interest.\r\n<div class=\"textbox\">Go to the Describing and Exploring Quantitative Variables tool at\u00a0\u00a0<a href=\"https:\/\/dcmathpathways.shinyapps.io\/EDA_quantitative\/\" target=\"_blank\" rel=\"noopener\">https:\/\/dcmathpathways.shinyapps.io\/EDA_quantitative\/<\/a>.\u00a0Select the\u00a0<strong>Single Group<\/strong> tab, choose the data set <strong>Teaching Evaluations \u2013 Percent Complete<\/strong>, and make a <strong>Histogram<\/strong> of the distribution of <em>cls_perc_eval<\/em>, the percentage of students who completed the course evaluations. Under <strong>Select Binwidth for Histogram<\/strong>\u00a0use 6.<\/div>\r\nUse your histogram to summarize the description of the distribution by answering Questions 3, 4, and 5.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 3<\/h3>\r\n[ohm_question hide_question_numbers=1]240617[\/ohm_question]\r\n\r\n[reveal-answer q=\"770342\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"770342\"]Make sure you have selected the correct data set, Teaching Evaluations - Percent Complete, and that you have selected binwidth = 6.[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox tryit\">\r\n<h3>video placement<\/h3>\r\n<span style=\"background-color: #e6daf7;\">[insert sub-summary: Good job using the technology without a list of instructions! I'd like to point out a feature of this histogram that could be confusing. Did you wonder why there was a bin (a bar) stretching beyond 100 if the range of student completions only went to 100%? It seems strange when you think about it. But recall that each bin represents an interval of values, and that the interval includes its left endpoint but not its right endpoint. 
For example, for the bin that spans 40% to 45%, we would write that interval as [40,45) to indicate that all the percentages from 40% up to, but not including, 45% are counted in that bin. In fact, we'd count a value such as 44.999% in that bin, but not 45%. The next bin will pick up any value from 45% up to but not including 50%. So, you can see now that the last bin must stretch beyond 100 in order to include exactly 100%. What is the only possible completion rate that would be counted in the last bin? Would it make sense for a completion rate to be greater than 100%? As it turns out, the last bin in this case is the only one for which you'll know the exact count of a value.]<\/span>\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 4<\/h3>\r\n[ohm_question hide_question_numbers=1]240876[\/ohm_question]\r\n\r\n[reveal-answer q=\"362559\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"362559\"]Hover over the bars in your graph to see the count for a bin.[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 5<\/h3>\r\n[ohm_question hide_question_numbers=1]240878[\/ohm_question]\r\n\r\n[reveal-answer q=\"19751\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"19751\"]Hover over the bars in your graph to see the count for a bin. Recall there were\u00a0[latex]463[\/latex] classes in total.[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 6<\/h3>\r\n[ohm_question hide_question_numbers=1]240879[\/ohm_question]\r\n\r\n[reveal-answer q=\"472340\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"472340\"]Hover over the bars in your graph to see the count for a bin. You can change the binwidth to make the counts easier to collect.[\/hidden-answer]\r\n\r\n<\/div>\r\nSo far, we\u2019ve been able to use the histogram to answer questions about the distribution of <em>cls_perc_eval<\/em>. 
The answers to these questions give us some information about the data; however, they do not give us a broad view of the overall distribution of the variable. In addition to visualizing the distribution with a graphical display, we can use common statistical language to describe the distribution.\r\n\r\nBefore diving into the details, consider why we might want to use words to describe a distribution.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 7<\/h3>\r\n[ohm_question hide_question_numbers=1]240880[\/ohm_question]\r\n\r\n[reveal-answer q=\"868765\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"868765\"]How could a verbal or written description be of help? [\/hidden-answer]\r\n\r\n<\/div>\r\n<h3>Describe the distribution<\/h3>\r\nNow, in Questions 8 - 12, use statistical terms to describe the distribution of the variable\u00a0<em>cls_perc_eval<\/em><em>. <\/em>If necessary, refer to the\u00a0<a href=\"#Reference: Describing Distributions\">Describing Distributions<\/a>\u00a0section at the end of this activity for details about how to describe a distribution.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 8<\/h3>\r\n[ohm_question hide_question_numbers=1]240881[\/ohm_question]\r\n[reveal-answer q=\"720276\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"720276\"]Refer to the <a href=\"#Describing Distributions\">Describing Distributions<\/a> section at the end of this activity for details about how to describe the shape of a distribution. Note that the\u00a0description of shape includes two parts: the overall pattern and the number of peaks. 
[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 9<\/h3>\r\n[ohm_question hide_question_numbers=1]240882[\/ohm_question]\r\n\r\n[reveal-answer q=\"82005\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"82005\"]Use the appearance of the graph and bar counts to estimate the center.[\/hidden-answer]\r\n\r\n<\/div>\r\nRecall that the <strong>spread<\/strong> is a measure of how much the values in a data set tend to differ from one another. One way we can describe the spread is by finding the minimum and maximum values in the data and calculating the difference between them. This difference is called the <strong>range<\/strong>.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 10<\/h3>\r\n[ohm_question hide_question_numbers=1]240883[\/ohm_question]\r\n\r\n[reveal-answer q=\"987802\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"987802\"]Recall that range is the difference between the minimum and maximum values in the data as displayed in the data analysis tool. Don't forget to include the units. 
[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 11<\/h3>\r\n[ohm_question hide_question_numbers=1]240884[\/ohm_question]\r\n\r\n[reveal-answer q=\"59436\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"59436\"]Consider the size of the interval containing most of the data in comparison to the size of the range.[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 12<\/h3>\r\n[ohm_question hide_question_numbers=1]240887[\/ohm_question]\r\n\r\n[reveal-answer q=\"941043\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"941043\"]We'll define <em>outlier<\/em> more carefully later, but for now we understand it to be an unusual observation.[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox tryit\">\r\n<h3>video placement<\/h3>\r\n<span style=\"background-color: #e6daf7;\">[sub-summary: \"You've used all the features of a quantitative display to describe the distribution of the percentage of students who completed the evaluation.\" [voice over the distribution with a \"pointer\" to follow along this part --&gt;] \"You saw that the distribution was unimodal and left skewed. You can see the longer tail of smaller counts out to the left and the data sort of bunched up to the right. It looks like the center lies somewhere between about 75% and 80%. You noted that the range could be misleading because, while the range covers 90%, most of the data occur within the right-most 50% of the range. Finally, you were able to identify one outlier by the bin count of 1 in the left-most bin. There seems to be a course with a completion rate between 10% and 16%. In the next part of the activity, try describing the remaining quantitative variables in the data set on your own. You'll need to use the tool to create the distributions, then describe them by answering the questions below.\"]<\/span>\r\n\r\n<\/div>\r\n<h3 id=\"SummDistr\">Summarize the description of a distribution<\/h3>\r\nGood work. 
You've thoroughly described the variable <em>cls_perc_eval<\/em> using statistical language: shape, center, spread (range) and outliers.\u00a0Now it's your turn to try it on your own. Use the features you described in Questions 8 - 12 to describe the distribution of each of the following variables:\r\n<ul>\r\n \t<li><strong><em>age<\/em><\/strong>: Age of professor in years\r\n<ul>\r\n \t<li>Data Set = \u201cTeaching Evaluations \u2013 Age\u201d<\/li>\r\n<\/ul>\r\n<\/li>\r\n \t<li><strong style=\"font-size: 1rem; text-align: initial;\"><em>cls_students<\/em><\/strong><span style=\"font-size: 1rem; text-align: initial;\">: Total number of students in the course\u00a0\u00a0<\/span>\r\n<ul>\r\n \t<li><span style=\"font-size: 1rem; text-align: initial;\">Data Set = \u201cTeaching Evaluations \u2013 Students\u201d<\/span><\/li>\r\n<\/ul>\r\n<\/li>\r\n \t<li><strong style=\"font-size: 1rem; text-align: initial;\"><em>score<\/em><\/strong><span style=\"font-size: 1rem; text-align: initial;\">: Average professor evaluation score (1 to 5, where 1 is the lowest)<\/span>\r\n<ul>\r\n \t<li><span style=\"font-size: 1rem; text-align: initial;\">Data Set = \u201cTeaching Evaluations \u2013 Scores\u201d<\/span><\/li>\r\n<\/ul>\r\n<\/li>\r\n<\/ul>\r\n<div class=\"textbox\">\r\n\r\nFor each of the variables <em><strong>age<\/strong><\/em>,<strong><em> cls_students<\/em><\/strong>, and<strong><em> score<\/em><\/strong>,\r\n<ol>\r\n \t<li>Use the appropriate data tool to make a histogram of the distribution using the following binwidths:\r\n<ul>\r\n \t<li><strong><em>age<\/em><\/strong>: 5<\/li>\r\n \t<li><strong style=\"font-size: 1rem; text-align: initial;\"><em>cls_students<\/em><\/strong><span style=\"font-size: 1rem; text-align: initial;\">: 50<\/span><\/li>\r\n \t<li><strong style=\"font-size: 1rem; text-align: initial;\"><em>score<\/em><\/strong><span style=\"font-size: 1rem; text-align: initial;\">: 0.2<\/span><\/li>\r\n<\/ul>\r\n<\/li>\r\n \t<li>Describe the distribution, 
including the shape, center, spread, and presence of outliers, using words.<\/li>\r\n<\/ol>\r\n<\/div>\r\nRecord your results in Questions 13 - 19 below.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 13<\/h3>\r\n[ohm_question hide_question_numbers=1]240618[\/ohm_question]\r\n\r\n[reveal-answer q=\"236266\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"236266\"]Make sure that you have selected the correct data set, Teaching Evaluations - Age. [\/hidden-answer]\r\n\r\n&nbsp;\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 14<\/h3>\r\n[ohm_question hide_question_numbers=1]240888[\/ohm_question]\r\n\r\n[reveal-answer q=\"94012\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"94012\"]See the <a href=\"#Describing Distributions\">Describing Distributions<\/a>\u00a0section at the end of the activity for details about how to describe a distribution.[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 15<\/h3>\r\n[ohm_question hide_question_numbers=1]240619[\/ohm_question]\r\n\r\n[reveal-answer q=\"720889\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"720889\"]Make sure that you have selected the correct data set, Teaching Evaluations - Students. 
[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 16<\/h3>\r\n[ohm_question hide_question_numbers=1]240889[\/ohm_question]\r\n\r\n[reveal-answer q=\"505351\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"505351\"]See the <a href=\"#Describing Distributions\">Describing Distributions<\/a>\u00a0section at the end of the activity for details about how to describe a distribution.[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 17<\/h3>\r\n[ohm_question hide_question_numbers=1]240890[\/ohm_question]\r\n\r\n[reveal-answer q=\"9476\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"9476\"]Make sure that you have selected the correct data set, Teaching Evaluations - Scores.[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 18<\/h3>\r\n[ohm_question hide_question_numbers=1]240891[\/ohm_question]\r\n\r\n[reveal-answer q=\"30301\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"30301\"]See the <a href=\"#Describing Distributions\">Describing Distributions<\/a>\u00a0section at the end of the activity for details about how to describe a distribution.[\/hidden-answer]\r\n\r\n<\/div>\r\n<h3 id=\"ReprSpread\">Determine the appropriate representation of the spread of a distribution<\/h3>\r\n<div>We've seen that sometimes the range of a distribution can be a misleading representation of the spread. For example, recall your description of the variable\u00a0<em>cls_perc_eval\u00a0<\/em>earlier in this activity. The range of the distribution of that variable covered\u00a0[latex]90[\/latex]% of the horizontal axis, but the data were mostly bunched up within the highest 50%. 
Including this information in the description of a distribution is helpful for understanding whether the range is an appropriate representation of the spread of a distribution.<\/div>\r\n<div><\/div>\r\n<div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 19<\/h3>\r\n[ohm_question hide_question_numbers=1]240892[\/ohm_question]\r\n\r\n[reveal-answer q=\"666124\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"666124\"]Look for distributions in which the range covers a substantially wider portion of the graph than most of the data.[\/hidden-answer]\r\n\r\n<\/div>\r\n<div><\/div>\r\n<div><\/div>\r\n<\/div>\r\n<div class=\"textbox tryit\">\r\n<h3>video placement<\/h3>\r\n<span style=\"background-color: #e6daf7;\">[Wrap-up: You've had the chance to describe several differently shaped distributions in this activity using the statistical language of shape, center, spread, and the presence of outliers. Some of them were harder to describe than others, especially when it came to spread, which we described in this activity using the range. We'll see later that there are other measures of spread we can use as well. Let's recap the distributions of the other variables we looked at today. [voice over images of the distributions, one-by-one]. You may have found the variable \"age\" hard to describe. Even though it looks unusual, we would call this unimodal and roughly symmetric with the center at about 50 years. To find the shape, we just want to roughly draw a pen along the overall shape without paying too much attention to little bumps and dips along the way. The values range from about 29 to 73, with a range of about 44. There are some outliers above 70. Removing them would drop the range down. [course-size next] The distribution of course size is unimodal and right skewed. We see the center around 25 and a very wide range, almost 600, but that includes outliers between 500 and 600. 
In fact there are only a few courses with enrollment larger than 200, so the spread of most of the data is about 200. [average eval score next] Lastly, the distribution for average evaluation score is unimodal and left skewed with a center around 4.25. The range is about 2.75 (between 2.25 and 5). We could consider the few scores at the far left to be outliers. Hopefully you feel comfortable describing distributions using shape, center, spread, and the presence of outliers. And you should have a good idea now of when range can be used to appropriately describe the spread, and when you should make a note that the range could be misleading.\"]<\/span>\r\n\r\n<\/div>\r\n<span style=\"color: #077fab; font-size: 1.15em; font-weight: 600;\">Reference: Describing Distributions<\/span>\r\n\r\nThe features used to describe the distribution of a quantitative variable are the shape, center, spread, and presence of outliers.\r\n\r\n<strong>Shape<\/strong>: The overall pattern (left skewed, right skewed, symmetric) and the number of peaks (unimodal, bimodal, multimodal, uniform).\r\n\r\n<strong>Center<\/strong>: A measure that describes where the middle of the distribution is. The center is a number that describes a typical value. For example, one way to think about center is that it could be the point in the distribution where about half of the observations are below it and half are above it.\r\n\r\n<strong>Spread<\/strong>: A measure of how far apart the data are. In this lesson, the range is used to measure spread. 
The <strong>range<\/strong> is the difference between the maximum value and minimum value.\r\n\r\n<strong>Outliers<\/strong>: Unusual observations that are outside the general pattern of the distribution.\r\n\r\nThe description of shape includes two parts: (1) the overall pattern (left skewed, right skewed, symmetric) and (2) the number of peaks (unimodal, bimodal, multimodal, uniform).\r\n\r\nThe overall pattern can be described as one of the following:\r\n\r\n<strong>Symmetric<\/strong>: The left and right sides of the distribution (closely) mirror each other. If you drew a vertical line down the center of the distribution and folded the distribution in half, the left and right sides would closely match one another.\r\n\r\n<strong>Left skewed<\/strong>: The distribution has a longer tail to the left.\r\n\r\n<strong>Right skewed<\/strong>: The distribution has a longer tail to the right.\r\n\r\nIn addition to the overall pattern, the description of shape also includes the number of peaks. This is also known as the <strong>modality<\/strong>. The modality can be described as one of the following:\r\n\r\n<strong>Unimodal<\/strong>: There is one prominent peak.\r\n\r\n<strong>Bimodal<\/strong>: There are two prominent peaks.\r\n\r\n<strong>Multimodal<\/strong>: There are three or more prominent peaks.\r\n\r\n<strong>Uniform<\/strong>: There are no prominent peaks.\r\n\r\nThe next feature is the <strong>center<\/strong>. For now, we can use the histogram to get an approximate value of the center. (In a later activity, you will learn statistics used to describe the center more precisely.)\r\n\r\nWhen describing the <strong>spread<\/strong> of a distribution that is left skewed, right skewed, or has outliers, it can be misleading to only rely on the range to measure spread, since it is influenced by skewness and outliers. 
In this case, the range may make the spread appear to be larger than it is for a vast majority of the data.\r\n\r\nIf this is the case, in addition to reporting the range, you can include additional information about the spread of most of the data as well. This will give the reader a more accurate and complete picture of the true spread of the data. For example, in addition to reporting the range for the distribution of <em>cls_perc_eval<\/em>, we can also include information that most of the data are between about\u00a0[latex]50[\/latex]% and\u00a0[latex]100[\/latex]%, or within\u00a0[latex]50[\/latex]%. (In later activities, you will learn additional statistics to describe the typical spread of the data.)\r\n\r\nThe last feature in the description is the presence of <strong>outliers<\/strong>. Outliers are observations in the data that are unusual and outside the general pattern of the rest of the observations in the distribution. When working with a univariate distribution for a quantitative variable, an outlier is an observation that has an unusually high or unusually low value. 
It is good practice to make note of outliers, as these observations can sometimes influence the statistical results (e.g., the range).","rendered":"<div class=\"textbox learning-objectives\">\n<h3>objectives for this activity<\/h3>\n<p>During this activity, you will:<\/p>\n<ul>\n<li><a href=\"#SummDistr\">Summarize the description of a distribution of a quantitative variable using the shape, center, spread, and presence of outliers.<\/a><\/li>\n<li><a href=\"#ReprSpread\">Determine the appropriate representation of the spread based on the shape of the distribution and presence of outliers.<\/a><\/li>\n<\/ul>\n<p>Click on a skill above to jump to its location in this activity.<\/p>\n<\/div>\n<p>In the previous section, <a href=\"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/chapter\/what-to-know-about-applications-of-histograms-3d\/\"><em>What to Know About Applications of Histograms: 3D<\/em><\/a>, you practiced using histograms to describe a quantitative data set. You described the shape, estimated the center and spread, and identified any\u00a0outliers in the distribution. Now it&#8217;s time to use the skills you learned on a data set involving information collected from student evaluations of their classes at a Texas University.<\/p>\n<h2>What Do Students Think?<\/h2>\n<p>Before you begin this activity, take a moment to think about a scenario in which only a low percentage (e.g., fewer than\u00a0[latex]10[\/latex]%) of students in your class completes the course evaluation at the end of the semester.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-971\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/11184039\/Picture111-300x185.jpg\" alt=\"A woman working on a laptop and checking her phone. On the left, there are three check boxes displayed: a frowning face, a neutral face, and a smiling face. 
The smiling face check box is marked with a check, and the other two are marked with an X.\" width=\"456\" height=\"281\" \/><\/p>\n<div class=\"textbox key-takeaways\">\n<h3>question 1<\/h3>\n<p><iframe loading=\"lazy\" id=\"ohm240874\" class=\"resizable\" src=\"https:\/\/ohm.lumenlearning.com\/multiembedq.php?id=240874&theme=oea&iframe_resize_id=ohm240874\" width=\"100%\" height=\"150\"><\/iframe><\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q854955\">Hint<\/span><\/p>\n<div id=\"q854955\" class=\"hidden-answer\" style=\"display: none\">What do <em>you<\/em> think? Consider whether or not the results would be different if a large percentage (e.g., more than\u00a0[latex]80[\/latex]%) of students had responded.<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox tryit\">\n<h3>video placement<\/h3>\n<p><span style=\"background-color: #e6daf7;\">[Intro: &#8220;If you are a freshman in your first term in college, you may not have heard about course evaluations yet. These are surveys that students in a class fill out anonymously at the end of the course term to provide feedback about the course and instructor. If only a few students in the course complete the survey, it would be natural to question if those students&#8217; responses accurately represented the experience of the class in general or if they were just the students who had the strongest opinions (negative or positive) about the course. If so, the sample of responses would not be an accurate representation of the general experience. In this activity we&#8217;ll use a data set of course evaluations to investigate the percentage of students who do tend to complete course evaluations to learn how statistical language can be used to describe a distribution based on its graphical display. Our descriptions will include the shape, center, spread, and presence of any outliers in the distribution. 
We&#8217;ll also see that we can identify a representation of the spread of a distribution based on its shape and outliers.&#8221;]<\/span><\/p>\n<\/div>\n<p>In this activity, you&#8217;ll see common statistical language used to describe a distribution based on what is observed from a graphical display, which you&#8217;ll describe by identifying its shape, center, spread, and any outliers present. You&#8217;ll also see that the range (the difference between the maximum and minimum values) of a distribution that contains outliers or is skewed can be a misleading representation of spread.<\/p>\n<p>We will investigate the question:<\/p>\n<p style=\"text-align: center;\"><em>In general, what percentage of students completes course evaluations?<\/em><\/p>\n<p>To do so, we will use the <em>evals<\/em> data set<a class=\"footnote\" title=\"Professor evaluations and beauty. (n.d.). OpenIntro. Retrieved from https:\/\/www.openintro.org\/data\/indes.php?data=evals\" id=\"return-footnote-263-1\" href=\"#footnote-263-1\" aria-label=\"Footnote 1\"><sup class=\"footnote\">[1]<\/sup><\/a>, which contains information collected from student evaluations for a sample of\u00a0[latex]463[\/latex] courses taught by\u00a0[latex]94[\/latex] professors at The University of Texas at Austin. Each row has a different course, and the columns have information about the professor and summaries from the evaluations. 
The first\u00a0[latex]10[\/latex] observations of the selected variables within the \u201cTeaching Evaluations\u201d data set are displayed in the following table.<\/p>\n<div style=\"margin: auto;\">\n<table>\n<tbody>\n<tr>\n<td style=\"text-align: center;\" colspan=\"5\"><strong>Teaching Evaluations<\/strong><em><br \/>\n<\/em><\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\"><strong><em>cls_did_eval<\/em><\/strong><\/td>\n<td style=\"text-align: center;\"><strong><em>cls_perc_eval<\/em><\/strong><\/td>\n<td style=\"text-align: center;\"><strong><em>age<\/em><\/strong><\/td>\n<td style=\"text-align: center;\"><strong><em>cls_students<\/em><\/strong><\/td>\n<td style=\"text-align: center;\"><strong><em>score<\/em><\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\">[latex]24[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]55.81395[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]36[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]43[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]4.7[\/latex]<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\">[latex]86[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]68.8[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]36[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]125[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]4.1[\/latex]<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\">[latex]76[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]60.8[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]36[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]125[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]3.9[\/latex]<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\">[latex]77[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]62.60163[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]36[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]123[\/latex]<\/td>\n<td style=\"text-align: 
center;\">[latex]4.8[\/latex]<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\">[latex]17[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]85[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]59[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]20[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]4.6[\/latex]<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\">[latex]35[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]87.5[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]59[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]40[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]4.3[\/latex]<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\">[latex]39[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]88.63636[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]59[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]44[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]2.8[\/latex]<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\">[latex]55[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]100[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]51[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]55[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]4.1[\/latex]<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\">[latex]111[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]56.92308[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]51[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]195[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]3.4[\/latex]<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\">[latex]40[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]86.95652[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]40[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]46[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]4.5[\/latex]<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>The 
following variables are used in this analysis:<\/p>\n<ul>\n<li><strong><em>cls_did_eval<\/em><\/strong>: Number of students who completed evaluations<\/li>\n<li><strong><em>cls_perc_eval<\/em><\/strong>: Percentage of students who completed evaluations<\/li>\n<li><strong><em>age<\/em><\/strong>: Age of professor in years<\/li>\n<li><strong><em>cls_students<\/em><\/strong>: Total number of students in the course<\/li>\n<li><strong><em>score<\/em><\/strong>: Average professor evaluation score (1 to 5, where 1 is the lowest and 5 is the highest)<\/li>\n<\/ul>\n<p>If we are interested in course evaluation completion, we are naturally curious about how many students completed the evaluation.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>question 2<\/h3>\n<p><iframe loading=\"lazy\" id=\"ohm240875\" class=\"resizable\" src=\"https:\/\/ohm.lumenlearning.com\/multiembedq.php?id=240875&theme=oea&iframe_resize_id=ohm240875\" width=\"100%\" height=\"150\"><\/iframe><\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q541378\">Hint<\/span><\/p>\n<div id=\"q541378\" class=\"hidden-answer\" style=\"display: none\">Recall <span style=\"background-color: #ffff00;\">[from 2A]<\/span> that a sample is representative if it tends to have the characteristics of the population from which it was drawn. Would the number of students who completed the evaluation or the percentage of students who did better help us to understand if the sample of responses accurately reflect the general class experience? 
<\/div>\n<\/div>\n<\/div>\n<h3>Examine the data<\/h3>\n<p>Now, let&#8217;s create a graph to visualize the distribution of the variable of interest.<\/p>\n<div class=\"textbox\">Go to the Describing and Exploring Quantitative Variables tool at\u00a0\u00a0<a href=\"https:\/\/dcmathpathways.shinyapps.io\/EDA_quantitative\/\" target=\"_blank\" rel=\"noopener\">https:\/\/dcmathpathways.shinyapps.io\/EDA_quantitative\/<\/a>.\u00a0Select the\u00a0<strong>Single Group<\/strong> tab, and then the data set <strong>Teaching Evaluations \u2013 Percent Complete<\/strong> and make a <strong>Histogram<\/strong> of the distribution of <em>cls_perc_eval<\/em>, the percentage of students who completed the course evaluations. Under <strong>Select Binwidth for Histogram<\/strong>\u00a0use 6.<\/div>\n<p>Use your histogram to summarize the description of the distribution by answering Questions 3, 4, and 5.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>question 3<\/h3>\n<p><iframe loading=\"lazy\" id=\"ohm240617\" class=\"resizable\" src=\"https:\/\/ohm.lumenlearning.com\/multiembedq.php?id=240617&theme=oea&iframe_resize_id=ohm240617\" width=\"100%\" height=\"150\"><\/iframe><\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q770342\">Hint<\/span><\/p>\n<div id=\"q770342\" class=\"hidden-answer\" style=\"display: none\">Make sure you have selected the correct data set, Teaching Evaluations &#8211; Percent Completed, and that you have selected binwidth = 6.<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox tryit\">\n<h3>video placement<\/h3>\n<p><span style=\"background-color: #e6daf7;\">[insert sub-summary: Good job using the technology without a list of instructions! I&#8217;d like to point out a feature of this histogram that could be confusing. Did you wonder why there was a bin (a bar) stretching beyond 100 if the range of student completions only went to 100%? It seems strange when you think about it. 
But, recall that each bin represents an interval of values, and only the left-most value is included in that interval. For example, for the bin that spans 40% to 45%, we would write that interval as [40,45) to indicate that the values including all the percentages from 40% up through 44% are counted in that bin. In fact, we&#8217;d count 44.999% repeating in that bin, but not 45%. The next bin will pick up any value from 45% up to but not including 50%. So, you can see now that the last bin must stretch beyond 100 in order to include exactly 100%. What is the only possible completion rate that would be counted in the last bin? Would it make sense for a completion rate to be greater than 100%? As it turns out, the last bin in this case is the only one for which you&#8217;ll know the exact count of a value.]<\/span><\/p>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 4<\/h3>\n<p><iframe loading=\"lazy\" id=\"ohm240876\" class=\"resizable\" src=\"https:\/\/ohm.lumenlearning.com\/multiembedq.php?id=240876&theme=oea&iframe_resize_id=ohm240876\" width=\"100%\" height=\"150\"><\/iframe><\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q362559\">Hint<\/span><\/p>\n<div id=\"q362559\" class=\"hidden-answer\" style=\"display: none\">Hover over the bars in your graph to see the count of a bin.<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 5<\/h3>\n<p><iframe loading=\"lazy\" id=\"ohm240878\" class=\"resizable\" src=\"https:\/\/ohm.lumenlearning.com\/multiembedq.php?id=240878&theme=oea&iframe_resize_id=ohm240878\" width=\"100%\" height=\"150\"><\/iframe><\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q19751\">Hint<\/span><\/p>\n<div id=\"q19751\" class=\"hidden-answer\" style=\"display: none\">Hover over the bars in your graph to see the count of a bin. 
Recall there were\u00a0[latex]463[\/latex] classes in total.<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 6<\/h3>\n<p><iframe loading=\"lazy\" id=\"ohm240879\" class=\"resizable\" src=\"https:\/\/ohm.lumenlearning.com\/multiembedq.php?id=240879&theme=oea&iframe_resize_id=ohm240879\" width=\"100%\" height=\"150\"><\/iframe><\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q472340\">Hint<\/span><\/p>\n<div id=\"q472340\" class=\"hidden-answer\" style=\"display: none\">Hover over the bars in your graph to see the count of a bin. You can change the binwidth to make the counts easier to collect.<\/div>\n<\/div>\n<\/div>\n<p>So far, we\u2019ve been able to use the histogram to answer questions about the distribution of <em>cls_perc_eval<\/em>. The answers to these questions give us some information about the data; however, they do not give us a broad view of the overall distribution of the variable. In addition to visualizing the distribution with a graphical display, we can use common statistical language to describe the distribution.<\/p>\n<p>Before diving into the details, consider why we might want to use words to describe a distribution.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>question 7<\/h3>\n<p><iframe loading=\"lazy\" id=\"ohm240880\" class=\"resizable\" src=\"https:\/\/ohm.lumenlearning.com\/multiembedq.php?id=240880&theme=oea&iframe_resize_id=ohm240880\" width=\"100%\" height=\"150\"><\/iframe><\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q868765\">Hint<\/span><\/p>\n<div id=\"q868765\" class=\"hidden-answer\" style=\"display: none\">How could a verbal or written description be of help? 
<\/div>\n<\/div>\n<\/div>\n<h3>Describe the distribution<\/h3>\n<p>Now, in Questions 8 &#8211; 12, use statistical terms to describe the distribution of the variable\u00a0<em>cls_perc_eval<\/em><em>. <\/em>If necessary, refer to the\u00a0<a href=\"#Reference: Describing Distributions\">Describing Distributions<\/a>\u00a0section at the end of this activity for details about how to describe a distribution.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>question 8<\/h3>\n<p><iframe loading=\"lazy\" id=\"ohm240881\" class=\"resizable\" src=\"https:\/\/ohm.lumenlearning.com\/multiembedq.php?id=240881&theme=oea&iframe_resize_id=ohm240881\" width=\"100%\" height=\"150\"><\/iframe><\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q720276\">Hint<\/span><\/p>\n<div id=\"q720276\" class=\"hidden-answer\" style=\"display: none\">Refer to the <a href=\"#Describing Distributions\">Describing Distributions<\/a> section at the end of this activity for details about how to describe the shape of a distribution. Note that the\u00a0description of shape includes two parts: the overall pattern and the number of peaks. <\/div>\n<\/div>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 9<\/h3>\n<p><iframe loading=\"lazy\" id=\"ohm240882\" class=\"resizable\" src=\"https:\/\/ohm.lumenlearning.com\/multiembedq.php?id=240882&theme=oea&iframe_resize_id=ohm240882\" width=\"100%\" height=\"150\"><\/iframe><\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q82005\">Hint<\/span><\/p>\n<div id=\"q82005\" class=\"hidden-answer\" style=\"display: none\">Use the appearance of the graph and bar counts to estimate the center.<\/div>\n<\/div>\n<\/div>\n<p>Recall that the <strong>spread<\/strong> is a measure of how much the values in a data set tend to differ from one another. 
One way we can describe the spread is by finding the minimum and maximum values in the data and calculating the difference between them. This difference is called the <strong>range<\/strong>.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>question 10<\/h3>\n<p><iframe loading=\"lazy\" id=\"ohm240883\" class=\"resizable\" src=\"https:\/\/ohm.lumenlearning.com\/multiembedq.php?id=240883&theme=oea&iframe_resize_id=ohm240883\" width=\"100%\" height=\"150\"><\/iframe><\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q987802\">Hint<\/span><\/p>\n<div id=\"q987802\" class=\"hidden-answer\" style=\"display: none\">Recall that range is the difference between the minimum and maximum values in the data as displayed in the data analysis tool. Don&#8217;t forget to include the units. <\/div>\n<\/div>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 11<\/h3>\n<p><iframe loading=\"lazy\" id=\"ohm240884\" class=\"resizable\" src=\"https:\/\/ohm.lumenlearning.com\/multiembedq.php?id=240884&theme=oea&iframe_resize_id=ohm240884\" width=\"100%\" height=\"150\"><\/iframe><\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q59436\">Hint<\/span><\/p>\n<div id=\"q59436\" class=\"hidden-answer\" style=\"display: none\">Consider the size of the interval containing most of the data in comparison to the size of the range.<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 12<\/h3>\n<p><iframe loading=\"lazy\" id=\"ohm240887\" class=\"resizable\" src=\"https:\/\/ohm.lumenlearning.com\/multiembedq.php?id=240887&theme=oea&iframe_resize_id=ohm240887\" width=\"100%\" height=\"150\"><\/iframe><\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q941043\">Hint<\/span><\/p>\n<div id=\"q941043\" class=\"hidden-answer\" 
style=\"display: none\">We&#8217;ll define <em>outlier<\/em> more carefully later but we understand it for now to be an unusual observation.<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox tryit\">\n<h3>video placement<\/h3>\n<p><span style=\"background-color: #e6daf7;\">[sub-summary: &#8220;You&#8217;ve used all the features of a quantitative display to describe the distribution of the percentage of students who completed the evaluation.&#8221; [voice over the distribution with a &#8220;pointer&#8221; to follow along this part &#8211;&gt;] &#8220;You saw that the distribution was unimodal and left skewed. You can see the longer tail of smaller counts out to the left and the data sort of bunched up to the right. It looks like the center lies somewhere between about 75% and 80%. You noted that the range could be misleading because, while the range covers 90%, most of the data occur within the right-most 50% of the range. Finally, you were able to identify one outlier by the bin count of 1 in the left-most bin. There seems to be a course with a completion rate between 10% and 16%. In the next part of the activity, try describing the remaining quantitative variables in the data set on your own. You&#8217;ll need to use the tool to create the distributions, then describe them by answering the questions below.&#8221;]<\/span><\/p>\n<\/div>\n<h3 id=\"SummDistr\">Summarize the description of a distribution<\/h3>\n<p>Good work. You&#8217;ve thoroughly described the variable <em>cls_perc_eval<\/em> using statistical language: shape, center, spread (range) and outliers.\u00a0Now it&#8217;s your turn to try it on your own. 
Use the features you described in Questions 8 &#8211; 12 to describe the distribution of each of the following variables:<\/p>\n<ul>\n<li><strong><em>age<\/em><\/strong>: Age of professor in years\n<ul>\n<li>Data Set = \u201cTeaching Evaluations \u2013 Age\u201d<\/li>\n<\/ul>\n<\/li>\n<li><strong style=\"font-size: 1rem; text-align: initial;\"><em>cls_students<\/em><\/strong><span style=\"font-size: 1rem; text-align: initial;\">: Total number of students in the course\u00a0\u00a0<\/span>\n<ul>\n<li><span style=\"font-size: 1rem; text-align: initial;\">Data Set = \u201cTeaching Evaluations \u2013 Students\u201d<\/span><\/li>\n<\/ul>\n<\/li>\n<li><strong style=\"font-size: 1rem; text-align: initial;\"><em>score<\/em><\/strong><span style=\"font-size: 1rem; text-align: initial;\">: Average professor evaluation score (1 to 5, where 1 is the lowest)<\/span>\n<ul>\n<li><span style=\"font-size: 1rem; text-align: initial;\">Data Set = \u201cTeaching Evaluations \u2013 Scores\u201d<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<div class=\"textbox\">\n<p>For each of the variables <em><strong>age<\/strong><\/em>,<strong><em> cls_students<\/em><\/strong>, and<strong><em> score<\/em><\/strong>,<\/p>\n<ol>\n<li>Use the appropriate data tool to make a histogram of the distribution using the following binwidths:\n<ul>\n<li><strong><em>age<\/em><\/strong>: 5<\/li>\n<li><strong style=\"font-size: 1rem; text-align: initial;\"><em>cls_students<\/em><\/strong><span style=\"font-size: 1rem; text-align: initial;\">: 50<\/span><\/li>\n<li><strong style=\"font-size: 1rem; text-align: initial;\"><em>score<\/em><\/strong><span style=\"font-size: 1rem; text-align: initial;\">: 0.2<\/span><\/li>\n<\/ul>\n<\/li>\n<li>Describe the distribution, including the shape, center, spread, and presence of outliers, using words.<\/li>\n<\/ol>\n<\/div>\n<p>Record your results in Questions 13 &#8211; 19 below.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>question 13<\/h3>\n<p><iframe loading=\"lazy\" 
id=\"ohm240618\" class=\"resizable\" src=\"https:\/\/ohm.lumenlearning.com\/multiembedq.php?id=240618&theme=oea&iframe_resize_id=ohm240618\" width=\"100%\" height=\"150\"><\/iframe><\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q236266\">Hint<\/span><\/p>\n<div id=\"q236266\" class=\"hidden-answer\" style=\"display: none\">Make sure that you have selected the correct data set, Teaching Evaluations &#8211; Age. <\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 14<\/h3>\n<p><iframe loading=\"lazy\" id=\"ohm240888\" class=\"resizable\" src=\"https:\/\/ohm.lumenlearning.com\/multiembedq.php?id=240888&theme=oea&iframe_resize_id=ohm240888\" width=\"100%\" height=\"150\"><\/iframe><\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q94012\">Hint<\/span><\/p>\n<div id=\"q94012\" class=\"hidden-answer\" style=\"display: none\">See the <a href=\"#Describing Distributions\">Describing Distributions<\/a>\u00a0section at the end of the activity for details about how to describe a distribution.<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 15<\/h3>\n<p><iframe loading=\"lazy\" id=\"ohm240619\" class=\"resizable\" src=\"https:\/\/ohm.lumenlearning.com\/multiembedq.php?id=240619&theme=oea&iframe_resize_id=ohm240619\" width=\"100%\" height=\"150\"><\/iframe><\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q720889\">Hint<\/span><\/p>\n<div id=\"q720889\" class=\"hidden-answer\" style=\"display: none\">Make sure that you have selected the correct data set, Teaching Evaluations &#8211; Students. 
<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 16<\/h3>\n<p><iframe loading=\"lazy\" id=\"ohm240889\" class=\"resizable\" src=\"https:\/\/ohm.lumenlearning.com\/multiembedq.php?id=240889&theme=oea&iframe_resize_id=ohm240889\" width=\"100%\" height=\"150\"><\/iframe><\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q505351\">Hint<\/span><\/p>\n<div id=\"q505351\" class=\"hidden-answer\" style=\"display: none\">See the <a href=\"#Describing Distributions\">Describing Distributions<\/a>\u00a0section at the end of the activity for details about how to describe a distribution.<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 17<\/h3>\n<p><iframe loading=\"lazy\" id=\"ohm240890\" class=\"resizable\" src=\"https:\/\/ohm.lumenlearning.com\/multiembedq.php?id=240890&theme=oea&iframe_resize_id=ohm240890\" width=\"100%\" height=\"150\"><\/iframe><\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q9476\">Hint<\/span><\/p>\n<div id=\"q9476\" class=\"hidden-answer\" style=\"display: none\">Make sure that you have selected the correct data set, Teaching Evaluations &#8211; Score.<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 18<\/h3>\n<p><iframe loading=\"lazy\" id=\"ohm240891\" class=\"resizable\" src=\"https:\/\/ohm.lumenlearning.com\/multiembedq.php?id=240891&theme=oea&iframe_resize_id=ohm240891\" width=\"100%\" height=\"150\"><\/iframe><\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q30301\">Hint<\/span><\/p>\n<div id=\"q30301\" class=\"hidden-answer\" style=\"display: none\">See the <a href=\"#Describing Distributions\">Describing Distributions<\/a>\u00a0section at the end of the activity for details about how to describe a 
distribution.<\/div>\n<\/div>\n<\/div>\n<h3 id=\"ReprSpread\">Determine the appropriate representation of the spread of a distribution<\/h3>\n<div>We&#8217;ve seen that sometimes the range of a distribution can be a misleading representation of the spread. For example, recall your description of the variable\u00a0<em>cls_perc_eval\u00a0<\/em>earlier in this activity. The range of the distribution of that variable covered\u00a0[latex]90[\/latex]% of the horizontal axis, but the data were mostly bunched up within the highest 50%. Including this information in the description of a distribution is helpful for understanding whether the range is an appropriate representation of the spread of a distribution.<\/div>\n<div><\/div>\n<div>\n<div class=\"textbox key-takeaways\">\n<h3>question 19<\/h3>\n<p><iframe loading=\"lazy\" id=\"ohm240892\" class=\"resizable\" src=\"https:\/\/ohm.lumenlearning.com\/multiembedq.php?id=240892&theme=oea&iframe_resize_id=ohm240892\" width=\"100%\" height=\"150\"><\/iframe><\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q666124\">Hint<\/span><\/p>\n<div id=\"q666124\" class=\"hidden-answer\" style=\"display: none\">Look for distributions in which the range covers a substantially wider portion of the graph than most of the data.<\/div>\n<\/div>\n<\/div>\n<div><\/div>\n<div><\/div>\n<\/div>\n<div class=\"textbox tryit\">\n<h3>video placement<\/h3>\n<p><span style=\"background-color: #e6daf7;\">[Wrap-up: You&#8217;ve had the chance to describe several differently shaped distributions in this activity using the statistical language of shape, center, spread, and the presence of outliers. Some of them were harder to describe than others, especially when it came to spread, which we described in this activity using the range. We&#8217;ll see later that there are other measures of spread we can use as well. 
Let&#8217;s recap the distributions of the other variables we looked at today. [voice over images of the distributions, one-by-one]. You may have found the variable &#8220;age&#8221; hard to describe. Even though it looks unusual, we would call this unimodal and roughly symmetric with the center at about 50 years. To find the shape, we just want to roughly draw a pen along the overall shape without paying too much attention to little bumps and dips along the way. The values range from about 29 to 73, with a range of about 44. There are some outliers above 70. Removing them would drop the range down. [course-size next] The distribution of course size is unimodal and right skewed. We see the center around 25 and a very wide range, almost 600, but that includes outliers between 500 and 600. In fact there are only a few courses with enrollment larger than 200, so the spread of most of the data is about 200. [average eval score next] Lastly, the distribution for average evaluation score is unimodal and left skewed with a center around 4.25. The range is about 2.75 (between 2.25 and 5). We could consider the few scores at the far left to be outliers. Hopefully you feel comfortable describing distributions using shape, center, spread, and the presence of outliers. And you should have a good idea now of when range can be used to appropriately describe the spread, and when you should make a note that the range could be misleading.&#8221;]<\/span><\/p>\n<\/div>\n<p><span style=\"color: #077fab; font-size: 1.15em; font-weight: 600;\">Reference: Describing Distributions<\/span><\/p>\n<p>The features used to describe the distribution of a quantitative variable are the shape, center, spread, and presence of outliers.<\/p>\n<p><strong>Shape<\/strong>: The overall pattern (left skewed, right skewed, symmetric) and the number of peaks (unimodal, bimodal, multimodal, uniform).<\/p>\n<p><strong>Center<\/strong>: A measure that describes where the middle of the distribution is. 
The center is a number that describes a typical value. For example, one way to think about center is that it could be the point in the distribution where about half of the observations are below it and half are above it.<\/p>\n<p><strong>Spread<\/strong>: A measure of how far apart the data are. In this lesson, the range is used to measure spread. The <strong>range<\/strong> is the difference between the maximum value and minimum value.<\/p>\n<p><strong>Outliers<\/strong>: Unusual observations that are outside the general pattern of the distribution.<\/p>\n<p>The description of shape includes two parts: (1) the overall pattern (left skewed, right skewed, symmetric) and (2) the number of peaks (unimodal, bimodal, multimodal, uniform).<\/p>\n<p>The overall pattern can be described as one of the following:<\/p>\n<p><strong>Symmetric<\/strong>: The left and right sides of the distribution (closely) mirror each other. If you drew a vertical line down the center of the distribution and folded the distribution in half, the left and right sides would closely match one another.<\/p>\n<p><strong>Left skewed<\/strong>: The distribution has a longer tail to the left.<\/p>\n<p><strong>Right skewed<\/strong>: The distribution has a longer tail to the right.<\/p>\n<p>In addition to the overall pattern, the description of shape also includes the number of peaks. This is also known as the <strong>modality<\/strong>. The modality can be described as one of the following:<\/p>\n<p><strong>Unimodal<\/strong>: There is one prominent peak.<\/p>\n<p><strong>Bimodal<\/strong>: There are two prominent peaks.<\/p>\n<p><strong>Multimodal<\/strong>: There are three or more prominent peaks.<\/p>\n<p><strong>Uniform<\/strong>: There are no prominent peaks.<\/p>\n<p>The next feature is the <strong>center<\/strong>. For now, we can use the histogram to get an approximate value of the center. 
(In a later activity, you will learn statistics used to describe the center more precisely.)<\/p>\n<p>When describing the <strong>spread<\/strong> of a distribution that is left skewed, right skewed, or has outliers, it can be misleading to rely only on the range to measure spread, since the range is stretched by long tails and outliers. In that case, the range may make the spread appear larger than it is for the vast majority of the data.<\/p>\n<p>When this happens, report the range and also describe the spread of most of the data. This gives the reader a more accurate and complete picture of the spread of the data. For example, in addition to reporting the range for the distribution of <em>cls_perc_eval<\/em>, we can note that most of the data are between about\u00a0[latex]50[\/latex]% and\u00a0[latex]100[\/latex]%, a span of about\u00a0[latex]50[\/latex] percentage points. (In later activities, you will learn additional statistics to describe the typical spread of the data.)<\/p>\n<p>The last feature in the description is the presence of <strong>outliers<\/strong>. Outliers are observations in the data that are unusual and outside the general pattern of the rest of the observations in the distribution. When working with a univariate distribution for a quantitative variable, an outlier is an observation that has an unusually high or unusually low value. It is good practice to make note of outliers, as these observations can sometimes influence the statistical results (e.g., the range).<\/p>\n<hr class=\"before-footnotes clear\" \/><div class=\"footnotes\"><ol><li id=\"footnote-263-1\">Professor evaluations and beauty. (n.d.). OpenIntro. 
Retrieved from https:\/\/www.openintro.org\/data\/indes.php?data=evals <a href=\"#return-footnote-263-1\" class=\"return-footnote\" aria-label=\"Return to footnote 1\">&crarr;<\/a><\/li><\/ol><\/div>","protected":false},"author":175116,"menu_order":20,"template":"","meta":{"_candela_citation":"[]","CANDELA_OUTCOMES_GUID":"","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-263","chapter","type-chapter","status-publish","hentry"],"part":3,"_links":{"self":[{"href":"https:\/\/courses.lumenlearning.com\/exemplarstatistics\/wp-json\/pressbooks\/v2\/chapters\/263","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/courses.lumenlearning.com\/exemplarstatistics\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/courses.lumenlearning.com\/exemplarstatistics\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/exemplarstatistics\/wp-json\/wp\/v2\/users\/175116"}],"version-history":[{"count":9,"href":"https:\/\/courses.lumenlearning.com\/exemplarstatistics\/wp-json\/pressbooks\/v2\/chapters\/263\/revisions"}],"predecessor-version":[{"id":807,"href":"https:\/\/courses.lumenlearning.com\/exemplarstatistics\/wp-json\/pressbooks\/v2\/chapters\/263\/revisions\/807"}],"part":[{"href":"https:\/\/courses.lumenlearning.com\/exemplarstatistics\/wp-json\/pressbooks\/v2\/parts\/3"}],"metadata":[{"href":"https:\/\/courses.lumenlearning.com\/exemplarstatistics\/wp-json\/pressbooks\/v2\/chapters\/263\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/courses.lumenlearning.com\/exemplarstatistics\/wp-json\/wp\/v2\/media?parent=263"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/exemplarstatistics\/wp-json\/pressbooks\/v2\/chapter-type?post=263"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/ex
emplarstatistics\/wp-json\/wp\/v2\/contributor?post=263"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/exemplarstatistics\/wp-json\/wp\/v2\/license?post=263"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}