{"id":85,"date":"2017-05-11T17:06:40","date_gmt":"2017-05-11T17:06:40","guid":{"rendered":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/chapter\/chapter-1-descriptive-statistics-and-the-normal-distribution\/"},"modified":"2017-05-11T18:17:47","modified_gmt":"2017-05-11T18:17:47","slug":"chapter-1-descriptive-statistics-and-the-normal-distribution","status":"publish","type":"chapter","link":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/chapter\/chapter-1-descriptive-statistics-and-the-normal-distribution\/","title":{"raw":"Chapter 1: Descriptive Statistics and the Normal Distribution","rendered":"Chapter 1: Descriptive Statistics and the Normal Distribution"},"content":{"raw":"<div class=\"Basic-Text-Frame\">\r\n\r\nStatistics has become the universal language of the sciences, and data analysis can lead to powerful results. As scientists, researchers, and managers working in the natural resources sector, we all rely on statistical analysis to help us answer the questions that arise in the populations we manage. For example:\r\n<ul>\r\n \t<li class=\"List-Paragraph\">Has there been a significant change in the mean sawtimber volume in the red pine stands?<\/li>\r\n \t<li class=\"List-Paragraph\">Has there been an increase in the number of invasive species found in the Great Lakes?<\/li>\r\n \t<li class=\"List-Paragraph\">What proportion of white-tailed deer in New Hampshire have weights below the limit considered healthy?<\/li>\r\n \t<li class=\"List-Paragraph\">Did fertilizer A, B, or C have an effect on the corn yield?<\/li>\r\n<\/ul>\r\nThese are typical questions that require statistical analysis to answer. In order to answer these questions, a good random sample must be collected from the population of interest. We then use descriptive statistics to organize and summarize our sample data. 
The next step is inferential statistics, which allows us to extend the results from our sample statistics to the population while measuring the reliability of those results. But before we begin exploring different types of statistical methods, a brief review of descriptive statistics is needed.\r\n<p class=\"Callout\"><span class=\"pullquote-left\">Statistics is the science of collecting, organizing, summarizing, analyzing, and interpreting information.<\/span><\/p>\r\nGood statistics come from good samples, and are used to draw conclusions or answer questions about a population. We use sample statistics to estimate population parameters (the truth). So let\u2019s begin there\u2026\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"627\"]<img class=\"frame-4\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170447\/Image35759_fmt.png\" alt=\"Image35759.PNG\" width=\"627\" height=\"488\" \/> Figure 1. Using sample statistics to estimate population parameters.[\/caption]\r\n<h1>Section 1<\/h1>\r\n<h2>Descriptive Statistics<\/h2>\r\nA population is the group to be studied, and population data is a collection of all elements in the population. For example:\r\n<ul>\r\n \t<li class=\"List-Paragraph\">All the fish in Long Lake.<\/li>\r\n \t<li class=\"List-Paragraph\">All the lakes in the Adirondack Park.<\/li>\r\n \t<li class=\"List-Paragraph\">All the grizzly bears in Yellowstone National Park.<\/li>\r\n<\/ul>\r\nA sample is a subset of data drawn from the population of interest. For example:\r\n<ul>\r\n \t<li class=\"List-Paragraph\">100 fish randomly sampled from Long Lake.<\/li>\r\n \t<li class=\"List-Paragraph\">25 lakes randomly selected from the Adirondack Park.<\/li>\r\n \t<li class=\"List-Paragraph\">60 grizzly bears with a home range in Yellowstone National Park.<\/li>\r\n<\/ul>\r\nPopulations are characterized by descriptive measures called parameters. 
Inferences about parameters are based on sample statistics. For example, the population mean (\u00b5) is estimated by the sample mean (<span class=\"Symbols\" xml:lang=\"ar-SA\"><em>x\u0304<\/em><\/span>). The population variance (\u03c3<span class=\"Superscript SmallText\">2<\/span>) is estimated by the sample variance (s<span class=\"Superscript SmallText\">2<\/span>).\r\n\r\nVariables are the characteristics we are interested in. For example:\r\n<ul>\r\n \t<li class=\"List-Paragraph\">The length of fish in Long Lake.<\/li>\r\n \t<li class=\"List-Paragraph\">The pH of lakes in the Adirondack Park.<\/li>\r\n \t<li class=\"List-Paragraph\">The weight of grizzly bears in Yellowstone National Park.<\/li>\r\n<\/ul>\r\nVariables are divided into two major groups: qualitative and quantitative. Qualitative variables have values that are attributes or categories. Arithmetic operations cannot be applied to qualitative variables. Examples of qualitative variables are gender, race, and petal color. Quantitative variables have values that are typically numeric, such as measurements. Arithmetic operations can be applied to these data. Examples of quantitative variables are age, height, and length.\r\n\r\nQuantitative variables can be further divided into two categories: discrete and continuous. Discrete variables have a finite or countable number of possible values. Think of discrete variables as \u201chens.\u201d Hens can lay 1 egg, or 2 eggs, or 13 eggs\u2026 There is a limited, countable number of values that the variable can take on.\r\n<p class=\"Centered\"><em><span class=\"Picture\"><img class=\"frame-6 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170449\/958.png\" alt=\"958.png\" \/><\/span>\u00a0<\/em><\/p>\r\nContinuous variables can take any value within an interval, so they have an infinite number of possible values. 
Think of continuous variables as \u201ccows.\u201d Cows can give 4.6713245 gallons of milk, or 7.0918754 gallons of milk, or 13.272698 gallons of milk \u2026 Between any two possible values there are infinitely many others that a continuous variable could take on; in practice, the values we record are limited only by the precision of our measuring instruments.\r\n<p class=\"Centered\"><span class=\"Picture\"><img class=\"frame-7 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170450\/948.png\" alt=\"948.png\" \/><\/span><\/p>\r\n\r\n<div class=\"textbox examples\">\r\n<h3>Example 1<\/h3>\r\n<p class=\"Example\">Is the variable qualitative or quantitative?<\/p>\r\n\r\n<table id=\"Table\" class=\"Table\" style=\"margin-left: 23px\"><colgroup> <col \/> <col \/> <col \/> <col \/><\/colgroup>\r\n<tbody>\r\n<tr>\r\n<td>\r\n<p class=\"Table-Heading\">Species<\/p>\r\n<\/td>\r\n<td>\r\n<p class=\"Table-Heading\">Weight<\/p>\r\n<\/td>\r\n<td>\r\n<p class=\"Table-Heading\">Diameter<\/p>\r\n<\/td>\r\n<td>\r\n<p class=\"Table-Heading\">Zip Code<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>\r\n<p class=\"Table-Heading\">qualitative<\/p>\r\n<\/td>\r\n<td>\r\n<p class=\"Table-Heading\">quantitative<\/p>\r\n<\/td>\r\n<td>\r\n<p class=\"Table-Heading\">quantitative<\/p>\r\n<\/td>\r\n<td>\r\n<p class=\"Table-Heading\">qualitative<\/p>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<p class=\"Example\">Although zip codes are written as numbers, they are labels for locations rather than measurements, so arithmetic on them is meaningless and the variable is qualitative.<\/p>\r\n<\/div>\r\n<h2 class=\"Example\">Descriptive Measures<\/h2>\r\nDescriptive measures of populations are called parameters and are typically written using Greek letters. The population mean is \u03bc (mu). 
The population variance is <strong class=\"SymbolsBold\" xml:lang=\"ar-SA\">\u03c3<\/strong><span class=\"Superscript SmallText\">2<\/span> (sigma squared) and the population standard deviation is <strong class=\"SymbolsBold\" xml:lang=\"ar-SA\">\u03c3<\/strong> (sigma).\r\n\r\nDescriptive measures of samples are called statistics and are typically written using Roman letters. 
The sample mean is <span class=\"Inline-Equation\"><img class=\"frame-5\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170451\/927.png\" alt=\"927.png\" \/><\/span> (x-bar). The sample variance is <span class=\"BoldItalic Strong-2\">s<\/span><span class=\"Superscript SmallText\">2<\/span> and the sample standard deviation is <span class=\"BoldItalic Strong-2\">s<\/span>. Sample statistics are used to estimate unknown population parameters.\r\n\r\nIn this section, we will examine descriptive statistics in terms of measures of center and measures of dispersion. These descriptive statistics help us identify the center and spread of the data.\r\n<h2>Measures of Center<\/h2>\r\n<h3>Mean<\/h3>\r\nThe arithmetic mean of a variable, often called the average, is computed by adding up all the values and dividing by the total number of values.\r\n\r\nThe population mean is represented by the Greek letter \u03bc (mu). The sample mean is represented by <span class=\"Symbols\" xml:lang=\"ar-SA\"><em>x\u0304<\/em><\/span> (x-bar). The sample mean is an unbiased estimator of the population mean and is usually the best single estimate of it. However, the mean is influenced by extreme values (outliers) and may not be the best measure of center for strongly skewed data. 
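As a quick check of the definition, the short Python sketch below (Python is an assumption here; the chapter itself uses Excel and Minitab) computes the sample mean of the Example 2 data by adding the values and dividing by n, then repeats the calculation with the standard library's statistics.mean. The second data set replaces 7.9 with a hypothetical extreme value of 79.0 to show how a single outlier pulls the mean.

```python
from statistics import mean

# Sample data from Example 2
sample = [6.4, 5.2, 7.9, 3.4]

# Arithmetic mean: add up all the values and divide by the number of values (n)
x_bar = sum(sample) / len(sample)
print(round(x_bar, 3))               # 5.725
print(round(mean(sample), 3))        # standard-library version, same result

# Hypothetical outlier: 7.9 replaced by 79.0 (for example, a recording error)
with_outlier = [6.4, 5.2, 79.0, 3.4]
print(round(mean(with_outlier), 3))  # 23.5, far above three of the four values
```

One extreme value more than quadruples the mean, which is why the mean may be a poor measure of center for strongly skewed data.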
The following equations compute the population mean and sample mean.\r\n<p class=\"Centered\"><span class=\"Inline-Equation-Large\"><img class=\"frame-71 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170452\/910.png\" alt=\"910.png\" \/><\/span>\u00a0 \u00a0<span class=\"Inline-Equation-Large\"><img class=\"frame-71 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170452\/902.png\" alt=\"902.png\" \/><\/span><\/p>\r\nwhere <em>x<\/em><span class=\"Subscript SmallText\">i<\/span> is an element in the data set, <em>N<\/em> is the number of elements in the population, and <em>n<\/em> is the number of elements in the sample data set.\r\n<div class=\"textbox examples\">\r\n<h3>Example 2<\/h3>\r\n<p class=\"Example\">Find the mean for the following sample data set: 6.4, 5.2, 7.9, 3.4<\/p>\r\n<p class=\"No-Caption\"><span class=\"Picture\"><img class=\"frame-10 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170453\/893.png\" alt=\"893.png\" \/><\/span><\/p>\r\n\r\n<\/div>\r\n<h3>Median<\/h3>\r\nThe median of a variable is the middle value of the data set when the data are sorted in order from least to greatest. It splits the data into two equal halves with 50% of the data below the median and 50% above the median. 
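The procedure for finding the median described in this section (sort the data, then take the middle value when n is odd or the average of the two middle values when n is even) can be sketched in Python as follows, using the data from Examples 3 and 4. The helper name median_by_hand is hypothetical; the standard library's statistics.median is printed alongside for comparison. The last line replaces the largest value with a hypothetical extreme value of 510 to illustrate the median's resistance to outliers.

```python
from statistics import median

odd_data = [23, 27, 29, 31, 35, 39, 40, 42, 44, 47, 51]   # Example 3, n = 11
even_data = [23, 27, 29, 31, 35, 39, 40, 42, 44, 47]      # Example 4, n = 10

def median_by_hand(values):
    s = sorted(values)                 # sort from smallest to largest
    n = len(s)
    mid = n // 2
    if n % 2 == 1:                     # odd n: the single middle value
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2   # even n: average of the two middle values

print(median_by_hand(odd_data), median(odd_data))     # 39 39
print(median_by_hand(even_data), median(even_data))   # 37.0 37.0

# Hypothetical outlier: the largest value 51 replaced by 510; the median does not move
print(median(odd_data[:-1] + [510]))                  # 39
```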
The median is resistant to the influence of outliers and may be a better measure of center for strongly skewed data.\r\n<p class=\"Centered\"><img class=\"frame-11 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170455\/Image35835_fmt.png\" alt=\"Image35835.PNG\" \/><\/p>\r\nThe calculation of the median depends on whether the number of observations in the data set is odd or even.\r\n\r\nTo calculate the median with an odd number of values (<em>n<\/em> is odd), first sort the data from smallest to largest; the median is the middle value, in position (<em>n<\/em> + 1)\/2.\r\n<div class=\"textbox examples\">\r\n<h3>Example 3<\/h3>\r\n<p class=\"ExampleHeading\" style=\"text-align: center\">23, 27, 29, 31, 35, 39, 40, 42, 44, 47, 51<\/p>\r\n<p class=\"Example\">The median is 39, the 6th of the 11 sorted values. It is the middle value that separates the lower 50% of the data from the upper 50% of the data.<\/p>\r\n\r\n<\/div>\r\nTo calculate the median with an even number of values (<em>n<\/em> is even), first sort the data from smallest to largest and then take the average of the two middle values.\r\n<div class=\"textbox examples\">\r\n<h3>Example 4<\/h3>\r\n<p class=\"ExampleHeading\" style=\"text-align: center\">23, 27, 29, 31, 35, 39, 40, 42, 44, 47<\/p>\r\n<p class=\"Caption\"><span class=\"Picture\"><img class=\"frame-12 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170456\/877.png\" alt=\"877.png\" \/><\/span><\/p>\r\n\r\n<\/div>\r\n<h3>Mode<\/h3>\r\nThe mode is the most frequently occurring value and is commonly used with qualitative data because the values are categories. Categorical data cannot be added, subtracted, multiplied, or divided, so the mean cannot be computed, and unless the categories have a natural order, neither can the median. 
The mode is less commonly used as a measure of center for quantitative data; when each value occurs only once, the mode is not meaningful.\r\n\r\nUnderstanding the relationship between the mean and median is important. 
It gives us insight into the distribution of the variable. For example, if the distribution is skewed right (positively skewed), the few large observations in the right tail pull the mean upward. The median is less affected by these extreme large values, so in this situation the mean will be larger than the median. In a symmetric distribution, the mean, median, and mode (if the distribution has a single peak) will all be similar in value. If the distribution is skewed left (negatively skewed), the few small observations in the left tail pull the mean downward. Again, the median is less affected by these extreme small values, so in this situation the mean will be smaller than the median.\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"840\"]<img class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170458\/Image35846_fmt.png\" alt=\"Image35846.PNG\" width=\"840\" height=\"229\" \/> Figure 2. Illustration of skewed and symmetric distributions.[\/caption]\r\n<h2>Measures of Dispersion<\/h2>\r\nMeasures of center describe the average or middle values of a data set. Measures of dispersion describe the spread, or variation, of the data. Variation refers to how much the values differ from one another. Values in a data set that are relatively close to each other have lower measures of variation; values that are spread farther apart have higher measures of variation.\r\n\r\nExamine the two histograms below. Both groups have the same mean weight (267 lb.), 
but the weights of Group A are more variable.\r\n\r\n[caption id=\"\" align=\"alignnone\" width=\"908\"]<img class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170501\/860.png\" alt=\"860.png\" width=\"908\" height=\"295\" \/> Figure 3. Histograms of Group A and Group B.[\/caption]\r\n\r\nThis section will examine five measures of dispersion: range, variance, standard deviation, standard error, and coefficient of variation.\r\n<h3>Range<\/h3>\r\nThe range of a variable is the largest value minus the smallest value. It is the simplest measure of dispersion, but because it uses only these two values, it is strongly affected by outliers.\r\n<div class=\"textbox examples\">\r\n<h3>Example 5<\/h3>\r\n<p class=\"Example\">Find the range for the given data set.<\/p>\r\n<p class=\"Example\" style=\"text-align: center\">12, 29, 32, 34, 38, 49, 57<\/p>\r\n<p class=\"Example\">Range = 57 \u2013 12 = 45<\/p>\r\n\r\n<\/div>\r\n<h3>Variance<\/h3>\r\nThe variance is based on the difference (deviation) between each value and the arithmetic mean. The deviations are squared so that positive and negative deviations do not cancel out; unsquared deviations from the mean always sum to zero. The sample variance (s<span class=\"Superscript SmallText\">2<\/span>) is an unbiased estimator of the population variance (<strong class=\"SymbolsBold\" xml:lang=\"ar-SA\">\u03c3<\/strong><span class=\"Superscript SmallText\">2<\/span>), with n - 1 degrees of freedom.\r\n<p class=\"Callout\"><span class=\"pullquote-left\">Degrees of freedom: In general, the degrees of freedom for an estimate is equal to the number of values minus the number of parameters estimated en route to the estimate in question.<\/span><\/p>\r\nThe sample variance is unbiased because of its denominator. Computing it requires first estimating the population mean with <span class=\"Symbols\" xml:lang=\"ar-SA\"><em>x\u0304<\/em><\/span>, which uses up one degree of freedom. 
If we used \u201cn\u201d in the denominator instead of \u201cn - 1\u201d, we would consistently underestimate the true population variance. 
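The underestimate can be demonstrated exactly on a tiny example. The Python sketch below treats the values 3, 5, 7 (borrowed from Example 6) as a complete hypothetical population, lists every possible sample of size n = 2 drawn with replacement, and averages the two candidate variance formulas over all of those samples. Sampling with replacement is an assumption made so that the observations are independent; exact fractions from the fractions module avoid rounding noise.

```python
from fractions import Fraction
from itertools import product

population = [3, 5, 7]          # hypothetical tiny population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance, divisor N

def sum_sq_dev(sample):
    # Sum of squared deviations from the sample mean
    x_bar = Fraction(sum(sample), len(sample))
    return sum((x - x_bar) ** 2 for x in sample)

n = 2
samples = list(product(population, repeat=n))   # all 9 ordered samples, with replacement

avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(sigma2, avg_div_n, avg_div_n_minus_1)   # 8/3 4/3 8/3
```

Averaged over all possible samples, dividing by n recovers only (n - 1)/n of the population variance (here 4/3 instead of 8/3), while dividing by n - 1 recovers it exactly, which is what unbiased means.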
To correct this bias, the denominator is modified to \u201cn - 1\u201d.\r\n\r\nPopulation variance \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0Sample variance\r\n<p class=\"Centered\"><span class=\"Symbols\" xml:lang=\"ar-SA\">\u03c3<\/span><span class=\"Superscript SmallText\">2<\/span> = <span class=\"Inline-Equation-Large\"><img class=\"frame-58\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170502\/852.png\" alt=\"852.png\" \/><\/span> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0s<span class=\"Superscript SmallText\">2<\/span> = <span class=\"Inline-Equation-Large\"><img class=\"frame-57\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170503\/842.png\" alt=\"842.png\" \/><\/span><\/p>\r\n\r\n<div class=\"textbox examples\">\r\n<h3>Example 6<\/h3>\r\n<p class=\"Example\">Compute the variance of the sample data: 3, 5, 7. The sample mean is 5.<\/p>\r\n<p class=\"ExampleCenter\"><span class=\"Picture\"><img class=\"frame-32 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170504\/832.png\" alt=\"832.png\" \/>\u00a0<\/span><\/p>\r\n\r\n<\/div>\r\n<h3>Standard Deviation<\/h3>\r\nThe standard deviation is the square root of the variance (for both the population and the sample). Although the sample variance is an unbiased estimator of the population variance, its units are squared (for example, lb<span class=\"Superscript SmallText\">2<\/span> for weights in pounds); taking the square root returns the measure to the original units of the data. The standard deviation is one of the most common ways to describe the spread of a variable numerically. 
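Python's standard statistics module (used here as an assumption; any statistics package offers the same measures) provides both versions of each measure: variance and stdev use the n - 1 divisor for sample data, while pvariance and pstdev use N for a complete population. The sketch below applies them to the sample 3, 5, 7 from Examples 6 and 7.

```python
import math
import statistics

data = [3, 5, 7]   # sample data from Examples 6 and 7

s2 = statistics.variance(data)   # sample variance, divisor n - 1
s = statistics.stdev(data)       # sample standard deviation, the square root of s2

# Population versions (divisor N): appropriate only if data were the whole population
sigma2 = statistics.pvariance(data)
sigma = statistics.pstdev(data)

print(s2, s)                              # sample: variance 4, standard deviation 2
print(round(sigma2, 4), round(sigma, 4))  # the N divisor gives smaller values
print(math.isclose(s, math.sqrt(s2)))     # True: s is the square root of s2
```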
The population standard deviation is <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03c3<\/span> (sigma) and the sample standard deviation is <em>s<\/em>.\r\n\r\nPopulation standard deviation \u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0Sample standard deviation\r\n<p class=\"Centered\"><img class=\"frame-12\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170506\/823.png\" alt=\"823.png\" \/>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<img class=\"frame-12\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170507\/816.png\" alt=\"816.png\" \/><\/p>\r\n\r\n<div class=\"textbox examples\">\r\n<h3>Example 7<\/h3>\r\n<p class=\"Example\">Compute the standard deviation of the sample data: 3, 5, 7 with a sample mean of 5.<\/p>\r\n<p class=\"ExampleCenter\"><span class=\"Picture\"><img class=\"frame-19 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170508\/809.png\" alt=\"809.png\" \/><\/span><\/p>\r\n\r\n<\/div>\r\n<h3>Standard Error of the Mean<\/h3>\r\nCommonly, we use the sample mean <span class=\"Symbols\" xml:lang=\"ar-SA\"><em>x\u0304<\/em><\/span> to estimate the population mean <span class=\"Symbols\" xml:lang=\"ar-SA\"><em>\u03bc<\/em><\/span>. For example, if we want to estimate the mean height of eighty-year-old cherry trees, we can proceed as follows:\r\n<ul>\r\n \t<li class=\"List-Paragraph\">Randomly select 100 trees<\/li>\r\n \t<li class=\"List-Paragraph\">Compute the sample mean of the 100 heights<\/li>\r\n \t<li class=\"List-Paragraph\">Use that as our estimate<\/li>\r\n<\/ul>\r\nWe want to use this sample mean to estimate the true but unknown population mean. 
But our sample of 100 trees is just one of many possible samples (of the same size) that could have been randomly selected. Imagine that we take a series of different random samples, all of the same size, from the same population:\r\n<ul>\r\n \t<li class=\"List-Paragraph\">Sample 1\u2014we compute sample mean <span class=\"Symbols\" xml:lang=\"ar-SA\"><em>x\u0304<\/em><\/span><\/li>\r\n \t<li class=\"List-Paragraph\">Sample 2\u2014we compute sample mean <span class=\"Symbols\" xml:lang=\"ar-SA\"><em>x\u0304<\/em><\/span><\/li>\r\n \t<li class=\"List-Paragraph\">Sample 3\u2014we compute sample mean <span class=\"Symbols\" xml:lang=\"ar-SA\"><em>x\u0304<\/em><\/span><\/li>\r\n \t<li class=\"List-Paragraph\">Etc.<\/li>\r\n<\/ul>\r\nEach time we sample, we may get a different result because we are using a different subset of data to compute the sample mean. This shows us that the sample mean is a random variable!\r\n\r\nThe sample mean (<span class=\"Symbols\" xml:lang=\"ar-SA\"><em>x\u0304<\/em><\/span>) is a random variable with its own probability distribution, called the sampling distribution of the sample mean. 
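The idea of repeated sampling can be simulated directly. The Python sketch below assumes a normal population with the values used in Example 8 (mean 8 lb, standard deviation 2.6 lb), draws 20,000 samples of size n = 6, and records each sample mean; the seed and the number of repetitions are arbitrary choices. The average of the simulated sample means should land close to the population mean, and their standard deviation close to sigma divided by the square root of n.

```python
import math
import random
import statistics

random.seed(1)          # fixed seed so the run is reproducible
mu, sigma = 8.0, 2.6    # assumed normal population (Example 8 values, in lb)
n = 6                   # sample size
reps = 20000            # number of repeated samples

# Draw many samples of size n and keep each sample mean
sample_means = [
    statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(reps)
]

print(round(statistics.fmean(sample_means), 3))   # close to mu
print(round(statistics.stdev(sample_means), 3))   # close to sigma / sqrt(n)
print(round(sigma / math.sqrt(n), 3))             # 1.061
```

Because the results come from random draws, the first two printed values vary slightly with the seed; the third is the exact theoretical value.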
For random samples of size <em>n<\/em>, the distribution of the sample mean has a mean equal to \u00b5 and a standard deviation equal to <span class=\"Inline-Equation\"><img class=\"frame-42\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170509\/761.png\" alt=\"761.png\" width=\"27\" height=\"25\" \/><\/span>.\r\n<p class=\"Callout\"><span class=\"pullquote-left\">The standard error <span class=\"Inline-Equation\"><img class=\"frame-42\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170510\/750.png\" alt=\"750.png\" width=\"28\" height=\"25\" \/><\/span> is the standard deviation of all possible sample means.<\/span><\/p>\r\nIn reality, we would only take one sample, but we need to understand and quantify the sample-to-sample variability that occurs in the sampling process.\r\n\r\nThe standard error is the standard deviation of the sample means. In practice \u03c3 is usually unknown, so the standard error is estimated from the sample, and this estimate can be expressed in different ways.\r\n<p class=\"Centered\"><img class=\"frame-45 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170510\/Image35864_fmt.png\" alt=\"Image35864.PNG\" \/><\/p>\r\nNote: <em>s<\/em><span class=\"Superscript SmallText\">2<\/span> is the sample variance and <em>s<\/em> is the sample standard deviation.\r\n<div class=\"textbox examples\">\r\n<h3>Example 8<\/h3>\r\n<p class=\"Example\">Describe the distribution of the sample mean.<\/p>\r\n<p class=\"Example\">A population of fish has weights that are normally distributed with \u00b5 = 8 lb. and \u03c3 = 2.6 lb. 
If you take a sample of size <em>n<\/em> = 6, the sample mean will have a normal distribution with a mean of 8 lb. and a standard deviation (standard error) of <span class=\"Inline-Equation\"><img class=\"frame-18\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170511\/728.png\" alt=\"728.png\" \/><\/span> = 1.061 lb.<\/p>\r\n<p class=\"Example\">If you increase the sample size to 10, the sample mean will be normally distributed with a mean of 8 lb. and a standard deviation (standard error) of <span class=\"Inline-Equation\"><img class=\"frame-18\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170512\/721.png\" alt=\"721.png\" \/><\/span> = 0.822 lb.<\/p>\r\n<p class=\"Example\">Notice how the standard error decreases as the sample size increases: it is proportional to 1\/\u221a<em>n<\/em>.<\/p>\r\n\r\n<\/div>\r\nThe Central Limit Theorem (CLT) states that, for a population with finite variance, the sampling distribution of the sample mean approaches a normal distribution as the sample size increases. If the population is not normal, or if we know nothing about the distribution of our random variable, the CLT tells us that the distribution of the <span class=\"Symbols\" xml:lang=\"ar-SA\"><em>x\u0304<\/em><\/span>\u2019s will become approximately normal as <em>n<\/em> increases. How large does <em>n<\/em> have to be? A common rule of thumb is <em>n<\/em> \u2265 30, although strongly skewed populations may require larger samples.\r\n<p class=\"Callout\"><span class=\"pullquote-left\">The Central Limit Theorem tells us that regardless of the shape of our population, the sampling distribution of the sample mean will approach a normal distribution as the sample size increases.<\/span><\/p>\r\n\r\n<h3>Coefficient of Variation<\/h3>\r\nComparing standard deviations between different populations or samples is difficult because the standard deviation depends on the units of measure and on the magnitude of the mean. 
The coefficient of variation expresses the standard deviation as a percentage of the sample or population mean. 
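A minimal Python sketch of the sample CV formula (the sample standard deviation divided by the sample mean, times 100), applied to the Pacific salmon summary values from Example 9. The function name coefficient_of_variation is hypothetical; the last line rescales the length values from centimeters to inches to show that the units cancel.

```python
def coefficient_of_variation(sd, mean):
    # CV as a percentage: standard deviation relative to the mean
    return sd / mean * 100

# Summary values for Pacific salmon from Example 9
cv_length = coefficient_of_variation(19.97, 63)    # both values in cm
cv_weight = coefficient_of_variation(19.39, 37.6)  # both values in kg

print(f'Length CV: {cv_length:.1f}%')   # Length CV: 31.7%
print(f'Weight CV: {cv_weight:.1f}%')   # Weight CV: 51.6%

# Converting both length values to inches leaves the CV unchanged
print(round(coefficient_of_variation(19.97 / 2.54, 63 / 2.54), 1))   # 31.7
```

Both results are percentages, and rescaling the units leaves them unchanged, which is why the coefficient of variation can be compared across variables measured in different units.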
Because the units cancel, the coefficient of variation is a unitless measure.\r\n\r\nPopulation data\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0Sample data\r\n<p class=\"Centered\">CV = <img class=\"frame-23\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170512\/703.png\" alt=\"703.png\" \/>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0CV = <img class=\"frame-23\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170513\/694.png\" alt=\"694.png\" \/><\/p>\r\n\r\n<div class=\"textbox examples\">\r\n<h3>Example 9<\/h3>\r\n<p class=\"Example\">Fisheries biologists were studying the length and weight of Pacific salmon. They took a random sample and computed the mean and standard deviation for length and weight (given below). Although the two standard deviations are numerically similar, length and weight are measured in different units, which makes it difficult to compare their variability directly. Computing the coefficient of variation for each variable allows the biologists to determine which variable has greater variability relative to its mean.<\/p>\r\n\r\n<table id=\"table-2\" class=\"Table\" style=\"margin-left: 23px\"><colgroup> <col \/> <col \/> <col \/><\/colgroup>\r\n<tbody>\r\n<tr>\r\n<td><\/td>\r\n<td>Sample mean<\/td>\r\n<td>Sample standard deviation<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Length<\/td>\r\n<td>63 cm<\/td>\r\n<td>19.97 cm<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Weight<\/td>\r\n<td>37.6 kg<\/td>\r\n<td>19.39 kg<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><\/td>\r\n<td><span class=\"Superscript SmallText\"><img class=\"frame-24\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170515\/685.png\" alt=\"685.png\" width=\"207\" height=\"49\" \/>\u00a0<\/span><\/td>\r\n<td><span class=\"Superscript SmallText\"><img class=\"frame-25\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170516\/678.png\" 
alt=\"678.png\" width=\"211\" height=\"49\" \/>\u00a0<\/span><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<p class=\"Example\">Weight has a coefficient of variation of about 51.6%, compared with about 31.7% for length, so there is greater relative variability in Pacific salmon weight than in length.<\/p>\r\n\r\n<\/div>\r\n<h3>Variability<\/h3>\r\nVariability is described in many different ways. Standard deviation measures point-to-point variability <span class=\"Red Strong-2\">within a sample<\/span>, i.e., variation among individual sampling units. Coefficient of variation also measures point-to-point variability, but on a relative basis (relative to the mean), and is not influenced by measurement units. Standard error measures the <span class=\"Red Strong-2\">sample-to-sample variability<\/span>, i.e., variation among repeated samples in the sampling process. Typically we have only one sample, and the standard error allows us to quantify the uncertainty in our sampling process.\r\n<h3>Basic Statistics Example using Excel and Minitab Software<\/h3>\r\nConsider the following tally from 11 sample plots on Heiburg Forest, where X<span class=\"Subscript SmallText\">i<\/span> is the number of downed logs per acre. Compute basic statistics for the sample plots.\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"832\"]<img class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170519\/661.png\" alt=\"661.png\" width=\"832\" height=\"851\" \/> Table 1. 
Sample data on number of downed logs per acre from Heiburg Forest.[\/caption]\r\n\r\n(1) Sample mean:\u00a0<span class=\"Inline-Equation\"><img class=\"frame-26\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170520\/654.png\" alt=\"654.png\" width=\"192\" height=\"80\" \/><\/span>\r\n\r\n(2) Median = 35\r\n\r\n(3) Variance:\r\n\r\n<span class=\"Picture\"><img class=\"frame-27 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170522\/644.png\" alt=\"644.png\" width=\"436\" height=\"200\" \/><\/span>\r\n\r\n(4) Standard deviation: \u00a0<span class=\"Inline-Equation\"><img class=\"frame-28\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170524\/634.png\" alt=\"634.png\" width=\"258\" height=\"35\" \/><\/span>\r\n\r\n(5) Range: 55 \u2013 5 = 50\r\n\r\n(6) Coefficient of variation:\r\n<p class=\"Equation\"><span class=\"Inline-Equation\"><img class=\"frame-29 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170526\/625.png\" alt=\"625.png\" width=\"325\" height=\"53\" \/><\/span><\/p>\r\n(7) Standard error of the mean:\r\n<p class=\"Side-by-Side-Equations\"><span class=\"Picture\"><img class=\"frame-10 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170528\/618.png\" alt=\"618.png\" width=\"262\" height=\"119\" \/><\/span><\/p>\r\n\r\n<h2>Software Solutions<\/h2>\r\n<h3>Minitab<\/h3>\r\nOpen Minitab and enter data in the spreadsheet. 
Select STAT&gt;Descriptive stats and check all statistics required.\r\n<p class=\"No-Caption\"><span class=\"Equation-Left\"><img class=\"frame-29 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170530\/008_1_fmt.png\" alt=\"008_1.tif\" \/><\/span><span class=\"Equation-Right\"><img class=\"frame-1 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170534\/008_2_fmt.png\" alt=\"008_2.tif\" \/><\/span><\/p>\r\n\r\n<h4>Descriptive Statistics: Data<\/h4>\r\n<table class=\"Table\" style=\"font-size: 0.5em\"><colgroup> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/><\/colgroup>\r\n<tbody>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Variable<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">N<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">N*<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Mean<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">SE Mean<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">StDev<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Variance<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">CoefVar<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Minimum<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Q1<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Data<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">11<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">32.27<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">4.83<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">16.03<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">256.82<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p 
class=\"Table\">49.66<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">5.00<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">20.00<\/p>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<table id=\"table-4\" class=\"Table\"><colgroup> <col \/> <col \/> <col \/> <col \/> <col \/><\/colgroup>\r\n<tbody>\r\n<tr style=\"height: 43.2188px\">\r\n<td class=\"Table\" style=\"height: 43.2188px\">\r\n<p class=\"Table\">Variable<\/p>\r\n<\/td>\r\n<td class=\"Table\" style=\"height: 43.2188px\">\r\n<p class=\"Table\">Median<\/p>\r\n<\/td>\r\n<td class=\"Table\" style=\"height: 43.2188px\">\r\n<p class=\"Table\">Q3<\/p>\r\n<\/td>\r\n<td class=\"Table\" style=\"height: 43.2188px\">\r\n<p class=\"Table\">Maximum<\/p>\r\n<\/td>\r\n<td class=\"Table\" style=\"height: 43.2188px\">\r\n<p class=\"Table\">IQR<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr style=\"height: 43px\">\r\n<td class=\"Table\" style=\"height: 43px\">\r\n<p class=\"Table\">Data<\/p>\r\n<\/td>\r\n<td class=\"Table\" style=\"height: 43px\">\r\n<p class=\"Table\">35.00<\/p>\r\n<\/td>\r\n<td class=\"Table\" style=\"height: 43px\">\r\n<p class=\"Table\">45.00<\/p>\r\n<\/td>\r\n<td class=\"Table\" style=\"height: 43px\">\r\n<p class=\"Table\">55.00<\/p>\r\n<\/td>\r\n<td class=\"Table\" style=\"height: 43px\">\r\n<p class=\"Table\">25.00<\/p>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<h3>Excel<\/h3>\r\nOpen up Excel and enter the data in the first column of the spreadsheet. Select DATA&gt;Data Analysis&gt;Descriptive Statistics. For the Input Range, select data in column A. Check \u201cLabels in First Row\u201d and \u201cSummary Statistics\u201d. 
Also check \u201cOutput Range\u201d and select location for output.\r\n<p class=\"No-Caption\"><span class=\"Picture\"><img class=\"frame-50 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170537\/009_2_fmt.png\" alt=\"009_2.tif\" \/><\/span><\/p>\r\n<p class=\"No-Caption\"><span class=\"Picture\"><img class=\"frame-50 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170540\/009_1_fmt.png\" alt=\"009_1.tif\" \/><\/span><\/p>\r\n\r\n<table id=\"table-5\" class=\"Table\"><colgroup> <col \/> <col \/><\/colgroup>\r\n<tbody>\r\n<tr>\r\n<td class=\"Table-Heading\" colspan=\"2\">\r\n<p class=\"Table-Heading\" style=\"text-align: center\">Data<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Mean<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">32.27273<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Standard Error<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">4.831884<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Median<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">35<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Mode<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">25<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Standard Deviation<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">16.02555<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Sample Variance<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">256.8182<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Kurtosis<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">-0.73643<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p 
class=\"Table\">Skewness<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">-0.05982<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Range<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">50<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Minimum<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">5<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Maximum<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">55<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Sum<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">355<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Count<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">11<\/p>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<h2>Graphical Representation<\/h2>\r\nData can be organized and summarized graphically as well as numerically. Tables and graphs give a quick overview of the information collected and support the presentation of the data used in the project. While a multitude of graphics are available, this chapter focuses on a few commonly used tools.\r\n<h3>Pie Charts<\/h3>\r\nPie charts are a good visual tool that lets the reader quickly see how each category relates to the whole. It is important to clearly label each category, and adding the frequency or relative frequency is often helpful. However, too many categories can be confusing, so avoid putting too much information in a single pie chart. The first pie chart gives a clear picture of how each fish type contributes to the whole sample. The second pie chart, with too many categories, is more difficult to interpret. 
It is important to select the best graphic when presenting the information to the reader.\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"1003\"]<img class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170544\/542.png\" alt=\"542.png\" width=\"1003\" height=\"371\" \/> Figure 4. Comparison of pie charts.[\/caption]\r\n<h3>Bar Charts and Histograms<\/h3>\r\nBar charts graphically describe the distribution of a qualitative variable (fish type), while histograms describe the distribution of a quantitative variable, either discrete or continuous (bear weight).\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"1052\"]<img class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170547\/534.png\" alt=\"534.png\" width=\"1052\" height=\"389\" \/> Figure 5. Comparison of a bar chart for qualitative data and a histogram for quantitative data.[\/caption]\r\n\r\nIn both cases, the bars have equal widths and the y-axis is clearly defined. With qualitative data, each category is represented by its own bar. With continuous data, lower and upper class limits must be defined with equal class widths. There should be no gaps between classes, and each observation should fall into one, and only one, class.\r\n<h3>Boxplots<\/h3>\r\nBoxplots use the five-number summary (the minimum, the three quartiles, and the maximum) to illustrate the center, spread, and shape of the distribution of your data. When paired with histograms, they give an excellent description, both numerically and graphically, of the data.\r\n\r\nWith roughly symmetric data, such as data drawn from a normal distribution, the histogram is approximately bell-shaped. In the boxplot, we see that Q1 and Q3 are approximately equidistant from the median, as are the minimum and maximum values. 
Also, both whiskers (the lines extending from the box) are approximately equal in length.\r\n<p class=\"Caption\"><span class=\"Equation-Right\"><img class=\"frame-29 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170550\/012_2_fmt.png\" alt=\"012_2.tif\" \/><\/span><\/p>\r\n\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"402\"]<img class=\"frame-29\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170552\/012_1_fmt.png\" alt=\"012_1.tif\" width=\"402\" height=\"344\" \/> Figure 6. A histogram and boxplot of a normal distribution.[\/caption]\r\n\r\nWith skewed left distributions, the histogram looks \u201cpulled\u201d to the left by a long left tail. In the boxplot, Q1 is farther from the median than Q3 is, the minimum is farther from the median than the maximum is, and the left whisker is longer than the right whisker.\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"423\"]<img class=\"frame-19\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170554\/013_2_fmt.png\" alt=\"013_2.tif\" width=\"423\" height=\"350\" \/> Figure 7. A histogram and boxplot of a skewed left distribution.[\/caption]\r\n<p class=\"Caption\"><span class=\"Equation-Left\"><img class=\"frame-29 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170556\/013_1_fmt.png\" alt=\"013_1.tif\" \/><\/span><\/p>\r\nWith skewed right distributions, the histogram looks \u201cpulled\u201d to the right by a long right tail. 
In the boxplot, Q3 is farther from the median than Q1 is, the maximum is farther from the median than the minimum is, and the right whisker is longer than the left whisker.\r\n<p class=\"Caption\"><span class=\"Equation-Right\"><img class=\"frame-29 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170558\/014_2_fmt.png\" alt=\"014_2.tif\" \/><\/span><\/p>\r\n\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"413\"]<img class=\"frame-52\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170600\/014_1_fmt.png\" alt=\"014_1.tif\" width=\"413\" height=\"366\" \/> Figure 8. A histogram and boxplot of a skewed right distribution.[\/caption]\r\n<h1>Section 2<\/h1>\r\n<h2>Probability Distribution<\/h2>\r\nOnce we have organized and summarized our sample data, the next step is to identify the underlying distribution of our random variable. Computing probabilities for continuous random variables is complicated by the fact that such a variable can take on infinitely many possible values, so the probability of observing any one exact value is zero. Therefore, to find the probabilities associated with a continuous random variable, we use a probability density function (PDF).\r\n\r\nA PDF is a function used to find probabilities for continuous random variables. 
The PDF must satisfy the following two rules:\r\n<ol>\r\n \t<li>The area under the curve must equal one (over all possible values of the random variable).<\/li>\r\n \t<li class=\"p1\">The value of the PDF must be greater than or equal to zero for all possible values of the random variable.<\/li>\r\n<\/ol>\r\n<p class=\"Callout\"><span class=\"pullquote-left\">The area under the curve of the probability density function over some interval represents the probability of observing those values of the random variable in that interval.<\/span><\/p>\r\n\r\n<h2>The Normal Distribution<\/h2>\r\nMany continuous random variables have a bell-shaped, symmetric distribution that can be modeled by a normal distribution. In other words, the relative frequency histogram of such a variable is well approximated by a normal curve. The curve is bell-shaped, symmetric about the mean, and completely defined by \u00b5 and \u03c3 (the mean and standard deviation).\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"550\"]<img class=\"frame-27\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170603\/Kiernan_media015_fmt.png\" alt=\"Kiernan_media015.png\" width=\"550\" height=\"350\" \/> Figure 9. A normal distribution.[\/caption]\r\n\r\nThere is a different normal curve for every combination of \u00b5 and \u03c3. The mean (\u00b5) shifts the curve to the left or right. The standard deviation (\u03c3) alters the spread of the curve. The first pair of curves has different means but the same standard deviation. The second pair of curves shares the same mean (\u00b5) but has different standard deviations. The pink curve has a smaller standard deviation. It is narrower and taller, and the probability is concentrated over a smaller range of values. The blue curve has a larger standard deviation. The curve is flatter and the tails are thicker. 
The probability is spread over a larger range of values.\r\n<p class=\"Caption\"><span class=\"Picture\"><img class=\"frame-53 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170605\/Image36036_fmt.png\" alt=\"07_fig05a\" \/><\/span><\/p>\r\n\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"597\"]<img class=\"frame-53\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170607\/Image36045_fmt.png\" alt=\"07_fig05b\" width=\"597\" height=\"371\" \/> Figure 10. A comparison of normal curves.[\/caption]\r\n\r\nProperties of the normal curve:\r\n<ul>\r\n \t<li class=\"List-Paragraph\">The mean is the center of this distribution and the location of its highest point.<\/li>\r\n \t<li class=\"List-Paragraph\">The curve is symmetric about the mean. (The area to the left of the mean equals the area to the right of the mean.)<\/li>\r\n \t<li class=\"List-Paragraph\">The total area under the curve is equal to one.<\/li>\r\n \t<li class=\"List-Paragraph\">As <em>x<\/em> moves away from the mean in either direction, the curve approaches zero but never touches the horizontal axis.<\/li>\r\n \t<li class=\"List-Paragraph\">The PDF of a normal curve is <img class=\"frame-12\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170609\/438.png\" alt=\"438.png\" width=\"120\" height=\"53\" \/>.<\/li>\r\n \t<li class=\"List-Paragraph\">A normal curve can be used to estimate probabilities.<\/li>\r\n \t<li class=\"List-Paragraph\">A normal curve can be used to estimate proportions of a population that have certain x-values.<\/li>\r\n<\/ul>\r\n<h2>The Standard Normal Distribution<\/h2>\r\nThere are infinitely many possible combinations of means and standard deviations for normal random variables. Finding probabilities associated with each of these variables would require us to integrate its PDF over the range of values we are interested in. 
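The integration described above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the chapter's Minitab or Excel workflow: normal_pdf and area_under_pdf are hypothetical helper names, the midpoint rule is simply our choice of numerical integration method, and statistics.NormalDist from Python's standard library (3.8 or later) is used only as a cross-check. The parameters µ = 110 lb and σ = 29.7 lb are the deer weights from Example 13 later in this section. The sketch also checks the two PDF rules: the density is never negative, and the total area is approximately one.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    # Height of the normal curve with mean mu and standard deviation sigma at x.
    coef = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def area_under_pdf(a, b, mu, sigma, steps=100_000):
    # Approximate the integral of the PDF from a to b with the midpoint rule.
    width = (b - a) / steps
    total = 0.0
    for i in range(steps):
        midpoint = a + (i + 0.5) * width
        total += normal_pdf(midpoint, mu, sigma)
    return total * width


mu, sigma = 110.0, 29.7  # deer-weight parameters from Example 13

# Rule 1: total area is (essentially) one; integrate 10 standard deviations each side.
total_area = area_under_pdf(mu - 10 * sigma, mu + 10 * sigma, mu, sigma)

# Rule 2: the density is never negative (checked on a grid of x-values).
nonnegative = all(
    normal_pdf(mu + k * sigma / 10, mu, sigma) >= 0 for k in range(-100, 101)
)

# Area to the left of 82 lb, integrating from far out in the left tail.
left_area = area_under_pdf(mu - 10 * sigma, 82.0, mu, sigma)
exact = NormalDist(mu, sigma).cdf(82.0)

print(round(total_area, 6), nonnegative)
print(round(left_area, 4), round(exact, 4))
```

Integrating up to 82 lb gives an area of about 0.1729, matching NormalDist.cdf. This is slightly smaller than the table value of 0.1736 used in Example 13, because the table lookup rounds the Z-score to -0.94.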
To avoid this, we can rely on the standard normal distribution. The standard normal distribution is a special normal distribution with \u00b5 = 0 and <strong class=\"SymbolsBold\" xml:lang=\"ar-SA\">\u03c3<\/strong> = 1. We can use the Z-score to standardize any normal random variable, converting the x-values to Z-scores, which allows us to use probabilities from the standard normal table. So how do we find the area under the curve associated with a Z-score?\r\n<h4>Standard Normal Table<\/h4>\r\n<ul>\r\n \t<li class=\"List-Paragraph-Bullet-level-2\">The standard normal table gives probabilities associated with specific Z-scores.<\/li>\r\n \t<li class=\"List-Paragraph-Bullet-level-2\">The table we use is cumulative from the left: each entry is the area under the curve to the left of the Z-score.<\/li>\r\n \t<li class=\"List-Paragraph-Bullet-level-2\">The negative side is for all Z-scores less than zero (all values less than the mean).<\/li>\r\n \t<li class=\"List-Paragraph-Bullet-level-2\">The positive side is for all Z-scores greater than zero (all values greater than the mean).<\/li>\r\n \t<li class=\"List-Paragraph-Bullet-level-2\">Not all standard normal tables work the same way, so always check which area a table reports.<\/li>\r\n<\/ul>\r\n<div class=\"textbox examples\">\r\n<h3>Example 10<\/h3>\r\n<p class=\"ExampleHeading\">What is the area associated with the Z-score 1.62?<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"983\"]<img class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170612\/429.png\" alt=\"429.png\" width=\"983\" height=\"625\" \/> Figure 11. 
The standard normal table and associated area for z = 1.62.[\/caption]\r\n\r\n<\/div>\r\n<h4>Reading the Standard Normal Table<\/h4>\r\n<ul>\r\n \t<li class=\"List-Paragraph\">Read down the Z-column to get the first part of the Z-score (1.6).<\/li>\r\n \t<li class=\"List-Paragraph\">Read across the top row to get the second decimal place in the Z-score (0.02).<\/li>\r\n \t<li class=\"List-Paragraph\">The intersection of this row and column gives the area under the curve to the left of the Z-score.<\/li>\r\n<\/ul>\r\n<h3>Finding Z-scores for a Given Area<\/h3>\r\n<ul>\r\n \t<li class=\"List-Paragraph\">What if we have an area and we want to find the Z-score associated with that area?<\/li>\r\n \t<li class=\"List-Paragraph\">Instead of Z-score \u2192 area, we want area \u2192 Z-score.<\/li>\r\n \t<li class=\"List-Paragraph\">We can use the standard normal table to find the area in the body of values and read backwards to find the associated Z-score.<\/li>\r\n \t<li class=\"List-Paragraph\">Using the table, search the probabilities to find an area that is closest to the probability you are interested in.<\/li>\r\n<\/ul>\r\n<div class=\"textbox examples\">\r\n<h3>Example 11<\/h3>\r\n<p class=\"Example\">To find a Z-score for which the area to the right is 5%:<\/p>\r\n<p class=\"Example\">Since the table is cumulative from the left, you must use the complement of 5%.<\/p>\r\n<p class=\"ExampleCenter\" style=\"text-align: center\">1.000 \u2013 0.05 = 0.9500<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"416\"]<img class=\"frame-104\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170614\/Image36062_fmt.png\" alt=\"Image36062.PNG\" width=\"416\" height=\"248\" \/> Figure 12. 
The upper 5% of the area under a normal curve.[\/caption]\r\n<ul>\r\n \t<li class=\"ExampleList\">Find the Z-score for the area of 0.9500.<\/li>\r\n \t<li class=\"ExampleList\">Look at the probabilities and find a value as close to 0.9500 as possible.<\/li>\r\n<\/ul>\r\n[caption id=\"\" align=\"aligncenter\" width=\"578\"]<img class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170616\/Image36070_fmt.png\" alt=\"Image36070.PNG\" width=\"578\" height=\"197\" \/> Figure 13. The standard normal table.[\/caption]\r\n<p class=\"ExampleCenter\">The area 0.9500 falls midway between 0.9495 (Z = 1.64) and 0.9505 (Z = 1.65), so the Z-score for the 95<span class=\"Superscript SmallText\">th<\/span> percentile is approximately 1.645.<\/p>\r\n\r\n<\/div>\r\n<h3>Area between Two Z-scores<\/h3>\r\n<div class=\"textbox examples\">\r\n<h3>Example 12<\/h3>\r\n<p class=\"Example\">To find the Z-scores that bound the middle 95%:<\/p>\r\n\r\n<ul>\r\n \t<li class=\"ExampleList\">The middle 95% has 2.5% on the right and 2.5% on the left.<\/li>\r\n \t<li class=\"ExampleList\">Use the symmetry of the curve.<\/li>\r\n<\/ul>\r\n[caption id=\"\" align=\"aligncenter\" width=\"460\"]<img class=\"frame-104\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170618\/Image36080_fmt.png\" alt=\"Image36080.PNG\" width=\"460\" height=\"242\" \/> Figure 14. The middle 95% of the area under a normal curve.[\/caption]\r\n<ul>\r\n \t<li class=\"ExampleList\">Look at your standard normal table. 
Since the table is cumulative from the left, it is easier to find the area to the left first.<\/li>\r\n \t<li class=\"ExampleList\">Find the area of 0.025 on the negative side of the table.<\/li>\r\n \t<li class=\"ExampleList\">The Z-score for the area to the left is -1.96.<\/li>\r\n \t<li class=\"ExampleList\">Since the curve is symmetric, the Z-score for the area to the right is 1.96.<\/li>\r\n<\/ul>\r\n<\/div>\r\n<h3 class=\"Example\">Common Z-scores<\/h3>\r\nSeveral Z-scores are used so often that they are worth remembering:\r\n<ul>\r\n \t<li class=\"List-Paragraph\">Z<span class=\"Subscript SmallText\">.05<\/span> = 1.645 and the area between -1.645 and 1.645 is 90%<\/li>\r\n \t<li class=\"List-Paragraph\">Z<span class=\"Subscript SmallText\">.025<\/span> = 1.96 and the area between -1.96 and 1.96 is 95%<\/li>\r\n \t<li class=\"List-Paragraph\">Z<span class=\"Subscript SmallText\">.005<\/span> = 2.576 and the area between -2.576 and 2.576 is 99%<\/li>\r\n<\/ul>\r\n<h2>Applications of the Normal Distribution<\/h2>\r\nTypically, our normally distributed data do not have \u03bc = 0 and <strong class=\"SymbolsBold\" xml:lang=\"ar-SA\">\u03c3<\/strong> = 1, but we can relate any normal distribution to the standard normal distribution using the Z-score. We can transform values of x to values of z.\r\n<p class=\"Centered\"><img class=\"frame-17 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170620\/393.png\" alt=\"393.png\" \/><\/p>\r\nFor example, if a normally distributed random variable has \u03bc = 6 and <strong class=\"SymbolsBold\" xml:lang=\"ar-SA\">\u03c3<\/strong> = 2, then a value of x = 7 corresponds to a Z-score of 0.5.\r\n<p class=\"Centered\"><img class=\"frame-6 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170621\/386.png\" alt=\"386.png\" \/><\/p>\r\nThis tells you that 7 is one-half a standard deviation above its mean. 
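The Z-score transformation and the table look-ups in this section can also be reproduced in software. The following is a minimal sketch assuming Python 3.8 or later, whose standard-library statistics.NormalDist provides cdf (the area to the left of a value) and inv_cdf (the value with a given area to its left). The helper name z_score is hypothetical. Because software does not round Z to two decimal places, its answers can differ slightly from table-based answers. For example, the exact Z.005 is about 2.5758, which tables round to 2.575 or 2.576.

```python
from statistics import NormalDist

STANDARD = NormalDist(0, 1)  # standard normal: mu = 0, sigma = 1


def z_score(x, mu, sigma):
    # Number of standard deviations that x lies above (+) or below (-) the mean.
    return (x - mu) / sigma


# Standardizing: x = 7 with mu = 6 and sigma = 2.
print(z_score(7, 6, 2))  # 0.5

# Z-score to area: area to the left of z = 1.62 (Example 10).
print(round(STANDARD.cdf(1.62), 4))

# Area to Z-score: upper 5% (Example 11) and the middle 95% (Example 12).
print(round(STANDARD.inv_cdf(0.95), 3))
print(round(STANDARD.inv_cdf(0.025), 2), round(STANDARD.inv_cdf(0.975), 2))

# Common Z-scores: bounds of the middle 90%, 95%, and 99% of the area.
for middle in (0.90, 0.95, 0.99):
    upper_tail = (1 - middle) / 2
    z = STANDARD.inv_cdf(1 - upper_tail)
    print(middle, round(z, 3))

# Example 13 and Example 14 computed without rounding Z first.
deer = NormalDist(110, 29.7)
rain = NormalDist(36.7, 5.1)
print(round(deer.cdf(82), 4))      # P(x below 82 lb)
print(round(1 - rain.cdf(40), 4))  # P(x above 40 in.)
```

Run as written, the deer and rainfall probabilities come out near 0.1729 and 0.2588. These are close to the table-based answers of 0.1736 and 0.2578 in Examples 13 and 14. The small differences come from rounding Z to -0.94 and 0.65 before using the table.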
We can use this relationship to find probabilities for any normal random variable.\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"902\"]<img class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170623\/Image36090_fmt.png\" alt=\"07_fig33\" width=\"902\" height=\"325\" \/> Figure 15. A normal and standard normal curve.[\/caption]\r\n\r\nTo find the area for values of X, a normal random variable, draw a picture of the area of interest, convert the x-values to Z-scores using the Z-score formula, and then use the standard normal table to find areas to the left, to the right, or in between.\r\n<p class=\"Centered\"><span class=\"Picture\"><img class=\"frame-17 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170625\/369.png\" alt=\"369.png\" \/><\/span><\/p>\r\n\r\n<div class=\"textbox examples\">\r\n<h3>Example 13<\/h3>\r\n<p class=\"Example\">Adult deer population weights are normally distributed with \u00b5 = 110 lb. and <strong class=\"SymbolsBold\" xml:lang=\"ar-SA\">\u03c3<\/strong> = 29.7 lb. As a biologist, you determine that a weight of less than 82 lb. is unhealthy, and you want to know what proportion of your population is unhealthy.<\/p>\r\n<p class=\"ExampleCenter\">P(x&lt;82)<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"453\"]<img class=\"frame-59\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170627\/Image36098_fmt.png\" alt=\"Image36098.PNG\" width=\"453\" height=\"241\" \/> Figure 16. 
The area under a normal curve for P(x&lt;82).[\/caption]\r\n<p class=\"ExampleCenter\">Convert 82 to a Z-score <span class=\"Inline-Equation-Large\"><img class=\"frame-58\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170629\/352.png\" alt=\"352.png\" width=\"166\" height=\"51\" \/><\/span><\/p>\r\n<p class=\"Example\">The <em>x<\/em> value of 82 is 0.94 standard deviations below the mean.<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"467\"]<img class=\"frame-59\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170630\/Image36106_fmt.png\" alt=\"Image36106.PNG\" width=\"467\" height=\"241\" \/> Figure 17. Area under a standard normal curve for P(z&lt;-0.94).[\/caption]\r\n<p class=\"Example\">Go to the standard normal table (negative side) and find the area associated with a Z-score of -0.94.<\/p>\r\n<p class=\"Example\">This is an \u201carea to the left\u201d problem so you can read directly from the table to get the probability.<\/p>\r\n<p class=\"ExampleCenter\">P(x&lt;82) = 0.1736<\/p>\r\n<p class=\"Example\">Approximately 17.36% of the population of adult deer is underweight, OR one deer chosen at random will have a 17.36% chance of weighing less than 82 lb.<\/p>\r\n\r\n<\/div>\r\n<div class=\"textbox examples\">\r\n<h3>Example 14<\/h3>\r\n<p class=\"Example\">Statistics from the Midwest Regional Climate Center indicate that Jones City, which has a large wildlife refuge, gets an average of 36.7 in. of rain each year with a standard deviation of 5.1 in. The amount of rain is normally distributed. During what percent of the years does Jones City get more than 40 in. 
of rain?<\/p>\r\n<p class=\"ExampleCenter\">P(x &gt; 40)<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"396\"]<img class=\"frame-59\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170632\/Image36118_fmt.png\" alt=\"Image36118.PNG\" width=\"396\" height=\"243\" \/> Figure 18. Area under a normal curve for P(x&gt;40).[\/caption]\r\n<p class=\"ExampleCenter\"><span class=\"Inline-Equation-Large\"><img class=\"frame-58\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170634\/325.png\" alt=\"325.png\" width=\"148\" height=\"47\" \/><\/span>\u00a0\u00a0\u00a0\u00a0 P(x&gt;40) = (1-0.7422) = 0.2578<\/p>\r\n<p class=\"Example\">In approximately 25.78% of years, Jones City gets more than 40 in. of rain.<\/p>\r\n\r\n<\/div>\r\n<h2 class=\"Example\">Assessing Normality<\/h2>\r\nIf the population distribution is unknown and the sample size is less than 30 (too small to rely on the Central Limit Theorem), we have to assess the assumption of normality. Our primary method is the normal probability plot. This plot graphs the observed data, ranked in ascending order, against the \u201cexpected\u201d Z-score of each rank. If the sample data were taken from a normally distributed random variable, then the plot would be approximately linear.\r\n\r\nExamine the following probability plot. The center line is the relationship we would expect to see if the data were drawn from a perfectly normal distribution. Notice how the observed data (red dots) loosely follow this linear relationship. Minitab also computes an Anderson-Darling (A-D) test to assess normality. The null hypothesis for this test is that the sample data have been drawn from a normally distributed population. 
A p-value greater than 0.05 means we fail to reject this null hypothesis, so the data are consistent with the assumption of normality.\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"749\"]<img class=\"frame-40\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170636\/314.png\" alt=\"314.png\" width=\"749\" height=\"497\" \/> Figure 19. A normal probability plot generated using Minitab 16.[\/caption]\r\n\r\nCompare the histogram and the normal probability plot in this next example. The histogram indicates a skewed right distribution.\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"840\"]<img class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170638\/304.png\" alt=\"304.png\" width=\"840\" height=\"290\" \/> Figure 20. Histogram and normal probability plot for skewed right data.[\/caption]\r\n\r\nThe observed data do not follow a linear pattern, and the p-value for the A-D test is less than 0.005, indicating a non-normal population distribution.\r\n\r\nNormality cannot simply be assumed; you must always verify this assumption. Remember, the probabilities we are finding come from the standard NORMAL table. If our data are NOT normally distributed, then these probabilities DO NOT APPLY.\r\n<ul>\r\n \t<li class=\"List-Paragraph\">Do you know if the population is normally distributed?<\/li>\r\n \t<li class=\"List-Paragraph\">Do you have a large enough sample size (n\u226530)? Remember the Central Limit Theorem?<\/li>\r\n \t<li class=\"List-Paragraph\">Did you construct a normal probability plot?<\/li>\r\n<\/ul>\r\n<\/div>","rendered":"<div class=\"Basic-Text-Frame\">\n<p>Statistics has become the universal language of the sciences, and data analysis can lead to powerful results. As scientists, researchers, and managers working in the natural resources sector, we all rely on statistical analysis to help us answer the questions that arise in the populations we manage. 
For example:<\/p>\n<ul>\n<li class=\"List-Paragraph\">Has there been a significant change in the mean sawtimber volume in the red pine stands?<\/li>\n<li class=\"List-Paragraph\">Has there been an increase in the number of invasive species found in the Great Lakes?<\/li>\n<li class=\"List-Paragraph\">What proportion of white-tailed deer in New Hampshire have weights below the limit considered healthy?<\/li>\n<li class=\"List-Paragraph\">Did fertilizer A, B, or C have an effect on the corn yield?<\/li>\n<\/ul>\n<p>These are typical questions that require statistical analysis for the answers. In order to answer these questions, a good random sample must be collected from the population of interest. We then use descriptive statistics to organize and summarize our sample data. The next step is inferential statistics, which allows us to use our sample statistics and extend the results to the population, while measuring the reliability of the result. But before we begin exploring different types of statistical methods, a brief review of descriptive statistics is needed.<\/p>\n<p class=\"Callout\"><span class=\"pullquote-left\">Statistics is the science of collecting, organizing, summarizing, analyzing, and interpreting information.<\/span><\/p>\n<p>Good statistics come from good samples and are used to draw conclusions or answer questions about a population. We use sample statistics to estimate population parameters (the truth). So let\u2019s begin there\u2026<\/p>\n<div style=\"width: 637px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-4\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170447\/Image35759_fmt.png\" alt=\"Image35759.PNG\" width=\"627\" height=\"488\" \/><\/p>\n<p class=\"wp-caption-text\">Figure 1. 
Using sample statistics to estimate population parameters.<em>\u00a0<\/em><\/p>\n<\/div>\n<h1>Section 1<\/h1>\n<h2>Descriptive Statistics<\/h2>\n<p>A population is the group to be studied, and population data is a collection of all elements in the population. For example:<\/p>\n<ul>\n<li class=\"List-Paragraph\">All the fish in Long Lake.<\/li>\n<li class=\"List-Paragraph\">All the lakes in the Adirondack Park.<\/li>\n<li class=\"List-Paragraph\">All the grizzly bears in Yellowstone National Park.<\/li>\n<\/ul>\n<p>A sample is a subset of data drawn from the population of interest. For example:<\/p>\n<ul>\n<li class=\"List-Paragraph\">100 fish randomly sampled from Long Lake.<\/li>\n<li class=\"List-Paragraph\">25 lakes randomly selected from the Adirondack Park.<\/li>\n<li class=\"List-Paragraph\">60 grizzly bears with a home range in Yellowstone National Park.<\/li>\n<\/ul>\n<p>Populations are characterized by descriptive measures called parameters. Inferences about parameters are based on sample statistics. For example, the population mean (\u00b5) is estimated by the sample mean (<span class=\"Symbols\" xml:lang=\"ar-SA\"><em>x\u0304<\/em><\/span>). The population variance (\u03c3<span class=\"Superscript SmallText\">2<\/span>) is estimated by the sample variance (s<span class=\"Superscript SmallText\">2<\/span>).<\/p>\n<p>Variables are the characteristics we are interested in. For example:<\/p>\n<ul>\n<li class=\"List-Paragraph\">The length of fish in Long Lake.<\/li>\n<li class=\"List-Paragraph\">The pH of lakes in the Adirondack Park.<\/li>\n<li class=\"List-Paragraph\">The weight of grizzly bears in Yellowstone National Park.<\/li>\n<\/ul>\n<p>Variables are divided into two major groups: qualitative and quantitative. Qualitative variables have values that are attributes or categories. Mathematical operations cannot be applied to qualitative variables. Examples of qualitative variables are gender, race, and petal color. 
Quantitative variables have values that are typically numeric, such as measurements. Mathematical operations can be applied to these data. Examples of quantitative variables are age, height, and length.<\/p>\n<p>Quantitative variables can be broken down further into two categories: discrete and continuous variables. Discrete variables have a finite or countable number of possible values. Think of discrete variables as \u201chens.\u201d Hens can lay 1 egg, or 2 eggs, or 13 eggs\u2026 but never 2.5 eggs. The variable can take on only countable, distinct values.<\/p>\n<p class=\"Centered\"><em><span class=\"Picture\"><img decoding=\"async\" class=\"frame-6 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170449\/958.png\" alt=\"958.png\" \/><\/span>\u00a0<\/em><\/p>\n<p>Continuous variables have an infinite number of possible values. Think of continuous variables as \u201ccows.\u201d Cows can give 4.6713245 gallons of milk, or 7.0918754 gallons of milk, or 13.272698 gallons of milk \u2026 Within any interval, there are infinitely many values that a continuous variable could take on.<\/p>\n<p class=\"Centered\"><span class=\"Picture\"><img decoding=\"async\" class=\"frame-7 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170450\/948.png\" alt=\"948.png\" \/><\/span><\/p>\n<div class=\"textbox examples\">\n<h3>Examples<\/h3>\n<p class=\"Example\">Is the variable qualitative or quantitative?<\/p>\n<table id=\"Table\" class=\"Table\" style=\"margin-left: 23px\">\n<colgroup>\n<col \/>\n<col \/>\n<col \/>\n<col \/><\/colgroup>\n<tbody>\n<tr>\n<td>\n<p class=\"Table-Heading\">Species<\/p>\n<\/td>\n<td>\n<p class=\"Table-Heading\">Weight<\/p>\n<\/td>\n<td>\n<p class=\"Table-Heading\">Diameter<\/p>\n<\/td>\n<td>\n<p class=\"Table-Heading\">Zip Code<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p 
class=\"Table-Heading\">qualitative<\/p>\n<\/td>\n<td>\n<p class=\"Table-Heading\">quantitative<\/p>\n<\/td>\n<td>\n<p class=\"Table-Heading\">quantitative<\/p>\n<\/td>\n<td>\n<p class=\"Table-Heading\">qualitative<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<h2 class=\"Example\">Descriptive Measures<\/h2>\n<p>Descriptive measures of populations are called parameters and are typically written using Greek letters. The population mean is \u03bc (mu). The population variance is <strong class=\"SymbolsBold\" xml:lang=\"ar-SA\">\u03c3<\/strong><span class=\"Superscript SmallText\">2<\/span> (sigma squared) and the population standard deviation is <strong class=\"SymbolsBold\" xml:lang=\"ar-SA\">\u03c3<\/strong> (sigma).<\/p>\n<p>Descriptive measures of samples are called statistics and are typically written using Roman letters. The sample mean is <span class=\"Inline-Equation\"><img decoding=\"async\" class=\"frame-5\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170451\/927.png\" alt=\"927.png\" \/><\/span> (x-bar). The sample variance is <span class=\"BoldItalic Strong-2\">s<\/span><span class=\"Superscript SmallText\">2<\/span> and the sample standard deviation is <span class=\"BoldItalic Strong-2\">s<\/span>. Sample statistics are used to estimate unknown population parameters.<\/p>\n<p>In this section, we will examine descriptive statistics in terms of measures of center and measures of dispersion. These descriptive statistics help us identify the center and spread of the data.<\/p>\n<h2>Measures of Center<\/h2>\n<h3>Mean<\/h3>\n<p>The arithmetic mean of a variable, often called the average, is computed by adding up all the values and dividing by the total number of values.<\/p>\n<p>The population mean is represented by the Greek letter \u03bc (mu). The sample mean is represented by <span class=\"Symbols\" xml:lang=\"ar-SA\"><em>x\u0304<\/em><\/span> (x-bar). 
The sample mean is an unbiased estimator of the population mean. However, the mean is influenced by extreme values (outliers) and may not be the best measure of center for strongly skewed data. The following equations compute the population mean and sample mean.<\/p>\n<p class=\"Centered\"><span class=\"Inline-Equation-Large\"><img decoding=\"async\" class=\"frame-71 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170452\/910.png\" alt=\"910.png\" \/><\/span>\u00a0 \u00a0<span class=\"Inline-Equation-Large\"><img decoding=\"async\" class=\"frame-71 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170452\/902.png\" alt=\"902.png\" \/><\/span><\/p>\n<p>where <em>x<\/em><span class=\"Subscript SmallText\">i<\/span> is an element in the data set, <em>N<\/em> is the number of elements in the population, and <em>n<\/em> is the number of elements in the sample data set.<\/p>\n<div class=\"textbox examples\">\n<h3>Example 2<\/h3>\n<p class=\"Example\">Find the mean for the following sample data set: 6.4, 5.2, 7.9, 3.4.<\/p>\n<p class=\"No-Caption\"><span class=\"Picture\"><img decoding=\"async\" class=\"frame-10 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170453\/893.png\" alt=\"893.png\" \/><\/span><\/p>\n<\/div>\n<h3>Median<\/h3>\n<p>The median of a variable is the middle value of the data set when the data are sorted in order from least to greatest. It splits the data into two equal halves, with 50% of the data below the median and 50% above the median. 
The median is resistant to the influence of outliers and may be a better measure of center for strongly skewed data.<\/p>\n<p class=\"Centered\"><img decoding=\"async\" class=\"frame-11 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170455\/Image35835_fmt.png\" alt=\"Image35835.PNG\" \/><\/p>\n<p>The calculation of the median depends on whether the number of observations in the data set is odd or even.<\/p>\n<p>To calculate the median with an odd number of values (<em>n<\/em> is odd), first sort the data from smallest to largest. The median is the middle value, in position (<em>n<\/em> + 1)/2.<\/p>\n<div class=\"textbox examples\">\n<h3>Example 3<\/h3>\n<p class=\"ExampleHeading\" style=\"text-align: center\">23, 27, 29, 31, 35, 39, 40, 42, 44, 47, 51<\/p>\n<p class=\"Example\">The median is 39. It is the middle value that separates the lower 50% of the data from the upper 50% of the data.<\/p>\n<\/div>\n<p>To calculate the median with an even number of values (<em>n<\/em> is even), first sort the data from smallest to largest, then take the average of the two middle values, in positions <em>n<\/em>/2 and <em>n<\/em>/2 + 1.<\/p>\n<div class=\"textbox examples\">\n<h3>Example 4<\/h3>\n<p class=\"ExampleHeading\" style=\"text-align: center\">23, 27, 29, 31, 35, 39, 40, 42, 44, 47<\/p>\n<p class=\"Caption\"><span class=\"Picture\"><img decoding=\"async\" class=\"frame-12 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170456\/877.png\" alt=\"877.png\" \/><\/span><\/p>\n<\/div>\n<h3>Mode<\/h3>\n<p>The mode is the most frequently occurring value and is commonly used with qualitative data because the values are categorical. Categorical data cannot be added, subtracted, multiplied, or divided, so the mean cannot be computed (and, for categories with no natural order, neither can the median). The mode is less commonly used as a measure of center for quantitative data. Sometimes each value occurs only once, in which case the mode is not meaningful.<\/p>\n<p>Understanding the relationship between the mean and median is important. 
It gives us insight into the distribution of the variable. For example, if the distribution is skewed right (positively skewed), the few large observations in the right tail pull the mean upward. The median is less affected by these extreme large values, so in this situation the mean will typically be larger than the median. In a symmetric distribution, the mean, median, and mode will all be similar in value. If the distribution is skewed left (negatively skewed), the few small observations in the left tail pull the mean downward. Again, the median is less affected by these extreme small values, so in this situation the mean will typically be less than the median.<\/p>\n<div style=\"width: 850px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170458\/Image35846_fmt.png\" alt=\"Image35846.PNG\" width=\"840\" height=\"229\" \/><\/p>\n<p class=\"wp-caption-text\">Figure 2. Illustration of skewed and symmetric distributions.<\/p>\n<\/div>\n<h2>Measures of Dispersion<\/h2>\n<p>Measures of center look at the average or middle values of a data set. Measures of dispersion look at the spread or variation of the data. Variation refers to the amount that the values vary among themselves. Values in a data set that are relatively close to each other have lower measures of variation. Values that are spread farther apart have higher measures of variation.<\/p>\n<p>Examine the two histograms below. Both groups have an average weight of 267 lb., 
but the weights of Group A are more variable than those of Group B.<\/p>\n<div style=\"width: 918px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170501\/860.png\" alt=\"860.png\" width=\"908\" height=\"295\" \/><\/p>\n<p class=\"wp-caption-text\">Figure 3. Histograms of Group A and Group B.<\/p>\n<\/div>\n<p>This section will examine five measures of dispersion: range, variance, standard deviation, standard error, and coefficient of variation.<\/p>\n<h3>Range<\/h3>\n<p>The range of a variable is the largest value minus the smallest value. It is the simplest measure of dispersion and uses only these two values of a quantitative data set.<\/p>\n<div class=\"textbox examples\">\n<h3>Example 5<\/h3>\n<p class=\"Example\">Find the range for the given data set.<\/p>\n<p class=\"Example\" style=\"text-align: center\">12, 29, 32, 34, 38, 49, 57<\/p>\n<p class=\"Example\">Range = 57 \u2013 12 = 45<\/p>\n<\/div>\n<h3>Variance<\/h3>\n<p>The variance uses the difference between each value and the arithmetic mean. The differences are squared so that positive and negative differences do not cancel each other out. The sample variance (s<span class=\"Superscript SmallText\">2<\/span>) is an unbiased estimator of the population variance (<strong class=\"SymbolsBold\" xml:lang=\"ar-SA\">\u03c3<\/strong><span class=\"Superscript SmallText\">2<\/span>), with n &#8211; 1 degrees of freedom: one degree of freedom is used up by estimating the population mean with the sample mean.<\/p>\n<p class=\"Callout\"><span class=\"pullquote-left\">Degrees of freedom: In general, the degrees of freedom for an estimate is equal to the number of values minus the number of parameters estimated en route to the estimate in question.<\/span><\/p>\n<p>The sample variance is unbiased because of the \u201cn &#8211; 1\u201d in its denominator. If we used \u201cn\u201d in the denominator instead of \u201cn &#8211; 1\u201d, we would consistently underestimate the true population variance. 
To correct this bias, the denominator is modified to \u201cn &#8211; 1\u201d.<\/p>\n<p>Population variance \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0Sample variance<\/p>\n<p class=\"Centered\"><span class=\"Symbols\" xml:lang=\"ar-SA\">\u03c3<\/span><span class=\"Superscript SmallText\">2<\/span> = <span class=\"Inline-Equation-Large\"><img decoding=\"async\" class=\"frame-58\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170502\/852.png\" alt=\"852.png\" \/><\/span> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0s<span class=\"Superscript SmallText\">2<\/span> = <span class=\"Inline-Equation-Large\"><img decoding=\"async\" class=\"frame-57\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170503\/842.png\" alt=\"842.png\" \/><\/span><\/p>\n<div class=\"textbox examples\">\n<h3>Example 6<\/h3>\n<p class=\"Example\">Compute the variance of the sample data: 3, 5, 7. The sample mean is 5.<\/p>\n<p class=\"ExampleCenter\"><span class=\"Picture\"><img decoding=\"async\" class=\"frame-32 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170504\/832.png\" alt=\"832.png\" \/>\u00a0<\/span><\/p>\n<\/div>\n<h3>Standard Deviation<\/h3>\n<p>The standard deviation is the square root of the variance (for both populations and samples). Although the sample variance is an unbiased estimator of the population variance, it is expressed in squared units; taking the square root returns the measure to the original units of the data. The standard deviation is a common method for numerically describing the distribution of a variable. 
The population standard deviation is <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03c3<\/span> (sigma) and the sample standard deviation is <em>s<\/em>.<\/p>\n<p>Population standard deviation \u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0Sample standard deviation<\/p>\n<p class=\"Centered\"><img decoding=\"async\" class=\"frame-12\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170506\/823.png\" alt=\"823.png\" \/>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<img decoding=\"async\" class=\"frame-12\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170507\/816.png\" alt=\"816.png\" \/><\/p>\n<div class=\"textbox examples\">\n<h3>Example 7<\/h3>\n<p class=\"Example\">Compute the standard deviation of the sample data: 3, 5, 7 with a sample mean of 5.<\/p>\n<p class=\"ExampleCenter\"><span class=\"Picture\"><img decoding=\"async\" class=\"frame-19 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170508\/809.png\" alt=\"809.png\" \/><\/span><\/p>\n<\/div>\n<h3>Standard Error of the Mean<\/h3>\n<p>Commonly, we use the sample mean <span class=\"Symbols\" xml:lang=\"ar-SA\"><em>x\u0304<\/em><\/span> to estimate the population mean <span class=\"Symbols\" xml:lang=\"ar-SA\"><em>\u03bc<\/em><\/span>. 
For example, if we want to estimate the mean height of eighty-year-old cherry trees, we can proceed as follows:<\/p>\n<ul>\n<li class=\"List-Paragraph\">Randomly select 100 trees<\/li>\n<li class=\"List-Paragraph\">Compute the sample mean of the 100 heights<\/li>\n<li class=\"List-Paragraph\">Use that as our estimate<\/li>\n<\/ul>\n<p>We want to use this sample mean to estimate the true but unknown population mean. But our sample of 100 trees is just one of many possible samples (of the same size) that could have been randomly selected. Imagine taking a series of different random samples, all of the same size, from the same population:<\/p>\n<ul>\n<li class=\"List-Paragraph\">Sample 1: we compute sample mean <span class=\"Symbols\" xml:lang=\"ar-SA\"><em>x\u0304<\/em><\/span><span class=\"Subscript SmallText\">1<\/span><\/li>\n<li class=\"List-Paragraph\">Sample 2: we compute sample mean <span class=\"Symbols\" xml:lang=\"ar-SA\"><em>x\u0304<\/em><\/span><span class=\"Subscript SmallText\">2<\/span><\/li>\n<li class=\"List-Paragraph\">Sample 3: we compute sample mean <span class=\"Symbols\" xml:lang=\"ar-SA\"><em>x\u0304<\/em><\/span><span class=\"Subscript SmallText\">3<\/span><\/li>\n<li class=\"List-Paragraph\">Etc.<\/li>\n<\/ul>\n<p>Each time we sample, we may get a different result because we are using a different subset of data to compute the sample mean. This shows us that the sample mean is a random variable!<\/p>\n<p>The sample mean (<span class=\"Symbols\" xml:lang=\"ar-SA\"><em>x\u0304<\/em><\/span>) is a random variable with its own probability distribution called the sampling distribution of the sample mean. 
The distribution of the sample mean will have a mean equal to \u00b5 and a standard deviation equal to <span class=\"Inline-Equation\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-42\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170509\/761.png\" alt=\"761.png\" width=\"27\" height=\"25\" \/><\/span>.<\/p>\n<p class=\"Callout\"><span class=\"pullquote-left\">The standard error <span class=\"Inline-Equation\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-42\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170510\/750.png\" alt=\"750.png\" width=\"28\" height=\"25\" \/><\/span> is the standard deviation of all possible sample means.<\/span><\/p>\n<p>In practice, we usually take only one sample, but we need to understand and quantify the sample-to-sample variability that occurs in the sampling process.<\/p>\n<p>The standard error is the standard deviation of the sample means and can be expressed in different ways.<\/p>\n<p class=\"Centered\"><img decoding=\"async\" class=\"frame-45 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170510\/Image35864_fmt.png\" alt=\"Image35864.PNG\" \/><\/p>\n<p>Note: <em>s<\/em><span class=\"Superscript SmallText\">2<\/span> is the sample variance and <em>s<\/em> is the sample standard deviation.<\/p>\n<div class=\"textbox examples\">\n<h3>Example 8<\/h3>\n<p class=\"Example\">Describe the distribution of the sample mean.<\/p>\n<p class=\"Example\">A population of fish has weights that are normally distributed with \u00b5 = 8 lb. and \u03c3 = 2.6 lb. 
If you take a sample of size <em>n<\/em> = 6, the sample mean will have a normal distribution with a mean of 8 lb. and a standard deviation (standard error) of <span class=\"Inline-Equation\"><img decoding=\"async\" class=\"frame-18\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170511\/728.png\" alt=\"728.png\" \/><\/span> = 1.061 lb.<\/p>\n<p class=\"Example\">If you increase the sample size to 10, the sample mean will be normally distributed with a mean of 8 lb. and a standard deviation (standard error) of <span class=\"Inline-Equation\"><img decoding=\"async\" class=\"frame-18\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170512\/721.png\" alt=\"721.png\" \/><\/span> = 0.822 lb.<\/p>\n<p class=\"Example\">Notice how the standard error decreases as the sample size increases.<\/p>\n<\/div>\n<p>The Central Limit Theorem (CLT) states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases. Even if our random variable is not normally distributed, or we know nothing about its distribution, the CLT tells us that the distribution of the <span class=\"Symbols\" xml:lang=\"ar-SA\"><em>x\u0304<\/em><\/span>\u2019s will become approximately normal as <em>n<\/em> increases. How large does <em>n<\/em> have to be? A general rule of thumb is <em>n<\/em> \u2265 30, although strongly skewed populations may require larger samples.<\/p>\n<p class=\"Callout\"><span class=\"pullquote-left\">The Central Limit Theorem tells us that regardless of the shape of our population, the sampling distribution of the sample mean approaches a normal distribution as the sample size increases.<\/span><\/p>\n<h3>Coefficient of Variation<\/h3>\n<p>Comparing standard deviations between different populations or samples is difficult because the standard deviation depends on the units of measure. The coefficient of variation expresses the standard deviation as a percentage of the sample or population mean. 
It is a unitless measure.<\/p>\n<p>Population data\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0Sample data<\/p>\n<p class=\"Centered\">CV = <img decoding=\"async\" class=\"frame-23\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170512\/703.png\" alt=\"703.png\" \/>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0CV = <img decoding=\"async\" class=\"frame-23\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170513\/694.png\" alt=\"694.png\" \/><\/p>\n<div class=\"textbox examples\">\n<h3>Example 9<\/h3>\n<p class=\"Example\">Fisheries biologists were studying the length and weight of Pacific salmon. They took a random sample and computed the mean and standard deviation for length and weight (given below). While the standard deviations are similar, the differences in units between lengths and weights make it difficult to compare the variability. 
Computing the coefficient of variation for each variable allows the biologists to determine which variable has the greater relative variability.<\/p>\n<table id=\"table-2\" class=\"Table\" style=\"margin-left: 23px\">\n<colgroup>\n<col \/>\n<col \/>\n<col \/><\/colgroup>\n<tbody>\n<tr>\n<td><\/td>\n<td>Sample mean<\/td>\n<td>Sample standard deviation<\/td>\n<\/tr>\n<tr>\n<td>Length<\/td>\n<td>63 cm<\/td>\n<td>19.97 cm<\/td>\n<\/tr>\n<tr>\n<td>Weight<\/td>\n<td>37.6 kg<\/td>\n<td>19.39 kg<\/td>\n<\/tr>\n<tr>\n<td><\/td>\n<td><span class=\"Superscript SmallText\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-24\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170515\/685.png\" alt=\"685.png\" width=\"207\" height=\"49\" \/>\u00a0<\/span><\/td>\n<td><span class=\"Superscript SmallText\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-25\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170516\/678.png\" alt=\"678.png\" width=\"211\" height=\"49\" \/>\u00a0<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p class=\"Example\">There is greater relative variability in Pacific salmon weight (CV \u2248 51.6%) than in length (CV \u2248 31.7%).<\/p>\n<\/div>\n<h3>Variability<\/h3>\n<p>Variability is described in many different ways. Standard deviation measures point-to-point variability <span class=\"Red Strong-2\">within a sample<\/span>, i.e., variation among individual sampling units. Coefficient of variation also measures point-to-point variability but on a relative basis (relative to the mean), and is not influenced by measurement units. Standard error measures the <span class=\"Red Strong-2\">sample-to-sample variability<\/span>, i.e., variation among repeated samples in the sampling process. 
Typically, we only have one sample and standard error allows us to quantify the uncertainty in our sampling process.<\/p>\n<h3>Basic Statistics Example using Excel and Minitab Software<\/h3>\n<p>Consider the following tally from 11 sample plots on Heiburg Forest, where X<span class=\"Subscript SmallText\">i<\/span> is the number of downed logs per acre. Compute basic statistics for the sample plots.<\/p>\n<div style=\"width: 842px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170519\/661.png\" alt=\"661.png\" width=\"832\" height=\"851\" \/><\/p>\n<p class=\"wp-caption-text\">Table 1. Sample data on number of downed logs per acre from Heiburg Forest.<\/p>\n<\/div>\n<p>(1) Sample mean:\u00a0<span class=\"Inline-Equation\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-26\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170520\/654.png\" alt=\"654.png\" width=\"192\" height=\"80\" \/><\/span><\/p>\n<p>(2) Median = 35<\/p>\n<p>(3) Variance:<\/p>\n<p><span class=\"Picture\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-27 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170522\/644.png\" alt=\"644.png\" width=\"436\" height=\"200\" \/><\/span><\/p>\n<p>(4) Standard deviation: \u00a0<span class=\"Inline-Equation\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-28\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170524\/634.png\" alt=\"634.png\" width=\"258\" height=\"35\" \/><\/span><\/p>\n<p>(5) Range: 55 \u2013 5 = 50<\/p>\n<p>(6) Coefficient of variation:<\/p>\n<p class=\"Equation\"><span class=\"Inline-Equation\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-29 aligncenter\" 
src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170526\/625.png\" alt=\"625.png\" width=\"325\" height=\"53\" \/><\/span><\/p>\n<p>(7) Standard error of the mean:<\/p>\n<p class=\"Side-by-Side-Equations\"><span class=\"Picture\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-10 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170528\/618.png\" alt=\"618.png\" width=\"262\" height=\"119\" \/><\/span><\/p>\n<h2>Software Solutions<\/h2>\n<h3>Minitab<\/h3>\n<p>Open Minitab and enter the data in a worksheet column. Select STAT&gt;Basic Statistics&gt;Display Descriptive Statistics, click Statistics, and check all statistics required.<\/p>\n<p class=\"No-Caption\"><span class=\"Equation-Left\"><img decoding=\"async\" class=\"frame-29 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170530\/008_1_fmt.png\" alt=\"008_1.tif\" \/><\/span><span class=\"Equation-Right\"><img decoding=\"async\" class=\"frame-1 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170534\/008_2_fmt.png\" alt=\"008_2.tif\" \/><\/span><\/p>\n<h4>Descriptive Statistics: Data<\/h4>\n<table class=\"Table\" style=\"font-size: 0.5em\">\n<colgroup>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/><\/colgroup>\n<tbody>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Variable<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">N<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">N*<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">Mean<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">SE Mean<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">StDev<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">Variance<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">CoefVar<\/p>\n<\/td>\n<td class=\"Table\">\n<p 
class=\"Table\">Minimum<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">Q1<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Data<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">11<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">32.27<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">4.83<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">16.03<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">256.82<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">49.66<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">5.00<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">20.00<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<table id=\"table-4\" class=\"Table\">\n<colgroup>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/><\/colgroup>\n<tbody>\n<tr style=\"height: 43.2188px\">\n<td class=\"Table\" style=\"height: 43.2188px\">\n<p class=\"Table\">Variable<\/p>\n<\/td>\n<td class=\"Table\" style=\"height: 43.2188px\">\n<p class=\"Table\">Median<\/p>\n<\/td>\n<td class=\"Table\" style=\"height: 43.2188px\">\n<p class=\"Table\">Q3<\/p>\n<\/td>\n<td class=\"Table\" style=\"height: 43.2188px\">\n<p class=\"Table\">Maximum<\/p>\n<\/td>\n<td class=\"Table\" style=\"height: 43.2188px\">\n<p class=\"Table\">IQR<\/p>\n<\/td>\n<\/tr>\n<tr style=\"height: 43px\">\n<td class=\"Table\" style=\"height: 43px\">\n<p class=\"Table\">Data<\/p>\n<\/td>\n<td class=\"Table\" style=\"height: 43px\">\n<p class=\"Table\">35.00<\/p>\n<\/td>\n<td class=\"Table\" style=\"height: 43px\">\n<p class=\"Table\">45.00<\/p>\n<\/td>\n<td class=\"Table\" style=\"height: 43px\">\n<p class=\"Table\">55.00<\/p>\n<\/td>\n<td class=\"Table\" style=\"height: 43px\">\n<p class=\"Table\">25.00<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h3>Excel<\/h3>\n<p>Open up Excel and enter the data in the first column of the spreadsheet. Select DATA&gt;Data Analysis&gt;Descriptive Statistics. 
For the Input Range, select data in column A. Check \u201cLabels in First Row\u201d and \u201cSummary Statistics\u201d. Also check \u201cOutput Range\u201d and select location for output.<\/p>\n<p class=\"No-Caption\"><span class=\"Picture\"><img decoding=\"async\" class=\"frame-50 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170537\/009_2_fmt.png\" alt=\"009_2.tif\" \/><\/span><\/p>\n<p class=\"No-Caption\"><span class=\"Picture\"><img decoding=\"async\" class=\"frame-50 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170540\/009_1_fmt.png\" alt=\"009_1.tif\" \/><\/span><\/p>\n<table id=\"table-5\" class=\"Table\">\n<colgroup>\n<col \/>\n<col \/><\/colgroup>\n<tbody>\n<tr>\n<td class=\"Table-Heading\" colspan=\"2\">\n<p class=\"Table-Heading\" style=\"text-align: center\">Data<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Mean<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">32.27273<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Standard Error<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">4.831884<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Median<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">35<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Mode<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">25<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Standard Deviation<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">16.02555<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Sample Variance<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">256.8182<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Kurtosis<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">-0.73643<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p 
class=\"Table\">Skewness<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">-0.05982<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Range<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">50<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Minimum<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">5<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Maximum<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">55<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Sum<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">355<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Count<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">11<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Graphical Representation<\/h2>\n<p>Data organization and summarization can be done graphically, as well as numerically. Tables and graphs allow for a quick overview of the information collected and support the presentation of the data used in the project. While there are a multitude of available graphics, this chapter will focus on a specific few commonly used tools.<\/p>\n<h3>Pie Charts<\/h3>\n<p>Pie charts are a good visual tool allowing the reader to quickly see the relationship between categories. It is important to clearly label each category, and adding the frequency or relative frequency is often helpful. However, too many categories can be confusing. Be careful of putting too much information in a pie chart. The first pie chart gives a clear idea of the representation of fish types relative to the whole sample. The second pie chart is more difficult to interpret, with too many categories. 
It is important to select the best graphic when presenting the information to the reader.<\/p>\n<div style=\"width: 1013px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170544\/542.png\" alt=\"542.png\" width=\"1003\" height=\"371\" \/><\/p>\n<p class=\"wp-caption-text\">Figure 4. Comparison of pie charts.<\/p>\n<\/div>\n<h3>Bar Charts and Histograms<\/h3>\n<p>Bar charts graphically describe the distribution of a qualitative variable (fish type), while histograms describe the distribution of a quantitative variable, either discrete or continuous (bear weight).<\/p>\n<div style=\"width: 1062px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170547\/534.png\" alt=\"534.png\" width=\"1052\" height=\"389\" \/><\/p>\n<p class=\"wp-caption-text\">Figure 5. Comparison of a bar chart for qualitative data and a histogram for quantitative data.<\/p>\n<\/div>\n<p>In both cases, the bars have equal widths and the y-axis is clearly defined. With qualitative data, each category is represented by a specific bar. With continuous data, lower and upper class limits must be defined with equal class widths. There should be no gaps between classes, and each observation should fall into one, and only one, class.<\/p>\n<h3>Boxplots<\/h3>\n<p>Boxplots use the 5-number summary (the minimum, the three quartiles Q1, median, and Q3, and the maximum) to illustrate the center, spread, and distribution of the data. When paired with histograms, they give an excellent description, both numerically and graphically, of the data.<\/p>\n<p>With symmetric data, the distribution is bell-shaped and approximately symmetric. 
In the boxplot, Q1 and Q3 are approximately equidistant from the median, as are the minimum and maximum values, and the two whiskers (the lines extending from the box) are approximately equal in length.<\/p>\n<p class=\"Caption\"><span class=\"Equation-Right\"><img decoding=\"async\" class=\"frame-29 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170550\/012_2_fmt.png\" alt=\"012_2.tif\" \/><\/span><\/p>\n<div style=\"width: 412px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-29\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170552\/012_1_fmt.png\" alt=\"012_1.tif\" width=\"402\" height=\"344\" \/><\/p>\n<p class=\"wp-caption-text\">Figure 6. A histogram and boxplot of a normal distribution.<\/p>\n<\/div>\n<p>With a skewed left distribution, the histogram looks \u201cpulled\u201d to the left. In the boxplot, Q1 is farther from the median than Q3 is, the minimum is farther from the median than the maximum is, and the left whisker is longer than the right whisker.<\/p>\n<div style=\"width: 433px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-19\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170554\/013_2_fmt.png\" alt=\"013_2.tif\" width=\"423\" height=\"350\" \/><\/p>\n<p class=\"wp-caption-text\">Figure 7. A histogram and boxplot of a skewed left distribution.<\/p>\n<\/div>\n<p class=\"Caption\"><span class=\"Equation-Left\"><img decoding=\"async\" class=\"frame-29 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170556\/013_1_fmt.png\" alt=\"013_1.tif\" \/><\/span><\/p>\n<p>With a skewed right distribution, the histogram looks \u201cpulled\u201d to the right. 
In the boxplot, Q3 is farther from the median than Q1 is, the maximum is farther from the median than the minimum is, and the right whisker is longer than the left whisker.<\/p>\n<p class=\"Caption\"><span class=\"Equation-Right\"><img decoding=\"async\" class=\"frame-29 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170558\/014_2_fmt.png\" alt=\"014_2.tif\" \/><\/span><\/p>\n<div style=\"width: 423px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-52\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170600\/014_1_fmt.png\" alt=\"014_1.tif\" width=\"413\" height=\"366\" \/><\/p>\n<p class=\"wp-caption-text\">Figure 8. A histogram and boxplot of a skewed right distribution.<\/p>\n<\/div>\n<h1>Section 2<\/h1>\n<h2>Probability Distribution<\/h2>\n<p>Once we have organized and summarized our sample data, the next step is to identify the underlying distribution of our random variable. Computing probabilities for continuous random variables is complicated by the fact that a continuous random variable can take on an infinite number of possible values, so the probability of observing any one exact value is zero. Therefore, to find probabilities associated with a continuous random variable, we use a probability density function (PDF) and find the probability that the variable falls within an interval of values.<\/p>\n<p>A PDF is a function used to find probabilities for continuous random variables. 
The PDF must satisfy the following two rules:<\/p>\n<ol>\n<li>The area under the curve must equal one (over all possible values of the random variable).<\/li>\n<li class=\"p1\">The value of the PDF must be equal to or greater than zero for all possible values of the random variable.<\/li>\n<\/ol>\n<p class=\"Callout\"><span class=\"pullquote-left\">The area under the curve of the probability density function over some interval represents the probability of observing those values of the random variable in that interval.<\/span><\/p>\n<h2>The Normal Distribution<\/h2>\n<p>Many continuous random variables have a bell-shaped, symmetric distribution that is well described by a normal distribution. In other words, the relative frequency histogram of such a variable follows a normal curve. The curve is bell-shaped, symmetric about the mean, and defined by \u00b5 and \u03c3 (the mean and standard deviation).<\/p>\n<div style=\"width: 560px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-27\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170603\/Kiernan_media015_fmt.png\" alt=\"Kiernan_media015.png\" width=\"550\" height=\"350\" \/><\/p>\n<p class=\"wp-caption-text\">Figure 9. A normal distribution.<\/p>\n<\/div>\n<p>There is a normal curve for every combination of \u00b5 and \u03c3. The mean (\u00b5) shifts the curve to the left or right, and the standard deviation (\u03c3) changes the spread of the curve. In Figure 10, the first pair of curves has different means but the same standard deviation, and the second pair shares the same mean (\u00b5) but has different standard deviations. The pink curve has a smaller standard deviation: it is narrower and taller, and the probability is spread over a smaller range of values. The blue curve has a larger standard deviation: it is flatter, and its tails are thicker. 
The probability is spread over a larger range of values.<\/p>\n<p class=\"Caption\"><span class=\"Picture\"><img decoding=\"async\" class=\"frame-53 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170605\/Image36036_fmt.png\" alt=\"07_fig05a\" \/><\/span><\/p>\n<div style=\"width: 607px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-53\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170607\/Image36045_fmt.png\" alt=\"07_fig05b\" width=\"597\" height=\"371\" \/><\/p>\n<p class=\"wp-caption-text\">Figure 10. A comparison of normal curves.<\/p>\n<\/div>\n<p>Properties of the normal curve:<\/p>\n<ul>\n<li class=\"List-Paragraph\">The mean is at the center of the distribution, where the curve reaches its highest point.<\/li>\n<li class=\"List-Paragraph\">The curve is symmetric about the mean. (The area to the left of the mean equals the area to the right of the mean.)<\/li>\n<li class=\"List-Paragraph\">The total area under the curve is equal to one.<\/li>\n<li class=\"List-Paragraph\">As <em>x<\/em> moves away from the mean in either direction, the curve approaches zero but never touches the horizontal axis.<\/li>\n<li class=\"List-Paragraph\">The PDF of a normal curve is <img loading=\"lazy\" decoding=\"async\" class=\"frame-12\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170609\/438.png\" alt=\"438.png\" width=\"120\" height=\"53\" \/>.<\/li>\n<li class=\"List-Paragraph\">A normal curve can be used to estimate probabilities.<\/li>\n<li class=\"List-Paragraph\">A normal curve can be used to estimate proportions of a population that have certain x-values.<\/li>\n<\/ul>\n<h2>The Standard Normal Distribution<\/h2>\n<p>There are infinitely many possible combinations of means and standard deviations for normal random variables. 
Finding probabilities associated with these variables would require us to integrate the PDF over the range of values we are interested in. To avoid this, we can rely on the standard normal distribution, a special normal distribution with \u00b5 = 0 and <strong class=\"SymbolsBold\" xml:lang=\"ar-SA\">\u03c3<\/strong> = 1. We can use the Z-score to standardize any normal random variable, converting its x-values to Z-scores, which allows us to use probabilities from the standard normal table. So how do we find the area under the curve associated with a Z-score?<\/p>\n<h4>Standard Normal Table<\/h4>\n<ul>\n<li class=\"List-Paragraph-Bullet-level-2\">The standard normal table gives probabilities associated with specific Z-scores.<\/li>\n<li class=\"List-Paragraph-Bullet-level-2\">The table we use is cumulative from the left: each entry is the area under the curve to the left of the Z-score.<\/li>\n<li class=\"List-Paragraph-Bullet-level-2\">The negative side is for all Z-scores less than zero (all values less than the mean).<\/li>\n<li class=\"List-Paragraph-Bullet-level-2\">The positive side is for all Z-scores greater than zero (all values greater than the mean).<\/li>\n<li class=\"List-Paragraph-Bullet-level-2\">Not all standard normal tables work the same way; some give the area between the mean and the Z-score instead, so check which type you are using.<\/li>\n<\/ul>\n<div class=\"textbox examples\">\n<h3>Example 10<\/h3>\n<p class=\"ExampleHeading\">What is the area to the left of the Z-score 1.62?<\/p>\n<div style=\"width: 993px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170612\/429.png\" alt=\"429.png\" width=\"983\" height=\"625\" \/><\/p>\n<p class=\"wp-caption-text\">Figure 11. 
The standard normal table and associated area for z = 1.62.<\/p>\n<\/div>\n<\/div>\n<h4>Reading the Standard Normal Table<\/h4>\n<ul>\n<li class=\"List-Paragraph\">Read down the Z-column to get the first part of the Z-score (1.6).<\/li>\n<li class=\"List-Paragraph\">Read across the top row to get the second decimal place in the Z-score (0.02).<\/li>\n<li class=\"List-Paragraph\">The intersection of this row and column gives the area under the curve to the left of the Z-score (0.9474 when z = 1.62).<\/li>\n<\/ul>\n<h3>Finding Z-scores for a Given Area<\/h3>\n<ul>\n<li class=\"List-Paragraph\">What if we have an area and we want to find the Z-score associated with that area?<\/li>\n<li class=\"List-Paragraph\">Instead of Z-score \u2192 area, we want area \u2192 Z-score.<\/li>\n<li class=\"List-Paragraph\">We can locate the area in the body of the standard normal table and read backwards to the row and column headings to find the associated Z-score.<\/li>\n<li class=\"List-Paragraph\">Search the body of the table for the area that is closest to the probability you are interested in.<\/li>\n<\/ul>\n<div class=\"textbox examples\">\n<h3>Example 11<\/h3>\n<p class=\"Example\">To find a Z-score for which the area to the right is 5%:<\/p>\n<p class=\"Example\">Since the table is cumulative from the left, you must use the complement of 5%.<\/p>\n<p class=\"ExampleCenter\" style=\"text-align: center\">1.000 \u2013 0.05 = 0.9500<\/p>\n<div style=\"width: 426px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-104\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170614\/Image36062_fmt.png\" alt=\"Image36062.PNG\" width=\"416\" height=\"248\" \/><\/p>\n<p class=\"wp-caption-text\">Figure 12. 
The upper 5% of the area under a normal curve.<\/p>\n<\/div>\n<ul>\n<li class=\"ExampleList\">Find the Z-score for the area of 0.9500.<\/li>\n<li class=\"ExampleList\">Search the body of the table for the area closest to 0.9500.<\/li>\n<\/ul>\n<div style=\"width: 588px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170616\/Image36070_fmt.png\" alt=\"Image36070.PNG\" width=\"578\" height=\"197\" \/><\/p>\n<p class=\"wp-caption-text\">Figure 13. The standard normal table.<\/p>\n<\/div>\n<p class=\"ExampleCenter\">The area 0.9500 falls halfway between 0.9495 (z = 1.64) and 0.9505 (z = 1.65), so the Z-score for the 95<span class=\"Superscript SmallText\">th<\/span> percentile is approximately 1.645.<\/p>\n<\/div>\n<h3>Area between Two Z-scores<\/h3>\n<div class=\"textbox examples\">\n<h3>Example 12<\/h3>\n<p class=\"Example\">To find the Z-scores that bound the middle 95%:<\/p>\n<ul>\n<li class=\"ExampleList\">The middle 95% leaves 2.5% in the right tail and 2.5% in the left tail.<\/li>\n<li class=\"ExampleList\">Use the symmetry of the curve.<\/li>\n<\/ul>\n<div style=\"width: 470px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-104\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170618\/Image36080_fmt.png\" alt=\"Image36080.PNG\" width=\"460\" height=\"242\" \/><\/p>\n<p class=\"wp-caption-text\">Figure 14. The middle 95% of the area under a normal curve.<\/p>\n<\/div>\n<ul>\n<li class=\"ExampleList\">Look at your standard normal table. 
Since the table is cumulative from the left, it is easier to find the area to the left first.<\/li>\n<li class=\"ExampleList\">Find the area of 0.025 on the negative side of the table.<\/li>\n<li class=\"ExampleList\">The Z-score for the area to the left is -1.96.<\/li>\n<li class=\"ExampleList\">Since the curve is symmetric, the Z-score for the area to the right is 1.96.<\/li>\n<\/ul>\n<\/div>\n<h3 class=\"Example\">Common Z-scores<\/h3>\n<p>Several Z-scores are used often. Here Z<span class=\"Subscript SmallText\">\u03b1<\/span> denotes the Z-score with an area of \u03b1 to its right:<\/p>\n<ul>\n<li class=\"List-Paragraph\">Z<span class=\"Subscript SmallText\">.05<\/span> = 1.645 and the area between -1.645 and 1.645 is 90%<\/li>\n<li class=\"List-Paragraph\">Z<span class=\"Subscript SmallText\">.025<\/span> = 1.96 and the area between -1.96 and 1.96 is 95%<\/li>\n<li class=\"List-Paragraph\">Z<span class=\"Subscript SmallText\">.005<\/span> = 2.575 and the area between -2.575 and 2.575 is 99%<\/li>\n<\/ul>\n<h2>Applications of the Normal Distribution<\/h2>\n<p>Typically, our normally distributed data do not have \u03bc = 0 and <strong class=\"SymbolsBold\" xml:lang=\"ar-SA\">\u03c3<\/strong> = 1, but we can relate any normal distribution to the standard normal distribution using the Z-score, which transforms values of x into values of z.<\/p>\n<p class=\"Centered\"><img decoding=\"async\" class=\"frame-17 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170620\/393.png\" alt=\"393.png\" \/><\/p>\n<p>For example, if a normally distributed random variable has \u03bc = 6 and <strong class=\"SymbolsBold\" xml:lang=\"ar-SA\">\u03c3<\/strong> = 2, then a value of x = 7 corresponds to a Z-score of 0.5.<\/p>\n<p class=\"Centered\"><img decoding=\"async\" class=\"frame-6 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170621\/386.png\" alt=\"386.png\" \/><\/p>\n<p>This tells you that 7 is one-half of a standard deviation above its mean. 
We can use this relationship to find probabilities for any normal random variable.<\/p>\n<div style=\"width: 912px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170623\/Image36090_fmt.png\" alt=\"07_fig33\" width=\"902\" height=\"325\" \/><\/p>\n<p class=\"wp-caption-text\">Figure 15. A normal and standard normal curve.<\/p>\n<\/div>\n<p>To find the area for values of X, a normal random variable, draw a picture of the area of interest, convert the x-values to Z-scores using the Z-score formula, and then use the standard normal table to find areas to the left, to the right, or in between.<\/p>\n<p class=\"Centered\"><span class=\"Picture\"><img decoding=\"async\" class=\"frame-17 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170625\/369.png\" alt=\"369.png\" \/><\/span><\/p>\n<div class=\"textbox examples\">\n<h3>Example 13<\/h3>\n<p class=\"Example\">Adult deer population weights are normally distributed with \u00b5 = 110 lb. and <strong class=\"SymbolsBold\" xml:lang=\"ar-SA\">\u03c3<\/strong> = 29.7 lb. As a biologist, you determine that a weight of less than 82 lb. is unhealthy, and you want to know what proportion of your population is unhealthy.<\/p>\n<p class=\"ExampleCenter\">P(x&lt;82)<\/p>\n<div style=\"width: 463px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-59\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170627\/Image36098_fmt.png\" alt=\"Image36098.PNG\" width=\"453\" height=\"241\" \/><\/p>\n<p class=\"wp-caption-text\">Figure 16. 
The area under a normal curve for P(x&lt;82).<\/p>\n<\/div>\n<p class=\"ExampleCenter\">Convert 82 to a Z-score <span class=\"Inline-Equation-Large\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-58\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170629\/352.png\" alt=\"352.png\" width=\"166\" height=\"51\" \/><\/span><\/p>\n<p class=\"Example\">The <em>x<\/em> value of 82 is 0.94 standard deviations below the mean.<\/p>\n<div style=\"width: 477px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-59\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170630\/Image36106_fmt.png\" alt=\"Image36106.PNG\" width=\"467\" height=\"241\" \/><\/p>\n<p class=\"wp-caption-text\">Figure 17. Area under a standard normal curve for P(z&lt;-0.94).<\/p>\n<\/div>\n<p class=\"Example\">Go to the standard normal table (negative side) and find the area associated with a Z-score of -0.94.<\/p>\n<p class=\"Example\">This is an \u201carea to the left\u201d problem, so you can read the probability directly from the table.<\/p>\n<p class=\"ExampleCenter\">P(x&lt;82) = P(z&lt;-0.94) = 0.1736<\/p>\n<p class=\"Example\">Approximately 17.36% of the population of adult deer is underweight; equivalently, a deer chosen at random has a 17.36% chance of weighing less than 82 lb.<\/p>\n<\/div>\n<div class=\"textbox examples\">\n<h3>Example 14<\/h3>\n<p class=\"Example\">Statistics from the Midwest Regional Climate Center indicate that Jones City, which has a large wildlife refuge, gets an average of 36.7 in. of rain each year with a standard deviation of 5.1 in. The annual amount of rain is normally distributed. During what percent of the years does Jones City get more than 40 in. 
of rain?<\/p>\n<p class=\"ExampleCenter\">P(x &gt; 40)<\/p>\n<div style=\"width: 406px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-59\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170632\/Image36118_fmt.png\" alt=\"Image36118.PNG\" width=\"396\" height=\"243\" \/><\/p>\n<p class=\"wp-caption-text\">Figure 18. Area under a normal curve for P(x&gt;40).<\/p>\n<\/div>\n<p class=\"ExampleCenter\"><span class=\"Inline-Equation-Large\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-58\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170634\/325.png\" alt=\"325.png\" width=\"148\" height=\"47\" \/><\/span>\u00a0\u00a0\u00a0\u00a0 P(x&gt;40) = P(z&gt;0.65) = 1 - 0.7422 = 0.2578<\/p>\n<p class=\"Example\">This is an \u201carea to the right\u201d problem, so subtract the table area to the left of z = 0.65 (0.7422) from one.<\/p>\n<p class=\"Example\">For approximately 25.78% of the years, Jones City will get more than 40 in. of rain.<\/p>\n<\/div>\n<h2 class=\"Example\">Assessing Normality<\/h2>\n<p>If the population distribution is unknown and the sample size is not greater than 30 (too small to rely on the Central Limit Theorem), we have to assess the assumption of normality. Our primary method is the normal probability plot. This plot graphs the observed data, ranked in ascending order, against the \u201cexpected\u201d Z-score for each rank. If the sample data were drawn from a normally distributed random variable, the plot would be approximately linear.<\/p>\n<p>Examine the probability plot in Figure 19. The center line is the relationship we would expect to see if the data were drawn from a perfectly normal distribution. Notice how the observed data (red dots) loosely follow this linear relationship. Minitab also computes an Anderson-Darling (A-D) test to assess normality. The null hypothesis for this test is that the sample data have been drawn from a normally distributed population. 
A p-value greater than 0.05 means we fail to reject this null hypothesis, so the assumption of normality is reasonable.<\/p>\n<div style=\"width: 759px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-40\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170636\/314.png\" alt=\"314.png\" width=\"749\" height=\"497\" \/><\/p>\n<p class=\"wp-caption-text\">Figure 19. A normal probability plot generated using Minitab 16.<\/p>\n<\/div>\n<p>In the next example, compare the histogram and the normal probability plot. The histogram indicates a skewed right distribution.<\/p>\n<div style=\"width: 850px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11170638\/304.png\" alt=\"304.png\" width=\"840\" height=\"290\" \/><\/p>\n<p class=\"wp-caption-text\">Figure 20. Histogram and normal probability plot for skewed right data.<\/p>\n<\/div>\n<p>The observed data do not follow a linear pattern, and the p-value for the A-D test is less than 0.005, indicating a non-normal population distribution.<\/p>\n<p>Normality cannot simply be assumed; you must always verify this assumption. Remember, the probabilities we are finding come from the standard NORMAL table. If our data are NOT normally distributed, then these probabilities DO NOT APPLY.<\/p>\n<ul>\n<li class=\"List-Paragraph\">Do you know if the population is normally distributed?<\/li>\n<li class=\"List-Paragraph\">Do you have a large enough sample size (n\u226530)? 
Remember the Central Limit Theorem?<\/li>\n<li class=\"List-Paragraph\">Did you construct a normal probability plot?<\/li>\n<\/ul>\n<\/div>\n\n\t\t\t <section class=\"citations-section\" role=\"contentinfo\">\n\t\t\t <h3>Candela Citations<\/h3>\n\t\t\t\t\t <div>\n\t\t\t\t\t\t <div id=\"citation-list-85\">\n\t\t\t\t\t\t\t <div class=\"licensing\"><div class=\"license-attribution-dropdown-subheading\">CC licensed content, Shared previously<\/div><ul class=\"citation-list\"><li>Natural Resources Biometrics. <strong>Authored by<\/strong>: Diane Kiernan. <strong>Located at<\/strong>: <a target=\"_blank\" href=\"https:\/\/textbooks.opensuny.org\/natural-resources-biometrics\/\">https:\/\/textbooks.opensuny.org\/natural-resources-biometrics\/<\/a>. <strong>Project<\/strong>: Open SUNY Textbooks. <strong>License<\/strong>: <em><a target=\"_blank\" rel=\"license\" href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\">CC BY-NC-SA: Attribution-NonCommercial-ShareAlike<\/a><\/em><\/li><\/ul><\/div>\n\t\t\t\t\t\t <\/div>\n\t\t\t\t\t <\/div>\n\t\t\t <\/section>","protected":false},"author":622,"menu_order":1,"template":"","meta":{"_candela_citation":"[{\"type\":\"cc\",\"description\":\"Natural Resources Biometrics\",\"author\":\"Diane Kiernan\",\"organization\":\"\",\"url\":\"https:\/\/textbooks.opensuny.org\/natural-resources-biometrics\/\",\"project\":\"Open SUNY 
Textbooks\",\"license\":\"cc-by-nc-sa\",\"license_terms\":\"\"}]","CANDELA_OUTCOMES_GUID":"","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-85","chapter","type-chapter","status-publish","hentry"],"part":21,"_links":{"self":[{"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/pressbooks\/v2\/chapters\/85","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/wp\/v2\/users\/622"}],"version-history":[{"count":1,"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/pressbooks\/v2\/chapters\/85\/revisions"}],"predecessor-version":[{"id":1245,"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/pressbooks\/v2\/chapters\/85\/revisions\/1245"}],"part":[{"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/pressbooks\/v2\/parts\/21"}],"metadata":[{"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/pressbooks\/v2\/chapters\/85\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/wp\/v2\/media?parent=85"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/pressbooks\/v2\/chapter-type?post=85"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/wp\/v2\/contributor?post=85"},{"taxonomy":"lice
nse","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/wp\/v2\/license?post=85"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}