{"id":5526,"date":"2022-09-21T10:59:01","date_gmt":"2022-09-21T10:59:01","guid":{"rendered":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/?post_type=chapter&#038;p=5526"},"modified":"2022-10-11T19:44:48","modified_gmt":"2022-10-11T19:44:48","slug":"16b-inclass","status":"publish","type":"chapter","link":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/chapter\/16b-inclass\/","title":{"raw":"16B InClass","rendered":"16B InClass"},"content":{"raw":"<div id=\"bp-page-1\" class=\"page\" data-page-number=\"1\" data-loaded=\"true\">\r\n<div class=\"textLayer\">Earlier in this course, you learned how to conduct a one-way ANOVA for scenarios that involve comparing more than two groups. In this in-class activity, you\u2019ll extend what you learned about ANOVA to a regression context. Using these new tools, you will consider the relationship between neighborhood income and organic food access.<\/div>\r\n<div><img class=\"\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/27004648\/Picture89-300x200.jpg\" alt=\"An assortment of fresh produce and other foods\" width=\"485\" height=\"323\" \/><\/div>\r\n<div class=\"textLayer\">\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Question 1<\/h3>\r\n1) What was the purpose of the ANOVA table when comparing more than two groups?\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textLayer\">Sum of Squares Total<\/div>\r\n<div class=\"textLayer\">[latex]\\sum(y-\\bar{y})^{2}[\/latex]<\/div>\r\n<div class=\"textLayer\"><img src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/27004652\/Picture902-300x240.png\" alt=\"A scatterplot with points that generally have higher y-values when they also have higher x-values. 
There is a horizontal line at approximately y = 7.5.\" \/><\/div>\r\n<div><\/div>\r\n<div class=\"textLayer\">Sum of Squares Regression<\/div>\r\n<div class=\"textLayer\">[latex]\\sum(\\hat{y}-\\bar{y})^{2}[\/latex]<\/div>\r\n<div class=\"textLayer\"><img src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/27004657\/Picture912-300x240.png\" alt=\"A scatterplot with points that generally have higher y-values when they also have higher x-values. There is a horizontal line at approximately y = 7.5. There is also a sloped line extending from approximately (0, 0) to approximately (12, 15).\" \/><\/div>\r\n<div class=\"textLayer\">Sum of Squares Residuals<\/div>\r\n<div class=\"textLayer\">[latex]\\sum(y-\\hat{y})^{2}[\/latex]<\/div>\r\n<div class=\"textLayer\"><img src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/27004700\/Picture922.png\" alt=\"A scatterplot with points that generally have higher y-values when they also have higher x-values. There is a sloped line extending from approximately (0, 0) to approximately (12, 15).\" \/><\/div>\r\n<div><\/div>\r\n<div class=\"textLayer\">In a regression context, the ANOVA table includes three different sums of squares: SSTotal, SSRegression, and SSResiduals.<\/div>\r\n<div><\/div>\r\n<div class=\"textLayer\">\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Question 2<\/h3>\r\n2) What does SSTotal measure? Mark the previous scatterplots to show the deviations between the actual values of the response variable (\ud835\udc66) and the mean response (\ud835\udc66\u0305).\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textLayer\">\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Question 3<\/h3>\r\n3) What does SSRegression measure? 
Mark the previous scatterplot to show the deviations between the predicted values of the response variable (\ud835\udc66\u0302) and the mean response (\ud835\udc66\u0305).\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textLayer\">\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Question 4<\/h3>\r\n4) What does SSResiduals measure? Mark the previous scatterplot to show the deviations between the actual values of the response variable (\ud835\udc66) and the predicted values of the response variable (\ud835\udc66\u0302).\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textLayer\">To better understand what each sum of squares measures, let\u2019s imagine extreme scenarios where the sums of squares are equal to 0.<\/div>\r\n<div><\/div>\r\n<div class=\"textLayer\">\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Question 5<\/h3>\r\n<div class=\"textLayer\">5) Fill in the following table by sketching three scatterplots that satisfy the given criteria.<\/div>\r\n<div class=\"textLayer\">SSTotal = 0; SSRegression = 0 (but SSTotal \u2260 0); SSResiduals = 0 (but SSTotal \u2260 0)<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"textLayer\">\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td>SSTotal = 0<\/td>\r\n<td>SSRegression = 0\r\n\r\n(but SSTotal \u2260 0)<\/td>\r\n<td>SSResiduals = 0\r\n\r\n(but SSTotal \u2260 0)<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>&nbsp;\r\n\r\n&nbsp;\r\n\r\n&nbsp;\r\n\r\n&nbsp;\r\n\r\n&nbsp;\r\n\r\n&nbsp;\r\n\r\n&nbsp;<\/td>\r\n<td><\/td>\r\n<td><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\nAn ANOVA is a way to \u201cpartition\u201d the variation in the data. 
In other words, it divides the total variation into two parts: the part that is explained by the regression model (SSRegression) and the part that remains unexplained (SSResiduals).<\/div>\r\n<div><\/div>\r\n<div class=\"textLayer\">[latex]SSTotal=SSRegression+SSResiduals[\/latex]<\/div>\r\n<div><\/div>\r\n<div class=\"textLayer\">In In-Class Activity 6.C, you learned that the coefficient of determination, \ud835\udc45<sup>2<\/sup>, is interpreted as the percentage of variation in the response variable that can be explained by the linear relationship with an explanatory variable. This quantity can be expressed using the sums of squares. Note that \ud835\udc45<sup>2<\/sup> can be expressed as a percentage or as a proportion.<\/div>\r\n<div><\/div>\r\n<div class=\"textLayer\">[latex]R^{2}=\\frac{variation\\;explained}{total\\;variation}=\\frac{SSRegression}{SSTotal}=1-\\frac{SSResiduals}{SSTotal}[\/latex]<\/div>\r\n<div><\/div>\r\n<div class=\"textLayer\">\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Question 6<\/h3>\r\n<div class=\"textLayer\">6) Revisit the extreme examples that you sketched in Question 5.<\/div>\r\n<div class=\"textLayer\">Part A: When SSRegression = 0, \ud835\udc45<sup>2<\/sup> = _____.<\/div>\r\n<div class=\"textLayer\">Part B: When SSResiduals = 0, \ud835\udc45<sup>2<\/sup> = _____.<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"bp-page-3\" class=\"page\" data-page-number=\"3\" data-loaded=\"true\">\r\n<div class=\"ba-Layer ba-Layer--highlight\" data-resin-fileid=\"910629731994\" data-resin-iscurrent=\"true\" data-resin-feature=\"annotations\" data-testid=\"ba-Layer--highlight\"><\/div>\r\n<\/div>\r\n<div id=\"bp-page-4\" class=\"page\" 
data-page-number=\"4\" data-loaded=\"true\">\r\n<div class=\"textLayer\">Sums of squares can be organized in an ANOVA table. The following table provides the information necessary to calculate an F-statistic in the context of regression. Note that \ud835\udc5b = sample size and \ud835\udc5d = number of predictors. In simple linear regression, \ud835\udc5d = 1.<\/div>\r\n<\/div>\r\n<div>\r\n<table style=\"border-collapse: collapse; width: 100%;\" border=\"1\">\r\n<tbody>\r\n<tr>\r\n<td style=\"width: 20%;\"><strong>Source<\/strong><\/td>\r\n<td style=\"width: 20%;\"><strong>Df<\/strong><\/td>\r\n<td style=\"width: 20%;\"><strong>Sum sq<\/strong><\/td>\r\n<td style=\"width: 20%;\"><strong>Mean sq<\/strong><\/td>\r\n<td style=\"width: 20%;\"><strong>F value<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 20%;\">Regression<\/td>\r\n<td style=\"width: 20%;\">p<\/td>\r\n<td style=\"width: 20%;\">SSRegression<\/td>\r\n<td style=\"width: 20%;\">[latex]MSRegression =\\frac{SSRegression}{p}[\/latex]<\/td>\r\n<td style=\"width: 20%;\">[latex]\\frac{MSRegression}{MSResiduals}[\/latex]<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 20%;\">Residuals<\/td>\r\n<td style=\"width: 20%;\">n-1-p<\/td>\r\n<td style=\"width: 20%;\">SSResiduals<\/td>\r\n<td style=\"width: 20%;\">[latex]MSResiduals =\\frac{SSResiduals}{n-1-p}[\/latex]<\/td>\r\n<td style=\"width: 20%;\"><\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 20%;\"><strong>Total<\/strong><\/td>\r\n<td style=\"width: 20%;\"><strong>n-1<\/strong><\/td>\r\n<td style=\"width: 20%;\"><strong>SSTotal<\/strong><\/td>\r\n<td style=\"width: 20%;\"><\/td>\r\n<td style=\"width: 20%;\"><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n<div id=\"bp-page-4\" class=\"page\" data-page-number=\"4\" data-loaded=\"true\">\r\n<div class=\"textLayer\">A statistics student from San Antonio, Texas completed a project to explore whether there is a relationship between neighborhood income and access to organic items at local grocery stores. 
Specifically, the student counted the number of organic vegetables offered at 37 H.E.B. grocery stores. She then cross-referenced these data with the average household incomes (in dollars) for the zip codes where the stores are located.<sup>1<\/sup><\/div>\r\n<div class=\"textLayer\"><sup>1<\/sup> Scenario adapted from Skew the Script: https:\/\/skewthescript.org\/3-1<\/div>\r\n<div><\/div>\r\n<div class=\"textLayer\">\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Question 7<\/h3>\r\n7) Enter the grocery store data into the DCMP Linear Regression tool at https:\/\/dcmathpathways.shinyapps.io\/LinearRegression\/. Select \u201cAverage income in zip code\u201d as the explanatory (\ud835\udc4b) variable and \u201cNumber of organic items offered\u201d as the response (\ud835\udc4c) variable. Under \u201cRegression Options,\u201d click the box to show the ANOVA table. Use the information from the tool to fill out the following table.\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td>Source<\/td>\r\n<td>Df<\/td>\r\n<td>Sum sq<\/td>\r\n<td>Mean sq<\/td>\r\n<td>F value<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Regression<\/td>\r\n<td><\/td>\r\n<td><\/td>\r\n<td><\/td>\r\n<td><\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Residuals<\/td>\r\n<td><\/td>\r\n<td><\/td>\r\n<td><\/td>\r\n<td><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Total<\/strong><\/td>\r\n<td><\/td>\r\n<td><\/td>\r\n<td><\/td>\r\n<td><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\nThe ANOVA table will also include a P-value, which gives the probability of obtaining an F-statistic as large as or larger than the one in the sample if the null hypothesis were true. An ANOVA F-test can be used to test the population slope for simple linear regression, the same scenario where you used a t-test in In-Class Activity 16.A:\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textLayer\">[latex]H_{0}:\\beta=0[\/latex] vs. 
[latex]H_{A}:\\beta\\neq0[\/latex], where [latex]\\beta[\/latex] = the population slope relating the number of organic items offered and the average income in zip code.<\/div>\r\n<div class=\"textLayer\">To model the values of the F-statistic that would occur if the null hypothesis was true and the assumptions for inference were met, you will use an F Distribution with [latex]df_{1}=p[\/latex] and [latex]df_{2}=n-1-p[\/latex].<\/div>\r\n<div><\/div>\r\n<div class=\"textLayer\">\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Question 8<\/h3>\r\n8) Use the DCMP F Distribution tool at https:\/\/dcmathpathways.shinyapps.io\/FDist\/ to calculate a P-value that measures the evidence of an association between the number of organic items offered and the average household income. Include a sketch of the F Distribution. (You may assume that a linear model is appropriate and all assumptions for inference are met.)\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textLayer\">\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Question 9<\/h3>\r\n9) At the \ud835\udefc = 0.05 significance level, do these data provide sufficient evidence of an association between the number of organic items offered and the average household income in the neighborhood? State your conclusion in context. Suppose you had conducted a t-test for the slope instead of an F-test for the slope in this scenario. The value of the t-statistic would have been 7.50, the square root of the F-statistic. The P-value for the two-sided t-test would be the same as the P-value for the F-test.\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textLayer\">\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Question 10<\/h3>\r\n<div class=\"textLayer\">10) As the F-statistic gets larger, the P-value gets smaller, indicating stronger evidence of an association. 
Answer Parts A through C to understand which factors affect the strength of evidence.<\/div>\r\n<div class=\"textLayer\">Part A: As the slope of the regression line gets steeper, the evidence of an association gets ______ (stronger\/weaker).<\/div>\r\n<div class=\"textLayer\">Part B: As the spread of the points around the regression line increases, the evidence of an association gets ______ (stronger\/weaker).<\/div>\r\n<div class=\"textLayer\">Part C: Assuming that the slope and the spread of the points around the regression line stay about the same, as the sample size increases, the evidence of an association gets ______ (stronger\/weaker).<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>","rendered":"<div id=\"bp-page-1\" class=\"page\" data-page-number=\"1\" data-loaded=\"true\">\n<div class=\"textLayer\">Earlier in this course, you learned how to conduct a one-way ANOVA for scenarios that involve comparing more than two groups. In this in-class activity, you\u2019ll extend what you learned about ANOVA to a regression context. 
Using these new tools, you will consider the relationship between neighborhood income and organic food access.<\/div>\n<div><img loading=\"lazy\" decoding=\"async\" class=\"\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/27004648\/Picture89-300x200.jpg\" alt=\"An assortment of fresh produce and other foods\" width=\"485\" height=\"323\" \/><\/div>\n<div class=\"textLayer\">\n<div class=\"textbox key-takeaways\">\n<h3>Question 1<\/h3>\n<p>1) What was the purpose of the ANOVA table when comparing more than two groups?<\/p>\n<\/div>\n<\/div>\n<div class=\"textLayer\">Sum of Squares Total<\/div>\n<div class=\"textLayer\">[latex]\\sum(y-\\bar{y})^{2}[\/latex]<\/div>\n<div class=\"textLayer\"><img decoding=\"async\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/27004652\/Picture902-300x240.png\" alt=\"A scatterplot with points that generally have higher y-values when they also have higher x-values. There is a horizontal line at approximately y = 7.5.\" \/><\/div>\n<div><\/div>\n<div class=\"textLayer\">Sum of Squares Regression<\/div>\n<div class=\"textLayer\">[latex]\\sum(\\hat{y}-\\bar{y})^{2}[\/latex]<\/div>\n<div class=\"textLayer\"><img decoding=\"async\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/27004657\/Picture912-300x240.png\" alt=\"A scatterplot with points that generally have higher y-values when they also have higher x-values. There is a horizontal line at approximately y = 7.5. 
There is also a sloped line extending from approximately (0, 0) to approximately (12, 15).\" \/><\/div>\n<div class=\"textLayer\">Sum of Squares Residuals<\/div>\n<div class=\"textLayer\">[latex]\\sum(y-\\hat{y})^{2}[\/latex]<\/div>\n<div class=\"textLayer\"><img decoding=\"async\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/27004700\/Picture922.png\" alt=\"A scatterplot with points that generally have higher y-values when they also have higher x-values. There is a sloped line extending from approximately (0, 0) to approximately (12, 15).\" \/><\/div>\n<div><\/div>\n<div class=\"textLayer\">In a regression context, the ANOVA table includes three different sums of squares: SSTotal, SSRegression, and SSResiduals.<\/div>\n<div><\/div>\n<div class=\"textLayer\">\n<div class=\"textbox key-takeaways\">\n<h3>Question 2<\/h3>\n<p>2) What does SSTotal measure? Mark the previous scatterplots to show the deviations between the actual values of the response variable (\ud835\udc66) and the mean response (\ud835\udc66\u0305).<\/p>\n<\/div>\n<\/div>\n<div class=\"textLayer\">\n<div class=\"textbox key-takeaways\">\n<h3>Question 3<\/h3>\n<p>3) What does SSRegression measure? Mark the previous scatterplot to show the deviations between the predicted values of the response variable (\ud835\udc66\u0302) and the mean response (\ud835\udc66\u0305).<\/p>\n<\/div>\n<\/div>\n<div class=\"textLayer\">\n<div class=\"textbox key-takeaways\">\n<h3>Question 4<\/h3>\n<p>4) What does SSResiduals measure? 
Mark the previous scatterplot to show the deviations between the actual values of the response variable (\ud835\udc66) and the predicted values of the response variable (\ud835\udc66\u0302).<\/p>\n<\/div>\n<\/div>\n<div class=\"textLayer\">To better understand what each sum of squares measures, let\u2019s imagine extreme scenarios where the sums of squares are equal to 0.<\/div>\n<div><\/div>\n<div class=\"textLayer\">\n<div class=\"textbox key-takeaways\">\n<h3>Question 5<\/h3>\n<div class=\"textLayer\">5) Fill in the following table by sketching three scatterplots that satisfy the given criteria.<\/div>\n<div class=\"textLayer\">SSTotal = 0; SSRegression = 0 (but SSTotal \u2260 0); SSResiduals = 0 (but SSTotal \u2260 0)<\/div>\n<\/div>\n<\/div>\n<div class=\"textLayer\">\n<table>\n<tbody>\n<tr>\n<td>SSTotal = 0<\/td>\n<td>SSRegression = 0<\/p>\n<p>(but SSTotal \u2260 0)<\/td>\n<td>SSResiduals = 0<\/p>\n<p>(but SSTotal \u2260 0)<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>An ANOVA is a way to \u201cpartition\u201d the variation in the data. 
In other words, it divides the total variation into two parts: the part that is explained by the regression model (SSRegression) and the part that remains unexplained (SSResiduals).<\/p><\/div>\n<div><\/div>\n<div class=\"textLayer\">[latex]SSTotal=SSRegression+SSResiduals[\/latex]<\/div>\n<div><\/div>\n<div class=\"textLayer\">In In-Class Activity 6.C, you learned that the coefficient of determination, \ud835\udc45<sup>2<\/sup>, is interpreted as the percentage of variation in the response variable that can be explained by the linear relationship with an explanatory variable. This quantity can be expressed using the sums of squares. Note that \ud835\udc45<sup>2<\/sup> can be expressed as a percentage or as a proportion.<\/div>\n<div><\/div>\n<div class=\"textLayer\">[latex]R^{2}=\\frac{variation\\;explained}{total\\;variation}=\\frac{SSRegression}{SSTotal}=1-\\frac{SSResiduals}{SSTotal}[\/latex]<\/div>\n<div><\/div>\n<div class=\"textLayer\">\n<div class=\"textbox key-takeaways\">\n<h3>Question 6<\/h3>\n<div class=\"textLayer\">6) Revisit the extreme examples that you sketched in Question 5.<\/div>\n<div class=\"textLayer\">Part A: When SSRegression = 0, \ud835\udc45<sup>2<\/sup> = _____.<\/div>\n<div class=\"textLayer\">Part B: When SSResiduals = 0, \ud835\udc45<sup>2<\/sup> = _____.<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"bp-page-3\" class=\"page\" data-page-number=\"3\" data-loaded=\"true\">\n<div class=\"ba-Layer ba-Layer--highlight\" data-resin-fileid=\"910629731994\" data-resin-iscurrent=\"true\" data-resin-feature=\"annotations\" data-testid=\"ba-Layer--highlight\"><\/div>\n<\/div>\n<div id=\"bp-page-4\" class=\"page\" data-page-number=\"4\" data-loaded=\"true\">\n<div 
class=\"textLayer\">Sums of squares can be organized in an ANOVA table. The following table provides the information necessary to calculate an F-statistic in the context of regression. Note that \ud835\udc5b = sample size and \ud835\udc5d = number of predictors. In simple linear regression, \ud835\udc5d = 1.<\/div>\n<\/div>\n<div>\n<table style=\"border-collapse: collapse; width: 100%;\">\n<tbody>\n<tr>\n<td style=\"width: 20%;\"><strong>Source<\/strong><\/td>\n<td style=\"width: 20%;\"><strong>Df<\/strong><\/td>\n<td style=\"width: 20%;\"><strong>Sum sq<\/strong><\/td>\n<td style=\"width: 20%;\"><strong>Mean sq<\/strong><\/td>\n<td style=\"width: 20%;\"><strong>F value<\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 20%;\">Regression<\/td>\n<td style=\"width: 20%;\">p<\/td>\n<td style=\"width: 20%;\">SSRegression<\/td>\n<td style=\"width: 20%;\">[latex]MSRegression =\\frac{SSRegression}{p}[\/latex]<\/td>\n<td style=\"width: 20%;\">[latex]\\frac{MSRegression}{MSResiduals}[\/latex]<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 20%;\">Residuals<\/td>\n<td style=\"width: 20%;\">n-1-p<\/td>\n<td style=\"width: 20%;\">SSResiduals<\/td>\n<td style=\"width: 20%;\">[latex]MSResiduals =\\frac{SSResiduals}{n-1-p}[\/latex]<\/td>\n<td style=\"width: 20%;\"><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 20%;\"><strong>Total<\/strong><\/td>\n<td style=\"width: 20%;\"><strong>n-1<\/strong><\/td>\n<td style=\"width: 20%;\"><strong>SSTotal<\/strong><\/td>\n<td style=\"width: 20%;\"><\/td>\n<td style=\"width: 20%;\"><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div id=\"bp-page-4\" class=\"page\" data-page-number=\"4\" data-loaded=\"true\">\n<div class=\"textLayer\">A statistics student from San Antonio, Texas completed a project to explore whether there is a relationship between neighborhood income and access to organic items at local grocery stores. Specifically, the student counted the number of organic vegetables offered at 37 H.E.B. grocery stores. 
She then cross-referenced these data with the average household incomes (in dollars) for the zip codes where the stores are located.<sup>1<\/sup><\/div>\n<div class=\"textLayer\"><sup>1<\/sup> Scenario adapted from Skew the Script: https:\/\/skewthescript.org\/3-1<\/div>\n<div><\/div>\n<div class=\"textLayer\">\n<div class=\"textbox key-takeaways\">\n<h3>Question 7<\/h3>\n<p>7) Enter the grocery store data into the DCMP Linear Regression tool at https:\/\/dcmathpathways.shinyapps.io\/LinearRegression\/. Select \u201cAverage income in zip code\u201d as the explanatory (\ud835\udc4b) variable and \u201cNumber of organic items offered\u201d as the response (\ud835\udc4c) variable. Under \u201cRegression Options,\u201d click the box to show the ANOVA table. Use the information from the tool to fill out the following table.<\/p>\n<table>\n<tbody>\n<tr>\n<td>Source<\/td>\n<td>Df<\/td>\n<td>Sum sq<\/td>\n<td>Mean sq<\/td>\n<td>F value<\/td>\n<\/tr>\n<tr>\n<td>Regression<\/td>\n<td><\/td>\n<td><\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>Residuals<\/td>\n<td><\/td>\n<td><\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td><strong>Total<\/strong><\/td>\n<td><\/td>\n<td><\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The ANOVA table will also include a P-value, which gives the probability of obtaining an F-statistic as large as or larger than the one in the sample if the null hypothesis were true. An ANOVA F-test can be used to test the population slope for simple linear regression, the same scenario where you used a t-test in In-Class Activity 16.A:<\/p>\n<\/div>\n<\/div>\n<div class=\"textLayer\">[latex]H_{0}:\\beta=0[\/latex] vs. 
[latex]H_{A}:\\beta\\neq0[\/latex], where [latex]\\beta[\/latex] = the population slope relating the number of organic items offered and the average income in zip code.<\/div>\n<div class=\"textLayer\">To model the values of the F-statistic that would occur if the null hypothesis was true and the assumptions for inference were met, you will use an F Distribution with [latex]df_{1}=p[\/latex] and [latex]df_{2}=n-1-p[\/latex].<\/div>\n<div><\/div>\n<div class=\"textLayer\">\n<div class=\"textbox key-takeaways\">\n<h3>Question 8<\/h3>\n<p>8) Use the DCMP F Distribution tool at https:\/\/dcmathpathways.shinyapps.io\/FDist\/ to calculate a P-value that measures the evidence of an association between the number of organic items offered and the average household income. Include a sketch of the F Distribution. (You may assume that a linear model is appropriate and all assumptions for inference are met.)<\/p>\n<\/div>\n<\/div>\n<div class=\"textLayer\">\n<div class=\"textbox key-takeaways\">\n<h3>Question 9<\/h3>\n<p>9) At the \ud835\udefc = 0.05 significance level, do these data provide sufficient evidence of an association between the number of organic items offered and the average household income in the neighborhood? State your conclusion in context. Suppose you had conducted a t-test for the slope instead of an F-test for the slope in this scenario. The value of the t-statistic would have been 7.50, the square root of the F-statistic. The P-value for the two-sided t-test would be the same as the P-value for the F-test.<\/p>\n<\/div>\n<\/div>\n<div class=\"textLayer\">\n<div class=\"textbox key-takeaways\">\n<h3>Question 10<\/h3>\n<div class=\"textLayer\">10) As the F-statistic gets larger, the P-value gets smaller, indicating stronger evidence of an association. 
Answer Parts A through C to understand which factors affect the strength of evidence.<\/div>\n<div class=\"textLayer\">Part A: As the slope of the regression line gets steeper, the evidence of an association gets ______ (stronger\/weaker).<\/div>\n<div class=\"textLayer\">Part B: As the spread of the points around the regression line increases, the evidence of an association gets ______ (stronger\/weaker).<\/div>\n<div class=\"textLayer\">Part C: Assuming that the slope and the spread of the points around the regression line stay about the same, as the sample size increases, the evidence of an association gets ______ (stronger\/weaker).<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"author":23592,"menu_order":68,"template":"","meta":{"_candela_citation":"[]","CANDELA_OUTCOMES_GUID":"","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-5526","chapter","type-chapter","status-publish","hentry"],"part":5514,"_links":{"self":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters\/5526","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/users\/23592"}],"version-history":[{"count":3,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters\/5526\/revisions"}],"predecessor-version":[{"id":5633,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters\/5526\/revisions\/5633"}],"part":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsm
ockup\/wp-json\/pressbooks\/v2\/parts\/5514"}],"metadata":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters\/5526\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/media?parent=5526"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapter-type?post=5526"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/contributor?post=5526"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/license?post=5526"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}