{"id":146,"date":"2022-06-13T19:51:14","date_gmt":"2022-06-13T19:51:14","guid":{"rendered":"https:\/\/courses.lumenlearning.com\/nhti-introstats\/chapter\/solutions-to-selected-exercises\/"},"modified":"2022-06-13T19:51:14","modified_gmt":"2022-06-13T19:51:14","slug":"solutions-to-selected-exercises","status":"publish","type":"chapter","link":"https:\/\/courses.lumenlearning.com\/nhti-introstats\/chapter\/solutions-to-selected-exercises\/","title":{"raw":"Solutions to Selected Exercises","rendered":"Solutions to Selected Exercises"},"content":{"raw":"\n<h2>Introduction to Multiple Regression<\/h2>\n<h3>Exercise 1: Baby Weights, Part I<\/h3>\n<ol><li>[latex]\\widehat{\\text{baby_weight}}=123.05-8.94\\times\\text{smoke}[\/latex]<\/li>\n\t<li>The estimated body weight of babies born&nbsp;to smoking mothers is 8.94 ounces lower than&nbsp;that of babies born to non-smoking mothers. Smoker:&nbsp;123.05 \u2212 8.94 \u00d7 1 = 114.11 ounces. Non-smoker:&nbsp;123.05 \u2212 8.94 \u00d7 0 = 123.05 ounces.<\/li>\n\t<li><em>H<\/em><sub>0<\/sub>:&nbsp;\u03b2<sub>1<\/sub> = 0.\n<em>H<sub>A<\/sub><\/em>: \u03b2<sub>1<\/sub> \u2260 0.\n<em>T<\/em> = \u22128.65, and the&nbsp;<em>p<\/em>-value is approximately 0.\nSince the <em>p<\/em>-value&nbsp;is very small, we reject <em>H<\/em><sub>0<\/sub>. The data provide&nbsp;strong evidence that the true slope parameter is&nbsp;different from 0 and that there is an association&nbsp;between birth weight and smoking. 
Furthermore, having rejected&nbsp;<em>H<\/em><sub>0<\/sub>, we can conclude that&nbsp;smoking is associated with lower birth weights.<\/li>\n<\/ol><h3>Exercise 3<strong>: <\/strong><strong>Baby weights, Part III<\/strong><\/h3>\n<ol><li>[latex]\\widehat{\\text{baby_weight}}=-80.41+0.44\\times\\text{gestation}-3.33\\times\\text{parity}-0.01\\times\\text{age}+1.15\\times\\text{height}+0.05\\times\\text{weight}-8.40\\times\\text{smoke}[\/latex].<\/li>\n\t<li>\u03b2<sub>gestation<\/sub>: The model predicts a 0.44 ounce&nbsp;increase in the birth weight of the baby for each&nbsp;additional day of pregnancy, all else held constant.\n\u03b2<sub>age<\/sub>: The model predicts a 0.01 ounce&nbsp;decrease in the birth weight of the baby for each&nbsp;additional year in mother\u2019s age, all else held constant.<\/li>\n\t<li>Parity might be correlated with one&nbsp;of the other variables in the model, which complicates model estimation.<\/li>\n\t<li>[latex]\\widehat{\\text{baby_weight}}=120.58,\\quad e=120-120.58=-0.58[\/latex].&nbsp;The&nbsp;model over-predicts this baby\u2019s birth weight.<\/li>\n\t<li><em>R<\/em><sup>2<\/sup><em><sub>adj<\/sub><\/em> = 0.2468.<\/li>\n<\/ol><h3>Exercise 5:&nbsp;<strong>GPA<\/strong><\/h3>\n(a) (-0.32, 0.16). 
We are 95% confident that&nbsp;male students on average have GPAs 0.32 points&nbsp;lower to 0.16 points higher than females when&nbsp;controlling for the other variables in the model.\n\n(b) Yes, since the <em>p<\/em>-value is larger than 0.05 in&nbsp;all cases (not including the intercept).\n<h2>Model Selection<\/h2>\n<h3><strong>Exercise&nbsp;7: Baby weights, Part IV<\/strong><\/h3>\nRemove age.\n<h3><strong>Exercise 9: <\/strong>Baby weights, Part V<\/h3>\nBased on the <em>p<\/em>-value alone, either gestation&nbsp;or smoke should be added to the model first.&nbsp;However, since the adjusted <em>R<\/em><sup>2<\/sup>&nbsp;with gestation is higher, it would be preferable&nbsp;to add gestation in the first step of the forward-selection algorithm. (Other explanations are&nbsp;possible. For instance, it would be reasonable&nbsp;to use only the adjusted <em>R<\/em><sup>2<\/sup>.)\n<h3>Exercise 11: Movie lovers, Part I<\/h3>\nShe should use <em>p<\/em>-value selection since she&nbsp;is interested in finding out about significant predictors, not just optimizing predictions.\n<h2>Checking Model Assumptions Using Graphs<\/h2>\n<h3><strong>Exercise 13<\/strong>: Baby weights, Part V<\/h3>\n<strong>Nearly normal residuals:<\/strong> The normal&nbsp;probability plot shows a nearly normal distribution of the residuals; however, there are some&nbsp;minor irregularities at the tails. With a data set&nbsp;so large, these would not be a concern.\n\n<strong>Constant variability of residuals:<\/strong> The scatterplot of the residuals versus the fitted values does&nbsp;not show any overall structure. However, observations that have very low or very high fitted values appear to also have somewhat larger outliers. 
In addition, the residuals do appear to&nbsp;have constant variability between the two parity&nbsp;and smoking status groups, though these items&nbsp;are relatively minor.\n\n<strong>Independent residuals:<\/strong> The scatterplot of residuals versus the order of data collection shows a&nbsp;random scatter, suggesting that there are no apparent structures related to the order in which the data&nbsp;were collected.\n\n<strong>Linear relationships between the response variable and numerical explanatory variables:<\/strong> The&nbsp;residuals vs. height and weight of mother are&nbsp;randomly distributed around 0. The residuals&nbsp;vs. length of gestation plot also does not show&nbsp;any clear or strong remaining structures, with&nbsp;the possible exception of very short or long gestations. The rest of the residuals do appear to&nbsp;be randomly distributed around 0.\n\nAll concerns raised here are relatively mild.&nbsp;There are some outliers, but there is so much&nbsp;data that the influence of such observations will&nbsp;be minor.\n<h2>Introduction to Logistic Regression<\/h2>\n<h3><strong>Exercise 15<\/strong>:&nbsp;Possum classification, Part I<\/h3>\n(a) There are a few potential outliers, e.g.&nbsp;on the left in the total length variable, but&nbsp;nothing that will be of serious concern in a data&nbsp;set this large.\n\n(b) When coefficient estimates&nbsp;are sensitive to which variables are included in&nbsp;the model, this typically indicates that some&nbsp;variables are collinear. For example, a possum\u2019s gender may be related to its head length,&nbsp;which would explain why the coefficient (and <em>p<\/em>-value) for sex male changed when we removed&nbsp;the head length variable. 
Likewise, a possum\u2019s&nbsp;skull width is likely to be related to its head&nbsp;length, probably even much more closely related&nbsp;than the head length was to gender.\n<h3><strong>Exercise 17:<\/strong>&nbsp;Possum classification, Part II<\/h3>\n(a) The logistic model relating [latex]\\hat{p}_i[\/latex] to the&nbsp;predictors may be written as [latex]\\log\\left(\\frac{\\hat{p}_i}{1-\\hat{p}_i}\\right)=33.5095-1.4207\\times\\text{sex_male}_i-0.2787\\times\\text{skull_width}_i+0.5687\\times\\text{total_length}_i-1.8057\\times\\text{tail_length}_i[\/latex]. Only total length has a positive&nbsp;association with a possum being from Victoria.\n\n(b) [latex]\\hat{p}=0.0062[\/latex]. While the probability is very&nbsp;near zero, we have not run diagnostics on the&nbsp;model. We might also be a little skeptical that&nbsp;the model will remain accurate for a possum&nbsp;found in a US zoo. For example, perhaps the&nbsp;zoo selected a possum with specific characteristics but only looked in one region. On the&nbsp;other hand, it is encouraging that the possum&nbsp;was caught in the wild. (Answers regarding the&nbsp;reliability of the model probability will vary.)\n\n&nbsp;\n","rendered":"<h2>Introduction to Multiple Regression<\/h2>\n<h3>Exercise 1: Baby Weights, Part I<\/h3>\n<ol>\n<li>[latex]\\widehat{\\text{baby_weight}}=123.05-8.94\\times\\text{smoke}[\/latex]<\/li>\n<li>The estimated body weight of babies born&nbsp;to smoking mothers is 8.94 ounces lower than&nbsp;that of babies born to non-smoking mothers. Smoker:&nbsp;123.05 \u2212 8.94 \u00d7 1 = 114.11 ounces. Non-smoker:&nbsp;123.05 \u2212 8.94 \u00d7 0 = 123.05 ounces.<\/li>\n<li><em>H<\/em><sub>0<\/sub>:&nbsp;\u03b2<sub>1<\/sub> = 0.<br \/>\n<em>H<sub>A<\/sub><\/em>: \u03b2<sub>1<\/sub> \u2260 0.<br \/>\n<em>T<\/em> = \u22128.65, and the&nbsp;<em>p<\/em>-value is approximately 0.<br \/>\nSince the <em>p<\/em>-value&nbsp;is very small, we reject <em>H<\/em><sub>0<\/sub>. 
The data provide&nbsp;strong evidence that the true slope parameter is&nbsp;different from 0 and that there is an association&nbsp;between birth weight and smoking. Furthermore, having rejected&nbsp;<em>H<\/em><sub>0<\/sub>, we can conclude that&nbsp;smoking is associated with lower birth weights.<\/li>\n<\/ol>\n<h3>Exercise 3<strong>: <\/strong><strong>Baby weights, Part III<\/strong><\/h3>\n<ol>\n<li>[latex]\\widehat{\\text{baby_weight}}=-80.41+0.44\\times\\text{gestation}-3.33\\times\\text{parity}-0.01\\times\\text{age}+1.15\\times\\text{height}+0.05\\times\\text{weight}-8.40\\times\\text{smoke}[\/latex].<\/li>\n<li>\u03b2<sub>gestation<\/sub>: The model predicts a 0.44 ounce&nbsp;increase in the birth weight of the baby for each&nbsp;additional day of pregnancy, all else held constant.<br \/>\n\u03b2<sub>age<\/sub>: The model predicts a 0.01 ounce&nbsp;decrease in the birth weight of the baby for each&nbsp;additional year in mother\u2019s age, all else held constant.<\/li>\n<li>Parity might be correlated with one&nbsp;of the other variables in the model, which complicates model estimation.<\/li>\n<li>[latex]\\widehat{\\text{baby_weight}}=120.58,\\quad e=120-120.58=-0.58[\/latex].&nbsp;The&nbsp;model over-predicts this baby\u2019s birth weight.<\/li>\n<li><em>R<\/em><sup>2<\/sup><em><sub>adj<\/sub><\/em> = 0.2468.<\/li>\n<\/ol>\n<h3>Exercise 5:&nbsp;<strong>GPA<\/strong><\/h3>\n<p>(a) (-0.32, 0.16). 
We are 95% confident that&nbsp;male students on average have GPAs 0.32 points&nbsp;lower to 0.16 points higher than females when&nbsp;controlling for the other variables in the model.<\/p>\n<p>(b) Yes, since the <em>p<\/em>-value is larger than 0.05 in&nbsp;all cases (not including the intercept).<\/p>\n<h2>Model Selection<\/h2>\n<h3><strong>Exercise&nbsp;7: Baby weights, Part IV<\/strong><\/h3>\n<p>Remove age.<\/p>\n<h3><strong>Exercise 9: <\/strong>Baby weights, Part V<\/h3>\n<p>Based on the <em>p<\/em>-value alone, either gestation&nbsp;or smoke should be added to the model first.&nbsp;However, since the adjusted <em>R<\/em><sup>2<\/sup>&nbsp;with gestation is higher, it would be preferable&nbsp;to add gestation in the first step of the forward-selection algorithm. (Other explanations are&nbsp;possible. For instance, it would be reasonable&nbsp;to use only the adjusted <em>R<\/em><sup>2<\/sup>.)<\/p>\n<h3>Exercise 11: Movie lovers, Part I<\/h3>\n<p>She should use <em>p<\/em>-value selection since she&nbsp;is interested in finding out about significant predictors, not just optimizing predictions.<\/p>\n<h2>Checking Model Assumptions Using Graphs<\/h2>\n<h3><strong>Exercise 13<\/strong>: Baby weights, Part V<\/h3>\n<p><strong>Nearly normal residuals:<\/strong> The normal&nbsp;probability plot shows a nearly normal distribution of the residuals; however, there are some&nbsp;minor irregularities at the tails. With a data set&nbsp;so large, these would not be a concern.<\/p>\n<p><strong>Constant variability of residuals:<\/strong> The scatterplot of the residuals versus the fitted values does&nbsp;not show any overall structure. However, observations that have very low or very high fitted values appear to also have somewhat larger outliers. 
In addition, the residuals do appear to&nbsp;have constant variability between the two parity&nbsp;and smoking status groups, though these items&nbsp;are relatively minor.<\/p>\n<p><strong>Independent residuals:<\/strong> The scatterplot of residuals versus the order of data collection shows a&nbsp;random scatter, suggesting that there are no apparent structures related to the order in which the data&nbsp;were collected.<\/p>\n<p><strong>Linear relationships between the response variable and numerical explanatory variables:<\/strong> The&nbsp;residuals vs. height and weight of mother are&nbsp;randomly distributed around 0. The residuals&nbsp;vs. length of gestation plot also does not show&nbsp;any clear or strong remaining structures, with&nbsp;the possible exception of very short or long gestations. The rest of the residuals do appear to&nbsp;be randomly distributed around 0.<\/p>\n<p>All concerns raised here are relatively mild.&nbsp;There are some outliers, but there is so much&nbsp;data that the influence of such observations will&nbsp;be minor.<\/p>\n<h2>Introduction to Logistic Regression<\/h2>\n<h3><strong>Exercise 15<\/strong>:&nbsp;Possum classification, Part I<\/h3>\n<p>(a) There are a few potential outliers, e.g.&nbsp;on the left in the total length variable, but&nbsp;nothing that will be of serious concern in a data&nbsp;set this large.<\/p>\n<p>(b) When coefficient estimates&nbsp;are sensitive to which variables are included in&nbsp;the model, this typically indicates that some&nbsp;variables are collinear. For example, a possum\u2019s gender may be related to its head length,&nbsp;which would explain why the coefficient (and <em>p<\/em>-value) for sex male changed when we removed&nbsp;the head length variable. 
Likewise, a possum\u2019s&nbsp;skull width is likely to be related to its head&nbsp;length, probably even much more closely related&nbsp;than the head length was to gender.<\/p>\n<h3><strong>Exercise 17:<\/strong>&nbsp;Possum classification, Part II<\/h3>\n<p>(a) The logistic model relating [latex]\\hat{p}_i[\/latex] to the&nbsp;predictors may be written as [latex]\\log\\left(\\frac{\\hat{p}_i}{1-\\hat{p}_i}\\right)=33.5095-1.4207\\times\\text{sex_male}_i-0.2787\\times\\text{skull_width}_i+0.5687\\times\\text{total_length}_i-1.8057\\times\\text{tail_length}_i[\/latex]. Only total length has a positive&nbsp;association with a possum being from Victoria.<\/p>\n<p>(b) [latex]\\hat{p}=0.0062[\/latex]. While the probability is very&nbsp;near zero, we have not run diagnostics on the&nbsp;model. We might also be a little skeptical that&nbsp;the model will remain accurate for a possum&nbsp;found in a US zoo. For example, perhaps the&nbsp;zoo selected a possum with specific characteristics but only looked in one region. On the&nbsp;other hand, it is encouraging that the possum&nbsp;was caught in the wild. (Answers regarding the&nbsp;reliability of the model probability will vary.)<\/p>\n<p>&nbsp;<\/p>\n\n\t\t\t <section class=\"citations-section\" role=\"contentinfo\">\n\t\t\t <h3>Candela Citations<\/h3>\n\t\t\t\t\t <div>\n\t\t\t\t\t\t <div id=\"citation-list-146\">\n\t\t\t\t\t\t\t <div class=\"licensing\"><div class=\"license-attribution-dropdown-subheading\">CC licensed content, Shared previously<\/div><ul class=\"citation-list\"><li>OpenIntro Statistics. <strong>Authored by<\/strong>: David M Diez, Christopher D Barr, and Mine Cetinkaya-Rundel. <strong>Provided by<\/strong>: OpenIntro. <strong>Located at<\/strong>: <a target=\"_blank\" href=\"https:\/\/www.openintro.org\/stat\/textbook.php\">https:\/\/www.openintro.org\/stat\/textbook.php<\/a>. <strong>License<\/strong>: <em><a target=\"_blank\" rel=\"license\" href=\"https:\/\/creativecommons.org\/licenses\/by-sa\/4.0\/\">CC BY-SA: Attribution-ShareAlike<\/a><\/em>. 
<strong>License Terms<\/strong>: This textbook is available under a Creative Commons license. Visit openintro.org for a free  PDF, to download the textbook&#039;s source files.<\/li><\/ul><\/div>\n\t\t\t\t\t\t <\/div>\n\t\t\t\t\t <\/div>\n\t\t\t <\/section>","protected":false},"author":395986,"menu_order":6,"template":"","meta":{"_candela_citation":"[{\"type\":\"cc\",\"description\":\"OpenIntro Statistics\",\"author\":\"David M Diez, Christopher D Barr, and Mine Cetinkaya-Rundel\",\"organization\":\"OpenIntro\",\"url\":\"https:\/\/www.openintro.org\/stat\/textbook.php\",\"project\":\"\",\"license\":\"cc-by-sa\",\"license_terms\":\"This textbook is available under a Creative Commons license. Visit openintro.org for a free  PDF, to download the textbook's source files.\"}]","CANDELA_OUTCOMES_GUID":"","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-146","chapter","type-chapter","status-publish","hentry"],"part":140,"_links":{"self":[{"href":"https:\/\/courses.lumenlearning.com\/nhti-introstats\/wp-json\/pressbooks\/v2\/chapters\/146","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/courses.lumenlearning.com\/nhti-introstats\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/courses.lumenlearning.com\/nhti-introstats\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/nhti-introstats\/wp-json\/wp\/v2\/users\/395986"}],"version-history":[{"count":0,"href":"https:\/\/courses.lumenlearning.com\/nhti-introstats\/wp-json\/pressbooks\/v2\/chapters\/146\/revisions"}],"part":[{"href":"https:\/\/courses.lumenlearning.com\/nhti-introstats\/wp-json\/pressbooks\/v2\/parts\/140"}],"metadata":[{"href":"https:\/\/courses.lumenlearning.com\/nhti-introstats\/wp-json\/pressbooks\/v2\/chapters\/146\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/courses.lumenlearning.com\/nhti-in
trostats\/wp-json\/wp\/v2\/media?parent=146"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/nhti-introstats\/wp-json\/pressbooks\/v2\/chapter-type?post=146"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/nhti-introstats\/wp-json\/wp\/v2\/contributor?post=146"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/nhti-introstats\/wp-json\/wp\/v2\/license?post=146"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}