{"id":978,"date":"2016-04-21T22:43:35","date_gmt":"2016-04-21T22:43:35","guid":{"rendered":"https:\/\/courses.lumenlearning.com\/introstats1xmaster\/?post_type=chapter&#038;p=978"},"modified":"2016-04-21T22:43:35","modified_gmt":"2016-04-21T22:43:35","slug":"exercises","status":"publish","type":"chapter","link":"https:\/\/courses.lumenlearning.com\/ntcc-introstats1\/chapter\/exercises\/","title":{"raw":"Exercises","rendered":"Exercises"},"content":{"raw":"<h2>Introduction to Multiple Regression<\/h2>\n<h3>Exercise 1: Baby Weights, Part I<\/h3>\nThe Child Health and Development Studies investigate a range of\u00a0topics. One study considered all pregnancies between 1960 and 1967 among women in the Kaiser Foundation Health Plan in the San Francisco East Bay area. Here, we study the relationship between smoking and weight of the baby. The variable smoke is coded 1 if the mother is a smoker, and 0 if not. The summary table below shows the results of a linear regression model for predicting the average birth weight of babies, measured in ounces, based on the smoking status of\u00a0the mother.[footnote]Child\u00a0Health and Development Studies, Baby weights data set.[\/footnote]\n<table><tbody><tr><td\/>\n<td>Estimate<\/td>\n<td>Std. Error<\/td>\n<td>t-value<\/td>\n<td>Pr(&gt; |t|)<\/td>\n<\/tr><tr><td>(Intercept)<\/td>\n<td>123.05<\/td>\n<td>0.65<\/td>\n<td>189.60<\/td>\n<td>0.0000<\/td>\n<\/tr><tr><td>smoke<\/td>\n<td>\u20138.94<\/td>\n<td>1.03<\/td>\n<td>\u20138.65<\/td>\n<td>0.0000<\/td>\n<\/tr><\/tbody><\/table>\nThe variability within the smokers and non-smokers are about equal and the distributions are\u00a0symmetric. With these conditions satisfied, it is reasonable to apply the model. 
(Note that we\u00a0don't need to check linearity since the predictor has only two levels.)\n<ol><li>Write the equation of the regression line.<\/li>\n\t<li>Interpret the slope in this context, and calculate the predicted birth weight of babies born to\u00a0smoker and non-smoker mothers.<\/li>\n\t<li>Is there a statistically significant relationship between the average birth weight and smoking?<\/li>\n<\/ol><h3>Exercise <strong>2: Baby weights, Part II<\/strong><\/h3>\nExercise 1 introduces a data set on birth weight of babies.\u00a0Another variable we consider is parity, which is 0 if the child is the first born, and 1 otherwise. The summary table below shows the results of a linear regression model for predicting the average\u00a0birth weight of babies, measured in ounces, from parity.\n<table><tbody><tr><td\/>\n<td>Estimate<\/td>\n<td>Std. Error<\/td>\n<td>t-value<\/td>\n<td>Pr(&gt; |t|)<\/td>\n<\/tr><tr><td>(Intercept)<\/td>\n<td>120.07<\/td>\n<td>0.60<\/td>\n<td>199.94<\/td>\n<td>0.0000<\/td>\n<\/tr><tr><td>parity<\/td>\n<td>\u20131.93<\/td>\n<td>1.19<\/td>\n<td>\u20131.62<\/td>\n<td>0.1052<\/td>\n<\/tr><\/tbody><\/table><ol><li>Write the equation of the regression line.<\/li>\n\t<li>Interpret the slope in this context, and calculate the predicted birth weight of first borns and\u00a0others.<\/li>\n\t<li>Is there a statistically significant relationship between the average birth weight and parity?<\/li>\n<\/ol><h3>Exercise 3<strong>: <\/strong><strong>Baby weights, Part III<\/strong><\/h3>\nWe considered the variables smoke and parity, one at a time, in\u00a0modeling birth weights of babies in Exercises 1 and 2. A more realistic approach to modeling infant weights is to consider all possibly related variables at once. Other variables of interest include length of pregnancy in days (gestation), mother's age in years (age), mother's height in inches (height), and mother's pregnancy weight in pounds (weight). 
Below are three observations\u00a0from this data set.\n<table><tbody><tr><td\/>\n<td>bwt<\/td>\n<td>gestation<\/td>\n<td>parity<\/td>\n<td>age<\/td>\n<td>height<\/td>\n<td>weight<\/td>\n<td>smoke<\/td>\n<\/tr><tr><td style=\"text-align: center;\">1<\/td>\n<td style=\"text-align: center;\">120<\/td>\n<td style=\"text-align: center;\">284<\/td>\n<td style=\"text-align: center;\">0<\/td>\n<td style=\"text-align: center;\">27<\/td>\n<td style=\"text-align: center;\">62<\/td>\n<td style=\"text-align: center;\">100<\/td>\n<td style=\"text-align: center;\">0<\/td>\n<\/tr><tr><td style=\"text-align: center;\">2<\/td>\n<td style=\"text-align: center;\">113<\/td>\n<td style=\"text-align: center;\">282<\/td>\n<td style=\"text-align: center;\">0<\/td>\n<td style=\"text-align: center;\">33<\/td>\n<td style=\"text-align: center;\">64<\/td>\n<td style=\"text-align: center;\">135<\/td>\n<td style=\"text-align: center;\">0<\/td>\n<\/tr><tr><td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<\/tr><tr><td style=\"text-align: center;\">1236<\/td>\n<td style=\"text-align: center;\">117<\/td>\n<td style=\"text-align: center;\">297<\/td>\n<td style=\"text-align: center;\">0<\/td>\n<td style=\"text-align: center;\">38<\/td>\n<td style=\"text-align: center;\">65<\/td>\n<td style=\"text-align: center;\">129<\/td>\n<td style=\"text-align: center;\">0<\/td>\n<\/tr><\/tbody><\/table>\nThe summary table below shows the results of a regression model for predicting the average birth\u00a0weight of babies based on all of the variables 
included in the data set.\n<table><tbody><tr><td\/>\n<td>Estimate<\/td>\n<td>Std. Error<\/td>\n<td>t-value<\/td>\n<td>Pr(&gt;|t|)<\/td>\n<\/tr><tr><td>(Intercept)<\/td>\n<td>\u201380.41<\/td>\n<td>14.35<\/td>\n<td>\u20135.60<\/td>\n<td>0.0000<\/td>\n<\/tr><tr><td>gestation<\/td>\n<td>0.44<\/td>\n<td>0.03<\/td>\n<td>15.26<\/td>\n<td>0.0000<\/td>\n<\/tr><tr><td>parity<\/td>\n<td>\u20133.33<\/td>\n<td>1.13<\/td>\n<td>\u20132.95<\/td>\n<td>0.0033<\/td>\n<\/tr><tr><td>age<\/td>\n<td>\u20130.01<\/td>\n<td>0.09<\/td>\n<td>\u20130.10<\/td>\n<td>0.9170<\/td>\n<\/tr><tr><td>height<\/td>\n<td>1.15<\/td>\n<td>0.21<\/td>\n<td>5.63<\/td>\n<td>0.0000<\/td>\n<\/tr><tr><td>weight<\/td>\n<td>0.05<\/td>\n<td>0.03<\/td>\n<td>1.99<\/td>\n<td>0.0471<\/td>\n<\/tr><tr><td>smoke<\/td>\n<td>\u20138.40<\/td>\n<td>0.95<\/td>\n<td>\u20138.81<\/td>\n<td>0.0000<\/td>\n<\/tr><\/tbody><\/table><ol><li>Write the equation of the regression line that includes all of the variables.<\/li>\n\t<li>Interpret the slopes of gestation and age in this context.<\/li>\n\t<li>The coefficient for parity is different than in the linear model shown in Exercise 2. Why\u00a0might there be a difference?<\/li>\n\t<li>Calculate the residual for the first observation in the data set.<\/li>\n\t<li>The variance of the residuals is 249.28, and the variance of the birth weights of all babies\u00a0in the data set is 332.57. Calculate the <em>R<\/em><sup>2<\/sup> and the adjusted <em>R<\/em><sup>2<\/sup>. Note that there are 1,236\u00a0observations in the data set.<\/li>\n<\/ol><h3>Exercise 4: Absenteeism, Part I<\/h3>\nResearchers interested in the relationship between absenteeism from\u00a0school and certain demographic characteristics of children collected data from 146 randomly sampled students in rural New South Wales, Australia, in a particular school year. 
Below are three\u00a0observations from this data set.\n<table><tbody><tr><td\/>\n<td>eth<\/td>\n<td>sex<\/td>\n<td>lrn<\/td>\n<td>days<\/td>\n<\/tr><tr><td style=\"text-align: center;\">1<\/td>\n<td style=\"text-align: center;\">0<\/td>\n<td style=\"text-align: center;\">1<\/td>\n<td style=\"text-align: center;\">1<\/td>\n<td style=\"text-align: center;\">2<\/td>\n<\/tr><tr><td style=\"text-align: center;\">2<\/td>\n<td style=\"text-align: center;\">0<\/td>\n<td style=\"text-align: center;\">1<\/td>\n<td style=\"text-align: center;\">1<\/td>\n<td style=\"text-align: center;\">11<\/td>\n<\/tr><tr><td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<\/tr><tr><td style=\"text-align: center;\">146<\/td>\n<td style=\"text-align: center;\">1<\/td>\n<td style=\"text-align: center;\">0<\/td>\n<td style=\"text-align: center;\">0<\/td>\n<td style=\"text-align: center;\">37<\/td>\n<\/tr><\/tbody><\/table>\nThe summary table below shows the results of a linear regression model for predicting the average\u00a0number of days absent based on ethnic background (eth: 0 = Aboriginal, 1 = not Aboriginal), sex\u00a0(sex: 0 = female, 1 = male), and learner status (lrn: 0 = average learner, 1 = slow learner).[footnote]W. N. Venables and B. D. Ripley. <em>Modern Applied Statistics with S<\/em>. Fourth Edition. New York: Springer, 2002. Data can also\u00a0be found in the R MASS package.[\/footnote]\n<table><tbody><tr><td\/>\n<td>Estimate<\/td>\n<td>Std. 
Error<\/td>\n<td>t-value<\/td>\n<td>Pr(&gt;|t|)<\/td>\n<\/tr><tr><td>(Intercept)<\/td>\n<td>18.93<\/td>\n<td>2.57<\/td>\n<td>7.37<\/td>\n<td>0.0000<\/td>\n<\/tr><tr><td>eth<\/td>\n<td>\u20139.11<\/td>\n<td>2.60<\/td>\n<td>\u20133.51<\/td>\n<td>0.0000<\/td>\n<\/tr><tr><td>sex<\/td>\n<td>3.10<\/td>\n<td>2.64<\/td>\n<td>1.18<\/td>\n<td>0.2411<\/td>\n<\/tr><tr><td>lrn<\/td>\n<td>2.15<\/td>\n<td>2.65<\/td>\n<td>0.81<\/td>\n<td>0.4177<\/td>\n<\/tr><\/tbody><\/table><ol><li>Write the equation of the regression line.<\/li>\n\t<li>Interpret each of the slopes in this context.<\/li>\n\t<li>Calculate the residual for the first observation in the data set: a student who is Aboriginal,\u00a0male, a slow learner, and missed 2 days of school.<\/li>\n\t<li>The variance of the residuals is 240.57, and the variance of the number of absent days for all\u00a0students in the data set is 264.17. Calculate the <em>R<\/em><sup>2<\/sup>\u00a0and the adjusted <em>R<\/em><sup>2<\/sup>. Note that there are\u00a0146 observations in the data set.<\/li>\n<\/ol><h3>Exercise 5:\u00a0<strong>GPA<\/strong><\/h3>\nA survey of 55 Duke University students asked about their GPA, number of hours\u00a0they study at night, number of nights they go out, and their gender. Summary output of the\u00a0regression model is shown below. Note that male is coded as 1.\n<table><tbody><tr><td\/>\n<td>Estimate<\/td>\n<td>Std. 
Error<\/td>\n<td>t-value<\/td>\n<td>Pr(&gt;|t|)<\/td>\n<\/tr><tr><td>(Intercept)<\/td>\n<td>3.45<\/td>\n<td>0.35<\/td>\n<td>9.85<\/td>\n<td>0.00<\/td>\n<\/tr><tr><td>studyweek<\/td>\n<td>0.00<\/td>\n<td>0.00<\/td>\n<td>0.27<\/td>\n<td>0.79<\/td>\n<\/tr><tr><td>sleepnight<\/td>\n<td>0.01<\/td>\n<td>0.05<\/td>\n<td>0.11<\/td>\n<td>0.91<\/td>\n<\/tr><tr><td>outnight<\/td>\n<td>0.05<\/td>\n<td>0.05<\/td>\n<td>1.01<\/td>\n<td>0.32<\/td>\n<\/tr><tr><td>gender<\/td>\n<td>\u20130.08<\/td>\n<td>0.12<\/td>\n<td>\u20130.68<\/td>\n<td>0.50<\/td>\n<\/tr><\/tbody><\/table><ol><li>Calculate a 95% confidence interval for the coefficient of gender in the model, and interpret it\u00a0in the context of the data.<\/li>\n\t<li>Would you expect a 95% confidence interval for the slopes of the remaining variables to include\u00a00? Explain.<\/li>\n<\/ol><h3>Exercise 6:\u00a0Cherry Trees<\/h3>\nTimber yield is approximately equal to the volume of a tree; however, this value is difficult to measure without first cutting the tree down. Instead, other variables, such as height and diameter, may be used to predict a tree's volume and yield. Researchers wanting to understand the relationship between these variables for black cherry trees collected data from 31 such trees in the Allegheny National Forest, Pennsylvania. Height is measured in feet, diameter\u00a0in inches (at 54 inches above ground), and volume in cubic feet.[footnote]D. J. Hand. <em>A Handbook of Small Data Sets<\/em>. Chapman &amp; Hall\/CRC, 1994.[\/footnote]\n<table><tbody><tr><td\/>\n<td>Estimate<\/td>\n<td>Std. 
Error<\/td>\n<td>t-value<\/td>\n<td>Pr(<em>&gt;<\/em> |t|)<\/td>\n<\/tr><tr><td>(Intercept)<\/td>\n<td>\u201357.99<\/td>\n<td>8.64<\/td>\n<td>\u20136.71<\/td>\n<td>0.00<\/td>\n<\/tr><tr><td>height<\/td>\n<td>0.34<\/td>\n<td>0.13<\/td>\n<td>2.61<\/td>\n<td>0.01<\/td>\n<\/tr><tr><td>diameter<\/td>\n<td>4.71<\/td>\n<td>0.26<\/td>\n<td>17.82<\/td>\n<td>0.00<\/td>\n<\/tr><\/tbody><\/table><ol><li>Calculate a 95% confidence interval for the coefficient of height, and interpret it in the context\u00a0of the data.<\/li>\n\t<li>One tree in this sample is 79 feet tall, has a diameter of 11.3 inches, and is 24.2 cubic feet in\u00a0volume. Determine if the model overestimates or underestimates the volume of this tree, and\u00a0by how much.<\/li>\n<\/ol><h2>Model Selection<\/h2>\n<h3><strong>Exercise\u00a07: Baby weights, Part IV<\/strong><\/h3>\nExercise 3 considers a model that predicts a newborn's weight\u00a0using several predictors (gestation length, parity, age of mother, height of mother, weight of mother, smoking status of mother). 
The table below shows the adjusted <em>R<\/em><sup>2<\/sup> for the full model as well as the adjusted <em>R<\/em><sup>2<\/sup> values for all models we evaluate in the first step of the backwards\u00a0elimination process.\n<table><tbody><tr><td\/>\n<td>Model<\/td>\n<td>Adjusted <em>R<\/em><sup>2<\/sup><\/td>\n<\/tr><tr><td>1<\/td>\n<td>Full model<\/td>\n<td>0.2541<\/td>\n<\/tr><tr><td>2<\/td>\n<td>No gestation<\/td>\n<td>0.1031<\/td>\n<\/tr><tr><td>3<\/td>\n<td>No parity<\/td>\n<td>0.2492<\/td>\n<\/tr><tr><td>4<\/td>\n<td>No age<\/td>\n<td>0.2547<\/td>\n<\/tr><tr><td>5<\/td>\n<td>No height<\/td>\n<td>0.2311<\/td>\n<\/tr><tr><td>6<\/td>\n<td>No weight<\/td>\n<td>0.2536<\/td>\n<\/tr><tr><td>7<\/td>\n<td>No smoking status<\/td>\n<td>0.2072<\/td>\n<\/tr><\/tbody><\/table>\nWhich, if any, variable should be removed from the model first?\n<h3><strong>Exercise 8: <\/strong>Absenteeism, Part II<\/h3>\nExercise 4 considers a model that predicts the number of days\u00a0absent using three predictors: ethnic background (eth), gender (sex), and learner status (lrn). The table below shows the adjusted <em>R<\/em><sup>2<\/sup> for the full model as well as the adjusted <em>R<\/em><sup>2<\/sup> values\u00a0for all models we evaluate in the first step of the backwards elimination process.\n<table><tbody><tr><td\/>\n<td>Model<\/td>\n<td>Adjusted <em>R<\/em><sup>2<\/sup><\/td>\n<\/tr><tr><td>1<\/td>\n<td>Full model<\/td>\n<td>0.0701<\/td>\n<\/tr><tr><td>2<\/td>\n<td>No ethnicity<\/td>\n<td>\u20130.0033<\/td>\n<\/tr><tr><td>3<\/td>\n<td>No sex<\/td>\n<td>0.0676<\/td>\n<\/tr><tr><td>4<\/td>\n<td>No learner status<\/td>\n<td>0.0723<\/td>\n<\/tr><\/tbody><\/table>\nWhich, if any, variable should be removed from the model first?\n<h3><strong>Exercise 9: <\/strong>Baby weights, Part V<\/h3>\nExercise 3 provides regression output for the full model (including\u00a0all explanatory variables available in the data set) for predicting birth weight of babies. 
In this exercise we consider a forward-selection algorithm and add variables to the model one at a time. The table below shows the <em>p<\/em>-value and adjusted <em>R<\/em><sup>2<\/sup>\u00a0of each model that includes only the\u00a0corresponding predictor. Based on this table, which variable should be added to the model first?\n<table><tbody><tr><td>variable<\/td>\n<td>gestation<\/td>\n<td>parity<\/td>\n<td>age<\/td>\n<td>height<\/td>\n<td>weight<\/td>\n<td>smoke<\/td>\n<\/tr><tr><td><em>p<\/em>-value<\/td>\n<td>2.2 \u00d7 10<sup>\u221216<\/sup><\/td>\n<td>0.1052<\/td>\n<td>0.2375<\/td>\n<td>2.97 \u00d7 10<sup>\u221212<\/sup><\/td>\n<td>8.2 \u00d7 10<sup>\u22128<\/sup><\/td>\n<td>2.2 \u00d7 10<sup>\u221216<\/sup><\/td>\n<\/tr><tr><td><em>R<\/em><sup>2<\/sup><em><sub>adj<\/sub><\/em><\/td>\n<td>0.4657<\/td>\n<td>0.0013<\/td>\n<td>0.0003<\/td>\n<td>0.0386<\/td>\n<td>0.0229<\/td>\n<td>0.0569<\/td>\n<\/tr><\/tbody><\/table><h3><strong>Exercise 10: <\/strong>Absenteeism, Part III<\/h3>\nExercise 4 provides regression output for the full model, including all explanatory variables available in the data set, for predicting the number of days absent from school. In this exercise we consider a forward-selection algorithm and add variables to the model one at a time. The table below shows the <em>p<\/em>-value and adjusted <em>R<\/em><sup>2<\/sup>\u00a0of each model that includes only the corresponding predictor. 
Based on this table, which variable should be added to\u00a0the model first?\n<table><tbody><tr><td>variable<\/td>\n<td>ethnicity<\/td>\n<td>sex<\/td>\n<td>learner status<\/td>\n<\/tr><tr><td><em>p<\/em>-value<\/td>\n<td>0.007<\/td>\n<td>0.3142<\/td>\n<td>0.5870<\/td>\n<\/tr><tr><td><em>R<\/em><sup>2<\/sup><em><sub>adj<\/sub><\/em><\/td>\n<td>0.0714<\/td>\n<td>0.0001<\/td>\n<td>0<\/td>\n<\/tr><\/tbody><\/table><h3><strong>Exercise 11: <\/strong>Movie lovers, Part I<\/h3>\nSuppose a social scientist is interested in studying what makes\u00a0audiences love or hate a movie. She collects data on a random sample of movies (genre, length, cast, director, budget, etc.) as well as a measure of the success of each movie (score on a film review aggregator website). If, as part of her research, she is interested in finding out which variables are\u00a0significant predictors of movie success, what type of model selection method should she use?\n<h3><strong>Exercise 12: <\/strong>Movie lovers, Part II<\/h3>\nSuppose an online media streaming company is interested in building a movie recommendation system. The website maintains data on the movies in its database (genre, length, cast, director, budget, etc.) and additionally collects data from its subscribers (demographic information, previously watched movies, how they rated previously watched movies, etc.). The recommendation system will be deemed successful if subscribers actually watch, and rate highly, the movies recommended to them. Should the company use the adjusted <em>R<\/em><sup>2<\/sup>\u00a0or the\u00a0<em>p<\/em>-value approach in selecting variables for its recommendation system?\n<h2>Checking Model Assumptions Using Graphs<\/h2>\n<h3><strong>Exercise 13<\/strong>: Baby weights, Part VI<\/h3>\nExercise 3 presents a regression model for predicting the average\u00a0birth weight of babies based on length of gestation, parity, height, weight, and smoking status of the mother. 
Determine if the model assumptions are met using the plots below. If not, describe\u00a0how to proceed with the analysis.\n\n<img class=\"aligncenter size-full wp-image-1475\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/132\/2016\/04\/21215333\/Figure8_8.jpg\" alt=\"Figure8_8\" width=\"676\" height=\"1044\"\/><h3><strong>Exercise 14<\/strong>:\u00a0GPA and IQ<\/h3>\nA regression model for predicting GPA from gender and IQ was fit, and\u00a0both predictors were found to be statistically significant. Using the plots given below, determine\u00a0if this regression model is appropriate for these data.\n\n<img class=\"aligncenter size-full wp-image-1476\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/132\/2016\/04\/21215336\/Figure8_9.jpg\" alt=\"Figure8_9\" width=\"773\" height=\"955\"\/><h2>Introduction to Logistic Regression<\/h2>\n<h3><strong>Exercise 15<\/strong>:\u00a0Possum classification, Part I<\/h3>\n[caption id=\"attachment_1477\" align=\"alignright\" width=\"300\"]<img class=\"wp-image-1477\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/132\/2016\/04\/21215148\/5653697137_e7c24f507a_z.jpg\" alt=\"photo of an Australian possum. A small four-legged marsupial with short brown fur.\" width=\"300\" height=\"200\"\/> Figure 1. The common brushtail possum of Australia[\/caption]\n\nThe common brushtail possum of the Australia region is a\u00a0bit cuter than its distant cousin, the American opossum. We consider 104 brushtail possums from two regions in Australia, where the possums may be considered a random sample from the population. The first region is Victoria, which is in the eastern half of Australia and traverses the southern coast. The second region consists of New South Wales and\u00a0Queensland, which make up eastern and northeastern Australia.\n\nWe use logistic regression to differentiate between possums in these two regions. 
The outcome\u00a0variable, called population, takes value 1 when a possum is from Victoria and 0 when it is from New South Wales or Queensland. We consider five predictors: sex male (an indicator for a possum being male), head length, skull width, total length, and tail length. Each variable is summarized in a histogram. The full logistic regression model and a reduced model after variable\u00a0selection are summarized in the table.\n\n<img class=\"aligncenter size-full wp-image-1478\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/132\/2016\/04\/21215338\/Figure8_10.png\" alt=\"Figure8_10\" width=\"796\" height=\"364\"\/><table><thead><tr><th\/>\n<th colspan=\"4\">Full Model<\/th>\n<th colspan=\"4\">Reduced Model<\/th>\n<\/tr><\/thead><tbody><tr><td\/>\n<td>Estimate<\/td>\n<td>SE<\/td>\n<td>Z<\/td>\n<td>Pr(&gt;|Z|)<\/td>\n<td>Estimate<\/td>\n<td>SE<\/td>\n<td>Z<\/td>\n<td>Pr(&gt;|Z|)<\/td>\n<\/tr><tr><th>(Intercept)<\/th>\n<td>39.2349<\/td>\n<td>11.5368<\/td>\n<td>3.40<\/td>\n<td>0.0007<\/td>\n<td>33.5095<\/td>\n<td>9.9053<\/td>\n<td>3.38<\/td>\n<td>0.0007<\/td>\n<\/tr><tr><th>sex_male<\/th>\n<td>\u22121.2376<\/td>\n<td>0.6662<\/td>\n<td>\u22121.86<\/td>\n<td>0.0632<\/td>\n<td>\u22121.4207<\/td>\n<td>0.6457<\/td>\n<td>\u22122.20<\/td>\n<td>0.0278<\/td>\n<\/tr><tr><th>head_length<\/th>\n<td>\u22120.1601<\/td>\n<td>0.1386<\/td>\n<td>\u22121.16<\/td>\n<td>0.2480<\/td>\n<td\/>\n<td\/>\n<td\/>\n<td\/>\n<\/tr><tr><th>skull_width<\/th>\n<td>\u22120.2012<\/td>\n<td>0.1327<\/td>\n<td>\u22121.52<\/td>\n<td>0.1294<\/td>\n<td>\u22120.2787<\/td>\n<td>0.1226<\/td>\n<td>\u22122.27<\/td>\n<td>0.0231<\/td>\n<\/tr><tr><th>total_length<\/th>\n<td>0.6488<\/td>\n<td>0.1531<\/td>\n<td>4.24<\/td>\n<td>0.0000<\/td>\n<td>0.5687<\/td>\n<td>0.1322<\/td>\n<td>4.30<\/td>\n<td>0.0000<\/td>\n<\/tr><tr><th>tail_length<\/th>\n<td>\u22121.8708<\/td>\n<td>0.3741<\/td>\n<td>\u22125.00<\/td>\n<td>0.0000<\/td>\n<td>\u22121.8057<\/td>\n<td>0.3599<\/td>\n<td>\u22125.02<\/td>\n<td>0.0000<\/td>\n<\/tr><\/tbody><\/table><ol><li>Examine each of the predictors. Are there any outliers that are likely to have a very large\u00a0influence on the logistic regression model?<\/li>\n\t<li>The summary table for the full model indicates that at least one variable should be eliminated\u00a0when using the <em>p<\/em>-value approach for variable selection: head length. The second component of the table summarizes the reduced model following variable selection. Explain why the\u00a0remaining estimates change between the two models.<\/li>\n<\/ol><h3><strong>Exercise 16:<\/strong>\u00a0Challenger disaster, Part I<\/h3>\nOn January 28, 1986, a routine launch was anticipated for\u00a0the Challenger space shuttle. Seventy-three seconds into the flight, disaster struck: the shuttle broke apart, killing all seven crew members on board. An investigation into the cause of the disaster focused on a critical seal called an O-ring, and it is believed that damage to these O-rings during a shuttle launch may be related to the ambient temperature during the launch. 
The table below summarizes observational data on O-rings for 23 shuttle missions, where the mission order is based on the temperature at the time of the launch.<em> Temp<\/em> gives the temperature in Fahrenheit, <em>Damaged<\/em> represents the number of damaged O-rings, and<em> Undamaged<\/em> represents the number of\u00a0O-rings that were not damaged.\n<table><thead><tr><th>Shuttle Mission<\/th>\n<th>1<\/th>\n<th>2<\/th>\n<th>3<\/th>\n<th>4<\/th>\n<th>5<\/th>\n<th>6<\/th>\n<th>7<\/th>\n<th>8<\/th>\n<th>9<\/th>\n<th>10<\/th>\n<th>11<\/th>\n<th>12<\/th>\n<\/tr><\/thead><tbody><tr><td>Temperature<\/td>\n<td>53<\/td>\n<td>57<\/td>\n<td>58<\/td>\n<td>63<\/td>\n<td>66<\/td>\n<td>67<\/td>\n<td>67<\/td>\n<td>67<\/td>\n<td>68<\/td>\n<td>69<\/td>\n<td>70<\/td>\n<td>70<\/td>\n<\/tr><tr><td>Damaged<\/td>\n<td>5<\/td>\n<td>1<\/td>\n<td>1<\/td>\n<td>1<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>1<\/td>\n<td>0<\/td>\n<\/tr><tr><td>Undamaged<\/td>\n<td>1<\/td>\n<td>5<\/td>\n<td>5<\/td>\n<td>5<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<td>5<\/td>\n<td>6<\/td>\n<\/tr><\/tbody><\/table><table><thead><tr><th>Shuttle Mission<\/th>\n<th>13<\/th>\n<th>14<\/th>\n<th>15<\/th>\n<th>16<\/th>\n<th>17<\/th>\n<th>18<\/th>\n<th>19<\/th>\n<th>20<\/th>\n<th>21<\/th>\n<th>22<\/th>\n<th>23<\/th>\n<\/tr><\/thead><tbody><tr><td>Temperature<\/td>\n<td>70<\/td>\n<td>70<\/td>\n<td>71<\/td>\n<td>73<\/td>\n<td>75<\/td>\n<td>75<\/td>\n<td>76<\/td>\n<td>76<\/td>\n<td>78<\/td>\n<td>79<\/td>\n<td>81<\/td>\n<\/tr><tr><td>Damaged<\/td>\n<td>1<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>1<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<\/tr><tr><td>Undamaged<\/td>\n<td>5<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<td>5<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<\/tr><\/tbody><\/table><ol><li>Each column of the table above represents a 
different shuttle mission. Examine these data\u00a0and describe what you observe with respect to the relationship between temperatures and\u00a0damaged O-rings.<\/li>\n\t<li>Failures have been coded as 1 for a damaged O-ring and 0 for an undamaged O-ring, and\u00a0a logistic regression model was fit to these data. A summary of this model is given below.\u00a0Describe the key components of this summary table in words.\n<table><tbody><tr><td\/>\n<td>Estimate<\/td>\n<td>Std. Error<\/td>\n<td>z-value<\/td>\n<td>Pr(&gt;|z|)<\/td>\n<\/tr><tr><td>(Intercept)<\/td>\n<td>11.6630<\/td>\n<td>3.2963<\/td>\n<td>3.54<\/td>\n<td>0.0004<\/td>\n<\/tr><tr><td>Temperature<\/td>\n<td>\u22120.2162<\/td>\n<td>0.0532<\/td>\n<td>\u22124.07<\/td>\n<td>0.0000<\/td>\n<\/tr><\/tbody><\/table><\/li>\n\t<li>Write out the logistic model using the point estimates of the model parameters.<\/li>\n\t<li>Based on the model, do you think concerns regarding O-rings are justified? Explain.<\/li>\n<\/ol><h3><strong>Exercise 17:<\/strong>\u00a0Possum classification, Part II<\/h3>\nA logistic regression model was proposed for classifying\u00a0common brushtail possums into their two regions in Exercise 15. The outcome variable took\u00a0value 1 if the possum was from Victoria and 0 otherwise.\n<table><tbody><tr><td\/>\n<td>Estimate<\/td>\n<td>SE<\/td>\n<td>Z<\/td>\n<td>Pr(&gt;|Z|)<\/td>\n<\/tr><tr><th>(Intercept)<\/th>\n<td>33.5095<\/td>\n<td>9.9053<\/td>\n<td>3.38<\/td>\n<td>0.0007<\/td>\n<\/tr><tr><th>sex_male<\/th>\n<td>\u22121.4207<\/td>\n<td>0.6457<\/td>\n<td>\u22122.20<\/td>\n<td>0.0278<\/td>\n<\/tr><tr><th>skull_width<\/th>\n<td>\u22120.2787<\/td>\n<td>0.1226<\/td>\n<td>\u22122.27<\/td>\n<td>0.0231<\/td>\n<\/tr><tr><th>total_length<\/th>\n<td>0.5687<\/td>\n<td>0.1322<\/td>\n<td>4.30<\/td>\n<td>0.0000<\/td>\n<\/tr><tr><th>tail_length<\/th>\n<td>\u22121.8057<\/td>\n<td>0.3599<\/td>\n<td>\u22125.02<\/td>\n<td>0.0000<\/td>\n<\/tr><\/tbody><\/table><ol><li>Write out the form of the model. 
Also identify which of the variables are positively associated with the outcome\u00a0when controlling for the other variables.<\/li>\n\t<li>Suppose we see a brushtail possum at a zoo in the US, and a sign says the possum had been\u00a0captured in the wild in Australia, but it doesn't say which part of Australia. However, the sign does indicate that the possum is male, its skull is about 63 mm wide, its tail is 37 cm long, and its total length is 83 cm. What is the reduced model's computed probability that this possum is from Victoria? How confident are you in the accuracy of this model-based\u00a0probability?<\/li>\n<\/ol><h3><strong>Exercise 18:<\/strong><strong>\u00a0Challenger Disaster, Part II<\/strong><\/h3>\nExercise 16 introduced O-rings, which were identified\u00a0as a plausible explanation for the breakup of the Challenger space shuttle 73 seconds after launch in 1986. The investigation found that the ambient temperature at the time of the shuttle launch was closely related to damage to the O-rings, which are a critical component of the shuttle. See\u00a0this earlier exercise if you would like to browse the original data.\n\n<img class=\"size-full wp-image-1479 alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/132\/2016\/04\/21215340\/Figure8_11.png\" alt=\"Figure8_11\" width=\"458\" height=\"272\"\/><ol><li>The data provided in the previous exercise are shown in the plot. The logistic model fit to\u00a0these data may be written as\n[latex]\\displaystyle\\log\\left(\\frac{\\hat{p}}{1-\\hat{p}}\\right)=11.6630-0.2162\\times\\text{Temperature}[\/latex]\nwhere [latex]\\hat{p}[\/latex]\u00a0is the model-estimated probability that an O-ring will become damaged. Use the\u00a0model to calculate the probability that an O-ring will become damaged at each of the following ambient temperatures: 51, 53, and 55 degrees Fahrenheit. 
The model-estimated probabilities for several additional ambient temperatures are provided below, where subscripts indicate the\u00a0temperature:\n[latex]\\displaystyle\\begin{array}{llll}\\hat{p}_{57}=0.341&amp;\\hat{p}_{59}=0.251&amp;\\hat{p}_{61}=0.179&amp;\\hat{p}_{63}=0.124\\\\\\hat{p}_{65}=0.084&amp;\\hat{p}_{67}=0.056&amp;\\hat{p}_{69}=0.037&amp;\\hat{p}_{71}=0.024\\end{array}[\/latex]<\/li>\n\t<li>Add the model-estimated probabilities from part 1 to the plot, then connect these dots using\u00a0a smooth curve to represent the model-estimated probabilities.<\/li>\n\t<li>Describe any concerns you may have regarding applying logistic regression in this application,\u00a0and note any assumptions that are required to accept the model's validity.<\/li>\n<\/ol>","rendered":"<h2>Introduction to Multiple Regression<\/h2>\n<h3>Exercise 1: Baby Weights, Part I<\/h3>\n<p>The Child Health and Development Studies investigate a range of\u00a0topics. One study considered all pregnancies between 1960 and 1967 among women in the Kaiser Foundation Health Plan in the San Francisco East Bay area. Here, we study the relationship between smoking and weight of the baby. The variable smoke is coded 1 if the mother is a smoker, and 0 if not. The summary table below shows the results of a linear regression model for predicting the average birth weight of babies, measured in ounces, based on the smoking status of\u00a0the mother.<a class=\"footnote\" title=\"Child\u00a0Health and Development Studies, Baby weights data set.\" id=\"return-footnote-978-1\" href=\"#footnote-978-1\" aria-label=\"Footnote 1\"><sup class=\"footnote\">[1]<\/sup><\/a><\/p>\n<table>\n<tbody>\n<tr>\n<td>\n<\/td>\n<td>Estimate<\/td>\n<td>Std. 
Error<\/td>\n<td>t-value<\/td>\n<td>Pr(&gt; |t|)<\/td>\n<\/tr>\n<tr>\n<td>(Intercept)<\/td>\n<td>123.05<\/td>\n<td>0.65<\/td>\n<td>189.60<\/td>\n<td>0.0000<\/td>\n<\/tr>\n<tr>\n<td>smoke<\/td>\n<td>\u20138.94<\/td>\n<td>1.03<\/td>\n<td>\u20138.65<\/td>\n<td>0.0000<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The variability within the smokers and non-smokers are about equal and the distributions are\u00a0symmetric. With these conditions satisfied, it is reasonable to apply the model. (Note that we\u00a0don&#8217;t need to check linearity since the predictor has only two levels.)<\/p>\n<ol>\n<li>Write the equation of the regression line.<\/li>\n<li>Interpret the slope in this context, and calculate the predicted birth weight of babies born to\u00a0smoker and non-smoker mothers.<\/li>\n<li>Is there a statistically significant relationship between the average birth weight and smoking?<\/li>\n<\/ol>\n<h3>Exercise <strong>2: Baby weights, Part II<\/strong><\/h3>\n<p>Exercise 1 introduces a data set on birth weight of babies.\u00a0Another variable we consider is parity, which is 0 if the child is the first born, and 1 otherwise. The summary table below shows the results of a linear regression model for predicting the average\u00a0birth weight of babies, measured in ounces, from parity.<\/p>\n<table>\n<tbody>\n<tr>\n<td>\n<\/td>\n<td>Estimate<\/td>\n<td>Std. 
Error<\/td>\n<td>t-value<\/td>\n<td>Pr(&gt;|t|)<\/td>\n<\/tr>\n<tr>\n<td>(Intercept)<\/td>\n<td>120.07<\/td>\n<td>0.60<\/td>\n<td>199.94<\/td>\n<td>0.0000<\/td>\n<\/tr>\n<tr>\n<td>parity<\/td>\n<td>\u20131.93<\/td>\n<td>1.19<\/td>\n<td>\u20131.62<\/td>\n<td>0.1052<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<ol>\n<li>Write the equation of the regression line.<\/li>\n<li>Interpret the slope in this context, and calculate the predicted birth weight of firstborns and\u00a0others.<\/li>\n<li>Is there a statistically significant relationship between the average birth weight and parity?<\/li>\n<\/ol>\n<h3>Exercise 3<strong>: <\/strong><strong>Baby weights, Part III<\/strong><\/h3>\n<p>We considered the variables smoke and parity, one at a time, in\u00a0modeling birth weights of babies in Exercises 1 and 2. A more realistic approach to modeling infant weights is to consider all possibly related variables at once. Other variables of interest include length of pregnancy in days (gestation), mother&#8217;s age in years (age), mother&#8217;s height in inches (height), and mother&#8217;s pregnancy weight in pounds (weight). 
Below are three observations\u00a0from this data set.<\/p>\n<table>\n<tbody>\n<tr>\n<td>\n<\/td>\n<td>bwt<\/td>\n<td>gestation<\/td>\n<td>parity<\/td>\n<td>age<\/td>\n<td>height<\/td>\n<td>weight<\/td>\n<td>smoke<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\">1<\/td>\n<td style=\"text-align: center;\">120<\/td>\n<td style=\"text-align: center;\">284<\/td>\n<td style=\"text-align: center;\">0<\/td>\n<td style=\"text-align: center;\">27<\/td>\n<td style=\"text-align: center;\">62<\/td>\n<td style=\"text-align: center;\">100<\/td>\n<td style=\"text-align: center;\">0<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\">2<\/td>\n<td style=\"text-align: center;\">113<\/td>\n<td style=\"text-align: center;\">282<\/td>\n<td style=\"text-align: center;\">0<\/td>\n<td style=\"text-align: center;\">33<\/td>\n<td style=\"text-align: center;\">64<\/td>\n<td style=\"text-align: center;\">135<\/td>\n<td style=\"text-align: center;\">0<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\">1236<\/td>\n<td style=\"text-align: center;\">117<\/td>\n<td style=\"text-align: center;\">297<\/td>\n<td style=\"text-align: center;\">0<\/td>\n<td style=\"text-align: center;\">38<\/td>\n<td style=\"text-align: center;\">65<\/td>\n<td style=\"text-align: center;\">129<\/td>\n<td style=\"text-align: center;\">0<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The summary table below shows the results of a regression model for predicting the average birth\u00a0weight of 
babies based on all of the variables included in the data set.<\/p>\n<table>\n<tbody>\n<tr>\n<td>\n<\/td>\n<td>Estimate<\/td>\n<td>Std. Error<\/td>\n<td>t-value<\/td>\n<td>Pr(&gt;|t|)<\/td>\n<\/tr>\n<tr>\n<td>(Intercept)<\/td>\n<td>\u201380.41<\/td>\n<td>14.35<\/td>\n<td>\u20135.60<\/td>\n<td>0.0000<\/td>\n<\/tr>\n<tr>\n<td>gestation<\/td>\n<td>0.44<\/td>\n<td>0.03<\/td>\n<td>15.26<\/td>\n<td>0.0000<\/td>\n<\/tr>\n<tr>\n<td>parity<\/td>\n<td>\u20133.33<\/td>\n<td>1.13<\/td>\n<td>\u20132.95<\/td>\n<td>0.0033<\/td>\n<\/tr>\n<tr>\n<td>age<\/td>\n<td>\u20130.01<\/td>\n<td>0.09<\/td>\n<td>\u20130.10<\/td>\n<td>0.9170<\/td>\n<\/tr>\n<tr>\n<td>height<\/td>\n<td>1.15<\/td>\n<td>0.21<\/td>\n<td>5.63<\/td>\n<td>0.0000<\/td>\n<\/tr>\n<tr>\n<td>weight<\/td>\n<td>0.05<\/td>\n<td>0.03<\/td>\n<td>1.99<\/td>\n<td>0.0471<\/td>\n<\/tr>\n<tr>\n<td>smoke<\/td>\n<td>\u20138.40<\/td>\n<td>0.95<\/td>\n<td>\u20138.81<\/td>\n<td>0.0000<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<ol>\n<li>Write the equation of the regression line that includes all of the variables.<\/li>\n<li>Interpret the slopes of gestation and age in this context.<\/li>\n<li>The coefficient for parity is different than in the linear model shown in Exercise 2. Why\u00a0might there be a difference?<\/li>\n<li>Calculate the residual for the first observation in the data set.<\/li>\n<li>The variance of the residuals is 249.28, and the variance of the birth weights of all babies\u00a0in the data set is 332.57. Calculate the <em>R<\/em><sup>2<\/sup> and the adjusted <em>R<\/em><sup>2<\/sup>. Note that there are 1,236\u00a0observations in the data set.<\/li>\n<\/ol>\n<h3>Exercise 4: Absenteeism, Part I<\/h3>\n<p>Researchers interested in the relationship between absenteeism from\u00a0school and certain demographic characteristics of children collected data from 146 randomly sampled students in rural New South Wales, Australia, in a particular school year. 
Below are three\u00a0observations from this data set.<\/p>\n<table>\n<tbody>\n<tr>\n<td>\n<\/td>\n<td>eth<\/td>\n<td>sex<\/td>\n<td>lrn<\/td>\n<td>days<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\">1<\/td>\n<td style=\"text-align: center;\">0<\/td>\n<td style=\"text-align: center;\">1<\/td>\n<td style=\"text-align: center;\">1<\/td>\n<td style=\"text-align: center;\">2<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\">2<\/td>\n<td style=\"text-align: center;\">0<\/td>\n<td style=\"text-align: center;\">1<\/td>\n<td style=\"text-align: center;\">1<\/td>\n<td style=\"text-align: center;\">11<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<td style=\"text-align: center;\">[latex]\\vdots[\/latex]<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\">146<\/td>\n<td style=\"text-align: center;\">1<\/td>\n<td style=\"text-align: center;\">0<\/td>\n<td style=\"text-align: center;\">0<\/td>\n<td style=\"text-align: center;\">37<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The summary table below shows the results of a linear regression model for predicting the average\u00a0number of days absent based on ethnic background (eth: 0\u2014aboriginal, 1\u2014not aboriginal), sex\u00a0(sex: 0\u2014female, 1\u2014male), and learner status (lrn: 0\u2014average learner, 1\u2014slow learner).<a class=\"footnote\" title=\"W. N. Venables and B. D. Ripley. Modern Applied Statistics with S. Fourth Edition. Data can also\u00a0be found in the R MASS package. New York: Springer, 2002.\" id=\"return-footnote-978-2\" href=\"#footnote-978-2\" aria-label=\"Footnote 2\"><sup class=\"footnote\">[2]<\/sup><\/a><\/p>\n<table>\n<tbody>\n<tr>\n<td>\n<\/td>\n<td>Estimate<\/td>\n<td>Std. 
Error<\/td>\n<td>t-value<\/td>\n<td>Pr(&gt;|t|)<\/td>\n<\/tr>\n<tr>\n<td>(Intercept)<\/td>\n<td>18.93<\/td>\n<td>2.57<\/td>\n<td>7.37<\/td>\n<td>0.0000<\/td>\n<\/tr>\n<tr>\n<td>eth<\/td>\n<td>\u20139.11<\/td>\n<td>2.60<\/td>\n<td>\u20133.51<\/td>\n<td>0.0000<\/td>\n<\/tr>\n<tr>\n<td>sex<\/td>\n<td>3.10<\/td>\n<td>2.64<\/td>\n<td>1.18<\/td>\n<td>0.2411<\/td>\n<\/tr>\n<tr>\n<td>lrn<\/td>\n<td>2.15<\/td>\n<td>2.65<\/td>\n<td>0.81<\/td>\n<td>0.4177<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<ol>\n<li>Write the equation of the regression line.<\/li>\n<li>Interpret each of the slopes in this context.<\/li>\n<li>Calculate the residual for the first observation in the data set: a student who is aboriginal,\u00a0male, a slow learner, and missed 2 days of school.<\/li>\n<li>The variance of the residuals is 240.57, and the variance of the number of absent days for all\u00a0students in the data set is 264.17. Calculate the <em>R<\/em><sup>2<\/sup>\u00a0and the adjusted <em>R<\/em><sup>2<\/sup>. Note that there are\u00a0146 observations in the data set.<\/li>\n<\/ol>\n<h3>Exercise 5:\u00a0<strong>GPA<\/strong><\/h3>\n<p>A survey of 55 Duke University students asked about their GPA, the number of hours\u00a0they study per week, the number of hours they sleep per night, the number of nights they go out, and their gender. Summary output of the\u00a0regression model is shown below. Note that gender is coded 1 for male.<\/p>\n<table>\n<tbody>\n<tr>\n<td>\n<\/td>\n<td>Estimate<\/td>\n<td>Std. 
Error<\/td>\n<td>t-value<\/td>\n<td>Pr(&gt;|t|)<\/td>\n<\/tr>\n<tr>\n<td>(Intercept)<\/td>\n<td>3.45<\/td>\n<td>0.35<\/td>\n<td>9.85<\/td>\n<td>0.00<\/td>\n<\/tr>\n<tr>\n<td>studyweek<\/td>\n<td>0.00<\/td>\n<td>0.00<\/td>\n<td>0.27<\/td>\n<td>0.79<\/td>\n<\/tr>\n<tr>\n<td>sleepnight<\/td>\n<td>0.01<\/td>\n<td>0.05<\/td>\n<td>0.11<\/td>\n<td>0.91<\/td>\n<\/tr>\n<tr>\n<td>outnight<\/td>\n<td>0.05<\/td>\n<td>0.05<\/td>\n<td>1.01<\/td>\n<td>0.32<\/td>\n<\/tr>\n<tr>\n<td>gender<\/td>\n<td>\u20130.08<\/td>\n<td>0.12<\/td>\n<td>\u20130.68<\/td>\n<td>0.50<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<ol>\n<li>Calculate a 95% confidence interval for the coefficient of gender in the model, and interpret it\u00a0in the context of the data.<\/li>\n<li>Would you expect a 95% confidence interval for the slope of each of the remaining variables to include\u00a00? Explain.<\/li>\n<\/ol>\n<h3>Exercise 6:\u00a0Cherry Trees<\/h3>\n<p>Timber yield is approximately equal to the volume of a tree; however, this value is difficult to measure without first cutting the tree down. Instead, other variables, such as height and diameter, may be used to predict a tree&#8217;s volume and yield. Researchers wanting to understand the relationship between these variables for black cherry trees collected data from 31 such trees in the Allegheny National Forest, Pennsylvania. Height is measured in feet, diameter\u00a0in inches (at 54 inches above ground), and volume in cubic feet.<a class=\"footnote\" title=\"D.J. Hand. A handbook of small data sets. Chapman &amp; Hall\/CRC, 1994.\" id=\"return-footnote-978-3\" href=\"#footnote-978-3\" aria-label=\"Footnote 3\"><sup class=\"footnote\">[3]<\/sup><\/a> The summary table below shows the results of a regression model for predicting volume from height and diameter.<\/p>\n<table>\n<tbody>\n<tr>\n<td>\n<\/td>\n<td>Estimate<\/td>\n<td>Std. 
Error<\/td>\n<td>t-value<\/td>\n<td>Pr(&gt;|t|)<\/td>\n<\/tr>\n<tr>\n<td>(Intercept)<\/td>\n<td>\u201357.99<\/td>\n<td>8.64<\/td>\n<td>\u20136.71<\/td>\n<td>0.00<\/td>\n<\/tr>\n<tr>\n<td>height<\/td>\n<td>0.34<\/td>\n<td>0.13<\/td>\n<td>2.61<\/td>\n<td>0.01<\/td>\n<\/tr>\n<tr>\n<td>diameter<\/td>\n<td>4.71<\/td>\n<td>0.26<\/td>\n<td>17.82<\/td>\n<td>0.00<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<ol>\n<li>Calculate a 95% confidence interval for the coefficient of height, and interpret it in the context\u00a0of the data.<\/li>\n<li>One tree in this sample is 79 feet tall, has a diameter of 11.3 inches, and is 24.2 cubic feet in\u00a0volume. Determine whether the model overestimates or underestimates the volume of this tree, and\u00a0by how much.<\/li>\n<\/ol>\n<h2>Model Selection<\/h2>\n<h3><strong>Exercise\u00a07: Baby weights, Part IV<\/strong><\/h3>\n<p>Exercise 3 considers a model that predicts a newborn&#8217;s weight\u00a0using several predictors (gestation length, parity, age of mother, height of mother, weight of mother, smoking status of mother). 
The table below shows the adjusted <em>R<\/em><sup>2<\/sup> for the full model as well as the adjusted <em>R<\/em><sup>2<\/sup> values for all models we evaluate in the first step of the backward\u00a0elimination process.<\/p>\n<table>\n<tbody>\n<tr>\n<td>\n<\/td>\n<td>Model<\/td>\n<td>Adjusted\u00a0<em>R<\/em><sup>2<\/sup><\/td>\n<\/tr>\n<tr>\n<td>1<\/td>\n<td>Full model<\/td>\n<td>0.2541<\/td>\n<\/tr>\n<tr>\n<td>2<\/td>\n<td>No gestation<\/td>\n<td>0.1031<\/td>\n<\/tr>\n<tr>\n<td>3<\/td>\n<td>No parity<\/td>\n<td>0.2492<\/td>\n<\/tr>\n<tr>\n<td>4<\/td>\n<td>No age<\/td>\n<td>0.2547<\/td>\n<\/tr>\n<tr>\n<td>5<\/td>\n<td>No height<\/td>\n<td>0.2311<\/td>\n<\/tr>\n<tr>\n<td>6<\/td>\n<td>No weight<\/td>\n<td>0.2536<\/td>\n<\/tr>\n<tr>\n<td>7<\/td>\n<td>No smoking status<\/td>\n<td>0.2072<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Which, if any, variable should be removed from the model first?<\/p>\n<h3><strong>Exercise 8: <\/strong>Absenteeism, Part II<\/h3>\n<p>Exercise 4 considers a model that predicts the number of days\u00a0absent using three predictors: ethnic background (eth), gender (sex), and learner status (lrn). 
The table below shows the adjusted <em>R<\/em><sup>2<\/sup> for the full model as well as the adjusted <em>R<\/em><sup>2<\/sup> values\u00a0for all models we evaluate in the first step of the backward elimination process.<\/p>\n<table>\n<tbody>\n<tr>\n<td>\n<\/td>\n<td>Model<\/td>\n<td>Adjusted\u00a0<em>R<\/em><sup>2<\/sup><\/td>\n<\/tr>\n<tr>\n<td>1<\/td>\n<td>Full model<\/td>\n<td>0.0701<\/td>\n<\/tr>\n<tr>\n<td>2<\/td>\n<td>No ethnicity<\/td>\n<td>\u20130.0033<\/td>\n<\/tr>\n<tr>\n<td>3<\/td>\n<td>No sex<\/td>\n<td>0.0676<\/td>\n<\/tr>\n<tr>\n<td>4<\/td>\n<td>No learner status<\/td>\n<td>0.0723<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Which, if any, variable should be removed from the model first?<\/p>\n<h3><strong>Exercise 9: <\/strong>Baby weights, Part V<\/h3>\n<p>Exercise 3 provides regression output for the full model (including\u00a0all explanatory variables available in the data set) for predicting birth weight of babies. In this exercise we consider a forward-selection algorithm and add variables to the model one at a time. The table below shows the <em>p<\/em>-value and adjusted <em>R<\/em><sup>2<\/sup>\u00a0of each model that includes only the\u00a0corresponding predictor. 
Based on this table, which variable should be added to the model first?<\/p>\n<table>\n<tbody>\n<tr>\n<td>variable<\/td>\n<td>gestation<\/td>\n<td>parity<\/td>\n<td>age<\/td>\n<td>height<\/td>\n<td>weight<\/td>\n<td>smoke<\/td>\n<\/tr>\n<tr>\n<td><em>p<\/em>-value<\/td>\n<td>2.2 \u00d7 10<sup>\u221216<\/sup><\/td>\n<td>0.1052<\/td>\n<td>0.2375<\/td>\n<td>2.97 \u00d7 10<sup>\u221212<\/sup><\/td>\n<td>8.2 \u00d7 10<sup>\u22128<\/sup><\/td>\n<td>2.2 \u00d7 10<sup>\u221216<\/sup><\/td>\n<\/tr>\n<tr>\n<td><em>R<\/em><sup>2<\/sup><em><sub>adj<\/sub><\/em><\/td>\n<td>0.1657<\/td>\n<td>0.0013<\/td>\n<td>0.0003<\/td>\n<td>0.0386<\/td>\n<td>0.0229<\/td>\n<td>0.0569<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h3><strong>Exercise 10: <\/strong>Absenteeism, Part III<\/h3>\n<p>Exercise 4 provides regression output for the full model, including all explanatory variables available in the data set, for predicting the number of days absent from school. In this exercise we consider a forward-selection algorithm and add variables to the model one at a time. The table below shows the <em>p<\/em>-value and adjusted <em>R<\/em><sup>2<\/sup>\u00a0of each model that includes only the corresponding predictor. Based on this table, which variable should be added to\u00a0the model first?<\/p>\n<table>\n<tbody>\n<tr>\n<td>variable<\/td>\n<td>ethnicity<\/td>\n<td>sex<\/td>\n<td>learner status<\/td>\n<\/tr>\n<tr>\n<td><em>p<\/em>-value<\/td>\n<td>0.007<\/td>\n<td>0.3142<\/td>\n<td>0.5870<\/td>\n<\/tr>\n<tr>\n<td><em>R<\/em><sup>2<\/sup><em><sub>adj<\/sub><\/em><\/td>\n<td>0.0714<\/td>\n<td>0.0001<\/td>\n<td>0<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h3><strong>Exercise 11: <\/strong>Movie lovers, Part I<\/h3>\n<p>Suppose a social scientist is interested in studying what makes\u00a0audiences love or hate a movie. She collects a random sample of movies (genre, length, cast, director, budget, etc.) 
as well as a measure of the success of the movie (score on a film review aggregator website). If, as part of her research, she is interested in finding out which variables are\u00a0significant predictors of movie success, what type of model selection method should she use?<\/p>\n<h3><strong>Exercise 12: <\/strong>Movie lovers, Part II<\/h3>\n<p>Suppose an online media streaming company is interested in building a movie recommendation system. The website maintains data on the movies in their database (genre, length, cast, director, budget, etc.) and additionally collects data from their subscribers (demographic information, previously watched movies, how they rated previously watched movies, etc.). The recommendation system will be deemed successful if subscribers actually watch, and rate highly, the movies recommended to them. Should the company use the adjusted <em>R<\/em><sup>2<\/sup> approach or the\u00a0<em>p<\/em>-value approach in selecting variables for their recommendation system?<\/p>\n<h2>Checking Model Assumptions Using Graphs<\/h2>\n<h3><strong>Exercise 13<\/strong>: Baby weights, Part VI<\/h3>\n<p>Exercise 3 presents a regression model for predicting the average\u00a0birth weight of babies based on length of gestation, parity, height, weight, and smoking status of the mother. Determine whether the model assumptions are met using the plots below. If not, describe\u00a0how to proceed with the analysis.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1475\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/132\/2016\/04\/21215333\/Figure8_8.jpg\" alt=\"Figure8_8\" width=\"676\" height=\"1044\" \/><\/p>\n<h3><strong>Exercise 14<\/strong>:\u00a0GPA and IQ<\/h3>\n<p>A regression model for predicting GPA from gender and IQ was fit, and\u00a0both predictors were found to be statistically significant. 
Using the plots given below, determine\u00a0if this regression model is appropriate for these data.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1476\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/132\/2016\/04\/21215336\/Figure8_9.jpg\" alt=\"Figure8_9\" width=\"773\" height=\"955\" \/><\/p>\n<h2>Introduction to Logistic Regression<\/h2>\n<h3><strong>Exercise 15<\/strong>:\u00a0Possum classification, Part I<\/h3>\n<div id=\"attachment_1477\" style=\"width: 310px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-1477\" class=\"wp-image-1477\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/132\/2016\/04\/21215148\/5653697137_e7c24f507a_z.jpg\" alt=\"photo of an Australian possum. A small four-legged marsupial with short brown fur.\" width=\"300\" height=\"200\" \/><\/p>\n<p id=\"caption-attachment-1477\" class=\"wp-caption-text\">Figure 1. The common brushtail possum of Australia<\/p>\n<\/div>\n<p>The common brushtail possum of the Australia region is a\u00a0bit cuter than its distant cousin, the American opossum. We consider 104 brushtail possums from two regions in Australia, where the possums may be considered a random sample from the population. The first region is Victoria, which is in the eastern half of Australia and traverses the southern coast. The second region consists of New South Wales and\u00a0Queensland, which make up eastern and northeastern Australia.<\/p>\n<p>We use logistic regression to differentiate between possums in these two regions. The outcome\u00a0variable, called population, takes value 1 when a possum is from Victoria and 0 when it is from New South Wales or Queensland. We consider five predictors: sex male (an indicator for a possum being male), head length, skull width, total length, and tail length. Each variable is summarized in a histogram. 
The full logistic regression model and a reduced model after variable\u00a0selection are summarized in the table.<\/p>\n<p>\u00a0<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1478\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/132\/2016\/04\/21215338\/Figure8_10.png\" alt=\"Figure8_10\" width=\"796\" height=\"364\" \/><\/p>\n<table>\n<thead>\n<tr>\n<th>\n<\/th>\n<th colspan=\"4\">Full Model<\/th>\n<th colspan=\"4\">Reduced Model<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>\n<\/td>\n<td>Estimate<\/td>\n<td>SE<\/td>\n<td>Z<\/td>\n<td>Pr(&gt;|Z|)<\/td>\n<td>Estimate<\/td>\n<td>SE<\/td>\n<td>Z<\/td>\n<td>Pr(&gt;|Z|)<\/td>\n<\/tr>\n<tr>\n<th>(Intercept)<\/th>\n<td>39.2349<\/td>\n<td>11.5368<\/td>\n<td>3.40<\/td>\n<td>0.0007<\/td>\n<td>33.5095<\/td>\n<td>9.9053<\/td>\n<td>3.38<\/td>\n<td>0.0007<\/td>\n<\/tr>\n<tr>\n<th>sex_male<\/th>\n<td>\u22121.2376<\/td>\n<td>0.6662<\/td>\n<td>\u22121.86<\/td>\n<td>0.0632<\/td>\n<td>\u22121.4207<\/td>\n<td>0.6457<\/td>\n<td>\u22122.20<\/td>\n<td>0.0278<\/td>\n<\/tr>\n<tr>\n<th>head_length<\/th>\n<td>\u22120.1601<\/td>\n<td>0.1386<\/td>\n<td>\u22121.16<\/td>\n<td>0.2480<\/td>\n<td>\n<\/td>\n<td>\n<\/td>\n<td>\n<\/td>\n<td>\n<\/td>\n<\/tr>\n<tr>\n<th>skull_width<\/th>\n<td>\u22120.2012<\/td>\n<td>0.1327<\/td>\n<td>\u22121.52<\/td>\n<td>0.1294<\/td>\n<td>\u22120.2787<\/td>\n<td>0.1226<\/td>\n<td>\u22122.27<\/td>\n<td>0.0231<\/td>\n<\/tr>\n<tr>\n<th>total_length<\/th>\n<td>0.6488<\/td>\n<td>0.1531<\/td>\n<td>4.24<\/td>\n<td>0.0000<\/td>\n<td>0.5687<\/td>\n<td>0.1322<\/td>\n<td>4.30<\/td>\n<td>0.0000<\/td>\n<\/tr>\n<tr>\n<th>tail_length<\/th>\n<td>\u22121.8708<\/td>\n<td>0.3741<\/td>\n<td>\u22125.00<\/td>\n<td>0.0000<\/td>\n<td>\u22121.8057<\/td>\n<td>0.3599<\/td>\n<td>\u22125.02<\/td>\n<td>0.0000<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<ol>\n<li>Examine each of the predictors. 
Are there any outliers that are likely to have a very large\u00a0influence on the logistic regression model?<\/li>\n<li>The summary table for the full model indicates that at least one variable should be eliminated\u00a0when using the <em>p<\/em>-value approach for variable selection: head length. The second component of the table summarizes the reduced model following variable selection. Explain why the\u00a0remaining estimates change between the two models.<\/li>\n<\/ol>\n<h3><strong>Exercise 16:<\/strong>\u00a0Challenger disaster, Part I<\/h3>\n<p>On January 28, 1986, a routine launch was anticipated for\u00a0the Challenger space shuttle. Seventy-three seconds into the flight, disaster happened: the shuttle broke apart, killing all seven crew members on board. An investigation into the cause of the disaster focused on a critical seal called an O-ring, and it is believed that damage to these O-rings during a shuttle launch may be related to the ambient temperature during the launch. The table below summarizes observational data on O-rings for 23 shuttle missions, where the mission order is based on the temperature at the time of the launch. <em>Temperature<\/em> gives the temperature in degrees Fahrenheit, <em>Damaged<\/em> represents the number of damaged O-rings, and <em>Undamaged<\/em> represents the number of\u00a0O-rings that were not damaged.<\/p>\n<table>\n<thead>\n<tr>\n<th>Shuttle 
Mission<\/th>\n<th>1<\/th>\n<th>2<\/th>\n<th>3<\/th>\n<th>4<\/th>\n<th>5<\/th>\n<th>6<\/th>\n<th>7<\/th>\n<th>8<\/th>\n<th>9<\/th>\n<th>10<\/th>\n<th>11<\/th>\n<th>12<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Temperature<\/td>\n<td>53<\/td>\n<td>57<\/td>\n<td>58<\/td>\n<td>63<\/td>\n<td>66<\/td>\n<td>67<\/td>\n<td>67<\/td>\n<td>67<\/td>\n<td>68<\/td>\n<td>69<\/td>\n<td>70<\/td>\n<td>70<\/td>\n<\/tr>\n<tr>\n<td>Damaged<\/td>\n<td>5<\/td>\n<td>1<\/td>\n<td>1<\/td>\n<td>1<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>1<\/td>\n<td>0<\/td>\n<\/tr>\n<tr>\n<td>Undamaged<\/td>\n<td>1<\/td>\n<td>5<\/td>\n<td>5<\/td>\n<td>5<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<td>5<\/td>\n<td>6<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<table>\n<thead>\n<tr>\n<th>Shuttle Mission<\/th>\n<th>13<\/th>\n<th>14<\/th>\n<th>15<\/th>\n<th>16<\/th>\n<th>17<\/th>\n<th>18<\/th>\n<th>19<\/th>\n<th>20<\/th>\n<th>21<\/th>\n<th>22<\/th>\n<th>23<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Temperature<\/td>\n<td>70<\/td>\n<td>70<\/td>\n<td>71<\/td>\n<td>73<\/td>\n<td>75<\/td>\n<td>75<\/td>\n<td>76<\/td>\n<td>76<\/td>\n<td>78<\/td>\n<td>79<\/td>\n<td>81<\/td>\n<\/tr>\n<tr>\n<td>Damaged<\/td>\n<td>1<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>1<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<\/tr>\n<tr>\n<td>Undamaged<\/td>\n<td>5<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<td>5<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<td>6<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<ol>\n<li>Each column of the table above represents a different shuttle mission. Examine these data\u00a0and describe what you observe with respect to the relationship between temperatures and\u00a0damaged O-rings.<\/li>\n<li>Failures have been coded as 1 for a damaged O-ring and 0 for an undamaged O-ring, and\u00a0a logistic regression model was fit to these data. 
A summary of this model is given below.\u00a0Describe the key components of this summary table in words.<br \/>\n<table>\n<tbody>\n<tr>\n<td>\n<\/td>\n<td>Estimate<\/td>\n<td>Std. Error<\/td>\n<td>z-value<\/td>\n<td>Pr(&gt;|z|)<\/td>\n<\/tr>\n<tr>\n<td>(Intercept)<\/td>\n<td>11.6630<\/td>\n<td>3.2963<\/td>\n<td>3.54<\/td>\n<td>0.0004<\/td>\n<\/tr>\n<tr>\n<td>Temperature<\/td>\n<td>\u22120.2162<\/td>\n<td>0.0532<\/td>\n<td>\u22124.07<\/td>\n<td>0.0000<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/li>\n<li>Write out the logistic model using the point estimates of the model parameters.<\/li>\n<li>Based on the model, do you think concerns regarding O-rings are justified? Explain.<\/li>\n<\/ol>\n<h3><strong>Exercise 17:<\/strong>\u00a0Possum classification, Part II<\/h3>\n<p>A logistic regression model was proposed for classifying\u00a0common brushtail possums into their two regions in Exercise 15. The outcome variable took\u00a0value 1 if the possum was from Victoria and 0 otherwise.<\/p>\n<table>\n<tbody>\n<tr>\n<td>\n<\/td>\n<td>Estimate<\/td>\n<td>SE<\/td>\n<td>Z<\/td>\n<td>Pr(&gt;|Z|)<\/td>\n<\/tr>\n<tr>\n<th>(Intercept)<\/th>\n<td>33.5095<\/td>\n<td>9.9053<\/td>\n<td>3.38<\/td>\n<td>0.0007<\/td>\n<\/tr>\n<tr>\n<th>sex_male<\/th>\n<td>\u22121.4207<\/td>\n<td>0.6457<\/td>\n<td>\u22122.20<\/td>\n<td>0.0278<\/td>\n<\/tr>\n<tr>\n<th>skull_width<\/th>\n<td>\u22120.2787<\/td>\n<td>0.1226<\/td>\n<td>\u22122.27<\/td>\n<td>0.0231<\/td>\n<\/tr>\n<tr>\n<th>total_length<\/th>\n<td>0.5687<\/td>\n<td>0.1322<\/td>\n<td>4.30<\/td>\n<td>0.0000<\/td>\n<\/tr>\n<tr>\n<th>tail_length<\/th>\n<td>\u22121.8057<\/td>\n<td>0.3599<\/td>\n<td>\u22125.02<\/td>\n<td>0.0000<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<ol>\n<li>Write out the form of the model. 
Also identify which of the variables are positively associated with the outcome\u00a0when controlling for the other variables.<\/li>\n<li>Suppose we see a brushtail possum at a zoo in the US, and a sign says the possum had been\u00a0captured in the wild in Australia, but it doesn&#8217;t say which part of Australia. However, the sign does indicate that the possum is male, its skull is about 63 mm wide, its tail is 37 cm long, and its total length is 83 cm. What is the reduced model&#8217;s computed probability that this possum is from Victoria? How confident are you in the accuracy of this computed\u00a0probability?<\/li>\n<\/ol>\n<h3><strong>Exercise 18:<\/strong><strong>\u00a0Challenger Disaster, Part II<\/strong><\/h3>\n<p>Exercise 16 introduced us to O-rings that were identified\u00a0as a plausible explanation for the breakup of the Challenger space shuttle 73 seconds after liftoff in 1986. The investigation found that the ambient temperature at the time of the shuttle launch was closely related to O-ring damage; O-rings are a critical component of the shuttle. See\u00a0Exercise 16 if you would like to browse the original data.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-1479 alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/132\/2016\/04\/21215340\/Figure8_11.png\" alt=\"Figure8_11\" width=\"458\" height=\"272\" \/><\/p>\n<ol>\n<li>The data provided in Exercise 16 are shown in the plot. The logistic model fit to\u00a0these data may be written as<br \/>\n[latex]\\displaystyle\\log\\left(\\frac{\\hat{p}}{1-\\hat{p}}\\right)=11.6630-0.2162\\times\\text{Temperature}[\/latex]<br \/>\nwhere [latex]\\hat{p}[\/latex]\u00a0is the model-estimated probability that an O-ring will become damaged. Use the\u00a0model to calculate the probability that an O-ring will become damaged at each of the following ambient temperatures: 51, 53, and 55 degrees Fahrenheit. 
The model-estimated probabilities for several additional ambient temperatures are provided below, where subscripts indicate the\u00a0temperature:<br \/>\n[latex]\\displaystyle\\begin{array}{llll}\\hat{p}_{57}=0.341\\hfill&\\hat{p}_{59}=0.251\\hfill&\\hat{p}_{61}=0.179\\hfill&\\hat{p}_{63}=0.124\\\\\\hat{p}_{65}=0.084\\hfill&\\hat{p}_{67}=0.056\\hfill&\\hat{p}_{69}=0.037\\hfill&\\hat{p}_{71}=0.024\\end{array}[\/latex]<\/li>\n<li>Add the model-estimated probabilities from part 1 to the plot, then connect these dots using\u00a0a smooth curve to represent the model-estimated probabilities.<\/li>\n<li>Describe any concerns you may have regarding applying logistic regression in this application,\u00a0and note any assumptions that are required to accept the model&#8217;s validity.<\/li>\n<\/ol>\n\n\t\t\t <section class=\"citations-section\" role=\"contentinfo\">\n\t\t\t <h3>Candela Citations<\/h3>\n\t\t\t\t\t <div>\n\t\t\t\t\t\t <div id=\"citation-list-978\">\n\t\t\t\t\t\t\t <div class=\"licensing\"><div class=\"license-attribution-dropdown-subheading\">CC licensed content, Shared previously<\/div><ul class=\"citation-list\"><li>OpenIntro Statistics. <strong>Authored by<\/strong>: David M Diez, Christopher D Barr, and Mine Cetinkaya-Rundel. <strong>Provided by<\/strong>: OpenIntro. <strong>Located at<\/strong>: <a target=\"_blank\" href=\"https:\/\/www.openintro.org\/stat\/textbook.php\">https:\/\/www.openintro.org\/stat\/textbook.php<\/a>. <strong>License<\/strong>: <em><a target=\"_blank\" rel=\"license\" href=\"https:\/\/creativecommons.org\/licenses\/by-sa\/4.0\/\">CC BY-SA: Attribution-ShareAlike<\/a><\/em>. <strong>License Terms<\/strong>: This textbook is available under a Creative Commons license. Visit openintro.org for a free  PDF, to download the textbook&#039;s source files.<\/li><li>Common Brushtail Possum. <strong>Authored by<\/strong>: Greg Schechter. 
<strong>Located at<\/strong>: <a target=\"_blank\" href=\"https:\/\/flic.kr\/p\/9BAFbR\">https:\/\/flic.kr\/p\/9BAFbR<\/a>. <strong>License<\/strong>: <em><a target=\"_blank\" rel=\"license\" href=\"https:\/\/creativecommons.org\/licenses\/by\/4.0\/\">CC BY: Attribution<\/a><\/em><\/li><\/ul><\/div>\n\t\t\t\t\t\t <\/div>\n\t\t\t\t\t <\/div>\n\t\t\t <\/section><hr class=\"before-footnotes clear\" \/><div class=\"footnotes\"><ol><li id=\"footnote-978-1\">Child\u00a0Health and Development Studies, Baby weights data set. <a href=\"#return-footnote-978-1\" class=\"return-footnote\" aria-label=\"Return to footnote 1\">&crarr;<\/a><\/li><li id=\"footnote-978-2\">W. N. Venables and B. D. Ripley.<em> Modern Applied Statistics with S<\/em>. Fourth Edition. Data can also\u00a0be found in the R MASS package. New York: Springer, 2002. <a href=\"#return-footnote-978-2\" class=\"return-footnote\" aria-label=\"Return to footnote 2\">&crarr;<\/a><\/li><li id=\"footnote-978-3\">D.J. Hand.<em> A handbook of small data sets<\/em>. Chapman &amp; Hall\/CRC, 1994. <a href=\"#return-footnote-978-3\" class=\"return-footnote\" aria-label=\"Return to footnote 3\">&crarr;<\/a><\/li><\/ol><\/div>","protected":false},"author":21,"menu_order":5,"template":"","meta":{"_candela_citation":"[{\"type\":\"cc\",\"description\":\"OpenIntro Statistics\",\"author\":\"David M Diez, Christopher D Barr, and Mine Cetinkaya-Rundel\",\"organization\":\"OpenIntro\",\"url\":\"https:\/\/www.openintro.org\/stat\/textbook.php\",\"project\":\"\",\"license\":\"cc-by-sa\",\"license_terms\":\"This textbook is available under a Creative Commons license. 
Visit openintro.org for a free  PDF, to download the textbook's source files.\"},{\"type\":\"cc\",\"description\":\"Common Brushtail Possum\",\"author\":\"Greg Schechter\",\"organization\":\"\",\"url\":\"https:\/\/flic.kr\/p\/9BAFbR\",\"project\":\"\",\"license\":\"cc-by\",\"license_terms\":\"\"}]","CANDELA_OUTCOMES_GUID":"","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-978","chapter","type-chapter","status-publish","hentry"],"part":961,"_links":{"self":[{"href":"https:\/\/courses.lumenlearning.com\/ntcc-introstats1\/wp-json\/pressbooks\/v2\/chapters\/978","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/courses.lumenlearning.com\/ntcc-introstats1\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/courses.lumenlearning.com\/ntcc-introstats1\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/ntcc-introstats1\/wp-json\/wp\/v2\/users\/21"}],"version-history":[{"count":1,"href":"https:\/\/courses.lumenlearning.com\/ntcc-introstats1\/wp-json\/pressbooks\/v2\/chapters\/978\/revisions"}],"predecessor-version":[{"id":1235,"href":"https:\/\/courses.lumenlearning.com\/ntcc-introstats1\/wp-json\/pressbooks\/v2\/chapters\/978\/revisions\/1235"}],"part":[{"href":"https:\/\/courses.lumenlearning.com\/ntcc-introstats1\/wp-json\/pressbooks\/v2\/parts\/961"}],"metadata":[{"href":"https:\/\/courses.lumenlearning.com\/ntcc-introstats1\/wp-json\/pressbooks\/v2\/chapters\/978\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/courses.lumenlearning.com\/ntcc-introstats1\/wp-json\/wp\/v2\/media?parent=978"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/ntcc-introstats1\/wp-json\/pressbooks\/v2\/chapter-type?post=978"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/ntcc-introstats1\/w
p-json\/wp\/v2\/contributor?post=978"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/ntcc-introstats1\/wp-json\/wp\/v2\/license?post=978"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}