{"id":936,"date":"2017-05-11T17:18:24","date_gmt":"2017-05-11T17:18:24","guid":{"rendered":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/chapter\/chapter-8-multiple-linear-regression\/"},"modified":"2017-05-11T18:31:38","modified_gmt":"2017-05-11T18:31:38","slug":"chapter-8-multiple-linear-regression","status":"publish","type":"chapter","link":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/chapter\/chapter-8-multiple-linear-regression\/","title":{"raw":"Chapter 8: Multiple Linear Regression","rendered":"Chapter 8: Multiple Linear Regression"},"content":{"raw":"<div class=\"Basic-Text-Frame\">\r\n\r\nIt frequently happens that a dependent variable (<em>y<\/em>) in which we are interested is related to more than one independent variable. If this relationship can be estimated, it may enable us to make more precise predictions of the dependent variable than would be possible by a simple linear regression. Regressions based on more than one independent variable are called <strong class=\"Strong-2\">multiple regressions<\/strong>.\r\n\r\nMultiple linear regression is an extension of simple linear regression and many of the ideas we examined in simple linear regression carry over to the multiple regression setting. 
For example, scatterplots, correlation, and the least-squares method are still essential components of a multiple regression.\r\n\r\nTo illustrate, a habitat suitability index (used to evaluate the impact on wildlife habitat from land use changes) for ruffed grouse might be related to three factors:\r\n<p class=\"BlockQuote\" style=\"padding-left: 30px\"><strong class=\"Strong-2\">x<\/strong><sub><span class=\"Subscript SmallText\">1<\/span><\/sub> = stem density\r\n<strong class=\"Strong-2\">x<\/strong><sub><span class=\"Subscript SmallText\">2<\/span><\/sub> = percent of conifers\r\n<strong class=\"Strong-2\">x<\/strong><sub><span class=\"Subscript SmallText\">3<\/span><\/sub> = amount of understory herbaceous matter<\/p>\r\nA researcher would collect data on these variables and use the sample data to construct a regression equation relating these three variables to the response. The researcher will have questions about the model similar to those for a simple linear regression model.\r\n<ul>\r\n \t<li class=\"BlockQuote\">How strong is the relationship between y and the three predictor variables?<\/li>\r\n \t<li class=\"BlockQuote\">How well does the model fit?<\/li>\r\n \t<li class=\"BlockQuote\">Have any important assumptions been violated?<\/li>\r\n \t<li class=\"BlockQuote\">How good are the estimates and predictions?<\/li>\r\n<\/ul>\r\nThe general linear regression model takes the form of\r\n<p class=\"Centered\"><span class=\"Inline-Equation\"><img class=\"frame-57\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171729\/13323.png\" alt=\"13323.png\" \/><\/span>,<\/p>\r\nwith the mean value of y given as\r\n<p class=\"Centered\"><span class=\"Inline-Equation\"><img class=\"frame-57\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171730\/13331.png\" alt=\"13331.png\" \/><\/span>,<\/p>\r\nwhere:\r\n<ul>\r\n \t<li class=\"List-Paragraph\">y is the random 
response variable and <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03bc<\/span><span class=\"Subscript SmallText\">y<\/span> is the mean value of <em>y,<\/em><\/li>\r\n \t<li class=\"List-Paragraph\"><span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">0<\/span>, <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1<\/span>, <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">2<\/span>, \u2026, <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">k<\/span> are the parameters to be estimated based on the sample data,<\/li>\r\n \t<li class=\"List-Paragraph\"><em>x<\/em><span class=\"Subscript SmallText\">1<\/span><em>, x<\/em><span class=\"Subscript SmallText\">2<\/span><em>,\u2026, x<\/em><span class=\"Subscript SmallText\">k<\/span> are the predictor variables that are assumed to be non-random or fixed and measured without error, and k is the number of predictor variables,<\/li>\r\n \t<li class=\"List-Paragraph\">and <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b5<\/span> is the random error, which allows each response to deviate from the average value of <em>y<\/em>. The errors are assumed to be independent, have a mean of zero and a common variance (<span class=\"Symbols\" xml:lang=\"ar-SA\">\u03c3<\/span><span class=\"Superscript SmallText\">2<\/span>), and are normally distributed.<\/li>\r\n<\/ul>\r\nAs you can see, the multiple regression model and assumptions are very similar to those for a simple linear regression model with one predictor variable. Examining residual plots and normal probability plots for the residuals is key to verifying the assumptions.\r\n<h2>Correlation<\/h2>\r\nAs with simple linear regression, we should always begin with a scatterplot of the response variable versus each predictor variable. Linear correlation coefficients for each pair should also be computed. 
Instead of computing the correlation of each pair individually, we can create a correlation matrix, which shows the linear correlation between each pair of variables under consideration in a multiple linear regression model.\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"511\"]<img class=\"frame-53\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171732\/13236.png\" alt=\"13236.png\" width=\"511\" height=\"208\" \/> Table 1. A correlation matrix.[\/caption]\r\n<p class=\"Caption\" style=\"text-align: left\">In this matrix, the upper value is the linear correlation coefficient and the lower value is the p-value for testing the null hypothesis that a correlation coefficient is equal to zero. This matrix allows us to see not only the strength and direction of the linear relationship between each predictor variable and the response variable, but also the relationship between the predictor variables. For example, <em>y<\/em> and <em>x1<\/em> have a strong, positive linear relationship with r = 0.816, which is statistically significant because p = 0.000. We can also see that predictor variables <em>x1<\/em> and <em>x3<\/em> have a moderately strong positive linear relationship (r = 0.588) that is significant (p = 0.001).<\/p>\r\nThere are many different reasons for selecting which explanatory variables to include in our model (see Model Development and Selection); however, we frequently choose the ones that have a high linear correlation with the response variable, but we must be careful. We do not want to include explanatory variables that are highly correlated among themselves. 
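A pairwise correlation matrix like Table 1 can be produced in any statistics package; as a sketch outside the textbook's Minitab workflow, the following Python fragment computes r and its p-value for every pair of variables. The variable names and the simulated data here are hypothetical placeholders, not the grouse or volume data.

```python
# Sketch: pairwise correlation matrix (r and p-value), as in Table 1.
# All data below are simulated placeholders for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 30
x1 = rng.normal(100, 20, n)            # e.g., a stand-density-like predictor
x2 = rng.normal(50, 10, n)             # an unrelated predictor
x3 = 0.5 * x1 + rng.normal(0, 15, n)   # deliberately correlated with x1
y = 0.6 * x1 + 0.4 * x3 + rng.normal(0, 10, n)

cols = {"y": y, "x1": x1, "x2": x2, "x3": x3}
names = list(cols)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r, p = stats.pearsonr(cols[a], cols[b])   # correlation and its p-value
        print(f"{a} vs {b}: r = {r:+.3f}, p = {p:.3f}")
```

Scanning the pairs among x1, x2, and x3 in such output is exactly the multicollinearity check described next.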
We need to be aware of any multicollinearity between predictor variables.\r\n<p class=\"Callout\"><span class=\"pullquote-left\"><strong class=\"char-style-override-2\">Multicollinearity<\/strong> exists between two explanatory variables if they have a strong linear relationship.<\/span><\/p>\r\nFor example, if we are trying to predict a person\u2019s blood pressure, one predictor variable would be weight and another predictor variable would be diet. Both predictor variables are highly correlated with blood pressure (as weight increases, blood pressure typically increases, and as diet increases, blood pressure also increases). But both predictor variables are also highly correlated with each other. Both of these predictor variables are conveying essentially the same information when it comes to explaining blood pressure. Including both in the model may lead to problems when estimating the coefficients, as multicollinearity increases the standard errors of the coefficients. This means that coefficients for some variables may be found <strong class=\"Strong-2\">not<\/strong> to be significantly different from zero, whereas without multicollinearity and with lower standard errors, the same coefficients might have been found significant. Ways to test for multicollinearity are not covered in this text; however, a general rule of thumb is to be wary of a linear correlation less than -0.7 or greater than 0.7 between two predictor variables. Always examine the correlation matrix for relationships between predictor variables to avoid multicollinearity issues.\r\n<h2>Estimation<\/h2>\r\nEstimation and inference procedures are also very similar to simple linear regression. 
Just as we used our sample data to estimate <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">0<\/span> and <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1<\/span> for our simple linear regression model, we are going to extend this process to estimate all the coefficients for our multiple regression models.\r\n\r\nWith the simpler population model\r\n<p class=\"Centered\"><span class=\"Inline-Equation\"><img class=\"frame-21\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171734\/13364.png\" alt=\"13364.png\" \/><\/span><em>x<\/em><\/p>\r\n<span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1<\/span> is the slope and tells the user what the change in the response would be as the predictor variable changes. With multiple predictor variables, and therefore multiple parameters to estimate, the coefficients <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1,<\/span> <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">2<\/span><em>,<\/em> <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">3<\/span> and so on are called partial slopes or partial regression coefficients. 
The partial slope <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">i<\/span> measures the change in <em>y<\/em> for a one-unit change in <em>x<\/em><span class=\"Subscript SmallText\">i<\/span> when <strong class=\"Strong-2\">all other independent variables are held constant.<\/strong> These regression coefficients must be estimated from the sample data in order to obtain the general form of the estimated multiple regression equation\r\n<p class=\"Centered\"><img class=\"frame-1\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171734\/13373.png\" alt=\"13373.png\" \/><\/p>\r\nand the population model\r\n<p class=\"Centered\"><img class=\"frame-1\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171736\/13386.png\" alt=\"13386.png\" \/><\/p>\r\nwhere <em>k<\/em> = the number of independent variables (also called predictor variables)\r\n<p class=\"BlockQuote\"><span class=\"Symbols\" xml:lang=\"ar-SA\"><em>y\u0302<\/em><\/span> = the predicted value of the dependent variable (computed by using the multiple regression equation)<\/p>\r\n<p class=\"BlockQuote\"><em>x<\/em><span class=\"Subscript SmallText\">1<\/span>, <em>x<\/em><span class=\"Subscript SmallText\">2<\/span>, \u2026, <em>x<\/em><span class=\"Subscript SmallText\">k<\/span> = the independent variables<\/p>\r\n<p class=\"BlockQuote\"><span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">0<\/span> is the y-intercept (the value of y when all the predictor variables equal 0)<\/p>\r\n<p class=\"BlockQuote\"><em>b<\/em><span class=\"Subscript SmallText\">0<\/span> is the estimate of <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">0<\/span> based on that sample data<\/p>\r\n<p class=\"BlockQuote\"><span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span 
class=\"Subscript SmallText\">1<\/span><em>,<\/em> <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">2<\/span><em>,<\/em> <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">3<\/span><em>,\u2026<\/em><span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">k<\/span> are the coefficients of the independent variables <em>x<\/em><span class=\"Subscript SmallText\">1<\/span>, <em>x<\/em><span class=\"Subscript SmallText\">2<\/span>, \u2026, <em>x<\/em><span class=\"Subscript SmallText\">k<\/span><\/p>\r\n<p class=\"BlockQuote\"><em>b<\/em><span class=\"Subscript SmallText\">1<\/span><em>, b<\/em><span class=\"Subscript SmallText\">2<\/span><em>, b<\/em><span class=\"Subscript SmallText\">3<\/span>, \u2026, <em>b<\/em><span class=\"Subscript SmallText\">k<\/span> are the sample estimates of the coefficients <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1<\/span><em>,<\/em> <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">2<\/span><em>,<\/em> <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">3<\/span><em>,\u2026<\/em><span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">k<\/span><\/p>\r\nThe method of least-squares is still used to fit the model to the data. Remember that this method minimizes the sum of the squared deviations of the observed and predicted values (SSE).\r\n\r\nThe analysis of variance table for multiple regression has a similar appearance to that of a simple linear regression.\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"901\"]<img class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171738\/13226.png\" alt=\"13226.png\" width=\"901\" height=\"227\" \/> Table 2. 
ANOVA table.[\/caption]\r\n\r\nWhere k is the number of predictor variables and n is the number of observations.\r\n\r\nThe best estimate of the random variation <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03c3<\/span><span class=\"Superscript SmallText\">2<\/span>\u2014the variation that is unexplained by the predictor variables\u2014is still s<span class=\"Superscript SmallText\">2<\/span>, the MSE. The regression standard error, s, is the square root of the MSE.\r\n\r\nA new column in the ANOVA table for multiple linear regression shows a decomposition of SSR, in which the conditional contribution of each predictor variable <em>given the variables already entered into the model<\/em> is shown for the order of entry that you specify in your regression. These conditional or <strong class=\"Strong-2\">sequential sums of squares<\/strong> each account for 1 regression degree of freedom, and allow the user to see the contribution of each predictor variable to the total variation explained by the regression model by using the ratio:\r\n<p class=\"Centered\"><span class=\"Inline-Equation-Large\"><img class=\"frame-43 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171739\/13413.png\" alt=\"13413.png\" \/><\/span><\/p>\r\n\r\n<h2>Adjusted R<span class=\"Superscript SmallText\">2<\/span><\/h2>\r\nIn simple linear regression, we used the relationship between the explained and total variation as a measure of model fit:\r\n<p class=\"Centered\"><img class=\"frame-59 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171740\/13422.png\" alt=\"13422.png\" \/><\/p>\r\nNotice from this definition that the value of the coefficient of determination can never decrease with the addition of more variables into the regression model. 
Hence, R<span class=\"Superscript SmallText\">2<\/span> can be artificially inflated as more variables (significant or not) are included in the model. An alternative measure of strength of the regression model is adjusted for degrees of freedom by using mean squares rather than sums of squares:\r\n<p class=\"Centered\"><img class=\"frame-49 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171742\/13429.png\" alt=\"13429.png\" \/><\/p>\r\nThe adjusted R<span class=\"Superscript SmallText\">2<\/span> value represents the percentage of variation in the response variable explained by the independent variables, corrected for degrees of freedom. Unlike R<span class=\"Superscript SmallText\">2<\/span>, the adjusted R<span class=\"Superscript SmallText\">2<\/span> will not tend to increase as variables are added and it will tend to stabilize around some upper limit as variables are added.\r\n<h2>Tests of Significance<\/h2>\r\nRecall in the previous chapter we tested to see if <em>y<\/em> and <em>x<\/em> were linearly related by testing\r\n<table class=\"Table\"><colgroup> <col \/> <col \/><\/colgroup>\r\n<tbody>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">H<span class=\"Subscript SmallText\">0<\/span>: <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1<\/span> = 0<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">H<span class=\"Subscript SmallText\">1<\/span>: <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1<\/span> \u2260 0<\/p>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\nwith the t-test (or the equivalent F-test). In multiple linear regression, there are several partial slopes and the t-test and F-test are no longer equivalent. 
Our question changes: Is the regression equation that uses information provided by the predictor variables x<span class=\"Subscript SmallText\">1<\/span>, x<span class=\"Subscript SmallText\">2<\/span>, x<span class=\"Subscript SmallText\">3<\/span>, \u2026, x<span class=\"Subscript SmallText\">k<\/span>, better than the simple predictor <span class=\"Inline-Equation\"><img class=\"frame-5\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171744\/13615.png\" alt=\"13615.png\" \/><\/span>(the mean response value), which does not rely on any of these independent variables?\r\n<p class=\"Centered\">H<span class=\"Subscript SmallText\">0<\/span>: <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1<\/span> = <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">2<\/span> = <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">3<\/span> = \u2026=<span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">k<\/span> = 0<\/p>\r\n<p class=\"Centered\">H<span class=\"Subscript SmallText\">1<\/span>: At least one of <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1<\/span>, <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">2<\/span> , <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">3<\/span> , \u2026<span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">k<\/span> \u2260 0<\/p>\r\nThe F-test statistic is used to answer this question and is found in the ANOVA table.\r\n<p class=\"Centered\"><img class=\"frame-17 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171744\/13437.png\" alt=\"13437.png\" \/><\/p>\r\nThis test statistic 
follows the F-distribution with df<span class=\"Subscript SmallText\">1<\/span> = k and df<span class=\"Subscript SmallText\">2<\/span> = (n-k-1). Since the exact p-value is given in the output, you can use the Decision Rule to answer the question.\r\n<p class=\"Callout\"><span class=\"pullquote-left\">If the p-value is less than the level of significance, reject the null hypothesis.<\/span><\/p>\r\nRejecting the null hypothesis supports the claim that at least one of the predictor variables has a significant linear relationship with the response variable. The next step is to determine which predictor variables add important information for prediction in the presence of other predictors already in the model. To test the significance of the partial regression coefficients, you need to examine each relationship separately using individual t-tests.\r\n<table class=\"Table\"><colgroup> <col \/> <col \/><\/colgroup>\r\n<tbody>\r\n<tr style=\"height: 43.0312px\">\r\n<td class=\"Table\" style=\"height: 43.0312px\">\r\n<p class=\"Centered\">H<span class=\"Subscript SmallText\">0<\/span>: <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">i<\/span> = 0<\/p>\r\n<\/td>\r\n<td class=\"Table\" style=\"height: 43.0312px\">\r\n<p class=\"Centered\">H<span class=\"Subscript SmallText\">1<\/span>: <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">i<\/span> \u2260 0<\/p>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<p class=\"Centered\"><span class=\"Inline-Equation-Large\">\r\n<img class=\"frame-17\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171745\/13447.png\" alt=\"13447.png\" \/><\/span> with df = (n-k-1)<\/p>\r\nwhere <em>SE(b<sub>i<\/sub>)<\/em> is the standard error of <em>b<sub><span class=\"Subscript SmallText\">i<\/span><\/sub><\/em>. Exact p-values are also given for these tests. 
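As a numeric sketch, a coefficient's t statistic and two-sided p-value can be recomputed from the estimate and its standard error using the formula above. The specific numbers below (b_i = 0.5910, SE = 0.04294, n = 28, k = 3) are illustrative inputs, not outputs of this code:

```python
# Sketch: two-sided t-test for one partial regression coefficient,
# t = b_i / SE(b_i) with df = (n - k - 1).
from scipy import stats

n, k = 28, 3                    # observations and number of predictors
b_i, se_bi = 0.5910, 0.04294    # illustrative coefficient and standard error

t = b_i / se_bi                 # test statistic
df = n - k - 1                  # error degrees of freedom
p = 2 * stats.t.sf(abs(t), df)  # two-sided p-value from the t-distribution

print(f"t = {t:.4f}, df = {df}, p = {p:.4f}")
```

A t this large with 24 degrees of freedom gives a p-value far below any usual significance level, so such a coefficient would be retained.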
Examining specific p-values for each predictor variable will allow you to decide which variables are significantly related to the response variable. Typically, any insignificant variables are removed from the model, but remember that these tests are done with other variables in the model. A good procedure is to remove the least significant variable and then refit the reduced model. With each new model, always check the regression standard error (lower is better), the adjusted R<sup><span class=\"Superscript SmallText\">2<\/span><\/sup> (higher is better), the p-values for all predictor variables, and the residual and normal probability plots.\r\n\r\nBecause of the complexity of the calculations, we will rely on software to fit the model and give us the regression coefficients. Don\u2019t forget\u2026 you always begin with scatterplots. Strong relationships between predictor and response variables make for a good model.\r\n<div class=\"textbox examples\">\r\n<h3>Example 1<\/h3>\r\nA researcher collected data in a project to predict the annual growth per acre of upland boreal forests in southern Canada. They hypothesized that cubic foot volume growth (<em>y<\/em>) is a function of stand basal area per acre (<em>x<\/em><span class=\"Subscript SmallText\">1<\/span>), the percentage of that basal area in black spruce (<em>x<\/em><span class=\"Subscript SmallText\">2<\/span>), and the stand\u2019s site index for black spruce (<em>x<\/em><span class=\"Subscript SmallText\">3<\/span>). <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b1<\/span> = 0.05.\r\n\r\n<\/div>\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"901\"]<img class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171746\/132151.png\" alt=\"132151.png\" width=\"901\" height=\"473\" \/> Table 3. 
Observed data for cubic feet, stand basal area, percent basal area in black spruce, and site index.[\/caption]\r\n\r\nScatterplots of the response variable versus each predictor variable were created along with a correlation matrix.\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"901\"]<img class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171749\/13205.png\" alt=\"13205.png\" width=\"901\" height=\"658\" \/> Figure 1. Scatterplots of cubic feet versus basal area, percent basal area in black spruce, and site index.[\/caption]\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"589\"]<img class=\"frame-11\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171751\/13195.png\" alt=\"13195.png\" width=\"589\" height=\"383\" \/> Table 4. Correlation matrix.[\/caption]\r\n\r\nAs you can see from the scatterplots and the correlation matrix, BA\/ac has the strongest linear relationship with CuFt volume (r = 0.816) and %BA in black spruce has the weakest linear relationship (r = 0.413). Also of note is the moderately strong correlation between the two predictor variables, BA\/ac and SI (r = 0.588). All three predictor variables have significant linear relationships with the response variable (volume) so we will begin by using all variables in our multiple linear regression model. 
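The least-squares fit itself can also be sketched in a few lines of Python with NumPy; this is a minimal illustration of fitting an intercept plus three partial slopes and computing R-squared, adjusted R-squared, and the overall F statistic. The data here are simulated placeholders, not the Table 3 observations.

```python
# Sketch: multiple linear regression y = b0 + b1*x1 + b2*x2 + b3*x3
# by ordinary least squares, with simulated placeholder data.
import numpy as np

rng = np.random.default_rng(1)
n, k = 28, 3                                   # 28 observations, 3 predictors
X = rng.normal(size=(n, k))                    # stand-ins for BA/ac, SI, %BA
beta_true = np.array([5.0, 0.6, 0.1, 0.5])     # intercept + partial slopes
y = beta_true[0] + X @ beta_true[1:] + rng.normal(0, 1.0, n)

Xd = np.column_stack([np.ones(n), X])          # design matrix with intercept
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)     # estimated coefficients

resid = y - Xd @ b                             # residuals
sse = resid @ resid                            # error sum of squares
sst = ((y - y.mean()) ** 2).sum()              # total sum of squares
r2 = 1 - sse / sst
adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))   # mean-square version
f = ((sst - sse) / k) / (sse / (n - k - 1))    # overall F statistic

print(f"b = {b.round(3)}, R^2 = {r2:.3f}, adj R^2 = {adj_r2:.3f}, F = {f:.1f}")
```

Note that adjusted R-squared is never larger than R-squared, mirroring the degrees-of-freedom correction discussed above.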
The Minitab output is given below.\r\n\r\nWe begin by testing the following null and alternative hypotheses:\r\n<p class=\"Centered\">H<sub><span class=\"Subscript SmallText\">0<\/span><\/sub>: <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><sub><span class=\"Subscript SmallText\">1<\/span><\/sub> = <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><sub><span class=\"Subscript SmallText\">2<\/span><\/sub> = <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><sub><span class=\"Subscript SmallText\">3<\/span><\/sub> = 0<\/p>\r\n<p class=\"Centered\">H<sub><span class=\"Subscript SmallText\">1<\/span><\/sub>: At least one of <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><sub><span class=\"Subscript SmallText\">1<\/span><\/sub>, <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><sub><span class=\"Subscript SmallText\">2<\/span><\/sub> , <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><sub><span class=\"Subscript SmallText\">3<\/span><\/sub> \u2260 0<\/p>\r\n\r\n<h4>General Regression Analysis: CuFt versus BA\/ac, SI, %BA Bspruce<\/h4>\r\n<table class=\"Table\" style=\"font-size: 0.7em;margin: 1px 1px 1px 1px\"><colgroup> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/><\/colgroup>\r\n<tbody>\r\n<tr>\r\n<td class=\"Table-Heading\" colspan=\"7\">\r\n<p class=\"Table-Heading\">Regression Equation<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\" colspan=\"7\">\r\n<p class=\"Table\">CuFt = -19.3858 + 0.591004 BA\/ac + 0.0899883 SI + 0.489441 %BA Bspruce<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table-Heading\" colspan=\"7\">\r\n<p class=\"Table-Heading\">Coefficients<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Term<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Coef<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">SE Coef<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">T<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p 
class=\"Table\">P<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">95% CI<\/p>\r\n<\/td>\r\n<td class=\"Table\"><\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Constant<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">-19.3858<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">4.15332<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">-4.6675<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.000<\/p>\r\n<\/td>\r\n<td class=\"Table\" colspan=\"2\">\r\n<p class=\"Table\">(-27.9578, -10.8137)<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">BA\/ac<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.5910<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.04294<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">13.7647<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.000<\/p>\r\n<\/td>\r\n<td class=\"Table\" colspan=\"2\">\r\n<p class=\"Table\">(0.5024, 0.6796)<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">SI<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.0900<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.11262<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.7991<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.432<\/p>\r\n<\/td>\r\n<td class=\"Table\" colspan=\"2\">\r\n<p class=\"Table\">(-0.1424, 0.3224)<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">%BA Bspruce<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.4894<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.05245<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">9.3311<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.000<\/p>\r\n<\/td>\r\n<td class=\"Table\" colspan=\"2\">\r\n<p class=\"Table\">(0.3812, 0.5977)<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table-Heading\" 
colspan=\"7\">\r\n<p class=\"Table-Heading\">Summary of Model<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\" colspan=\"2\">\r\n<p class=\"Table\">S = 3.17736<\/p>\r\n<\/td>\r\n<td class=\"Table\" colspan=\"2\">\r\n<p class=\"Table\">R-Sq = 95.53%<\/p>\r\n<\/td>\r\n<td class=\"Table\" colspan=\"2\">\r\n<p class=\"Table\">R-Sq(adj) = 94.97%<\/p>\r\n<\/td>\r\n<td class=\"Table\"><\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\" colspan=\"2\">\r\n<p class=\"Table\">PRESS = 322.279<\/p>\r\n<\/td>\r\n<td class=\"Table\" colspan=\"2\">\r\n<p class=\"Table\">R-Sq(pred) = 94.05%<\/p>\r\n<\/td>\r\n<td class=\"Table\"><\/td>\r\n<td class=\"Table\"><\/td>\r\n<td class=\"Table\"><\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table-Heading\" colspan=\"7\">\r\n<p class=\"Table-Heading\">Analysis of Variance<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Source<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">DF<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Seq SS<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Adj SS<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Adj MS<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">F<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">P<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Regression<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">3<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">5176.56<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">5176.56<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">1725.52<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\"><strong class=\"Strong-2\">170.918<\/strong><\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\"><strong class=\"Strong-2\">0.000000<\/strong><\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">BA\/ac<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p 
class=\"Table\">1<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">3611.17<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">1912.79<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">1912.79<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">189.467<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.000000<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">SI<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">1<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">686.37<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">6.45<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">6.45<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.638<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.432094<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">%BA Bspruce<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">1<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">879.02<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">879.02<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">879.02<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">87.069<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.000000<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Error<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">24<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">242.30<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">242.30<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">10.10<\/p>\r\n<\/td>\r\n<td class=\"Table\"><\/td>\r\n<td class=\"Table\"><\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Total<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">27<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">5418.86<\/p>\r\n<\/td>\r\n<td 
class=\"Table\"><\/td>\r\n<td class=\"Table\"><\/td>\r\n<td class=\"Table\"><\/td>\r\n<td class=\"Table\"><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\nThe F-test statistic (and associated p-value) is used to answer this question and is found in the ANOVA table. For this example, F = 170.918 with a p-value of 0.00000. The p-value is smaller than our level of significance (0.0000&lt;0.05) so we will reject the null hypothesis. At least one of the predictor variables significantly contributes to the prediction of volume.\r\n\r\nThe coefficients for the three predictor variables are all positive indicating that as they increase cubic foot volume will also increase. For example, if we hold values of SI and %BA Bspruce constant, this equation tells us that as basal area increases by 1 sq. ft., volume will increase an additional 0.591004 cu. ft. The signs of these coefficients are logical, and what we would expect. The adjusted R<sup><span class=\"Superscript SmallText\">2<\/span><\/sup> is also very high at 94.97%.\r\n\r\nThe next step is to examine the individual t-tests for each predictor variable. 
The test statistics and associated p-values are found in the Minitab output and repeated below:\r\n<table class=\"Table\"><colgroup> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/><\/colgroup>\r\n<tbody>\r\n<tr>\r\n<td class=\"Table-Heading\" colspan=\"6\">\r\n<p class=\"Table-Heading\">Coefficients<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Term<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Coef<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">SE Coef<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">T<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">P<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">95% CI<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Constant<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">-19.3858<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">4.15332<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">-4.6675<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.000<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">(-27.9578, -10.8137)<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">BA\/ac<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.5910<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.04294<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\"><strong class=\"Strong-2\">13.7647<\/strong><\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\"><strong class=\"Strong-2\">0.000<\/strong><\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">( 0.5024, 0.6796)<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">SI<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.0900<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.11262<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\"><strong 
class=\"Strong-2\">0.7991<\/strong><\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\"><strong class=\"Strong-2\">0.432<\/strong><\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">( -0.1424, 0.3224)<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">%BA Bspruce<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.4894<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.05245<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\"><strong class=\"Strong-2\">9.3311<\/strong><\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\"><strong class=\"Strong-2\">0.000<\/strong><\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">( 0.3812, 0.5977)<\/p>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\nThe predictor variables BA\/ac and %BA Bspruce have t-statistics of 13.7647 and 9.3311 and p-values of 0.0000, indicating that both are significantly contributing to the prediction of volume. However, SI has a t-statistic of 0.7991 with a p-value of 0.432. This variable does not significantly contribute to the prediction of cubic foot volume.\r\n\r\nThis result may surprise you, as SI had the second strongest relationship with volume, but don\u2019t forget about the correlation between SI and BA\/ac (r = 0.588). The predictor variable BA\/ac had the strongest linear relationship with volume, and using the sequential sums of squares, we can see that BA\/ac is already accounting for 70% of the variation in cubic foot volume (3611.17\/5176.56 = 0.6976). The information from SI may be too similar to the information in BA\/ac, and SI only explains about 13% of the variation in volume (686.37\/5176.56 = 0.1326) given that BA\/ac is already in the model.\r\n\r\nThe next step is to examine the residual and normal probability plots. 
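The chapter's diagnostics come from Minitab, but the same outlier check can be sketched numerically in Python with numpy. The data, the planted outlier, and all names below are synthetic illustrations, not the chapter's dataset:

```python
import numpy as np

# Synthetic single-predictor fit with one planted outlier -- an illustration
# of the residual check, not the chapter's actual data.
rng = np.random.default_rng(7)
n = 28
x = rng.uniform(50, 150, n)
y = -19.0 + 0.6 * x + rng.normal(0, 3, n)
y[0] += 15.0                                  # plant one outlier

X = np.column_stack([np.ones(n), x])          # intercept column + predictor
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
standardized = resid / resid.std(ddof=2)      # crude standardization by s
flagged = np.where(np.abs(standardized) > 2)[0]
print("flagged observations:", flagged)
```

A residual more than about two standard errors from zero is the usual informal flag; in practice you would still inspect the plots rather than rely on a numeric cutoff alone.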
A single outlier is evident in the otherwise acceptable plots.\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"1030\"]<img class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171754\/13186.png\" alt=\"13186.png\" width=\"1030\" height=\"405\" \/> Figure 2. Residual and normal probability plots.[\/caption]\r\n\r\n<strong class=\"Strong-2\">So where do we go from here?<\/strong>\r\n\r\nWe will remove the non-significant variable and re-fit the model, this time excluding SI. The Minitab output is given below.\r\n<h4>General Regression Analysis: CuFt versus BA\/ac, %BA Bspruce<\/h4>\r\n<table class=\"Table\"><colgroup> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/><\/colgroup>\r\n<tbody>\r\n<tr>\r\n<td class=\"Table-Heading\" colspan=\"7\">\r\n<p class=\"Table-Heading\">Regression Equation<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\" colspan=\"7\">\r\n<p class=\"Table\">CuFt = -19.1142 + 0.615531 BA\/ac + 0.515122 %BA Bspruce<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table-Heading\" colspan=\"7\">\r\n<p class=\"Table-Heading\">Coefficients<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Term<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Coef<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">SE Coef<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">T<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">P<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">95% CI<\/p>\r\n<\/td>\r\n<td class=\"Table\"><\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Constant<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">-19.1142<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">4.10936<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">-4.6514<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p 
class=\"Table\">0.000<\/p>\r\n<\/td>\r\n<td class=\"Table\" colspan=\"2\">\r\n<p class=\"Table\">(-27.5776, -10.6508)<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">BA\/ac<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.6155<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.02980<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">20.6523<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.000<\/p>\r\n<\/td>\r\n<td class=\"Table\" colspan=\"2\">\r\n<p class=\"Table\">(0.5541, 0.6769)<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">%BA Bspruce<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.5151<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.04115<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">12.5173<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.000<\/p>\r\n<\/td>\r\n<td class=\"Table\" colspan=\"2\">\r\n<p class=\"Table\">(0.4304, 0.5999)<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table-Heading\" colspan=\"7\">\r\n<p class=\"Table-Heading\">Summary of Model<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\" colspan=\"2\">\r\n<p class=\"Table\">S = 3.15431<\/p>\r\n<\/td>\r\n<td class=\"Table\" colspan=\"2\">\r\n<p class=\"Table\">R-Sq = 95.41%<\/p>\r\n<\/td>\r\n<td class=\"Table\" colspan=\"2\">\r\n<p class=\"Table\">R-Sq(adj) = 95.04%<\/p>\r\n<\/td>\r\n<td class=\"Table\"><\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\" colspan=\"2\">\r\n<p class=\"Table\">PRESS = 298.712<\/p>\r\n<\/td>\r\n<td class=\"Table\" colspan=\"2\">\r\n<p class=\"Table\">R-Sq(pred) = 94.49%<\/p>\r\n<\/td>\r\n<td class=\"Table\"><\/td>\r\n<td class=\"Table\"><\/td>\r\n<td class=\"Table\"><\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table-Heading\" colspan=\"7\">\r\n<p class=\"Table-Heading\">Analysis of Variance<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p 
class=\"Table\">Source<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">DF<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">SeqSS<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">AdjSS<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">AdjMS<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">F<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">P<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Regression<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">2<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">5170.12<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">5170.12<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">2585.06<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">259.814<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.0000000<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">BA\/ac<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">1<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">3611.17<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">4243.71<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">4243.71<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">426.519<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.0000000<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">%BA Bspruce<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">1<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">1558.95<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">1558.95<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">1558.95<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">156.684<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.0000000<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p 
class=\"Table\">Error<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">25<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">248.74<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">248.74<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">9.95<\/p>\r\n<\/td>\r\n<td class=\"Table\"><\/td>\r\n<td class=\"Table\"><\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Total<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">27<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">5418.86<\/p>\r\n<\/td>\r\n<td class=\"Table\"><\/td>\r\n<td class=\"Table\"><\/td>\r\n<td class=\"Table\"><\/td>\r\n<td class=\"Table\"><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\nWe will repeat the steps followed with our first model. We begin by again testing the following hypotheses, now with only the two remaining parameters:\r\n<p class=\"Centered\">H<span class=\"Subscript SmallText\">0<\/span>: <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1<\/span> = <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">2<\/span> = 0<\/p>\r\n<p class=\"Centered\">H<span class=\"Subscript SmallText\">1<\/span>: At least one of <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1<\/span>, <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">2<\/span> \u2260 0<\/p>\r\nThis reduced model has an F-statistic equal to 259.814 and a p-value of 0.0000. We will reject the null hypothesis. At least one of the predictor variables significantly contributes to the prediction of volume. 
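As a quick arithmetic check (sketched in Python, which the chapter itself does not use), the F-statistic is simply the regression mean square divided by the mean square error; plugging in the table values above reproduces the reported F up to rounding:

```python
# Values taken from the reduced-model ANOVA table above
ss_regression, df_regression = 5170.12, 2
ss_error, df_error = 248.74, 25

ms_regression = ss_regression / df_regression   # = 2585.06, as tabled
ms_error = ss_error / df_error                  # ~ 9.95, as tabled
f_stat = ms_regression / ms_error               # matches the tabled 259.814 up to rounding
print(round(f_stat, 3))
```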
The coefficients are still positive (as we expected) but the values have changed to account for the different model.\r\n\r\nThe individual t-tests for each coefficient (repeated below) show that both predictor variables are significantly different from zero and contribute to the prediction of volume.\r\n<table class=\"Table\"><colgroup> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/><\/colgroup>\r\n<tbody>\r\n<tr>\r\n<td class=\"Table-Heading\" colspan=\"6\">\r\n<p class=\"Table-Heading\">Coefficients<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Term<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Coef<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">SE Coef<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">T<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">P<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">95% CI<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">Constant<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">-19.1142<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">4.10936<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">-4.6514<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.000<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">(-27.5776, -10.6508)<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">BA\/ac<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.6155<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.02980<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">20.6523<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.000<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">( 0.5541, 0.6769)<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td class=\"Table\">\r\n<p class=\"Table\">%BA Bspruce<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.5151<\/p>\r\n<\/td>\r\n<td 
class=\"Table\">\r\n<p class=\"Table\">0.04115<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">12.5173<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">0.000<\/p>\r\n<\/td>\r\n<td class=\"Table\">\r\n<p class=\"Table\">( 0.4304, 0.5999)<\/p>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\nNotice that the adjusted R<span class=\"Superscript SmallText\">2<\/span> has increased from 94.97% to 95.04%, indicating a slightly better fit to the data. The regression standard error has also changed for the better, decreasing from 3.17736 to 3.15431, indicating slightly less scatter of the observed data about the model.\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"1040\"]<img class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171758\/131751.png\" alt=\"131751.png\" width=\"1040\" height=\"359\" \/> Figure 3. Residual and normal probability plots.[\/caption]\r\n\r\nThe residual and normal probability plots have changed little, still not indicating any issues with the regression assumptions. By removing the non-significant variable, the model has improved.\r\n<h2>Model Development and Selection<\/h2>\r\nThere are many different reasons for creating a multiple linear regression model, and its purpose directly influences how the model is created. 
Listed below are several of the more common uses for a regression model:\r\n<ol>\r\n \t<li class=\"List-Paragraph-Number-1\">Describing the behavior of your response variable<\/li>\r\n \t<li class=\"List-Paragraph-Number-1\">Predicting a response or estimating the average response<\/li>\r\n \t<li class=\"List-Paragraph-Number-1\">Estimating the parameters (<span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">0<\/span>, <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1<\/span>, <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">2<\/span>, \u2026)<\/li>\r\n \t<li class=\"List-Paragraph-Number-1\">Developing an accurate model of the process<\/li>\r\n<\/ol>\r\nDepending on your objective for creating a regression model, your methodology may vary when it comes to variable selection, retention, and elimination.\r\n\r\nWhen the objective is simple description of your response variable, you are typically less concerned about eliminating non-significant variables. The best representation of the response variable, in terms of minimal residual sums of squares, is the full model, which includes all predictor variables available from the data set. It is less important that the variables are causally related or that the model is realistic.\r\n\r\nA common reason for creating a regression model is for prediction and estimation. A researcher wants to be able to define events within the x-space of data that were collected for this model, and it is assumed that the system will continue to function as it did when the data were collected. Any measurable predictor variables that contain information on the response variable should be included. For this reason, non-significant variables may be retained in the model. However, regression equations with fewer variables are easier to use and have an economic advantage in terms of data collection. 
Additionally, there is a greater confidence attached to models that contain only significant variables.\r\n\r\nIf the objective is to estimate the model parameters, you will be more cautious when considering variable elimination. You want to avoid introducing a bias by removing a variable that has predictive information about the response. However, there is a statistical advantage in terms of reduced variance of the parameter estimates if variables truly unrelated to the response variable are removed.\r\n\r\nBuilding a realistic model of the process you are studying is often a primary goal of much research. It is important to identify the variables that are linked to the response through some causal relationship. While you can identify which variables have a strong correlation with the response, this only serves as an indicator of which variables require further study. The principal objective is to develop a model whose functional form realistically reflects the behavior of a system.\r\n\r\nThe following figure is a strategy for building a regression model.\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"370\"]<img class=\"frame-60\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171801\/153_1_fmt.png\" alt=\"153_1.tif\" width=\"370\" height=\"576\" \/> Figure 4. 
Strategy for building a regression model.[\/caption]\r\n<h2>Software Solutions<\/h2>\r\n<h3>Minitab<\/h3>\r\n<p class=\"No-Caption\"><span class=\"Picture\"><img class=\"frame-27 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171804\/155_1_fmt.png\" alt=\"155_1.tif\" \/><\/span><\/p>\r\n<p class=\"No-Caption\"><span class=\"Picture\"><img class=\"frame-70 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171807\/155_2_fmt.png\" alt=\"155_2.tif\" \/><\/span><\/p>\r\n<p class=\"No-Caption\"><img class=\"frame-66 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171810\/155_3_fmt.png\" alt=\"155_3.tif\" \/><\/p>\r\nThe output and plots are given in the previous example.\r\n<h3>Excel<\/h3>\r\n<p class=\"No-Caption\"><span class=\"Picture\"><img class=\"frame-107 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171813\/154_1_fmt.png\" alt=\"154_1.tif\" \/><\/span><\/p>\r\n<p class=\"No-Caption\"><span class=\"Picture\"><img class=\"frame-79 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171817\/154_2_fmt.png\" alt=\"154_2.tif\" \/><\/span><\/p>\r\n<p class=\"No-Caption\"><span class=\"Picture\"><img class=\"frame-108 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171820\/154_3_fmt.png\" alt=\"154_3.tif\" \/><\/span><\/p>\r\n<p class=\"No-Caption\"><span class=\"Picture\"><img class=\"frame-173 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171823\/154_4_fmt.png\" alt=\"154_4.tif\" \/><\/span><\/p>\r\n\r\n<\/div>","rendered":"<div class=\"Basic-Text-Frame\">\n<p>It frequently 
happens that a dependent variable (<em>y<\/em>) in which we are interested is related to more than one independent variable. If this relationship can be estimated, it may enable us to make more precise predictions of the dependent variable than would be possible by a simple linear regression. Regressions based on more than one independent variable are called <strong class=\"Strong-2\">multiple regressions<\/strong>.<\/p>\n<p>Multiple linear regression is an extension of simple linear regression and many of the ideas we examined in simple linear regression carry over to the multiple regression setting. For example, scatterplots, correlation, and least squares method are still essential components for a multiple regression.<\/p>\n<p>For example, a habitat suitability index (used to evaluate the impact on wildlife habitat from land use changes) for ruffed grouse might be related to three factors:<\/p>\n<p class=\"BlockQuote\" style=\"padding-left: 30px\"><strong class=\"Strong-2\">x<\/strong><sub><span class=\"Subscript SmallText\">1<\/span><\/sub> = stem density<br \/>\n<strong class=\"Strong-2\">x<\/strong><sub><span class=\"Subscript SmallText\">2<\/span><\/sub> = percent of conifers<br \/>\n<strong class=\"Strong-2\">x<\/strong><sub><span class=\"Subscript SmallText\">3<\/span><\/sub> = amount of understory herbaceous matter<\/p>\n<p>A researcher would collect data on these variables and use the sample data to construct a regression equation relating these three variables to the response. 
The researcher will have questions about his model similar to a simple linear regression model.<\/p>\n<ul>\n<li class=\"BlockQuote\">How strong is the relationship between y and the three predictor variables?<\/li>\n<li class=\"BlockQuote\">How well does the model fit?<\/li>\n<li class=\"BlockQuote\">Have any important assumptions been violated?<\/li>\n<li class=\"BlockQuote\">How good are the estimates and predictions?<\/li>\n<\/ul>\n<p>The general linear regression model takes the form of<\/p>\n<p class=\"Centered\"><span class=\"Inline-Equation\"><img decoding=\"async\" class=\"frame-57\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171729\/13323.png\" alt=\"13323.png\" \/><\/span>,<\/p>\n<p>with the mean value of y given as<\/p>\n<p class=\"Centered\"><span class=\"Inline-Equation\"><img decoding=\"async\" class=\"frame-57\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171730\/13331.png\" alt=\"13331.png\" \/><\/span>,<\/p>\n<p>where:<\/p>\n<ul>\n<li class=\"List-Paragraph\">y is the random response variable and <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03bc<\/span><span class=\"Subscript SmallText\">y<\/span> is the mean value of <em>y,<\/em><\/li>\n<li class=\"List-Paragraph\"><span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">0<\/span>, <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1<\/span>, <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">2<\/span>, and <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">k<\/span> are the parameters to be estimated based on the sample data,<\/li>\n<li class=\"List-Paragraph\"><em>x<\/em><span class=\"Subscript SmallText\">1<\/span><em>, x<\/em><span class=\"Subscript SmallText\">2<\/span><em>,\u2026, x<\/em><span 
class=\"Subscript SmallText\">k<\/span> are the predictor variables that are assumed to be non-random or fixed and measured without error, and k is the number of predictor variables,<\/li>\n<li class=\"List-Paragraph\">and <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b5<\/span> is the random error, which allows each response to deviate from the average value of <em>y<\/em>. The errors are assumed to be independent, have a mean of zero and a common variance (<span class=\"Symbols\" xml:lang=\"ar-SA\">\u03c3<\/span><span class=\"Superscript SmallText\">2<\/span>), and are normally distributed.<\/li>\n<\/ul>\n<p>As you can see, the multiple regression model and assumptions are very similar to those for a simple linear regression model with one predictor variable. Examining residual plots and normal probability plots for the residuals is key to verifying the assumptions.<\/p>\n<h2>Correlation<\/h2>\n<p>As with simple linear regression, we should always begin with a scatterplot of the response variable versus each predictor variable. Linear correlation coefficients for each pair should also be computed. Instead of computing the correlation of each pair individually, we can create a correlation matrix, which shows the linear correlation between each pair of variables under consideration in a multiple linear regression model.<\/p>\n<div style=\"width: 521px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-53\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171732\/13236.png\" alt=\"13236.png\" width=\"511\" height=\"208\" \/><\/p>\n<p class=\"wp-caption-text\">Table 1. A correlation matrix.<\/p>\n<\/div>\n<p class=\"Caption\" style=\"text-align: left\">In this matrix, the upper value is the linear correlation coefficient and the lower value is the p-value for testing the null hypothesis that a correlation coefficient is equal to zero. 
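A correlation matrix of this kind can be computed in Python with numpy (the chapter's own output comes from Minitab). The data below are synthetic: y responds to two predictors, x1 and x3, and the two predictors are deliberately made correlated with each other:

```python
import numpy as np

# Synthetic data: y depends on x1 and x3, and x1/x3 are themselves correlated.
rng = np.random.default_rng(42)
n = 30
x1 = rng.normal(100, 20, n)
x3 = 0.6 * x1 + rng.normal(0, 15, n)
y = 0.6 * x1 + 0.5 * x3 + rng.normal(0, 5, n)

# Entry [i, j] is the correlation between variables i and j (order: y, x1, x3)
r = np.corrcoef(np.vstack([y, x1, x3]))
print(np.round(r, 3))
```

Note that `corrcoef` gives only the correlation coefficients; the p-values shown in the lower half of Table 1 come from a separate significance test on each coefficient.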
This matrix allows us to see not only the strength and direction of the linear relationship between each predictor variable and the response variable, but also the relationships between the predictor variables. For example, <em>y<\/em> and <em>x1<\/em> have a strong, positive linear relationship with r = 0.816, which is statistically significant because p = 0.000. We can also see that predictor variables <em>x1<\/em> and <em>x3<\/em> have a moderately strong positive linear relationship (r = 0.588) that is significant (p = 0.001).<\/p>\n<p>There are many different reasons for selecting which explanatory variables to include in our model (see Model Development and Selection); however, we frequently choose the ones that have a high linear correlation with the response variable, but we must be careful. We do not want to include explanatory variables that are highly correlated among themselves. We need to be aware of any multicollinearity between predictor variables.<\/p>\n<p class=\"Callout\"><span class=\"pullquote-left\"><strong class=\"char-style-override-2\">Multicollinearity<\/strong> exists between two explanatory variables if they have a strong linear relationship.<\/span><\/p>\n<p>For example, if we are trying to predict a person\u2019s blood pressure, one predictor variable would be weight and another predictor variable would be diet. Both predictor variables are highly correlated with blood pressure (as weight increases blood pressure typically increases, and as diet increases blood pressure also increases). But both predictor variables are also highly correlated with each other. Both of these predictor variables are conveying essentially the same information when it comes to explaining blood pressure. Including both in the model may lead to problems when estimating the coefficients, as multicollinearity increases the standard errors of the coefficients. 
This means that coefficients for some variables may be found <strong class=\"Strong-2\">not<\/strong> to be significantly different from zero, whereas without multicollinearity and with lower standard errors, the same coefficients might have been found significant. Ways to test for multicollinearity are not covered in this text; however, a general rule of thumb is to be wary of a linear correlation of less than -0.7 or greater than 0.7 between two predictor variables. Always examine the correlation matrix for relationships between predictor variables to avoid multicollinearity issues.<\/p>\n<h2>Estimation<\/h2>\n<p>Estimation and inference procedures are also very similar to simple linear regression. Just as we used our sample data to estimate <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">0<\/span> and <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1<\/span> for our simple linear regression model, we are going to extend this process to estimate all the coefficients for our multiple regression models.<\/p>\n<p>With the simpler population model<\/p>\n<p class=\"Centered\"><span class=\"Inline-Equation\"><img decoding=\"async\" class=\"frame-21\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171734\/13364.png\" alt=\"13364.png\" \/><\/span><em>x<\/em><\/p>\n<p><span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1<\/span> is the slope and tells the user what the change in the response would be as the predictor variable changes. 
With multiple predictor variables, and therefore multiple parameters to estimate, the coefficients <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1,<\/span> <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">2<\/span><em>,<\/em> <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">3<\/span> and so on are called partial slopes or partial regression coefficients. The partial slope <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">i<\/span> measures the change in <em>y<\/em> for a one-unit change in <em>x<\/em><span class=\"Subscript SmallText\">i<\/span> when <strong class=\"Strong-2\">all other independent variables are held constant.<\/strong> These regression coefficients must be estimated from the sample data in order to obtain the general form of the estimated multiple regression equation<\/p>\n<p class=\"Centered\"><img decoding=\"async\" class=\"frame-1\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171734\/13373.png\" alt=\"13373.png\" \/><\/p>\n<p>and the population model<\/p>\n<p class=\"Centered\"><img decoding=\"async\" class=\"frame-1\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171736\/13386.png\" alt=\"13386.png\" \/><\/p>\n<p>where <em>k<\/em> = the number of independent variables (also called predictor variables)<\/p>\n<p class=\"BlockQuote\"><span class=\"Symbols\" xml:lang=\"ar-SA\"><em>y\u0302<\/em><\/span> = the predicted value of the dependent variable (computed by using the multiple regression equation)<\/p>\n<p class=\"BlockQuote\"><em>x<\/em><span class=\"Subscript SmallText\">1<\/span>, <em>x<\/em><span class=\"Subscript SmallText\">2<\/span>, \u2026, <em>x<\/em><span class=\"Subscript SmallText\">k<\/span> = the independent variables<\/p>\n<p 
class=\"BlockQuote\"><span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">0<\/span> is the y-intercept (the value of y when all the predictor variables equal 0)<\/p>\n<p class=\"BlockQuote\"><em>b<\/em><span class=\"Subscript SmallText\">0<\/span> is the estimate of <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">0<\/span> based on that sample data<\/p>\n<p class=\"BlockQuote\"><span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1<\/span><em>,<\/em> <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">2<\/span><em>,<\/em> <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">3<\/span><em>,\u2026<\/em><span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">k<\/span> are the coefficients of the independent variables <em>x<\/em><span class=\"Subscript SmallText\">1<\/span>, <em>x<\/em><span class=\"Subscript SmallText\">2<\/span>, \u2026, <em>x<\/em><span class=\"Subscript SmallText\">k<\/span><\/p>\n<p class=\"BlockQuote\"><em>b<\/em><span class=\"Subscript SmallText\">1<\/span><em>, b<\/em><span class=\"Subscript SmallText\">2<\/span><em>, b<\/em><span class=\"Subscript SmallText\">3<\/span>, \u2026, <em>b<\/em><span class=\"Subscript SmallText\">k<\/span> are the sample estimates of the coefficients <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1<\/span><em>,<\/em> <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">2<\/span><em>,<\/em> <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">3<\/span><em>,\u2026<\/em><span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">k<\/span><\/p>\n<p>The method of least-squares is still used to fit the model to the data. 
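As a minimal sketch of that fitting step (hypothetical data; Python with numpy rather than the chapter's Minitab workflow):

```python
import numpy as np

# Hypothetical sample: response y and two predictor variables (x1, x2)
x1 = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 23.0])
x2 = np.array([3.0, 5.0, 4.0, 8.0, 7.0, 10.0])
y = np.array([25.0, 30.0, 34.0, 45.0, 47.0, 56.0])

# Design matrix: a column of 1s for the intercept b0, then x1 and x2
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares chooses b = (b0, b1, b2) to minimize SSE = sum((y - Xb)^2)
b, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b                    # predicted values
sse = np.sum((y - y_hat) ** 2)   # sum of squared deviations (SSE)
print(b, sse)
```

The same approach extends to any number of predictor variables by adding columns to the design matrix.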
Remember that this method minimizes the sum of the squared deviations of the observed and predicted values (SSE).<\/p>\n<p>The analysis of variance table for multiple regression has a similar appearance to that of a simple linear regression.<\/p>\n<div style=\"width: 911px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171738\/13226.png\" alt=\"13226.png\" width=\"901\" height=\"227\" \/><\/p>\n<p class=\"wp-caption-text\">Table 2. ANOVA table.<\/p>\n<\/div>\n<p>Where k is the number of predictor variables and n is the number of observations.<\/p>\n<p>The best estimate of the random variation <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03c3<\/span><span class=\"Superscript SmallText\">2<\/span>\u2014the variation that is unexplained by the predictor variables\u2014is still s<span class=\"Superscript SmallText\">2<\/span>, the MSE. The regression standard error, s, is the square root of the MSE.<\/p>\n<p>A new column in the ANOVA table for multiple linear regression shows a decomposition of SSR, in which the conditional contribution of each predictor variable <em>given the variables already entered into the model<\/em> is shown for the order of entry that you specify in your regression. 
These conditional or <strong class=\"Strong-2\">sequential sums of squares<\/strong> each account for 1 regression degree of freedom, and allow the user to see the contribution of each predictor variable to the total variation explained by the regression model by using the ratio:<\/p>\n<p class=\"Centered\"><span class=\"Inline-Equation-Large\"><img decoding=\"async\" class=\"frame-43 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171739\/13413.png\" alt=\"13413.png\" \/><\/span><\/p>\n<h2>Adjusted R<span class=\"Superscript SmallText\">2<\/span><\/h2>\n<p>In simple linear regression, we used the relationship between the explained and total variation as a measure of model fit:<\/p>\n<p class=\"Centered\"><img decoding=\"async\" class=\"frame-59 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171740\/13422.png\" alt=\"13422.png\" \/><\/p>\n<p>Notice from this definition that the value of the coefficient of determination can never decrease with the addition of more variables into the regression model. Hence, R<span class=\"Superscript SmallText\">2<\/span> can be artificially inflated as more variables (significant or not) are included in the model. An alternative measure of strength of the regression model is adjusted for degrees of freedom by using mean squares rather than sums of squares:<\/p>\n<p class=\"Centered\"><img decoding=\"async\" class=\"frame-49 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171742\/13429.png\" alt=\"13429.png\" \/><\/p>\n<p>The adjusted R<span class=\"Superscript SmallText\">2<\/span> value represents the percentage of variation in the response variable explained by the independent variables, corrected for degrees of freedom. 
Unlike R<span class=\"Superscript SmallText\">2<\/span>, the adjusted R<span class=\"Superscript SmallText\">2<\/span> does not automatically increase as variables are added; instead, it tends to stabilize around some upper limit.<\/p>\n<h2>Tests of Significance<\/h2>\n<p>Recall that in the previous chapter we tested whether <em>y<\/em> and <em>x<\/em> were linearly related by testing<\/p>\n<table class=\"Table\">\n<colgroup>\n<col \/>\n<col \/><\/colgroup>\n<tbody>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">H<span class=\"Subscript SmallText\">0<\/span>: <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1<\/span> = 0<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">H<span class=\"Subscript SmallText\">1<\/span>: <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1<\/span> \u2260 0<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>with the t-test (or the equivalent F-test). In multiple linear regression, there are several partial slopes, and the t-test and F-test are no longer equivalent. 
Our question changes: Is the regression equation that uses information provided by the predictor variables x<span class=\"Subscript SmallText\">1<\/span>, x<span class=\"Subscript SmallText\">2<\/span>, x<span class=\"Subscript SmallText\">3<\/span>, \u2026, x<span class=\"Subscript SmallText\">k<\/span>, better than the simple predictor <span class=\"Inline-Equation\"><img decoding=\"async\" class=\"frame-5\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171744\/13615.png\" alt=\"13615.png\" \/><\/span>(the mean response value), which does not rely on any of these independent variables?<\/p>\n<p class=\"Centered\">H<span class=\"Subscript SmallText\">0<\/span>: <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1<\/span> = <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">2<\/span> = <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">3<\/span> = \u2026=<span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">k<\/span> = 0<\/p>\n<p class=\"Centered\">H<span class=\"Subscript SmallText\">1<\/span>: At least one of <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1<\/span>, <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">2<\/span> , <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">3<\/span> , \u2026<span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">k<\/span> \u2260 0<\/p>\n<p>The F-test statistic is used to answer this question and is found in the ANOVA table.<\/p>\n<p class=\"Centered\"><img decoding=\"async\" class=\"frame-17 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171744\/13437.png\" 
alt=\"13437.png\" \/><\/p>\n<p>This test statistic follows the F-distribution with df<span class=\"Subscript SmallText\">1<\/span> = k and df<span class=\"Subscript SmallText\">2<\/span> = (n-k-1). Since the exact p-value is given in the output, you can use the Decision Rule to answer the question.<\/p>\n<p class=\"Callout\"><span class=\"pullquote-left\">If the p-value is less than the level of significance, reject the null hypothesis.<\/span><\/p>\n<p>Rejecting the null hypothesis supports the claim that at least one of the predictor variables has a significant linear relationship with the response variable. The next step is to determine which predictor variables add important information for prediction in the presence of other predictors already in the model. To test the significance of the partial regression coefficients, you need to examine each relationship separately using individual t-tests.<\/p>\n<table class=\"Table\">\n<colgroup>\n<col \/>\n<col \/><\/colgroup>\n<tbody>\n<tr style=\"height: 43.0312px\">\n<td class=\"Table\" style=\"height: 43.0312px\">\n<p class=\"Centered\">H<span class=\"Subscript SmallText\">0<\/span>: <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">i<\/span> = 0<\/p>\n<\/td>\n<td class=\"Table\" style=\"height: 43.0312px\">\n<p class=\"Centered\">H<span class=\"Subscript SmallText\">1<\/span>: <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">i<\/span> \u2260 0<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p class=\"Centered\"><span class=\"Inline-Equation-Large\"><br \/>\n<img decoding=\"async\" class=\"frame-17\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171745\/13447.png\" alt=\"13447.png\" \/><\/span> with df = (n-k-1)<\/p>\n<p>where <em>SE(b<sub>i<\/sub>)<\/em> is the standard error of <em>b<sub><span class=\"Subscript SmallText\">i<\/span><\/sub><\/em>. 
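To make this calculation concrete, the t-statistic and two-sided p-value for one coefficient can be reproduced from its estimate and standard error; a sketch in Python (scipy assumed available), using values that appear in this chapter's Example 1 output (n = 28, k = 3, so df = 24):

```python
from scipy import stats

def coef_t_test(b_i, se_b_i, df):
    """Two-sided t-test of H0: beta_i = 0 for one partial regression coefficient."""
    t = b_i / se_b_i                  # t = b_i / SE(b_i)
    p = 2 * stats.t.sf(abs(t), df)    # two-tailed p-value with df = n - k - 1
    return t, p

# BA/ac: b = 0.5910, SE = 0.04294  ->  t around 13.76, p near 0
t_ba, p_ba = coef_t_test(0.5910, 0.04294, df=24)

# SI: b = 0.0900, SE = 0.11262  ->  t around 0.80, p around 0.43 (not significant)
t_si, p_si = coef_t_test(0.0900, 0.11262, df=24)
print(t_ba, p_ba, t_si, p_si)
```

These match the T and P columns reported in the Minitab coefficient tables below.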
Exact p-values are also given for these tests. Examining specific p-values for each predictor variable will allow you to decide which variables are significantly related to the response variable. Typically, any insignificant variables are removed from the model, but remember that these tests are done with the other variables still in the model. A good procedure is to remove the least significant variable and then refit the reduced model. With each new model, always check the regression standard error (lower is better), the adjusted R<sup><span class=\"Superscript SmallText\">2<\/span><\/sup> (higher is better), the p-values for all predictor variables, and the residual and normal probability plots.<\/p>\n<p>Because of the complexity of the calculations, we will rely on software to fit the model and give us the regression coefficients. Don\u2019t forget\u2026 you always begin with scatterplots. Strong relationships between predictor and response variables make for a good model.<\/p>\n<div class=\"textbox examples\">\n<h3>Example 1<\/h3>\n<p>A researcher collected data in a project to predict the annual growth per acre of upland boreal forests in southern Canada. They hypothesized that cubic foot volume growth (<em>y<\/em>) is a function of stand basal area per acre (<em>x<\/em><span class=\"Subscript SmallText\">1<\/span>), the percentage of that basal area in black spruce (<em>x<\/em><span class=\"Subscript SmallText\">2<\/span>), and the stand\u2019s site index for black spruce (<em>x<\/em><span class=\"Subscript SmallText\">3<\/span>). <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b1<\/span> = 0.05.<\/p>\n<\/div>\n<div style=\"width: 911px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171746\/132151.png\" alt=\"132151.png\" width=\"901\" height=\"473\" \/><\/p>\n<p class=\"wp-caption-text\">Table 3. 
Observed data for cubic feet, stand basal area, percent basal area in black spruce, and site index.<\/p>\n<\/div>\n<p>Scatterplots of the response variable versus each predictor variable were created along with a correlation matrix.<\/p>\n<div style=\"width: 911px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171749\/13205.png\" alt=\"13205.png\" width=\"901\" height=\"658\" \/><\/p>\n<p class=\"wp-caption-text\">Figure 1. Scatterplots of cubic feet versus basal area, percent basal area in black spruce, and site index.<\/p>\n<\/div>\n<div style=\"width: 599px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-11\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171751\/13195.png\" alt=\"13195.png\" width=\"589\" height=\"383\" \/><\/p>\n<p class=\"wp-caption-text\">Table 4. Correlation matrix.<\/p>\n<\/div>\n<p>As you can see from the scatterplots and the correlation matrix, BA\/ac has the strongest linear relationship with CuFt volume (r = 0.816) and %BA in black spruce has the weakest linear relationship (r = 0.413). Also of note is the moderately strong correlation between the two predictor variables, BA\/ac and SI (r = 0.588). All three predictor variables have significant linear relationships with the response variable (volume) so we will begin by using all variables in our multiple linear regression model. 
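A correlation matrix like Table 4 can be computed outside Minitab as well; a sketch with numpy (the arrays are illustrative stand-ins, not the chapter's actual data):

```python
import numpy as np

# Illustrative stand-ins for the CuFt, BA/ac, and SI columns (not the real data)
cuft = np.array([55.0, 68.0, 47.0, 66.0, 74.0, 60.0, 71.0])
ba_ac = np.array([90.0, 110.0, 80.0, 105.0, 120.0, 95.0, 115.0])
si = np.array([60.0, 65.0, 55.0, 70.0, 72.0, 62.0, 68.0])

# Each row passed to corrcoef is treated as one variable;
# the result is a symmetric matrix of pairwise correlations r
r = np.corrcoef([cuft, ba_ac, si])
print(np.round(r, 3))  # r[0, 1] is the CuFt vs. BA/ac correlation
```

Scanning the off-diagonal entries of such a matrix is how you spot both strong predictor-response relationships and potential multicollinearity between predictors.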
The Minitab output is given below.<\/p>\n<p>We begin by testing the following null and alternative hypotheses:<\/p>\n<p class=\"Centered\">H<sub><span class=\"Subscript SmallText\">0<\/span><\/sub>: <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><sub><span class=\"Subscript SmallText\">1<\/span><\/sub> = <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><sub><span class=\"Subscript SmallText\">2<\/span><\/sub> = <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><sub><span class=\"Subscript SmallText\">3<\/span><\/sub> = 0<\/p>\n<p class=\"Centered\">H<sub><span class=\"Subscript SmallText\">1<\/span><\/sub>: At least one of <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><sub><span class=\"Subscript SmallText\">1<\/span><\/sub>, <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><sub><span class=\"Subscript SmallText\">2<\/span><\/sub> , <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><sub><span class=\"Subscript SmallText\">3<\/span><\/sub> \u2260 0<\/p>\n<h4>General Regression Analysis: CuFt versus BA\/ac, SI, %BA Bspruce<\/h4>\n<table class=\"Table\" style=\"font-size: 0.7em;margin: 1px 1px 1px 1px\">\n<colgroup>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/><\/colgroup>\n<tbody>\n<tr>\n<td class=\"Table-Heading\" colspan=\"7\">\n<p class=\"Table-Heading\">Regression Equation<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\" colspan=\"7\">\n<p class=\"Table\">CuFt = -19.3858 + 0.591004 BA\/ac + 0.0899883 SI + 0.489441 %BA Bspruce<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table-Heading\" colspan=\"7\">\n<p class=\"Table-Heading\">Coefficients<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Term<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">Coef<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">SE Coef<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">T<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">P<\/p>\n<\/td>\n<td class=\"Table\">\n<p 
class=\"Table\">95% CI<\/p>\n<\/td>\n<td class=\"Table\"><\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Constant<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">-19.3858<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">4.15332<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">-4.6675<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.000<\/p>\n<\/td>\n<td class=\"Table\" colspan=\"2\">\n<p class=\"Table\">(-27.9578, -10.8137)<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">BA\/ac<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.5910<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.04294<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">13.7647<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.000<\/p>\n<\/td>\n<td class=\"Table\" colspan=\"2\">\n<p class=\"Table\">(0.5024, 0.6796)<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">SI<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.0900<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.11262<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.7991<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.432<\/p>\n<\/td>\n<td class=\"Table\" colspan=\"2\">\n<p class=\"Table\">(-0.1424, 0.3224)<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">%BA Bspruce<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.4894<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.05245<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">9.3311<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.000<\/p>\n<\/td>\n<td class=\"Table\" colspan=\"2\">\n<p class=\"Table\">(0.3812, 0.5977)<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table-Heading\" colspan=\"7\">\n<p class=\"Table-Heading\">Summary of Model<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\" colspan=\"2\">\n<p class=\"Table\">S = 3.17736<\/p>\n<\/td>\n<td class=\"Table\" colspan=\"2\">\n<p class=\"Table\">R-Sq = 
95.53%<\/p>\n<\/td>\n<td class=\"Table\" colspan=\"2\">\n<p class=\"Table\">R-Sq(adj) = 94.97%<\/p>\n<\/td>\n<td class=\"Table\"><\/td>\n<\/tr>\n<tr>\n<td class=\"Table\" colspan=\"2\">\n<p class=\"Table\">PRESS = 322.279<\/p>\n<\/td>\n<td class=\"Table\" colspan=\"2\">\n<p class=\"Table\">R-Sq(pred) = 94.05%<\/p>\n<\/td>\n<td class=\"Table\"><\/td>\n<td class=\"Table\"><\/td>\n<td class=\"Table\"><\/td>\n<\/tr>\n<tr>\n<td class=\"Table-Heading\" colspan=\"7\">\n<p class=\"Table-Heading\">Analysis of Variance<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Source<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">DF<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">Seq SS<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">Adj SS<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">Adj MS<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">F<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">P<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Regression<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">3<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">5176.56<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">5176.56<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">1725.52<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\"><strong class=\"Strong-2\">170.918<\/strong><\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\"><strong class=\"Strong-2\">0.000000<\/strong><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">BA\/ac<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">1<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">3611.17<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">1912.79<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">1912.79<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">189.467<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.000000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p 
class=\"Table\">SI<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">1<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">686.37<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">6.45<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">6.45<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.638<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.432094<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">%BA Bspruce<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">1<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">879.02<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">879.02<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">879.02<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">87.069<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.000000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Error<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">24<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">242.30<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">242.30<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">10.10<\/p>\n<\/td>\n<td class=\"Table\"><\/td>\n<td class=\"Table\"><\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Total<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">27<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">5418.86<\/p>\n<\/td>\n<td class=\"Table\"><\/td>\n<td class=\"Table\"><\/td>\n<td class=\"Table\"><\/td>\n<td class=\"Table\"><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The F-test statistic (and associated p-value) is used to answer this question and is found in the ANOVA table. For this example, F = 170.918 with a p-value of 0.00000. The p-value is smaller than our level of significance (0.0000&lt;0.05) so we will reject the null hypothesis. 
At least one of the predictor variables significantly contributes to the prediction of volume.<\/p>\n<p>The coefficients for the three predictor variables are all positive indicating that as they increase cubic foot volume will also increase. For example, if we hold values of SI and %BA Bspruce constant, this equation tells us that as basal area increases by 1 sq. ft., volume will increase an additional 0.591004 cu. ft. The signs of these coefficients are logical, and what we would expect. The adjusted R<sup><span class=\"Superscript SmallText\">2<\/span><\/sup> is also very high at 94.97%.<\/p>\n<p>The next step is to examine the individual t-tests for each predictor variable. The test statistics and associated p-values are found in the Minitab output and repeated below:<\/p>\n<table class=\"Table\">\n<colgroup>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/><\/colgroup>\n<tbody>\n<tr>\n<td class=\"Table-Heading\" colspan=\"6\">\n<p class=\"Table-Heading\">Coefficients<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Term<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">Coef<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">SE Coef<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">T<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">P<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">95% CI<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Constant<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">-19.3858<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">4.15332<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">-4.6675<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.000<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">(-27.9578, -10.8137)<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">BA\/ac<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.5910<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.04294<\/p>\n<\/td>\n<td 
class=\"Table\">\n<p class=\"Table\"><strong class=\"Strong-2\">13.7647<\/strong><\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\"><strong class=\"Strong-2\">0.000<\/strong><\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">( 0.5024, 0.6796)<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">SI<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.0900<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.11262<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\"><strong class=\"Strong-2\">0.7991<\/strong><\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\"><strong class=\"Strong-2\">0.432<\/strong><\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">( -0.1424, 0.3224)<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">%BA Bspruce<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.4894<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.05245<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\"><strong class=\"Strong-2\">9.3311<\/strong><\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\"><strong class=\"Strong-2\">0.000<\/strong><\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">( 0.3812, 0.5977)<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The predictor variables BA\/ac and %BA Bspruce have t-statistics of 13.7647 and 9.3311 and p-values of 0.0000, indicating that both are significantly contributing to the prediction of volume. However, SI has a t-statistic of 0.7991 with a p-value of 0.432. This variable does not significantly contribute to the prediction of cubic foot volume.<\/p>\n<p>This result may surprise you as SI had the second strongest relationship with volume, but don\u2019t forget about the correlation between SI and BA\/ac (r = 0.588). 
The predictor variable BA\/ac had the strongest linear relationship with volume, and using the sequential sums of squares, we can see that BA\/ac already accounts for 70% of the variation in cubic foot volume (3611.17\/5176.56 = 0.6976). The information from SI may be too similar to the information in BA\/ac, and SI explains only about 13% of the variation in volume (686.37\/5176.56 = 0.1326) given that BA\/ac is already in the model.<\/p>\n<p>The next step is to examine the residual and normal probability plots. A single outlier is evident in the otherwise acceptable plots.<\/p>\n<div style=\"width: 1040px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171754\/13186.png\" alt=\"13186.png\" width=\"1030\" height=\"405\" \/><\/p>\n<p class=\"wp-caption-text\">Figure 2. Residual and normal probability plots.<\/p>\n<\/div>\n<p><strong class=\"Strong-2\">So where do we go from here?<\/strong><\/p>\n<p>We will remove the non-significant variable and re-fit the model excluding SI. 
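Before refitting, the sequential sums of squares arithmetic quoted above is easy to verify directly (values from the first model's ANOVA table):

```python
# Sequential SS from the three-variable model's ANOVA table
ss_regression = 5176.56
seq_ss = {"BA/ac": 3611.17, "SI": 686.37, "%BA Bspruce": 879.02}

# Each variable's share of the explained variation, in order of entry
shares = {name: ss / ss_regression for name, ss in seq_ss.items()}
for name, share in shares.items():
    print(f"{name}: {share:.4f}")  # BA/ac ~ 0.6976, SI ~ 0.1326, %BA Bspruce ~ 0.1698
```

Because the sequential sums of squares decompose SSR, the three shares sum to 1 for this order of entry.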
The Minitab output is given below.<\/p>\n<h4>General Regression Analysis: CuFt versus BA\/ac, %BA Bspruce<\/h4>\n<table class=\"Table\">\n<colgroup>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/><\/colgroup>\n<tbody>\n<tr>\n<td class=\"Table-Heading\" colspan=\"7\">\n<p class=\"Table-Heading\">Regression Equation<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\" colspan=\"7\">\n<p class=\"Table\">CuFt = -19.1142 + 0.615531 BA\/ac + 0.515122 %BA Bspruce<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table-Heading\" colspan=\"7\">\n<p class=\"Table-Heading\">Coefficients<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Term<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">Coef<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">SE Coef<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">T<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">P<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">95% CI<\/p>\n<\/td>\n<td class=\"Table\"><\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Constant<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">-19.1142<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">4.10936<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">-4.6514<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.000<\/p>\n<\/td>\n<td class=\"Table\" colspan=\"2\">\n<p class=\"Table\">(-27.5776, -10.6508)<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">BA\/ac<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.6155<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.02980<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">20.6523<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.000<\/p>\n<\/td>\n<td class=\"Table\" colspan=\"2\">\n<p class=\"Table\">(0.5541, 0.6769)<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">%BA Bspruce<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.5151<\/p>\n<\/td>\n<td 
class=\"Table\">\n<p class=\"Table\">0.04115<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">12.5173<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.000<\/p>\n<\/td>\n<td class=\"Table\" colspan=\"2\">\n<p class=\"Table\">(0.4304, 0.5999)<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table-Heading\" colspan=\"7\">\n<p class=\"Table-Heading\">Summary of Model<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\" colspan=\"2\">\n<p class=\"Table\">S = 3.15431<\/p>\n<\/td>\n<td class=\"Table\" colspan=\"2\">\n<p class=\"Table\">R-Sq = 95.41%<\/p>\n<\/td>\n<td class=\"Table\" colspan=\"2\">\n<p class=\"Table\">R-Sq(adj) = 95.04%<\/p>\n<\/td>\n<td class=\"Table\"><\/td>\n<\/tr>\n<tr>\n<td class=\"Table\" colspan=\"2\">\n<p class=\"Table\">PRESS = 298.712<\/p>\n<\/td>\n<td class=\"Table\" colspan=\"2\">\n<p class=\"Table\">R-Sq(pred) = 94.49%<\/p>\n<\/td>\n<td class=\"Table\"><\/td>\n<td class=\"Table\"><\/td>\n<td class=\"Table\"><\/td>\n<\/tr>\n<tr>\n<td class=\"Table-Heading\" colspan=\"7\">\n<p class=\"Table-Heading\">Analysis of Variance<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Source<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">DF<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">SeqSS<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">AdjSS<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">AdjMS<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">F<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">P<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Regression<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">2<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">5170.12<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">5170.12<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">2585.06<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">259.814<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.0000000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p 
class=\"Table\">BA\/ac<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">1<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">3611.17<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">4243.71<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">4243.71<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">426.519<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.0000000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">%BA Bspruce<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">1<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">1558.95<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">1558.95<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">1558.95<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">156.684<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.0000000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Error<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">25<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">248.74<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">248.74<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">9.95<\/p>\n<\/td>\n<td class=\"Table\"><\/td>\n<td class=\"Table\"><\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Total<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">27<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">5418.86<\/p>\n<\/td>\n<td class=\"Table\"><\/td>\n<td class=\"Table\"><\/td>\n<td class=\"Table\"><\/td>\n<td class=\"Table\"><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>We will repeat the steps followed with our first model. 
We begin by again testing the overall hypotheses; because the reduced model contains only two predictor variables, only two slope coefficients are tested:<\/p>\n<p class=\"Centered\">H<span class=\"Subscript SmallText\">0<\/span>: <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1<\/span> = <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">2<\/span> = 0<\/p>\n<p class=\"Centered\">H<span class=\"Subscript SmallText\">1<\/span>: At least one of <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1<\/span>, <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">2<\/span> \u2260 0<\/p>\n<p>This reduced model has an F-statistic of 259.814 and a p-value reported as 0.0000, so we reject the null hypothesis: at least one of the predictor variables contributes significantly to the prediction of volume. 
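<\/p>\n<p>As a quick check, the F statistic and both R-squared values in the model summary can be reproduced from the sums of squares in the ANOVA table. The sketch below (in Python; it is not part of the original output) uses the rounded table entries, so the last digit may differ slightly from the printed values:<\/p>

```python
# Reproduce the overall F statistic and R-squared values for the
# reduced model from the ANOVA table entries quoted above.
ss_reg, df_reg = 5170.12, 2    # regression sum of squares and df
ss_err, df_err = 248.74, 25    # error sum of squares and df
ss_tot, df_tot = 5418.86, 27   # total sum of squares and df

ms_reg = ss_reg / df_reg       # mean square for regression (2585.06)
ms_err = ss_err / df_err       # mean square error (about 9.95)
F = ms_reg / ms_err            # overall F statistic, approx. 259.8

r_sq = ss_reg / ss_tot                         # approx. 0.9541
r_sq_adj = 1 - (1 - r_sq) * (df_tot / df_err)  # approx. 0.9504

print(round(F, 1), round(100 * r_sq, 2), round(100 * r_sq_adj, 2))
```

<p>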
The coefficients are still positive (as we expected) but the values have changed to account for the different model.<\/p>\n<p>The individual t-tests for each coefficient (repeated below) show that both predictor variables are significantly different from zero and contribute to the prediction of volume.<\/p>\n<table class=\"Table\">\n<colgroup>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/><\/colgroup>\n<tbody>\n<tr>\n<td class=\"Table-Heading\" colspan=\"6\">\n<p class=\"Table-Heading\">Coefficients<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Term<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">Coef<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">SE Coef<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">T<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">P<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">95% CI<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">Constant<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">-19.1142<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">4.10936<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">-4.6514<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.000<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">(-27.5776, -10.6508)<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">BA\/ac<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.6155<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.02980<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">20.6523<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.000<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">( 0.5541, 0.6769)<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"Table\">\n<p class=\"Table\">%BA Bspruce<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.5151<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">0.04115<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">12.5173<\/p>\n<\/td>\n<td 
class=\"Table\">\n<p class=\"Table\">0.000<\/p>\n<\/td>\n<td class=\"Table\">\n<p class=\"Table\">( 0.4304, 0.5999)<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Notice that the adjusted R<span class=\"Superscript SmallText\">2<\/span> has increased from 94.97% to 95.04%, indicating a slightly better fit to the data. The regression standard error has also changed for the better, decreasing from 3.17736 to 3.15431, indicating slightly less variation of the observed data about the fitted model.<\/p>\n<div style=\"width: 1050px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-13\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171758\/131751.png\" alt=\"131751.png\" width=\"1040\" height=\"359\" \/>\n<p class=\"wp-caption-text\">Figure 3. Residual and normal probability plots.<\/p>\n<\/div>\n<p>The residual and normal probability plots have changed little, still not indicating any issues with the regression assumptions. By removing the non-significant variable, the model has improved.<\/p>\n<h2>Model Development and Selection<\/h2>\n<p>There are many different reasons for creating a multiple linear regression model, and its purpose directly influences how the model is created. 
Listed below are several of the more common uses for a regression model:<\/p>\n<ol>\n<li class=\"List-Paragraph-Number-1\">Describing the behavior of your response variable<\/li>\n<li class=\"List-Paragraph-Number-1\">Predicting a response or estimating the average response<\/li>\n<li class=\"List-Paragraph-Number-1\">Estimating the parameters (<span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">0<\/span>, <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">1<\/span>, <span class=\"Symbols\" xml:lang=\"ar-SA\">\u03b2<\/span><span class=\"Subscript SmallText\">2<\/span>, \u2026)<\/li>\n<li class=\"List-Paragraph-Number-1\">Developing an accurate model of the process<\/li>\n<\/ol>\n<p>Depending on your objective for creating a regression model, your methodology may vary when it comes to variable selection, retention, and elimination.<\/p>\n<p>When the objective is simple description of your response variable, you are typically less concerned about eliminating non-significant variables. The best representation of the response variable, in terms of minimal residual sums of squares, is the full model, which includes all predictor variables available from the data set. It is less important that the variables are causally related or that the model is realistic.<\/p>\n<p>A common reason for creating a regression model is prediction and estimation. A researcher wants to be able to predict responses within the x-space of the data collected for the model, and it is assumed that the system will continue to function as it did when the data were collected. Any measurable predictor variables that contain information on the response variable should be included. For this reason, non-significant variables may be retained in the model. However, regression equations with fewer variables are easier to use and have an economic advantage in terms of data collection. 
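<\/p>\n<p>When a trimmed-down model is the goal, a common informal procedure is backward elimination: drop the least significant predictor, refit, and repeat until every remaining p-value falls below the chosen threshold. The sketch below (Python, not from the original text, with made-up p-values) shows only the selection loop; in a real analysis the model must be refit after each removal so that the remaining p-values are updated:<\/p>

```python
# Hypothetical sketch of backward elimination by p-value.
# In practice each removal is followed by refitting the model, which
# changes the remaining p-values; here they are held fixed purely to
# illustrate the bookkeeping of the selection loop.
def backward_eliminate(p_values, alpha=0.05):
    """p_values: dict mapping predictor name -> p-value from the fit."""
    kept = dict(p_values)
    while kept:
        worst = max(kept, key=kept.get)   # least significant predictor
        if kept[worst] <= alpha:
            break                         # everything left is significant
        del kept[worst]                   # drop it (and, in practice, refit)
    return sorted(kept)

# Made-up p-values loosely patterned on this chapter's example:
print(backward_eliminate({"BA/ac": 0.000, "%BA Bspruce": 0.000, "x3": 0.40}))
```

<p>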
Additionally, there is greater confidence in models that contain only significant variables.<\/p>\n<p>If the objective is to estimate the model parameters, you will be more cautious when considering variable elimination. You want to avoid introducing a bias by removing a variable that has predictive information about the response. However, there is a statistical advantage, in terms of reduced variance of the parameter estimates, if variables truly unrelated to the response are removed.<\/p>\n<p>Building a realistic model of the process you are studying is often a primary goal of research. It is important to identify the variables that are linked to the response through some causal relationship. While you can identify which variables have a strong correlation with the response, this only serves as an indicator of which variables require further study. The principal objective is to develop a model whose functional form realistically reflects the behavior of the system.<\/p>\n<p>The following figure outlines a strategy for building a regression model.<\/p>\n<div style=\"width: 380px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"frame-60\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171801\/153_1_fmt.png\" alt=\"153_1.tif\" width=\"370\" height=\"576\" \/>\n<p class=\"wp-caption-text\">Figure 4. 
Strategy for building a regression model.<\/p>\n<\/div>\n<h2>Software Solutions<\/h2>\n<h3>Minitab<\/h3>\n<p class=\"No-Caption\"><span class=\"Picture\"><img decoding=\"async\" class=\"frame-27 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171804\/155_1_fmt.png\" alt=\"155_1.tif\" \/><\/span><\/p>\n<p class=\"No-Caption\"><span class=\"Picture\"><img decoding=\"async\" class=\"frame-70 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171807\/155_2_fmt.png\" alt=\"155_2.tif\" \/><\/span><\/p>\n<p class=\"No-Caption\"><img decoding=\"async\" class=\"frame-66 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171810\/155_3_fmt.png\" alt=\"155_3.tif\" \/><\/p>\n<p>The output and plots are given in the previous example.<\/p>\n<h3>Excel<\/h3>\n<p class=\"No-Caption\"><span class=\"Picture\"><img decoding=\"async\" class=\"frame-107 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171813\/154_1_fmt.png\" alt=\"154_1.tif\" \/><\/span><\/p>\n<p class=\"No-Caption\"><span class=\"Picture\"><img decoding=\"async\" class=\"frame-79 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171817\/154_2_fmt.png\" alt=\"154_2.tif\" \/><\/span><\/p>\n<p class=\"No-Caption\"><span class=\"Picture\"><img decoding=\"async\" class=\"frame-108 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171820\/154_3_fmt.png\" alt=\"154_3.tif\" \/><\/span><\/p>\n<p class=\"No-Caption\"><span class=\"Picture\"><img decoding=\"async\" class=\"frame-173 aligncenter\" 
src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1888\/2017\/05\/11171823\/154_4_fmt.png\" alt=\"154_4.tif\" \/><\/span><\/p>\n<\/div>\n\n\t\t\t <section class=\"citations-section\" role=\"contentinfo\">\n\t\t\t <h3>Candela Citations<\/h3>\n\t\t\t\t\t <div>\n\t\t\t\t\t\t <div id=\"citation-list-936\">\n\t\t\t\t\t\t\t <div class=\"licensing\"><div class=\"license-attribution-dropdown-subheading\">CC licensed content, Shared previously<\/div><ul class=\"citation-list\"><li>Natural Resources Biometrics. <strong>Authored by<\/strong>: Diane Kiernan. <strong>Located at<\/strong>: <a target=\"_blank\" href=\"https:\/\/textbooks.opensuny.org\/natural-resources-biometrics\/\">https:\/\/textbooks.opensuny.org\/natural-resources-biometrics\/<\/a>. <strong>Project<\/strong>: Open SUNY Textbooks. <strong>License<\/strong>: <em><a target=\"_blank\" rel=\"license\" href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\">CC BY-NC-SA: Attribution-NonCommercial-ShareAlike<\/a><\/em><\/li><\/ul><\/div>\n\t\t\t\t\t\t <\/div>\n\t\t\t\t\t <\/div>\n\t\t\t <\/section>","protected":false},"author":622,"menu_order":8,"template":"","meta":{"_candela_citation":"[{\"type\":\"cc\",\"description\":\"Natural Resources Biometrics\",\"author\":\"Diane Kiernan\",\"organization\":\"\",\"url\":\"https:\/\/textbooks.opensuny.org\/natural-resources-biometrics\/\",\"project\":\"Open SUNY 
Textbooks\",\"license\":\"cc-by-nc-sa\",\"license_terms\":\"\"}]","CANDELA_OUTCOMES_GUID":"","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-936","chapter","type-chapter","status-publish","hentry"],"part":21,"_links":{"self":[{"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/pressbooks\/v2\/chapters\/936","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/wp\/v2\/users\/622"}],"version-history":[{"count":1,"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/pressbooks\/v2\/chapters\/936\/revisions"}],"predecessor-version":[{"id":1255,"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/pressbooks\/v2\/chapters\/936\/revisions\/1255"}],"part":[{"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/pressbooks\/v2\/parts\/21"}],"metadata":[{"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/pressbooks\/v2\/chapters\/936\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/wp\/v2\/media?parent=936"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/pressbooks\/v2\/chapter-type?post=936"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/wp\/v2\/contributor?post=936"},{"taxonom
y":"license","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-natural-resources-biometrics\/wp-json\/wp\/v2\/license?post=936"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}