{"id":13821,"date":"2018-08-24T20:13:37","date_gmt":"2018-08-24T20:13:37","guid":{"rendered":"https:\/\/courses.lumenlearning.com\/precalcone\/?post_type=chapter&#038;p=13821"},"modified":"2025-02-05T05:18:39","modified_gmt":"2025-02-05T05:18:39","slug":"fitting-linear-models-to-data","status":"publish","type":"chapter","link":"https:\/\/courses.lumenlearning.com\/precalculus\/chapter\/fitting-linear-models-to-data\/","title":{"raw":"Fitting Linear Models to Data","rendered":"Fitting Linear Models to Data"},"content":{"raw":"<div class=\"bcc-box bcc-highlight\">\r\n<h3>Learning Outcomes<\/h3>\r\n<ul>\r\n \t<li>Draw and interpret scatter plots.<\/li>\r\n \t<li>Find the line of best fit.<\/li>\r\n \t<li>Distinguish between linear and nonlinear relations.<\/li>\r\n \t<li>Use a linear model to make predictions.<\/li>\r\n<\/ul>\r\n<\/div>\r\n<p id=\"fs-id1165137641419\">A professor is attempting to identify trends among final exam scores. His class has a mixture of students, so he wonders if there is any relationship between age and final exam scores. One way for him to analyze the scores is by creating a diagram that relates the age of each student to the exam score received. In this section, we will examine one such diagram known as a scatter plot.<\/p>\r\n\r\n<h2 style=\"text-align: center;\">Drawing and Interpreting Scatter Plots<\/h2>\r\nA <strong>scatter plot<\/strong> is a graph of plotted points that may show a relationship between two sets of data. If the relationship is from a <strong>linear model<\/strong>, or a model that is nearly linear, the professor can draw conclusions using his knowledge of linear functions. Below is\u00a0a sample scatter plot.\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"487\"]<img src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/1227\/2015\/04\/03010659\/CNX_Precalc_Figure_02_04_0012.jpg\" alt=\"Scatter plot, titled 'Final Exam Score VS Age'. 
The x-axis is the age, and the y-axis is the final exam score. The range of ages are between 20s - 50s, and the range for scores are between upper 50s and 90s.\" width=\"487\" height=\"337\" \/> <b>Figure 1.<\/b> A scatter plot of age and final exam score variables[\/caption]\r\n<p id=\"fs-id1165137658014\">Notice this scatter plot does <em>not<\/em> indicate a <strong>linear relationship<\/strong>. The points do not appear to follow a trend. In other words, there does not appear to be a relationship between the age of the student and the score on the final exam.<\/p>\r\n\r\n<div id=\"Example_02_04_01\" class=\"example\">\r\n<div id=\"fs-id1165137393214\" class=\"exercise\">\r\n<div id=\"fs-id1165137735052\" class=\"problem textbox shaded\">\r\n<h3>Example 1: Using a Scatter Plot to Investigate Cricket Chirps<\/h3>\r\n<p id=\"fs-id1165137874471\">The table below\u00a0shows the number of cricket chirps in 15 seconds, for several different air temperatures, in degrees Fahrenheit.[footnote]Selected data from <a href=\"http:\/\/classic.globe.gov\/fsl\/scientistsblog\/2007\/10\/\" target=\"_blank\" rel=\"noopener\">http:\/\/classic.globe.gov\/fsl\/scientistsblog\/2007\/10\/<\/a>. Retrieved Aug 3, 2010[\/footnote] Plot this data, and determine whether the data appears to be linearly related.<\/p>\r\n\r\n<table id=\"Table_02_04_01\" summary=\"Two rows and ten columns. The first row is labeled, 'chirps'. The second row is labeled is labeled, 'Temp'. 
Reading the remaining rows as ordered pairs (i.e., (chirps, Temp), we have the following values: (44, 80.5), (35, 70.5), (20.4, 57), (33, 66), (31, 68), (35, 72), (18.5, 52), (37, 73.5) and (26, 53).\"><colgroup> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/><\/colgroup>\r\n<tbody>\r\n<tr>\r\n<td><strong>Chirps<\/strong><\/td>\r\n<td>44<\/td>\r\n<td>35<\/td>\r\n<td>20.4<\/td>\r\n<td>33<\/td>\r\n<td>31<\/td>\r\n<td>35<\/td>\r\n<td>18.5<\/td>\r\n<td>37<\/td>\r\n<td>26<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Temperature<\/strong><\/td>\r\n<td>80.5<\/td>\r\n<td>70.5<\/td>\r\n<td>57<\/td>\r\n<td>66<\/td>\r\n<td>68<\/td>\r\n<td>72<\/td>\r\n<td>52<\/td>\r\n<td>73.5<\/td>\r\n<td>53<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n[reveal-answer q=\"416011\"]Show Solution[\/reveal-answer]\r\n[hidden-answer a=\"416011\"]\r\n\r\nPlotting this data\u00a0suggests that there may be a trend. We can see from the trend in the data that the number of chirps increases as the temperature increases. The trend appears to be roughly linear, though certainly not perfectly so.\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"487\"]<img src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/1227\/2015\/04\/03010659\/CNX_Precalc_Figure_02_04_0022.jpg\" alt=\"Scatter plot, titled 'Cricket Chirps Vs Air Temperature'. The x-axis is the Cricket Chirps in 15 Seconds, and the y-axis is the Temperature (F). 
The line regression is generally positive.\" width=\"487\" height=\"386\" \/> <b>Figure 2<\/b>[\/caption]\r\n\r\n[\/hidden-answer]<b><\/b>\r\n\r\n<\/div>\r\n<h2 style=\"text-align: center;\">Finding the Line of Best Fit<\/h2>\r\n<p id=\"fs-id1165135443922\">Once we recognize a need for a linear function to model the\u00a0data in \"<a href=\"https:\/\/courses.lumenlearning.com\/precalcone\/chapter\/draw-and-interpret-scatter-plots\/\" target=\"_blank\" rel=\"noopener\">Draw and interpret scatter plots<\/a>,\" the natural follow-up question is \"what is that linear function?\" One way to approximate our linear function is to sketch the line that seems to best fit the data. Then we can extend the line until we can verify the <em>y<\/em>-intercept. We can approximate the slope of the line by extending it until we can estimate the [latex]\\frac{\\text{rise}}{\\text{run}}[\/latex].<\/p>\r\n\r\n<div id=\"Example_02_04_02\" class=\"example\">\r\n<div id=\"fs-id1165135441718\" class=\"exercise\">\r\n<div id=\"fs-id1165137416925\" class=\"problem textbox shaded\">\r\n<h3>Example 2: Finding a Line of Best Fit<\/h3>\r\nFind a linear function that fits the data in the table below\u00a0by \"eyeballing\" a line that seems to fit.\r\n<table id=\"Table_02_04_01\" summary=\"Two rows and ten columns. The first row is labeled, 'chirps'. The second row is labeled is labeled, 'Temp'. 
Reading the remaining rows as ordered pairs (i.e., (chirps, Temp), we have the following values: (44, 80.5), (35, 70.5), (20.4, 57), (33, 66), (31, 68), (35, 72), (18.5, 52), (37, 73.5) and (26, 53).\"><colgroup> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/><\/colgroup>\r\n<tbody>\r\n<tr>\r\n<td><strong>Chirps<\/strong><\/td>\r\n<td>44<\/td>\r\n<td>35<\/td>\r\n<td>20.4<\/td>\r\n<td>33<\/td>\r\n<td>31<\/td>\r\n<td>35<\/td>\r\n<td>18.5<\/td>\r\n<td>37<\/td>\r\n<td>26<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Temperature<\/strong><\/td>\r\n<td>80.5<\/td>\r\n<td>70.5<\/td>\r\n<td>57<\/td>\r\n<td>66<\/td>\r\n<td>68<\/td>\r\n<td>72<\/td>\r\n<td>52<\/td>\r\n<td>73.5<\/td>\r\n<td>53<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n[reveal-answer q=\"581479\"]Show Solution[\/reveal-answer]\r\n[hidden-answer a=\"581479\"]\r\n<p id=\"fs-id1165137507732\">On a graph, we could try sketching a line.<\/p>\r\n<p id=\"fs-id1165137581362\">Using the starting and ending points of our hand drawn line, points (0, 30) and (50, 90), this graph has a slope of<\/p>\r\n<p style=\"text-align: center;\">[latex]m=\\frac{60}{50}=1.2[\/latex]<\/p>\r\n<p id=\"fs-id1165135536506\">and a <em>y<\/em>-intercept at 30. This gives an equation of<\/p>\r\n<p style=\"text-align: center;\">[latex]T\\left(c\\right)=1.2c+30[\/latex]<\/p>\r\n<p id=\"fs-id1165137573540\">where <em>c<\/em>\u00a0is the number of chirps in 15 seconds, and <em>T<\/em>(<em>c<\/em>)\u00a0is the temperature in degrees Fahrenheit. The resulting equation is represented in the graph below.<\/p>\r\n\r\n<figure id=\"CNX_Precalc_Figure_02_04_003\">\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"487\"]<img src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/1227\/2015\/04\/03010700\/CNX_Precalc_Figure_02_04_0032.jpg\" alt=\"Scatter plot, showing the line of best fit. It is titled 'Cricket Chirps Vs Air Temperature'. 
The x-axis is 'c, Number of Chirps', and the y-axis is 'T(c), Temperature (F)'.\" width=\"487\" height=\"432\" \/> <b>Figure 3<\/b>[\/caption]<\/figure>\r\n<h4>Analysis of the Solution<\/h4>\r\n<p id=\"fs-id1165137745268\">This linear equation can then be used to approximate answers to various questions we might ask about the trend.<\/p>\r\n[\/hidden-answer]<b><\/b>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<section id=\"fs-id1165137456060\">\r\n<h2 style=\"text-align: center;\">Recognizing Interpolation or Extrapolation<\/h2>\r\n<p id=\"fs-id1165135255246\">While the data for most examples does not fall perfectly on the line, the equation is our best guess as to how the relationship will behave outside of the values for which we have data. We use a process known as <strong>interpolation<\/strong> when we predict a value inside the domain and range of the data. The process of <strong>extrapolation<\/strong> is used when we predict a value outside the domain and range of the data.<\/p>\r\n<p id=\"fs-id1165137911398\">The graph below compares the two processes for the cricket-chirp data addressed in Example 2. We can see that interpolation would occur if we used our model to predict temperature when the values for chirps are between 18.5 and 44. Extrapolation would occur if we used our model to predict temperature when the values for chirps are less than 18.5 or greater than 44.<\/p>\r\n<p id=\"fs-id1165137399663\">There is a difference between making predictions inside the domain and range of values for which we have data and outside that domain and range. Predicting a value outside of the domain and range has its limitations. When our model no longer applies after a certain point, it is sometimes called <strong>model breakdown<\/strong>. For example, predicting a cost function for a period of two years may involve examining the data where the input is the time in years and the output is the cost. 
But if we try to extrapolate a cost when <em>x<\/em> = 50, that is in 50 years, the model would not apply because we could not account for factors fifty years in the future.<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"487\"]<img src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/1227\/2015\/04\/03010700\/CNX_Precalc_Figure_02_04_0042.jpg\" alt=\"Scatter plot, showing the line of best fit and where interpolation and extrapolation occurs. It is titled 'Cricket Chirps Vs Air Temperature'. The x-axis is 'c, Number of Chirps', and the y-axis is 'T(c), Temperature (F)'.\" width=\"487\" height=\"430\" \/> <b>Figure 4.<\/b> Interpolation occurs within the domain and range of the provided data whereas extrapolation occurs outside.[\/caption]\r\n\r\n<div id=\"fs-id1165137447983\" class=\"note textbox\">\r\n<h3 class=\"title\">A General Note: Interpolation and Extrapolation<\/h3>\r\n<p id=\"fs-id1165135646198\">Different methods of making predictions are used to analyze data.<\/p>\r\n\r\n<ul id=\"fs-id1165137666284\">\r\n \t<li>The method of <strong>interpolation<\/strong> involves predicting a value inside the domain and\/or range of the data.<\/li>\r\n \t<li>The method of <strong>extrapolation<\/strong> involves predicting a value outside the domain and\/or range of the data.<\/li>\r\n \t<li><strong>Model breakdown<\/strong> occurs at the point when the model no longer applies.<\/li>\r\n<\/ul>\r\n<\/div>\r\n<div id=\"Example_02_04_03\" class=\"example\">\r\n<div id=\"fs-id1165137584562\" class=\"exercise\">\r\n<div id=\"fs-id1165137734950\" class=\"problem textbox shaded\">\r\n<h3>Example 3: Understanding Interpolation and Extrapolation<\/h3>\r\n<table id=\"Table_02_04_01\" summary=\"Two rows and ten columns. The first row is labeled, 'chirps'. The second row is labeled is labeled, 'Temp'. 
Reading the remaining rows as ordered pairs (i.e., (chirps, Temp)), we have the following values: (44, 80.5), (35, 70.5), (20.4, 57), (33, 66), (31, 68), (35, 72), (18.5, 52), (37, 73.5) and (26, 53).\"><colgroup> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/><\/colgroup>\r\n<tbody>\r\n<tr>\r\n<td><strong>Chirps<\/strong><\/td>\r\n<td>44<\/td>\r\n<td>35<\/td>\r\n<td>20.4<\/td>\r\n<td>33<\/td>\r\n<td>31<\/td>\r\n<td>35<\/td>\r\n<td>18.5<\/td>\r\n<td>37<\/td>\r\n<td>26<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Temperature<\/strong><\/td>\r\n<td>80.5<\/td>\r\n<td>70.5<\/td>\r\n<td>57<\/td>\r\n<td>66<\/td>\r\n<td>68<\/td>\r\n<td>72<\/td>\r\n<td>52<\/td>\r\n<td>73.5<\/td>\r\n<td>53<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<p id=\"fs-id1165137668321\">Use the cricket data above\u00a0to answer the following questions:<\/p>\r\n\r\n<ol id=\"fs-id1165137843776\">\r\n \t<li>Would predicting the temperature when crickets are chirping 30 times in 15 seconds be interpolation or extrapolation? Make the prediction, and discuss whether it is reasonable.<\/li>\r\n \t<li>Would predicting the number of chirps crickets will make at 40 degrees be interpolation or extrapolation? Make the prediction, and discuss whether it is reasonable.<\/li>\r\n<\/ol>\r\n[reveal-answer q=\"775750\"]Show Solution[\/reveal-answer]\r\n[hidden-answer a=\"775750\"]\r\n<ol id=\"fs-id1165137779181\">\r\n \t<li>The number of chirps in the data provided varied from 18.5 to 44. A prediction at 30 chirps per 15 seconds is inside the domain of our data, so it would be interpolation. Using our model:\r\n<div id=\"eip-id1165131886691\" class=\"equation unnumbered\" style=\"text-align: center;\">[latex]\\begin{align}T\\left(30\\right)&amp;=30+1.2\\left(30\\right) \\\\ &amp;=66\\text{ degrees} \\end{align}[\/latex]<\/div>\r\nBased on the data we have, this value seems reasonable.<\/li>\r\n \t<li>The temperature values varied from 52 to 80.5. 
Predicting the number of chirps at 40 degrees is extrapolation because 40 is outside the range of our data. Using our model:\r\n<div id=\"eip-id1165135551132\" class=\"equation unnumbered\" style=\"text-align: center;\">[latex]\\begin{align}40&amp;=30+1.2c \\\\ 10&amp;=1.2c \\\\ c&amp;\\approx 8.33 \\end{align}[\/latex]<\/div><\/li>\r\n<\/ol>\r\n<p id=\"fs-id1165137434356\">We can compare the regions of interpolation and extrapolation using the graph below.<\/p>\r\n\r\n<figure id=\"CNX_Precalc_Figure_02_04_005\">\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"485\"]<img src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/1227\/2015\/04\/03010700\/CNX_Precalc_Figure_02_04_0052.jpg\" alt=\"Scatter plot, showing the line of best fit and where interpolation and extrapolation occurs. It is titled 'Cricket Chirps Vs Air Temperature'. The x-axis is 'c, Number of Chirps', and the y-axis is 'T(c), Temperature (F)'.\" width=\"485\" height=\"429\" \/> <b>Figure 5<\/b>[\/caption]<\/figure>\r\n<h4>Analysis of the Solution<\/h4>\r\n<p id=\"fs-id1165137580571\">Our model predicts the crickets would chirp 8.33 times in 15 seconds. While this might be possible, we have no reason to believe our model is valid outside the domain and range. 
In fact, generally crickets stop chirping altogether below around 50 degrees.<\/p>\r\n[\/hidden-answer]<b><\/b>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"bcc-box bcc-success\">\r\n<h3>Try It<\/h3>\r\n<p id=\"fs-id1165135695208\">According to the data from the table in Example 3, what temperature can we predict it is if we counted 20 chirps in 15 seconds?<\/p>\r\n[reveal-answer q=\"326758\"]Show Solution[\/reveal-answer]\r\n[hidden-answer a=\"326758\"]\r\n\r\n[latex]54^\\circ \\text{F}[\/latex]\r\n\r\n[\/hidden-answer]\r\n\r\n<\/div>\r\n<\/section><section id=\"fs-id1165137414565\">\r\n<h2 style=\"text-align: center;\">Finding the Line of Best Fit Using a Graphing Utility<\/h2>\r\n<p id=\"fs-id1165137598255\">While eyeballing a line works reasonably well, there are statistical techniques for fitting a line to data that minimize the differences between the line and data values.[footnote]Technically, the method minimizes the sum of the squared differences in the vertical direction between the line and the data values.[\/footnote] One such technique is called <strong>least squares regression<\/strong> and can be computed by many graphing calculators, spreadsheet software, statistical software, and many web-based calculators.[footnote]For example, <a href=\"http:\/\/www.shodor.org\/unchem\/math\/lls\/leastsq.html\" target=\"_blank\" rel=\"noopener\">http:\/\/www.shodor.org\/unchem\/math\/lls\/leastsq.html<\/a>[\/footnote] Least squares regression is one means to determine the line that best fits the data, and here we will refer to this method as linear regression.<\/p>\r\n\r\n<div id=\"fs-id1165137534286\" class=\"note precalculus howto textbox\">\r\n<h3 id=\"fs-id1165137472097\">How To: Given data of input and corresponding outputs from a linear function, find the best fit line using linear regression.<\/h3>\r\n<ol id=\"fs-id1165135192714\">\r\n \t<li>Enter the input in List 1 (L1).<\/li>\r\n \t<li>Enter the output in List 2 (L2).<\/li>\r\n \t<li>On a 
graphing utility, select Linear Regression (LinReg).<\/li>\r\n<\/ol>\r\n<\/div>\r\n<div id=\"Example_02_04_04\" class=\"example\">\r\n<div id=\"fs-id1165137557131\" class=\"exercise\">\r\n<div id=\"fs-id1165137862815\" class=\"problem textbox shaded\">\r\n<h3>Example 4: Finding a Least Squares Regression Line<\/h3>\r\nFind the least squares regression line using the cricket-chirp data in the table below.\r\n<table id=\"Table_02_04_01\" summary=\"Two rows and ten columns. The first row is labeled, 'chirps'. The second row is labeled is labeled, 'Temp'. Reading the remaining rows as ordered pairs (i.e., (chirps, Temp), we have the following values: (44, 80.5), (35, 70.5), (20.4, 57), (33, 66), (31, 68), (35, 72), (18.5, 52), (37, 73.5) and (26, 53).\"><colgroup> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/><\/colgroup>\r\n<tbody>\r\n<tr>\r\n<td><strong>Chirps<\/strong><\/td>\r\n<td>44<\/td>\r\n<td>35<\/td>\r\n<td>20.4<\/td>\r\n<td>33<\/td>\r\n<td>31<\/td>\r\n<td>35<\/td>\r\n<td>18.5<\/td>\r\n<td>37<\/td>\r\n<td>26<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Temperature<\/strong><\/td>\r\n<td>80.5<\/td>\r\n<td>70.5<\/td>\r\n<td>57<\/td>\r\n<td>66<\/td>\r\n<td>68<\/td>\r\n<td>72<\/td>\r\n<td>52<\/td>\r\n<td>73.5<\/td>\r\n<td>53<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n[reveal-answer q=\"98516\"]Show Solution[\/reveal-answer]\r\n[hidden-answer a=\"98516\"]\r\n<ol id=\"fs-id1165137415920\">\r\n \t<li>Enter the input (chirps) in List 1 (L1).<\/li>\r\n \t<li>Enter the output (temperature) in List 2 (L2). See the table below.\r\n<table id=\"Table_02_04_02\" summary=\"Two rows and ten columns. The first row is labeled, 'L1'. The second row is labeled is labeled, 'L2'. 
Reading the remaining rows as ordered pairs (i.e., (L1, L2)), we have the following values: (44, 80.5), (35, 70.5), (20.4, 57), (33, 66), (31, 68), (35, 72), (18.5, 52), (37, 73.5) and (26, 53).\"><colgroup> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/><\/colgroup>\r\n<tbody>\r\n<tr>\r\n<td><strong>L1<\/strong><\/td>\r\n<td>44<\/td>\r\n<td>35<\/td>\r\n<td>20.4<\/td>\r\n<td>33<\/td>\r\n<td>31<\/td>\r\n<td>35<\/td>\r\n<td>18.5<\/td>\r\n<td>37<\/td>\r\n<td>26<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>L2<\/strong><\/td>\r\n<td>80.5<\/td>\r\n<td>70.5<\/td>\r\n<td>57<\/td>\r\n<td>66<\/td>\r\n<td>68<\/td>\r\n<td>72<\/td>\r\n<td>52<\/td>\r\n<td>73.5<\/td>\r\n<td>53<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/li>\r\n \t<li>On a graphing utility, select Linear Regression (LinReg). Using technology with the cricket-chirp data, we obtain the equation:\r\n<div id=\"fs-id1140825\" class=\"equation unnumbered\" style=\"text-align: center;\">[latex]T\\left(c\\right)=30.281+1.143c[\/latex]<\/div><\/li>\r\n<\/ol>\r\n<h3>Analysis of the Solution<\/h3>\r\n<p id=\"fs-id1165137627652\">Notice that this line is quite similar to the equation we \"eyeballed\" but should fit the data better. Notice also that using this equation would change our prediction for the temperature when hearing 30 chirps in 15 seconds from 66 degrees to:<\/p>\r\n<p style=\"text-align: center;\">[latex]\\begin{align}T\\left(30\\right)&amp;=30.281+1.143\\left(30\\right) \\\\ &amp;=64.571 \\\\ &amp;\\approx 64.6\\text{ degrees} \\end{align}[\/latex]<\/p>\r\nThe scatter plot with the least squares regression line is shown below.\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"487\"]<img src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/1227\/2015\/04\/03010700\/CNX_Precalc_Figure_02_04_0062.jpg\" alt=\"Scatter plot, showing the line of best fit. 
It is titled 'Cricket Chirps Vs Air Temperature'. The x-axis is 'c, Number of Chirps', and the y-axis is 'T(c), Temperature (F)'.\" width=\"487\" height=\"408\" \/> <b>Figure 6<\/b>[\/caption]\r\n\r\n[\/hidden-answer]<span id=\"fs-id1165137692164\">\r\n<\/span>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"fs-id1165135260732\" class=\"note precalculus qa textbox\">\r\n<h3 id=\"fs-id1165137611629\">Q &amp; A<\/h3>\r\n<strong>Will there ever be a case where two different lines will serve as the best fit for the data?<\/strong>\r\n<p id=\"fs-id1165137507058\"><em>Although there are other ways to find \"best fit\" lines, we will always use the least squares regression line.<\/em><\/p>\r\n\r\n<\/div>\r\n<h2>Distinguishing Between Linear and Non-Linear Models<\/h2>\r\n<section id=\"fs-id1165137594434\">\r\n<p id=\"fs-id1165135160844\">Some data exhibit strong linear trends, but other data are nonlinear. Most calculators and computer software can also provide us with the <strong>correlation coefficient<\/strong>, which is a measure of how closely the line fits the data. Many graphing calculators require the user to turn a \"diagnostic on\" selection to find the correlation coefficient, which mathematicians label as <em>r<\/em>. The correlation coefficient provides an easy way to get an idea of how close to a line the data falls.<\/p>\r\nWe should compute the correlation coefficient only for data that follows a linear pattern or to determine the degree to which a data set is linear. If the data exhibits a nonlinear pattern, the correlation coefficient for a linear regression is meaningless. To get a sense for the relationship between the value of <em>r<\/em>\u00a0and the graph of the data, the image below\u00a0shows some large data sets with their correlation coefficients. 
Remember, for all plots, the horizontal axis shows the input and the vertical axis shows the output.\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"901\"]<img class=\"\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/1227\/2015\/04\/03010700\/CNX_Precalc_Figure_02_04_0072.jpg\" alt=\"A series of scatterplot graphs. Some are linear and some are not.\" width=\"901\" height=\"401\" \/> <b>Figure 7.<\/b> Plotted data and related correlation coefficients. (credit: \"DenisBoigelot,\" Wikimedia Commons)[\/caption]\r\n\r\n<div id=\"fs-id1165137443573\" class=\"note textbox\">\r\n<h3 class=\"title\">A General Note: Correlation Coefficient<\/h3>\r\n<p id=\"fs-id1165137416387\">The <strong>correlation coefficient<\/strong> is a value, <em>r<\/em>, between \u20131 and 1.<\/p>\r\n\r\n<ul id=\"eip-id1165133093343\">\r\n \t<li><em>r<\/em> &gt; 0 suggests a positive (increasing) relationship<\/li>\r\n \t<li><em>r<\/em> &lt; 0 suggests a negative (decreasing) relationship<\/li>\r\n \t<li>The closer the value is to 0, the more scattered the data.<\/li>\r\n \t<li>The closer the value is to 1 or \u20131, the less scattered the data is.<\/li>\r\n<\/ul>\r\n<\/div>\r\n<div id=\"Example_02_04_05\" class=\"example\">\r\n<div id=\"fs-id1165137387185\" class=\"exercise\">\r\n<div id=\"fs-id1165137680583\" class=\"problem textbox shaded\">\r\n<h3>Example 5: Finding a Correlation Coefficient<\/h3>\r\n<p id=\"fs-id1165137734908\">Calculate the correlation coefficient for cricket-chirp data in the table below.<\/p>\r\n\r\n<table id=\"Table_02_04_01\" summary=\"Two rows and ten columns. The first row is labeled, 'chirps'. The second row is labeled is labeled, 'Temp'. 
Reading the remaining rows as ordered pairs (i.e., (chirps, Temp), we have the following values: (44, 80.5), (35, 70.5), (20.4, 57), (33, 66), (31, 68), (35, 72), (18.5, 52), (37, 73.5) and (26, 53).\"><colgroup> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/> <col \/><\/colgroup>\r\n<tbody>\r\n<tr>\r\n<td><strong>Chirps<\/strong><\/td>\r\n<td>44<\/td>\r\n<td>35<\/td>\r\n<td>20.4<\/td>\r\n<td>33<\/td>\r\n<td>31<\/td>\r\n<td>35<\/td>\r\n<td>18.5<\/td>\r\n<td>37<\/td>\r\n<td>26<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Temperature<\/strong><\/td>\r\n<td>80.5<\/td>\r\n<td>70.5<\/td>\r\n<td>57<\/td>\r\n<td>66<\/td>\r\n<td>68<\/td>\r\n<td>72<\/td>\r\n<td>52<\/td>\r\n<td>73.5<\/td>\r\n<td>53<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n[reveal-answer q=\"201965\"]Show Solution[\/reveal-answer]\r\n[hidden-answer a=\"201965\"]\r\n<p id=\"fs-id1165137471985\">Because the data appear to follow a linear pattern, we can use technology to calculate <em>r<\/em>. Enter the inputs and corresponding outputs and select the Linear Regression. The calculator will also provide you with the correlation coefficient, <em>r\u00a0<\/em>= 0.9509. This value is very close to 1, which suggests a strong increasing linear relationship.<\/p>\r\n<p id=\"fs-id1165137473276\">Note: For some calculators, the Diagnostics must be turned \"on\" in order to get the correlation coefficient when linear regression is performed: [2nd]&gt;[0]&gt;[alpha][x\u20131], then scroll to DIAGNOSTICSON.<\/p>\r\n[\/hidden-answer]\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/section><\/section>\r\n<h2 class=\"note precalculus try\" style=\"text-align: center;\">Predicting with a Regression Line<\/h2>\r\n<p id=\"fs-id1165137436049\">Once we determine that a set of data is linear using the correlation coefficient, we can use the regression line to make predictions. 
As we learned previously, a regression line is a line that is closest to the data in the scatter plot, which means that only one such line is a best fit for the data.<\/p>\r\n\r\n<div id=\"Example_02_04_06\" class=\"example\">\r\n<div id=\"fs-id1165137571292\" class=\"exercise\">\r\n<div id=\"fs-id1165135546014\" class=\"problem textbox shaded\">\r\n<h3>Example 6: Using a Regression Line to Make Predictions<\/h3>\r\n<p id=\"fs-id1165135191292\">Gasoline consumption in the United States has been steadily increasing. Consumption data from 1994 to 2004 is shown in the table below.[footnote]<a href=\"http:\/\/www.bts.gov\/publications\/national_transportation_statistics\/2005\/html\/table_04_10.html\" target=\"_blank\" rel=\"noopener\">http:\/\/www.bts.gov\/publications\/national_transportation_statistics\/2005\/html\/table_04_10.html<\/a>[\/footnote] Determine whether the trend is linear, and if so, find a model for the data. Use the model to predict the consumption in 2008.<\/p>\r\n\r\n<table id=\"Table_02_04_03\" summary=\"Two rows and twelve columns. The first row is labeled, 'Year'. The second row is labeled is labeled, 'Consumption (billions of gallons)'. 
Reading the remaining rows as ordered pairs (i.e., (Year, Consumption), we have the following values: ('94, 113), ('95, 116), ('96, 118), ('97, 119), ('98, 123), ('99, 125), ('00, 126), ('01, 128), ('02, 131), ('03, 133), and ('04, 136).\">\r\n<tbody>\r\n<tr>\r\n<td><strong>Year<\/strong><\/td>\r\n<td>'94<\/td>\r\n<td>'95<\/td>\r\n<td>'96<\/td>\r\n<td>'97<\/td>\r\n<td>'98<\/td>\r\n<td>'99<\/td>\r\n<td>'00<\/td>\r\n<td>'01<\/td>\r\n<td>'02<\/td>\r\n<td>'03<\/td>\r\n<td>'04<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Consumption (billions of gallons)<\/strong><\/td>\r\n<td>113<\/td>\r\n<td>116<\/td>\r\n<td>118<\/td>\r\n<td>119<\/td>\r\n<td>123<\/td>\r\n<td>125<\/td>\r\n<td>126<\/td>\r\n<td>128<\/td>\r\n<td>131<\/td>\r\n<td>133<\/td>\r\n<td>136<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\nThe scatter plot of the data, including the least squares regression line, is shown in Figure 8.\r\n\r\n&nbsp;\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"487\"]<img src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/1227\/2015\/04\/03010701\/CNX_Precalc_Figure_02_04_0082.jpg\" alt=\"Scatter plot, showing the line of best fit. It is titled 'Gas Consumption VS Year'. 
The x-axis is 'Year After 1994', and the y-axis is 'Gas Consumption (billions of gallons)'.\" width=\"487\" height=\"384\" \/> <b>Figure 8<\/b>[\/caption]\r\n\r\n[reveal-answer q=\"490501\"]Show Solution[\/reveal-answer]\r\n[hidden-answer a=\"490501\"]\r\n<p id=\"fs-id1165137767360\">We can introduce a new input variable, <em>t<\/em>, representing years since 1994.<\/p>\r\n<p id=\"fs-id1165137552875\">The least squares regression equation is:<\/p>\r\n<p style=\"text-align: center;\">[latex]C\\left(t\\right)=113.318+2.209t[\/latex]<\/p>\r\n<p id=\"fs-id1165137767812\">Using technology, the correlation coefficient was calculated to be 0.9965, suggesting a very strong increasing linear trend.<\/p>\r\n<p id=\"fs-id1165137444077\">Using this to predict consumption in 2008 [latex]\\left(t=14\\right)[\/latex],<\/p>\r\n<p style=\"text-align: center;\">[latex]\\begin{align}C\\left(14\\right)&amp;=113.318+2.209\\left(14\\right) \\\\ &amp;=144.244 \\end{align}[\/latex]<\/p>\r\n<p id=\"fs-id1165135207471\">The model predicts 144.244 billion gallons of gasoline consumption in 2008.<\/p>\r\n[\/hidden-answer]\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"bcc-box bcc-success\">\r\n<h3>Try It<\/h3>\r\n<p id=\"fs-id1165137600643\">Use the model we created using technology in Example 6\u00a0to predict the gas consumption in 2011. 
Is this an interpolation or an extrapolation?<\/p>\r\n[reveal-answer q=\"11412\"]Show Solution[\/reveal-answer]\r\n[hidden-answer a=\"11412\"]\r\n\r\n150.871 billion gallons; extrapolation\r\n\r\n[\/hidden-answer]\r\n\r\n<\/div>\r\n<h2>Key Concepts<\/h2>\r\n<ul id=\"fs-id1165137785014\">\r\n \t<li>Scatter plots show the relationship between two sets of data.<\/li>\r\n \t<li>Scatter plots may represent linear or non-linear models.<\/li>\r\n \t<li>The line of best fit may be estimated or calculated, using a calculator or statistical software.<\/li>\r\n \t<li>Interpolation can be used to predict values inside the domain and range of the data, whereas extrapolation can be used to predict values outside the domain and range of the data.<\/li>\r\n \t<li>The correlation coefficient, <em>r<\/em>, indicates the degree of linear relationship between data.<\/li>\r\n \t<li>A regression line best fits the data.<\/li>\r\n \t<li>The least squares regression line is found by minimizing the sum of the squared vertical distances between the data points and the line, and may be used to make predictions regarding either of the variables.<\/li>\r\n<\/ul>\r\n<h2>Glossary<\/h2>\r\n<dl id=\"fs-id1165137705061\" class=\"definition\">\r\n \t<dt><strong>correlation coefficient<\/strong><\/dt>\r\n \t<dd id=\"fs-id1165135250649\">a value, <em>r<\/em>, between \u20131 and 1 that indicates the degree of linear correlation of variables, or how closely a regression line fits a data set.<\/dd>\r\n<\/dl>\r\n<dl id=\"fs-id1165137549428\" class=\"definition\">\r\n 
\t<dt><strong>extrapolation<\/strong><\/dt>\r\n \t<dd id=\"fs-id1165135485274\">predicting a value outside the domain and range of the data<\/dd>\r\n<\/dl>\r\n<dl id=\"fs-id1165135485278\" class=\"definition\">\r\n \t<dt><strong>interpolation<\/strong><\/dt>\r\n \t<dd id=\"fs-id1165135184191\">predicting a value inside the domain and range of the data<\/dd>\r\n<\/dl>\r\n<dl id=\"fs-id1165137761665\" class=\"definition\">\r\n \t<dt><strong>least squares regression<\/strong><\/dt>\r\n \t<dd id=\"fs-id1165135192379\">a statistical technique for fitting a line to data in a way that minimizes the differences between the line and data values<\/dd>\r\n<\/dl>\r\n<dl id=\"fs-id1165137446440\" class=\"definition\">\r\n \t<dt><strong>model breakdown<\/strong><\/dt>\r\n \t<dd id=\"fs-id1165137446445\">when a model no longer applies after a certain point<\/dd>\r\n<\/dl>\r\n<\/div>\r\n<\/div>","rendered":"<div class=\"bcc-box bcc-highlight\">\n<h3>Learning Outcomes<\/h3>\n<ul>\n<li>Draw and interpret scatter plots.<\/li>\n<li>Find the line of best fit.<\/li>\n<li>Distinguish between linear and nonlinear relations.<\/li>\n<li>Use a linear model to make predictions.<\/li>\n<\/ul>\n<\/div>\n<p id=\"fs-id1165137641419\">A professor is attempting to identify trends among final exam scores. His class has a mixture of students, so he wonders if there is any relationship between age and final exam scores. One way for him to analyze the scores is by creating a diagram that relates the age of each student to the exam score received. In this section, we will examine one such diagram known as a scatter plot.<\/p>\n<h2 style=\"text-align: center;\">Drawing and Interpreting Scatter Plots<\/h2>\n<p>A <strong>scatter plot<\/strong> is a graph of plotted points that may show a relationship between two sets of data. If the relationship is from a <strong>linear model<\/strong>, or a model that is nearly linear, the professor can draw conclusions using his knowledge of linear functions. 
Below is\u00a0a sample scatter plot.<\/p>\n<div style=\"width: 497px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/1227\/2015\/04\/03010659\/CNX_Precalc_Figure_02_04_0012.jpg\" alt=\"Scatter plot, titled 'Final Exam Score VS Age'. The x-axis is the age, and the y-axis is the final exam score. The range of ages are between 20s - 50s, and the range for scores are between upper 50s and 90s.\" width=\"487\" height=\"337\" \/><\/p>\n<p class=\"wp-caption-text\"><b>Figure 1.<\/b> A scatter plot of age and final exam score variables<\/p>\n<\/div>\n<p id=\"fs-id1165137658014\">Notice this scatter plot does <em>not<\/em> indicate a <strong>linear relationship<\/strong>. The points do not appear to follow a trend. In other words, there does not appear to be a relationship between the age of the student and the score on the final exam.<\/p>\n<div id=\"Example_02_04_01\" class=\"example\">\n<div id=\"fs-id1165137393214\" class=\"exercise\">\n<div id=\"fs-id1165137735052\" class=\"problem textbox shaded\">\n<h3>Example 1: Using a Scatter Plot to Investigate Cricket Chirps<\/h3>\n<p id=\"fs-id1165137874471\">The table below\u00a0shows the number of cricket chirps in 15 seconds, for several different air temperatures, in degrees Fahrenheit.<a class=\"footnote\" title=\"Selected data from http:\/\/classic.globe.gov\/fsl\/scientistsblog\/2007\/10\/. Retrieved Aug 3, 2010\" id=\"return-footnote-13821-1\" href=\"#footnote-13821-1\" aria-label=\"Footnote 1\"><sup class=\"footnote\">[1]<\/sup><\/a> Plot this data, and determine whether the data appears to be linearly related.<\/p>\n<table id=\"Table_02_04_01\" summary=\"Two rows and ten columns. The first row is labeled, 'chirps'. The second row is labeled is labeled, 'Temp'. 
Reading the remaining rows as ordered pairs (i.e., (chirps, Temp), we have the following values: (44, 80.5), (35, 70.5), (20.4, 57), (33, 66), (31, 68), (35, 72), (18.5, 52), (37, 73.5) and (26, 53).\">\n<colgroup>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/><\/colgroup>\n<tbody>\n<tr>\n<td><strong>Chirps<\/strong><\/td>\n<td>44<\/td>\n<td>35<\/td>\n<td>20.4<\/td>\n<td>33<\/td>\n<td>31<\/td>\n<td>35<\/td>\n<td>18.5<\/td>\n<td>37<\/td>\n<td>26<\/td>\n<\/tr>\n<tr>\n<td><strong>Temperature<\/strong><\/td>\n<td>80.5<\/td>\n<td>70.5<\/td>\n<td>57<\/td>\n<td>66<\/td>\n<td>68<\/td>\n<td>72<\/td>\n<td>52<\/td>\n<td>73.5<\/td>\n<td>53<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q416011\">Show Solution<\/span><\/p>\n<div id=\"q416011\" class=\"hidden-answer\" style=\"display: none\">\n<p>Plotting this data\u00a0suggests that there may be a trend. We can see from the trend in the data that the number of chirps increases as the temperature increases. The trend appears to be roughly linear, though certainly not perfectly so.<\/p>\n<div style=\"width: 497px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/1227\/2015\/04\/03010659\/CNX_Precalc_Figure_02_04_0022.jpg\" alt=\"Scatter plot, titled 'Cricket Chirps Vs Air Temperature'. The x-axis is the Cricket Chirps in 15 Seconds, and the y-axis is the Temperature (F). 
The line regression is generally positive.\" width=\"487\" height=\"386\" \/><\/p>\n<p class=\"wp-caption-text\"><b>Figure 2<\/b><\/p>\n<\/div>\n<\/div>\n<\/div>\n<p><b><\/b><\/p>\n<\/div>\n<h2 style=\"text-align: center;\">Finding the Line of Best Fit<\/h2>\n<p id=\"fs-id1165135443922\">Once we recognize a need for a linear function to model the\u00a0data in &#8220;<a href=\"https:\/\/courses.lumenlearning.com\/precalcone\/chapter\/draw-and-interpret-scatter-plots\/\" target=\"_blank\" rel=\"noopener\">Draw and interpret scatter plots<\/a>,&#8221; the natural follow-up question is &#8220;what is that linear function?&#8221; One way to approximate our linear function is to sketch the line that seems to best fit the data. Then we can extend the line until we can verify the <em>y<\/em>-intercept. We can approximate the slope of the line by extending it until we can estimate the [latex]\\frac{\\text{rise}}{\\text{run}}[\/latex].<\/p>\n<div id=\"Example_02_04_02\" class=\"example\">\n<div id=\"fs-id1165135441718\" class=\"exercise\">\n<div id=\"fs-id1165137416925\" class=\"problem textbox shaded\">\n<h3>Example 2: Finding a Line of Best Fit<\/h3>\n<p>Find a linear function that fits the data in the table below\u00a0by &#8220;eyeballing&#8221; a line that seems to fit.<\/p>\n<table id=\"Table_02_04_01\" summary=\"Two rows and ten columns. The first row is labeled, 'chirps'. The second row is labeled is labeled, 'Temp'. 
Reading the remaining rows as ordered pairs (i.e., (chirps, Temp), we have the following values: (44, 80.5), (35, 70.5), (20.4, 57), (33, 66), (31, 68), (35, 72), (18.5, 52), (37, 73.5) and (26, 53).\">\n<colgroup>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/><\/colgroup>\n<tbody>\n<tr>\n<td><strong>Chirps<\/strong><\/td>\n<td>44<\/td>\n<td>35<\/td>\n<td>20.4<\/td>\n<td>33<\/td>\n<td>31<\/td>\n<td>35<\/td>\n<td>18.5<\/td>\n<td>37<\/td>\n<td>26<\/td>\n<\/tr>\n<tr>\n<td><strong>Temperature<\/strong><\/td>\n<td>80.5<\/td>\n<td>70.5<\/td>\n<td>57<\/td>\n<td>66<\/td>\n<td>68<\/td>\n<td>72<\/td>\n<td>52<\/td>\n<td>73.5<\/td>\n<td>53<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q581479\">Show Solution<\/span><\/p>\n<div id=\"q581479\" class=\"hidden-answer\" style=\"display: none\">\n<p id=\"fs-id1165137507732\">On a graph, we could try sketching a line.<\/p>\n<p id=\"fs-id1165137581362\">Using the starting and ending points of our hand drawn line, points (0, 30) and (50, 90), this graph has a slope of<\/p>\n<p style=\"text-align: center;\">[latex]m=\\frac{60}{50}=1.2[\/latex]<\/p>\n<p id=\"fs-id1165135536506\">and a <em>y<\/em>-intercept at 30. This gives an equation of<\/p>\n<p style=\"text-align: center;\">[latex]T\\left(c\\right)=1.2c+30[\/latex]<\/p>\n<p id=\"fs-id1165137573540\">where <em>c<\/em>\u00a0is the number of chirps in 15 seconds, and <em>T<\/em>(<em>c<\/em>)\u00a0is the temperature in degrees Fahrenheit. 
The resulting equation is represented in the graph below.<\/p>\n<figure id=\"CNX_Precalc_Figure_02_04_003\">\n<div style=\"width: 497px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/1227\/2015\/04\/03010700\/CNX_Precalc_Figure_02_04_0032.jpg\" alt=\"Scatter plot, showing the line of best fit. It is titled 'Cricket Chirps Vs Air Temperature'. The x-axis is 'c, Number of Chirps', and the y-axis is 'T(c), Temperature (F)'.\" width=\"487\" height=\"432\" \/><\/p>\n<p class=\"wp-caption-text\"><b>Figure 3<\/b><\/p>\n<\/div>\n<\/figure>\n<h4>Analysis of the Solution<\/h4>\n<p id=\"fs-id1165137745268\">This linear equation can then be used to approximate answers to various questions we might ask about the trend.<\/p>\n<\/div>\n<\/div>\n<p><b><\/b><\/p>\n<\/div>\n<\/div>\n<\/div>\n<section id=\"fs-id1165137456060\">\n<h2 style=\"text-align: center;\">Recognizing Interpolation or Extrapolation<\/h2>\n<p id=\"fs-id1165135255246\">While the data for most examples does not fall perfectly on the line, the equation is our best guess as to how the relationship will behave outside of the values for which we have data. We use a process known as <strong>interpolation<\/strong> when we predict a value inside the domain and range of the data. The process of <strong>extrapolation<\/strong> is used when we predict a value outside the domain and range of the data.<\/p>\n<p id=\"fs-id1165137911398\">The graph below compares the two processes for the cricket-chirp data addressed in Example 2. We can see that interpolation would occur if we used our model to predict temperature when the values for chirps are between 18.5 and 44. 
Extrapolation would occur if we used our model to predict temperature when the values for chirps are less than 18.5 or greater than 44.<\/p>\n<p id=\"fs-id1165137399663\">There is a difference between making predictions inside the domain and range of values for which we have data and outside that domain and range. Predicting a value outside of the domain and range has its limitations. When our model no longer applies after a certain point, it is sometimes called <strong>model breakdown<\/strong>. For example, predicting a cost function for a period of two years may involve examining the data where the input is the time in years and the output is the cost. But if we try to extrapolate a cost when <em>x<\/em> = 50, that is in 50 years, the model would not apply because we could not account for factors fifty years in the future.<\/p>\n<div style=\"width: 497px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/1227\/2015\/04\/03010700\/CNX_Precalc_Figure_02_04_0042.jpg\" alt=\"Scatter plot, showing the line of best fit and where interpolation and extrapolation occurs. It is titled 'Cricket Chirps Vs Air Temperature'. 
The x-axis is 'c, Number of Chirps', and the y-axis is 'T(c), Temperature (F)'.\" width=\"487\" height=\"430\" \/><\/p>\n<p class=\"wp-caption-text\"><b>Figure 4.<\/b> Interpolation occurs within the domain and range of the provided data whereas extrapolation occurs outside.<\/p>\n<\/div>\n<div id=\"fs-id1165137447983\" class=\"note textbox\">\n<h3 class=\"title\">A General Note: Interpolation and Extrapolation<\/h3>\n<p id=\"fs-id1165135646198\">Different methods of making predictions are used to analyze data.<\/p>\n<ul id=\"fs-id1165137666284\">\n<li>The method of <strong>interpolation<\/strong> involves predicting a value inside the domain and\/or range of the data.<\/li>\n<li>The method of <strong>extrapolation<\/strong> involves predicting a value outside the domain and\/or range of the data.<\/li>\n<li><strong>Model breakdown<\/strong> occurs at the point when the model no longer applies.<\/li>\n<\/ul>\n<\/div>\n<div id=\"Example_02_04_03\" class=\"example\">\n<div id=\"fs-id1165137584562\" class=\"exercise\">\n<div id=\"fs-id1165137734950\" class=\"problem textbox shaded\">\n<h3>Example 3: Understanding Interpolation and Extrapolation<\/h3>\n<table id=\"Table_02_04_01\" summary=\"Two rows and ten columns. The first row is labeled, 'chirps'. The second row is labeled is labeled, 'Temp'. 
Reading the remaining rows as ordered pairs (i.e., (chirps, Temp), we have the following values: (44, 80.5), (35, 70.5), (20.4, 57), (33, 66), (31, 68), (35, 72), (18.5, 52), (37, 73.5) and (26, 53).\">\n<colgroup>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/><\/colgroup>\n<tbody>\n<tr>\n<td><strong>Chirps<\/strong><\/td>\n<td>44<\/td>\n<td>35<\/td>\n<td>20.4<\/td>\n<td>33<\/td>\n<td>31<\/td>\n<td>35<\/td>\n<td>18.5<\/td>\n<td>37<\/td>\n<td>26<\/td>\n<\/tr>\n<tr>\n<td><strong>Temperature<\/strong><\/td>\n<td>80.5<\/td>\n<td>70.5<\/td>\n<td>57<\/td>\n<td>66<\/td>\n<td>68<\/td>\n<td>72<\/td>\n<td>52<\/td>\n<td>73.5<\/td>\n<td>53<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p id=\"fs-id1165137668321\">Use the cricket data above\u00a0to answer the following questions:<\/p>\n<ol id=\"fs-id1165137843776\">\n<li>Would predicting the temperature when crickets are chirping 30 times in 15 seconds be interpolation or extrapolation? Make the prediction, and discuss whether it is reasonable.<\/li>\n<li>Would predicting the number of chirps crickets will make at 40 degrees be interpolation or extrapolation? Make the prediction, and discuss whether it is reasonable.<\/li>\n<\/ol>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q775750\">Show Solution<\/span><\/p>\n<div id=\"q775750\" class=\"hidden-answer\" style=\"display: none\">\n<ol id=\"fs-id1165137779181\">\n<li>The number of chirps in the data provided varied from 18.5 to 44. A prediction at 30 chirps per 15 seconds is inside the domain of our data, so would be interpolation. 
Using our model:\n<div id=\"eip-id1165131886691\" class=\"equation unnumbered\" style=\"text-align: center;\">[latex]\\begin{align}T\\left(30\\right)&=30+1.2\\left(30\\right) \\\\ &=66\\text{ degrees} \\end{align}[\/latex]<\/div>\n<p>Based on the data we have, this value seems reasonable.<\/li>\n<li>The temperature values varied from 52 to 80.5. Predicting the number of chirps at 40 degrees is extrapolation because 40 is outside the range of our data. Using our model:\n<div id=\"eip-id1165135551132\" class=\"equation unnumbered\" style=\"text-align: center;\">[latex]\\begin{align}40&=30+1.2c \\\\ 10&=1.2c \\\\ c&\\approx 8.33 \\end{align}[\/latex]<\/div>\n<\/li>\n<\/ol>\n<p id=\"fs-id1165137434356\">We can compare the regions of interpolation and extrapolation using the graph below.<\/p>\n<figure id=\"CNX_Precalc_Figure_02_04_005\">\n<div style=\"width: 495px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/1227\/2015\/04\/03010700\/CNX_Precalc_Figure_02_04_0052.jpg\" alt=\"Scatter plot, showing the line of best fit and where interpolation and extrapolation occur. It is titled 'Cricket Chirps Vs Air Temperature'. The x-axis is 'c, Number of Chirps', and the y-axis is 'T(c), Temperature (F)'.\" width=\"485\" height=\"429\" \/><\/p>\n<p class=\"wp-caption-text\"><b>Figure 5<\/b><\/p>\n<\/div>\n<\/figure>\n<h4>Analysis of the Solution<\/h4>\n<p id=\"fs-id1165137580571\">Our model predicts the crickets would chirp 8.33 times in 15 seconds. While this might be possible, we have no reason to believe our model is valid outside the domain and range. 
In fact, generally crickets stop chirping altogether below around 50 degrees.<\/p>\n<\/div>\n<\/div>\n<p><b><\/b><\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"bcc-box bcc-success\">\n<h3>Try It<\/h3>\n<p id=\"fs-id1165135695208\">According to the data from the table in Example 3, what temperature can we predict it is if we counted 20 chirps in 15 seconds?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q326758\">Show Solution<\/span><\/p>\n<div id=\"q326758\" class=\"hidden-answer\" style=\"display: none\">\n<p>[latex]54^\\circ \\text{F}[\/latex]<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/section>\n<section id=\"fs-id1165137414565\">\n<h2 style=\"text-align: center;\">Finding the Line of Best Fit Using a Graphing Utility<\/h2>\n<p id=\"fs-id1165137598255\">While eyeballing a line works reasonably well, there are statistical techniques for fitting a line to data that minimize the differences between the line and data values.<a class=\"footnote\" title=\"Technically, the method minimizes the sum of the squared differences in the vertical direction between the line and the data values.\" id=\"return-footnote-13821-2\" href=\"#footnote-13821-2\" aria-label=\"Footnote 2\"><sup class=\"footnote\">[2]<\/sup><\/a> One such technique is called <strong>least squares regression<\/strong> and can be computed by many graphing calculators, spreadsheet software, statistical software, and many web-based calculators.<a class=\"footnote\" title=\"For example, http:\/\/www.shodor.org\/unchem\/math\/lls\/leastsq.html\" id=\"return-footnote-13821-3\" href=\"#footnote-13821-3\" aria-label=\"Footnote 3\"><sup class=\"footnote\">[3]<\/sup><\/a> Least squares regression is one means to determine the line that best fits the data, and here we will refer to this method as linear regression.<\/p>\n<div id=\"fs-id1165137534286\" class=\"note precalculus howto textbox\">\n<h3 id=\"fs-id1165137472097\">How To: Given 
data of input and corresponding outputs from a linear function, find the best fit line using linear regression.<\/h3>\n<ol id=\"fs-id1165135192714\">\n<li>Enter the input in List 1 (L1).<\/li>\n<li>Enter the output in List 2 (L2).<\/li>\n<li>On a graphing utility, select Linear Regression (LinReg).<\/li>\n<\/ol>\n<\/div>\n<div id=\"Example_02_04_04\" class=\"example\">\n<div id=\"fs-id1165137557131\" class=\"exercise\">\n<div id=\"fs-id1165137862815\" class=\"problem textbox shaded\">\n<h3>Example 4: Finding a Least Squares Regression Line<\/h3>\n<p>Find the least squares regression line using the cricket-chirp data in the table below.<\/p>\n<table id=\"Table_02_04_01\" summary=\"Two rows and ten columns. The first row is labeled, 'chirps'. The second row is labeled is labeled, 'Temp'. Reading the remaining rows as ordered pairs (i.e., (chirps, Temp), we have the following values: (44, 80.5), (35, 70.5), (20.4, 57), (33, 66), (31, 68), (35, 72), (18.5, 52), (37, 73.5) and (26, 53).\">\n<colgroup>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/><\/colgroup>\n<tbody>\n<tr>\n<td><strong>Chirps<\/strong><\/td>\n<td>44<\/td>\n<td>35<\/td>\n<td>20.4<\/td>\n<td>33<\/td>\n<td>31<\/td>\n<td>35<\/td>\n<td>18.5<\/td>\n<td>37<\/td>\n<td>26<\/td>\n<\/tr>\n<tr>\n<td><strong>Temperature<\/strong><\/td>\n<td>80.5<\/td>\n<td>70.5<\/td>\n<td>57<\/td>\n<td>66<\/td>\n<td>68<\/td>\n<td>72<\/td>\n<td>52<\/td>\n<td>73.5<\/td>\n<td>53<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q98516\">Show Solution<\/span><\/p>\n<div id=\"q98516\" class=\"hidden-answer\" style=\"display: none\">\n<ol id=\"fs-id1165137415920\">\n<li>Enter the input (chirps) in List 1 (L1).<\/li>\n<li>Enter the output (temperature) in List 2 (L2). See the table below.<br \/>\n<table id=\"Table_02_04_02\" summary=\"Two rows and ten columns. 
The first row is labeled, 'L1'. The second row is labeled, 'L2'. Reading the remaining rows as ordered pairs (i.e., (L1, L2)), we have the following values: (44, 80.5), (35, 70.5), (20.4, 57), (33, 66), (31, 68), (35, 72), (18.5, 52), (37, 73.5) and (26, 53).\">\n<colgroup>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/><\/colgroup>\n<tbody>\n<tr>\n<td><strong>L1<\/strong><\/td>\n<td>44<\/td>\n<td>35<\/td>\n<td>20.4<\/td>\n<td>33<\/td>\n<td>31<\/td>\n<td>35<\/td>\n<td>18.5<\/td>\n<td>37<\/td>\n<td>26<\/td>\n<\/tr>\n<tr>\n<td><strong>L2<\/strong><\/td>\n<td>80.5<\/td>\n<td>70.5<\/td>\n<td>57<\/td>\n<td>66<\/td>\n<td>68<\/td>\n<td>72<\/td>\n<td>52<\/td>\n<td>73.5<\/td>\n<td>53<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/li>\n<li>On a graphing utility, select Linear Regression (LinReg). Using the cricket chirp data from earlier, with technology we obtain the equation:\n<div id=\"fs-id1140825\" class=\"equation unnumbered\" style=\"text-align: center;\">[latex]T\\left(c\\right)=30.281+1.143c[\/latex]<\/div>\n<\/li>\n<\/ol>\n<h3>Analysis of the Solution<\/h3>\n<p id=\"fs-id1165137627652\">Notice that this line is quite similar to the equation we &#8220;eyeballed&#8221; but should fit the data better. 
Notice also that using this equation would change our prediction for the temperature when hearing 30 chirps in 15 seconds from 66 degrees to:<\/p>\n<p style=\"text-align: center;\">[latex]\\begin{align}T\\left(30\\right)&=30.281+1.143\\left(30\\right) \\\\ &=64.571 \\\\ &\\approx 64.6\\text{ degrees} \\end{align}[\/latex]<\/p>\n<p>The graph of the scatter plot with the least squares regression line is shown below.<\/p>\n<div style=\"width: 497px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/1227\/2015\/04\/03010700\/CNX_Precalc_Figure_02_04_0062.jpg\" alt=\"Scatter plot, showing the line of best fit. It is titled 'Cricket Chirps Vs Air Temperature'. The x-axis is 'c, Number of Chirps', and the y-axis is 'T(c), Temperature (F)'.\" width=\"487\" height=\"408\" \/><\/p>\n<p class=\"wp-caption-text\"><b>Figure 6<\/b><\/p>\n<\/div>\n<\/div>\n<\/div>\n<p><span id=\"fs-id1165137692164\"><br \/>\n<\/span><\/p>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"fs-id1165135260732\" class=\"note precalculus qa textbox\">\n<h3 id=\"fs-id1165137611629\">Q &amp; A<\/h3>\n<p><strong>Will there ever be a case where two different lines will serve as the best fit for the data?<\/strong><\/p>\n<p id=\"fs-id1165137507058\"><em>Although there are other ways to find &#8220;best fit&#8221; lines, we will always use the least squares regression line.<\/em><\/p>\n<\/div>\n<h2>Distinguishing Between Linear and Non-Linear Models<\/h2>\n<section id=\"fs-id1165137594434\">\n<p id=\"fs-id1165135160844\">Some data exhibit strong linear trends, but other data are nonlinear. Most calculators and computer software can also provide us with the <strong>correlation coefficient<\/strong>, which is a measure of how closely the line fits the data. 
Many graphing calculators require the user to turn on a &#8220;diagnostic&#8221; setting to display the correlation coefficient, which mathematicians label as <em>r<\/em>. The correlation coefficient provides an easy way to get an idea of how close to a line the data falls.<\/p>\n<p>We should compute the correlation coefficient only for data that follows a linear pattern or to determine the degree to which a data set is linear. If the data exhibits a nonlinear pattern, the correlation coefficient for a linear regression is meaningless. To get a sense for the relationship between the value of <em>r<\/em>\u00a0and the graph of the data, the image below\u00a0shows some large data sets with their correlation coefficients. Remember, for all plots, the horizontal axis shows the input and the vertical axis shows the output.<\/p>\n<div style=\"width: 911px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/1227\/2015\/04\/03010700\/CNX_Precalc_Figure_02_04_0072.jpg\" alt=\"A series of scatterplot graphs. Some are linear and some are not.\" width=\"901\" height=\"401\" \/><\/p>\n<p class=\"wp-caption-text\"><b>Figure 7.<\/b> Plotted data and related correlation coefficients. 
(credit: &#8220;DenisBoigelot,&#8221; Wikimedia Commons)<\/p>\n<\/div>\n<div id=\"fs-id1165137443573\" class=\"note textbox\">\n<h3 class=\"title\">A General Note: Correlation Coefficient<\/h3>\n<p id=\"fs-id1165137416387\">The <strong>correlation coefficient<\/strong> is a value, <em>r<\/em>, between \u20131 and 1.<\/p>\n<ul id=\"eip-id1165133093343\">\n<li><em>r<\/em> &gt; 0 suggests a positive (increasing) relationship<\/li>\n<li><em>r<\/em> &lt; 0 suggests a negative (decreasing) relationship<\/li>\n<li>The closer the value is to 0, the more scattered the data.<\/li>\n<li>The closer the value is to 1 or \u20131, the less scattered the data is.<\/li>\n<\/ul>\n<\/div>\n<div id=\"Example_02_04_05\" class=\"example\">\n<div id=\"fs-id1165137387185\" class=\"exercise\">\n<div id=\"fs-id1165137680583\" class=\"problem textbox shaded\">\n<h3>Example 5: Finding a Correlation Coefficient<\/h3>\n<p id=\"fs-id1165137734908\">Calculate the correlation coefficient for cricket-chirp data in the table below.<\/p>\n<table id=\"Table_02_04_01\" summary=\"Two rows and ten columns. The first row is labeled, 'chirps'. The second row is labeled is labeled, 'Temp'. 
Reading the remaining rows as ordered pairs (i.e., (chirps, Temp), we have the following values: (44, 80.5), (35, 70.5), (20.4, 57), (33, 66), (31, 68), (35, 72), (18.5, 52), (37, 73.5) and (26, 53).\">\n<colgroup>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/><\/colgroup>\n<tbody>\n<tr>\n<td><strong>Chirps<\/strong><\/td>\n<td>44<\/td>\n<td>35<\/td>\n<td>20.4<\/td>\n<td>33<\/td>\n<td>31<\/td>\n<td>35<\/td>\n<td>18.5<\/td>\n<td>37<\/td>\n<td>26<\/td>\n<\/tr>\n<tr>\n<td><strong>Temperature<\/strong><\/td>\n<td>80.5<\/td>\n<td>70.5<\/td>\n<td>57<\/td>\n<td>66<\/td>\n<td>68<\/td>\n<td>72<\/td>\n<td>52<\/td>\n<td>73.5<\/td>\n<td>53<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q201965\">Show Solution<\/span><\/p>\n<div id=\"q201965\" class=\"hidden-answer\" style=\"display: none\">\n<p id=\"fs-id1165137471985\">Because the data appear to follow a linear pattern, we can use technology to calculate <em>r<\/em>. Enter the inputs and corresponding outputs and select the Linear Regression. The calculator will also provide you with the correlation coefficient, <em>r\u00a0<\/em>= 0.9509. This value is very close to 1, which suggests a strong increasing linear relationship.<\/p>\n<p id=\"fs-id1165137473276\">Note: For some calculators, the Diagnostics must be turned &#8220;on&#8221; in order to get the correlation coefficient when linear regression is performed: [2nd]&gt;[0]&gt;[alpha][x\u20131], then scroll to DIAGNOSTICSON.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/section>\n<\/section>\n<h2 class=\"note precalculus try\" style=\"text-align: center;\">Predicting with a Regression Line<\/h2>\n<p id=\"fs-id1165137436049\">Once we determine that a set of data is linear using the correlation coefficient, we can use the regression line to make predictions. 
As we learned previously, a regression line is a line that is closest to the data in the scatter plot, which means that only one such line is a best fit for the data.<\/p>\n<div id=\"Example_02_04_06\" class=\"example\">\n<div id=\"fs-id1165137571292\" class=\"exercise\">\n<div id=\"fs-id1165135546014\" class=\"problem textbox shaded\">\n<h3>Example 6: Using a Regression Line to Make Predictions<\/h3>\n<p id=\"fs-id1165135191292\">Gasoline consumption in the United States has been steadily increasing. Consumption data from 1994 to 2004 is shown in the table below.<a class=\"footnote\" title=\"http:\/\/www.bts.gov\/publications\/national_transportation_statistics\/2005\/html\/table_04_10.html\" id=\"return-footnote-13821-4\" href=\"#footnote-13821-4\" aria-label=\"Footnote 4\"><sup class=\"footnote\">[4]<\/sup><\/a> Determine whether the trend is linear, and if so, find a model for the data. Use the model to predict the consumption in 2008.<\/p>\n<table id=\"Table_02_04_03\" summary=\"Two rows and twelve columns. The first row is labeled, 'Year'. The second row is labeled is labeled, 'Consumption (billions of gallons)'. 
Reading the remaining rows as ordered pairs (i.e., (Year, Consumption), we have the following values: ('94, 113), ('95, 116), ('96, 118), ('97, 119), ('98, 123), ('99, 125), ('00, 126), ('01, 128), ('02, 131), ('03, 133), and ('04, 136).\">\n<tbody>\n<tr>\n<td><strong>Year<\/strong><\/td>\n<td>&#8217;94<\/td>\n<td>&#8217;95<\/td>\n<td>&#8217;96<\/td>\n<td>&#8217;97<\/td>\n<td>&#8217;98<\/td>\n<td>&#8217;99<\/td>\n<td>&#8217;00<\/td>\n<td>&#8217;01<\/td>\n<td>&#8217;02<\/td>\n<td>&#8217;03<\/td>\n<td>&#8217;04<\/td>\n<\/tr>\n<tr>\n<td><strong>Consumption (billions of gallons)<\/strong><\/td>\n<td>113<\/td>\n<td>116<\/td>\n<td>118<\/td>\n<td>119<\/td>\n<td>123<\/td>\n<td>125<\/td>\n<td>126<\/td>\n<td>128<\/td>\n<td>131<\/td>\n<td>133<\/td>\n<td>136<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The scatter plot of the data, including the least squares regression line, is shown in Figure 8.<\/p>\n<p>&nbsp;<\/p>\n<div style=\"width: 497px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/1227\/2015\/04\/03010701\/CNX_Precalc_Figure_02_04_0082.jpg\" alt=\"Scatter plot, showing the line of best fit. It is titled 'Gas Consumption VS Year'. 
The x-axis is 'Year After 1994', and the y-axis is 'Gas Consumption (billions of gallons)'.\" width=\"487\" height=\"384\" \/><\/p>\n<p class=\"wp-caption-text\"><b>Figure 8<\/b><\/p>\n<\/div>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q490501\">Show Solution<\/span><\/p>\n<div id=\"q490501\" class=\"hidden-answer\" style=\"display: none\">\n<p id=\"fs-id1165137767360\">We can introduce a new input variable, <em>t<\/em>, representing years since 1994.<\/p>\n<p id=\"fs-id1165137552875\">The least squares regression equation is:<\/p>\n<p style=\"text-align: center;\">[latex]C\\left(t\\right)=113.318+2.209t[\/latex]<\/p>\n<p id=\"fs-id1165137767812\">Using technology, the correlation coefficient was calculated to be 0.9965, suggesting a very strong increasing linear trend.<\/p>\n<p id=\"fs-id1165137444077\">Using this to predict consumption in 2008 [latex]\\left(t=14\\right)[\/latex],<\/p>\n<p style=\"text-align: center;\">[latex]\\begin{align}C\\left(14\\right)&=113.318+2.209\\left(14\\right) \\\\ &=144.244 \\end{align}[\/latex]<\/p>\n<p id=\"fs-id1165135207471\">The model predicts 144.244 billion gallons of gasoline consumption in 2008.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"bcc-box bcc-success\">\n<h3>Try It<\/h3>\n<p id=\"fs-id1165137600643\">Use the model we created using technology in Example 6\u00a0to predict the gas consumption in 2011. 
Is this an interpolation or an extrapolation?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q11412\">Show Solution<\/span><\/p>\n<div id=\"q11412\" class=\"hidden-answer\" style=\"display: none\">\n<p>150.871 billion gallons; extrapolation<\/p>\n<\/div>\n<\/div>\n<\/div>\n<h2>Key Concepts<\/h2>\n<ul id=\"fs-id1165137785014\">\n<li>Scatter plots show the relationship between two sets of data.<\/li>\n<li>Scatter plots may represent linear or non-linear models.<\/li>\n<li>The line of best fit may be estimated or calculated, using a calculator or statistical software.<\/li>\n<li>Interpolation can be used to predict values inside the domain and range of the data, whereas extrapolation can be used to predict values outside the domain and range of the data.<\/li>\n<li>The correlation coefficient, <span id=\"MathJax-Element-329-Frame\" class=\"MathJax\"><span id=\"MathJax-Span-4867\" class=\"math\"><span id=\"MathJax-Span-4868\" class=\"mrow\"><span id=\"MathJax-Span-4869\" class=\"semantics\"><span id=\"MathJax-Span-4870\" class=\"mrow\"><span id=\"MathJax-Span-4871\" class=\"mrow\"><em><span id=\"MathJax-Span-4872\" class=\"mi\">r<\/span><\/em><span id=\"MathJax-Span-4873\" class=\"mo\">,<\/span><\/span><\/span><\/span><\/span><\/span><\/span> indicates the degree of linear relationship between data.<\/li>\n<li>A regression line best fits the data.<\/li>\n<li>The least squares regression line is found by minimizing the squares of the distances of points from a line passing through the data and may be used to make predictions regarding either of the variables.<\/li>\n<\/ul>\n<h2>Glossary<\/h2>\n<dl id=\"fs-id1165137705061\" class=\"definition\">\n<dt><strong>correlation coefficient<\/strong><\/dt>\n<dd id=\"fs-id1165135250649\">a value, <em>r<\/em>, between \u20131 and 1 that indicates the degree of linear correlation of variables, or how closely a regression line fits a data 
set.<\/dd>\n<\/dl>\n<dl id=\"fs-id1165137549428\" class=\"definition\">\n<dt><strong>extrapolation<\/strong><\/dt>\n<dd id=\"fs-id1165135485274\">predicting a value outside the domain and range of the data<\/dd>\n<\/dl>\n<dl id=\"fs-id1165135485278\" class=\"definition\">\n<dt><strong>interpolation<\/strong><\/dt>\n<dd id=\"fs-id1165135184191\">predicting a value inside the domain and range of the data<\/dd>\n<\/dl>\n<dl id=\"fs-id1165137761665\" class=\"definition\">\n<dt><strong>least squares regression<\/strong><\/dt>\n<dd id=\"fs-id1165135192379\">a statistical technique for fitting a line to data in a way that minimizes the differences between the line and data values<\/dd>\n<\/dl>\n<dl id=\"fs-id1165137446440\" class=\"definition\">\n<dt><strong>model breakdown<\/strong><\/dt>\n<dd id=\"fs-id1165137446445\">when a model no longer applies after a certain point<\/dd>\n<\/dl>\n<\/div>\n<\/div>\n\n\t\t\t <section class=\"citations-section\" role=\"contentinfo\">\n\t\t\t <h3>Candela Citations<\/h3>\n\t\t\t\t\t <div>\n\t\t\t\t\t\t <div id=\"citation-list-13821\">\n\t\t\t\t\t\t\t <div class=\"licensing\"><div class=\"license-attribution-dropdown-subheading\">CC licensed content, Specific attribution<\/div><ul class=\"citation-list\"><li>Precalculus. <strong>Authored by<\/strong>: OpenStax College. <strong>Provided by<\/strong>: OpenStax. <strong>Located at<\/strong>: <a target=\"_blank\" href=\"http:\/\/cnx.org\/contents\/fd53eae1-fa23-47c7-bb1b-972349835c3c@5.175:1\/Preface\">http:\/\/cnx.org\/contents\/fd53eae1-fa23-47c7-bb1b-972349835c3c@5.175:1\/Preface<\/a>. 
<strong>License<\/strong>: <em><a target=\"_blank\" rel=\"license\" href=\"https:\/\/creativecommons.org\/licenses\/by\/4.0\/\">CC BY: Attribution<\/a><\/em><\/li><\/ul><\/div>\n\t\t\t\t\t\t <\/div>\n\t\t\t\t\t <\/div>\n\t\t\t <\/section><hr class=\"before-footnotes clear\" \/><div class=\"footnotes\"><ol><li id=\"footnote-13821-1\">Selected data from <a href=\"http:\/\/classic.globe.gov\/fsl\/scientistsblog\/2007\/10\/\" target=\"_blank\" rel=\"noopener\">http:\/\/classic.globe.gov\/fsl\/scientistsblog\/2007\/10\/<\/a>. Retrieved Aug 3, 2010 <a href=\"#return-footnote-13821-1\" class=\"return-footnote\" aria-label=\"Return to footnote 1\">&crarr;<\/a><\/li><li id=\"footnote-13821-2\">Technically, the method minimizes the sum of the squared differences in the vertical direction between the line and the data values. <a href=\"#return-footnote-13821-2\" class=\"return-footnote\" aria-label=\"Return to footnote 2\">&crarr;<\/a><\/li><li id=\"footnote-13821-3\">For example, <a href=\"http:\/\/www.shodor.org\/unchem\/math\/lls\/leastsq.html\" target=\"_blank\" rel=\"noopener\">http:\/\/www.shodor.org\/unchem\/math\/lls\/leastsq.html<\/a> <a href=\"#return-footnote-13821-3\" class=\"return-footnote\" aria-label=\"Return to footnote 3\">&crarr;<\/a><\/li><li id=\"footnote-13821-4\"><a href=\"http:\/\/www.bts.gov\/publications\/national_transportation_statistics\/2005\/html\/table_04_10.html\" target=\"_blank\" rel=\"noopener\">http:\/\/www.bts.gov\/publications\/national_transportation_statistics\/2005\/html\/table_04_10.html<\/a> <a href=\"#return-footnote-13821-4\" class=\"return-footnote\" aria-label=\"Return to footnote 4\">&crarr;<\/a><\/li><\/ol><\/div>","protected":false},"author":23588,"menu_order":4,"template":"","meta":{"_candela_citation":"[{\"type\":\"cc-attribution\",\"description\":\"Precalculus\",\"author\":\"OpenStax 
College\",\"organization\":\"OpenStax\",\"url\":\"http:\/\/cnx.org\/contents\/fd53eae1-fa23-47c7-bb1b-972349835c3c@5.175:1\/Preface\",\"project\":\"\",\"license\":\"cc-by\",\"license_terms\":\"\"}]","CANDELA_OUTCOMES_GUID":"","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-13821","chapter","type-chapter","status-publish","hentry"],"part":10717,"_links":{"self":[{"href":"https:\/\/courses.lumenlearning.com\/precalculus\/wp-json\/pressbooks\/v2\/chapters\/13821","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/courses.lumenlearning.com\/precalculus\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/courses.lumenlearning.com\/precalculus\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/precalculus\/wp-json\/wp\/v2\/users\/23588"}],"version-history":[{"count":6,"href":"https:\/\/courses.lumenlearning.com\/precalculus\/wp-json\/pressbooks\/v2\/chapters\/13821\/revisions"}],"predecessor-version":[{"id":15834,"href":"https:\/\/courses.lumenlearning.com\/precalculus\/wp-json\/pressbooks\/v2\/chapters\/13821\/revisions\/15834"}],"part":[{"href":"https:\/\/courses.lumenlearning.com\/precalculus\/wp-json\/pressbooks\/v2\/parts\/10717"}],"metadata":[{"href":"https:\/\/courses.lumenlearning.com\/precalculus\/wp-json\/pressbooks\/v2\/chapters\/13821\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/courses.lumenlearning.com\/precalculus\/wp-json\/wp\/v2\/media?parent=13821"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/precalculus\/wp-json\/pressbooks\/v2\/chapter-type?post=13821"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/precalculus\/wp-json\/wp\/v2\/contributor?post=13821"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/precalc
ulus\/wp-json\/wp\/v2\/license?post=13821"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}