Section 4: Adjusting the input variables to simplify the intercept parameter of linear models
For the sediment data, both parameters for the model have natural meanings: the slope is the daily rate of sediment build-up, and the intercept is the sediment level immediately after cleaning. This is because the zero point of the input parameter, days since cleaning, has a meaning that is naturally related to the situation—zero corresponds to the date the cleaning took place.
However, sometimes the zero point for an input parameter is artificial and has no natural relationship to the situation. When this is true, the intercept of a linear model for that data will not have a useful meaning. For example, if the input parameter for a dataset of a company’s annual sales is the calendar year, then the intercept of a linear model fitted to that data is the model’s “prediction” for the year 0, over 2000 years ago. This value will usually be very large in size, either positive or negative, so it would be almost impossible to find just by guessing and adjusting the intercept parameter of the model. Even if found, such a large value would make the resulting model formula awkward to use.
There are two common ways to avoid this problem. The one shown below, changing the input variable from “Year” to “Years since 1990”, is the same technique you used in an earlier topic on graphing. When the input is redefined in this way, finding the model becomes as simple as it was for the sediment data. In a later topic we will show how to adjust the model itself to give the same effect.
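As a minimal sketch of redefining the input, the Python snippet below converts calendar years into years since 1990 by subtracting a fixed base year. The year values are hypothetical placeholders, not the data from Example 4.

```python
# Hypothetical calendar-year inputs (not the Example 4 data).
years = [1990, 1995, 2000, 2005]

BASE_YEAR = 1990  # the new zero point for the input variable

# Redefine the input: "Year" becomes "Years since 1990".
years_since_1990 = [year - BASE_YEAR for year in years]
print(years_since_1990)  # [0, 5, 10, 15]
```

Only the input column changes; the output values (sales) are left exactly as they were.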
Example 4: Find a good linear model for this data, redefining the input parameter as needed, and use the model to predict sales in 2010.
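The prediction step can be sketched in Python, assuming (as the calculation in the next paragraph implies) that the fitted model is y = 26x + 460 with x measured in years since 1990. The function name `predict_sales` is hypothetical.

```python
# Assumed model from Example 4: slope 26, intercept 460,
# with x = years since 1990.
SLOPE = 26
INTERCEPT = 460

def predict_sales(years_since_1990):
    """Evaluate the linear model y = 26x + 460."""
    return SLOPE * years_since_1990 + INTERCEPT

# Redefine the input year: 2010 is 20 years after 1990.
x_2010 = 2010 - 1990
print(predict_sales(x_2010))  # 26*20 + 460 = 980
```

The key step is converting the year 2010 to the redefined input x = 20 before evaluating the model.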
To see what trouble was avoided by redefining the input variable, use this model to “predict” sales in the year 0, which is 1990 years before 1990 and so corresponds to x = −1990. Since 26∙(−1990) + 460 = −51,280, the corresponding sales model for the unmodified input values (calendar years) would be y = 26x − 51,280, whose intercept would be very difficult to guess.
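A short Python check of this calculation, assuming the model y = 26x + 460 with x in years since 1990: it computes the intercept for calendar-year inputs and confirms that the two formulas describe the same line. The function names are hypothetical.

```python
SLOPE = 26
SHIFTED_INTERCEPT = 460  # model value when x = 0, i.e. in 1990

# Year 0 lies 1990 years before 1990, so it corresponds to x = -1990.
intercept_raw = SLOPE * (0 - 1990) + SHIFTED_INTERCEPT
print(intercept_raw)  # -51280

def sales_raw(year):
    """Same model with the unmodified calendar year as input."""
    return SLOPE * year + intercept_raw

def sales_shifted(years_since_1990):
    """Model with the redefined input, x = years since 1990."""
    return SLOPE * years_since_1990 + SHIFTED_INTERCEPT

# Both formulas are the same line, so they agree on every input year.
for year in (1990, 2000, 2010):
    assert sales_raw(year) == sales_shifted(year - 1990)
print(sales_raw(2010))  # 980
```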
An alternative way of dealing with the same problem is to modify the model formula itself, substituting (x − 1990) for x. With this approach, the same model would be written y = 26∙(x − 1990) + 460, where x is now the calendar year. This approach has the advantage of not requiring any change to the data, so the horizontal scale on the graph would show the year number. On the other hand, the model formula is more complicated. Both techniques are in common use, and we will examine this and other ways of modifying the model formula in a later topic.
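Expanding the substituted formula shows that it is the same line as the unmodified model y = 26x − 51,280 from the previous paragraph, just written in a form whose constant stays small:

```latex
\begin{aligned}
y &= 26(x - 1990) + 460 \\
  &= 26x - 26 \cdot 1990 + 460 \\
  &= 26x - 51{,}740 + 460 \\
  &= 26x - 51{,}280
\end{aligned}
```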