I1.01: Overview

Topic I—Linear and Quadratic Models

Objectives:

  1. Recognize when a dataset shows a relationship between the variables that is approximately linear.
  2. Use a spreadsheet to adjust the intercept and slope parameters of a linear formula so that the points graphed from the resulting line are close to the points graphed from a dataset.
  3. Use the linear formula that best fits the data as a model for the data, predicting the output y value for any specified input x value.
  4. Recognize when a dataset shows a relationship between the variables that is approximately quadratic.
  5. Use a spreadsheet to adjust the location and scale parameters of a quadratic formula so that the points graphed from the resulting parabola are close to the points graphed from a dataset.
  6. Use the quadratic formula that best fits the data as a model for the data, predicting the output y value for any specified input x value.
  7. Distinguish between appropriate and inappropriate extrapolation of a model.

Overview

In previous topics we have dealt with numbers produced from formulas, and separately with datasets showing the relationship between two variables. Now we are going to combine these perspectives and find formulas that approximately match the relationship between the variables. Such formulas are models of the measurement data, and their graphs will pass close to the data points.

The model formula is used to predict output values. In this topic we will examine models that are linear (that is, their graphs are straight lines), as well as one kind of non-linear model, the quadratic (whose graph is a parabola).
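As a minimal sketch of what "using a model formula to predict an output value" means, the Python snippet below evaluates one linear and one quadratic formula at a chosen input. The function names and all parameter values are hypothetical, chosen only for illustration, and the quadratic is written in a location-and-scale form as an assumption based on the wording of the objectives. The course itself does this work in a spreadsheet.

```python
def linear_model(x, intercept, slope):
    """Linear formula: y = intercept + slope * x."""
    return intercept + slope * x


def quadratic_model(x, x_center, y_center, scale):
    """Quadratic formula (assumed form): y = y_center + scale * (x - x_center)^2."""
    return y_center + scale * (x - x_center) ** 2


# Hypothetical parameter values, for illustration only.
y_lin = linear_model(4.0, intercept=2.0, slope=0.5)
y_quad = quadratic_model(4.0, x_center=1.0, y_center=2.0, scale=1.0)
print(y_lin, y_quad)
```

Changing a parameter value and re-evaluating plays the same role as adjusting a parameter cell in a spreadsheet and watching the model column update.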

The models will not match the data exactly. There will always be some noise due to unavoidable random errors in the data-measurement process. Also, sometimes the actual pattern underlying the data will not match the model’s formula (e.g., if the data has a curved graph and the model is a straight line). In that case even the best linear model will have to go above the data in some areas and below it in others.

Just as we computed deviations from the average when we analyzed the noise in repeated measurements, we will compute deviations from the model when we are trying to decide how well a particular model fits a dataset. A standard deviation based on these deviation values will be a numerical measure of how good the model is. We can also look at the deviations to see if the model is too simple: in an over-simple model, most adjacent deviation values will have the same sign, positive or negative, whereas in the correct model the data will be randomly above or below the model values.
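The two checks described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the course's spreadsheet procedure: the data values and both model formulas are hypothetical, the helper names are invented here, and the divisor used in the standard deviation (number of points minus number of fitted parameters) is an assumption that may differ from the convention used later in the course.

```python
import math


def deviations(xs, ys, model):
    """Return data value minus model value for each data point."""
    return [y - model(x) for x, y in zip(xs, ys)]


def deviation_sd(devs, n_params):
    """Standard deviation of the deviations.

    Assumption: the divisor is (number of points - number of fitted parameters).
    """
    return math.sqrt(sum(d * d for d in devs) / (len(devs) - n_params))


def same_sign_fraction(devs):
    """Fraction of adjacent deviation pairs that share a sign."""
    pairs = list(zip(devs, devs[1:]))
    return sum(1 for a, b in pairs if a * b > 0) / len(pairs)


# Hypothetical data that follows a curve, with a little noise.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [0.2, 0.9, 4.1, 8.8, 16.3, 24.9, 36.1]

line_devs = deviations(xs, ys, lambda x: -5 + 6 * x)  # hypothetical linear model
curve_devs = deviations(xs, ys, lambda x: x**2)       # hypothetical quadratic model

for name, devs, n_params in [("linear", line_devs, 2), ("quadratic", curve_devs, 3)]:
    print(name, round(deviation_sd(devs, n_params), 3), round(same_sign_fraction(devs), 2))
```

For this made-up dataset the straight line leaves a long run of negative deviations in the middle and a much larger standard deviation than the quadratic, which is the pattern the paragraph above describes for an over-simple model.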

The data variable you want your model to predict should be used for the output y values in the dataset, and the other variable should be used for the input x values. Occasionally it is reasonable to also make use of the inverse model, where the roles of the data variables are reversed and the second variable is used to predict the first one. If a model is linear, the inverse model for that data is also linear.
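To see why the inverse of a linear model is again linear, solve y = intercept + slope * x for x, which gives x = (-intercept / slope) + (1 / slope) * y. The short Python sketch below carries out that algebra; the function name and the parameter values are hypothetical, and it assumes the slope is not zero.

```python
def invert_linear(intercept, slope):
    """Return (intercept, slope) of the inverse model x = a + b * y."""
    if slope == 0:
        raise ValueError("a horizontal line has no inverse model")
    return -intercept / slope, 1.0 / slope


# Hypothetical forward model: y = 2.0 + 0.5 * x
inv_intercept, inv_slope = invert_linear(2.0, 0.5)

x = 4.0
y = 2.0 + 0.5 * x                       # forward prediction
x_back = inv_intercept + inv_slope * y  # inverse prediction recovers x
print(inv_intercept, inv_slope, x_back)
```

With noisy data, refitting the model with the x and y roles swapped need not reproduce exactly the parameters given by this algebraic inversion, but the result is still a straight line.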

Note that the choice of which data variable to model as the output can differ for people with different goals. One person might want to use temperature measurements to predict how long a metal bar will be, while someone else might want to use the measured length of the bar to estimate what the temperature is. Both people could use the same set of calibration data, but would assign different x and y roles to the data variables when they make their predictive models.

In this topic we will focus on two simple models (linear and quadratic formulas), but the techniques shown will work in almost exactly the same way for fitting any kind of mathematical model to data. Some other useful models will be discussed in later topics.