L1.07: Section 6

Section 6: Distinguishing between linear and gently-curved data by examining deviation values

While many data relationships are linear, it is also not unusual to have relationships that are close to linear, but have a slight curve in the pattern, too small to be obvious in the graph, due to some small additional nonlinear effect. Linear-modeling spreadsheets often make it possible to detect such a situation by looking at how positive and negative deviation values are distributed (in column C in Models.xls) when the best-fit linear model has been found.

If such data is nonlinear and also low-noise, most deviations around the middle third of the input-value range will have the same sign, with the deviations at each end of the input-value range mostly having the opposite sign. This is similar to the obvious visual effect that fitting a line to strongly parabolic data would produce, but can work even when the deviations are small, as long as they are larger than the noise.

Another way of looking for this effect is to do a scatter plot of the input values and the deviations (for spreadsheets like Models.xls, this is columns A and D, with columns B and C left out). If such a residual graph of the deviations from the best-fit linear model displays a curved line rather than random noise, this is a sign that the data is significantly nonlinear. However, the automatic-scaling feature of spreadsheet graphs means that any outliers in the data will compress the scale so that most deviations look flat—this is why looking at the distribution of the signs of the deviations can be less confusing.