## Section 2: Avoiding confounded parameters—don’t use two parameters to control the same thing

Models formed by adding two basic models together are very useful, but in some cases a problem arises because both models have parameters that control the same thing (e.g., vertical offset). When this is true, there is not any “best-fit” solution for these parameters, since any combination of vertical-offset values that gives a good fit could be replaced by other values which add up to the same thing. In such a situation, what values Solver will find for these “confounded” parameters depends unpredictably on their initial settings.

This problem can be avoided by eliminating one of the confounded parameters. If the compound model is the sum of a linear model and a sinusoidal model, for example, the linear intercept parameter and the sinusoidal baseline parameter both control the vertical offset. In this case, it would be best to leave out the sinusoidal baseline parameter (use only wavelength, amplitude, and phase), because the natural way to think about data of this kind is as a straight line with sinusoidal deviations.

Example 3: A series of monthly calibration measurements of the bias in pounds of an outdoor scale produces the data shown to the right. An examination of the graph of the data (shown below) indicates that its pattern is a combination of a gradual multi-year trend (probably due to wear of some part) and a repeating seasonal variation (probably due to temperature variation). Use a compound model combining a linear model with sinusoidal variation to [i] determine the annual rate of change shown by the multi-year trend and [ii] to predict the bias 8 months after the last data point shown.

Solution:

The compound model should have the two linear parameters of intercept and slope, but it should use only three of the four sinusoidal parameters, since the average parameter is added on to the result in the same way as the intercept. So the formula of the model is y = intercept + x * slope + amplitude*sin(2π*(x+phase)/wavelength), so C3 needs to be “=$G$3+A3*$G$4+$G$6*SIN(2*PI()*(A3+$G$7)/$G$5)”, which is to be spread down beside the data as usual.

Once the spreadsheet is set up, we can use Solver to minimize the sum of squared deviations by changing the five parameters G3:G7.

When you have a lot of parameters, it becomes more important to start Solver with initial values that ensure that are reasonably correct, so that Solver does not get lost in its search for the best-fit values. This is not difficult if you use the data+model graph for feedback, and make some simple estimates from the data. There is no need to start with a close match – Solver will do that work – but you want to avoid situations where Solver tries a very incorrect value for a parameter like wavelength, for example.

In this case, it will work to use values such as 1.0 for the intercept, 12 for the wavelength (since the temperature variations should have a 12-month cycle), and about 0.1 for the amplitude (since the valley-to-peak variation is about 0.2. An initial slope of zero is okay, since the true value is a small positive number, and an initial phase value of about 8 looks about right on the graph.

 Month Bias 1 0.712 2 0.749 3 0.800 4 0.895 5 0.937 6 0.998 7 1.015 8 0.966 9 0.950 10 0.847 11 0.788 12 0.752 13 0.783 14 0.794 15 0.857 16 0.933 17 0.992 18 1.071 19 1.072 20 1.031 21 1.006 22 0.895 23 0.871 24 0.797 25 0.825 26 0.841 27 0.861 28 0.982 29 1.050 30 1.116 31 1.136

Solver produces these results:

 Line+Sinusoidal Parameters 0.837518 Intercept 0.004593 Slope 12.14912 Wavelength 0.143756 Amplitude 8.580468 Phase offset Goodness of fit of this model 0.006831 Sum of squared dev. 0.016209 Standard deviation

Answers: The fitting results show that the multi-year trend in the bias is 0.004593 pounds per month (the linear slope parameter), which is about 0.055 pounds per year. Evaluating the model at 39 months (8 months after the last data value) gives a prediction for bias at that time of about 0.945 pounds.

Note that the predicted bias value at 39 months is lower than the bias shown in the last data value, indicating that between these times the seasonal variation is larger than the long-term upward trend. Another interesting aspect of this problem is that simply fitting a linear model to the data would not have given good results; since the 2½-year pattern includes three upward-sloping segments and only two downward-sloping segments, a plain linear model would give a result more than 20% too large for the long-term trend.

Confounded-parameter problems can usually be avoided by thinking about the meaning of each parameter in the context of the data. If the information it conveys is already being supplied by an earlier parameter, leave it out. In the example below, a three-stage process (with two transitions) can be modeled by adding two logistic models. Since a logistic model has four parameters (rate, center, height, and floor), one might expect eight parameters in the compound model. But are any of these parameters redundant?

Example 4: The data to the right record the depth of the cut of a milling machine during a portion of a production run. Fit an appropriate model to the data to make estimates of the time, to the nearest millisecond, of the midpoint of each transition.

Solution:

Step transitions can be modeled by logistic functions; for this data, the sum of two logistic functions would be suitable.

The horizontal asymptotes in a logistic graph are controlled by the floor and height parameters. But in this sum-of-logistics model, floor2 (the floor of the second logistic) will equal floor1+height1, and does not need a separate parameter in the model.

In this case, it appears that the model can be further simplified by using the same height and rate parameters for both transitions, leaving the model with five parameters: floor, height, rate, center1 ,and center2, (use cells G3–G7 for these).

 Secs Depth 20.0 28.46 20.1 28.47 20.2 28.50 20.3 28.58 20.4 28.61 20.5 28.69 20.6 28.63 20.7 29.89 20.8 38.27 20.9 56.00 21.0 61.44 21.1 62.05 21.2 62.34 21.3 62.35 21.4 62.37 21.5 62.22 21.6 62.19 21.7 62.37 21.8 62.43 21.9 62.49 22.0 64.81 22.1 78.21 22.2 92.96 22.3 95.77 22.4 95.95 22.5 95.93 22.6 95.99 22.7 96.08 22.8 96.06 22.9 96.20 23.0 96.25

The model to be used is thus
y = floor + height / (1+0.018316^(rate*(x–center1))) + height / (1+0.018316^(rate*(x–center2)))

This means that the formula to be placed into C3 (and spread down beside the data) should be =$G$3+$G$4/(1+0.018316^($G$5*(A3-$G$6)))+$G$4/(1+0.018316^($G$5*(A3-$G$7)))

Set initial values for the parameters from the data, checking them by looking at the data+model graph. In this case, the floor is about 30, the step height is about 30, and the transition centers are at roughly 21 and 22. The rate is positive, so start with 1 (which the graph shows is too slow) and adjust it to 10.