L1.01: Overview

Automated Fitting of Models, Comparative Goodness of Fit, and Outliers

Objectives:

  1. Be able to adapt a modeling worksheet to compute a goodness-of-fit indicator from the deviation values between the data and the model.
  2. Be able to use the Solver add-in to automatically find the model parameters that minimize the indicator, and thus fit the data as well as possible as measured by that indicator.
  3. Derive the standard deviation around the model from the sum of squared deviations, the number of deviations used in the sum, and the number of parameters in the model.
  4. Be able to use alternative goodness-of-fit measures such as maximum deviation.
  5. Select between different kinds of model by comparing best-fit standard deviations.
  6. Be able to use relative deviation to compute an alternative form of standard deviation.
  7. Be able to identify data points that are outliers to the model implied by most of the data points, and to remove identified outliers from the model-fitting process when appropriate.
  8. Be able to make and use a modeling spreadsheet to fit any specified function to a dataset.
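
The indicators named in objectives 1, 3, 4, and 6 can be sketched outside the spreadsheet in a few lines of Python. This is an illustration under stated assumptions, not the worksheet itself: the function names and sample data are invented for the example, the standard deviation is assumed to be the square root of the sum of squared deviations divided by (number of deviations minus number of parameters), and relative deviation is assumed to mean the deviation divided by the model value.

```python
import math

def linear_model(x, slope, intercept):
    """Hypothetical model used only for this illustration."""
    return slope * x + intercept

def fit_indicators(xs, ys, model, params):
    """Compute several goodness-of-fit indicators for one parameter setting."""
    predictions = [model(x, *params) for x in xs]
    deviations = [y - m for y, m in zip(ys, predictions)]
    n = len(deviations)   # number of deviations used in the sum
    p = len(params)       # number of parameters in the model
    sse = sum(d * d for d in deviations)
    # Assumed definition: divide by (n - p), then take the square root.
    std_dev = math.sqrt(sse / (n - p))
    max_dev = max(abs(d) for d in deviations)
    # Assumed definition of relative deviation: deviation / model value.
    relative = [d / m for d, m in zip(deviations, predictions)]
    rel_std_dev = math.sqrt(sum(r * r for r in relative) / (n - p))
    return {"sse": sse, "std_dev": std_dev,
            "max_dev": max_dev, "rel_std_dev": rel_std_dev}

# Invented sample data, roughly following y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
result = fit_indicators(xs, ys, linear_model, (2.0, 1.0))
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Each indicator shrinks as the fit improves, so any of them could serve as the quantity to be minimized.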

Overview—getting the computer to do the tedious part so that models can be applied more easily

Splitting the work of modeling. The modeling process has two parts, one requiring thought and the other just requiring patience. The thoughtful part is deciding what model makes sense to use, looking at the graphs to see if a better model is needed or if some data points should be handled specially, and interpreting what the final parameter values imply about the situation in which the data was taken. The second part is the tedious process of adjusting each parameter in turn until you find the best parameter values for the chosen kind of model.

Tedious tasks are what computers do best. In this topic you will learn to use the Solver tool, which makes the computer quickly find the best-fit parameter values for a model. Choosing which kind of model to fit is still your job, but automation will give you more time to do it. Using an automated fitting process will also make it easier to use models that have more parameters, permitting models that are more realistic and can solve a wider range of problems.

What is the “best” fit? So far, we have relied on a person looking at the graph to decide when the parameter settings for a model gave a good fit. People are good at this kind of visual assessment, but computers are not — they are much better at dealing with numbers. So to automate the fitting task we will compute a number that measures how well a particular model fits the data we are working with.
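
To make the idea concrete, here is a minimal Python sketch of a computer searching for the parameter values that minimize such a number. It is not Solver's algorithm: it simply automates the manual routine of adjusting each parameter in turn, shrinking the adjustment size whenever no change helps. The function names, starting values, and sample data are all assumptions made for the illustration, and the indicator used is the sum of squared deviations for a linear model.

```python
def sum_squared_deviations(params, xs, ys):
    """Indicator to minimize: sum of squared deviations from a linear model."""
    slope, intercept = params
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

def coordinate_search(indicator, start, step=1.0, tolerance=1e-8):
    """Adjust each parameter in turn, keeping any change that lowers the indicator.

    When no single-parameter change of the current size helps,
    the step is halved; the search stops once the step is tiny.
    """
    params = list(start)
    best = indicator(params)
    while step > tolerance:
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = params.copy()
                trial[i] += delta
                value = indicator(trial)
                if value < best:
                    params, best = trial, value
                    improved = True
        if not improved:
            step /= 2.0
    return params, best

# Invented sample data, roughly following y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

fitted, indicator_value = coordinate_search(
    lambda p: sum_squared_deviations(p, xs, ys), start=[0.0, 0.0])
print("slope, intercept:", fitted)
print("sum of squared deviations:", indicator_value)
```

For this data the search settles near a slope of 1.97 and an intercept of 1.09, the same values the least-squares formulas for a straight line give. A tool such as Solver uses more sophisticated search methods, but the goal is the same: make the indicator as small as possible.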

Handling stray data points. It sometimes happens that a small subset of the points in a data set deviate substantially from the overall pattern. These “outlier” points sometimes convey important information, but other times simply reflect bad measurements. In either case, it is usually best to exclude them from the model-fitting process and report them separately. We will discuss how to do this without losing the efficiency of the automated-fitting approach.
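
One way to picture excluding a point without rebuilding the whole calculation is an include/exclude flag beside each data point, so that only flagged points enter the sum. The Python sketch below illustrates that idea under assumptions: the flag list, function names, sample data, and the n minus p divisor are all invented for the example, and the choice of which point to exclude is made by hand rather than by any automatic rule.

```python
import math

def model(x, slope, intercept):
    """Hypothetical linear model used only for this illustration."""
    return slope * x + intercept

def std_dev_of_included(xs, ys, include, params):
    """Standard deviation around the model, using only the flagged points."""
    used = [y - model(x, *params)
            for x, y, keep in zip(xs, ys, include) if keep]
    sse = sum(d * d for d in used)
    # The count is the number of deviations actually used in the sum.
    return math.sqrt(sse / (len(used) - len(params)))

# Invented data: the fourth point strays far from the pattern y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.1, 4.9, 7.2, 20.0, 11.0, 12.9]
params = (2.0, 1.0)

all_points = [True] * len(xs)
without_outlier = [True, True, True, False, True, True]

std_all = std_dev_of_included(xs, ys, all_points, params)
std_trimmed = std_dev_of_included(xs, ys, without_outlier, params)
print(f"all points:       {std_all:.3f}")
print(f"outlier excluded: {std_trimmed:.3f}")
```

In practice the parameters would be refitted after changing the flags, and the excluded point would still be reported separately.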

Additional types of models. The model-fitting process that we have already applied to linear, quadratic, and exponential formulas can be used with any type of modeling formula. All that is needed for a new type of model is to enter the formula in C3 in spreadsheet format, with the parts that you wish to fit as parameters expressed as absolute references to column G cells. In this topic we will see how to quickly adapt a modeling spreadsheet to fit many different formulas or to provide model enhancements such as a non-zero baseline for an exponential model. In later topics we will extend our list of basic model formulas and discuss how combinations of these basic models can be used to make predictive formulas for a variety of realistic situations.