“In God we trust. All others must bring data.”
― W. Edwards Deming, statistician
So, you have estimated a bunch of forecasting models and realize (kudos to you!) that they are “all wrong” (à la George Box).
But your forecasting deadline is looming, and you need to find some useful models on which to base a forecast.
How do you decide which models make it to the next round?
Model building process
First, let’s review the forecast model build process:
Step 1: Determine the business need;
Step 2: Collect and examine your data; clean and adjust (e.g. frequency change) as necessary;
Step 3: Determine your forecast horizon (i.e. align with the business need);
Step 4: Determine and set aside your holdout sample;
Step 5: Estimate models using the non-holdout portion of your time series (i.e. the “modeling sample”);
Step 6: Gauge each model’s performance in the holdout sample;
Step 7: Recalibrate each model using the full historical sample;
Step 8: Make your forecast for the forecast horizon.
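Steps 4 through 8 can be sketched in a few lines of code. The example below uses a hypothetical drift model (last value plus the average historical change) purely for illustration; the series, holdout length, and horizon are all made-up numbers.

```python
from statistics import mean

def drift_forecast(history, horizon):
    """Forecast with a simple drift model: last observed value plus the
    average historical change, extrapolated over the horizon."""
    drift = mean(history[i] - history[i - 1] for i in range(1, len(history)))
    return [history[-1] + drift * h for h in range(1, horizon + 1)]

# Hypothetical series (Step 2) and a 4-period holdout sample (Step 4)
y = list(range(10, 30))          # stand-in for real data
holdout_len, horizon = 4, 3
modeling, holdout = y[:-holdout_len], y[-holdout_len:]

# Step 5: estimate on the modeling sample; Step 6: gauge holdout error
preds = drift_forecast(modeling, holdout_len)
mae = mean(abs(a - p) for a, p in zip(holdout, preds))

# Step 7: recalibrate on the full sample; Step 8: forecast the horizon
forecast = drift_forecast(y, horizon)
```

Because the toy series is perfectly linear, the holdout error here is zero; real data will not be so obliging.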
At the end of this process, you should have a few models that “pass muster” and are potentially useful.
But how do you whittle down all the models you tried to this select few?
Guidelines for selecting useful models
Here are some guidelines we follow:
Statistically Significant Parameters – Although one can argue that it is the prediction that matters, we still like to see model coefficients that are statistically significant with signs that can be explained. You may be asked to defend your model.
White Noise Residuals – When you estimate your model using the modeling sample, the residuals (differences between the actual and predicted values in the modeling sample) should have no apparent pattern to them. That is, your model has captured all the systematic variation in the time series; what is left over is random, or “white,” noise.
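A quick informal check for white noise residuals is to compute their sample autocorrelations and verify they fall inside the approximate 95% bounds of ±1.96/√n. The sketch below uses a made-up patterned series; in practice you would run this (or a formal test such as Ljung-Box) on your model's actual residuals.

```python
import math

def acf(x, lag):
    """Sample autocorrelation of series x at the given lag."""
    n = len(x)
    xbar = sum(x) / n
    num = sum((x[t] - xbar) * (x[t - lag] - xbar) for t in range(lag, n))
    den = sum((v - xbar) ** 2 for v in x)
    return num / den

def looks_like_white_noise(resid, max_lag=10):
    """True if all autocorrelations up to max_lag sit inside the
    approximate 95% confidence bounds for white noise."""
    bound = 1.96 / math.sqrt(len(resid))
    return all(abs(acf(resid, k)) <= bound for k in range(1, max_lag + 1))

# A strongly alternating series clearly fails the check:
patterned = [1, -1] * 20
```

If the check fails, there is still structure in the residuals that a better-specified model could explain.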
Strong Holdout Sample Performance – Your model should produce low forecast error and exhibit low systematic bias in the holdout sample.
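Forecast error and systematic bias in the holdout sample can be measured with simple summary statistics. Below is one sketch; the metric choices and the toy numbers are illustrative, and MAPE or RMSE are common alternatives for the error measure.

```python
def holdout_report(actuals, preds):
    """Mean absolute error (accuracy) and mean error (systematic bias)
    of the holdout-sample forecasts."""
    errors = [a - p for a, p in zip(actuals, preds)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n     # low MAE = low forecast error
    bias = sum(errors) / n                    # near 0 = little systematic bias
    return mae, bias

mae, bias = holdout_report([100, 200], [110, 190])
```

A model can have low MAE yet a bias far from zero (consistently over- or under-forecasting), which is why both are worth checking.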
Robustness – When you recalibrate your model using the entire historical sample (modeling + holdout sample), your model should retain its statistical properties. That is, parameters are still significant with plausible signs and the residuals are still white noise.
Parsimony – If two models are equal in all performance respects except one is more complex than the other, we generally opt for the simpler model. Experience suggests that simpler models perform better when forecasting over the forecast horizon. And they are easier to interpret and explain to business decision makers.
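Information criteria such as AIC formalize this trade-off by penalizing extra parameters: when two models fit about equally well, the penalty favors the simpler one. A sketch using the least-squares form of AIC (the sample size and error sums below are made-up numbers):

```python
import math

def aic_ls(n, sse, k):
    """AIC for a least-squares model: n*ln(SSE/n) + 2k,
    where k is the number of estimated parameters."""
    return n * math.log(sse / n) + 2 * k

# Hypothetical fits: the complex model barely improves the fit,
# so the parameter penalty makes the simpler model preferable (lower AIC).
simple   = aic_ls(n=50, sse=10.5, k=2)
complex_ = aic_ls(n=50, sse=10.0, k=5)
```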
Forecast Plausibility – The forecast produced by your model over the forecast horizon should be consistent with the available knowledge concerning the relevant business environment. In other words, the forecast needs to make sense. It is possible, following the steps above, to arrive at a high-performing model that produces a counterintuitive forecast (e.g. declining SALES when the trend in SALES has been nothing but up).
At the end of this model building and testing process, you may have more than one model that can be used to generate your forecast. In a later article, we will address what you can do in this situation.
The art of forecasting
Our experience is consistent with the opinion of others that there is still quite a bit of “art” to time series forecasting, especially if you want it to meet a specific business need. Automated forecast routines exist, but we recommend that the process be closely supervised by a human to ensure a reasonable forecast.