Practical Time Series Forecasting – Meta Models

By KDD | February 5, 2018

“There are two kinds of forecasters: those who don’t know, and those who don’t know they don’t know.”
― John Kenneth Galbraith

After an extensive model building and vetting process, along the lines discussed in our previous posts, the practical forecaster may still be left with several strong-performing models.

These models perform similarly in the holdout sample tests. They retain their statistical properties when recalibrated on the full historical sample. But they yield different forecast paths over the forecast horizon.

Any one of the models could be easily defended. But the fact that the models yield different forecasts should make the forecaster pause.

An example

Below is an example of 3 short-run monthly forecasts:

The 3 models perform similarly in the holdout sample. One of the models is a least squares model. The other 2 are ARIMA models.

One model produces a steeply declining forecast, another a slightly declining forecast, and the third an increasing forecast.

What should the forecaster do?

How can this happen?

Models are just that – models. They are abstractions from reality. And no single model will “fit” the holdout sample perfectly.

Two models, especially of different types (e.g., least squares vs. ARIMA), could have very similar holdout sample performance but produce dramatically different forecasts over the forecast horizon.

The holdout sample MAPE (mean absolute percentage error) could be very similar for these models. But the MAPE is an average error across the holdout sample. And the models could have arrived at their MAPEs by focusing on different aspects of the time series in the holdout sample.

Projecting these differences into the forecast horizon can result in very different forecasts.
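To make this concrete, here is a minimal Python sketch with made-up holdout numbers. These are hypothetical and are not the data behind the example above. The helper names `mape` and `mpe` are illustrative. The sketch assumes the common sign convention MPE = mean((actual − forecast) / actual) × 100; some texts use the opposite sign. Model A is accurate early in the holdout sample and misses late, while model B does the reverse. Averaging hides that difference.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)

def mpe(actual, forecast):
    """Mean percentage error, in percent (assumed convention: positive = under-forecasting)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean((actual - forecast) / actual) * 100)

# Hypothetical 4-month holdout sample (illustrative numbers only)
actual = [100, 100, 100, 100]
model_a = [100, 100, 90, 90]   # accurate early, misses late in the holdout
model_b = [90, 90, 100, 100]   # misses early, accurate late in the holdout

for name, fc in [("A", model_a), ("B", model_b)]:
    print(name, round(mape(actual, fc), 2), round(mpe(actual, fc), 2))
```

Both models report a MAPE and an MPE of 5%. Yet model A is drifting away from the actuals at the end of the holdout sample and model B is converging on them, so the two would likely extrapolate quite differently.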

Solutions

When there is no clear “champion” model, one solution is to combine the forecasts into one. We call this a “meta” forecast.

There are several ways this can be accomplished.

Checkpoint

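But first, make sure the models to be combined are not “nested,” that is, that no model is a special case of another. For example, an AR(1) model is nested within an AR(2) model, since setting the second lag coefficient to zero recovers the AR(1). If models are nested, there is usually no advantage to combining their forecasts into a meta forecast.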

In fact, the more the constituent models differ, the more likely the meta forecast is to be superior.

A meta forecast based on a least squares model and an ARIMA model will likely yield a smaller forecast error than either model on its own. However, if both models were least squares models, the superiority of a meta forecast might be questionable (Granger, 1989).
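A standard variance calculation shows why dissimilar models help. As a simplified sketch, assume two unbiased models with forecast errors e1 and e2, error variances σ1² and σ2², and error correlation ρ. Combine their forecasts with weight w:

```latex
\begin{aligned}
e_c &= w\,e_1 + (1-w)\,e_2 \\
\operatorname{Var}(e_c) &= w^2\sigma_1^2 + (1-w)^2\sigma_2^2 + 2w(1-w)\rho\,\sigma_1\sigma_2 \\
\text{with } w = \tfrac{1}{2},\ \sigma_1 = \sigma_2 = \sigma:\quad
\operatorname{Var}(e_c) &= \frac{\sigma^2}{4} + \frac{\sigma^2}{4} + \frac{\rho\,\sigma^2}{2} = \sigma^2\,\frac{1+\rho}{2}
\end{aligned}
```

Under these assumptions, the combined error variance is below σ² whenever ρ < 1, and the gain grows as ρ falls. Models of different types tend to make less correlated errors. Nested models tend to make highly correlated errors (ρ near 1), which leaves little to gain from combining them.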

Solution 1

The simplest way to arrive at a meta forecast is to average the forecasts of the individual models.

This assumes that each model’s forecast is equally important in the meta forecast (i.e., receives equal weight). It is a quick and uncomplicated way to generate a meta forecast.
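A minimal Python sketch of the equal-weight average follows. The three forecast paths are hypothetical and only mimic the shapes in the example: steeply declining, slightly declining and increasing.

```python
import numpy as np

# Hypothetical 6-month forecast paths from three models (illustrative only)
forecasts = np.array([
    [100, 96, 92, 88, 84, 80],       # steeply declining model
    [100, 99, 98, 97, 96, 95],       # slightly declining model
    [100, 102, 104, 106, 108, 110],  # increasing model
], dtype=float)

# Equal-weight meta forecast: average across models (rows) for each month (columns)
meta = forecasts.mean(axis=0)
print(meta)
```

Here the slightly declining path sits exactly midway between the other two, so the equal-weight average reproduces it month by month: 100, 99, 98, 97, 96, 95.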

Solution 2

Another approach uses each model’s holdout sample measures of forecast accuracy and bias. A weight for each model’s forecast can be calculated from its MAPE and MPE (mean percentage error) relative to those of all the models combined.

The meta forecast is then a weighted average of the individual model forecasts. Models with a lower MAPE and a smaller absolute MPE (i.e., less bias in either direction) receive higher weights and contribute more to the meta forecast.
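The text does not specify an exact weighting formula, so the following Python sketch uses one possible scheme, which is an assumption. Each model gets a hypothetical score of MAPE + |MPE|, and its weight is proportional to the inverse of that score. The weights are normalized to sum to one. The function name `accuracy_weights` and all the numbers are illustrative.

```python
import numpy as np

def accuracy_weights(mape, mpe):
    """Inverse-error weights from holdout MAPE and MPE (both in percent).

    Hypothetical scoring rule: score = MAPE + |MPE|, weight proportional to 1/score.
    Assumes every score is strictly positive.
    """
    score = np.asarray(mape, dtype=float) + np.abs(np.asarray(mpe, dtype=float))
    inverse = 1.0 / score
    return inverse / inverse.sum()  # normalize so the weights sum to one

# Hypothetical holdout statistics for three models
mape = [4.0, 5.0, 5.0]
mpe = [1.0, -1.0, 3.0]  # sign shows bias direction; only its size is penalized
weights = accuracy_weights(mape, mpe)

# Hypothetical forecast paths (rows = models, columns = months)
forecasts = np.array([
    [100, 96, 92, 88],
    [100, 99, 98, 97],
    [100, 102, 104, 106],
], dtype=float)
meta = weights @ forecasts  # weighted average for each month
print(weights.round(3), meta.round(2))
```

Other scoring rules are possible, for example MAPE alone. The property that matters is that lower error and lower absolute bias translate into larger weights.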

Solution 3

A third approach is to use regression to estimate the weights.

Using the holdout sample (or, if it is too small, the full sample), regress the actual values on the forecasts from each model. The goal is a regression with no constant term in which all coefficients are positive and statistically significant.

The regression coefficients should then sum to very nearly one. These coefficients become the weights by which the forecasts are combined into a meta forecast (see Wilson and Keating).
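Below is a minimal Python sketch of the no-constant regression using NumPy’s least-squares solver. The forecasts are hypothetical. The actuals are deliberately constructed from known weights (0.5, 0.3, 0.2) so the estimate can be checked, which real data would never allow. The function name `regression_weights` is illustrative.

```python
import numpy as np

# Hypothetical holdout-sample forecasts from three models (one array per model)
f1 = np.array([102, 98, 95, 97, 93, 90, 91, 88], dtype=float)
f2 = np.array([100, 101, 99, 98, 99, 97, 96, 97], dtype=float)
f3 = np.array([97, 99, 100, 102, 101, 104, 103, 105], dtype=float)
X = np.column_stack([f1, f2, f3])

# Actuals constructed from known weights so the result can be checked;
# real actuals would also contain noise the models did not capture.
true_weights = np.array([0.5, 0.3, 0.2])
actual = X @ true_weights

def regression_weights(actual, X):
    """Least-squares coefficients of actual on the forecasts, with no constant term.

    Leaving a column of ones out of X is what enforces "no constant".
    """
    coef, *_ = np.linalg.lstsq(X, actual, rcond=None)
    return coef

weights = regression_weights(actual, X)
print(weights.round(3), round(weights.sum(), 3))
```

This recovers weights of 0.5, 0.3 and 0.2, which sum to one. With real data, you would also examine each coefficient’s sign and standard error, for example with a regression package that reports t-statistics. You would typically drop models whose coefficients are negative or insignificant and re-estimate.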

Back to our example

The forecaster could go with candidate 3, the slightly declining forecast, since it “splits the difference.” However, the forecaster is still left with the task of defending why the other two equally plausible models were not chosen.

Alternatively, a meta forecast can be used. As an example, we created a simple average forecast across the 3 candidate models. As discussed above, this assumes equal weights for the 3 short-run forecasts. A more sophisticated alternative would have been to estimate the weights by regression.

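Not surprisingly, the meta forecast is very close to the essentially flat forecast of candidate 3, which lies almost halfway between the forecasts of candidates 1 and 2. When one forecast sits midway between the other two, the equal-weight average of all three is essentially that middle forecast. But not all cases will be like this.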

Had the weights been estimated by regression instead, the meta forecast could have been quite different from candidate 3’s forecast.

As long as the weights are positive and sum to one, the meta forecast will lie between the two extreme forecasts. But the assumed or estimated weights dictate where in that range it falls.
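The Python sketch below uses hypothetical forecast paths and a hypothetical unequal weighting (0.6, 0.1, 0.3). It shows both halves of this point. Any nonnegative weights summing to one keep the meta forecast inside the range spanned by the individual forecasts, but different weights place it differently within that range. The function name `meta_forecast` is illustrative.

```python
import numpy as np

# Hypothetical forecast paths (rows = models, columns = months)
forecasts = np.array([
    [100, 96, 92, 88, 84, 80],       # steeply declining
    [100, 99, 98, 97, 96, 95],       # slightly declining
    [100, 102, 104, 106, 108, 110],  # increasing
], dtype=float)

def meta_forecast(forecasts, weights):
    """Weighted combination of model forecasts for each period."""
    weights = np.asarray(weights, dtype=float)
    return weights @ forecasts

equal = meta_forecast(forecasts, [1 / 3, 1 / 3, 1 / 3])
tilted = meta_forecast(forecasts, [0.6, 0.1, 0.3])  # hypothetical unequal weights

# With nonnegative weights summing to one, each month's meta forecast
# stays between the lowest and highest individual forecast for that month.
low, high = forecasts.min(axis=0), forecasts.max(axis=0)
for path in (equal, tilted):
    assert np.all(path >= low - 1e-9) and np.all(path <= high + 1e-9)

print(equal.round(2))
print(tilted.round(2))
```

In the final month, the equal-weight path ends at 95, while the tilted weights pull it down to 90.5. Both values are inside the range from 80 to 110 spanned by the individual models.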