# Practical Time Series Forecasting – Potentially Useful Models

“*All models are wrong, but some are useful.*”

― attributed to statistician **George Box**

This quote pretty well sums up time series forecasting models.

**Any given model is unlikely to be spot on. And some can be wildly off.**

But through a careful, methodical process, we can **whittle the pool of candidate models down to a set of useful models**, if not a single preferred model.

When all is said and done, though, our guiding principle when building forecasting models is…**how well the model predicts**!

In practice, what this means for the types of models we consider is that **we don’t rule anything out**.

Yes, we have specific things we look for in an acceptable model (which we will cover later). But we don’t rule out a TIME trend model just because it looks too “simple.”

Our focus is on finding a forecasting model that can yield **defensible short-run forecasts in a cost-effective manner**.

### Potentially useful models

So what kind of models do we typically examine?

As discussed in a **previous article**, a time series such as monthly sales (SALES) can have 3 components: **trend, seasonal and cyclical**. So, the type of model we consider depends on the extent to which 1, 2 or all 3 of these dynamics are present.

There are 3 classes of models that we typically consider. We will use a bit of math here to describe these models…think back to the formula of a line you learned in algebra: Y = a + bX.

#### Regression models

First are **least squares regression** models. Using SALES as our example, we could have a TIME trend model with, say, quarterly seasonality if we were examining SALES by quarter:

SALES_{t} = b_{0} + b_{1}*TIME + b_{2}*Q1 + b_{3}*Q2 + b_{4}*Q3 + ε_{t}

Or a lagged least squares model with quarterly seasonality:

SALES_{t} = b_{0} + b_{1}*SALES_{t-1} + b_{2}*SALES_{t-2} + b_{3}*Q1 + b_{4}*Q2 + b_{5}*Q3 + ε_{t}

*In these model formulae, b_{0} is the “intercept.” b_{1}, b_{2}, etc. indicate the incremental effect (i.e. slope) on SALES of a change in the value of a “right hand side” variable. Q1, Q2 and Q3 are indicator variables equal to 1 in the named quarter and 0 otherwise, with the fourth quarter as the baseline. ε_{t} is “residual” SALES, what is left “unexplained” by the model. And t is the time period, whether it is months, quarters, years, etc.*
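To make the TIME trend model with quarterly seasonality concrete, here is a minimal, dependency-free sketch of fitting it by least squares. The function names, the coefficient values and the SALES figures are all invented for illustration, and the data are generated without noise so the fit is easy to check. A statistics package would normally do this step for you.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_least_squares(rows, y):
    """Coefficients minimizing the sum of squared residuals (normal equations)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def trend_seasonal_row(t):
    """Regressors for period t = 1, 2, ...: intercept, TIME, Q1, Q2, Q3."""
    quarter = (t - 1) % 4 + 1
    # Q4 has no indicator of its own, so it acts as the baseline quarter.
    return [1.0, float(t)] + [1.0 if quarter == q else 0.0 for q in (1, 2, 3)]


# Made-up "true" coefficients b0..b4, used only to generate illustrative data.
true_b = [100.0, 2.0, -10.0, 5.0, 8.0]
X = [trend_seasonal_row(t) for t in range(1, 21)]          # 20 quarters
sales = [sum(b * x for b, x in zip(true_b, row)) for row in X]

b_hat = fit_least_squares(X, sales)

# Forecast the next quarter (t = 21, a first quarter).
forecast = sum(b * x for b, x in zip(b_hat, trend_seasonal_row(21)))
```

Because the made-up data contain no noise, `b_hat` reproduces `true_b` up to rounding. With real SALES data the residuals would not be zero, and checking them is part of deciding whether the model stays in the candidate pool.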

#### ARMA models

The second class consists of ARMA models.

An **ARMA process** models SALES as being based on past SALES as well as on unobservable shocks to SALES over time. Such models can include two types of components:

An **autoregressive (AR)** component captures the effect of past SALES on current SALES, while a **moving average (MA)** component captures the effect of past random shocks to the SALES series. These models are typically estimated using a **maximum likelihood** technique.

We could have a model that is a **pure ARMA** model, for example:

SALES_{t} = b_{0} + b_{1}*AR(1) + b_{2}*AR(2) + b_{3}*MA(1) + ε_{t}
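To see what the AR and MA terms do mechanically, here is a small sketch that simulates a process with two AR terms and one MA term and then produces a one-step-ahead forecast. It assumes one common way of writing the process out, in which current SALES depends on a constant, the last two values of SALES, the last shock and a new shock. The function names and coefficient values are invented, and the coefficients are taken as known rather than estimated.

```python
import random


def predict_next(y, e, c, phi, theta):
    """One-step-ahead forecast: the not-yet-seen shock is set to its mean, 0."""
    ar_part = sum(p * y[-i] for i, p in enumerate(phi, start=1))
    ma_part = sum(q * e[-j] for j, q in enumerate(theta, start=1))
    return c + ar_part + ma_part


def simulate_arma(n, c, phi, theta, sigma=1.0, seed=0):
    """Simulate y_t = c + sum_i phi[i]*y_{t-i} + sum_j theta[j]*e_{t-j} + e_t."""
    rng = random.Random(seed)
    mean = c / (1.0 - sum(phi))      # long-run mean of a stationary process
    y = [mean] * len(phi)            # pre-sample values: start at the mean
    e = [0.0] * len(theta)           # pre-sample shocks: assume none
    for _ in range(n):
        shock = rng.gauss(0.0, sigma)
        y.append(predict_next(y, e, c, phi, theta) + shock)
        e.append(shock)
    # Drop the pre-sample values before returning.
    return y[len(phi):], e[len(theta):]


# Made-up coefficients: two AR terms and one MA term.
c, phi, theta = 20.0, [0.5, 0.3], [0.4]
sales, shocks = simulate_arma(200, c, phi, theta, sigma=5.0, seed=42)

# Forecast the period after the sample ends.
next_sales = predict_next(sales, shocks, c, phi, theta)
```

In a simulation the shocks are known exactly. With real data they are unobservable, so an estimation routine has to infer them as residuals, which is one reason these models are fitted by maximum likelihood rather than ordinary least squares.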

Or a **mixed regression-ARMA** model, sometimes called “regression with ARMA errors,” like this:

SALES_{t} = b_{0} + b_{1}*TIME + b_{2}*Q1 + b_{3}*Q2 + b_{4}*Q3 + b_{5}*AR(1) + b_{6}*MA(1) + ε_{t}
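Written out in full, “regression with ARMA errors” usually means that the regression part explains the level of SALES while the AR and MA terms describe the part the regression leaves over. A sketch of that standard form with one AR and one MA term follows. The symbols u_t, φ and θ are introduced here for illustration and are not part of the shorthand formula above.

```latex
\begin{aligned}
\text{SALES}_t &= b_0 + b_1\,\text{TIME} + b_2\,\text{Q1} + b_3\,\text{Q2} + b_4\,\text{Q3} + u_t \\
u_t &= \phi\,u_{t-1} + \theta\,\varepsilon_{t-1} + \varepsilon_t
\end{aligned}
```

Here u_t is the regression error, which is allowed to be correlated over time, and ε_t is the remaining unpredictable shock.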

#### ARIMA models

A third class of models is related to the ARMA models above: **ARIMA**. According to standard **Box-Jenkins** methodology, if you know the **underlying trend in SALES is “stochastic”** (i.e. random), **remove it by differencing** SALES. Then model the differenced series as an ARMA process. For example:

SALES_{t} - SALES_{t-1} = b_{0} + b_{1}*AR(1) + b_{2}*MA(1) + b_{3}*MA(2) + ε_{t}
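The differencing step, and the step that turns forecasts of changes back into forecasts of SALES levels, can be sketched in a few lines. The function names and all numbers are invented for illustration. In particular, the forecast changes are placeholders and not the output of a fitted model.

```python
def difference(series):
    """First difference: SALES_t - SALES_{t-1} (one observation is lost)."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def undifference(last_level, forecast_changes):
    """Turn forecast changes back into levels by cumulating from the last level."""
    levels = []
    level = last_level
    for change in forecast_changes:
        level += change
        levels.append(level)
    return levels


sales = [100, 103, 101, 106, 110]      # made-up quarterly SALES
changes = difference(sales)            # [3, -2, 5, 4]

# Suppose an ARMA model fitted to `changes` forecasts these next two changes
# (hypothetical numbers, not the output of a fitted model).
forecast_changes = [2.5, 3.0]
forecast_levels = undifference(sales[-1], forecast_changes)   # [112.5, 115.5]
```

Note that any error in a forecast change carries forward into every later level, which is part of why the deterministic-versus-stochastic trend decision matters for longer horizons.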

However, “it is sometimes **very difficult to decide whether trend is best modeled as deterministic or stochastic**, and the decision is an important part of the **science – and art – of building forecasting models**.” (**Diebold, Elements of Forecasting, 1998**)

We will revisit this issue in a later article.

#### Other considerations

In addition to these 3 general classes of models, we typically also try these variations:

**ARCH/GARCH models.**

These models address **heteroscedasticity** in the residuals (ε_{t}). ARCH/GARCH models are **used in the financial arena** to help model return and risk where market volatility can fluctuate in a predictable manner.
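As a rough illustration of volatility that fluctuates “in a predictable manner,” here is a sketch of the variance recursion in a GARCH(1,1) model, one widely used member of this family. In it, the variance expected for the current period depends on the size of the previous residual and on the previous variance. The function name, the parameter values (`omega`, `alpha`, `beta`) and the residuals are all invented, and the parameters are taken as given rather than estimated.

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variances: h_t = omega + alpha * e_{t-1}^2 + beta * h_{t-1}."""
    # Start from the long-run variance, which requires alpha + beta < 1.
    h = [omega / (1.0 - alpha - beta)]
    for e_prev in residuals[:-1]:
        h.append(omega + alpha * e_prev ** 2 + beta * h[-1])
    return h


# Made-up residuals with a burst of large shocks in the middle.
residuals = [0.1, -0.2, 1.5, -1.2, 0.3, 0.1]
variances = garch11_variance(residuals, omega=0.1, alpha=0.3, beta=0.6)
```

The large residual in the third period raises the variance expected for the fourth period, and the effect then fades gradually. That persistence is the pattern these models are designed to capture.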

**Inclusion of additional “right hand side variables.”**

In the case of least squares and mixed regression-ARMA models, if the data are available, we often consider **whether additional variables will improve predictive accuracy**. In the case of SALES, for example, we could consider adding lagged values of advertising spending (AD SPEND). **But** if we are tasked with **forecasting out 6 months**, for example, then we **cannot use lags** of AD SPEND (in this example) **shorter than 6 months**. A lag of k months reaches back only to month 6 - k of the forecast period, and that month has not been observed yet when k is less than 6. Otherwise we would **also have to forecast AD SPEND**.
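The lag restriction can be checked mechanically. The sketch below assumes the forecast starts right after the last observed month. Both function names and the `known_ahead` parameter are hypothetical: `known_ahead` covers the case where some future values of the variable are already known, for instance a committed advertising budget.

```python
def min_usable_lag(horizon, known_ahead=0):
    """Shortest lag of a regressor that needs no forecast of that regressor.

    horizon:      number of periods ahead we must forecast (e.g. 6 months).
    known_ahead:  periods beyond the last observation for which the regressor
                  is already known (hypothetical, e.g. a committed ad budget).
    """
    return max(horizon - known_ahead, 0)


def usable_lags(candidate_lags, horizon, known_ahead=0):
    """Keep only lags whose required values are already available."""
    shortest = min_usable_lag(horizon, known_ahead)
    return [lag for lag in candidate_lags if lag >= shortest]


print(usable_lags(range(1, 13), horizon=6))                  # [6, 7, 8, 9, 10, 11, 12]
print(usable_lags(range(1, 13), horizon=6, known_ahead=1))   # [5, 6, 7, 8, 9, 10, 11, 12]
```

The logic is that forecasting h months ahead with a k-month lag needs the variable's value h - k months past the last observation, so that value must already be in hand.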

**Transformations**.

For example, using the **natural log** of SALES can help **model non-linear trends** and/or **dampen variation** in SALES over time which may help to **improve predictive accuracy**.
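A quick sketch shows why the log helps with one kind of non-linear trend. The SALES figures are invented and grow by a constant 5% per period, which is a curve in levels but a straight line in logs.

```python
import math

# Made-up SALES growing 5% per period: a curve in levels.
sales = [100.0 * 1.05 ** t for t in range(8)]
log_sales = [math.log(s) for s in sales]

# In levels, the period-to-period change keeps growing...
level_changes = [b - a for a, b in zip(sales, sales[1:])]
# ...but in logs it is constant (ln 1.05, about 0.0488), i.e. a straight line.
log_changes = [b - a for a, b in zip(log_sales, log_sales[1:])]

# A forecast made in logs is converted back to SALES units with exp().
next_log = log_sales[-1] + log_changes[-1]
next_sales = math.exp(next_log)        # about 100 * 1.05 ** 8
```

A simple TIME trend fitted to `log_sales` would therefore capture this growth pattern, whereas the same model fitted to `sales` would not.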

### Bottom line

There are **many “specifications,” many potentially useful models** that we estimate.

But **not all end up in a final “pool” of candidates** for the forecasting model. Each estimated **model must pass certain tests** to stay in the candidate pool.

In a later article we will cover the tests we use to help **whittle down the pool of candidates to a set of truly useful models**.

**Part I – Practical Time Series Forecasting – Introduction**

**Part II – Practical Time Series Forecasting – Some basics**