<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	
	xmlns:georss="http://www.georss.org/georss"
	xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
	>

<channel>
	<title>Forecasting Archives - KDD Analytics</title>
	<atom:link href="https://www.kddanalytics.com/category/forecasting/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.kddanalytics.com/category/forecasting/</link>
	<description>Data to Decisions</description>
	<lastBuildDate>Sat, 05 Jan 2019 22:28:48 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.8.3</generator>

<image>
	<url>https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2016/08/cropped-imageedit_1_7939659602.png?fit=32%2C32&#038;ssl=1</url>
	<title>Forecasting Archives - KDD Analytics</title>
	<link>https://www.kddanalytics.com/category/forecasting/</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">114932494</site>	<item>
		<title>How to Visualize Changing Recession Start Date Forecasts</title>
		<link>https://www.kddanalytics.com/visualize-revisions-recession-start-date-forecasts/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Sat, 05 Jan 2019 22:19:59 +0000</pubDate>
				<category><![CDATA[Data Visualization]]></category>
		<category><![CDATA[Forecasting]]></category>
		<category><![CDATA[Tableau]]></category>
		<category><![CDATA[data visualization]]></category>
		<category><![CDATA[forecasting]]></category>
		<category><![CDATA[machine learning]]></category>
		<guid isPermaLink="false">https://www.kddanalytics.com/?p=1515</guid>

					<description><![CDATA[<p>In case you missed it, we are in a recession. According to Intensity’s latest US recession start date forecast, there is a 50% probability of a recession starting sometime in the January to February 2019 period.  And a 97% probability of it starting sometime within the next 6 months. Their “point estimate” of a recession&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/visualize-revisions-recession-start-date-forecasts/">How to Visualize Changing Recession Start Date Forecasts</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>In case you missed it, <strong>we are in a recession</strong>.</p>
<p>According to <a href="https://intensity.com/news/intensity-recession-forecast-january-3-2019" target="_blank" rel="noopener"><strong>Intensity’s latest US recession start date forecast</strong></a>, there is a 50% probability of a recession starting sometime in the January to February 2019 period.  And a 97% probability of it starting sometime within the next 6 months.</p>
<p>Their “<a href="https://en.wikipedia.org/wiki/Point_estimation" target="_blank" rel="noopener"><strong>point estimate</strong></a>” of a recession start is January 2019.</p>
<p><strong>Like, as in, right now!</strong></p>
<p>If true, it will take a while for the impacts to start showing up in the official government statistics.  But the stock market sell-off last quarter may be a harbinger of things to come.</p>
<p><a href="https://intensity.com/" target="_blank" rel="noopener"><strong>Intensity</strong></a>, an economics and data science firm based in San Diego, CA, developed and back-tested a machine learning prediction algorithm for its clients.  The firm began releasing a monthly forecast of the next US recession start date to the public in March 2018.</p>
<p>Over the course of the last 11 months, it has been interesting following the updates to their forecast as economic conditions changed.</p>
<p>Intuitively, one would expect the forecast to “settle down” as the expected start date draws closer.</p>
<p>And it got me thinking about the best way to visualize these changing forecasts.</p>
<h3>Visualizing Forecast Updates Over Time</h3>
<p>The forecasted recession start date does not move steadily in one direction from one monthly update to the next.  For example, in March 2018, Intensity forecasted the next recession to start in April 2019.  But in April 2018, the forecast was revised, and the recession was to start <strong>6 months earlier</strong>, in October 2018.</p>
<p>Plotting the month of the forecast on the x-axis and the forecasted month of the recession start on the y-axis yields a “traditional time series” view as shown below.</p>
<p><img data-recalc-dims="1" fetchpriority="high" decoding="async" class="alignnone size-large wp-image-1532" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Intensity-forecast-shown-horizontally.png?resize=1024%2C727&#038;ssl=1" alt="Intensity recession forecast - shown horizontally" width="1024" height="727" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Intensity-forecast-shown-horizontally.png?resize=1024%2C727&amp;ssl=1 1024w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Intensity-forecast-shown-horizontally.png?resize=300%2C213&amp;ssl=1 300w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Intensity-forecast-shown-horizontally.png?resize=768%2C545&amp;ssl=1 768w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Intensity-forecast-shown-horizontally.png?w=1332&amp;ssl=1 1332w" sizes="(max-width: 1000px) 100vw, 1000px" /></p>
<p>As time progresses from left to right, we can see the forecasted recession start date fluctuating up and down, settling on January 2019, the most recent forecasted start date.</p>
<p>However, another way to visualize this is to show the progression of time vertically, from bottom to top.  In this case the forecasted recession start date would fluctuate horizontally, left and right, as shown below.</p>
<p><img data-recalc-dims="1" decoding="async" class="alignnone size-large wp-image-1528" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Intensity-Forecast-shown-vertically-1.png?resize=1024%2C734&#038;ssl=1" alt="Intensity Recession Forecast - shown vertically" width="1024" height="734" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Intensity-Forecast-shown-vertically-1.png?resize=1024%2C734&amp;ssl=1 1024w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Intensity-Forecast-shown-vertically-1.png?resize=300%2C215&amp;ssl=1 300w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Intensity-Forecast-shown-vertically-1.png?resize=768%2C551&amp;ssl=1 768w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Intensity-Forecast-shown-vertically-1.png?w=1325&amp;ssl=1 1325w" sizes="(max-width: 1000px) 100vw, 1000px" /></p>
<p>I don’t know about you, but I find this second view more appealing.  Maybe it is the old economist in me, trained on the <a href="https://en.wikipedia.org/wiki/Phillips_curve" target="_blank" rel="noopener"><strong>Phillips Curve</strong></a> in graduate school.  But for me, the vertical, “up-down” orientation makes the variation in the forecasted recession start date “pop” more than in the horizontal, “left-to-right” view.</p>
<h3>So, Recession in 2019?</h3>
<p>It will be very interesting to see if Intensity sticks to its January 2019 point estimate.  Prior to the unexpectedly positive <a href="https://www.marketwatch.com/amp/story/guid/C82CF1F6-0F91-11E9-835D-C91F740D86E0" target="_blank" rel="noopener"><strong>December 2018 jobs report</strong></a>, the consensus seemed to be a recession starting sometime in 2019 or 2020.  For example, <a href="https://news.yahoo.com/gary-shilling-sees-66-chance-041710124.html" target="_blank" rel="noopener"><strong>Gary Shilling</strong></a> recently tossed his hat into the recession ring with a predicted 66% chance of a recession in 2019.</p>
<p>However, the positive jobs report apparently has many economists now <a href="https://www.washingtonpost.com/business/economy/us-jobs-data-boosts-wall-street-and-reassures-investors-about-economy/2019/01/04/b910ac92-105b-11e9-8938-5898adc28fa2_story.html?noredirect=on&amp;utm_term=.7685c12bcb54" target="_blank" rel="noopener"><strong>softening their stance</strong></a> on a recession this year.  And there is talk of policy makers being able to <strong><a href="https://www.csmonitor.com/Business/2019/0102/Recession-is-a-risk-in-2019.-But-maybe-one-that-policymakers-can-avoid" target="_blank" rel="noopener">sidestep a recession</a></strong>.</p>
<p>Only time will tell…so stay tuned!</p>
<h3>Plotting Ordered Time Series in Tableau</h3>
<p>By the way, these charts were made in <a href="https://www.tableau.com/" target="_blank" rel="noopener"><strong>Tableau</strong></a>.  And it was not as straightforward as flipping the axes to get the vertical view.  Tableau’s default inclination is to “connect the dots” from left to right when time is involved.</p>
<p>Fortunately, there is an easy way to get Tableau to connect the dots vertically.  This makes use of the <a href="https://onlinehelp.tableau.com/current/pro/desktop/en-us/viewparts_marks_markproperties.htm#PathProp" target="_blank" rel="noopener"><strong>Path property</strong></a> in the Marks card.  I simply added a field to my raw data that indicated the order of my data, which, of course, was calendar order.</p>
<p><img data-recalc-dims="1" decoding="async" class="size-full wp-image-1518 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Path-Order.png?resize=638%2C362&#038;ssl=1" alt="Tableau data input - Path Order" width="638" height="362" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Path-Order.png?w=638&amp;ssl=1 638w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Path-Order.png?resize=300%2C170&amp;ssl=1 300w" sizes="(max-width: 638px) 100vw, 638px" /></p>
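<p>If the raw data is prepared outside Tableau, the ordering field can be generated rather than typed in by hand. The following is a minimal Python sketch, not the workflow actually used for these charts. It uses only the three forecast revisions mentioned in this post, and the field names (forecast_date, recession_start, path_order) are illustrative assumptions.</p>

```python
from datetime import date

# Forecast revisions mentioned in this post:
# (month the forecast was issued, forecasted recession start month).
# The rows are deliberately out of order; field names are illustrative.
forecasts = [
    {"forecast_date": date(2019, 1, 1), "recession_start": date(2019, 1, 1)},
    {"forecast_date": date(2018, 3, 1), "recession_start": date(2019, 4, 1)},
    {"forecast_date": date(2018, 4, 1), "recession_start": date(2018, 10, 1)},
]


def add_path_order(rows):
    """Return rows sorted by forecast date with a 1-based path_order field."""
    ordered = sorted(rows, key=lambda r: r["forecast_date"])
    return [dict(row, path_order=i) for i, row in enumerate(ordered, start=1)]


for row in add_path_order(forecasts):
    print(row["path_order"], row["forecast_date"], row["recession_start"])
```

<p>Because the order is derived from the forecast date, appending next month’s forecast and re-running the script keeps the path in calendar order.</p>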
<p>Then dropping this field on the Path property in the Marks card tells Tableau to connect the dots (or “Marks” in Tableau-speak) in this order.  With the date of the forecast on the vertical, y-axis, Tableau connects the dots from bottom to top.</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="alignnone size-large wp-image-1529" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Tableau-Path-Order.png?resize=1024%2C809&#038;ssl=1" alt="Tableau Path Property on Marks Card" width="1024" height="809" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Tableau-Path-Order.png?resize=1024%2C809&amp;ssl=1 1024w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Tableau-Path-Order.png?resize=300%2C237&amp;ssl=1 300w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Tableau-Path-Order.png?resize=768%2C607&amp;ssl=1 768w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Tableau-Path-Order.png?w=1255&amp;ssl=1 1255w" sizes="auto, (max-width: 1000px) 100vw, 1000px" /></p>
<p>Very slick!</p>
<p>The post <a href="https://www.kddanalytics.com/visualize-revisions-recession-start-date-forecasts/">How to Visualize Changing Recession Start Date Forecasts</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1515</post-id>	</item>
		<item>
		<title>Practical Time Series Forecasting &#8211; Bounding Uncertainty</title>
		<link>https://www.kddanalytics.com/practical-time-series-forecasting-forecast-uncertainty/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Mon, 12 Feb 2018 03:46:22 +0000</pubDate>
				<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Forecasting]]></category>
		<category><![CDATA[Time Series]]></category>
		<category><![CDATA[confidence interval]]></category>
		<category><![CDATA[forecast interval]]></category>
		<category><![CDATA[forecast uncertainty]]></category>
		<guid isPermaLink="false">http://www.kddanalytics.com/?p=1356</guid>

					<description><![CDATA[<p>“A good forecaster is not smarter than everyone else, he merely has his ignorance better organized.” ― Anonymous Predicting the future is an exercise in probability rather than certainty. As we have mentioned several times over the course of these articles, your forecast model will be wrong. It is just a matter of how useful&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-forecast-uncertainty/">Practical Time Series Forecasting &#8211; Bounding Uncertainty</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>“<em>A good forecaster is not smarter than everyone else, he merely has his ignorance better organized</em>.”<br />
― <strong>Anonymous</strong></p>
<p>Predicting the future is an exercise in probability rather than certainty. As we have mentioned several times over the course of these articles, <strong>your forecast model will be wrong</strong>.</p>
<p><strong> It is just a matter of how useful it might be.</strong></p>
<p>A time series model will <strong>forecast a path</strong> through the forecast horizon, a “point forecast.” But <strong>this path is just one of the paths</strong> your forecast can take based on your estimated model.</p>
<p>Providing a sense of the <strong>uncertainty surrounding your forecast</strong> is an essential part of your job as a forecaster.</p>
<h3>Forecast intervals</h3>
<p>The standard approach is to provide the “<a href="https://en.wikipedia.org/wiki/Prediction_interval" target="_blank" rel="noopener"><strong>forecast interval</strong></a>” for your forecast.</p>
<p>Typically, this is cast in terms of a 95% prediction interval. That is, 95 times out of 100, the actual value will fall within the specified range. (Note that there is a <a href="https://www.ma.utexas.edu/users/mks/statmistakes/CIvsPI.html" target="_blank" rel="noopener"><strong>difference between</strong></a> a &#8220;confidence&#8221; interval and a &#8220;forecast&#8221; interval.)</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1358 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Forecast-Interval.png?resize=603%2C371&#038;ssl=1" alt="Forecast Interval" width="603" height="371" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Forecast-Interval.png?w=603&amp;ssl=1 603w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Forecast-Interval.png?resize=300%2C185&amp;ssl=1 300w" sizes="auto, (max-width: 603px) 100vw, 603px" /></p>
<h3>Sources of forecast uncertainty</h3>
<p>There are at least <a href="http://www.eviews.com/help/helpintro.html#page/content%2FForecast-Forecast_Basics.html%23ww181365" target="_blank" rel="noopener"><strong>two sources</strong></a> of forecast uncertainty over the forecast horizon.</p>
<p>The <strong>first results from our ignorance of what the model’s error will be in the forecast horizon</strong>. So, we must rely on how well the model did in the recalibration sample (estimation + holdout) as an estimate.</p>
<p>The <strong>second source of uncertainty results from the model’s coefficients </strong>(or parameters)<strong> being estimates of their true values</strong>. As estimates, they have their own “confidence” interval.</p>
<p>As a result, <strong>the forecast interval can be quite large</strong> (as shown above). And, due to error compounding over time, the <strong>forecast interval widens</strong> the further into the forecast horizon you go.</p>
<p>In our example above, during the <strong>first month</strong> of the forecast horizon, the forecast interval is <strong>plus or minus 0.63%</strong> of the forecasted value. By <strong>month 6</strong>, this spread widens to <strong>plus or minus 2.95%</strong>.</p>
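<p>To see why the interval widens, consider the simplest possible case. The Python sketch below assumes a random-walk model with independent, normally distributed errors, for which the standard error of the h-step-ahead forecast is sigma times the square root of h. This is not the model behind the chart above, the values of last_value and sigma are hypothetical, and only the first source of uncertainty (model error) is covered.</p>

```python
import math


def random_walk_interval(last_value, sigma, horizon, z=1.96):
    """Return (step, lower, upper) tuples for a random-walk point forecast.

    Assumes independent normal one-step errors with standard deviation
    sigma, so the h-step-ahead standard error is sigma * sqrt(h).
    z = 1.96 corresponds to a 95% interval under normality.
    """
    intervals = []
    for h in range(1, horizon + 1):
        half_width = z * sigma * math.sqrt(h)
        intervals.append((h, last_value - half_width, last_value + half_width))
    return intervals


# Hypothetical inputs: last observed value 100, one-step error s.d. 0.5.
for h, lower, upper in random_walk_interval(100.0, 0.5, 6):
    print(f"month {h}: {lower:.2f} to {upper:.2f}")
```

<p>Adding parameter uncertainty would make each of these intervals wider still.</p>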
<p>Even accounting for forecast error and parameter uncertainty, these forecast intervals may still be <a href="https://robjhyndman.com/hyndsight/narrow-pi/" target="_blank" rel="noopener"><strong>too narrow</strong></a>.</p>
<h3>What about meta forecasts?</h3>
<p>In an <a href="https://www.kddanalytics.com/practical-time-series-forecasting-meta-models/" target="_blank" rel="noopener"><strong>earlier article</strong></a> we discussed <strong>combining forecasts into a meta forecast</strong>. The <strong>challenge</strong> in terms of a <strong>meta prediction interval</strong> is that it is <strong>not a simple matter to combine the prediction intervals of the constituents’ forecasts</strong>.</p>
<p><strong>One approach</strong> is to simply <strong>show the extreme upper and lower forecast paths</strong> along with the meta forecast path, which will lie somewhere between the two extremes.</p>
<p>And then to <strong>caution the consumer of your forecast</strong> that this is just to give a sense of the possible forecast range, which <strong>will likely be too narrow</strong> (since the upper and lower forecast will each have their own prediction interval).</p>
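<p>A rough Python sketch of this approach follows. It takes the lowest and highest forecast in each period as the envelope and uses a simple average as the meta forecast. The model names and forecast values are hypothetical.</p>

```python
def forecast_envelope(forecasts):
    """Return per-period (lower, meta, upper) across competing forecasts.

    forecasts maps a model name to a list of forecast values, one per
    period. The meta forecast here is a simple (equal-weight) average.
    """
    envelope = []
    for values in zip(*forecasts.values()):
        envelope.append((min(values), sum(values) / len(values), max(values)))
    return envelope


# Hypothetical forecast paths from three candidate models.
candidates = {
    "model_a": [100.0, 98.0, 96.0],
    "model_b": [100.0, 101.0, 102.0],
    "model_c": [100.0, 99.5, 99.0],
}

for period, (lower, meta, upper) in enumerate(forecast_envelope(candidates), 1):
    print(f"period {period}: {lower:.1f} <= {meta:.2f} <= {upper:.1f}")
```

<p>The envelope ignores each model’s own prediction interval, which is why it should be presented as a rough range only.</p>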
<h3>Probability-based assessment of forecast uncertainty</h3>
<p>Another approach is to <strong>couch your forecast uncertainty </strong><strong>in terms of a probability</strong>.</p>
<p>For example, based on your SALES forecast, <strong>what are the chances of hitting a certain level of sales by a certain date</strong>? If you are forecasting procurement needs for a warehouse, <strong>what is the chance of running out of inventory by a certain date</strong>? If you are a macroeconomist forecasting GDP, <strong>what are the chances of the economy falling into a recession by a certain date</strong>?</p>
<p><strong>Suppose</strong> you are tasked with forecasting daily SALES over the next year.</p>
<p><strong>Management has targeted a certain level of SALES and wants to know when that target will be hit.</strong> You can use the forecast uncertainty produced by your model to generate the following chart:</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1361 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Forecast-Risk-Curve.png?resize=603%2C372&#038;ssl=1" alt="Forecast risk curve" width="603" height="372" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Forecast-Risk-Curve.png?w=603&amp;ssl=1 603w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Forecast-Risk-Curve.png?resize=300%2C185&amp;ssl=1 300w" sizes="auto, (max-width: 603px) 100vw, 603px" />The vertical axis is the chance of hitting the SALES target by a certain date (in this case, days into the next year). So, <strong>160 days into the year, there is a 10% chance of hitting the sales target.</strong></p>
<p><strong>By day 192</strong>, a month later, the <strong>chance has grown to 30%</strong>. And <strong>by day 218, there is a 50/50 chance</strong> the sales target will be reached.</p>
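<p>One way to build a curve like this is by simulation. The Python sketch below is an illustration under strong assumptions, not the method behind the chart: daily SALES are treated as independent normal draws (floored at zero), and the mean, standard deviation, and target are all hypothetical. In practice the draws would come from your fitted model’s forecast and error distribution.</p>

```python
import random


def hit_probability_curve(mean_daily, sd_daily, target, days,
                          n_paths=2000, seed=42):
    """Estimate P(cumulative sales reach target on or before each day)."""
    rng = random.Random(seed)
    hits_by_day = [0] * days
    for _ in range(n_paths):
        total = 0.0
        for day in range(days):
            # Floor at zero so cumulative sales never decrease.
            total += max(0.0, rng.gauss(mean_daily, sd_daily))
            if total >= target:
                # A path that has reached the target counts as a hit
                # on this day and on every later day.
                for later in range(day, days):
                    hits_by_day[later] += 1
                break
    return [hits / n_paths for hits in hits_by_day]


# Hypothetical inputs: about 100 units a day against a target of 20,000.
curve = hit_probability_curve(100.0, 40.0, 20000.0, 365)
for day in (160, 192, 218):
    print(f"day {day}: {curve[day - 1]:.0%} chance of having hit the target")
```

<p>Each simulated path is one of the many paths the forecast could take, and the curve is simply the share of paths that have reached the target by each day.</p>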
<p>Stating these chances in terms of odds may be an easier way to present this:</p>
<p><strong> By day 160</strong>, the odds against hitting the target would be <strong>9 to 1</strong>. By <strong>day 192</strong> they would be a little over <strong>2 to 1</strong> against. And by <strong>day 218</strong>, they would be <strong>1 to 1…a flip of the coin.</strong></p>
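<p>The conversion from a chance to odds is (1 &#8211; p) / p, which gives the odds against the event. A short Python sketch using the three chances quoted above (the function name is illustrative):</p>

```python
from fractions import Fraction


def odds_against(probability):
    """Convert a probability of success into odds against, as a ratio to 1."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return (1 - probability) / probability


# Chances quoted in the text: 10%, 30% and 50%.
chances = ((160, Fraction(10, 100)), (192, Fraction(30, 100)), (218, Fraction(50, 100)))
for day, chance in chances:
    print(f"day {day}: {float(odds_against(chance)):.2f} to 1 against")
```

<p>Exact fractions are used so that 30% converts to exactly 7/3, or about 2.33 to 1.</p>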
<h3>Bottom line</h3>
<p>Uncertainty is a fact of life and your forecasts will be “wrong.”</p>
<p>But quantifying how wrong they can be will go a long way towards making them “useful.”</p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-introduction/" target="_blank" rel="noopener"><strong>Part 1 &#8211; Practical Time Series Forecasting &#8211; Introduction</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-basics/" target="_blank" rel="noopener"><strong>Part 2 &#8211; Practical Time Series Forecasting &#8211; Some Basics</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-useful-models/" target="_blank" rel="noopener"><strong>Part 3 &#8211; Practical Time Series Forecasting &#8211; Potentially Useful Models</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-data-science-taxonomy/" target="_blank" rel="noopener"><strong>Part 4 &#8211; Practical Time Series Forecasting &#8211; Data Science Taxonomy</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener"><strong>Part 5 &#8211; Practical Time Series Forecasting &#8211; Know When to Hold &#8217;em</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-what-makes-a-useful-model/" target="_blank" rel="noopener"><strong>Part 6 &#8211; Practical Time Series Forecasting &#8211; What Makes a Model Useful?</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-deterministic-stochastic-trend/" target="_blank" rel="noopener"><strong>Part 7 &#8211; Practical Time Series Forecasting &#8211; To Difference or Not to Difference</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-times-series-forecasting-rolling-holdout-sample/" target="_blank" rel="noopener"><strong>Part 8 &#8211; Practical Time Series Forecasting &#8211; Know When to Roll &#8217;em</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-meta-models/" target="_blank" rel="noopener"><strong>Part 9 &#8211; Practical Time Series Forecasting &#8211; Meta Models</strong></a></p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-forecast-uncertainty/">Practical Time Series Forecasting &#8211; Bounding Uncertainty</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1356</post-id>	</item>
		<item>
		<title>Practical Time Series Forecasting – Meta Models</title>
		<link>https://www.kddanalytics.com/practical-time-series-forecasting-meta-models/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Mon, 05 Feb 2018 01:47:38 +0000</pubDate>
				<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Forecasting]]></category>
		<category><![CDATA[Time Series]]></category>
		<category><![CDATA[forecast error]]></category>
		<category><![CDATA[MAPE]]></category>
		<category><![CDATA[meta forecast]]></category>
		<category><![CDATA[MPE]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[weighting]]></category>
		<guid isPermaLink="false">http://www.kddanalytics.com/?p=1331</guid>

					<description><![CDATA[<p>“There are two kinds of forecasters: those who don’t know, and those who don’t know they don’t know.” ― John Kenneth Galbraith After an extensive model building and vetting process, along the lines we previously discussed here and here, the practical forecaster may still be left with several strong performing models. These models perform similarly&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-meta-models/">Practical Time Series Forecasting – Meta Models</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>“<em>There are two kinds of forecasters: those who don’t know, and those who don’t know they don’t know.</em>”<br />
― <a href="https://en.wikipedia.org/wiki/John_Kenneth_Galbraith" target="_blank" rel="noopener"><strong>John Kenneth Galbraith</strong></a></p>
<p>After an extensive model-building and vetting process, along the lines we previously discussed <strong><a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener">here</a></strong> and <a href="https://www.kddanalytics.com/practical-time-series-forecasting-rolling-holdout-sample-analysis/" target="_blank" rel="noopener"><strong>here</strong></a>, the practical forecaster may still be left with several strong-performing models.</p>
<p>These models perform similarly in the holdout sample tests. They retain their statistical properties when recalibrated on the full historical sample. But they <strong>yield different forecast paths over the forecast horizon</strong>.</p>
<p>Any one of the models could be easily defended. But the <strong>fact that the models yield different forecasts should make the forecaster pause</strong>.</p>
<h3>An example</h3>
<p>Below is an example of 3 short-run monthly forecasts:</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1334 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-Different-FC.png?resize=603%2C371&#038;ssl=1" alt="Examples of competing forecasts" width="603" height="371" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-Different-FC.png?w=603&amp;ssl=1 603w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-Different-FC.png?resize=300%2C185&amp;ssl=1 300w" sizes="auto, (max-width: 603px) 100vw, 603px" /></p>
<p>The 3 models perform similarly in the holdout sample. One of the models is a least squares model. The other 2 are ARIMA models.</p>
<p>One model produces a <strong>steeply declining forecast</strong>. Another a <strong>slightly declining forecast</strong>. The third model produces an <strong>increasing forecast</strong>.</p>
<p>What should the forecaster do?</p>
<h3>How can this happen?</h3>
<p>Models are just that – models. They are abstractions from reality. And <strong>no single model will “fit” the holdout sample perfectly</strong>.</p>
<p>Two <strong>models</strong>, especially <strong>of different types</strong> (e.g. least squares vs. ARIMA), could have very <strong>similar holdout sample performance but differ</strong> dramatically <strong>in their forecast</strong> over the forecast horizon.</p>
<p>The holdout sample <strong>MAPE</strong> (<a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener"><strong>mean absolute percentage error</strong></a>) could be very similar for these models. But the <strong>MAPE is an average error across the holdout sample</strong>. And the models could have arrived at their MAPEs by <strong>focusing on different aspects of the time series in the holdout sample.</strong></p>
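<p>A small Python sketch makes the point concrete. The data are hypothetical, and the sign convention for the percentage error, (actual &#8211; forecast) / actual, is an assumption. Both functions require non-zero actual values.</p>

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


def mpe(actual, forecast):
    """Mean percentage error (a measure of bias), in percent."""
    errors = [(a - f) / a for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical holdout sample and two hypothetical models.
actual = [100.0, 100.0, 100.0, 100.0]
model_a = [95.0, 105.0, 95.0, 105.0]  # misses alternate in sign
model_b = [95.0, 95.0, 95.0, 95.0]    # always under-forecasts

for name, forecast in (("model_a", model_a), ("model_b", model_b)):
    print(f"{name}: MAPE {mape(actual, forecast):.1f}%, MPE {mpe(actual, forecast):.1f}%")
```

<p>Both models miss by 5% on average, so their MAPEs match, yet one is unbiased and the other consistently under-forecasts. Carried into the forecast horizon, that difference produces different paths.</p>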
<p>Projecting these differences into the forecast horizon can result in very different forecasts.</p>
<h3>Solutions</h3>
<p>When there is no clear “champion” model, one <strong>solution is to combine the forecasts into one</strong>. We call this a “<strong><a href="https://en.wikipedia.org/wiki/Metamodeling">meta</a></strong>” forecast.</p>
<p>There are several ways this can be accomplished.</p>
<h4>Checkpoint</h4>
<p><strong>But first</strong>, <strong>check</strong> to make sure the <strong>models</strong> to be combined are <strong>not “nested.”</strong> That is, <strong>one model is not a subset of another</strong>. If models are nested there usually is no advantage to combining their forecasts into a meta forecast.</p>
<p>In fact, a <strong>meta forecast is more likely to be superior the greater the differences between the constituent models</strong>.</p>
<p>A meta forecast based on a least squares model and an ARIMA model will likely yield a smaller forecast error than that associated with either of the two models. However, if the two models were both least squares models, the superiority of a meta forecast might be questionable (<a href="https://www.amazon.com/Forecasting-Business-Economics-Econometrics-Mathematical/dp/0122951816"><strong>Granger, 1989</strong></a>).</p>
<h4>Solution 1</h4>
<p>The simplest approach to arriving at a meta forecast is to <strong>simply average the forecasts</strong> of the individual models.</p>
<p>This essentially assumes that <strong>each model’s forecast is equally important in the meta forecast </strong>(i.e. receives equal weighting). This is a quick and uncomplicated way to generate a meta forecast.</p>
<h4>Solution 2</h4>
<p>Another approach <strong>makes use</strong> of each model’s <strong>holdout sample performance measures of forecast accuracy and bias</strong>. A weighting for each model&#8217;s forecast can be calculated using each model’s <strong>MAPE</strong> and <strong>MPE</strong> (<a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener"><strong>mean percentage error</strong></a>) relative to that of all the models combined.</p>
<p>The meta forecast would then be a <strong>weighted average</strong> of the individual model forecasts. Models with <strong>lower MAPE and an MPE closer to zero</strong> would receive <strong>higher weights and contribute more</strong> to the meta forecast.</p>
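<p>There is more than one way to turn error measures into weights. The Python sketch below shows one simple scheme, chosen here as an assumption: weights proportional to the inverse of each model’s holdout MAPE. It ignores the MPE, which could be folded in similarly by using its absolute value. The model names, MAPEs, and forecasts are hypothetical.</p>

```python
def inverse_error_weights(mapes):
    """Return weights proportional to 1 / MAPE, normalized to sum to 1."""
    inverse = {name: 1.0 / value for name, value in mapes.items()}
    total = sum(inverse.values())
    return {name: inv / total for name, inv in inverse.items()}


def combine(forecasts, weights):
    """Return the weighted-average forecast path across models."""
    names = list(forecasts)
    horizon = len(forecasts[names[0]])
    return [sum(weights[n] * forecasts[n][t] for n in names)
            for t in range(horizon)]


# Hypothetical holdout MAPEs (in percent) and forecast paths.
holdout_mape = {"least_squares": 2.0, "arima_1": 4.0, "arima_2": 4.0}
paths = {
    "least_squares": [100.0, 98.0, 96.0],
    "arima_1": [100.0, 101.0, 102.0],
    "arima_2": [100.0, 99.0, 98.0],
}

weights = inverse_error_weights(holdout_mape)
meta = combine(paths, weights)
print(weights)
print(meta)
```

<p>Setting every weight to one over the number of models reduces this to the simple average of Solution 1.</p>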
<h4>Solution 3</h4>
<p>A third approach is to use <strong>regression</strong> to estimate the weights.</p>
<p>Using the holdout sample (or the full sample, if the holdout sample is too small), <strong>regress the actual value on the forecasted value from each model</strong>. The goal is to find a regression with <strong>no constant and all regression coefficients positive and statistically significant</strong>.</p>
<p>The regression <strong>coefficients should then sum very close to one</strong>. These <strong>coefficients then become the weights</strong> by which forecasts are combined into a meta forecast (see <a href="https://www.amazon.com/Business-Forecasting-ForecastX-Holton-Wilson/dp/0073373648/ref=sr_1_2?s=books&amp;ie=UTF8&amp;qid=1512008807&amp;sr=1-2&amp;keywords=wilson+keating+forecasting"><strong>Wilson and Keating</strong></a>).</p>
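<p>The mechanics of the no-constant regression can be sketched in plain Python by solving the least squares normal equations directly. This is an illustration only: the data are hypothetical (the actual values are constructed as 0.6 times one forecast plus 0.4 times the other), and the sketch does not compute standard errors, so statistical significance would still need to be checked in a statistics package.</p>

```python
def fit_weights(actual, forecasts):
    """Least squares of actual on the model forecasts, with no constant.

    forecasts is a list of equal-length lists, one per model. Solves the
    normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    """
    k = len(forecasts)
    # Build the augmented matrix [X'X | X'y].
    aug = []
    for i in range(k):
        row = [sum(a * b for a, b in zip(forecasts[i], forecasts[j]))
               for j in range(k)]
        row.append(sum(a * y for a, y in zip(forecasts[i], actual)))
        aug.append(row)
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("forecasts are collinear; weights not identified")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[k] for row in aug]


# Hypothetical holdout forecasts from two models.
model_1 = [102.0, 98.0, 105.0, 97.0, 101.0, 99.0]
model_2 = [99.0, 101.0, 100.0, 103.0, 98.0, 102.0]
# Constructed so that the true weights are 0.6 and 0.4.
observed = [0.6 * a + 0.4 * b for a, b in zip(model_1, model_2)]

estimated = fit_weights(observed, [model_1, model_2])
print([round(w, 3) for w in estimated], "sum:", round(sum(estimated), 3))
```

<p>With real data the coefficients will not sum to exactly one. A sum far from one, or a negative coefficient, is a sign that the combination should be reconsidered.</p>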
<h3>Back to our example</h3>
<p>The forecaster could go with candidate 3 since it &#8220;splits the difference.&#8221; However, the forecaster is still left with the task of defending why the other two equally plausible models were not chosen.</p>
<p>Alternatively, a meta forecast can be used. As an example, we created a <strong>simple average forecast</strong> across the 3 candidate models. As discussed above, this <strong>assumes an equal weighting across the 3 short-run forecasts</strong>. A more sophisticated approach would have been to estimate the weights using a regression approach.</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1335 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-a-meta-forecast.png?resize=605%2C371&#038;ssl=1" alt="Example of a meta forecast" width="605" height="371" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-a-meta-forecast.png?w=605&amp;ssl=1 605w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-a-meta-forecast.png?resize=300%2C184&amp;ssl=1 300w" sizes="auto, (max-width: 605px) 100vw, 605px" /></p>
<p>Not surprisingly, the meta forecast closely resembles the essentially flat forecast of candidate 3 (which lies almost halfway between the forecasts of candidates 1 and 2). <strong>But not all cases will be like this</strong>.</p>
<p>If a regression approach to estimating the weights were used, the meta forecast could be quite different from that of candidate 3.</p>
<p>Yes, the meta forecast will lie between the two forecast extremes. But the <strong>assumed or estimated weights will dictate where the meta forecast will lie</strong>.</p>
<h3>Bottom line</h3>
<p>Combining forecasts from equally strong models is intuitively appealing since <strong>each model has its strengths and weaknesses</strong>.</p>
<p><strong> Combining</strong> models’ forecasts in a <strong>complementary fashion</strong> should lead to <strong>more robust and accurate short-run forecasts</strong>.</p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-introduction/" target="_blank" rel="noopener"><strong>Part 1 &#8211; Practical Time Series Forecasting &#8211; Introduction</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-basics/" target="_blank" rel="noopener"><strong>Part 2 &#8211; Practical Time Series Forecasting &#8211; Some Basics</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-useful-models/" target="_blank" rel="noopener"><strong>Part 3 &#8211; Practical Time Series Forecasting &#8211; Potentially Useful Models</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-data-science-taxonomy/" target="_blank" rel="noopener"><strong>Part 4 &#8211; Practical Time Series Forecasting &#8211; Data Science Taxonomy</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener"><strong>Part 5 &#8211; Practical Time Series Forecasting &#8211; Know When to Hold &#8217;em</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-what-makes-a-useful-model/" target="_blank" rel="noopener"><strong>Part 6 &#8211; Practical Time Series Forecasting &#8211; What Makes a Model Useful?</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-deterministic-stochastic-trend/" target="_blank" rel="noopener"><strong>Part 7 &#8211; Practical Time Series Forecasting &#8211; To Difference or Not to Difference</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-times-series-forecasting-rolling-holdout-sample/" target="_blank" rel="noopener"><strong>Part 8 &#8211; Practical Time Series Forecasting &#8211; Know When to Roll &#8217;em</strong></a></p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-meta-models/">Practical Time Series Forecasting – Meta Models</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1331</post-id>	</item>
		<item>
		<title>Practical Time Series Forecasting – Know When to Roll ‘em</title>
		<link>https://www.kddanalytics.com/practical-times-series-forecasting-rolling-holdout-sample/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Mon, 29 Jan 2018 01:33:32 +0000</pubDate>
				<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Forecasting]]></category>
		<category><![CDATA[Time Series]]></category>
		<category><![CDATA[forecast error]]></category>
		<category><![CDATA[holdout sample]]></category>
		<category><![CDATA[rolling analysis]]></category>
		<category><![CDATA[times series]]></category>
		<guid isPermaLink="false">http://www.kddanalytics.com/?p=1322</guid>

					<description><![CDATA[<p>“Prediction is very difficult, especially if it&#8217;s about the future.” ― Niels Bohr, physicist Holdout samples are a key component to estimating a “useful” forecasting model. Set aside data at least equal in length to your forecast horizon (“holdout sample”). Build your models on the remaining data (“modeling sample”). And compare the candidate models’ forecast&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/practical-times-series-forecasting-rolling-holdout-sample/">Practical Time Series Forecasting – Know When to Roll ‘em</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><strong>“</strong><em>Prediction is very difficult, especially if it&#8217;s about the future.</em><strong>”<br />
― <a href="https://en.wikipedia.org/wiki/Niels_Bohr" target="_blank" rel="noopener">Niels Bohr</a></strong>, physicist</p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener"><strong>Holdout samples</strong></a> are a key component to estimating a “useful” forecasting model. <strong>Set aside data at least equal in length to your forecast horizon</strong> (“holdout sample”). Build your models on the remaining data (“modeling sample”). And <strong>compare the candidate models’ forecast performance over the holdout sample.</strong></p>
<p>At a minimum, a single holdout sample should be used.</p>
<p>But to get a <strong>better sense of a model’s future performance, consider using multiple holdout samples</strong>.</p>
<p>This <strong>guards against</strong> basing your model on a <strong>holdout sample</strong> that is <strong>unrepresentative</strong> of the overall characteristics of the time series.</p>
<p>One way to achieve this is to use<strong> “rolling” holdout samples</strong>.</p>
<h3>Rolling analysis</h3>
<p>A <a href="https://link.springer.com/chapter/10.1007%2F978-0-387-32348-0_9" target="_blank" rel="noopener"><strong>rolling analysis</strong></a> of a time series is generally used to test a model’s stability. That is, <strong>are a model’s parameters stable across time</strong> or do they change, especially in a systematic way?</p>
<p>This is important for a forecasting model. We <strong>don’t want</strong> a forecasting model whose <strong>parameters</strong> are <strong>changing during the forecast horizon in an unexpected (i.e. unmodeled) manner.</strong></p>
<p>Suppose our forecast horizon is 6 months.</p>
<p><strong> Under a single holdout sample</strong>, we would <strong>set aside the last 6 months of data as the holdout sample</strong>. Then using the remaining data as the modeling sample, estimate models, forecast over the single holdout sample and compare the models’ performance.</p>
<p>This will help narrow down the pool of candidate models.</p>
<h4>Rolling holdout samples</h4>
<p>But under a rolling holdout approach, also called &#8220;<a href="http://otexts.org/fpp2/accuracy.html" target="_blank" rel="noopener"><strong>time series cross-validation</strong></a>,&#8221;  <strong>we would set aside a longer sample of data</strong>, say, the last 12 months. Then:</p>
<p><strong>Step 1:</strong>  Estimate a model and forecast over the <strong>first</strong> 6 months of this 12-month period (&#8220;roll 1&#8221;);</p>
<p><strong>Step 2:</strong>  Then add 1 month to the tail-end of the estimation sample, recalibrate the model, and forecast over the subsequent 6 months (“roll 2”);</p>
<p><strong>Step 3:</strong>  Then add another month to the estimation sample, recalibrate and forecast over the subsequent 6 months (“roll 3”);</p>
<p><strong>Step 4:</strong>  Repeat until there are no more 6-month periods (&#8220;rolls&#8221;) remaining in the 12-month period.</p>
<p>So, <strong>in this example</strong>, we would have <strong>recalibrated our model 7 times</strong> (each with a modeling sample that is one additional month longer than the previous). And we would have <strong>made 7 forecasts over the rolling holdout periods</strong>.</p>
<p>The <strong>last &#8220;roll</strong>,&#8221; it turns out, <strong>is the same 6-month period</strong> we would have used <strong>under a single 6-month holdout sample case</strong>. So, we generate the stats for a standard single holdout sample during the course of this rolling holdout approach.</p>
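<p>The steps above can be sketched as index ranges. This is a minimal illustration assuming a monthly series of 60 observations; the function name and the series length are hypothetical.</p>

```python
def rolling_holdout_splits(n_obs, reserve=12, horizon=6):
    """Return (modeling_range, holdout_range) index pairs, one per roll.

    The last `reserve` observations are set aside. Each roll adds one
    observation to the modeling sample and forecasts the next `horizon`.
    """
    first_holdout_start = n_obs - reserve
    n_rolls = reserve - horizon + 1
    splits = []
    for roll in range(n_rolls):
        holdout_start = first_holdout_start + roll
        modeling = range(0, holdout_start)
        holdout = range(holdout_start, holdout_start + horizon)
        splits.append((modeling, holdout))
    return splits


splits = rolling_holdout_splits(n_obs=60, reserve=12, horizon=6)
for number, (modeling, holdout) in enumerate(splits, start=1):
    print(f"roll {number}: model on 0..{modeling[-1]}, forecast {holdout[0]}..{holdout[-1]}")
```

<p>With 12 months set aside and a 6-month horizon, this yields 7 rolls, and the final roll's holdout range covers the last 6 observations, matching the single holdout case.</p>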
<p>If we are examining multiple candidate models, this process can generate a lot of data. Below is an example of the rolling forecasts for one model.</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1325 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Rolling-Holdout-Samples.png?resize=561%2C547&#038;ssl=1" alt="Rolling Holdout Samples" width="561" height="547" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Rolling-Holdout-Samples.png?w=561&amp;ssl=1 561w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Rolling-Holdout-Samples.png?resize=300%2C293&amp;ssl=1 300w" sizes="auto, (max-width: 561px) 100vw, 561px" /></p>
<h3>Summary roll statistics</h3>
<p>We could generate a similar chart for every model we are testing. But it is <strong>easier to work with measures of forecast accuracy and bias</strong>, such as <a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener"><strong>MAPE</strong></a> and <a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener"><strong>MPE</strong></a>.</p>
<p>For each roll forecast, we can calculate the MAPE and MPE and observe how they change across the rolling forecasts.</p>
<p>Are the MAPE and MPE constant? Do they fluctuate with no apparent trend? Or do they exhibit some systematic trend?</p>
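<p>As a rough sketch, the roll-by-roll statistics could be computed as below. The sign convention for MPE (actual minus forecast, divided by actual) is an assumption, since conventions differ, and the roll data are made up.</p>

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent (accuracy)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mpe(actual, forecast):
    """Mean percentage error, in percent (bias). Assumes (actual - forecast) / actual."""
    return 100.0 * sum((a - f) / a for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical (actual, forecast) pairs for two rolls of one model.
rolls = [
    ([100.0, 200.0], [110.0, 190.0]),
    ([120.0, 210.0], [118.0, 205.0]),
]

for number, (actual, forecast) in enumerate(rolls, start=1):
    print(f"roll {number}: MAPE={mape(actual, forecast):.2f}%  MPE={mpe(actual, forecast):.2f}%")
```

<p>Plotting these values by roll number for each candidate model gives the kind of chart described next.</p>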
<p>Doing this for every candidate model we are testing generates charts like this which can quickly show any areas of concern:</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1326 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Rolling-Holdout-Samples-MAPE.png?resize=604%2C370&#038;ssl=1" alt="" width="604" height="370" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Rolling-Holdout-Samples-MAPE.png?w=604&amp;ssl=1 604w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Rolling-Holdout-Samples-MAPE.png?resize=300%2C184&amp;ssl=1 300w" sizes="auto, (max-width: 604px) 100vw, 604px" /></p>
<p>In this example, candidate models 18 and 15 may be worth further inspection since their MAPEs are much higher than the rest in a recent roll period (roll 6).</p>
<h3>What else makes a model useful?</h3>
<p>So, with respect to the <strong>guidelines</strong> for whittling down a pool of candidate models we listed in an <strong><a href="https://www.kddanalytics.com/practical-time-series-forecasting-what-makes-a-useful-model/" target="_blank" rel="noopener">earlier article</a></strong>, we can add the following from a rolling holdout analysis:</p>
<p><strong>Stability</strong> – The model’s parameters should retain their statistical significance and not vary too much across the rolling periods, and the model&#8217;s residuals should remain &#8220;<strong>white noise</strong>&#8221; across the rolls.</p>
<p><strong>Consistency of Performance</strong> – The model’s forecast accuracy and bias should not exhibit any strong trends, especially trends in the “wrong” direction (i.e. getting progressively worse) as the more recent time period is approached.</p>
<p><strong>Strong Rolling Holdout Sample Performance</strong> – The model’s forecast accuracy and bias, <strong>averaged across all the rolls</strong>, should be high and low respectively. That is, <strong>both the average MAPE </strong>and<strong> MPE should be low</strong>.</p>
<h3>Benefits of Rolling</h3>
<p>The primary benefit of a rolling analysis is that we get to see <strong>how a model performs</strong> forecast-wise <strong>over multiple time spans</strong> equal in length to our forecast horizon, <strong>instead of relying on performance in just one holdout sample</strong>.</p>
<p>A rolling analysis also <strong>addresses the issue of a short holdout sample</strong> (e.g. short forecast horizon) <strong>possibly not being representative of the general character of the time series</strong>.</p>
<p>In addition, a rolling analysis can be used as a check for the “best” model chosen using a single holdout sample. That is, would you pick the same model using the rolling holdout approach? If not, why?</p>
<p>In sum, <strong>a model that is persistently better at holdout sample forecasting over a longer time frame is likely to be more robust.</strong></p>
<p>So, let ‘em roll!</p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-introduction/" target="_blank" rel="noopener"><strong>Part 1 &#8211; Practical Time Series Forecasting &#8211; Introduction</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-basics/" target="_blank" rel="noopener"><strong>Part 2 &#8211; Practical Time Series Forecasting &#8211; Some Basics</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-useful-models/" target="_blank" rel="noopener"><strong>Part 3 &#8211; Practical Time Series Forecasting &#8211; Potentially Useful Models</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-data-science-taxonomy/" target="_blank" rel="noopener"><strong>Part 4 &#8211; Practical Time Series Forecasting &#8211; Data Science Taxonomy</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener"><strong>Part 5 &#8211; Practical Time Series Forecasting &#8211; Know When to Hold &#8217;em</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-what-makes-a-useful-model/" target="_blank" rel="noopener"><strong>Part 6 &#8211; Practical Time Series Forecasting &#8211; What Makes a Model Useful?</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-deterministic-stochastic-trend/" target="_blank" rel="noopener"><strong>Part 7 &#8211; Practical Time Series Forecasting &#8211; To Difference or Not to Difference</strong></a></p>
<p>The post <a href="https://www.kddanalytics.com/practical-times-series-forecasting-rolling-holdout-sample/">Practical Time Series Forecasting – Know When to Roll ‘em</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1322</post-id>	</item>
		<item>
		<title>Practical Time Series Forecasting – To Difference or Not to Difference</title>
		<link>https://www.kddanalytics.com/practical-time-series-forecasting-deterministic-stochastic-trend-2/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Mon, 22 Jan 2018 01:22:47 +0000</pubDate>
				<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Forecasting]]></category>
		<category><![CDATA[Time Series]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[deterministic]]></category>
		<category><![CDATA[forecast error]]></category>
		<category><![CDATA[forecasting]]></category>
		<category><![CDATA[stochastic]]></category>
		<category><![CDATA[time series]]></category>
		<category><![CDATA[trend]]></category>
		<guid isPermaLink="false">http://www.kddanalytics.com/?p=1348</guid>

					<description><![CDATA[<p>“It is sometimes very difficult to decide whether trend is best modeled as deterministic or stochastic, and the decision is an important part of the science – and art – of building forecasting models.” ― Diebold,  Elements of Forecasting, 1998 A time series can have a very strong trend. Visually, we often can see it. Gross&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-deterministic-stochastic-trend-2/">Practical Time Series Forecasting – To Difference or Not to Difference</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>“<em>It is sometimes very difficult to decide whether trend is best modeled as deterministic or stochastic, and the decision is an important part of the science – and art – of building forecasting models</em>.”<br />
― <strong>Diebold,  Elements of Forecasting, 1998</strong></p>
<p><strong>A time series can have a very strong trend.</strong></p>
<p>Visually, we can often see it, as with gross domestic product (GDP) per person increasing year after year.</p>
<p>When a “<strong>shock</strong>” occurs to the process generating GDP, due to a recession for example, GDP gets <strong>knocked off its long-run growth path</strong>.</p>
<p>But can we expect GDP to bounce back and return to its <strong>original</strong> long-run growth path? Or will it start growing again but along a <strong>different</strong> path?</p>
<p>If the former, then the trend in GDP is said to be “<strong>deterministic</strong>.” And adding TIME to a time series forecasting model is one way to capture this trend.</p>
<p>On the other hand, if GDP starts a new trend after a recession, its trend is said to be “<strong>stochastic</strong>,” driven by random shocks. The standard approach to time series forecast modeling in this case is to “<strong>difference</strong>” the data before modeling.</p>
<p>The challenge as a forecaster is that it is <strong>not always easy to tell if the trend in a time series is deterministic or stochastic</strong>.</p>
<p>And <strong>your answer</strong> and the subsequent modeling choice <strong>will have important implications for the resulting forecast</strong>.</p>
<p><strong>Deterministic vs. stochastic trends</strong></p>
<p>Consider the time series shown below.</p>
<p>Suppose you were <strong>tasked with generating a 2-year forecast</strong> starting December 2003 (at the end of the shown time series history).</p>
<p><strong>Is there a deterministic trend in this series</strong>? That is, do you suspect that the series will bounce back to the trend exhibited before January 2001?</p>
<p><strong>Or</strong> has there been a fundamental change to the process generating this series and a new trend will start (i.e. the <strong>trend is stochastic</strong>)?</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1304 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Deterministic-or-stochastic-trend..png?resize=604%2C371&#038;ssl=1" alt="Deterministic vs stochastic trend" width="604" height="371" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Deterministic-or-stochastic-trend..png?w=604&amp;ssl=1 604w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Deterministic-or-stochastic-trend..png?resize=300%2C184&amp;ssl=1 300w" sizes="auto, (max-width: 604px) 100vw, 604px" /></p>
<p><strong>Deterministic trend</strong></p>
<p>If you opt for a deterministic trend, then your <strong>forecasting model will be in “levels.”</strong> If we are talking about SALES, then it is the value of SALES at any given point in time. So, when we have a deterministic trend, we can model SALES as:</p>
<p style="text-align: center;">SALES<sub>t</sub> = b<sub>0</sub> + b<sub>1</sub>*TIME + ε<sub>t</sub></p>
<p><strong>Of course, we could</strong> <strong>also</strong> account for <strong>seasonality</strong> by adding seasonal dummy variables as well as any <strong>hidden dynamics</strong> (cycles) by modeling the error term ε<sub>t</sub> as an ARMA process. But the key characteristic is the inclusion of a TIME variable (May 1993 = 1, June 1993 = 2, etc.) and possibly TIME<sup>2</sup> and/or TIME<sup>3</sup> depending on the series.</p>
<p><em><span style="color: #60786b;">An ARMA process models SALES as being based on past SALES as well as on unobservable shocks. Such models can include two types of components: An autoregressive (AR) component captures the effect of past SALES on current SALES while a moving average (MA) component captures random shocks to the SALES series. </span> </em></p>
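<p>As a minimal sketch, the basic levels equation (without the seasonal dummies or ARMA error terms) can be fit by ordinary least squares on a TIME index. The SALES figures below are made up and lie exactly on a line so the result is easy to check.</p>

```python
def fit_linear_trend(sales):
    """Least squares fit of SALES = b0 + b1*TIME, with TIME = 1, 2, ..., n."""
    n = len(sales)
    time = list(range(1, n + 1))
    t_mean = sum(time) / n
    y_mean = sum(sales) / n
    b1 = sum((t - t_mean) * (y - y_mean) for t, y in zip(time, sales)) / sum(
        (t - t_mean) ** 2 for t in time
    )
    b0 = y_mean - b1 * t_mean
    return b0, b1


# Hypothetical SALES history: 12 periods growing by 2 per period from a base of 50.
sales = [50.0 + 2.0 * t for t in range(1, 13)]

b0, b1 = fit_linear_trend(sales)
next_time = len(sales) + 1
forecast = b0 + b1 * next_time  # the trend line extended one period ahead
print(b0, b1, forecast)
```

<p>A deterministic trend forecast simply extends this fitted line, which is why it implies a return to the historical growth path.</p>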
<p><strong>Stochastic trend</strong></p>
<p>If you opt for a stochastic trend, then the <strong>standard methodology</strong> is to <strong>difference</strong> your data (to remove the trend) and model the differences. This is known as ARIMA modeling. An ARIMA process is like an ARMA process except that the dynamics of the differenced series are modeled (see <a href="http://people.duke.edu/~rnau/411arim.htm"><strong>here</strong></a>).</p>
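<p>To illustrate differencing, the sketch below takes first differences of a made-up series and forecasts by adding the average difference (the drift) to the last observed value. This is the simplest possible differenced model, with a constant and no AR or MA terms, not a full ARIMA fit.</p>

```python
def first_differences(series):
    """Y(t) - Y(t-1) for each t after the first observation."""
    return [current - previous for previous, current in zip(series, series[1:])]


def drift_forecast(series, horizon):
    """Forecast by adding the average first difference to the last value."""
    diffs = first_differences(series)
    drift = sum(diffs) / len(diffs)
    return [series[-1] + drift * step for step in range(1, horizon + 1)]


y = [10.0, 12.0, 11.0, 15.0, 16.0]  # hypothetical series
print(first_differences(y))  # [2.0, -1.0, 4.0, 1.0]
print(drift_forecast(y, 3))  # [17.5, 19.0, 20.5]
```

<p>Note that the forecast starts from the last observed value, so a shock that moves the series also moves every subsequent forecast. That is the practical meaning of a stochastic trend.</p>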
<p><strong>Forecast differences</strong></p>
<p>The forecast implications of this choice are shown in the following chart. We estimated a deterministic and a stochastic model and generated a forecast from each starting in December 2003. Specifically,</p>
<p style="text-align: center;"><strong>Deterministic Trend Model:</strong>  Y<sub>t</sub> = b<sub>0</sub> + b<sub>1</sub>*TIME + b<sub>2</sub>*AR(1) + b<sub>3</sub>*AR(2) + b<sub>4</sub>*MA(3) + ε<sub>t</sub></p>
<p style="text-align: center;"><strong>Stochastic Trend Model: </strong> Y<sub>t</sub> &#8211; Y<sub>t-1</sub> = b<sub>0</sub> + b<sub>1</sub>*AR(1) + b<sub>2</sub>*AR(3) + ε<sub>t</sub></p>
<p>The forecast based on a <strong>deterministic model</strong> is shown by the <strong>orange line</strong> while the one based on the <strong>stochastic model</strong> is shown by the <strong>gray line</strong>. Also shown is what actually happened to the time series.</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1305 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Deterministic-vs-stochastic-forecast.png?resize=604%2C371&#038;ssl=1" alt="Deterministic vs stochastic forecast" width="604" height="371" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Deterministic-vs-stochastic-forecast.png?w=604&amp;ssl=1 604w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Deterministic-vs-stochastic-forecast.png?resize=300%2C184&amp;ssl=1 300w" sizes="auto, (max-width: 604px) 100vw, 604px" /></p>
<p>Hindsight is 20/20. In this case, the <strong>stochastic model would have been the better choice</strong>.</p>
<p>It does <strong>appear that some fundamental change occurred in the time series generation process</strong>. That is, the time series did not revert to its pre-2001 historical trend (at least during the forecast horizon).</p>
<p>The stochastic model yields a lower forecast error (<a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/"><strong>MAPE</strong></a> = 2.0%) than the deterministic model (<a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/"><strong>MAPE</strong></a> = 5.6%) over the forecast horizon.</p>
<p>But at the time we had to make the forecast, all we had available were data through December 2003.</p>
<p><strong>So, how do we pick between a deterministic and a stochastic forecasting model?</strong></p>
<p><strong>Holdout sample</strong></p>
<p>From a practical perspective, unless we have very strong evidence of a stochastic process, the best course of action is to <strong>use a holdout sample.</strong></p>
<p>Yes, there are techniques for testing whether a time series is “<a href="https://www.otexts.org/fpp/8/1"><strong>stationary</strong></a>” (loosely, has no trend) when it is not visually obvious.</p>
<p>But pragmatically, we are concerned about short-run forecast accuracy. And <strong>one way to compare competing models is by their performance in a holdout sample.</strong></p>
<p>As we discussed in an <a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/"><strong>earlier article</strong></a>, <strong>hold out a period of time at least equal to your forecast horizon</strong> from the data used to estimate a model. In this case, the forecast horizon is 2 years, and we hold out the 3 years from January 2001 through December 2003.</p>
<p>Then build your models on data prior to January 2001 and <strong>compare the models’ forecast performance over the holdout sample</strong>.</p>
<p>In this case, such a holdout sample does not include any data from the strong trend period (pre-May 2001). So, likely a stochastic model would have performed better in the holdout sample as well.</p>
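<p>The holdout comparison described above can be sketched as follows. The two models are deliberately simplified stand-ins (a least squares line on TIME and a drift model on first differences), and the series is made up, so this only illustrates the mechanics.</p>

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def trend_forecast(history, horizon):
    """Deterministic trend: least squares line on TIME = 1..n, extended forward."""
    n = len(history)
    times = range(1, n + 1)
    t_mean = sum(times) / n
    y_mean = sum(history) / n
    b1 = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, history)) / sum(
        (t - t_mean) ** 2 for t in times
    )
    b0 = y_mean - b1 * t_mean
    return [b0 + b1 * (n + step) for step in range(1, horizon + 1)]


def drift_forecast(history, horizon):
    """Stochastic trend: average first difference added to the last value."""
    drift = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + drift * step for step in range(1, horizon + 1)]


def compare_on_holdout(series, horizon, models):
    """Fit each model on the modeling sample and score it on the holdout sample."""
    modeling, holdout = series[:-horizon], series[-horizon:]
    return {name: mape(holdout, model(modeling, horizon)) for name, model in models.items()}


# Hypothetical series: steady growth followed by a flatter stretch.
series = [100.0, 102.0, 104.0, 106.0, 108.0, 110.0, 112.0, 114.0,
          115.0, 115.0, 116.0, 116.0, 117.0, 117.0, 118.0, 118.0]
models = {"deterministic": trend_forecast, "stochastic": drift_forecast}

scores = compare_on_holdout(series, horizon=4, models=models)
best = min(scores, key=scores.get)
print(scores, best)
```

<p>The model with the lowest holdout MAPE is the one carried forward, subject to the other checks on model usefulness.</p>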
<p><strong>But suppose we do this and have two (or more) models that perform equally well in the holdout sample?</strong></p>
<p>We’ll cover this possibility in a subsequent article.</p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-introduction/" target="_blank" rel="noopener"><strong>Part 1 &#8211; Practical Time Series Forecasting &#8211; Introduction</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-basics/" target="_blank" rel="noopener"><strong>Part 2 &#8211; Practical Time Series Forecasting &#8211; Some Basics</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-useful-models/" target="_blank" rel="noopener"><strong>Part 3 &#8211; Practical Time Series Forecasting &#8211; Potentially Useful Models</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-data-science-taxonomy/" target="_blank" rel="noopener"><strong>Part 4 &#8211; Practical Time Series Forecasting &#8211; Data Science Taxonomy</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener"><strong>Part 5 &#8211; Practical Time Series Forecasting &#8211; Know When to Hold &#8217;em</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-what-makes-a-useful-model/" target="_blank" rel="noopener"><strong>Part 6 &#8211; Practical Time Series Forecasting &#8211; What Makes a Model Useful?</strong></a></p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-deterministic-stochastic-trend-2/">Practical Time Series Forecasting – To Difference or Not to Difference</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1348</post-id>	</item>
		<item>
		<title>Practical Time Series Forecasting – What Makes a Model Useful?</title>
		<link>https://www.kddanalytics.com/practical-time-series-forecasting-what-makes-a-useful-model/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Mon, 15 Jan 2018 07:56:19 +0000</pubDate>
				<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Forecasting]]></category>
		<category><![CDATA[Time Series]]></category>
		<category><![CDATA[forecasting]]></category>
		<category><![CDATA[model build process]]></category>
		<category><![CDATA[time series]]></category>
		<category><![CDATA[useful models]]></category>
		<guid isPermaLink="false">http://www.kddanalytics.com/?p=1278</guid>

					<description><![CDATA[<p>“In God we trust. All others must bring data.” ― W. Edwards Deming, statistician So, you have estimated a bunch of forecasting models and realize (kudos to you!) that they are “all wrong” (ala George Box). But your forecasting deadline is looming, and you need to find some useful models on which to base a&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-what-makes-a-useful-model/">Practical Time Series Forecasting – What Makes a Model Useful?</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><strong>“</strong><em>In God we trust. All others must bring data.</em><strong>”<br />
― <a href="https://en.wikipedia.org/wiki/W._Edwards_Deming" target="_blank" rel="noopener">W. Edwards Deming</a></strong>, statistician<strong><br />
</strong></p>
<p>So, you have estimated a bunch of forecasting models and realize (kudos to you!) that they are “all wrong” (à la <strong><a href="https://en.wikipedia.org/wiki/George_E._P._Box/" target="_blank" rel="noopener">George Box</a></strong>).</p>
<p>But your forecasting deadline is looming, and you need to find some <strong><a href="https://www.kddanalytics.com/practical-time-series-forecasting-useful-models/" target="_blank" rel="noopener">useful</a></strong> models on which to base a forecast.</p>
<p>How do you decide which models make it to the next round?</p>
<h3>Model building process</h3>
<p>First, let’s review the forecast model build process:</p>
<p><strong>Step 1:</strong>  Determine the business need;</p>
<p><strong>Step 2:</strong>  Collect and examine your data; clean and adjust (e.g. frequency change) as necessary;</p>
<p><strong>Step 3:</strong>  Determine your forecast horizon (i.e. align with the business need);</p>
<p><strong>Step 4:</strong>  Determine and set aside your holdout sample;</p>
<p><strong>Step 5:</strong>  Estimate models using the non-holdout portion of your time series (i.e. the “modeling sample”);</p>
<p><strong>Step 6:</strong>  Gauge each model’s performance in the holdout sample;</p>
<p><strong>Step 7:</strong>  Recalibrate each model using the full historical sample;</p>
<p><strong>Step 8:</strong>  Make your forecast for the forecast horizon.</p>
<p>At the end of this process, you should have a few models that “pass muster,&#8221; that are <strong>potentially useful models</strong>.</p>
<p>But <strong>how do you whittle down all the models you tried to this select few</strong>?</p>
<h3>Guidelines for selecting useful models</h3>
<p>Here are some guidelines we follow:</p>
<p><strong>Statistically Significant Parameters</strong> – Although one can argue that it is the prediction that matters, we still like to see model coefficients that are statistically significant with signs that can be explained. <strong>You may be asked to defend your model</strong>.</p>
<p><strong>White Noise Residuals</strong> – When you estimate your model using the modeling sample, the <strong>residuals</strong> (difference between the actual and predicted values in the modeling sample) <strong>should have no apparent pattern to them</strong>. That is, there is no additional variation in the time series that can be explained by your model. What is left over is random or “white” noise.</p>
<p><strong>Strong Holdout Sample Performance</strong> – Your model should produce <strong>low forecast error</strong> and exhibit <strong>low systematic bias</strong> in the holdout sample.</p>
<p><strong>Robustness</strong> – When you <strong>recalibrate your model</strong> using the entire historical sample (modeling + holdout sample), your <strong>model should retain its statistical properties</strong>. That is, parameters are still significant with plausible signs and the residuals are still white noise.</p>
<p><strong>Parsimony</strong> – If two models are equal in all performance respects except one is more complex than the other, we generally opt for the simpler model. Experience suggests that <strong>simpler models perform better</strong> when forecasting over the forecast horizon. And they <strong>are easier to interpret and explain</strong> to business decision makers.</p>
<p><strong>Forecast Plausibility</strong> – The forecast produced by your model over the forecast horizon should be consistent with the available knowledge concerning the relevant business environment. In other words, <strong>the forecast needs to make sense</strong>. It is possible, following the steps above, to arrive at a high-performing model which produces a counterintuitive forecast (e.g. declining SALES when the trend in SALES has been nothing but up).</p>
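<p>The white-noise guideline above can be checked roughly by looking at the autocorrelations of the residuals. The sketch below is one minimal way to do that in plain Python, not a prescribed method: the function names are hypothetical, the residuals are made up, and the band of ±1.96/√n is the usual large-sample approximation for a 95% interval.</p>

```python
import math


def autocorrelation(residuals, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((r - mean) ** 2 for r in residuals)
    num = sum((residuals[t] - mean) * (residuals[t - lag] - mean)
              for t in range(lag, n))
    return num / denom


def flag_autocorrelated_lags(residuals, max_lag=12):
    """Return the lags whose autocorrelation falls outside the approximate 95% band."""
    bound = 1.96 / math.sqrt(len(residuals))
    return [lag for lag in range(1, max_lag + 1)
            if abs(autocorrelation(residuals, lag)) > bound]


# Made-up residuals with an obvious alternating pattern (not white noise).
alternating = [1.0, -1.0] * 12
flagged = flag_autocorrelated_lags(alternating, max_lag=4)
print(flagged)
```

<p>An empty list suggests no obvious leftover pattern at the lags examined. A non-empty list, as with the alternating residuals here, suggests the model is leaving predictable variation on the table.</p>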
<p>At the end of this model building and testing process, you may have more than one model that can be used to generate your forecast. In a later article we will address what you can do in this situation.</p>
<h3>The art of forecasting</h3>
<p>Our experience is consistent with the opinion of <a href="https://www.amazon.com/Elements-Forecasting-Diebold-September-Paperback/dp/B014GFR8BI/ref=sr_1_13?ie=UTF8&amp;qid=1512689630&amp;sr=8-13&amp;keywords=diebold+elements+of+forecasting" target="_blank" rel="noopener"><strong>others</strong></a> that there is still quite a bit of “art” to time series forecasting, especially if you want the forecast to meet a specific business need. Automated forecast routines exist, but we recommend that the process be closely <strong>supervised by a human</strong> to ensure a reasonable forecast.</p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-introduction/" target="_blank" rel="noopener"><strong>Part 1 &#8211; Practical Time Series Forecasting &#8211; Introduction</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-basics/" target="_blank" rel="noopener"><strong>Part 2 &#8211; Practical Time Series Forecasting &#8211; Some Basics</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-useful-models/" target="_blank" rel="noopener"><strong>Part 3 &#8211; Practical Time Series Forecasting &#8211; Potentially Useful Models</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-data-science-taxonomy/" target="_blank" rel="noopener"><strong>Part 4 &#8211; Practical Time Series Forecasting &#8211; Data Science Taxonomy</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener"><strong>Part 5 &#8211; Practical Time Series Forecasting &#8211; Know When to Hold &#8217;em</strong></a></p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-what-makes-a-useful-model/">Practical Time Series Forecasting – What Makes a Model Useful?</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1278</post-id>	</item>
		<item>
		<title>Practical Time Series Forecasting – Know When to Hold ‘em</title>
		<link>https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Mon, 08 Jan 2018 01:37:33 +0000</pubDate>
				<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Forecasting]]></category>
		<category><![CDATA[Time Series]]></category>
		<category><![CDATA[forecast bias]]></category>
		<category><![CDATA[forecast error]]></category>
		<category><![CDATA[forecasting]]></category>
		<category><![CDATA[holdout sample]]></category>
		<category><![CDATA[methodology]]></category>
		<guid isPermaLink="false">http://www.kddanalytics.com/?p=1263</guid>

					<description><![CDATA[<p>“The only relevant test of the validity of a hypothesis is comparison of prediction with experience.” ― Milton Friedman, economist Holdout samples are a mainstay of predictive analytics. Set aside a portion of your data (say, 30%). Build your candidate models. Then “internally validate” your models using the holdout sample. More sophisticated methods like cross&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/">Practical Time Series Forecasting – Know When to Hold ‘em</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>“<em>The only relevant test of the validity of a hypothesis is comparison of prediction with experience.</em>”<br />
― <strong>Milton Friedman, economist</strong></p>
<p><strong>Holdout samples</strong> are a mainstay of predictive analytics.</p>
<p>Set aside a portion of your data (say, 30%). Build your <a href="https://www.kddanalytics.com/practical-time-series-forecasting-useful-models/" target="_blank" rel="noopener"><strong>candidate models</strong></a>. Then “<strong>internally validate</strong>” your models using the holdout sample.</p>
<p>More sophisticated methods like <a href="https://en.wikipedia.org/wiki/Cross-validation_(statistics)"><strong>cross-validation</strong></a> use multiple holdout samples. But the idea is to <strong>see how well your models predict using data the models have not “seen” before</strong>. Then go back and fine-tune to improve the models&#8217; predictive accuracy.</p>
<h3>Time series holdout samples</h3>
<p>The <strong>truest test of your models</strong> is when they are applied to “new” data. Data from a fresh marketing campaign, a new set of customers, a more recent time period (“<strong>external validation</strong>”).</p>
<p>But you may not have access to such data when building your models. You certainly will not have access to future data.</p>
<p>So, a <strong>holdout sample needs to be crafted from the historical data at your disposal</strong>.</p>
<p>When building predictive models for, say, a marketing campaign or for loan risk scoring, there is usually a large amount of data to work with. So, holding out a sample for testing still leaves lots of data for model building.</p>
<p>However, the situation can be much different when working with time series data.</p>
<p>Depending on the frequency of the series, the <strong>number of data points available to work with can be limited</strong>. Fifty years of annual data is just 50 data points. Five years of monthly data is just 60 data points.</p>
<p>Obviously the greater the frequency of data, the greater the number of data points available to work with…5 years of daily data is 1,825 data points. But these time series sample sizes usually pale against the large customer sets used to fuel marketing campaigns, which can run into the hundreds of thousands.</p>
<p>So, does this mean that holdout samples shouldn’t be used to test time series forecasting models?</p>
<p><strong>Absolutely not!</strong></p>
<p>You still <strong>need a way to</strong> <strong>whittle down your candidate models</strong>. You just need to be careful in how you select and use your holdout sample.</p>
<h3>Holdout sample length</h3>
<p>How much data should you set aside for a holdout sample? The <strong>rule of thumb</strong> we go by is to choose a holdout sample length that is <strong>at least</strong> (a) <strong>equal to the length of your forecast horizon</strong> or (b) <strong>equal to the length of time needed for your business to make a change</strong>.</p>
<p>Suppose you need a 12-month forecast to support a business plan. And you wish to forecast monthly sales for the 12 months starting November 1, 2017.</p>
<p>Then, your holdout sample should be at least the 12 months pertaining to November 2016 through October 2017. And your estimation sample should be all months prior to November 2016.</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1267 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-Holdout-Sample-1.png?resize=618%2C385&#038;ssl=1" alt="Using a holdout sample for time series forecasting" width="618" height="385" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-Holdout-Sample-1.png?w=618&amp;ssl=1 618w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-Holdout-Sample-1.png?resize=300%2C187&amp;ssl=1 300w" sizes="auto, (max-width: 618px) 100vw, 618px" /></p>
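<p>The split described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a definitive implementation: the helper names are hypothetical, the SALES values are made up, and the history is assumed to start in November 2013. The point is that the split is chronological, with no random shuffling.</p>

```python
from datetime import date


def month_range(start, end):
    """List the first day of every month from start to end, inclusive."""
    count = (end.year - start.year) * 12 + (end.month - start.month) + 1
    months = []
    for i in range(count):
        total = start.month - 1 + i
        months.append(date(start.year + total // 12, total % 12 + 1, 1))
    return months


def split_holdout(observations, holdout_start):
    """Split (month, value) pairs chronologically into estimation and holdout samples."""
    estimation = [(m, v) for m, v in observations if holdout_start > m]
    holdout = [(m, v) for m, v in observations if m >= holdout_start]
    return estimation, holdout


# Assumed history: November 2013 through October 2017, with made-up SALES values.
months = month_range(date(2013, 11, 1), date(2017, 10, 1))
sales = [100.0 + i for i in range(len(months))]

estimation, holdout = split_holdout(list(zip(months, sales)), date(2016, 11, 1))
print(len(estimation), len(holdout))
```

<p>Here the holdout sample covers November 2016 through October 2017, and everything earlier is left for estimation.</p>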
<p>Remember, the <strong>time series methods we are addressing are best used for short-run forecasting</strong>. Most business forecasting needs are for short-run forecasts. The next few months or few years. Not the next 5 to 10 years.</p>
<p>Alternatively, suppose your business only needs 8 months to make a change (maybe it is getting more salespeople on line). Then your holdout sample should be at least 8 months.</p>
<h3>Holdout sample performance</h3>
<p>Once you estimate a model, you apply it to the holdout sample to see how well it predicts. There are several <strong>measures</strong> you can use to gauge <strong>how well your model performs</strong>. We focus on measures of <strong>accuracy</strong> and <strong>bias</strong>.</p>
<h4>To measure forecast accuracy:</h4>
<p><strong>If the business cost of a forecast error is high</strong>, then the <a href="https://en.wikipedia.org/wiki/Mean_squared_error"><strong>Mean Square Error</strong></a> (MSE) or <a href="https://en.wikipedia.org/wiki/Root-mean-square_deviation"><strong>Root Mean Square Error</strong></a> (RMSE) is a natural choice: because forecast errors are squared, large errors are magnified. MSE is the average of (predicted – actual)<sup>2</sup>, and RMSE is its square root.</p>
<p><strong>If the business cost of a forecast error is average</strong>, then the <a href="https://en.wikipedia.org/wiki/Mean_absolute_percentage_error"><strong>Mean Absolute Percent Error</strong></a> (MAPE) can be used. MAPE is simply the average of the absolute value of [(predicted – actual)/actual]. However, care should be taken if actual values of “0” are possible, as MAPE would then be undefined.</p>
<p>See <a href="http://otexts.org/fpp2/accuracy.html" target="_blank" rel="noopener"><strong>here</strong></a> for a discussion of forecast accuracy measures.</p>
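<p>The accuracy measures above are simple enough to compute by hand. The following is a minimal sketch in plain Python, using the definitions given in this article; the function names and the sample numbers are made up for illustration.</p>

```python
import math


def mse(actual, predicted):
    """Mean Square Error: average of (predicted - actual) squared."""
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def mape(actual, predicted):
    """Mean Absolute Percent Error, returned as a fraction (0.05 means 5%)."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    return sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up holdout sample: three actual values and the model's predictions.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print(mse(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

<p>Note that MSE and RMSE are expressed in the (squared) units of the series, while MAPE is unit-free, which makes it easier to compare across series of different scale.</p>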
<h4>To measure forecast bias:</h4>
<p>The <a href="https://en.wikipedia.org/wiki/Mean_percentage_error"><strong>Mean Percent Error</strong></a> (MPE) will indicate if there is a <strong>systematic bias to the forecast</strong>. If positive, then the model is overpredicting; if negative, it is underpredicting. And the further from 0, the greater the bias. MPE is the average of [(predicted – actual)/actual].</p>
<p>An alternative measure is <strong>Theil’s measure of systematic error</strong>, the “bias-proportion” of Theil’s <a href="http://www.eviews.com/help/helpintro.html#page/content%2FForecast-Forecast_Basics.html%23" target="_blank" rel="noopener"><strong>inequality coefficient</strong></a>. This measures the extent to which the average of the forecasted values deviates from the average of the actual values. The larger the value, the greater the systematic bias.</p>
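<p>Both bias measures can be sketched in plain Python. MPE follows the definition given above. For the bias proportion, the sketch assumes the common formulation in which the squared gap between the mean forecast and the mean actual is divided by the MSE; the function names and numbers are hypothetical.</p>

```python
def mpe(actual, predicted):
    """Mean Percent Error as a fraction; positive means overpredicting on average."""
    return sum((p - a) / a for a, p in zip(actual, predicted)) / len(actual)


def theil_bias_proportion(actual, predicted):
    """Share of the mean square error due to the gap between the two means."""
    n = len(actual)
    mean_gap = sum(predicted) / n - sum(actual) / n
    mean_sq_error = sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n
    if mean_sq_error == 0:
        return 0.0  # assumption: a perfect forecast is treated as having no bias
    return mean_gap ** 2 / mean_sq_error


actual = [100.0, 200.0, 400.0]
always_high = [110.0, 210.0, 410.0]   # every prediction is 10 too high
mixed = [110.0, 190.0, 400.0]         # misses cancel out on average

print(mpe(actual, always_high), theil_bias_proportion(actual, always_high))
print(mpe(actual, mixed), theil_bias_proportion(actual, mixed))
```

<p>When every prediction misses by the same amount in the same direction, the bias proportion is close to 1 (all of the error is systematic). When the misses offset one another, it is close to 0.</p>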
<p><strong>In general, in the holdout sample, a good performing model will exhibit low overall error (high accuracy) and low systematic bias</strong>.</p>
<p>The chart below shows an example of such a model using a 5-month holdout sample. On average, the model’s error is between 0.28% and 1.85% while exhibiting a very small positive bias of 0.10%.</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1268 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-Holdout-Sample-2.png?resize=618%2C385&#038;ssl=1" alt="Example of holdout sample performance" width="618" height="385" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-Holdout-Sample-2.png?w=618&amp;ssl=1 618w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-Holdout-Sample-2.png?resize=300%2C187&amp;ssl=1 300w" sizes="auto, (max-width: 618px) 100vw, 618px" /></p>
<p>Note that <strong>there is no absolute criterion for what constitutes a “low” error</strong> on a measure such as MSE.</p>
<p><strong>Measures of forecast error</strong> are to be <strong>judged relative to the context of the forecast</strong> you are making. In some cases, your models may average an error in the 30% range; in others, it could be in the single digits.</p>
<h3>Length of estimation sample</h3>
<p>A related issue is <strong>how much data to use for model estimation</strong>.</p>
<p><strong>Often, there is not a choice</strong>. After setting aside a holdout sample, there may be just a bare minimum amount of data left for modeling (i.e. you need more data points than there are model parameters to estimate).</p>
<p>In general, the <strong>fewer</strong> the <strong>number of model parameters</strong> and the <strong>less &#8220;noisy&#8221;</strong> the data (i.e. less random), the <strong>fewer the number of data points <a href="http://otexts.org/fpp2/short-ts.html" target="_blank" rel="noopener">needed</a></strong>. Typically, though, <strong>we look for at least 40 data points.</strong></p>
<p>If you have a <strong>high frequency time series</strong> (monthly, daily, hourly) you may have room to consider whether the <strong>choice of the estimation sample length can affect model performance</strong>.</p>
<p><strong>One can argue that the modeling sample should be reflective of the characteristics of the forecast horizon</strong>. That is, the next year, say, is more likely to resemble the past several years than 20 years ago. So, <strong>limit the estimation sample to more recent years</strong>.</p>
<p>Consider the time series shown below. Clearly the time path of this series has not been consistent. Rather than estimating a model using the entire historical sample, maybe limit it to the more recent period.</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1206 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Low-variation-time-series.png?resize=615%2C386&#038;ssl=1" alt="Low variation time series" width="615" height="386" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Low-variation-time-series.png?w=615&amp;ssl=1 615w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Low-variation-time-series.png?resize=300%2C188&amp;ssl=1 300w" sizes="auto, (max-width: 615px) 100vw, 615px" /></p>
<p>The <strong>trade-off</strong> is that there is <strong>less experiential history upon which to base a model</strong>. Maybe the dynamics associated with that turning point in early 2000 and subsequent recovery could prove to be fertile ground for training your model.</p>
<p><strong>But this is a testable proposition!</strong></p>
<p>Because you have already set aside a holdout sample, <strong>you can test whether a model estimated on the full (non-holdout) sample performs better in the holdout sample than one based on a more recent sample.</strong></p>
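<p>One way to run that test is sketched below. It is an illustration only: the series is made up (flat, then trending), the model is a plain least squares TIME trend, the helper names are hypothetical, and holdout MAPE is used as the yardstick. The example is constructed so that the recent sample wins; with real data the result can go either way, which is exactly why it is worth testing.</p>

```python
def fit_trend(values, first_t):
    """Least squares fit of value = a + b * t for t = first_t, first_t + 1, ..."""
    n = len(values)
    ts = [first_t + i for i in range(n)]
    t_mean = sum(ts) / n
    v_mean = sum(values) / n
    slope = (sum((t - t_mean) * (v - v_mean) for t, v in zip(ts, values))
             / sum((t - t_mean) ** 2 for t in ts))
    return v_mean - slope * t_mean, slope


def holdout_mape(estimation, holdout, first_t):
    """Fit the trend on the estimation sample and score it on the holdout sample."""
    intercept, slope = fit_trend(estimation, first_t)
    start = first_t + len(estimation)
    preds = [intercept + slope * (start + i) for i in range(len(holdout))]
    return sum(abs((p - a) / a) for a, p in zip(holdout, preds)) / len(holdout)


# Made-up series: flat at 100 for 24 periods, then rising by 2 per period.
series = [100.0] * 24 + [100.0 + 2.0 * i for i in range(1, 25)]
estimation, holdout = series[:-6], series[-6:]

full_mape = holdout_mape(estimation, holdout, first_t=0)
recent = estimation[-18:]
recent_mape = holdout_mape(recent, holdout, first_t=len(estimation) - 18)

print(full_mape, recent_mape)
```

<p>Both candidates are scored on the same holdout sample, so the comparison is like for like. Only the estimation window differs.</p>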
<h3>Data frequency compression</h3>
<p>Another use for a holdout sample is to test whether changes to the frequency of the time series will improve predictive accuracy.</p>
<p><strong>The frequency of the time series could be reduced to help match a desired forecast horizon</strong>. For example, suppose management wants a 3-year forecast. And you are working with monthly SALES. Yes, you could produce a 36-period (month) forecast. But that might be pushing the limits of your methodology, especially if there is not a strong trend.</p>
<p>Alternatively, by converting to a quarterly series, you would lessen the variability in your data and forecast only 12 periods. <strong>This might yield a more accurate forecast</strong>.</p>
<p><strong>But again, this is testable using a holdout sample!</strong></p>
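<p>Compressing monthly data to quarterly data is a small transformation. The sketch below assumes SALES is a flow that should be summed within each quarter (a stock, such as inventory, would more naturally be averaged or sampled at quarter end) and that the series starts on a quarter boundary. The function name and values are hypothetical.</p>

```python
def monthly_to_quarterly(monthly):
    """Sum each complete block of three months into one quarterly total."""
    usable = len(monthly) - len(monthly) % 3  # drop a trailing partial quarter
    return [sum(monthly[i:i + 3]) for i in range(0, usable, 3)]


# Made-up monthly SALES for one year.
monthly_sales = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0,
                 16.0, 18.0, 17.0, 19.0, 21.0, 20.0]
quarterly_sales = monthly_to_quarterly(monthly_sales)
print(quarterly_sales)
```

<p>To compare the two approaches fairly in the holdout sample, the monthly model's forecasts can be summed to quarters with the same function, so that both sets of errors are measured at the quarterly level.</p>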
<h3>Bottom line</h3>
<p><strong>Holdout samples are a critical component</strong> of a time series forecasting methodology.</p>
<p>In a later article we will address using <strong>multiple</strong> holdout samples…to help guard against basing a model on a single, unrepresentative holdout sample (i.e. we found a great model just because we got lucky!).</p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-introduction/" target="_blank" rel="noopener"><strong>Part 1 &#8211; Practical Time Series Forecasting &#8211; Introduction</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-basics/" target="_blank" rel="noopener"><strong>Part 2 &#8211; Practical Time Series Forecasting &#8211; Some Basics</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-useful-models/" target="_blank" rel="noopener"><strong>Part 3 &#8211; Practical Time Series Forecasting &#8211; Potentially Useful Models</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-data-science-taxonomy/" target="_blank" rel="noopener"><strong>Part 4 &#8211; Practical Time Series Forecasting &#8211; Data Science Taxonomy</strong></a></p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/">Practical Time Series Forecasting – Know When to Hold ‘em</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1263</post-id>	</item>
		<item>
		<title>Practical Time Series Forecasting &#8211; Data Science Taxonomy</title>
		<link>https://www.kddanalytics.com/practical-time-series-forecasting-data-science-taxonomy/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Tue, 02 Jan 2018 12:26:19 +0000</pubDate>
				<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Forecasting]]></category>
		<category><![CDATA[Time Series]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[time series]]></category>
		<guid isPermaLink="false">http://www.kddanalytics.com/?p=1229</guid>

					<description><![CDATA[<p>“Big data is not about the data.*” ― Gary King, Harvard University (*It&#8217;s about the analytics.) Machine Learning. Deep Learning. Data Science. Artificial Intelligence. Big Data. Not a day goes by that one or all of these buzzwords stream past in our business news feeds. Data analytics has become mainstream. And you better jump on&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-data-science-taxonomy/">Practical Time Series Forecasting &#8211; Data Science Taxonomy</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>“Big data is not about the data.*”<br />
― <strong>Gary King, Harvard University</strong></p>
<p>(*<strong><a href="https://www.slideshare.net/BernardMarr/big-data-best-quotes/3-Big_data_is_notabout_the" target="_blank" rel="noopener">It&#8217;s about the analytics</a></strong>.)</p>
<p><strong>Machine Learning</strong>. <strong>Deep Learning</strong>. <strong>Data Science</strong>. <strong>Artificial Intelligence</strong>. <strong>Big Data</strong>.</p>
<p>Not a day goes by that one or all of these buzzwords stream past in our business news feeds.</p>
<p><strong>Data analytics has become mainstream</strong>. And you better jump on board or risk being left at the station!</p>
<p>Just within the last year or so, <strong>searches</strong> of these topics have taken off. In fact, according to Google, in early 2017, search interest in one of these topics, <strong>machine learning, has eclipsed that of big data</strong>:</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="aligncenter wp-image-1230 size-large" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Google-Search-Machine-Learning-11_11_2012-to-11_11_2017.png?resize=1024%2C329&#038;ssl=1" alt="Google Search Machine Learning" width="1024" height="329" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Google-Search-Machine-Learning-11_11_2012-to-11_11_2017.png?resize=1024%2C329&amp;ssl=1 1024w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Google-Search-Machine-Learning-11_11_2012-to-11_11_2017.png?resize=300%2C96&amp;ssl=1 300w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Google-Search-Machine-Learning-11_11_2012-to-11_11_2017.png?resize=768%2C247&amp;ssl=1 768w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Google-Search-Machine-Learning-11_11_2012-to-11_11_2017.png?w=1233&amp;ssl=1 1233w" sizes="auto, (max-width: 1000px) 100vw, 1000px" /></p>
<p>So, how do <strong>time series methods for forecasting</strong> fit into the taxonomy that currently defines the data science field?</p>
<h3>Data science taxonomy</h3>
<p>Key data science terms that are related to time series methods for forecasting are <strong><a href="https://www.datasciencecentral.com/profiles/blogs/data-mining-what-why-when">data mining</a></strong>, <a href="https://www.datasciencecentral.com/profiles/blogs/18-great-articles-about-predictive-analytics"><strong>predictive analytics</strong></a>, <a href="https://www.datasciencecentral.com/profiles/blogs/machine-learning-summarized-in-one-picture"><strong>machine learning</strong></a> (supervised and unsupervised), <a href="https://en.wikipedia.org/wiki/Linear_regression"><strong>regression</strong></a>, <strong>structured</strong> and <a href="https://en.wikipedia.org/wiki/Unstructured_data"><strong>unstructured</strong></a> data.</p>
<p>These are not necessarily mutually exclusive. At the risk of incurring the wrath of the data science gods, <strong>here is our simplification</strong>:</p>
<h4>Structured vs. unstructured data</h4>
<p>Structured data are organized into “rows and columns” (spreadsheet); unstructured data are not (text in a book).</p>
<p style="text-align: center;"><span style="color: #60786b;"><strong>Time series methods use structured data</strong>.</span></p>
<h4>Data mining</h4>
<p>Data mining seeks to find patterns in data, whether structured or unstructured.</p>
<p style="text-align: center;"><span style="color: #60786b;"><strong>Time series methods seek to find patterns that repeat over time</strong>.</span></p>
<h4>Predictive analytics</h4>
<p>Predictive analytics seeks to find a relationship between a variable of interest (e.g. customer churn) and multiple dimensions (e.g. age, length of contract, zip code). These dimensions can be used to predict the likelihood of a customer churning (in our example).</p>
<p>Typically, predictive analytics is not based on time series data but &#8220;cross-sectional&#8221; data like a customer set. Additionally, time series methods use only a very limited set of dimensions, the primary one being past behavior of the variable being forecasted (e.g. sales).</p>
<p style="text-align: center;"><span style="color: #60786b;"><strong>Time series methods typically use the past behavior of the variable being forecasted as the primary dimension.</strong></span></p>
<h4>Machine learning</h4>
<p>Machine learning means that a computer is using a program (algorithm) to “connect the dots” in the data. <strong>If you run a regression model in Excel you are engaging in machine learning.</strong></p>
<p>However, <span style="text-decoration: underline;">supervised</span> machine learning does not mean you are keeping watch over Excel as it does its stuff!</p>
<div id="attachment_1232" style="width: 310px" class="wp-caption alignright"><img data-recalc-dims="1" loading="lazy" decoding="async" aria-describedby="caption-attachment-1232" class="wp-image-1232 size-medium" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/46961317_s.jpg?resize=300%2C200&#038;ssl=1" alt="supervised machine learning?" width="300" height="200" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/46961317_s.jpg?resize=300%2C200&amp;ssl=1 300w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/46961317_s.jpg?w=450&amp;ssl=1 450w" sizes="auto, (max-width: 300px) 100vw, 300px" /><p id="caption-attachment-1232" class="wp-caption-text">This is NOT what &#8220;supervised&#8221; machine learning means!</p></div>
<p><strong>Supervised machine learning means</strong> that the computer is seeking to find a relationship between a single variable (e.g. churn) and many dimensional variables (e.g. age, length of contract, zip code).</p>
<p><strong>Unsupervised machine learning</strong> <strong>means</strong> that the computer is seeking to find a relationship between many dimensions (e.g. age, length of contract, zip code) so that customers can, for example, be clustered into a small number of groups or tribes with similar characteristics.</p>
<p style="text-align: center;"><span style="color: #60786b;"><strong>Time series methods are a type of supervised machine learning since they attempt to find a relationship between present and past behavior</strong>.</span></p>
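<p>To make the "supervised" framing concrete, a time series can be rearranged into pairs of past values (the dimensions) and the current value (the variable to predict). The sketch below is one simple way to do that; the function name, the number of lags and the SALES values are all made up for illustration.</p>

```python
def make_lagged_pairs(series, n_lags):
    """Turn a series into (past values, current value) training pairs."""
    features, targets = [], []
    for t in range(n_lags, len(series)):
        features.append(series[t - n_lags:t])  # the n_lags values before period t
        targets.append(series[t])              # the value to be predicted
    return features, targets


sales = [10, 12, 11, 13, 15]
X, y = make_lagged_pairs(sales, n_lags=2)
print(X)
print(y)
```

<p>Each row of <code>X</code> plays the role that age, length of contract and zip code play in the churn example, and <code>y</code> plays the role of churn.</p>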
<h4>Regression</h4>
<p>Regression is one way a machine finds relationships between a single variable and a few (or many) dimensional variables or past values of the variable itself. There are several flavors of regression.</p>
<p style="text-align: center;"><span style="color: #60786b;"><strong> Time series models typically use <a style="color: #60786b;" href="https://en.wikipedia.org/wiki/Least_squares">least squares</a> regression or <a style="color: #60786b;" href="https://en.wikipedia.org/wiki/Maximum_likelihood_estimation">maximum likelihood</a></strong>.</span></p>
<h3>Bottom line</h3>
<p>So, when you use time series methods for forecasting you are probably <strong>mining structured data using supervised, regression- or maximum likelihood-based, machine learning</strong>.</p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-introduction/" target="_blank" rel="noopener"><strong>Part 1 &#8211; Practical Time Series Forecasting &#8211; Introduction</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-basics/" target="_blank" rel="noopener"><strong>Part 2 &#8211; Practical Time Series Forecasting &#8211; Some Basics</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-useful-models/" target="_blank" rel="noopener"><strong>Part 3 &#8211; Practical Time Series Forecasting &#8211; Potentially Useful Models</strong></a></p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-data-science-taxonomy/">Practical Time Series Forecasting &#8211; Data Science Taxonomy</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1229</post-id>	</item>
		<item>
		<title>Practical Time Series Forecasting &#8211; Potentially Useful Models</title>
		<link>https://www.kddanalytics.com/practical-time-series-forecasting-useful-models/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Mon, 18 Dec 2017 08:00:05 +0000</pubDate>
				<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Forecasting]]></category>
		<category><![CDATA[Time Series]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[forecasting]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[time series]]></category>
		<guid isPermaLink="false">http://www.kddanalytics.com/?p=1245</guid>

					<description><![CDATA[<p>“All models are wrong, but some are useful.” ― attributed to statistician George Box This quote pretty well sums up time series forecasting models. Any given model is unlikely to be spot on. And some can be wildly off. But through a careful methodical process, we can whittle the pool of candidate models down to&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-useful-models/">Practical Time Series Forecasting &#8211; Potentially Useful Models</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>“<em>All models are wrong, but some are useful.</em>”<br />
― attributed to statistician <a href="https://en.wikipedia.org/wiki/All_models_are_wrong" target="_blank" rel="noopener"><strong>George Box</strong></a></p>
<p>This quote pretty well sums up time series forecasting models.</p>
<p><strong>Any given model is unlikely to be spot on. And some can be wildly off.</strong></p>
<p>But through a careful methodical process, we can <strong>whittle</strong> the pool of candidate models <strong>down</strong> <strong>to a set of useful models,</strong> if not a single preferred model.</p>
<p>When all is said and done, though, our guiding principle when building forecasting models is…<strong>how well the model predicts</strong>!</p>
<p>In practice, what this means for the types of models we consider is that <strong>we don’t rule anything out</strong>.</p>
<p>Yes, we have specific things we look for in an acceptable model (which we will cover later). But we don’t rule out a simple TIME trend model simply because it is too “simple.”</p>
<p>Our focus is on finding a forecasting model that can yield <strong>defensible short-run forecasts in a cost-effective manner</strong>.</p>
<h3>Potentially useful models</h3>
<p>So what kind of models do we typically examine?</p>
<p>As discussed in a <strong><a href="https://www.kddanalytics.com/practical-time-series-forecasting-basics/" target="_blank" rel="noopener">previous article</a></strong>, a time series such as monthly sales (SALES) can have 3 components: <strong>trend, seasonal and cyclical</strong>. So, the type of model we consider depends on the extent to which 1, 2 or all 3 of these dynamics are present.</p>
<p>There are 3 classes of models that we typically consider. We will use a bit of math here to describe these models…think back to the formula of a line you learned in algebra: Y = a + bX.</p>
<h4>Regression models</h4>
<p>First are <strong><a href="https://en.wikipedia.org/wiki/Linear_regression">least squares regression</a></strong> models. Using SALES as our example, we could have a TIME trend model with, say, quarterly seasonality if we were examining SALES by quarter:</p>
<p style="text-align: center;">SALES<sub>t</sub> = b<sub>0</sub> + b<sub>1</sub>*TIME + b<sub>2</sub>*Q1 + b<sub>3</sub>*Q2 + b<sub>4</sub>*Q3 + ε<sub>t</sub></p>
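<p>To make the formula concrete, here is a minimal sketch of fitting this TIME trend model with quarterly indicators by least squares. The data are hypothetical and noise-free, the helper names (solve, least_squares, design_row) are our own for illustration, and Q4 is assumed to be the omitted baseline quarter.</p>

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def design_row(t):
    """Regressors for period t (t = 0 is a first quarter): 1, TIME, Q1, Q2, Q3."""
    quarter = t % 4 + 1
    return [1.0, float(t), float(quarter == 1), float(quarter == 2), float(quarter == 3)]


# Hypothetical quarterly SALES built from known coefficients b0..b4,
# so the fit should recover values close to them.
true_b = [100.0, 2.0, -5.0, 3.0, 8.0]
rows = [design_row(t) for t in range(16)]
sales = [sum(b * x for b, x in zip(true_b, row)) for row in rows]

b_hat = least_squares(rows, sales)
print([round(b, 6) for b in b_hat])
```

<p>With real data the fitted coefficients will not match any &#8220;true&#8221; values exactly, and a statistics package would normally replace the hand-rolled solver.</p>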
<p>Or a lagged least squares model with quarterly seasonality:</p>
<p style="text-align: center;">SALES<sub>t</sub> = b<sub>0</sub> + b<sub>1</sub>*SALES<sub>t-1</sub> + b<sub>2</sub>*SALES<sub>t-2</sub> + b<sub>3</sub>*Q1 + b<sub>4</sub>*Q2 +b<sub>5</sub>*Q3 +ε<sub>t</sub></p>
<p><span style="color: #60786b;"><em>In these model formulae, b<sub>0</sub> is the &#8220;intercept.&#8221; b<sub>1</sub>, b<sub>2</sub>,…etc. indicate the incremental effect (i.e. slope) on sales of a change in the value of a “right hand side” variable. ε<sub>t</sub> is “residual” SALES, what is left “unexplained” by the model. And t is the time period, whether it is months, quarters, years, etc.</em></span></p>
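<p>One practical consequence of the lagged model is that forecasts beyond one period ahead must be built recursively, since each forecast becomes an input to the next. A minimal sketch, assuming hypothetical coefficient values and a function name of our own choosing:</p>

```python
def forecast_lagged_model(history, next_quarter, coefs, steps):
    """Recursive forecasts from
    SALES_t = b0 + b1*SALES_{t-1} + b2*SALES_{t-2} + b3*Q1 + b4*Q2 + b5*Q3.
    """
    b0, b1, b2, b3, b4, b5 = coefs
    seasonal = {1: b3, 2: b4, 3: b5, 4: 0.0}  # Q4 is the omitted baseline
    values = list(history)
    forecasts = []
    quarter = next_quarter
    for _ in range(steps):
        y_hat = b0 + b1 * values[-1] + b2 * values[-2] + seasonal[quarter]
        forecasts.append(y_hat)
        values.append(y_hat)      # the forecast feeds the next period's lags
        quarter = quarter % 4 + 1
    return forecasts


coefs = (10.0, 0.5, 0.25, -2.0, 1.0, 3.0)   # hypothetical b0..b5
history = [100.0, 104.0]                    # the two most recent SALES values
print(forecast_lagged_model(history, next_quarter=1, coefs=coefs, steps=3))
# [85.0, 79.5, 74.0]
```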
<h4>ARMA models</h4>
<p>The second class consists of ARMA models.</p>
<p>An <a href="https://en.wikipedia.org/wiki/Autoregressive%E2%80%93moving-average_model" target="_blank" rel="noopener"><strong>ARMA process</strong></a> models SALES as being based on past SALES as well as on unobservable shocks to SALES over time. Such models can include two types of components:</p>
<p>An <strong>autoregressive (AR)</strong> component captures the effect of past SALES on current SALES while a <strong>moving average (MA)</strong> component captures random shocks to the SALES series. These are typically estimated using a <a href="https://en.wikipedia.org/wiki/Maximum_likelihood_estimation"><strong>maximum likelihood</strong></a> technique.</p>
<p>We could have a model that is a <strong>pure ARMA</strong> model, for example:</p>
<p style="text-align: center;">SALES<sub>t</sub> = b<sub>0</sub> + b<sub>1</sub>*AR(1) + b<sub>2</sub>*AR(2) + b<sub>3</sub>*MA(1) +ε<sub>t</sub></p>
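<p>To see what such a process looks like, the sketch below simulates an ARMA(2,1) series. It assumes the common textbook reading of the shorthand above, in which the AR(1) and AR(2) terms are the first and second lags of SALES and the MA(1) term is last period&#8217;s shock. The parameter names (c, phi1, phi2, theta1) and all numeric values are hypothetical.</p>

```python
import random


def simulate_arma21(shocks, c, phi1, phi2, theta1):
    """y_t = c + phi1*y_{t-1} + phi2*y_{t-2} + e_t + theta1*e_{t-1}."""
    y = [0.0, 0.0]        # hypothetical start-up values, dropped on return
    e_prev = 0.0
    for e in shocks:
        y.append(c + phi1 * y[-1] + phi2 * y[-2] + e + theta1 * e_prev)
        e_prev = e
    return y[2:]


rng = random.Random(0)    # fixed seed so the run is repeatable
shocks = [rng.gauss(0.0, 1.0) for _ in range(200)]
series = simulate_arma21(shocks, c=5.0, phi1=0.5, phi2=0.2, theta1=0.4)

# With these values the series fluctuates around c / (1 - phi1 - phi2).
print(round(5.0 / (1 - 0.5 - 0.2), 2), round(sum(series[50:]) / len(series[50:]), 2))
```

<p>In practice the coefficients are unknown and are estimated from the data, for instance by maximum likelihood as noted above.</p>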
<p>Or a <strong>mixed regression-ARMA</strong> model, sometimes called &#8220;regression with ARMA errors,&#8221; like this:</p>
<p style="text-align: center;">SALES<sub>t</sub> = b<sub>0</sub> + b<sub>1</sub>*TIME + b<sub>2</sub>*Q1 + b<sub>3</sub>*Q2 + b<sub>4</sub>*Q3 + b<sub>5</sub>*AR(1) + b<sub>6</sub>*MA(1) + ε<sub>t</sub></p>
<h4>ARIMA models</h4>
<p>A third class of models is related to the ARMA models above: <strong>ARIMA</strong>. According to standard <a href="https://en.wikipedia.org/wiki/Box%E2%80%93Jenkins_method"><strong>Box-Jenkins</strong></a> methodology, if you know the <strong>underlying trend in SALES is “stochastic”</strong> (i.e. random), <strong>remove it by differencing</strong> SALES. Then model the differenced series as an ARMA process. For example:</p>
<p style="text-align: center;">SALES<sub>t</sub> – SALES<sub>t-1</sub> = b<sub>0</sub> + b<sub>1</sub>*AR(1) + b<sub>2</sub>*MA(1) + b<sub>3</sub>*MA(2) +ε<sub>t</sub></p>
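<p>The differencing step, and the step back to levels once the differenced series has been forecast, can be sketched as follows. The SALES values and the forecasts of the differenced series are hypothetical, and the function names are our own.</p>

```python
def difference(series):
    """First difference: SALES_t - SALES_{t-1}."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


def undifference(last_level, diff_forecasts):
    """Cumulate forecasts of the differenced series back into levels."""
    levels, level = [], last_level
    for d in diff_forecasts:
        level += d
        levels.append(level)
    return levels


sales = [100, 103, 107, 106, 110]
print(difference(sales))                      # [3, 4, -1, 4]

# Hypothetical ARMA forecasts of the differenced series.
diff_forecasts = [2.5, 2.5, 2.5]
print(undifference(sales[-1], diff_forecasts))  # [112.5, 115.0, 117.5]
```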
<p>However, “it is sometimes <strong>very difficult to decide whether trend is best modeled as deterministic or stochastic</strong>, and the decision is an important part of the <strong>science – and art – of building forecasting models</strong>.” (<a href="https://www.amazon.com/Elements-Forecasting-Diebold-September-Paperback/dp/B014GFR8BI/ref=sr_1_14?ie=UTF8&amp;qid=1512586234&amp;sr=8-14&amp;keywords=diebold+elements+of+forecasting" target="_blank" rel="noopener"><strong>Diebold, Elements of Forecasting, 1998</strong></a>)</p>
<p>We will revisit this issue in a later article.</p>
<h4>Other considerations</h4>
<p>In addition to these 3 general classes of models, we typically also try these variations:</p>
<ul>
<li><a href="http://www-stat.wharton.upenn.edu/~steele/Courses/434/434Context/GARCH/garch101(ENGLE).pdf"><strong>ARCH/GARCH</strong></a> <strong>models.</strong></li>
</ul>
<p>These models address <a href="https://en.wikipedia.org/wiki/Heteroscedasticity" target="_blank" rel="noopener"><strong>heteroscedasticity</strong></a> in the residuals (ε<sub>t</sub>). ARCH/GARCH models are <strong><a href="http://www-stat.wharton.upenn.edu/~steele/Courses/434/434Context/GARCH/garch101(ENGLE).pdf" target="_blank" rel="noopener">used in the financial arena</a></strong> to help model return and risk where market volatility can fluctuate in a predictable manner.</p>
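<p>As a rough illustration of the idea, the sketch below applies the simplest member of this family, an ARCH(1) recursion, in which this period&#8217;s residual variance depends on the size of last period&#8217;s residual. The parameter names (omega, alpha), their values and the residuals are all hypothetical; in practice the parameters are estimated from the data.</p>

```python
def arch1_variances(residuals, omega, alpha):
    """Conditional variances from sigma2_t = omega + alpha * e_{t-1}**2."""
    # Start from the long-run (unconditional) variance, one common choice.
    variances = [omega / (1 - alpha)]
    for e in residuals[:-1]:
        variances.append(omega + alpha * e ** 2)
    return variances


residuals = [0.1, -0.2, 2.0, -1.5, 0.3]
variances = arch1_variances(residuals, omega=0.5, alpha=0.5)
print([round(v, 3) for v in variances])
```

<p>Note how the large residual in the third period raises the variance expected for the fourth period: volatility clusters.</p>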
<ul>
<li><strong>Inclusion of additional “right hand side variables.”</strong></li>
</ul>
<p>In the case of least squares and mixed regression-ARMA models, if the data are available, we often consider <strong>whether additional variables will improve predictive accuracy</strong>. In the case of SALES, for example, we could consider adding lagged values of advertising spending (AD SPEND). <strong>But</strong> if we are tasked with <strong>forecasting out 6 months</strong>, for example, then we <strong>cannot use lags</strong> of AD SPEND (in this example) <strong>shorter than 5 months</strong>. Else we would <strong>also have to forecast AD SPEND</strong>.</p>
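<p>Which lags are usable depends on how far ahead we forecast and on how much of the predictor is already known at the time of the forecast. A minimal sketch of that bookkeeping, where known_ahead is a hypothetical parameter giving the number of future months for which AD SPEND is already known (for example, a committed budget):</p>

```python
def usable_lags(candidate_lags, horizon, known_ahead=0):
    """Lags of a predictor that never require one of its unknown future values.

    A forecast `horizon` months ahead that uses lag k needs the predictor
    (horizon - k) months past the forecast origin, so k must be at least
    horizon - known_ahead.
    """
    min_lag = horizon - known_ahead
    return [k for k in candidate_lags if k >= min_lag]


lags = [1, 3, 5, 6, 12]
print(usable_lags(lags, horizon=6))                 # [6, 12]
print(usable_lags(lags, horizon=6, known_ahead=1))  # [5, 6, 12]
```

<p>The second call corresponds to the 5-month cutoff in the example above; if nothing about future AD SPEND is known, the cutoff moves out by one month.</p>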
<ul>
<li><strong>Transformations</strong>.</li>
</ul>
<p>For example, using the <a href="https://people.duke.edu/~rnau/411log.htm"><strong>natural log</strong></a> of SALES can help <strong>model non-linear trends</strong> and/or <strong>dampen variation</strong> in SALES over time which may help to <strong>improve predictive accuracy</strong>.</p>
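<p>A small sketch of why the log helps with non-linear trends: a series growing at a constant percentage rate is curved in levels but a straight line in logs, so a simple TIME trend fits it. The data are hypothetical and noise-free, and fit_line is our own helper.</p>

```python
import math


def fit_line(y):
    """Least squares fit of y on TIME = 0, 1, 2, ...; returns (intercept, slope)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    slope = (sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return y_mean - slope * t_mean, slope


# Hypothetical SALES growing 5% per period.
sales = [100 * 1.05 ** t for t in range(12)]
intercept, slope = fit_line([math.log(v) for v in sales])

growth_rate = math.exp(slope) - 1                # implied growth per period
next_sales = math.exp(intercept + slope * 12)    # period-12 forecast, back in levels
print(round(growth_rate, 4), round(next_sales, 2))
```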
<h3>Bottom line</h3>
<p>There are <strong>many “specifications,&#8221; many potentially useful models </strong>that we estimate.</p>
<p>But <strong>not all end up in a final “pool” of candidates</strong> for the forecasting model. Each estimated <strong>model must pass certain tests</strong> to stay in the candidate pool.</p>
<p>In a later article we will cover the tests we use to help <strong>whittle down the pool of candidates to a set of truly useful models</strong>.</p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-introduction/" target="_blank" rel="noopener"><strong>Part I &#8211; Practical Time Series Forecasting &#8211; Introduction</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-basics/" target="_blank" rel="noopener"><strong>Part II &#8211; Practical Time Series Forecasting &#8211; Some basics</strong></a></p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-useful-models/">Practical Time Series Forecasting &#8211; Potentially Useful Models</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1245</post-id>	</item>
		<item>
		<title>Practical Time Series Forecasting – Some Basics</title>
		<link>https://www.kddanalytics.com/practical-time-series-forecasting-basics/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Mon, 11 Dec 2017 02:50:12 +0000</pubDate>
				<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Forecasting]]></category>
		<category><![CDATA[Time Series]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[econometrics]]></category>
		<category><![CDATA[forecasting]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[time series]]></category>
		<guid isPermaLink="false">http://www.kddanalytics.com/?p=1198</guid>

					<description><![CDATA[<p>“The long run is a misleading guide to current affairs. In the long run we are all dead.” ― John Maynard Keynes, A Tract on Monetary Reform Forecasting the future is an exercise in uncertainty. And the further out one looks, the more uncertain the forecast becomes. Most businesses are keenly focused on the next&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-basics/">Practical Time Series Forecasting – Some Basics</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>“The long run is a misleading guide to current affairs. In the long run we are all dead.”<br />
― <a href="https://www.goodreads.com/author/show/159357.John_Maynard_Keynes"><strong>John Maynard Keynes</strong></a><strong>, <a href="https://www.goodreads.com/work/quotes/358282">A Tract on Monetary Reform</a></strong></p>
<p>Forecasting the future is an exercise in uncertainty. And the further out one looks, the more uncertain the forecast becomes.</p>
<p>Most businesses are keenly focused on the next quarter, 6 months, year or at most next few years. Hence, <strong>our focus in this series is on time series methods for “short-run” forecasting.</strong></p>
<h3>The nature of time series</h3>
<p>We are all familiar with charts like this:</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1206 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Low-variation-time-series.png?resize=615%2C386&#038;ssl=1" alt="Low variation time series" width="615" height="386" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Low-variation-time-series.png?w=615&amp;ssl=1 615w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Low-variation-time-series.png?resize=300%2C188&amp;ssl=1 300w" sizes="auto, (max-width: 615px) 100vw, 615px" /></p>
<p>showing a sequence of numbers ordered by time, across equally spaced periods of time. That is, a &#8220;<strong><a href="https://en.wikipedia.org/wiki/Time_series" target="_blank" rel="noopener">time series</a></strong>&#8221; (e.g. closing stock price per day, sales per month, GDP per quarter, average global temperature per year).</p>
<p>Some time series exhibit little variability (up/down) from time period to time period (except for an overall trend) like the one above.</p>
<p>Others exhibit considerable variability across time with a much less apparent trend, like this:</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1204 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/High-variation-time-series.png?resize=615%2C384&#038;ssl=1" alt="High variation time series" width="615" height="384" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/High-variation-time-series.png?w=615&amp;ssl=1 615w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/High-variation-time-series.png?resize=300%2C187&amp;ssl=1 300w" sizes="auto, (max-width: 615px) 100vw, 615px" /></p>
<p>A <strong>distinctive characteristic</strong> of time series data, relative to non-time series data, is that <strong>successive values are usually not independent of each other</strong>. Although it may not be apparent from looking at a chart, today’s value is usually related in some way to yesterday’s value, and possibly to the values from several days before. This makes estimating time series models more complicated than estimating models on independent observations.</p>
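<p>One simple way to measure this dependence is the sample autocorrelation, which relates each value to the value a given number of periods earlier. A minimal sketch with two hypothetical series:</p>

```python
def autocorrelation(series, lag=1):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    # Total variation around the mean.
    denom = sum((x - mean) ** 2 for x in series)
    # Co-movement between each value and the value `lag` periods earlier.
    numer = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return numer / denom


trending = [10, 11, 12, 13, 14, 15, 16, 17]     # neighbors move together
alternating = [1, -1, 1, -1, 1, -1, 1, -1]      # neighbors move in opposition

print(autocorrelation(trending))      # 0.625
print(autocorrelation(alternating))   # -0.875
```

<p>Values near zero would suggest little relationship between neighboring observations; values far from zero, as here, signal the dependence that time series models are designed to capture.</p>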
<p>A time series chart holds a unique fascination for us. Because we are constantly aware of the progression of time, our natural reaction when we see such charts is, <strong>&#8220;I wonder what&#8217;s going to happen next?&#8221;</strong></p>
<h3>Components of a time series</h3>
<p>A successful forecasting model will account for each of <strong>3 components</strong> that may exist in a time series: <strong>trend, seasonality and cycles</strong>.</p>
<h4>Trend</h4>
<p><strong>Trend</strong>, when present, is often (but not always) visually apparent. For example, US real GDP (below) has exhibited a persistent upward trend since the Great Depression.</p>
<p>Trend is a long-run phenomenon and reflects, in business, “slowly evolving preferences, technologies, institutions and demographics.” (<a href="https://www.amazon.com/Elements-Forecasting-4th-Fourth-byDiebold/dp/B004UW0PA4/ref=sr_1_2?ie=UTF8&amp;qid=1512495766&amp;sr=8-2&amp;keywords=diebold%2C+elements+of+forecasting" target="_blank" rel="noopener"><strong>Diebold, Elements of Forecasting</strong></a>)</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1211 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/US-Real-GDP.png?resize=604%2C371&#038;ssl=1" alt="US Real GDP" width="604" height="371" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/US-Real-GDP.png?w=604&amp;ssl=1 604w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/US-Real-GDP.png?resize=300%2C184&amp;ssl=1 300w" sizes="auto, (max-width: 604px) 100vw, 604px" /></p>
<p>Trend comes in two flavors.</p>
<p>If GDP, for example, was knocked off its long-run growth path by a recession but returned to the same path afterwards, then trend is said to be &#8220;<strong>deterministic</strong>.&#8221; Adding a TIME dimension to a model can go a long way to capturing such “deterministic” trend.</p>
<p>On the other hand, if GDP started a new growth path after the recession, then trend is said to be &#8220;<strong>stochastic</strong>.&#8221;</p>
<p><strong>This distinction</strong> (between deterministic and stochastic trend) has <strong>important</strong> modeling and forecasting <strong>consequences</strong>, which we will address in a later article.</p>
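<p>The difference between the two flavors is easiest to see by hitting each kind of series with a single shock. The sketch below is an illustration under simple assumptions of our own: the deterministic case is a straight line plus a deviation that decays by half each period, and the stochastic case is a random walk with drift. All names and values are hypothetical.</p>

```python
def trend_stationary_path(n, shock_time, shock, slope=1.0, persistence=0.5):
    """Deterministic trend: deviations from the line slope*t die out over time."""
    path, deviation = [], 0.0
    for t in range(n):
        deviation = persistence * deviation + (shock if t == shock_time else 0.0)
        path.append(slope * t + deviation)
    return path


def random_walk_path(n, shock_time, shock, drift=1.0):
    """Stochastic trend: each shock is built into the level permanently."""
    path = [0.0]
    for t in range(1, n):
        step_shock = shock if t == shock_time else 0.0
        path.append(path[-1] + drift + step_shock)
    return path


# One "recession" shock of -10 at period 5, then nothing else.
deterministic = trend_stationary_path(30, shock_time=5, shock=-10.0)
stochastic = random_walk_path(30, shock_time=5, shock=-10.0)

print(round(deterministic[-1], 3))   # 29.0, back on the original path
print(stochastic[-1])                # 19.0, on a new, lower path
```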
<h4>Seasonality</h4>
<p>A seasonal pattern <strong>repeats with calendar regularity</strong>.</p>
<p>The annual uptick in sales that occurs during the November and December holiday season is an example. Higher airline passenger counts during the summer months are another (see below). Adding seasonal indicators (&#8220;<a href="https://en.wikipedia.org/wiki/Dummy_variable_(statistics)"><strong>dummy variables</strong></a>&#8221;) to a model can capture such seasonality.</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1212 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/US-Enplanements.png?resize=604%2C371&#038;ssl=1" alt="US Enplanements" width="604" height="371" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/US-Enplanements.png?w=604&amp;ssl=1 604w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/US-Enplanements.png?resize=300%2C184&amp;ssl=1 300w" sizes="auto, (max-width: 604px) 100vw, 604px" /></p>
<h4>Cycles</h4>
<p>A cyclic component can also be present. <strong>Cycles are much less rigid than seasonal patterns</strong>. One example is the business cycle, from a recession low to an expansion high.</p>
<p>A time series can contain one cycle (e.g. the daily cycle of body temperature) or multiple cycles (e.g. bicycle traffic patterns can exhibit daily, weekly and annual cycles). More broadly, <strong>a cyclic component is any dynamic not accounted for by trend or seasonality</strong>.</p>
<p>Modeling cycles takes us into the world of <a href="https://en.wikipedia.org/wiki/Autoregressive%E2%80%93moving-average_model"><strong>ARMA</strong></a> and <a href="https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average"><strong>ARIMA</strong></a> models which we&#8217;ll cover later.</p>
<h3>Methods for forecasting</h3>
<p>There are numerous methods for forecasting a time series, ranging from simple to complex.</p>
<h4>Simple</h4>
<p>The simplest is some type of <strong>smoothing</strong> routine, like <a href="https://en.wikipedia.org/wiki/Moving_average" target="_blank" rel="noopener"><strong>moving averages</strong></a> or <a href="https://en.wikipedia.org/wiki/Exponential_smoothing" target="_blank" rel="noopener"><strong>exponential smoothing</strong></a>. <strong>Moving averages</strong>, especially a 200-day moving average, are commonly used in technical analysis of stock price movements:</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1215 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/200-Day-MAV.png?resize=554%2C464&#038;ssl=1" alt="" width="554" height="464" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/200-Day-MAV.png?w=554&amp;ssl=1 554w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/200-Day-MAV.png?resize=300%2C251&amp;ssl=1 300w" sizes="auto, (max-width: 554px) 100vw, 554px" /></p>
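<p>Both smoothing routines fit in a few lines. The sketch below uses a short hypothetical price series, a 3-period window in place of 200 days, and a smoothing weight (alpha) of 0.5 chosen purely for illustration.</p>

```python
def moving_average(series, window):
    """Trailing moving average; one value per full window."""
    return [sum(series[i - window:i]) / window
            for i in range(window, len(series) + 1)]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha*y_t + (1 - alpha)*s_{t-1}."""
    smoothed = [series[0]]   # start from the first observation (one common choice)
    for y in series[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed


prices = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]
print(moving_average(prices, 3))          # [11.0, 12.0, 13.0, 14.0]
print(exponential_smoothing(prices, 0.5))  # [10.0, 11.0, 11.0, 12.0, 13.5, 13.75]
```

<p>The moving average weights the last few observations equally, while exponential smoothing gives the most weight to the latest observation and progressively less to older ones.</p>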
<h4>Complex</h4>
<p>More complex <a href="https://en.wikipedia.org/wiki/Econometric_model"><strong>econometric</strong></a> methods seek to model the relationship between, say, sales over time, and several dimensions that could affect sales, such as advertising spending.</p>
<p>Econometric models can consist of <strong>multiple interrelated equations</strong> (one for sales, one for ad spending) which would be estimated jointly, typically using a multiple regression methodology. <a href="https://en.wikipedia.org/wiki/Macroeconomic_model"><strong>Such models</strong></a> are used to model the US economy and to generate <strong>long-run forecasts</strong> of macroeconomic variables such as GDP and employment.</p>
<p>Also on the sophisticated end of the spectrum are techniques like <a href="https://en.wikipedia.org/wiki/Spectral_density#Explanation" target="_blank" rel="noopener"><strong>spectral analysis</strong></a>, <a href="https://en.wikipedia.org/wiki/Deep_learning"><strong>deep learning</strong></a> and <a href="https://en.wikipedia.org/wiki/Artificial_neural_network"><strong>neural networks</strong></a>. These methods require an <strong>elevated level of expertise</strong> on the part of a data scientist to implement and fine tune the models.</p>
<h4>Middle of the road</h4>
<p>In between the simpler and more complex forecasting methods is what we refer to as “<strong>time series methods</strong>.” These methods <strong>rely primarily</strong> (but not exclusively) on the <strong>series’ historical behavior to inform the future</strong>. “<a href="http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc44.htm"><strong>Univariate modeling</strong></a>” is sometimes used to describe these methods.</p>
<p>A distinguishing feature of time series methods is that they <strong>explicitly account for the key characteristics of a time series</strong>: trend, seasonality and cycles.</p>
<p>The <strong>workhorses</strong> of time series methods are single equation, <a href="https://en.wikipedia.org/wiki/Least_squares"><strong>least squares</strong></a> regression and <a href="https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average"><strong>ARIMA</strong></a> models.</p>
<p>Least squares regression models can use a TIME trend, seasonal indicators and either lagged values of the series being modeled or an ARMA representation of the cyclic component to model a time series. They can also include other related lagged variables (e.g., advertising expenditures in a SALES forecasting model) but usually only if the lags are long.</p>
<p>If the trend of the series is “stochastic” (i.e. when the series is bumped off its trend path, it starts a new trend path), then ARIMA models may provide the best forecast.</p>
<h3>Back to the short-run</h3>
<p>The <strong>time series methods we will cover</strong> in this series of articles use the estimated dynamics and trend of the series to forecast a future path over the &#8220;<strong>forecast horizon</strong>.&#8221;</p>
<p>But since the <strong>forecasts will</strong> most likely ultimately <strong>revert to the underlying trend in the series</strong>, the best use of these time series methods is for <strong>&#8220;short-run&#8221; </strong>forecasts.</p>
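<p>This reversion is easy to see in the simplest dynamic model. The sketch below assumes a hypothetical AR(1) process fluctuating around a fixed level, which stands in for the (detrended) underlying trend; the function name and all values are our own.</p>

```python
def ar1_forecasts(last_value, mean, phi, steps):
    """h-step forecasts of an AR(1) around `mean`: mean + phi**h * (last - mean)."""
    return [mean + phi ** h * (last_value - mean) for h in range(1, steps + 1)]


print(ar1_forecasts(last_value=120.0, mean=100.0, phi=0.5, steps=4))
# [110.0, 105.0, 102.5, 101.25]
```

<p>The gap between the forecast and the underlying level halves each period, so the model has something distinctive to say only for the first few periods ahead.</p>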
<p>Although there is a more &#8220;technical&#8221; definition based on the type of model used, we <strong>generally define the &#8220;short run&#8221;</strong> as the <strong>period of time</strong> that <strong>matches <span style="text-decoration: underline;">most</span> businesses&#8217; forecasting needs</strong>. So, we are talking about anywhere from the next day to the next few years.</p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-introduction/" target="_blank" rel="noopener"><strong>Part 1 &#8211; Practical Time Series Forecasting &#8211; Introduction</strong></a></p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-basics/">Practical Time Series Forecasting – Some Basics</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1198</post-id>	</item>
	</channel>
</rss>
