<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>regression Archives - KDD Analytics</title>
	<atom:link href="https://www.kddanalytics.com/tag/regression/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.kddanalytics.com/tag/regression/</link>
	<description>Data to Decisions</description>
	<lastBuildDate>Sat, 24 Mar 2018 02:54:36 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.8.3</generator>

<image>
	<url>https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2016/08/cropped-imageedit_1_7939659602.png?fit=32%2C32&#038;ssl=1</url>
	<title>regression Archives - KDD Analytics</title>
	<link>https://www.kddanalytics.com/tag/regression/</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">114932494</site>	<item>
		<title>Practical Time Series Forecasting – Meta Models</title>
		<link>https://www.kddanalytics.com/practical-time-series-forecasting-meta-models/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Mon, 05 Feb 2018 01:47:38 +0000</pubDate>
				<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Forecasting]]></category>
		<category><![CDATA[Time Series]]></category>
		<category><![CDATA[forecast error]]></category>
		<category><![CDATA[MAPE]]></category>
		<category><![CDATA[meta forecast]]></category>
		<category><![CDATA[MPE]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[weighting]]></category>
		<guid isPermaLink="false">http://www.kddanalytics.com/?p=1331</guid>

					<description><![CDATA[<p>“There are two kinds of forecasters: those who don’t know, and those who don’t know they don’t know.” ― John Kenneth Galbraith After an extensive model building and vetting process, along the lines we previously discussed here and here, the practical forecaster may still be left with several strong performing models. These models perform similarly&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-meta-models/">Practical Time Series Forecasting – Meta Models</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>“<em>There are two kinds of forecasters: those who don’t know, and those who don’t know they don’t know.</em>”<br />
― <a href="https://en.wikipedia.org/wiki/John_Kenneth_Galbraith" target="_blank" rel="noopener"><strong>John Kenneth Galbraith</strong></a></p>
<p>After an extensive model building and vetting process, along the lines we previously discussed <strong><a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener">here</a></strong> and <a href="https://www.kddanalytics.com/practical-time-series-forecasting-rolling-holdout-sample-analysis/" target="_blank" rel="noopener"><strong>here</strong></a>, the practical forecaster may still be left with several strong performing models.</p>
<p>These models perform similarly in the holdout sample tests. They retain their statistical properties when recalibrated on the full historical sample. But they <strong>yield different forecast paths over the forecast horizon</strong>.</p>
<p>Any one of the models could be easily defended. But the <strong>fact that the models yield different forecasts should make the forecaster pause</strong>.</p>
<h3>An example</h3>
<p>Below is an example of 3 short-run monthly forecasts:</p>
<p><img data-recalc-dims="1" decoding="async" loading="lazy" class="size-full wp-image-1334 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-Different-FC.png?resize=603%2C371&#038;ssl=1" alt="Examples of competing forecasts" width="603" height="371" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-Different-FC.png?w=603&amp;ssl=1 603w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-Different-FC.png?resize=300%2C185&amp;ssl=1 300w" sizes="auto, (max-width: 603px) 100vw, 603px" /></p>
<p>The 3 models perform similarly in the holdout sample. One of the models is a least squares model. The other 2 are ARIMA models.</p>
<p>One model produces a <strong>steeply declining forecast</strong>. Another a <strong>slightly declining forecast</strong>. The third model produces an <strong>increasing forecast</strong>.</p>
<p>What should the forecaster do?</p>
<h3>How can this happen?</h3>
<p>Models are just that – models. They are abstractions from reality. And <strong>no single model will “fit” the holdout sample perfectly</strong>.</p>
<p>Two <strong>models</strong>, especially <strong>of different types</strong> (e.g. least squares vs. ARIMA), could have very <strong>similar holdout sample performance but differ</strong> dramatically <strong>in their forecast</strong> over the forecast horizon.</p>
<p>The holdout sample <strong>MAPE</strong> (<a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener"><strong>mean absolute percentage error</strong></a>) could be very similar for these models. But the <strong>MAPE is an average error across the holdout sample</strong>. And the models could have arrived at their MAPEs by <strong>focusing on different aspects of the time series in the holdout sample.</strong></p>
<p>Projecting these differences into the forecast horizon can result in very different forecasts.</p>
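<p>The point about averages can be made concrete with a minimal Python sketch. The numbers and the names <code>mape</code>, <code>model_a</code> and <code>model_b</code> are hypothetical, chosen only so that two forecasts with opposite slopes earn the same holdout MAPE.</p>

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100 * sum(errors) / len(errors)

# Hypothetical 4-month holdout sample and two hypothetical model forecasts.
actual = [100, 100, 100, 100]
model_a = [92, 96, 104, 108]   # rising through the holdout
model_b = [108, 104, 96, 92]   # falling through the holdout

print(round(mape(actual, model_a), 2), round(mape(actual, model_b), 2))
```

<p>Both hypothetical models miss by the same average percentage, yet one is trending up and the other down, so extending each pattern past the holdout sample gives diverging paths.</p>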
<h3>Solutions</h3>
<p>When there is no clear “champion” model, one <strong>solution is to combine the forecasts into one</strong>. We call this a “<strong><a href="https://en.wikipedia.org/wiki/Metamodeling">meta</a></strong>” forecast.</p>
<p>There are several ways this can be accomplished.</p>
<h4>Checkpoint</h4>
<p><strong>But first</strong>, <strong>check</strong> to make sure the <strong>models</strong> to be combined are <strong>not “nested.”</strong> That is, <strong>one model is not a subset of another</strong>. If models are nested there usually is no advantage to combining their forecasts into a meta forecast.</p>
<p>In fact, a <strong>meta forecast will more likely be superior the greater the differences between the constituent models</strong>.</p>
<p>A meta forecast based on a least squares model and an ARIMA model will likely yield a smaller forecast error than that associated with either of the two models. However, if the two models were both least squares models, the superiority of a meta forecast might be questionable (<a href="https://www.amazon.com/Forecasting-Business-Economics-Econometrics-Mathematical/dp/0122951816"><strong>Granger, 1989</strong></a>).</p>
<h4>Solution 1</h4>
<p>The simplest approach to arriving at a meta forecast is to <strong>simply average the forecasts</strong> of the individual models.</p>
<p>This essentially assumes that <strong>each model’s forecast is equally important in the meta forecast </strong>(i.e. receives equal weighting). This is a quick and uncomplicated way to generate a meta forecast.</p>
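<p>As a minimal sketch of the equal-weight approach, assuming three made-up forecast paths (the model names and values below are hypothetical):</p>

```python
# Hypothetical 3-month forecasts from three candidate models.
forecasts = {
    "candidate_1": [98.0, 95.0, 92.0],
    "candidate_2": [101.0, 102.0, 104.0],
    "candidate_3": [100.0, 99.5, 99.0],
}

def simple_average(forecasts):
    """Equal-weight meta forecast: the per-period mean across models."""
    paths = list(forecasts.values())
    return [sum(period) / len(paths) for period in zip(*paths)]

meta = simple_average(forecasts)
print([round(v, 2) for v in meta])
```

<p>By construction, each period of the averaged path lies between the lowest and highest individual forecast for that period.</p>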
<h4>Solution 2</h4>
<p>Another approach <strong>makes use</strong> of each model’s <strong>holdout sample performance measures of forecast accuracy and bias</strong>. A weighting for each model&#8217;s forecast can be calculated using each model’s <strong>MAPE</strong> and <strong>MPE</strong> (<a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener"><strong>mean percentage error</strong></a>) relative to that of all the models combined.</p>
<p>The meta forecast would then be a <strong>weighted average</strong> of the individual model forecasts. Models with <strong>lower MAPE and MPE</strong> would receive <strong>higher weights and contribute more</strong> to the meta forecast.</p>
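<p>The exact weighting formula is not spelled out here, so the following Python sketch rests on an assumption: each model is scored by MAPE plus the absolute value of MPE, and weights are proportional to the inverse of that score. The function name, the scoring rule and all numbers are illustrative only.</p>

```python
def inverse_error_weights(mape, mpe):
    """Weights proportional to 1 / (MAPE + |MPE|), normalized to sum to 1.

    ASSUMPTION: this scoring rule is one plausible choice, not a formula
    taken from the article. Every score must be greater than zero.
    """
    inverse = {m: 1.0 / (mape[m] + abs(mpe[m])) for m in mape}
    total = sum(inverse.values())
    return {m: v / total for m, v in inverse.items()}

# Hypothetical holdout statistics (in percent) for three models.
mape = {"ols": 4.0, "arima_a": 5.0, "arima_b": 8.0}
mpe = {"ols": -1.0, "arima_a": 0.0, "arima_b": 2.0}
weights = inverse_error_weights(mape, mpe)

# Hypothetical one-month-ahead forecasts, combined with those weights.
next_month = {"ols": 100.0, "arima_a": 110.0, "arima_b": 90.0}
meta = sum(weights[m] * next_month[m] for m in next_month)
print(weights, round(meta, 2))
```

<p>Under this rule the model with the largest combined error and bias receives the smallest weight, which matches the intent described above.</p>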
<h4>Solution 3</h4>
<p>A third approach is to use <strong>regression</strong> to estimate the weights.</p>
<p>Using the holdout sample or, if it is too small, the full sample, <strong>regress the actual value on the forecasted value from each model</strong>. The goal is to find a regression with <strong>no constant and all regression coefficients positive and statistically significant</strong>.</p>
<p>The regression <strong>coefficients should then sum very close to one</strong>. These <strong>coefficients then become the weights</strong> by which forecasts are combined into a meta forecast (see <a href="https://www.amazon.com/Business-Forecasting-ForecastX-Holton-Wilson/dp/0073373648/ref=sr_1_2?s=books&amp;ie=UTF8&amp;qid=1512008807&amp;sr=1-2&amp;keywords=wilson+keating+forecasting"><strong>Wilson and Keating</strong></a>).</p>
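<p>A minimal Python sketch of the no-constant regression follows. It solves the least squares normal equations directly and does not test statistical significance. The function name and the toy data are hypothetical; the data were built so that the actual values are exactly 0.6 times the first forecast plus 0.4 times the second.</p>

```python
def ols_no_constant(y, columns):
    """Least squares weights for y = sum_j w_j * columns[j], with no intercept.

    Solves the normal equations (X'X) w = X'y by Gauss-Jordan elimination.
    """
    k = len(columns)
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
        + [sum(p * v for p, v in zip(columns[i], y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting, then clear this column from every other row.
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Hypothetical holdout forecasts from two models and the actual values.
forecast_1 = [100.0, 104.0, 103.0, 108.0, 110.0]
forecast_2 = [98.0, 101.0, 107.0, 105.0, 112.0]
actual = [99.2, 102.8, 104.6, 106.8, 110.8]

weights = ols_no_constant(actual, [forecast_1, forecast_2])
print([round(w, 4) for w in weights], round(sum(weights), 4))
```

<p>With real data the fit will not be exact, so the checks described above (positive coefficients that sum close to one) still need to be made by hand or with a statistics package.</p>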
<h3>Back to our example</h3>
<p>The forecaster could go with candidate 3 since it &#8220;splits the difference.&#8221; However, the forecaster is still left with the task of defending why the other two equally plausible models were not chosen.</p>
<p>Alternatively, a meta forecast can be used. As an example, we created a <strong>simple average forecast</strong> across the 3 candidate models. As discussed above, this <strong>assumes an equal weighting across the 3 short-run forecasts</strong>. A more sophisticated approach would have been to estimate the weights using a regression approach.</p>
<p><img data-recalc-dims="1" decoding="async" loading="lazy" class="size-full wp-image-1335 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-a-meta-forecast.png?resize=605%2C371&#038;ssl=1" alt="Example of a meta forecast" width="605" height="371" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-a-meta-forecast.png?w=605&amp;ssl=1 605w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-a-meta-forecast.png?resize=300%2C184&amp;ssl=1 300w" sizes="auto, (max-width: 605px) 100vw, 605px" /></p>
<p>Not surprisingly, the meta forecast is quite like the essentially flat forecast of candidate 3 (which lies almost halfway between candidate 1’s and 2’s forecasts). <strong>But not all cases will be like this</strong>.</p>
<p>If a regression approach to estimating the weights was used, the meta forecast could be quite different from that of candidate 3.</p>
<p>Yes, the meta forecast will lie between the two forecast extremes. But the <strong>assumed or estimated weights will dictate where the meta forecast will lie</strong>.</p>
<h3>Bottom line</h3>
<p>Combining forecasts from equally strong models is intuitively appealing since <strong>each model has its strengths and weaknesses</strong>.</p>
<p><strong>Combining</strong> models’ forecasts in a <strong>complementary fashion</strong> should lead to <strong>more robust and accurate short-run forecasts</strong>.</p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-introduction/" target="_blank" rel="noopener"><strong>Part 1 &#8211; Practical Time Series Forecasting &#8211; Introduction</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-basics/" target="_blank" rel="noopener"><strong>Part 2 &#8211; Practical Time Series Forecasting &#8211; Some Basics</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-useful-models/" target="_blank" rel="noopener"><strong>Part 3 &#8211; Practical Time Series Forecasting &#8211; Potentially Useful Models</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-data-science-taxonomy/" target="_blank" rel="noopener"><strong>Part 4 &#8211; Practical Time Series Forecasting &#8211; Data Science Taxonomy</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener"><strong>Part 5 &#8211; Practical Time Series Forecasting &#8211; Know When to Hold &#8217;em</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-what-makes-a-useful-model/" target="_blank" rel="noopener"><strong>Part 6 &#8211; Practical Time Series Forecasting &#8211; What Makes a Model Useful?</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-deterministic-stochastic-trend/" target="_blank" rel="noopener"><strong>Part 7 &#8211; Practical Time Series Forecasting &#8211; To Difference or Not to Difference</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-times-series-forecasting-rolling-holdout-sample/" target="_blank" rel="noopener"><strong>Part 8 &#8211; Practical Time Series Forecasting &#8211; Know When to Roll &#8217;em</strong></a></p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-meta-models/">Practical Time Series Forecasting – Meta Models</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1331</post-id>	</item>
		<item>
		<title>Practical Time Series Forecasting &#8211; Data Science Taxonomy</title>
		<link>https://www.kddanalytics.com/practical-time-series-forecasting-data-science-taxonomy/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Tue, 02 Jan 2018 12:26:19 +0000</pubDate>
				<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Forecasting]]></category>
		<category><![CDATA[Time Series]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[time series]]></category>
		<guid isPermaLink="false">http://www.kddanalytics.com/?p=1229</guid>

					<description><![CDATA[<p>“Big data is not about the data.*” ― Gary King, Harvard University (*It&#8217;s about the analytics.) Machine Learning. Deep Learning. Data Science. Artificial Intelligence. Big Data. Not a day goes by that one or all of these buzzwords stream past in our business news feeds. Data analytics has become mainstream. And you better jump on&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-data-science-taxonomy/">Practical Time Series Forecasting &#8211; Data Science Taxonomy</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>“Big data is not about the data.*”<br />
― <strong>Gary King, Harvard University</strong></p>
<p>(*<strong><a href="https://www.slideshare.net/BernardMarr/big-data-best-quotes/3-Big_data_is_notabout_the" target="_blank" rel="noopener">It&#8217;s about the analytics</a></strong>.)</p>
<p><strong>Machine Learning</strong>. <strong>Deep Learning</strong>. <strong>Data Science</strong>. <strong>Artificial Intelligence</strong>. <strong>Big Data</strong>.</p>
<p>Not a day goes by that one or all of these buzzwords stream past in our business news feeds.</p>
<p><strong>Data analytics has become mainstream</strong>. And you better jump on board or risk being left at the station!</p>
<p>Just within the last year or so, <strong>searches</strong> for these topics have taken off. In fact, according to Google, in early 2017 search interest in one of these topics, <strong>machine learning, eclipsed that of big data</strong>:</p>
<p><img data-recalc-dims="1" decoding="async" loading="lazy" class="aligncenter wp-image-1230 size-large" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Google-Search-Machine-Learning-11_11_2012-to-11_11_2017.png?resize=1024%2C329&#038;ssl=1" alt="Google Search Machine Learning" width="1024" height="329" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Google-Search-Machine-Learning-11_11_2012-to-11_11_2017.png?resize=1024%2C329&amp;ssl=1 1024w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Google-Search-Machine-Learning-11_11_2012-to-11_11_2017.png?resize=300%2C96&amp;ssl=1 300w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Google-Search-Machine-Learning-11_11_2012-to-11_11_2017.png?resize=768%2C247&amp;ssl=1 768w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Google-Search-Machine-Learning-11_11_2012-to-11_11_2017.png?w=1233&amp;ssl=1 1233w" sizes="auto, (max-width: 1000px) 100vw, 1000px" /></p>
<p>So, how do <strong>time series methods for forecasting</strong> fit into the taxonomy that currently defines the data science field?</p>
<h3>Data science taxonomy</h3>
<p>Key data science terms that are related to time series methods for forecasting are <strong><a href="https://www.datasciencecentral.com/profiles/blogs/data-mining-what-why-when">data mining</a></strong>, <a href="https://www.datasciencecentral.com/profiles/blogs/18-great-articles-about-predictive-analytics"><strong>predictive analytics</strong></a>, <a href="https://www.datasciencecentral.com/profiles/blogs/machine-learning-summarized-in-one-picture"><strong>machine learning</strong></a> (supervised and unsupervised), <a href="https://en.wikipedia.org/wiki/Linear_regression"><strong>regression</strong></a>, <strong>structured</strong> and <a href="https://en.wikipedia.org/wiki/Unstructured_data"><strong>unstructured</strong></a> data.</p>
<p>These are not necessarily mutually exclusive. At the risk of incurring the wrath of the data science gods, <strong>here is our simplification</strong>:</p>
<h4>Structured vs. unstructured data</h4>
<p>Structured data are organized into “rows and columns” (spreadsheet); unstructured data are not (text in a book).</p>
<p style="text-align: center;"><span style="color: #60786b;"><strong>Time series methods use structured data</strong>.</span></p>
<h4>Data mining</h4>
<p>Data mining seeks to find patterns in data, whether structured or unstructured.</p>
<p style="text-align: center;"><span style="color: #60786b;"><strong>Time series methods seek to find patterns that repeat over time</strong>.</span></p>
<h4>Predictive analytics</h4>
<p>Predictive analytics seeks to find a relationship between a variable of interest (e.g. customer churn) and multiple dimensions (e.g. age, length of contract, zip code). These dimensions can be used to predict the likelihood of a customer churning (in our example).</p>
<p>Typically, predictive analytics is not based on time series data but &#8220;cross-sectional&#8221; data like a customer set. Additionally, time series methods use only a very limited set of dimensions, the primary one being past behavior of the variable being forecasted (e.g. sales).</p>
<p style="text-align: center;"><span style="color: #60786b;"><strong>Time series methods typically use the past behavior of the variable being forecasted as the primary dimension.</strong></span></p>
<h4>Machine learning</h4>
<p>Machine learning means that a computer is using a program (algorithm) to “connect the dots” in the data. <strong>If you run a regression model in Excel you are engaging in machine learning.</strong></p>
<p>However, <span style="text-decoration: underline;">supervised</span> machine learning does not mean you are keeping watch over Excel as it does its stuff!</p>
<div id="attachment_1232" style="width: 310px" class="wp-caption alignright"><img data-recalc-dims="1" decoding="async" aria-describedby="caption-attachment-1232" loading="lazy" class="wp-image-1232 size-medium" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/46961317_s.jpg?resize=300%2C200&#038;ssl=1" alt="supervised machine learning?" width="300" height="200" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/46961317_s.jpg?resize=300%2C200&amp;ssl=1 300w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/46961317_s.jpg?w=450&amp;ssl=1 450w" sizes="auto, (max-width: 300px) 100vw, 300px" /><p id="caption-attachment-1232" class="wp-caption-text">This is NOT what &#8220;supervised&#8221; machine learning means!</p></div>
<p><strong>Supervised machine learning means</strong> that the computer is seeking to find a relationship between a single variable (e.g. churn) and many dimensional variables (e.g. age, length of contract, zip code).</p>
<p><strong>Unsupervised machine learning</strong> <strong>means</strong> that the computer is seeking to find a relationship between many dimensions (e.g. age, length of contract, zip code) so that customers can, for example, be clustered into a small number of groups or tribes with similar characteristics.</p>
<p style="text-align: center;"><span style="color: #60786b;"><strong>Time series methods are a type of supervised machine learning since they attempt to find a relationship between present and past behavior</strong>.</span></p>
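<p>One way to picture the supervised framing is to lay a series out as pairs of past values (the inputs) and the current value (the target). The sketch below is illustrative only; the function name <code>make_lagged_pairs</code> and the sales figures are hypothetical.</p>

```python
def make_lagged_pairs(series, n_lags):
    """Turn a series into (features, target) pairs.

    Features are the n_lags previous values, most recent first;
    the target is the current value.
    """
    pairs = []
    for t in range(n_lags, len(series)):
        features = [series[t - k] for k in range(1, n_lags + 1)]
        pairs.append((features, series[t]))
    return pairs

sales = [10, 12, 13, 15, 14, 16]  # hypothetical monthly sales
pairs = make_lagged_pairs(sales, n_lags=2)
for features, target in pairs:
    print(features, target)
```

<p>Each row now looks like any other supervised learning example: a single variable to be explained and a small set of dimensions, here its own past values.</p>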
<h4>Regression</h4>
<p>Regression is one way a machine finds relationships between a single variable and a few (or many) dimensional variables or past values of the variable itself. There are several flavors of regression.</p>
<p style="text-align: center;"><span style="color: #60786b;"><strong> Time series models typically use <a style="color: #60786b;" href="https://en.wikipedia.org/wiki/Least_squares">least squares</a> regression or <a style="color: #60786b;" href="https://en.wikipedia.org/wiki/Maximum_likelihood_estimation">maximum likelihood</a></strong>.</span></p>
<h3>Bottom line</h3>
<p>So, when you use time series methods for forecasting you are probably <strong>mining structured data using supervised, regression- or maximum likelihood-based, machine learning</strong>.</p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-introduction/" target="_blank" rel="noopener"><strong>Part 1 &#8211; Practical Time Series Forecasting &#8211; Introduction</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-basics/" target="_blank" rel="noopener"><strong>Part 2 &#8211; Practical Time Series Forecasting &#8211; Some Basics</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-useful-models/" target="_blank" rel="noopener"><strong>Part 3 &#8211; Practical Time Series Forecasting &#8211; Potentially Useful Models</strong></a></p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-data-science-taxonomy/">Practical Time Series Forecasting &#8211; Data Science Taxonomy</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1229</post-id>	</item>
		<item>
		<title>Practical Time Series Forecasting &#8211; Potentially Useful Models</title>
		<link>https://www.kddanalytics.com/practical-time-series-forecasting-useful-models/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Mon, 18 Dec 2017 08:00:05 +0000</pubDate>
				<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Forecasting]]></category>
		<category><![CDATA[Time Series]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[forecasting]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[time series]]></category>
		<guid isPermaLink="false">http://www.kddanalytics.com/?p=1245</guid>

					<description><![CDATA[<p>“All models are wrong, but some are useful.” ― attributed to statistician George Box This quote pretty well sums up time series forecasting models. Any given model is unlikely to be spot on. And some can be wildly off. But through a careful methodical process, we can whittle the pool of candidate models down to&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-useful-models/">Practical Time Series Forecasting &#8211; Potentially Useful Models</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>“<em>All models are wrong, but some are useful.</em>”<br />
― attributed to statistician <a href="https://en.wikipedia.org/wiki/All_models_are_wrong" target="_blank" rel="noopener"><strong>George Box</strong></a></p>
<p>This quote pretty well sums up time series forecasting models.</p>
<p><strong>Any given model is unlikely to be spot on. And some can be wildly off.</strong></p>
<p>But through a careful methodical process, we can <strong>whittle</strong> the pool of candidate models <strong>down</strong> <strong>to a set of useful models,</strong> if not a single preferred model.</p>
<p>When all is said and done, though, our guiding principle when building forecasting models is…<strong>how well the model predicts</strong>!</p>
<p>In practice, what this means for the types of models we consider is that <strong>we don’t rule anything out</strong>.</p>
<p>Yes, we have specific things we look for in an acceptable model (which we will cover later). But we don’t rule out a simple TIME trend model simply because it is too “simple.”</p>
<p>Our focus is on finding a forecasting model that can yield <strong>defensible short-run forecasts in a cost-effective manner</strong>.</p>
<h3>Potentially useful models</h3>
<p>So what kind of models do we typically examine?</p>
<p>As discussed in a <strong><a href="https://www.kddanalytics.com/practical-time-series-forecasting-basics/" target="_blank" rel="noopener">previous article</a></strong>, a time series such as monthly sales (SALES) can have 3 components: <strong>trend, seasonal and cyclical</strong>. So, the type of model we consider depends on the extent to which 1, 2 or all 3 of these dynamics are present.</p>
<p>There are 3 classes of models that we typically consider. We will use a bit of math here to describe these models…think back to the formula of a line you learned in algebra: Y = a + bX.</p>
<h4>Regression models</h4>
<p>First are <strong><a href="https://en.wikipedia.org/wiki/Linear_regression">least squares regression</a></strong> models. Using SALES as our example, we could have a TIME trend model with, say, quarterly seasonality if we were examining SALES by quarter:</p>
<p style="text-align: center;">SALES<sub>t</sub> = b<sub>0</sub> + b<sub>1</sub>*TIME + b<sub>2</sub>*Q1 + b<sub>3</sub>*Q2 + b<sub>4</sub>*Q3 + ε<sub>t</sub></p>
<p>Or a lagged least squares model with quarterly seasonality:</p>
<p style="text-align: center;">SALES<sub>t</sub> = b<sub>0</sub> + b<sub>1</sub>*SALES<sub>t-1</sub> + b<sub>2</sub>*SALES<sub>t-2</sub> + b<sub>3</sub>*Q1 + b<sub>4</sub>*Q2 +b<sub>5</sub>*Q3 +ε<sub>t</sub></p>
<p><span style="color: #60786b;"><em>In these model formulae, b<sub>0</sub> is the &#8220;intercept.&#8221; b<sub>1</sub>, b<sub>2</sub>,…etc. indicate the incremental effect (i.e. slope) on sales of a change in the value of a “right hand side” variable. ε<sub>t</sub> is “residual” SALES, what is left “unexplained” by the model. And t is the time period, whether it is months, quarters, years, etc.</em></span></p>
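<p>To see how the TIME trend model with quarterly seasonality turns into a number, here is a minimal Python sketch. The coefficient values are hypothetical (not estimated from any data), and the helper names <code>design_row</code> and <code>predict</code> are our own.</p>

```python
def design_row(t, quarter):
    """Right-hand-side values for period t: [1, TIME, Q1, Q2, Q3].

    Q4 is the omitted base quarter, so all three dummies are zero for it.
    """
    return [1, t] + [1 if quarter == q else 0 for q in (1, 2, 3)]

def predict(coefs, row):
    """Fitted SALES: b0 + b1*TIME + b2*Q1 + b3*Q2 + b4*Q3."""
    return sum(b * x for b, x in zip(coefs, row))

# Hypothetical coefficients b0..b4.
coefs = [100.0, 2.0, -5.0, 3.0, 8.0]

print(predict(coefs, design_row(9, quarter=1)))   # a first-quarter period
print(predict(coefs, design_row(12, quarter=4)))  # a fourth-quarter period
```

<p>Only three quarterly dummies appear because the fourth quarter is absorbed by the intercept; including all four alongside b<sub>0</sub> would make the regressors perfectly collinear.</p>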
<h4>ARMA models</h4>
<p>The second class of models are ARMA models.</p>
<p>An <a href="https://en.wikipedia.org/wiki/Autoregressive%E2%80%93moving-average_model" target="_blank" rel="noopener"><strong>ARMA process</strong></a> models SALES as being based on past SALES as well as on unobservable shocks to SALES over time. Such models can include two types of components:</p>
<p>An <strong>autoregressive (AR)</strong> component captures the effect of past SALES on current SALES while a <strong>moving average (MA)</strong> component captures random shocks to the SALES series. These are typically estimated using a <a href="https://en.wikipedia.org/wiki/Maximum_likelihood_estimation"><strong>maximum likelihood</strong></a> technique.</p>
<p>We could have a model that is a <strong>pure ARMA</strong> model, for example:</p>
<p style="text-align: center;">SALES<sub>t</sub> = b<sub>0</sub> + b<sub>1</sub>*AR(1) + b<sub>2</sub>*AR(2) + b<sub>3</sub>*MA(1) +ε<sub>t</sub></p>
<p>Or a <strong>mixed regression-ARMA</strong> model, sometimes called &#8220;regression with ARMA errors,&#8221; like this:</p>
<p style="text-align: center;">SALES<sub>t</sub> = b<sub>0</sub> + b<sub>1</sub>*TIME + b<sub>2</sub>*Q1 + b<sub>3</sub>*Q2 + b<sub>4</sub>*Q3 + b<sub>5</sub>*AR(1) + b<sub>6</sub>*MA(1) + ε<sub>t</sub></p>
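<p>On the usual reading of this shorthand, an AR(k) term stands for SALES lagged k periods and an MA(k) term stands for the shock from k periods earlier. Under that assumption, the pure ARMA example above can be traced step by step with a short Python sketch. The function name, coefficients, starting values and shocks are all hypothetical.</p>

```python
def simulate_arma(const, ar, ma, shocks, initial):
    """Generate y_t = const + sum_i ar[i]*y[t-1-i] + sum_j ma[j]*e[t-1-j] + e[t].

    `initial` holds pre-sample values (oldest first) and must be at least as
    long as both `ar` and `ma`. Pre-sample shocks are assumed to be zero.
    """
    y = list(initial)
    e = [0.0] * len(initial)
    for shock in shocks:
        ar_part = sum(a * y[-1 - i] for i, a in enumerate(ar))
        ma_part = sum(m * e[-1 - j] for j, m in enumerate(ma))
        y.append(const + ar_part + ma_part + shock)
        e.append(shock)
    return y[len(initial):]

# Two AR terms and one MA term, as in the pure ARMA example.
path = simulate_arma(
    const=10.0, ar=[0.5, 0.2], ma=[0.4],
    shocks=[1.0, -2.0, 0.0], initial=[30.0, 32.0],
)
print([round(v, 2) for v in path])
```

<p>In practice the shocks are not observed, which is why the coefficients are estimated by maximum likelihood rather than by an ordinary regression on known columns.</p>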
<h4>ARIMA models</h4>
<p>A third class of models is related to the ARMA models above: <strong>ARIMA</strong>. According to standard <a href="https://en.wikipedia.org/wiki/Box%E2%80%93Jenkins_method"><strong>Box-Jenkins</strong></a> methodology, if you know the <strong>underlying trend in SALES is “stochastic”</strong> (i.e. random), <strong>remove it by differencing</strong> SALES. Then model the differenced series as an ARMA process. For example:</p>
<p style="text-align: center;">SALES<sub>t</sub> – SALES<sub>t-1</sub> = b<sub>0</sub> + b<sub>1</sub>*AR(1) + b<sub>2</sub>*MA(1) + b<sub>3</sub>*MA(2) +ε<sub>t</sub></p>
<p>However, “it is sometimes <strong>very difficult to decide whether trend is best modeled as deterministic or stochastic</strong>, and the decision is an important part of the <strong>science – and art – of building forecasting models</strong>.” (<a href="https://www.amazon.com/Elements-Forecasting-Diebold-September-Paperback/dp/B014GFR8BI/ref=sr_1_14?ie=UTF8&amp;qid=1512586234&amp;sr=8-14&amp;keywords=diebold+elements+of+forecasting" target="_blank" rel="noopener"><strong>Diebold,  Elements of Forecasting, 1998</strong></a>)</p>
<p>We will revisit this issue in a later article.</p>
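<p>The differencing step in the ARIMA example can be sketched in a few lines of Python: model the period-to-period changes, then add the forecasted changes back onto the last observed level. The function names and numbers below are hypothetical, and the forecasted changes are simply assumed rather than produced by a fitted model.</p>

```python
def difference(series):
    """First difference: SALES_t - SALES_(t-1)."""
    return [b - a for a, b in zip(series, series[1:])]

def undifference(last_level, diffs):
    """Cumulate forecasted changes back onto the last observed level."""
    levels, level = [], last_level
    for d in diffs:
        level += d
        levels.append(level)
    return levels

sales = [100, 103, 101, 106]        # hypothetical history
changes = difference(sales)         # the series an ARMA model would be fit to
forecast_changes = [2, 1]           # assumed output of such a model
forecast_levels = undifference(sales[-1], forecast_changes)
print(changes, forecast_levels)
```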
<h4>Other considerations</h4>
<p>In addition to these 3 general classes of models we typically also try these variations:</p>
<ul>
<li><a href="http://www-stat.wharton.upenn.edu/~steele/Courses/434/434Context/GARCH/garch101(ENGLE).pdf"><strong>ARCH/GARCH</strong></a> <strong>models.</strong></li>
</ul>
<p>These models address <a href="https://en.wikipedia.org/wiki/Heteroscedasticity" target="_blank" rel="noopener"><strong>heteroscedasticity</strong></a> in the residuals (ε<sub>t</sub>). ARCH/GARCH models are <strong><a href="http://www-stat.wharton.upenn.edu/~steele/Courses/434/434Context/GARCH/garch101(ENGLE).pdf" target="_blank" rel="noopener">used in the financial arena</a></strong> to help model return and risk where market volatility can fluctuate in a predictable manner.</p>
<ul>
<li><strong>Inclusion of additional “right hand side variables.”</strong></li>
</ul>
<p>In the case of least squares and mixed regression-ARMA models, if the data are available, we often consider <strong>whether additional variables will improve predictive accuracy</strong>. In the case of SALES, for example, we could consider adding lagged values of advertising spending (AD SPEND). <strong>But</strong> if we are tasked with <strong>forecasting out 6 months</strong>, for example, then we <strong>cannot use lags</strong> of AD SPEND (in this example) <strong>shorter than 6 months</strong>, because the month-6 forecast would depend on AD SPEND values not yet observed. Otherwise we would <strong>also have to forecast AD SPEND</strong>.</p>
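<p>The lag restriction can be checked mechanically. A forecast for month T + h that uses lag k needs AD SPEND from month T + h - k, so the lag is usable only if that month's value is already on hand. The sketch below is illustrative: the function name and the known_ahead parameter (the number of future months whose AD SPEND is already fixed, for example by a committed budget) are hypothetical and not part of any library.</p>

```python
def usable_lags(horizon, candidate_lags, known_ahead=0):
    """Return the lags that never require an unknown future value.

    horizon:      number of periods ahead we must forecast
    known_ahead:  future periods of the predictor already known
                  (hypothetical parameter; 0 means nothing beyond today)
    """
    cutoff = horizon - known_ahead
    return [lag for lag in candidate_lags if lag >= cutoff]


# Forecasting 6 months out with AD SPEND observed only through today.
print(usable_lags(6, range(1, 13)))

# Same horizon, but next month's AD SPEND is already committed.
print(usable_lags(6, range(1, 13), known_ahead=1))
```

<p>Any lag excluded by this check would force us to forecast AD SPEND itself before forecasting SALES.</p>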
<ul>
<li><strong>Transformations</strong>.</li>
</ul>
<p>For example, using the <a href="https://people.duke.edu/~rnau/411log.htm"><strong>natural log</strong></a> of SALES can help <strong>model non-linear trends</strong> and/or <strong>dampen variation</strong> in SALES over time which may help to <strong>improve predictive accuracy</strong>.</p>
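<p>A small sketch shows why the log transformation helps with non-linear trends. The SALES series below is hypothetical and grows by exactly 10 percent per period: its period-to-period changes keep growing, but the changes in its natural log are constant, so a straight-line trend fits the logged series.</p>

```python
import math

# Hypothetical SALES growing 10 percent per period (illustrative only).
sales = [100.0 * 1.10 ** t for t in range(6)]

# Changes in the raw series grow over time (a curved trend).
level_changes = [b - a for a, b in zip(sales, sales[1:])]

# Changes in the logged series are constant (a straight-line trend).
log_sales = [math.log(s) for s in sales]
log_changes = [b - a for a, b in zip(log_sales, log_sales[1:])]

print(level_changes)
print(log_changes)
```

<p>A forecast made in logs has to be converted back to SALES units with the exponential function (math.exp) before it is reported.</p>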
<h3>Bottom line</h3>
<p>There are <strong>many “specifications,&#8221; many potentially useful models </strong>that we estimate.</p>
<p>But <strong>not all end up in a final “pool” of candidates</strong> for the forecasting model. Each estimated <strong>model must pass certain tests</strong> to stay in the candidate pool.</p>
<p>In a later article we will cover the tests we use to help <strong>whittle down the pool of candidates to a set of truly useful models</strong>.</p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-introduction/" target="_blank" rel="noopener"><strong>Part I &#8211; Practical Time Series Forecasting &#8211; Introduction</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-basics/" target="_blank" rel="noopener"><strong>Part II &#8211; Practical Time Series Forecasting &#8211; Some basics</strong></a></p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-useful-models/">Practical Time Series Forecasting &#8211; Potentially Useful Models</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1245</post-id>	</item>
		<item>
		<title>Practical Time Series Forecasting – Some Basics</title>
		<link>https://www.kddanalytics.com/practical-time-series-forecasting-basics/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Mon, 11 Dec 2017 02:50:12 +0000</pubDate>
				<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Forecasting]]></category>
		<category><![CDATA[Time Series]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[econometrics]]></category>
		<category><![CDATA[forecasting]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[time series]]></category>
		<guid isPermaLink="false">http://www.kddanalytics.com/?p=1198</guid>

					<description><![CDATA[<p>“The long run is a misleading guide to current affairs. In the long run we are all dead.” ― John Maynard Keynes, A Tract on Monetary Reform Forecasting the future is an exercise in uncertainty. And the further out one looks, the more uncertain the forecast becomes. Most businesses are keenly focused on the next&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-basics/">Practical Time Series Forecasting – Some Basics</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>“The long run is a misleading guide to current affairs. In the long run we are all dead.”<br />
― <a href="https://www.goodreads.com/author/show/159357.John_Maynard_Keynes"><strong>John Maynard Keynes</strong></a><strong>, <a href="https://www.goodreads.com/work/quotes/358282">A Tract on Monetary Reform</a></strong></p>
<p>Forecasting the future is an exercise in uncertainty. And the further out one looks, the more uncertain the forecast becomes.</p>
<p>Most businesses are keenly focused on the next quarter, 6 months, year or at most next few years. Hence, <strong>our focus in this series is on time series methods for “short-run” forecasting.</strong></p>
<h3>The nature of time series</h3>
<p>We are all familiar with charts like this:</p>
<p><img data-recalc-dims="1" decoding="async" loading="lazy" class="size-full wp-image-1206 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Low-variation-time-series.png?resize=615%2C386&#038;ssl=1" alt="Low variation time series" width="615" height="386" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Low-variation-time-series.png?w=615&amp;ssl=1 615w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Low-variation-time-series.png?resize=300%2C188&amp;ssl=1 300w" sizes="auto, (max-width: 615px) 100vw, 615px" /></p>
<p>showing a sequence of numbers ordered by time, across equally spaced periods of time. That is, a &#8220;<strong><a href="https://en.wikipedia.org/wiki/Time_series" target="_blank" rel="noopener">time series</a></strong>&#8221; (e.g. closing stock price per day, sales per month, GDP per quarter, average global temperature per year).</p>
<p>Some time series exhibit little variability (up/down) from time period to time period (except for an overall trend) like the one above.</p>
<p>Others exhibit considerable variability across time with a much less apparent trend, like this:</p>
<p><img data-recalc-dims="1" decoding="async" loading="lazy" class="size-full wp-image-1204 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/High-variation-time-series.png?resize=615%2C384&#038;ssl=1" alt="High variation time series" width="615" height="384" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/High-variation-time-series.png?w=615&amp;ssl=1 615w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/High-variation-time-series.png?resize=300%2C187&amp;ssl=1 300w" sizes="auto, (max-width: 615px) 100vw, 615px" /></p>
<p>A <strong>distinguishing characteristic</strong> of time series data, relative to other kinds of data, is that <strong>successive values are usually not independent of each other</strong>. Although it may not be apparent from looking at a chart, today’s value is usually related in some way to yesterday’s value, and possibly to the values from several days before. This dependence makes estimating time series models more complicated than estimating models on independent observations.</p>
<p>A time series chart holds a unique fascination for us. Because we are constantly aware of the progression of time, our natural reaction when we see such charts is, <strong>&#8220;I wonder what&#8217;s going to happen next?&#8221;</strong></p>
<h3>Components of a time series</h3>
<p>A successful forecasting model will account for each of <strong>3 components</strong> that may exist in a time series: <strong>trend, seasonality and cycles</strong>.</p>
<h4>Trend</h4>
<p><strong>Trend</strong>, when present, is often (but not always) visually apparent. For example, US real GDP (below) has exhibited a persistent upward trend since the Great Depression.</p>
<p>Trend is a long-run phenomenon and reflects, in business, “slowly evolving preferences, technologies, institutions and demographics.” (<a href="https://www.amazon.com/Elements-Forecasting-4th-Fourth-byDiebold/dp/B004UW0PA4/ref=sr_1_2?ie=UTF8&amp;qid=1512495766&amp;sr=8-2&amp;keywords=diebold%2C+elements+of+forecasting" target="_blank" rel="noopener"><strong>Diebold, Elements of Forecasting</strong></a>)</p>
<p><img data-recalc-dims="1" decoding="async" loading="lazy" class="size-full wp-image-1211 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/US-Real-GDP.png?resize=604%2C371&#038;ssl=1" alt="US Real GDP" width="604" height="371" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/US-Real-GDP.png?w=604&amp;ssl=1 604w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/US-Real-GDP.png?resize=300%2C184&amp;ssl=1 300w" sizes="auto, (max-width: 604px) 100vw, 604px" /></p>
<p>Trend comes in two flavors.</p>
<p>If GDP, for example, were knocked off its long-run growth path by a recession but returned to the same path afterwards, then trend is said to be &#8220;<strong>deterministic</strong>.&#8221; Adding a TIME variable to a model can go a long way toward capturing such “deterministic” trend.</p>
<p>On the other hand, if GDP started a new growth path after the recession, then trend is said to be &#8220;<strong>stochastic</strong>.&#8221;</p>
<p><strong>This distinction</strong> (between deterministic and stochastic trend) has <strong>important</strong> modeling and forecasting <strong>consequences</strong> which we will address in a later article.</p>
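<p>The two flavors of trend can be contrasted with a stylized sketch. Both series below grow by 2 per period and both receive the same one-time negative shock in period 3; the numbers are hypothetical and random noise is deliberately left out so that the effect of the shock is easy to see.</p>

```python
def deterministic_path(n, intercept, slope, shock_at, shock):
    """Trend line plus a one-time shock: later values return to the line."""
    return [
        intercept + slope * t + (shock if t == shock_at else 0.0)
        for t in range(n)
    ]


def stochastic_path(n, start, drift, shock_at, shock):
    """Random walk with drift: each value builds on the previous one,
    so a shock is carried forward into every later value."""
    path = [start]
    for t in range(1, n):
        step = drift + (shock if t == shock_at else 0.0)
        path.append(path[-1] + step)
    return path


det = deterministic_path(8, 100.0, 2.0, shock_at=3, shock=-10.0)
sto = stochastic_path(8, 100.0, 2.0, shock_at=3, shock=-10.0)

print(det)  # dips in period 3, then is back on the original path
print(sto)  # dips in period 3 and stays on a new, lower path
```

<p>Under deterministic trend the shock disappears from later values; under stochastic trend the series keeps growing at the same pace but from the new, lower level.</p>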
<h4>Seasonality</h4>
<p>A seasonal pattern <strong>repeats with calendar regularity</strong>.</p>
<p>The annual uptick in sales that occurs during the November and December holiday season is an example. Higher airline passenger counts during the summer months are another example (see below). Adding seasonal indicators (&#8220;<a href="https://en.wikipedia.org/wiki/Dummy_variable_(statistics)"><strong>dummy variables</strong></a>&#8221;) to a model can capture such seasonality.</p>
<p><img data-recalc-dims="1" decoding="async" loading="lazy" class="size-full wp-image-1212 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/US-Enplanements.png?resize=604%2C371&#038;ssl=1" alt="US Enplanements" width="604" height="371" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/US-Enplanements.png?w=604&amp;ssl=1 604w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/US-Enplanements.png?resize=300%2C184&amp;ssl=1 300w" sizes="auto, (max-width: 604px) 100vw, 604px" /></p>
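<p>As a minimal sketch of seasonal indicators, the function below turns month numbers into 0/1 dummy variables. The function name and the choice of January as the baseline are illustrative assumptions. One month is left out because a full set of twelve dummies would be perfectly collinear with the regression intercept.</p>

```python
def month_dummies(months, baseline=1):
    """Build 0/1 seasonal indicators for each observation.

    months:   month number (1 to 12) of each observation
    baseline: the month left out; its effect is absorbed by the intercept
    Returns one row per observation with 11 indicator columns.
    """
    kept = [m for m in range(1, 13) if m != baseline]
    return [[1 if month == m else 0 for m in kept] for month in months]


# Hypothetical observations for November, December and January.
rows = month_dummies([11, 12, 1])

for row in rows:
    print(row)
```

<p>The January row is all zeros, so each estimated dummy coefficient is read as that month's difference from January.</p>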
<h4>Cycles</h4>
<p>A cyclic component can also be present. <strong>Cycles are much less rigid than seasonal patterns</strong>. One example is the business cycle, from a recession low to an expansion high.</p>
<p>A time series can contain one cycle (e.g. the daily cycle of body temperature) or multiple cycles (e.g. bicycle traffic patterns can exhibit daily, weekly and annual cycles). More broadly, <strong>a cyclic component is any dynamic not accounted for by trend or seasonality</strong>.</p>
<p>Modeling cycles takes us into the world of <a href="https://en.wikipedia.org/wiki/Autoregressive%E2%80%93moving-average_model"><strong>ARMA</strong></a> and <a href="https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average"><strong>ARIMA</strong></a> models which we&#8217;ll cover later.</p>
<h3>Methods for forecasting</h3>
<p>There are numerous methods for forecasting a time series, ranging from simple to complex.</p>
<h4>Simple</h4>
<p>The simplest is some type of <strong>smoothing</strong> routine, like <a href="https://en.wikipedia.org/wiki/Moving_average" target="_blank" rel="noopener"><strong>moving averages</strong></a> or <a href="https://en.wikipedia.org/wiki/Exponential_smoothing" target="_blank" rel="noopener"><strong>exponential smoothing</strong></a>. <strong>Moving averages</strong>, especially a 200-day moving average, are commonly used in technical analysis of stock price movements:</p>
<p><img data-recalc-dims="1" decoding="async" loading="lazy" class="size-full wp-image-1215 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/200-Day-MAV.png?resize=554%2C464&#038;ssl=1" alt="" width="554" height="464" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/200-Day-MAV.png?w=554&amp;ssl=1 554w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/200-Day-MAV.png?resize=300%2C251&amp;ssl=1 300w" sizes="auto, (max-width: 554px) 100vw, 554px" /></p>
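<p>Both smoothing routines are short enough to write out. The sketch below uses a made-up price series, a 3-period window instead of 200 days, and a smoothing weight of 0.5; these values are assumptions chosen to keep the arithmetic easy to follow.</p>

```python
def moving_average(series, window):
    """Simple trailing moving average; the first value appears once
    a full window of observations is available."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: each smoothed value is a weighted
    average of the new observation and the previous smoothed value."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical daily closing prices.
prices = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]

print(moving_average(prices, window=3))        # [11.0, 12.0, 13.0, 14.0]
print(exponential_smoothing(prices, alpha=0.5))  # [10.0, 11.0, 11.0, 12.0, 13.5, 13.75]
```

<p>A larger window or a smaller alpha gives a smoother line that reacts more slowly to recent changes.</p>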
<h4>Complex</h4>
<p>More complex <a href="https://en.wikipedia.org/wiki/Econometric_model"><strong>econometric</strong></a> methods seek to model the relationship between, say, sales over time, and several dimensions that could affect sales, such as advertising spending.</p>
<p>Econometric models can consist of <strong>multiple interrelated equations</strong> (one for sales, one for ad spending) which would be estimated jointly, typically using a multiple regression methodology. <a href="https://en.wikipedia.org/wiki/Macroeconomic_model"><strong>Such models</strong></a> are used to model the US economy and to generate <strong>long-run forecasts</strong> of macroeconomic variables such as GDP and employment.</p>
<p>Also on the sophisticated end of the spectrum are techniques like <a href="https://en.wikipedia.org/wiki/Spectral_density#Explanation" target="_blank" rel="noopener"><strong>spectral analysis</strong></a>, <a href="https://en.wikipedia.org/wiki/Deep_learning"><strong>deep learning</strong></a> and <a href="https://en.wikipedia.org/wiki/Artificial_neural_network"><strong>neural networks</strong></a>. These methods require an <strong>elevated level of expertise</strong> on the part of a data scientist to implement and fine-tune the models.</p>
<h4>Middle of the road</h4>
<p>In between the simpler and more complex forecasting methods is what we refer to as “<strong>time series methods</strong>.” These methods <strong>rely</strong> primarily (but not exclusively) <strong>on the series’ historical behavior to inform the future</strong>. “<a href="http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc44.htm"><strong>Univariate modeling</strong></a>” is sometimes used to describe these methods.</p>
<p>A distinguishing feature of time series methods is that they <strong>explicitly account for the key characteristics of a time series</strong>: trend, seasonality and cycles.</p>
<p>The <strong>workhorses </strong>of time series methods are single equation, <a href="https://en.wikipedia.org/wiki/Least_squares"><strong>least squares</strong></a> regression and <a href="https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average"><strong>ARIMA</strong></a> models.</p>
<p>Least squares regression models can use a TIME trend, seasonal indicators and either lagged values of the series being modeled or an ARMA representation of the cyclic component to model a time series. They can also include other related lagged variables (e.g., advertising expenditures in a SALES forecasting model), but usually only if the lags are long enough that the lagged values are already known when the forecast is made.</p>
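<p>The simplest member of this family, a regression of the series on a TIME trend alone, can be sketched without any libraries. The SALES values are hypothetical, and the sketch omits seasonal indicators, lags and ARMA terms, so it illustrates only the least squares trend fit and its extrapolation.</p>

```python
def fit_time_trend(series):
    """Least squares fit of series[t] = intercept + slope * TIME,
    with TIME = 1, 2, ..., n."""
    n = len(series)
    time = list(range(1, n + 1))
    t_mean = sum(time) / n
    y_mean = sum(series) / n
    slope = sum(
        (t - t_mean) * (y - y_mean) for t, y in zip(time, series)
    ) / sum((t - t_mean) ** 2 for t in time)
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast_trend(intercept, slope, n_obs, horizon):
    """Extend the fitted trend line `horizon` periods past the sample."""
    return [intercept + slope * (n_obs + h) for h in range(1, horizon + 1)]


# Hypothetical monthly SALES figures.
sales = [102.0, 104.5, 105.5, 108.0, 110.0]

intercept, slope = fit_time_trend(sales)
trend_forecast = forecast_trend(intercept, slope, len(sales), horizon=3)

print(intercept, slope)
print(trend_forecast)
```

<p>In practice the residuals from such a fit are where seasonality and cycles show up, which is why the fuller models add seasonal indicators and lagged or ARMA terms.</p>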
<p>If the trend of the series is “stochastic” (i.e. when the series is bumped off its trend path, it starts a new trend path), then ARIMA models may provide the best forecast.</p>
<h3>Back to the short-run</h3>
<p>The <strong>time series methods we will cover</strong> in this series of articles use the estimated dynamics and trend of the series to forecast a future path over the &#8220;<strong>forecast horizon</strong>.&#8221;</p>
<p>But since the <strong>forecasts will</strong> most likely <strong>revert to the underlying trend in the series</strong> eventually, the best use of these time series methods is for <strong>&#8220;short-run&#8221;</strong> forecasts.</p>
<p>Although there is a more &#8220;technical&#8221; definition based on the type of model used, we <strong>generally define the &#8220;short run&#8221;</strong> as the <strong>period of time</strong> that <strong>matches <span style="text-decoration: underline;">most</span> businesses&#8217; forecast needs</strong>. So, we are talking about anywhere from the next day to the next few years.</p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-introduction/" target="_blank" rel="noopener"><strong>Part I &#8211; Practical Time Series Forecasting &#8211; Introduction</strong></a></p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-basics/">Practical Time Series Forecasting – Some Basics</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1198</post-id>	</item>
		<item>
		<title>Practical Time Series Forecasting &#8211; Introduction</title>
		<link>https://www.kddanalytics.com/practical-time-series-forecasting-introduction/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Mon, 04 Dec 2017 18:21:01 +0000</pubDate>
				<category><![CDATA[Data Analysis]]></category>
		<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Forecasting]]></category>
		<category><![CDATA[Time Series]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[econometrics]]></category>
		<category><![CDATA[forecasting]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[time series]]></category>
		<guid isPermaLink="false">http://www.kddanalytics.com/?p=1183</guid>

					<description><![CDATA[<p>“The only thing I cannot predict is the future.” ― Amit Trivedi, Riding The Roller Coaster: Lessons from financial market cycles we repeatedly forget It goes without saying that every business is keenly interested in knowing what the future will bring. Will sales grow next year? By how much? Will suppliers increase their prices? How&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-introduction/">Practical Time Series Forecasting &#8211; Introduction</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>“<em>The only thing I cannot predict is the future.</em>”<br />
― <strong><a href="https://www.goodreads.com/author/show/14241127.Amit_Trivedi" target="_blank" rel="noopener">Amit Trivedi</a>, <a href="https://www.goodreads.com/work/quotes/46159495" target="_blank" rel="noopener">Riding The Roller Coaster: Lessons from financial market cycles we repeatedly forget</a></strong></p>
<p>It goes without saying that every business is keenly interested in knowing what the future will bring.</p>
<p>Will sales grow next year? By how much? Will suppliers increase their prices? How fast will a new IoT product be adopted? How much warehouse capacity is needed for the next holiday period? Will some international event plunge the global economy into a recession?</p>
<p><strong>Predicting the future is an exercise in probability rather than certainty</strong>. Businesses engage in various levels of sophistication in trying to bound the likelihood of future states to support their business plans.</p>
<p>Some have teams of economists and data scientists tasked with building complex forecasting models.</p>
<p>Many businesses, however, likely rely on less sophisticated means centered on spreadsheet models, trends and moving averages (or even educated guesses).</p>
<p><a href="https://en.wikipedia.org/wiki/Time_series" target="_blank" rel="noopener"><strong>Time series methodology</strong></a> is a <strong>moderately sophisticated yet cost-effective way</strong> to generate forecasts. It is a statistical approach which bases forecasts on the past behavior of the data series in question (e.g. monthly sales).</p>
<p>And it accounts for other characteristics of a time series which can yield a more accurate forecast than, say, a simple straight-line trend model.</p>
<h3>More time, more data</h3>
<p>We have all heard the forecasts about data growth, the proverbial “<a href="https://en.wikipedia.org/wiki/Hockey_stick_graph" target="_blank" rel="noopener"><strong>hockey stick</strong></a>.”</p>
<p>By one account, human and machine-generated data is growing at 10x the rate of traditional business data, and machine-generated data alone is <strong><a href="https://insidebigdata.com/2017/02/16/the-exponential-growth-of-data/" target="_blank" rel="noopener">growing at 50x</a></strong> the rate.</p>
<p>A good portion of this machine-generated data has a time dimension. <strong><a href="https://en.wikipedia.org/wiki/Internet_of_things" target="_blank" rel="noopener">Internet of Things</a></strong> (IoT) devices are proliferating, each of which has the potential to <a href="https://www.kdnuggets.com/2015/07/impact-iot-big-data-landscape.html" target="_blank" rel="noopener"><strong>collect data</strong></a> over time.</p>
<p>A washing machine can monitor, collect and post performance data to the cloud. Using these data to forecast product failure can lead to a pro-active maintenance visit by your friendly but lonely <strong><a href="https://www.youtube.com/watch?v=n7z6AKPGDZ4" target="_blank" rel="noopener">Maytag repairman</a></strong>.</p>
<p>Similarly, a household’s electricity usage can be monitored, modeled and forecasted leading to cost-savings suggestions by an energy service provider under a time-of-day pricing scheme.</p>
<p>The electric utility company itself can use the data from all the IoT appliances in households to generate better residential load forecasts and help better <a href="https://dupress.deloitte.com/dup-us-en/focus/internet-of-things/iot-in-electric-power-industry.html" target="_blank" rel="noopener"><strong>manage the electricity grid</strong></a>.</p>
<p>Even for more traditional business data sources like sales, inventories, deliveries, workforce utilization, IT usage and the like, advances in data collection, storage and proliferation are making <strong>time series data more readily accessible</strong>.</p>
<p>Thus, there will be an <strong>increased demand</strong> for product managers, economists, statisticians and data scientists to make use of these data and <strong>tell us what will happen next</strong>.</p>
<h3>Time series methods</h3>
<p>The <strong>premise</strong> of time series methods (and of most quantitatively-based forecasting methods) is that the <strong>future will be much like the past</strong>.</p>
<p>If sales have been growing at a consistently healthy rate with strong seasonal variation (e.g. holiday periods) for the last year, then it is likely the next year will be similar, all else constant. If done correctly, the <strong>methodology can yield a defensible forecast</strong> of likely sales each month during the “forecast horizon.”</p>
<p>But, as with all forecasting methodologies, <strong>there are pitfalls of which one should be aware</strong>.</p>
<h3>Practical time series methods</h3>
<p>This is the first of a series of articles on <strong>practical time series methods for short-run business forecasting</strong>.</p>
<p>There are abundant, excellent resources covering the basics of business forecasting including time series methods, ranging from blog posts to online courses to <a href="https://www.otexts.org/fpp2" target="_blank" rel="noopener"><strong>open-source textbooks</strong></a>.</p>
<p>And time series methods are a mainstay of advanced courses in econometrics and business forecasting (resources we recommend are <a href="https://www.amazon.com/Elements-Forecasting-Book-Francis-Diebold/dp/0324359047/ref=sr_1_1?ie=UTF8&amp;qid=1510100591&amp;sr=8-1&amp;keywords=elements+of+forecasting&amp;dpID=512OHGykTZL&amp;preST=_SX218_BO1,204,203,200_QL40_&amp;dpSrc=srch" target="_blank" rel="noopener"><strong>Elements of Forecasting</strong></a> by Diebold and <a href="https://www.amazon.com/Econometric-Models-Economic-Forecasts-Pindyck/dp/0079132928/ref=sr_1_7?ie=UTF8&amp;qid=1512407309&amp;sr=8-7&amp;keywords=pindyck+and+rubinfeld" target="_blank" rel="noopener"><strong>Econometric Models and  Economic Forecasts</strong></a> by Pindyck and Rubinfeld).</p>
<p><strong>Rather than being a treatise on forecasting, this series of articles will present a practical methodology and some of the lessons we have learned performing time series forecasting for clients.</strong></p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-introduction/">Practical Time Series Forecasting &#8211; Introduction</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1183</post-id>	</item>
	</channel>
</rss>
