<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	
	xmlns:georss="http://www.georss.org/georss"
	xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
	>

<channel>
	<title>KDD Analytics</title>
	<atom:link href="https://www.kddanalytics.com/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.kddanalytics.com/</link>
	<description>Data to Decisions</description>
	<lastBuildDate>Mon, 28 Jun 2021 00:37:12 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.8.3</generator>

<image>
	<url>https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2016/08/cropped-imageedit_1_7939659602.png?fit=32%2C32&#038;ssl=1</url>
	<title>KDD Analytics</title>
	<link>https://www.kddanalytics.com/</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">114932494</site>	<item>
		<title>If odds are not odd, what about odds ratios?</title>
		<link>https://www.kddanalytics.com/if-odds-are-not-odd-what-about-odds-ratios/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Mon, 28 Jun 2021 00:37:12 +0000</pubDate>
				<category><![CDATA[Categorical Data Analysis]]></category>
		<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[case_control]]></category>
		<category><![CDATA[meta-analysis]]></category>
		<category><![CDATA[odds]]></category>
		<category><![CDATA[odds ratio]]></category>
		<category><![CDATA[prospective]]></category>
		<category><![CDATA[retrospective]]></category>
		<guid isPermaLink="false">https://www.kddanalytics.com/?p=2022</guid>

					<description><![CDATA[<p>What are the odds of developing a brain tumor from long-term use of cell phones? This is an evolving area of research.  Some studies have found an association and others have not. But two recent meta-analyses suggest that the odds are about 33 to 44% greater due to long-term cell phone usage. Got your attention?&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/if-odds-are-not-odd-what-about-odds-ratios/">If odds are not odd, what about odds ratios?</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>What are the <strong>odds of developing a brain tumor from long-term use of cell phones</strong>?</p>
<p>This is an evolving area of research.  Some studies have found an association and others have not.</p>
<p>But two recent <strong><em><a href="https://en.wikipedia.org/wiki/Meta-analysis" target="_blank" rel="noopener">meta-analyses</a></em></strong> suggest that the odds are about <strong>33 to 44%</strong> <strong>greater</strong> due to long-term cell phone usage.</p>
<p>Got your attention?</p>
<p>“But what does this do to my odds of developing a brain tumor?” you may ask.</p>
<p>Before we answer that, we need to explain how the meta-analyses derive this 33 to 44% figure.  Which introduces us to <strong><em>odds ratios</em></strong>.</p>
<h2>Case-control studies</h2>
<p>Studies of the association between cell phone usage and brain tumor are typically <strong><em>case-control</em></strong> studies.</p>
<p>Such studies are <em><strong>retrospective</strong></em>, as opposed to <em><strong>prospective</strong></em>.<a href="#_ftn1" name="_ftnref1">[1]</a> They combine a sample of patients (<strong><em>cases</em></strong>) already diagnosed with a brain tumor with a random sample of non-patients (<strong><em>controls</em></strong>) drawn from the general population. Study investigators match controls to each case based on key demographics such as sex, age, and region.</p>
<p>The studies then measure and test for the existence of an association between <strong><em>exposure</em></strong> (cell phone usage) and <strong><em>outcome</em></strong> (brain tumor).<a href="#_ftn2" name="_ftnref2">[2]</a></p>
<p>Typically, these case-control studies report their <strong>estimated effects</strong>, not in terms of odds, but in terms of <strong>odds ratios</strong>.</p>
<p>So, what is an odds ratio?</p>
<h2>Odds ratios</h2>
<p>An odds ratio is a <strong>measure of association strength.</strong> In this case, between cell phone usage and the diagnosis of a brain tumor.</p>
<p>As an example, we can use the results from one of the <strong><em>high-quality</em></strong> <strong><a href="https://pubmed.ncbi.nlm.nih.gov/16023098/" target="_blank" rel="noopener">studies</a></strong> used in the meta-analyses mentioned above to show how odds ratios are calculated.<a href="#_ftn3" name="_ftnref3">[3]</a></p>
<p>The data shown in the following table are from a case-control study conducted in Sweden between 2000 and 2003.<a href="#_ftn4" name="_ftnref4">[4]</a>  The data are for long-term cell phone usage (&gt;= 10 years). The reference category is no cell phone usage.<a href="#_ftn5" name="_ftnref5">[5]</a></p>
<p><img data-recalc-dims="1" decoding="async" class="alignnone size-full wp-image-2023 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Odds-ratios.png?resize=399%2C120&#038;ssl=1" alt="cell phones and brain tumors" width="399" height="120" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Odds-ratios.png?w=399&amp;ssl=1 399w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Odds-ratios.png?resize=300%2C90&amp;ssl=1 300w" sizes="(max-width: 399px) 100vw, 399px" /><br />
In an earlier <strong><a href="https://www.kddanalytics.com/odds-and-probability-two-sides-of-the-same-coin/">article</a></strong> we learned that the odds of an event occurring are the number of events divided by the number of non-events.</p>
<p>Thus, the <strong>odds of a long-term cell phone user in this sample being diagnosed with a brain tumor</strong> are (16 / 232) or 0.069; about 1 to 14.</p>
<p>The <strong>odds of a non-cell phone user being diagnosed with a brain tumor</strong> are (18 / 674) or 0.027; about 1 to 37.</p>
<p>The <strong>odds ratio is simply the ratio of the two odds</strong>:  (0.069 / 0.027) or 2.582.</p>
<p>So, the odds of a long-term cell phone user being diagnosed with a brain tumor are <strong>2.582 times greater compared to a non-cell phone user</strong>.</p>
<p>Alternatively, this can be stated in <strong>terms of a % difference</strong>. The odds of a long-term cell phone user being diagnosed with a brain tumor are <strong>158% greater compared to a non-cell phone user</strong> ((2.582 – 1) * 100).</p>
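<p>For readers who like to check the arithmetic, the calculation above can be sketched in a few lines of Python (a minimal illustration using the table&#8217;s counts, not code from the study itself):</p>

```python
# Odds ratio from the 2x2 case-control table above:
# 16 exposed cases vs. 232 exposed controls;
# 18 unexposed cases vs. 674 unexposed controls.

def odds_ratio(cases_exposed, controls_exposed, cases_unexposed, controls_unexposed):
    """Ratio of the odds of the outcome among the exposed vs. the unexposed."""
    odds_exposed = cases_exposed / controls_exposed        # 16 / 232, about 0.069
    odds_unexposed = cases_unexposed / controls_unexposed  # 18 / 674, about 0.027
    return odds_exposed / odds_unexposed

ratio = odds_ratio(16, 232, 18, 674)
print(round(ratio, 3))           # 2.582
print(round((ratio - 1) * 100))  # 158 (% greater odds)
```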
<p>That is a pretty large effect.<a href="#_ftn6" name="_ftnref6">[6]</a></p>
<h2>Meta-studies</h2>
<p><strong>Now this is just one study</strong>.  The two meta-studies alluded to above each combined the results of 7 different, high-quality studies.</p>
<p>They found that the overall odds (across the studies) of a long-term cell phone user (&gt;= 10 years) being diagnosed with a brain tumor (any tumor type) are <a href="https://pubmed.ncbi.nlm.nih.gov/28213724/" target="_blank" rel="noopener"><strong>33%</strong></a> and (with respect to <a href="https://www.mayoclinic.org/diseases-conditions/glioma/symptoms-causes/syc-20350251" target="_blank" rel="noopener"><strong>glioma</strong></a>, a common type of tumor) <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5417432/" target="_blank" rel="noopener"><strong>44%</strong></a> <strong>greater compared to a non-cell phone user</strong>.<a href="#_ftn7" name="_ftnref7">[7]</a></p>
<p>These meta-studies found no effect due to cell phone usage over a shorter period (i.e., &lt; 10 years).</p>
<p>So, it appears that the risk, if it exists, is associated with long-term usage.  Moreover, using a cell phone on the same side of the head is associated with <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5417432/" target="_blank" rel="noopener"><strong>46%</strong></a> greater odds of developing a glioma on that side of the head.<a href="#_ftn8" name="_ftnref8">[8]</a></p>
<h2>Odds of developing a brain tumor</h2>
<p>So, <strong>back to our original question</strong>.  What are the odds of developing a brain tumor from long-term cell phone usage?</p>
<p>The odds of developing a brain tumor among the general population are very low to start with.  Annual <strong><a href="https://seer.cancer.gov/statfacts/html/brain.html" target="_blank" rel="noopener">incidence</a></strong> in the US (2018) is 6.5 per 100,000 or 0.0065%.  In terms of odds, this is about 1 to 15,000.</p>
<p>So, a 44% increase in the odds would mean 9.4 per 100,000 or about 1 to 10,000.  Still quite low.<a href="#_ftn9" name="_ftnref9">[9]</a></p>
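<p>The arithmetic behind that figure: convert the baseline rate to odds, scale the odds by the ratio, then convert back to a rate (a minimal sketch; the function name is ours, not from any study):</p>

```python
# Apply an odds ratio to a baseline rate of 6.5 per 100,000.
def rate_after_odds_ratio(rate_per_100k, odds_ratio):
    p = rate_per_100k / 100_000
    odds = p / (1 - p)            # baseline odds, roughly 1 to 15,000
    new_odds = odds * odds_ratio  # an odds ratio scales the odds, not the probability
    return new_odds / (1 + new_odds) * 100_000

print(round(rate_after_odds_ratio(6.5, 1.44), 1))  # 9.4 per 100,000
```

At rates this low the odds and the probability are nearly identical, which is why scaling the rate directly (6.5 &#215; 1.44 &#8776; 9.4) gives essentially the same answer.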
<p>As one <a href="https://academic.oup.com/jnci/article/103/15/1146/2516666" target="_blank" rel="noopener"><strong>researcher</strong></a> put it, “Your chance of being hurt by distracted driving because you’re using your cell phone wipes out the risk of getting cancer.”</p>
<p>However, in 2011 the World Health Organization’s International Agency for Research on Cancer (<a href="https://iarc.who.int/" target="_blank" rel="noopener"><strong>IARC</strong></a>) <strong><u>did</u> classify</strong> cell phones as a Group 2B <strong>carcinogen</strong> (i.e., possibly causes cancer).</p>
<p>And there continues to be a healthy debate in both the statistical and public arenas.</p>
<p><a href="https://ehtrust.org/scientific-documentation-cell-phone-radiation-associated-brain-tumor-rates-rising/" target="_blank" rel="noopener"><strong>Studies</strong></a> continue to be released which purportedly find evidence that recent increases in the incidence of <a href="https://en.wikipedia.org/wiki/Glioblastoma" target="_blank" rel="noopener"><strong>glioblastomas</strong></a>, an aggressive type of brain cancer, are tied to cell phone usage.</p>
<p><a href="https://www.forbes.com/sites/geoffreykabat/2017/12/23/are-brain-cancer-rates-increasing-and-do-changes-relate-to-cell-phone-use/" target="_blank" rel="noopener"><strong>Skeptics</strong></a> argue that changes in the WHO classification of what is considered a glioblastoma may be responsible for any uptick in brain tumor incidence, and that the large increases in risk reported by studies like the meta-studies discussed above are <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4057143/" target="_blank" rel="noopener"><strong>inconsistent</strong></a> with the historical trend in brain tumor incidence.<a href="#_ftn10" name="_ftnref10">[10]</a></p>
<p><strong>As we said at the outset, this is an evolving area of research, with lots of issues to untangle.</strong></p>
<p>One thing to keep in mind, though, is <strong>who is funding the research</strong>.  A topic we will cover in a later article.</p>
<h2>We have odds ratios to thank</h2>
<p><strong>Back to the main point of this article.</strong></p>
<p>Odds facilitate the measurement of the <strong><span style="text-decoration: underline;">relative</span> likelihood of events</strong>.  Epidemiological studies that are retrospective commonly use the <strong>odds ratio as this relative measurement of association strength</strong>.</p>
<p>So, the next time you hear that your favorite dietary choice increases your chances of developing cancer, it is probably the result of that not-so-oddity, the odds ratio.</p>
<p>&nbsp;</p>
<p><a href="#_ftnref1" name="_ftn1">[1]</a> Prospective cohort studies have also been used (i.e., studies which track subjects over time).  See <strong><a href="https://www.cognibrain.com/retrospective-vs-prospective-study-advantages-types-and-differences/" target="_blank" rel="noopener">here</a></strong> for a summary of the advantages and disadvantages of retrospective and prospective studies.</p>
<p><a href="#_ftnref2" name="_ftn2">[2]</a> Exposure is determined by answers to a lengthy questionnaire. Hence, one of the criticisms levied against case-control studies is respondent <strong><a href="https://catalogofbias.org/biases/recall-bias/" target="_blank" rel="noopener">recall bias</a></strong>. That is, whether respondents accurately recall their cell phone usage, particularly over a long period of time.</p>
<p><a href="#_ftnref3" name="_ftn3">[3]</a> Studies are <a href="http://www.ohri.ca/programs/clinical_epidemiology/oxford.asp" target="_blank" rel="noopener"><strong>graded</strong></a> on a quality scale considering such factors as <strong>selection</strong> of cases and controls, <strong>comparability</strong> of cases and controls based on study design, and proper assessment/measurement of <strong>exposure</strong>.</p>
<p><a href="#_ftnref4" name="_ftn4">[4]</a> The results shown in the table are taken from a <a href="https://pubmed.ncbi.nlm.nih.gov/28213724/" target="_blank" rel="noopener"><strong>meta-study</strong></a> which considered this <a href="https://pubmed.ncbi.nlm.nih.gov/16023098/" target="_blank" rel="noopener"><strong>Hardell et al</strong></a> (2006) study.</p>
<p><a href="#_ftnref5" name="_ftn5">[5]</a> As cell phone usage becomes more ubiquitous, and fewer people who have never used a cell phone are available in the population, the exposure will need to be increasingly measured in terms of levels/frequency of usage.</p>
<p><a href="#_ftnref6" name="_ftn6">[6]</a> The additional risk derived using an odds ratio is closely related to the concept of <strong><em>efficacy</em></strong>, which is derived directly from the concept of <strong><em>relative risk</em></strong> (ratio of probabilities). We covered efficacy in an earlier <strong><a href="https://www.kddanalytics.com/covid-vaccine-efficacy-effectiveness/" target="_blank" rel="noopener">article</a></strong>. Epidemiologists typically use relative risk to measure association strength in prospective (cohort) studies; odds ratios in case-control studies.</p>
<p><a href="#_ftnref7" name="_ftn7">[7]</a> Meta-studies start with a larger number of studies.  They then drop studies for various reasons, such as data availability and the quality grade they receive, to arrive at the final sample.</p>
<p><a href="#_ftnref8" name="_ftn8">[8]</a> All these studies on brain tumors controlled for whether cell phones were being used next to users’ heads.</p>
<p><a href="#_ftnref9" name="_ftn9">[9]</a> <strong><a href="https://seer.cancer.gov/statfacts/" target="_blank" rel="noopener">See</a></strong> for US cancer incidence rates as of 2018.</p>
<p><a href="#_ftnref10" name="_ftn10">[10]</a> See also <a href="https://www.forbes.com/sites/geoffreykabat/2017/12/27/what-the-best-u-s-data-have-to-say-about-brain-cancer-rates/" target="_blank" rel="noopener"><strong>Geoffrey Kabat</strong></a> (2017).</p>
<p>&nbsp;</p>
<p>The post <a href="https://www.kddanalytics.com/if-odds-are-not-odd-what-about-odds-ratios/">If odds are not odd, what about odds ratios?</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">2022</post-id>	</item>
		<item>
		<title>Odds and probability&#8230;two sides of the same coin</title>
		<link>https://www.kddanalytics.com/odds-and-probability-two-sides-of-the-same-coin/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Fri, 04 Jun 2021 16:59:03 +0000</pubDate>
				<category><![CDATA[Categorical Data Analysis]]></category>
		<category><![CDATA[Data Analysis]]></category>
		<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[odds]]></category>
		<category><![CDATA[probability]]></category>
		<guid isPermaLink="false">https://www.kddanalytics.com/?p=1993</guid>

					<description><![CDATA[<p>What are the lifetime odds of dying from being hit by a meteorite? 1 in 1,600,000. Yep, not very likely.  You are much more likely to die from a dog attack (1 in 86,781) or from a lightning strike (1 in 138,849). But why odds? Why not express these likelihoods in terms of probabilities?  Seems&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/odds-and-probability-two-sides-of-the-same-coin/">Odds and probability&#8230;two sides of the same coin</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>What are the lifetime odds of dying from being hit by a <a href="https://www.tulane.edu/~sanelson/Natural_Disasters/impacts.htm" target="_blank" rel="noopener"><strong>meteorite</strong></a>?</p>
<p>1 in 1,600,000.</p>
<p>Yep, not very likely.  You are much more likely to die from a <a href="https://injuryfacts.nsc.org/all-injuries/preventable-death-overview/odds-of-dying/" target="_blank" rel="noopener"><strong>dog attack</strong></a> (1 in 86,781) or from a <strong><a href="https://injuryfacts.nsc.org/all-injuries/preventable-death-overview/odds-of-dying/" target="_blank" rel="noopener">lightning strike</a></strong> (1 in 138,849).</p>
<p>But why odds?</p>
<p>Why not express these likelihoods in terms of probabilities?  Seems like a more natural way to express the chance of an event occurring, doesn’t it?</p>
<p>Odds, however, are commonly used to express event risk.  And of course, the chances of winning a sporting event.</p>
<p>As we write, the odds of the <strong><a href="https://www.mlb.com/dodgers" target="_blank" rel="noopener">Los Angeles Dodgers</a></strong> repeating as World Series champions in 2021 are 1 in 3.<a href="#_ftn1" name="_ftnref1">[1]</a></p>
<h2>So, what are odds?</h2>
<p style="text-align: center;"><strong>The number of times an event occurs divided by the number of times it does not occur</strong>.</p>
<p>In the case of a meteorite strike, for every person that dies, 1.6 million do not.  In the case of the Dodgers, we would expect the Dodgers to win 1 World Series for every 3 they lose.</p>
<p>But this still begs the question, <strong>why odds and not probabilities?</strong></p>
<h2>Odds and probabilities</h2>
<p>It is true that the probability of a low-likelihood event is so small that stating it as a % requires a lot of zeros after the decimal (0.0000625% in the case of dying from a meteorite strike).</p>
<p>But that is not an insurmountable objection. For example, the risk of disease is often expressed in terms of rates per 100,000 to make the chances of low-likelihood events easier to comprehend. Or we could state the probability of the non-event&#8230;not dying from a meteorite strike (99.9999375%).</p>
<p>A more important reason for using odds is that they facilitate <strong>multiplicative comparisons</strong>.</p>
<p>A simple example makes clear how probabilities can fall short.</p>
<p>Suppose the probability that Beth will go out to dinner this weekend is 75%. We <strong>cannot</strong> say, then, that the probability of Jose doing the same is 3 times Beth&#8217;s.</p>
<p>Why?  Probabilities are constrained to lie between 0 and 1. And 3 * 0.75 &gt; 1.0.</p>
<p>So, what do we do?  Enter odds.</p>
<h3>Odds are unconstrained</h3>
<p>Odds are only bounded on the low end, by 0.  Let&#8217;s return to Beth and Jose.</p>
<p>The odds of Beth going out to dinner are 3 or 3/1.  Why 3/1?</p>
<p>Remember odds are the ratio of the events to non-events. Beth is 75% likely to go out. So, if she is faced with 4 opportunities to go out, she will do so 3 times.  In other words, she will go out (event) 3 times for every time she stays home (non-event).  3 to 1.</p>
<p>Now, if Jose is 3 times as likely to go out as Beth, his odds are simply 3 * 3 or 9.  Equivalently, we can express his odds as 9 to 1 or 9/1.</p>
<p>On the odds scale, odds can be 2, 10, 50 times greater…there is no upper limit. And this makes them very useful when we wish to compare the <strong>relative likelihood</strong> of events occurring.</p>
<h3>Two sides of the same coin</h3>
<p>It turns out that if we are still interested in the probability, we can easily derive it from the odds.  <strong>Odds and probability are two sides of the same coin</strong>.</p>
<p>Odds (o) are related to probability (p) by the following:</p>
<p style="text-align: center;"><em><strong>o = p / (1 &#8211; p) = (probability of event / probability of non-event)</strong></em></p>
<p>Rearranging we find the “other side of the coin” (for an event):</p>
<p style="text-align: center;"><em><strong>p = o / (1 + o) = (odds of event) / (1 + odds of event)</strong></em></p>
<p>So, in the case of Beth and Jose we get:</p>
<p><img data-recalc-dims="1" decoding="async" class="size-full wp-image-2083 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Odds-to-probability.png?resize=429%2C103&#038;ssl=1" alt="odds probability same coin different sides" width="429" height="103" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Odds-to-probability.png?w=429&amp;ssl=1 429w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Odds-to-probability.png?resize=300%2C72&amp;ssl=1 300w" sizes="(max-width: 429px) 100vw, 429px" /></p>
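<p>The two formulas are easy to verify in code (a quick sketch, not part of the original examples):</p>

```python
# Odds <-> probability, per the formulas above.
def prob_to_odds(p):
    return p / (1 - p)

def odds_to_prob(o):
    return o / (1 + o)

print(prob_to_odds(0.75))  # Beth: 3.0, i.e., 3 to 1
print(odds_to_prob(9))     # Jose: 0.9, i.e., 90%
```

Note that the two functions are inverses of each other: converting a probability to odds and back recovers the original probability.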
<p>The relationship between odds and probability is shown graphically below.</p>
<p><img data-recalc-dims="1" fetchpriority="high" decoding="async" class="aligncenter wp-image-1995 " src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Odds-vs-probability.png?resize=683%2C451&#038;ssl=1" alt="what are odds" width="683" height="451" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Odds-vs-probability.png?w=856&amp;ssl=1 856w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Odds-vs-probability.png?resize=300%2C198&amp;ssl=1 300w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Odds-vs-probability.png?resize=768%2C507&amp;ssl=1 768w" sizes="(max-width: 683px) 100vw, 683px" /></p>
<p>As the odds increase, the probability also increases but in a non-linear manner.  As shown above, the probability &#8220;increases at a decreasing rate&#8221; and approaches 1.0 “asymptotically” (i.e., as the odds get very large, the probability approaches but never quite reaches 1.0).<a href="#_ftn2" name="_ftnref2">[2]</a></p>
<p>But any finite odds will map to a probability between 0 and 1.</p>
<h2>Odds are preferred</h2>
<p>When comparing the relative chances of events (or sports teams), odds are the preferred way of expressing how much more likely one event is over another.  We can always derive the associated probability.  But since odds are unconstrained, there is no issue with saying the Los Angeles Dodgers are <a href="https://www.oddsshark.com/mlb/world-series-odds" target="_blank" rel="noopener"><strong>11 times</strong></a> as likely (as of May 31, 2021) to win the World Series in 2021 as the Chicago Cubs.</p>
<p>So, the next time someone tells you the odds of rain during your camping trip this weekend are 5 to 2, you might want to sleep in a tent.</p>
<p>&nbsp;</p>
<p><a href="#_ftnref1" name="_ftn1">[1]</a> As of May 31, 2021, the reported <a href="https://www.oddsshark.com/mlb/world-series-odds" target="_blank" rel="noopener"><strong>odds</strong></a> of the Dodgers repeating are 3 to 1 or <a href="https://www.oddsshark.com/tools/odds-calculator" target="_blank" rel="noopener"><strong>3/1</strong></a>.  In the betting world, this is referred to as <a href="https://www.sportsbettingdime.com/guides/betting-101/how-to-read-sports-odds/" target="_blank" rel="noopener"><strong>fractional odds</strong></a>. The <strong>number on the left</strong> or numerator is typically the <strong>number of times</strong> the team is <strong>expected to lose</strong>. 3/1 yields an implied probability of losing 3 times out of 4, or 75%.  Thus, the probability of the Dodgers repeating is (1 – 0.750) or 25%.  Expressed as odds, this is 1/3.</p>
<p><a href="#_ftnref2" name="_ftn2">[2]</a> In the limit, if the odds = infinity, then probability = 1.</p>
<p>&nbsp;</p>
<p>The post <a href="https://www.kddanalytics.com/odds-and-probability-two-sides-of-the-same-coin/">Odds and probability&#8230;two sides of the same coin</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1993</post-id>	</item>
		<item>
		<title>Tableau Basics: SUM vs AVG</title>
		<link>https://www.kddanalytics.com/are-you-using-the-correct-tableau-aggregation-sum-vs-avg/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Fri, 21 May 2021 17:49:37 +0000</pubDate>
				<category><![CDATA[Data Visualization]]></category>
		<category><![CDATA[Tableau]]></category>
		<category><![CDATA[data visualization]]></category>
		<guid isPermaLink="false">https://www.kddanalytics.com/?p=2086</guid>

					<description><![CDATA[<p>First time users of Tableau often get tripped up over the default Tableau SUM aggregation.  Here is what I mean. Suppose the question is to find the average of SALES PER VISIT (sales measured across the preceding 6 months) among the males and females in a sample of 25 shoppers.  The data look like this&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/are-you-using-the-correct-tableau-aggregation-sum-vs-avg/">Tableau Basics: SUM vs AVG</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>First time users of <strong><a href="https://www.tableau.com/" target="_blank" rel="noopener">Tableau</a></strong> often get tripped up over the default Tableau SUM aggregation.  Here is what I mean.</p>
<p>Suppose the question is to find the average of SALES PER VISIT (sales measured across the preceding 6 months) among the males and females in a sample of 25 shoppers.  The data look like this in Excel:</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-2087 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-2.png?resize=330%2C651&#038;ssl=1" alt="Tableau and Excel" width="330" height="651" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-2.png?w=330&amp;ssl=1 330w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-2.png?resize=152%2C300&amp;ssl=1 152w" sizes="auto, (max-width: 330px) 100vw, 330px" /></p>
<p><strong>TIP:</strong>  We can easily input these data to Tableau by <strong><a href="https://www.thedataschool.co.uk/borja-leiva/tableau-tuesday-tip-paste-data-clipboard" target="_blank" rel="noopener">cutting and pasting</a></strong> the selection into the Tableau canvas:</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="alignnone wp-image-2088 size-large" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-1.png?resize=1024%2C668&#038;ssl=1" alt="Cut and paste data into Tableau" width="1024" height="668" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-1.png?resize=1024%2C668&amp;ssl=1 1024w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-1.png?resize=300%2C196&amp;ssl=1 300w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-1.png?resize=768%2C501&amp;ssl=1 768w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-1.png?w=1441&amp;ssl=1 1441w" sizes="auto, (max-width: 1000px) 100vw, 1000px" /></p>
<p>Now to answer the question.</p>
<p>First time users of Tableau may correctly put GENDER on the row shelf and SALES PER VISIT on the column shelf.  Tableau defaults to a bar chart yielding:</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="alignnone size-large wp-image-2091" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-3.png?resize=1024%2C651&#038;ssl=1" alt="Tableau bar chart" width="1024" height="651" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-3.png?resize=1024%2C651&amp;ssl=1 1024w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-3.png?resize=300%2C191&amp;ssl=1 300w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-3.png?resize=768%2C489&amp;ssl=1 768w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-3.png?w=1440&amp;ssl=1 1440w" sizes="auto, (max-width: 1000px) 100vw, 1000px" /></p>
<p>First time users may also put SALES PER VISIT on the Label Marks <a href="https://help.tableau.com/current/pro/desktop/en-us/buildmanual_shelves.htm" target="_blank" rel="noopener"><strong>card</strong></a>, which displays the value next to each bar in the chart.  They may even put GENDER on the Color Marks card to give the viz some pop.</p>
<p>And then they call it done.  Males spend more on average than females.</p>
<p>But do they?</p>
<p>We note that the green pills show SUM(Sales per Visit).  Tableau’s default <strong>“aggregation”</strong> is to sum the values across the rows in the data set.</p>
<p>Going back to Excel, if we sum SALES PER VISIT by GENDER across the 25 rows, we get, using the <strong><a href="https://www.excel-easy.com/examples/sumif.html" target="_blank" rel="noopener">SUMIF</a></strong> function:</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-2092 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-7.png?resize=274%2C95&#038;ssl=1" alt="Excel SUMIF and AVERAGEIF" width="274" height="95" /></p>
<p>This is exactly what Tableau shows.</p>
<p><strong>But we want to find the average.</strong>  In Excel, using the <strong><a href="https://www.excel-easy.com/examples/averageif.html" target="_blank" rel="noopener">AVERAGEIF</a></strong> function, we see that females spend on average $11.62 while males spend $9.13 per visit.</p>
<p>To get Tableau to match, we simply <strong>change the aggregation</strong> by right-clicking on <strong>each</strong> of the green pills and select Measure (Average) from the drop-down menu.</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="alignnone size-large wp-image-2093" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-5.png?resize=1024%2C649&#038;ssl=1" alt="Tableau change aggregation" width="1024" height="649" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-5.png?resize=1024%2C649&amp;ssl=1 1024w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-5.png?resize=300%2C190&amp;ssl=1 300w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-5.png?resize=768%2C487&amp;ssl=1 768w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-5.png?w=1443&amp;ssl=1 1443w" sizes="auto, (max-width: 1000px) 100vw, 1000px" /></p>
<p>Now we get the correct answer to our question.</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="alignnone size-large wp-image-2094" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-6.png?resize=1024%2C648&#038;ssl=1" alt="Tableau correct aggregation" width="1024" height="648" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-6.png?resize=1024%2C648&amp;ssl=1 1024w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-6.png?resize=300%2C190&amp;ssl=1 300w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-6.png?resize=768%2C486&amp;ssl=1 768w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Tableau-SUM-vs-AVG-6.png?w=1442&amp;ssl=1 1442w" sizes="auto, (max-width: 1000px) 100vw, 1000px" /></p>
<p>If Tableau is not yielding the correct answer, try thinking about how you would do it in Excel.  Often, though not always, this will point you toward the proper aggregation.</p>
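<p>The same SUM-vs-AVERAGE distinction shows up in any groupby tool. Below is a minimal pandas sketch; the column names and visit-level values are made-up assumptions, chosen so the group averages match the $11.62 and $9.13 figures above:</p>

```python
import pandas as pd

# Made-up visit-level data; column names and values are assumptions,
# chosen so the averages match the $11.62 and $9.13 in the post
visits = pd.DataFrame({
    "Gender": ["F", "F", "M", "M", "M"],
    "Spend":  [12.00, 11.24, 9.00, 9.50, 8.89],
})

# Tableau's default SUM aggregation: total spend per gender
totals = visits.groupby("Gender")["Spend"].sum()

# Measure (Average), the equivalent of Excel's AVERAGEIF
averages = visits.groupby("Gender")["Spend"].mean()  # F: 11.62, M: 9.13
```

<p>Tableau&#8217;s default SUM corresponds to <code>.sum()</code>; switching the pill to Average corresponds to <code>.mean()</code>, just as AVERAGEIF does in Excel.</p>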
<p>The post <a href="https://www.kddanalytics.com/are-you-using-the-correct-tableau-aggregation-sum-vs-avg/">Tableau Basics: SUM vs AVG</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">2086</post-id>	</item>
		<item>
		<title>Curse of Big Data</title>
		<link>https://www.kddanalytics.com/curse-of-big-data/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Mon, 03 May 2021 11:31:59 +0000</pubDate>
				<category><![CDATA[Categorical Data Analysis]]></category>
		<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[efficacy]]></category>
		<category><![CDATA[hypothesis testing]]></category>
		<category><![CDATA[practical significance]]></category>
		<category><![CDATA[statistical significance]]></category>
		<category><![CDATA[statistics]]></category>
		<guid isPermaLink="false">https://www.kddanalytics.com/?p=1961</guid>

					<description><![CDATA[<p>“Big data.” We checked in with Google search trends recently. Appears that “Big Data” has lost its luster search-wise…started trending down about 4 years ago. Nowadays, everything is big data? Implications of big data However, this does not mean we should lose sight of certain statistical implications associated with being “big”. Yes, large amounts of&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/curse-of-big-data/">Curse of Big Data</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>“Big data.”</p>
<p>We checked in with <strong><a href="https://trends.google.com/trends/?geo=US" target="_blank" rel="noopener">Google</a></strong> search trends recently. It appears that “Big Data” has lost its luster search-wise, trending downward for about the last four years.</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="alignnone size-large wp-image-1963" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Big-Data-Trend.png?resize=1024%2C673&#038;ssl=1" alt="curse of big data" width="1024" height="673" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Big-Data-Trend.png?resize=1024%2C673&amp;ssl=1 1024w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Big-Data-Trend.png?resize=300%2C197&amp;ssl=1 300w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Big-Data-Trend.png?resize=768%2C505&amp;ssl=1 768w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Big-Data-Trend.png?w=1203&amp;ssl=1 1203w" sizes="auto, (max-width: 1000px) 100vw, 1000px" /></p>
<p>Nowadays, is everything big data?</p>
<h2>Implications of big data</h2>
<p>However, this does not mean we should lose sight of certain <strong>statistical implications</strong> associated with being “big”. Yes, large amounts of data can help us estimate relationships (<strong>effects</strong>) with a high degree of precision.</p>
<p>And help us uncover low occurrence events such as the blood clotting cases associated with the <strong><a href="https://www.nytimes.com/2021/04/16/health/johnson-vaccine-blood-clot-case.html" target="_blank" rel="noopener">Johnson &amp; Johnson</a></strong> COVID-19 vaccine.</p>
<p>But massive amounts of data can reveal patterns that are not always meaningful or happen by <a href="https://www.analyticbridge.datasciencecentral.com/profiles/blogs/the-curse-of-big-data" target="_blank" rel="noopener"><strong>chance</strong></a>.</p>
<p>Additionally, from a <strong><a href="https://en.wikipedia.org/wiki/Statistical_inference" target="_blank" rel="noopener">statistical inference </a></strong>perspective, with big data, <strong>even small, uninteresting effects can be statistically significant</strong>.</p>
<p>This has important implications for inferential conclusions about the associations we are studying.</p>
<p>And it does not take all that much data for this to happen.</p>
<h3>Small clinical trial example</h3>
<p>As an example, consider the following hypothetical results from a clinical trial of a &#8220;common&#8221; cold vaccine:</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1969 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Cold-Vaccine-Trial-Small-Sample.png?resize=391%2C159&#038;ssl=1" alt="curse of big data" width="391" height="159" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Cold-Vaccine-Trial-Small-Sample.png?w=391&amp;ssl=1 391w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Cold-Vaccine-Trial-Small-Sample.png?resize=300%2C122&amp;ssl=1 300w" sizes="auto, (max-width: 391px) 100vw, 391px" /></p>
<p>The table shows the number of subjects who had both a positive outcome (no infection) and negative outcome (infection) across the two types of treatment. A standard statistical test of association, the <a href="https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test" target="_blank" rel="noopener"><strong>Pearson chi-squared</strong></a>, indicates <strong>we cannot say there is any difference in outcomes</strong> across the two treatment types.</p>
<p>That is, we <strong>cannot reject the &#8220;null&#8221; hypothesis of no association</strong> at the 95% level of confidence (i.e., <em>X<sup>2 </sup>=</em> 0.024).</p>
<p>The <strong>strength of the association</strong>, or <strong><em>effect size,</em></strong> is obtained from the ratio of the two groups&#8217; risks, known as the <a href="https://www.cdc.gov/csels/dsepd/ss1978/lesson3/section5.html" target="_blank" rel="noopener"><strong>relative risk</strong></a>.</p>
<p>The probability of a vaccinated subject getting sick is (24 / 59) or 0.407 (40.7%) while that for the placebo group is (29 / 69) or 0.420 (42.0%).</p>
<p>So the relative risk ratio is (0.407 / 0.420) or 0.968.<a href="#_ftn1" name="_ftnref1">[1]</a></p>
<p>Thus, we would expect that when applied to the population, <strong><span style="text-decoration: underline;">under the same conditions as the study</span></strong>, there would be 3.2% fewer infections among those who received the vaccine (i.e., (1 &#8211; 0.968) * 100).</p>
<p>This 3.2% is known as the <a href="https://www.kddanalytics.com/covid-vaccine-efficacy-effectiveness/" target="_blank" rel="noopener"><em><strong>efficacy rate</strong></em></a> of the vaccine.</p>
<p>The 95% confidence interval for the relative risk ratio is wide (i.e., 0.639 to 1.465) indicating a lack of precision in the <strong><em>point estimate</em></strong> of 0.968.</p>
<p>The study investigators conclude that the effect of the vaccine is <strong>neither <span style="text-decoration: underline;">statistically</span> nor <a href="https://statisticsbyjim.com/hypothesis-testing/practical-statistical-significance/" target="_blank" rel="noopener"><em>practically</em></a> significant</strong>.</p>
<p>Aside from its statistical insignificance, an efficacy rate of just 3.2% is not nearly large enough to justify starting production of the vaccine.</p>
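<p>The chi-squared statistic, relative risk ratio, and confidence interval above can all be reproduced in a few lines. Here is a sketch using SciPy, assuming the cell counts implied by the table (24 of 59 vaccinated and 29 of 69 placebo subjects infected):</p>

```python
import numpy as np
from scipy.stats import chi2_contingency, norm

# Small-trial table from the post: rows = vaccine, placebo;
# columns = infected, not infected
table = np.array([[24, 35],
                  [29, 40]])

# Pearson chi-squared, no continuity correction
chi2 = chi2_contingency(table, correction=False)[0]   # ~0.024

# Relative risk ratio (placebo group as the reference)
rr = (24 / 59) / (29 / 69)                            # ~0.968
efficacy = (1 - rr) * 100                             # ~3.2%

# 95% CI via the usual large-sample formula on the log scale
se = np.sqrt(1/24 - 1/59 + 1/29 - 1/69)
z = norm.ppf(0.975)
lo, hi = np.exp(np.log(rr) - z * se), np.exp(np.log(rr) + z * se)  # ~0.639, ~1.465
```

<p>All three reported figures (0.024, 0.968, and the 0.639&#8211;1.465 interval) come back out, which is a useful sanity check on the table.</p>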
<h3>Large clinical trial example</h3>
<p>Contrast this with the following study results based on a much larger sample of 44,800 subjects:<a href="#_ftn2" name="_ftnref2">[2]</a></p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1970 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Cold-Vaccine-Trial-Large-Sample.png?resize=398%2C155&#038;ssl=1" alt="curse of big data" width="398" height="155" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Cold-Vaccine-Trial-Large-Sample.png?w=398&amp;ssl=1 398w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Cold-Vaccine-Trial-Large-Sample.png?resize=300%2C117&amp;ssl=1 300w" sizes="auto, (max-width: 398px) 100vw, 398px" /></p>
<p>The Pearson chi-squared statistic (<em>X<sup>2</sup></em>) is now 8.375. Thus, the <strong>hypothesis of no association <u>can be rejected</u> </strong>at the 95% level of confidence.</p>
<p>And the <strong>95% confidence interval </strong>for the relative risk ratio is<strong> much narrower indicating a much higher level of precision</strong> (i.e., 0.947 to 0.990).<a href="#_ftn3" name="_ftnref3">[3]</a></p>
<p>The study investigators now conclude that there is a <strong>statistically significant</strong> association between receiving the vaccine and avoiding a cold infection (positive outcome).</p>
<p><strong>But, </strong>the <strong>relative risk ratio</strong> of a positive outcome from receiving the vaccine is<strong> identical </strong>to that obtained from the smaller study,<strong> 0.968. </strong></p>
<p>Implying the <strong>efficacy rate is also the same, 3.2%</strong>.</p>
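<p>To see this directly, note that multiplying every cell count by a constant leaves the relative risk ratio unchanged while scaling the chi-squared statistic by that same constant. A sketch, under the assumption that the 44,800-subject table is simply the small table scaled by 44,800 / 128 = 350 (an assumption that does reproduce the reported figures):</p>

```python
import numpy as np
from scipy.stats import chi2_contingency

small = np.array([[24, 35],
                  [29, 40]])    # n = 128

# Assumption: the 44,800-subject table is the small table scaled by 350
large = small * 350

for table in (small, large):
    chi2 = chi2_contingency(table, correction=False)[0]
    risk = table[:, 0] / table.sum(axis=1)  # per-row infection risk
    rr = risk[0] / risk[1]
    print(f"n={table.sum():>6}  chi2={chi2:.3f}  rr={rr:.3f}")

# chi-squared jumps from 0.024 to 8.375; rr stays at 0.968
```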
<h2>Practical vs statistical significance</h2>
<p>What are we to make of this?</p>
<p>From the perspective of <strong>effect size</strong>, do the larger study results carry more weight <strong>simply because</strong> the hypothesis of no association can be rejected? Even though the <strong><em>practical </em>significance has remained the same</strong>?</p>
<p>We can turn a very small, 3.2% effect into a <span style="text-decoration: underline;"><strong>statistically</strong></span> significant effect by simply increasing the sample size.</p>
<p>But does this <strong>change</strong> the <strong><span style="text-decoration: underline;">practical</span> </strong>significance of the 3.2%?</p>
<p><strong>No.</strong></p>
<p>If 3.2% was deemed by the study investigators to be <strong>practically insignificant</strong>, it<strong> remains practically insignificant.</strong> Despite the larger sample size and despite it now being statistically significant.<a href="#_ftn4" name="_ftnref4">[4]</a></p>
<h2>A curse of data &#8220;bigness&#8221;</h2>
<p style="text-align: center;"><strong>With a large enough sample, everything is statistically significant, <span style="text-decoration: underline;">even associations that are neither practically significant nor particularly interesting</span>.</strong></p>
<p>The implication is that rather than focusing on hypothesis testing as sample sizes increase, the focus should <strong>shift toward</strong> the<strong> size of the estimated effect</strong>, whether the<strong> estimated effect is “practically” important,</strong> and <strong>“sensitivity analysis”</strong> (i.e., how the estimated effect changes when <em><strong>control variables</strong></em> are added and dropped).<a href="#_ftn5" name="_ftnref5">[5]</a></p>
<p><strong>Confidence intervals</strong> can and should play a role. But they will get narrower and narrower as sample sizes grow. And everything within the confidence interval could still be deemed not practically important.</p>
<p>In sum, <strong>as data get bigger</strong> (and it does not take massive amounts of data for this to be an issue), <strong>we need to guard against concluding that a small effect is <span style="text-decoration: underline;">practically</span> significant just because the <a href="https://statisticsbyjim.com/hypothesis-testing/interpreting-p-values/" target="_blank" rel="noopener">p-value</a> is very small</strong> (i.e., the effect is statistically significant).</p>
<p><strong>The curse of big data is still very much with us.</strong></p>
<p>&nbsp;</p>
<p><a href="#_ftnref1" name="_ftn1">[1]</a> A ratio of 1.0 would mean no difference in effect between the treatment types.</p>
<p><a href="#_ftnref2" name="_ftn2">[2]</a> As a point of comparison, the 2020 Moderna and Pfizer COVID-19 vaccine trials consisted of about 30,000 and 40,000 subjects, respectively.</p>
<p><a href="#_ftnref3" name="_ftn3">[3]</a> Confidence intervals for actual clinical trial results are calculated with a more complicated technique than the one used here, which typically yields wider intervals.  For example, in 2020, Moderna <a href="https://www.modernatx.com/covid19vaccine-eua/providers/clinical-trial-data" target="_blank" rel="noopener"><strong>reported</strong></a> an efficacy rate of 94.1% for its COVID-19 vaccine with a 95% confidence interval of 89.3% to 96.8%.</p>
<p><a href="#_ftnref4" name="_ftn4">[4]</a> Since the standard error of the relative risk ratio estimate is based on the cell counts in the <a href="https://www.kddanalytics.com/covid-vaccine-efficacy-effectiveness/" target="_blank" rel="noopener"><strong>contingency table</strong></a>, increasing the size of the sample lowers the standard error, making it more likely we can reject the null hypothesis at a given level of confidence.</p>
<p><a href="#_ftnref5" name="_ftn5">[5]</a> The paper <a href="https://www.galitshmueli.com/system/files/Print%20Version.pdf" target="_blank" rel="noopener"><strong>Too Big to Fail</strong></a> presents a nice discussion of these issues. Additionally, the American Statistical Association released <strong><a href="https://amstat.tandfonline.com/doi/pdf/10.1080/00031305.2016.1154108" target="_blank" rel="noopener">recommendations</a> </strong>on the reporting of p-values.</p>
<p>&nbsp;</p>
<p>The post <a href="https://www.kddanalytics.com/curse-of-big-data/">Curse of Big Data</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1961</post-id>	</item>
		<item>
		<title>San Diego and COVID-19 &#8230; A Very Challenging Year</title>
		<link>https://www.kddanalytics.com/san-diego-and-covid-19-a-very-challenging-year/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Fri, 23 Apr 2021 01:59:25 +0000</pubDate>
				<category><![CDATA[Data Analysis]]></category>
		<category><![CDATA[Data Visualization]]></category>
		<category><![CDATA[Tableau]]></category>
		<category><![CDATA[COVID]]></category>
		<category><![CDATA[data visualization]]></category>
		<guid isPermaLink="false">https://www.kddanalytics.com/?p=2048</guid>

					<description><![CDATA[<p>We just noticed that it has been a full year since we started posting daily updates to our San Diego County COVID-19 dashboard. This dashboard tracks the San Diego COVID experience: new cases, tests, and positivity rates at the county-level as well as new cases for each of the county’s ZIP Codes. On this first-year&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/san-diego-and-covid-19-a-very-challenging-year/">San Diego and COVID-19 &#8230; A Very Challenging Year</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>We just noticed that it has been a full year since we started posting daily updates to our San Diego County COVID-19 <strong><a href="https://public.tableau.com/profile/kdd.analytics#!/vizhome/SanDiegoCountyCOVID-19/SanDiegoCOVID-19">dashboard</a></strong>.</p>
<p>This dashboard tracks the San Diego COVID experience: new cases, tests, and positivity rates at the <strong>county-level</strong> as well as new cases for each of the county’s <strong>ZIP Codes</strong>.</p>
<p>On this first-year anniversary of these daily postings, we thought we would look back at this roller coaster year.</p>
<p>And, although our dashboard does not include US data, we thought that comparing the San Diego COVID experience with the national average would be insightful.</p>
<h2>San Diego COVID experience vs the nation</h2>
<p>The following figure shows the 7-day moving average of daily new cases per 100,000, for both San Diego County and the entire US.<a href="#_ftn1" name="_ftnref1">[1]</a></p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="alignnone size-large wp-image-2049" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/San-Diego-vs-US-COVID-New-Cases.png?resize=1024%2C616&#038;ssl=1" alt="San Diego COVI-19" width="1024" height="616" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/San-Diego-vs-US-COVID-New-Cases.png?resize=1024%2C616&amp;ssl=1 1024w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/San-Diego-vs-US-COVID-New-Cases.png?resize=300%2C180&amp;ssl=1 300w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/San-Diego-vs-US-COVID-New-Cases.png?resize=768%2C462&amp;ssl=1 768w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/San-Diego-vs-US-COVID-New-Cases.png?w=1482&amp;ssl=1 1482w" sizes="auto, (max-width: 1000px) 100vw, 1000px" /></p>
<p>As shown in the above figure, as a nation we have been through 3 waves with it being too soon to tell if the 4<sup>th</sup> wave has crested. San Diego County’s experience was generally similar <strong>except for the 4<sup>th</sup> wave</strong>.</p>
<h3>Wave #1</h3>
<p>The initial rise in daily new cases crested at a 7-day average of 10 per 100,000 for the US on April 12, 2020. San Diego’s first wave crested about a week earlier on April 4<sup>th</sup> at about 4 per 100,000.</p>
<p>The US new case 7-day average fell to 6 per 100,000 by mid-June. San Diego’s briefly fell a bit but then rose back up to a daily rate of 3 to 4 per 100,000 till mid-June.</p>
<p>So, San Diego did not really experience the same recovery from the first wave as the US.</p>
<h3>Wave #2</h3>
<p>For both the US and San Diego, the second, much larger wave began in mid-June 2020. The US new case rate increased from a 7-day average of about 6 per 100,000 to a peak of 21 per 100,000 on July 23<sup>rd</sup>. San Diego’s rate increased from about 4 per 100,000 to a peak of 16 per 100,000 on July 2<sup>nd</sup>.</p>
<p>Both the US and San Diego new case rates declined through the end of the summer. The US new case rate bottomed at a 7-day average of 13 per 100,000 on September 13<sup>th</sup>. San Diego’s bottomed at the end of August at about 8 per 100,000 and stayed essentially flat till mid-October.</p>
<h3>Wave #3</h3>
<p>The US third wave began in mid-September – a full month before San Diego was hit. Rising from a 7-day average low of about 11 per 100,000, the US new case rate increased throughout the fall and early winter, peaking at 76 cases per 100,000 on January 11, 2021.</p>
<p>San Diego’s third wave began in mid-October. From a 7-day average low of about 8 new cases per 100,000 on October 20, <strong>the new case rate peaked at a high of 109 per 100,000</strong> on the same day that this third wave peaked for the country.</p>
<p>As the figure shows, San Diego (as well as Los Angeles) suffered much higher new case rates than the national average.</p>
<p>But daily new cases started to decline just as steeply as they increased. San Diego’s 7-day average new case rate fell from this high of 109 to around 7 per 100,000 by April 20, 2021.</p>
<p>The US new case rate fell as well, from a 7-day average of 76 to about 16 by March 19, 2021.</p>
<h3>Wave #4</h3>
<p>Until this point, San Diego’s experience, though different in severity, matched the general pattern of the country. However, a 4<sup>th</sup> US wave began in mid-March 2021, driven by new outbreaks in Michigan and New Jersey. It is too soon to tell if this 4<sup>th</sup> wave has crested but the most recent peak is a 7-day average of 21 new cases per 100,000 on April 13<sup>th</sup>.</p>
<p>San Diego has been fortunate to escape this 4<sup>th</sup> wave (so far).</p>
<p>Fingers crossed&#8230;</p>
<p>&nbsp;</p>
<p><a href="#_ftnref1" name="_ftn1">[1]</a> San Diego new case data are from the San Diego County <a href="https://www.sandiegocounty.gov/content/sdc/hhsa/programs/phs/community_epidemiology/dc/2019-nCoV/status.html"><strong>Health Department</strong></a>. US new case data are from the <a href="https://covid.cdc.gov/covid-data-tracker/#trends_dailytrendscases"><strong>CDC</strong></a>.  The 7-day moving average is the average of the current and preceding 6 days. <a href="https://www.census.gov/quickfacts/fact/table/US,sandiegocountycalifornia,CA/PST045219"><strong>2019 population</strong></a> is used to normalize case counts so we can compare San Diego with the nation.</p>
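<p>The 7-day moving average defined in the footnote (the current day plus the preceding 6) is straightforward to compute with pandas; the daily counts below are made-up placeholders, not actual San Diego data:</p>

```python
import pandas as pd

# Hypothetical daily new-case rates per 100,000 (placeholders)
daily = pd.Series([5, 7, 6, 9, 12, 10, 14, 20, 18],
                  index=pd.date_range("2020-04-01", periods=9))

# 7-day moving average: the current day and the preceding 6 days,
# exactly as defined in the footnote
smoothed = daily.rolling(window=7).mean()
```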
<p>The post <a href="https://www.kddanalytics.com/san-diego-and-covid-19-a-very-challenging-year/">San Diego and COVID-19 &#8230; A Very Challenging Year</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">2048</post-id>	</item>
		<item>
		<title>Efficacy vs Effectiveness of the COVID Vaccines…&#8221;tomato, tomahto&#8221;?</title>
		<link>https://www.kddanalytics.com/covid-vaccine-efficacy-effectiveness/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Thu, 08 Apr 2021 18:15:48 +0000</pubDate>
				<category><![CDATA[Categorical Data Analysis]]></category>
		<category><![CDATA[Data Analysis]]></category>
		<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[categorical data]]></category>
		<category><![CDATA[contingency table]]></category>
		<category><![CDATA[COVID]]></category>
		<category><![CDATA[efficacy]]></category>
		<category><![CDATA[relative risk]]></category>
		<category><![CDATA[statistics]]></category>
		<guid isPermaLink="false">https://www.kddanalytics.com/?p=1939</guid>

					<description><![CDATA[<p>You like potato and I like potahto You like tomato and I like tomahto Potato, potahto, tomato, tomahto Let&#8217;s call the whole thing off But oh, if we call the whole thing off Then we must part And oh, if we ever part then that might break my heart &#8212;Ira Gershwin The eye-popping efficacy rates&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/covid-vaccine-efficacy-effectiveness/">Efficacy vs Effectiveness of the COVID Vaccines…&#8221;tomato, tomahto&#8221;?</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p style="text-align: center;"><em>You like potato and I like potahto</em><br />
<em>You like tomato and I like tomahto</em><br />
<em>Potato, potahto, tomato, tomahto</em><br />
<em>Let&#8217;s call the whole thing off</em></p>
<p style="text-align: center;"><em>But oh, if we call the whole thing off</em><br />
<em>Then we must part</em><br />
<em>And oh, if we ever part</em><br />
<em>then that might break my heart</em></p>
<p style="text-align: center;"><em>&#8212;Ira Gershwin</em></p>
<p>The eye-popping efficacy rates reported for the Moderna (<a href="https://www.cdc.gov/coronavirus/2019-ncov/vaccines/different-vaccines/Moderna.html"><strong>94%</strong></a>), Pfizer (<a href="https://www.cdc.gov/coronavirus/2019-ncov/vaccines/different-vaccines/Pfizer-BioNTech.html"><strong>95%</strong></a>) and, to a lesser extent, the Johnson &amp; Johnson (<a href="https://www.cdc.gov/coronavirus/2019-ncov/vaccines/different-vaccines/janssen.html"><strong>66%</strong></a>) COVID-19 vaccines have undoubtedly not escaped your attention.</p>
<p>But what is vaccine <em><strong>efficacy</strong></em> and how is it calculated? And how does it differ from vaccine <em><strong>effectiveness</strong></em>?</p>
<h2>Moderna vaccine efficacy</h2>
<p>First, consider efficacy. Using Moderna’s reported clinical trial results as an example, we see that it is a straightforward calculation.</p>
<p>Moderna <strong><a href="https://www.modernatx.com/covid19vaccine-eua/providers/clinical-trial-data">reported</a></strong> results from its COVID-19 vaccine trial in November 2020. The results are shown below in a 2&#215;2 “<a href="https://en.wikipedia.org/wiki/Contingency_table"><strong><em>contingency</em></strong></a>” or “<strong><em>cross-tabulation</em></strong>” table. The columns show the number of subjects who were infected (or not); the rows show the number who received the vaccine (or the placebo). And the cells show the intersection of those two events.</p>
<h4><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1954 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Moderna-COVID-Clinical-Trial-Contingency-Table-v2.png?resize=458%2C164&#038;ssl=1" alt="Efficacy of Moderna COVID vaccine" width="458" height="164" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Moderna-COVID-Clinical-Trial-Contingency-Table-v2.png?w=458&amp;ssl=1 458w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Moderna-COVID-Clinical-Trial-Contingency-Table-v2.png?resize=300%2C107&amp;ssl=1 300w" sizes="auto, (max-width: 458px) 100vw, 458px" /></h4>
<h3>Relative risk</h3>
<p>The <strong>strength of the association, </strong>or the<strong><em> effect size</em>,</strong> between receiving the vaccine and not getting infected is measured by the <em><strong>relative risk</strong></em>.</p>
<p>The <em><strong>probability</strong></em> or <em><strong>risk</strong></em> of a vaccinated subject being infected is 0.08%. That is, (11 / 14,134): the number of infections divided by the total number of vaccinated subjects (events plus non-events). For a subject receiving the placebo, the probability of infection is higher at 1.31% (i.e., 185 / 14,073).</p>
<p>So, using the placebo group as the reference group, the <em><strong>relative risk</strong></em> is (11 / 14,134) / (185 / 14,073) or 0.059.<a href="#_ftn1" name="_ftnref1">[1]</a></p>
<p>In other words, <strong>the risk of a vaccinated person being infected is 94.1% <span style="text-decoration: underline;">lower</span> compared to a subject who received the placebo</strong> (i.e., (1 – 0.059) * 100).</p>
<p>It is this calculation of 94.1% that was reported by Moderna as the vaccine&#8217;s <strong><em><a href="https://www.cdc.gov/csels/dsepd/ss1978/lesson3/section6.html">efficacy rate</a></em>.</strong><a href="#_ftn2" name="_ftnref2">[2]</a></p>
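<p>The arithmetic behind the 94.1% figure is simple enough to check in a few lines of Python, using the cell counts from the table above:</p>

```python
# Cell counts from Moderna's reported 2x2 table above
vax_infected, vax_total = 11, 14_134
placebo_infected, placebo_total = 185, 14_073

risk_vax = vax_infected / vax_total              # ~0.0008 (0.08%)
risk_placebo = placebo_infected / placebo_total  # ~0.0131 (1.31%)

relative_risk = risk_vax / risk_placebo          # ~0.059
efficacy = (1 - relative_risk) * 100             # ~94.1%
```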
<h2>Vaccine effectiveness</h2>
<p>So, what about <em><strong>vaccine effectiveness</strong></em>? The term effectiveness refers to <strong>how the vaccine performs in the real world</strong>.  Efficacy refers to how the vaccine performs under the “optimal” conditions of a clinical trial.</p>
<p>Clinical trials are based on a sample of subjects who may not be fully representative of the general population (e.g., all <a href="https://www.verywellhealth.com/comorbidity-5081615"><strong>comorbidities</strong></a> are not controlled for). In addition, the COVID strain that existed in the population during the clinical trial period may not be the same that occurs when the vaccine is released. Also, vaccine transportation, storage and delivery may differ from the more controlled environment of the clinical trial. Thus, the effectiveness of the vaccine may be different from what was found during the clinical trial.</p>
<h3>Studies on COVID vaccine effectiveness</h3>
<p>So, do we have any data yet on the real-world effectiveness of the COVID vaccines? It takes time to collect data, but <strong>we do have some indication that vaccine effectiveness is very high.</strong></p>
<p>An early <a href="https://www.nejm.org/doi/full/10.1056/NEJMoa2101765"><strong>study</strong></a> appeared February 24, 2021 in the New England Journal of Medicine.  The study examined the Pfizer vaccine&#8217;s performance in Israel. The sample consisted of matched data from over 1 million people, half of whom were vaccinated between December 2020 and February 2021 and half of whom were not. The results of the study suggest a <strong>symptomatic infection effectiveness rate of 94%</strong> 7+ days after the second dose.</p>
<p>A more recent <a href="https://www.cdc.gov/mmwr/volumes/70/wr/mm7013e3.htm"><strong>study</strong></a> released by the CDC on April 2 examined both the Pfizer and Moderna vaccines.  This study used US data from December 2020 to March 2021. The sample consisted of 3,950 health care personnel, first responders, and other front-line workers.  The study found that the <strong>vaccines were 90% effective against COVID infection</strong> 14+ days after the second dose. <strong>Even 14+ days after the <span style="text-decoration: underline;">first</span> dose the vaccines were 80% effective.</strong></p>
<p>As a point of comparison, according to the <a href="https://www.cdc.gov/flu/vaccines-work/vaccineeffect.htm"><strong>CDC</strong></a>, effectiveness of the annual flu vaccination ranges between 40 and 60%.<a href="#_ftn3" name="_ftnref3">[3]</a></p>
<p><strong>So, the effectiveness rate, after 2 doses of the Pfizer and Moderna vaccines, appears to be very close in magnitude to the efficacy rate.</strong></p>
<p>Very good news indeed!</p>
<p>Tomato, tomahto?</p>
<p>&nbsp;</p>
<p><a href="#_ftnref1" name="_ftn1">[1]</a> A relative risk ratio of 1.0 would mean no difference in effect between the treatment types.</p>
<p><a href="#_ftnref2" name="_ftn2">[2]</a> A summary of efficacy rates across the range of current COVID vaccines can be found <a href="http://www.healthdata.org/covid/covid-19-vaccine-efficacy-summary">here</a>.</p>
<p><a href="#_ftnref3" name="_ftn3">[3]</a> One reason for the range is that the flu strain that is in circulation can differ from what was predicted when the annual flu vaccine was developed earlier in the year.</p>
<p>The post <a href="https://www.kddanalytics.com/covid-vaccine-efficacy-effectiveness/">Efficacy vs Effectiveness of the COVID Vaccines…&#8221;tomato, tomahto&#8221;?</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1939</post-id>	</item>
		<item>
		<title>How to Visualize Changing Recession Start Date Forecasts</title>
		<link>https://www.kddanalytics.com/visualize-revisions-recession-start-date-forecasts/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Sat, 05 Jan 2019 22:19:59 +0000</pubDate>
				<category><![CDATA[Data Visualization]]></category>
		<category><![CDATA[Forecasting]]></category>
		<category><![CDATA[Tableau]]></category>
		<category><![CDATA[data visualization]]></category>
		<category><![CDATA[forecasting]]></category>
		<category><![CDATA[machine learning]]></category>
		<guid isPermaLink="false">https://www.kddanalytics.com/?p=1515</guid>

					<description><![CDATA[<p>In case you missed it, we are in a recession. According to Intensity’s latest US recession start date forecast, there is a 50% probability of a recession starting sometime in the January to February 2019 period.  And a 97% probability of it starting sometime within the next 6 months. Their “point estimate” of a recession&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/visualize-revisions-recession-start-date-forecasts/">How to Visualize Changing Recession Start Date Forecasts</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>In case you missed it, <strong>we are in a recession</strong>.</p>
<p>According to <a href="https://intensity.com/news/intensity-recession-forecast-january-3-2019" target="_blank" rel="noopener"><strong>Intensity’s latest US recession start date forecast</strong></a>, there is a 50% probability of a recession starting sometime in the January to February 2019 period.  And a 97% probability of it starting sometime within the next 6 months.</p>
<p>Their “<a href="https://en.wikipedia.org/wiki/Point_estimation" target="_blank" rel="noopener"><strong>point estimate</strong></a>” of a recession start is January 2019.</p>
<p><strong>Like, as in, right now!</strong></p>
<p>If true, it will take a while for the impacts to start showing up in the official government statistics.  But the stock market sell-off last quarter may be a harbinger of things to come.</p>
<p><a href="https://intensity.com/" target="_blank" rel="noopener"><strong>Intensity</strong></a>, an economics and data science firm based in San Diego, CA, developed and back-tested a machine learning prediction algorithm for its clients.  The firm started releasing a monthly forecast of the next US recession start date to the public starting in March 2018.</p>
<p>Over the course of the last 11 months, it has been interesting following the updates to their forecast as economic conditions changed.</p>
<p>Intuitively, one would expect the forecast to “settle down” as the expected start date drew nearer.</p>
<p>And it got me thinking about what the best way is to visualize these changing forecasts.</p>
<h3>Visualizing Forecast Updates Over Time</h3>
<p>The forecasted recession start date is not linear with time.  For example, in March 2018, the next recession was forecasted by Intensity to start in April 2019.  But in April 2018, the forecast was revised, and the recession was to start <strong>6 months earlier</strong> in October 2018.</p>
<p>Plotting the month of the forecast on the x-axis and the forecasted month of the recession start on the y-axis yields a “traditional time series” view as shown below.</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="alignnone size-large wp-image-1532" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Intensity-forecast-shown-horizontally.png?resize=1024%2C727&#038;ssl=1" alt="Intensity recession forecast - shown horizontally" width="1024" height="727" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Intensity-forecast-shown-horizontally.png?resize=1024%2C727&amp;ssl=1 1024w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Intensity-forecast-shown-horizontally.png?resize=300%2C213&amp;ssl=1 300w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Intensity-forecast-shown-horizontally.png?resize=768%2C545&amp;ssl=1 768w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Intensity-forecast-shown-horizontally.png?w=1332&amp;ssl=1 1332w" sizes="auto, (max-width: 1000px) 100vw, 1000px" /></p>
<p>As time progresses from left to right, we can see the forecasted recession start date fluctuating up and down, settling on January 2019, the most recent forecasted start date.</p>
<p>However, another way to visualize this is to show the progression of time vertically, from bottom to top.  In this case the forecasted recession start date would fluctuate horizontally, left and right, as shown below.</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="alignnone size-large wp-image-1528" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Intensity-Forecast-shown-vertically-1.png?resize=1024%2C734&#038;ssl=1" alt="Intensity Recession Forecast - shown vertically" width="1024" height="734" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Intensity-Forecast-shown-vertically-1.png?resize=1024%2C734&amp;ssl=1 1024w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Intensity-Forecast-shown-vertically-1.png?resize=300%2C215&amp;ssl=1 300w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Intensity-Forecast-shown-vertically-1.png?resize=768%2C551&amp;ssl=1 768w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Intensity-Forecast-shown-vertically-1.png?w=1325&amp;ssl=1 1325w" sizes="auto, (max-width: 1000px) 100vw, 1000px" /></p>
<p>I don’t know about you, but I find this second view more appealing.  Maybe it is the old economist in me, trained on the <a href="https://en.wikipedia.org/wiki/Phillips_curve" target="_blank" rel="noopener"><strong>Phillips Curve</strong></a> in graduate school.  But for me, the vertical, “up-down” orientation makes the variation in the forecasted recession start date “pop” more than in the horizontal, “left-to-right” view.</p>
<h3>So, Recession in 2019?</h3>
<p>It will be very interesting to see if Intensity sticks to its January 2019 point estimate.  Prior to the unexpectedly positive <a href="https://www.marketwatch.com/amp/story/guid/C82CF1F6-0F91-11E9-835D-C91F740D86E0" target="_blank" rel="noopener"><strong>December 2018 jobs report</strong></a><strong>,</strong> the consensus seemed to be a recession starting some time in 2019 or 2020.  For example, <a href="https://news.yahoo.com/gary-shilling-sees-66-chance-041710124.html" target="_blank" rel="noopener"><strong>Gary Shilling</strong></a> recently tossed his hat into the recession ring with a predicted 66% chance of a recession in 2019.</p>
<p>However, the positive jobs report apparently has many economists now <a href="https://www.washingtonpost.com/business/economy/us-jobs-data-boosts-wall-street-and-reassures-investors-about-economy/2019/01/04/b910ac92-105b-11e9-8938-5898adc28fa2_story.html?noredirect=on&amp;utm_term=.7685c12bcb54" target="_blank" rel="noopener"><strong>softening their stance</strong></a> on a recession this year.  And there is talk of policy makers being able to <strong><a href="https://www.csmonitor.com/Business/2019/0102/Recession-is-a-risk-in-2019.-But-maybe-one-that-policymakers-can-avoid" target="_blank" rel="noopener">sidestep a recession</a></strong>.</p>
<p>Only time will tell…so stay tuned!</p>
<h3>Plotting Ordered Times Series in Tableau</h3>
<p>By the way, these charts were made in <a href="https://www.tableau.com/" target="_blank" rel="noopener"><strong>Tableau</strong></a>.  And it was not as straightforward as flipping the axes to get the vertical view.  Tableau’s default inclination is to “connect the dots” from left to right when time is involved.</p>
<p>Fortunately, there is an easy way to get Tableau to connect the dots vertically.  This makes use of the <a href="https://onlinehelp.tableau.com/current/pro/desktop/en-us/viewparts_marks_markproperties.htm#PathProp" target="_blank" rel="noopener"><strong>Path property</strong></a> in the Marks card.  I simply added a field to my raw data that indicated the order of my data, which, of course was calendar order.</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1518 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Path-Order.png?resize=638%2C362&#038;ssl=1" alt="Tableau data input - Path Order" width="638" height="362" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Path-Order.png?w=638&amp;ssl=1 638w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Path-Order.png?resize=300%2C170&amp;ssl=1 300w" sizes="auto, (max-width: 638px) 100vw, 638px" /></p>
<p>Then dropping this field on the Path property in the Marks card tells Tableau to connect the dots (or “Marks” in Tableau-speak) in this order.  With the date of the forecast on the vertical, y-axis, Tableau connects the dots from bottom to top.</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="alignnone size-large wp-image-1529" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Tableau-Path-Order.png?resize=1024%2C809&#038;ssl=1" alt="Tableau Path Property on Marks Card" width="1024" height="809" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Tableau-Path-Order.png?resize=1024%2C809&amp;ssl=1 1024w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Tableau-Path-Order.png?resize=300%2C237&amp;ssl=1 300w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Tableau-Path-Order.png?resize=768%2C607&amp;ssl=1 768w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2019/01/Tableau-Path-Order.png?w=1255&amp;ssl=1 1255w" sizes="auto, (max-width: 1000px) 100vw, 1000px" /></p>
<p>Very slick!</p>
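<p>For readers who prefer scripting, the “add an order field” idea can be sketched in a few lines of Python (a hypothetical example, not part of the original workbook; the first two forecast revisions are taken from the post, the third row is illustrative): sort the rows by the date the forecast was made and assign a sequential path order.</p>

```python
# Hypothetical sketch of building a Path-order field: each row gets an
# integer giving its calendar position, which a tool like Tableau can
# then use (via the Path property) to connect the marks in order.
rows = [
    {"forecast_date": "2018-04", "forecast_start": "2018-10"},
    {"forecast_date": "2018-03", "forecast_start": "2019-04"},
    {"forecast_date": "2018-05", "forecast_start": "2019-01"},  # illustrative
]

# Sort by forecast date (YYYY-MM strings sort correctly) and number 1..n.
for order, row in enumerate(sorted(rows, key=lambda r: r["forecast_date"]), start=1):
    row["path_order"] = order

ordered = sorted(rows, key=lambda r: r["path_order"])
print([(r["forecast_date"], r["path_order"]) for r in ordered])
```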
<p>The post <a href="https://www.kddanalytics.com/visualize-revisions-recession-start-date-forecasts/">How to Visualize Changing Recession Start Date Forecasts</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1515</post-id>	</item>
		<item>
		<title>Practical Time Series Forecasting &#8211; Bounding Uncertainty</title>
		<link>https://www.kddanalytics.com/practical-time-series-forecasting-forecast-uncertainty/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Mon, 12 Feb 2018 03:46:22 +0000</pubDate>
				<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Forecasting]]></category>
		<category><![CDATA[Time Series]]></category>
		<category><![CDATA[confidence interval]]></category>
		<category><![CDATA[forecast interval]]></category>
		<category><![CDATA[forecast uncertainty]]></category>
		<guid isPermaLink="false">http://www.kddanalytics.com/?p=1356</guid>

					<description><![CDATA[<p>“A good forecaster is not smarter than everyone else, he merely has his ignorance better organized.” ― Anonymous Predicting the future is an exercise in probability rather than certainty. As we have mentioned several times over the course of these articles, your forecast model will be wrong. It is just a matter of how useful&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-forecast-uncertainty/">Practical Time Series Forecasting &#8211; Bounding Uncertainty</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>“<em>A good forecaster is not smarter than everyone else, he merely has his ignorance better organized</em>.”<br />
― <strong>Anonymous</strong></p>
<p>Predicting the future is an exercise in probability rather than certainty. As we have mentioned several times over the course of these articles, <strong>your forecast model will be wrong</strong>.</p>
<p><strong> It is just a matter of how useful it might be.</strong></p>
<p>A time series model will <strong>forecast a path</strong> through the forecast horizon, a “point forecast.” But <strong>this path is just one of the paths</strong> your forecast can take based on your estimated model.</p>
<p>Providing a sense of the <strong>uncertainty surrounding your forecast</strong> is an essential part of your job as a forecaster.</p>
<h3>Forecast intervals</h3>
<p>The standard approach is to provide the “<a href="https://en.wikipedia.org/wiki/Prediction_interval" target="_blank" rel="noopener"><strong>forecast interval</strong></a>” for your forecast.</p>
<p>Typically, this is cast in terms of a 95% prediction interval. That is, 95 times out of 100, the actual value will fall within the specified range. (Note that there is a <a href="https://www.ma.utexas.edu/users/mks/statmistakes/CIvsPI.html" target="_blank" rel="noopener"><strong>difference between</strong></a> a &#8220;confidence&#8221; interval and a &#8220;forecast&#8221; interval.)</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1358 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Forecast-Interval.png?resize=603%2C371&#038;ssl=1" alt="Forecast Interval" width="603" height="371" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Forecast-Interval.png?w=603&amp;ssl=1 603w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Forecast-Interval.png?resize=300%2C185&amp;ssl=1 300w" sizes="auto, (max-width: 603px) 100vw, 603px" /></p>
<h3>Sources of forecast uncertainty</h3>
<p>There are at least <a href="http://www.eviews.com/help/helpintro.html#page/content%2FForecast-Forecast_Basics.html%23ww181365" target="_blank" rel="noopener"><strong>two sources</strong></a> of forecast uncertainty over the forecast horizon.</p>
<p>The <strong>first results from our ignorance of what the model’s error will be in the forecast horizon</strong>. So, we must rely on how well the model did in the recalibration sample (estimation + holdout) as an estimate.</p>
<p>The <strong>second source of uncertainty results from the model’s coefficients </strong>(or parameters)<strong> being estimates of their true values</strong>. As estimates, they have their own “confidence” interval.</p>
<p>As a result, <strong>the forecast interval can be quite large</strong> (as shown above). And, due to error compounding over time, the <strong>forecast interval widens</strong> the further into the forecast horizon you go.</p>
<p>In our example above, during the <strong>first month</strong> of the forecast horizon, the forecast interval is <strong>plus or minus 0.63%</strong> of the forecasted value. By <strong>month 6</strong>, this spread widens to <strong>plus or minus 2.95%</strong>.</p>
<p>Even accounting for forecast error and parameter uncertainty, these forecast intervals may still be <a href="https://robjhyndman.com/hyndsight/narrow-pi/" target="_blank" rel="noopener"><strong>too narrow</strong></a>.</p>
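<p>To see why the interval widens, here is a minimal sketch (assuming a simple AR(1) model with known parameters — not the model behind the chart above) of how the h-step-ahead forecast variance, and hence the 95% interval half-width, grows with the horizon:</p>

```python
import numpy as np

def ar1_interval_halfwidth(phi, sigma, h):
    """Half-width of the 95% forecast interval, h steps ahead, for an
    AR(1) model y_t = phi * y_{t-1} + e_t with error s.d. sigma.
    Forecast errors compound: variance = sigma^2 * sum(phi^(2i))."""
    var = sigma ** 2 * sum(phi ** (2 * i) for i in range(h))
    return 1.96 * np.sqrt(var)

# Illustrative values (not from the post): the interval is noticeably
# wider at month 6 than at month 1, mirroring the widening fan above.
print(ar1_interval_halfwidth(0.8, 1.0, 1))
print(ar1_interval_halfwidth(0.8, 1.0, 6))
```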
<h3>What about meta forecasts?</h3>
<p>In an <a href="https://www.kddanalytics.com/practical-time-series-forecasting-meta-models/" target="_blank" rel="noopener"><strong>earlier article</strong></a> we discussed <strong>combining forecasts into a meta forecast</strong>. The <strong>challenge</strong> in terms of a <strong>meta prediction interval</strong> is that it is <strong>not a simple matter to combine the prediction intervals of the constituents’ forecasts</strong>.</p>
<p><strong>One approach</strong> is to simply <strong>show the extreme upper and lower forecast paths</strong> along with the meta forecast path, which will lie somewhere between the two extremes.</p>
<p>And then to <strong>caution the consumer of your forecast</strong> that this is just to give a sense of the possible forecast range, which <strong>will likely be too narrow</strong> (since the upper and lower forecast will each have their own prediction interval).</p>
<h3>Probability-based assessment of forecast uncertainty</h3>
<p>Another approach is to <strong>couch your forecast uncertainty </strong><strong>in terms of a probability</strong>.</p>
<p>For example, based on your SALES forecast, <strong>what are the chances of hitting a certain level of sales by a certain date</strong>? If you are forecasting procurement needs for a warehouse, <strong>what is the chance of running out of inventory by a certain date</strong>? If you are a macroeconomist forecasting GDP, <strong>what are the chances of the economy falling into a recession by a certain date</strong>?</p>
<p><strong>Suppose</strong> you are tasked with forecasting daily SALES over the next year.</p>
<p><strong>Management has targeted a certain level of SALES and wants to know when that target will be hit.</strong> You can use the forecast uncertainty produced by your model to generate the following chart:</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1361 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Forecast-Risk-Curve.png?resize=603%2C372&#038;ssl=1" alt="Forecast risk curve" width="603" height="372" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Forecast-Risk-Curve.png?w=603&amp;ssl=1 603w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Forecast-Risk-Curve.png?resize=300%2C185&amp;ssl=1 300w" sizes="auto, (max-width: 603px) 100vw, 603px" />The vertical axis is the chance of hitting the SALES target by a certain date (in this case, days into the next year). So, <strong>160 days into the year, there is a 10% chance of hitting the sales target.</strong></p>
<p><strong>By day 192</strong>, a month later, the <strong>chance has grown to 30%</strong>. And <strong>by day 218, there is a 50/50 chance</strong> the sales target will be reached.</p>
<p>Stating these chances in terms of odds may be an easier way to present this:</p>
<p><strong>By day 160</strong>, the odds against hitting the target would be <strong>9 to 1</strong>. By <strong>day 192</strong>, a little over <strong>2 to 1</strong>. And by <strong>day 218</strong>, <strong>1 to 1…a flip of the coin.</strong></p>
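<p>The probability-to-odds conversion behind these numbers is simple arithmetic (a hypothetical helper, not from the original post):</p>

```python
def odds_against(p):
    """Convert a probability p of an event into 'odds against',
    e.g. p = 0.10 gives 9.0, read as '9 to 1 against'."""
    return (1 - p) / p

print(odds_against(0.10))  # day 160: 9 to 1
print(odds_against(0.30))  # day 192: a little over 2 to 1
print(odds_against(0.50))  # day 218: 1 to 1, a coin flip
```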
<h3>Bottom line</h3>
<p>Uncertainty is a fact of life and your forecasts will be “wrong.”</p>
<p>But quantifying how wrong they can be will go a long way towards making them “useful.”</p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-introduction/" target="_blank" rel="noopener"><strong>Part 1 &#8211; Practical Time Series Forecasting &#8211; Introduction</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-basics/" target="_blank" rel="noopener"><strong>Part 2 &#8211; Practical Time Series Forecasting &#8211; Some Basics</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-useful-models/" target="_blank" rel="noopener"><strong>Part 3 &#8211; Practical Time Series Forecasting &#8211; Potentially Useful Models</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-data-science-taxonomy/" target="_blank" rel="noopener"><strong>Part 4 &#8211; Practical Time Series Forecasting &#8211; Data Science Taxonomy</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener"><strong>Part 5 &#8211; Practical Time Series Forecasting &#8211; Know When to Hold &#8217;em</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-what-makes-a-useful-model/" target="_blank" rel="noopener"><strong>Part 6 &#8211; Practical Time Series Forecasting &#8211; What Makes a Model Useful?</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-deterministic-stochastic-trend/" target="_blank" rel="noopener"><strong>Part 7 &#8211; Practical Time Series Forecasting &#8211; To Difference or Not to Difference</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-times-series-forecasting-rolling-holdout-sample/" target="_blank" rel="noopener"><strong>Part 8 &#8211; Practical Time Series Forecasting &#8211; Know When to Roll &#8217;em</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-meta-models/" target="_blank" rel="noopener"><strong>Part 9 &#8211; Practical Time Series Forecasting &#8211; Meta Models</strong></a></p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-forecast-uncertainty/">Practical Time Series Forecasting &#8211; Bounding Uncertainty</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1356</post-id>	</item>
		<item>
		<title>Practical Time Series Forecasting – Meta Models</title>
		<link>https://www.kddanalytics.com/practical-time-series-forecasting-meta-models/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Mon, 05 Feb 2018 01:47:38 +0000</pubDate>
				<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Forecasting]]></category>
		<category><![CDATA[Time Series]]></category>
		<category><![CDATA[forecast error]]></category>
		<category><![CDATA[MAPE]]></category>
		<category><![CDATA[meta forecast]]></category>
		<category><![CDATA[MPE]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[weighting]]></category>
		<guid isPermaLink="false">http://www.kddanalytics.com/?p=1331</guid>

					<description><![CDATA[<p>“There are two kinds of forecasters: those who don’t know, and those who don’t know they don’t know.” ― John Kenneth Galbraith After an extensive model building and vetting process, along the lines we previously discussed here and here, the practical forecaster may still be left with several strong performing models. These models perform similarly&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-meta-models/">Practical Time Series Forecasting – Meta Models</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>“<em>There are two kinds of forecasters: those who don’t know, and those who don’t know they don’t know.</em>”<br />
― <a href="https://en.wikipedia.org/wiki/John_Kenneth_Galbraith" target="_blank" rel="noopener"><strong>John Kenneth Galbraith</strong></a></p>
<p>After an extensive model building and vetting process, along the lines we previously discussed <strong><a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener">here</a></strong> and <a href="https://www.kddanalytics.com/practical-time-series-forecasting-rolling-holdout-sample-analysis/" target="_blank" rel="noopener"><strong>here</strong></a>, the practical forecaster may still be left with several strong performing models.</p>
<p>These models perform similarly in the holdout sample tests. They retain their statistical properties when recalibrated on the full historical sample. But they <strong>yield different forecast paths over the forecast horizon</strong>.</p>
<p>Any one of the models could be easily defended. But the <strong>fact that the models yield different forecasts should make the forecaster pause</strong>.</p>
<h3>An example</h3>
<p>Below is an example of 3 short-run monthly forecasts:</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1334 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-Different-FC.png?resize=603%2C371&#038;ssl=1" alt="Examples of competing forecasts" width="603" height="371" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-Different-FC.png?w=603&amp;ssl=1 603w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-Different-FC.png?resize=300%2C185&amp;ssl=1 300w" sizes="auto, (max-width: 603px) 100vw, 603px" /></p>
<p>The 3 models perform similarly in the holdout sample. One of the models is a least squares model. The other 2 are ARIMA models.</p>
<p>One model produces a <strong>steeply declining forecast</strong>. Another a <strong>slightly declining forecast</strong>. The third model produces an <strong>increasing forecast</strong>.</p>
<p>What should the forecaster do?</p>
<h3>How can this happen?</h3>
<p>Models are just that – models. They are abstractions from reality. And <strong>no single model will “fit” the holdout sample perfectly</strong>.</p>
<p>Two <strong>models</strong>, especially <strong>of different types</strong> (e.g. least squares vs. ARIMA), could have very <strong>similar holdout sample performance but differ</strong> dramatically <strong>in their forecast</strong> over the forecast horizon.</p>
<p>The holdout sample <strong>MAPE</strong> (<a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener"><strong>mean absolute percentage error</strong></a>) could be very similar for these models. But the <strong>MAPE is an average error across the holdout sample</strong>. And the models could have arrived at their MAPEs by <strong>focusing on different aspects of the time series in the holdout sample.</strong></p>
<p>Projecting these differences into the forecast horizon can result in very different forecasts.</p>
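<p>For concreteness, here is how the two holdout measures are computed (illustrative numbers, not from the example above): the absolute values in MAPE measure accuracy, while the signed errors in MPE measure bias.</p>

```python
# Sketch of holdout-sample MAPE and MPE on made-up actuals/forecasts.
actual   = [100.0, 110.0, 120.0, 130.0]
forecast = [ 98.0, 113.0, 118.0, 133.0]

# Percentage errors, relative to the actual values.
errs = [(a - f) / a for a, f in zip(actual, forecast)]

mape = 100 * sum(abs(e) for e in errs) / len(errs)  # accuracy (always >= 0)
mpe  = 100 * sum(errs) / len(errs)                  # bias (sign matters)
print(round(mape, 2), round(mpe, 2))
```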
<h3>Solutions</h3>
<p>When there is no clear “champion” model, one <strong>solution is to combine the forecasts into one</strong>. We call this a “<strong><a href="https://en.wikipedia.org/wiki/Metamodeling">meta</a></strong>” forecast.</p>
<p>There are several ways this can be accomplished.</p>
<h4>Checkpoint</h4>
<p><strong>But first</strong>, <strong>check</strong> to make sure the <strong>models</strong> to be combined are <strong>not “nested.”</strong> That is, <strong>one model is not a subset of another</strong>. If models are nested, there is usually no advantage to combining their forecasts into a meta forecast.</p>
<p>In fact, a <strong>meta forecast will more likely be superior the greater the differences between the constituent models</strong>.</p>
<p>A meta forecast based on a least squares model and an ARIMA model will likely yield a smaller forecast error than that associated with either of the two models. However, if the two models were both least squares models, the superiority of a meta forecast might be questionable (<a href="https://www.amazon.com/Forecasting-Business-Economics-Econometrics-Mathematical/dp/0122951816"><strong>Granger, 1989</strong></a>).</p>
<h4>Solution 1</h4>
<p>The simplest approach to arriving at a meta forecast is to <strong>simply average the forecasts</strong> of the individual models.</p>
<p>This essentially assumes that <strong>each model’s forecast is equally important in the meta forecast </strong>(i.e. receives equal weighting). This is a quick and uncomplicated way to generate a meta forecast.</p>
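<p>As a minimal sketch (with made-up numbers echoing the three candidate shapes), the equal-weight meta forecast is just the element-wise mean of the forecast paths:</p>

```python
import numpy as np

# Illustrative forecast paths for three candidate models.
forecasts = np.array([
    [100.0,  98.0,  95.0],  # model 1: steeply declining
    [100.0,  99.5,  99.0],  # model 2: slightly declining
    [100.0, 101.0, 102.0],  # model 3: increasing
])

# Simple average across models = equal weighting of each forecast.
meta = forecasts.mean(axis=0)
print(meta)
```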
<h4>Solution 2</h4>
<p>Another approach <strong>makes use</strong> of each model’s <strong>holdout sample performance measures of forecast accuracy and bias</strong>. A weighting for each model&#8217;s forecast can be calculated using each model’s <strong>MAPE</strong> and <strong>MPE</strong> (<a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener"><strong>mean percentage error</strong></a>) relative to that of all the models combined.</p>
<p>The meta forecast would then be a <strong>weighted average</strong> of the individual model forecasts. Models with <strong>lower MAPE and MPE</strong> would receive <strong>higher weights and contribute more</strong> to the meta forecast.</p>
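<p>One simple way to turn holdout performance into weights (a sketch with assumed MAPE values; a fuller version might fold in MPE as well) is to weight each model by its inverse MAPE, normalized to sum to one:</p>

```python
# Illustrative holdout MAPEs (%) for three candidate models.
mapes = [2.0, 4.0, 8.0]

# Inverse-error weighting: lower MAPE -> higher weight.
inv = [1.0 / m for m in mapes]
weights = [v / sum(inv) for v in inv]
print(weights)
```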
<h4>Solution 3</h4>
<p>A third approach is to use <strong>regression</strong> to estimate the weights.</p>
<p>Using the holdout sample (or, if it is too small, the full sample), <strong>regress the actual value on the forecasted value from each model</strong>. The goal is to find a regression with <strong>no constant and all regression coefficients positive and statistically significant</strong>.</p>
<p>The regression <strong>coefficients should then sum very close to one</strong>. These <strong>coefficients then become the weights</strong> by which forecasts are combined into a meta forecast (see <a href="https://www.amazon.com/Business-Forecasting-ForecastX-Holton-Wilson/dp/0073373648/ref=sr_1_2?s=books&amp;ie=UTF8&amp;qid=1512008807&amp;sr=1-2&amp;keywords=wilson+keating+forecasting"><strong>Wilson and Keating</strong></a>).</p>
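<p>A sketch of the regression approach on synthetic data (hypothetical numbers; in practice you would use your own holdout actuals and forecasts): regress the actuals on the candidate forecasts with no constant, and take the fitted coefficients as the combination weights.</p>

```python
import numpy as np

# Synthetic example: actuals constructed so the true weights are known.
rng = np.random.default_rng(0)
f1 = rng.normal(100, 5, 50)      # forecasts from model 1
f2 = rng.normal(100, 5, 50)      # forecasts from model 2
actual = 0.6 * f1 + 0.4 * f2     # true combination: 60/40

# Least squares through the origin (no constant term).
X = np.column_stack([f1, f2])
weights, *_ = np.linalg.lstsq(X, actual, rcond=None)
print(weights)  # recovers approximately [0.6, 0.4]
```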
<h3>Back to our example</h3>
<p>The forecaster could go with candidate 3 since it &#8220;splits the difference.&#8221; However, the forecaster is still left with the task of defending why the other two equally plausible models were not chosen.</p>
<p>Alternatively, a meta forecast can be used. As an example, we created a <strong>simple average forecast</strong> across the 3 candidate models. As discussed above, this <strong>assumes an equal weighting across the 3 short-run forecasts</strong>. A more sophisticated approach would have been to estimate the weights using a regression approach.</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1335 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-a-meta-forecast.png?resize=605%2C371&#038;ssl=1" alt="Example of a meta forecast" width="605" height="371" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-a-meta-forecast.png?w=605&amp;ssl=1 605w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Example-of-a-meta-forecast.png?resize=300%2C184&amp;ssl=1 300w" sizes="auto, (max-width: 605px) 100vw, 605px" /></p>
<p>Not surprisingly, the meta forecast is quite similar to the essentially flat forecast of candidate 3 (which lies almost halfway between candidate 1’s and candidate 2’s forecasts). <strong>But not all cases will be like this</strong>.</p>
<p>If a regression approach to estimating the weights was used, the meta forecast could be quite different from that of candidate 3.</p>
<p>Yes, the meta forecast will lie between the two forecast extremes. But the <strong>assumed or estimated weights will dictate where the meta forecast will lie</strong>.</p>
<h3>Bottom line</h3>
<p>Combining forecasts from equally strong models is intuitively appealing since <strong>each model has its strengths and weaknesses</strong>.</p>
<p><strong> Combining</strong> models’ forecasts in a <strong>complementary fashion</strong> should lead to <strong>more robust and accurate short-run forecasts</strong>.</p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-introduction/" target="_blank" rel="noopener"><strong>Part 1 &#8211; Practical Time Series Forecasting &#8211; Introduction</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-basics/" target="_blank" rel="noopener"><strong>Part 2 &#8211; Practical Time Series Forecasting &#8211; Some Basics</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-useful-models/" target="_blank" rel="noopener"><strong>Part 3 &#8211; Practical Time Series Forecasting &#8211; Potentially Useful Models</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-data-science-taxonomy/" target="_blank" rel="noopener"><strong>Part 4 &#8211; Practical Time Series Forecasting &#8211; Data Science Taxonomy</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener"><strong>Part 5 &#8211; Practical Time Series Forecasting &#8211; Know When to Hold &#8217;em</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-what-makes-a-useful-model/" target="_blank" rel="noopener"><strong>Part 6 &#8211; Practical Time Series Forecasting &#8211; What Makes a Model Useful?</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-deterministic-stochastic-trend/" target="_blank" rel="noopener"><strong>Part 7 &#8211; Practical Time Series Forecasting &#8211; To Difference or Not to Difference</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-times-series-forecasting-rolling-holdout-sample/" target="_blank" rel="noopener"><strong>Part 8 &#8211; Practical Time Series Forecasting &#8211; Know When to Roll &#8217;em</strong></a></p>
<p>The post <a href="https://www.kddanalytics.com/practical-time-series-forecasting-meta-models/">Practical Time Series Forecasting – Meta Models</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1331</post-id>	</item>
		<item>
		<title>Practical Time Series Forecasting – Know When to Roll ‘em</title>
		<link>https://www.kddanalytics.com/practical-times-series-forecasting-rolling-holdout-sample/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Mon, 29 Jan 2018 01:33:32 +0000</pubDate>
				<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Forecasting]]></category>
		<category><![CDATA[Time Series]]></category>
		<category><![CDATA[forecast error]]></category>
		<category><![CDATA[holdout sample]]></category>
		<category><![CDATA[rolling analysis]]></category>
		<category><![CDATA[times series]]></category>
		<guid isPermaLink="false">http://www.kddanalytics.com/?p=1322</guid>

					<description><![CDATA[<p>“Prediction is very difficult, especially if it&#8217;s about the future.” ― Niels Bohr, physicist Holdout samples are a key component of estimating a “useful” forecasting model. Set aside data at least equal in length to your forecast horizon (“holdout sample”). Build your models on the remaining data (“modeling sample”). And compare the candidate models’ forecast&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/practical-times-series-forecasting-rolling-holdout-sample/">Practical Time Series Forecasting – Know When to Roll ‘em</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><strong>“</strong><em>Prediction is very difficult, especially if it&#8217;s about the future.</em><strong>”<br />
― <a href="https://en.wikipedia.org/wiki/Niels_Bohr" target="_blank" rel="noopener">Niels Bohr</a></strong>, physicist</p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener"><strong>Holdout samples</strong></a> are a key component of estimating a “useful” forecasting model. <strong>Set aside data at least equal in length to your forecast horizon</strong> (“holdout sample”). Build your models on the remaining data (“modeling sample”). And <strong>compare the candidate models’ forecast performance over the holdout sample.</strong></p>
<p>At a minimum, a single holdout sample should be used.</p>
<p>But to get a <strong>better sense of a model’s future performance, consider using multiple holdout samples</strong>.</p>
<p>This <strong>guards against</strong> basing your model on a <strong>holdout sample</strong> that is <strong>unrepresentative</strong> of the overall characteristics of the time series.</p>
<p>One way to achieve this is to use<strong> “rolling” holdout samples</strong>.</p>
<h3>Rolling analysis</h3>
<p>A <a href="https://link.springer.com/chapter/10.1007%2F978-0-387-32348-0_9" target="_blank" rel="noopener"><strong>rolling analysis</strong></a> of a time series is generally used to test a model’s stability. That is, <strong>are a model’s parameters stable across time</strong> or do they change, especially in a systematic way?</p>
<p>This is important for a forecasting model. We <strong>don’t want</strong> a forecasting model whose <strong>parameters</strong> are <strong>changing during the forecast horizon in an unexpected (i.e. unmodeled) manner.</strong></p>
<p>Suppose our forecast horizon is 6 months.</p>
<p><strong> Under a single holdout sample</strong>, we would <strong>set aside the last 6 months of data as the holdout sample</strong>. Then, using the remaining data as the modeling sample, we would estimate models, forecast over the single holdout sample, and compare the models’ performance.</p>
<p>This will help narrow down the pool of candidate models.</p>
<h4>Rolling holdout samples</h4>
<p>But under a rolling holdout approach, also called &#8220;<a href="http://otexts.org/fpp2/accuracy.html" target="_blank" rel="noopener"><strong>time series cross-validation</strong></a>,&#8221;  <strong>we would set aside a longer sample of data</strong>, say, the last 12 months. Then:</p>
<p><strong>Step 1:</strong>  Estimate a model and forecast over the <strong>first</strong> 6 months of this 12-month period (&#8220;roll 1&#8221;);</p>
<p><strong>Step 2:</strong>  Then add one month to the tail end of the estimation sample, recalibrate the model, and forecast over the subsequent 6 months (“roll 2”);</p>
<p><strong>Step 3:</strong>  Then add another month to the estimation sample, recalibrate, and forecast over the subsequent 6 months (“roll 3”);</p>
<p><strong>Step 4:</strong>  Repeat until there are no more 6-month periods (&#8220;rolls&#8221;) remaining in the 12-month period.</p>
<p>So, <strong>in this example</strong>, we would have <strong>recalibrated our model 7 times</strong> (each with a modeling sample one month longer than the previous). And we would have <strong>made 7 forecasts over the rolling holdout periods</strong>.</p>
<p>The <strong>last &#8220;roll</strong>,&#8221; it turns out, <strong>is the same 6-month period</strong> we would have used <strong>under a single 6-month holdout sample case</strong>. So, we generate the stats for a standard single holdout sample during the course of this rolling holdout approach.</p>
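<p>The four steps above can be sketched as a small helper that generates the rolling train/holdout index splits. This is a minimal illustration under the example's assumptions (a 12-month evaluation window, a 6-month horizon, rolling forward one month per roll), not a reference implementation:</p>

```python
# A minimal sketch of the rolling-holdout ("time series cross-validation")
# scheme described above: a 12-observation evaluation window, 6-step-ahead
# forecasts, and an estimation sample that grows by one observation per roll.

def rolling_splits(n_obs, eval_window=12, horizon=6):
    """Yield (train_end, test_start, test_end) index triples for each roll.

    Indices are 0-based and end-exclusive, so series[:train_end] is the
    modeling sample and series[test_start:test_end] is that roll's holdout.
    """
    first_train_end = n_obs - eval_window
    n_rolls = eval_window - horizon + 1
    for roll in range(n_rolls):
        train_end = first_train_end + roll
        yield train_end, train_end, train_end + horizon

# With 60 monthly observations, a 12-month window and a 6-month horizon
# give 7 rolls, and the last roll is exactly the ordinary single 6-month
# holdout sample (the last 6 observations).
splits = list(rolling_splits(60))
assert len(splits) == 7
assert splits[-1] == (54, 54, 60)
```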
<p>If we are examining multiple candidate models, this process can generate a lot of data. Below is an example of the rolling forecasts for one model.</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1325 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Rolling-Holdout-Samples.png?resize=561%2C547&#038;ssl=1" alt="Rolling Holdout Samples" width="561" height="547" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Rolling-Holdout-Samples.png?w=561&amp;ssl=1 561w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Rolling-Holdout-Samples.png?resize=300%2C293&amp;ssl=1 300w" sizes="auto, (max-width: 561px) 100vw, 561px" /></p>
<h3>Summary roll statistics</h3>
<p>We could generate a similar chart for every model we are testing. But it is <strong>easier to work with measures of forecast accuracy and bias</strong>, such as <a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener"><strong>MAPE</strong></a> and <a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener"><strong>MPE</strong></a>.</p>
<p>For each roll forecast, we can calculate the MAPE and MPE and observe how they change across the rolling forecasts.</p>
<p>Are the MAPE and MPE constant? Fluctuate with no apparent trend? Or exhibit some systematic trend?</p>
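<p>A minimal sketch of those per-roll statistics, using one common convention for MAPE and MPE (percentage errors measured relative to actuals; the data below are illustrative only):</p>

```python
# A minimal sketch of per-roll accuracy (MAPE) and bias (MPE) statistics.
# The actual/forecast values below are illustrative, not from the article.

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (always non-negative)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def mpe(actual, forecast):
    """Mean percentage error, in percent (signed). Under this convention,
    positive values indicate under-forecasting on average."""
    return 100.0 * sum((a - f) / a for a, f in zip(actual, forecast)) / len(actual)

# One hypothetical 6-month roll:
actual   = [100.0, 102.0, 101.0, 105.0, 104.0, 106.0]
forecast = [ 98.0, 103.0, 100.0, 107.0, 103.0, 104.0]

print(round(mape(actual, forecast), 2))  # average absolute % miss for this roll
print(round(mpe(actual, forecast), 2))   # signed % miss; sign reveals any bias
```

Computing these for each roll, and then looking at their trend across rolls, gives the stability picture described next.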
<p>Doing this for every candidate model we are testing generates charts like this which can quickly show any areas of concern:</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" class="size-full wp-image-1326 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Rolling-Holdout-Samples-MAPE.png?resize=604%2C370&#038;ssl=1" alt="" width="604" height="370" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Rolling-Holdout-Samples-MAPE.png?w=604&amp;ssl=1 604w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2017/12/Rolling-Holdout-Samples-MAPE.png?resize=300%2C184&amp;ssl=1 300w" sizes="auto, (max-width: 604px) 100vw, 604px" /></p>
<p>In this example, candidate models 18 and 15 may be worth further inspection since their MAPEs are much higher than the rest in a recent roll period (roll 6).</p>
<h3>What else makes a model useful?</h3>
<p>So, with respect to the <strong>guidelines</strong> for whittling down a pool of candidate models we listed in an <strong><a href="https://www.kddanalytics.com/practical-time-series-forecasting-what-makes-a-useful-model/" target="_blank" rel="noopener">earlier article</a></strong>, we can add the following from a rolling holdout analysis:</p>
<p><strong>Stability</strong> – The model’s parameters should retain their statistical significance and not vary too much across the rolling periods; and the model&#8217;s residuals should remain &#8220;<strong>white noise</strong>&#8221; across the rolls;</p>
<p><strong>Consistency of Performance</strong> – The model’s forecast accuracy and bias should not exhibit any strong trends, especially trends in the “wrong” direction (i.e. getting progressively worse) as the rolls approach the most recent period.</p>
<p><strong>Strong Rolling Holdout Sample Performance</strong> – The model’s forecast accuracy, <strong>averaged across all the rolls</strong>, should be high and its bias low. That is, <strong>the average MAPE should be low</strong> and <strong>the average MPE should be close to zero</strong>.</p>
<h3>Benefits of Rolling</h3>
<p>The primary benefit of a rolling analysis is that we get to see <strong>how a model performs</strong> forecast-wise <strong>over multiple time spans</strong> equal in length to our forecast horizon, <strong>instead of relying on performance in just one holdout sample</strong>.</p>
<p>A rolling analysis also <strong>addresses the issue of a short holdout sample</strong> (e.g. short forecast horizon) <strong>possibly not being representative of the general character of the time series</strong>.</p>
<p>In addition, a rolling analysis can be used as a check for the “best” model chosen using a single holdout sample. That is, would you pick the same model using the rolling holdout approach? If not, why?</p>
<p>In sum, <strong>a model that is persistently better at holdout sample forecasting over a longer time frame is likely to be more robust.</strong></p>
<p>So, let ‘em roll!</p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-introduction/" target="_blank" rel="noopener"><strong>Part 1 &#8211; Practical Time Series Forecasting &#8211; Introduction</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-basics/" target="_blank" rel="noopener"><strong>Part 2 &#8211; Practical Time Series Forecasting &#8211; Some Basics</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-useful-models/" target="_blank" rel="noopener"><strong>Part 3 &#8211; Practical Time Series Forecasting &#8211; Potentially Useful Models</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-data-science-taxonomy/" target="_blank" rel="noopener"><strong>Part 4 &#8211; Practical Time Series Forecasting &#8211; Data Science Taxonomy</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-holdout-sample/" target="_blank" rel="noopener"><strong>Part 5 &#8211; Practical Time Series Forecasting &#8211; Know When to Hold &#8217;em</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-what-makes-a-useful-model/" target="_blank" rel="noopener"><strong>Part 6 &#8211; Practical Time Series Forecasting &#8211; What Makes a Model Useful?</strong></a></p>
<p><a href="https://www.kddanalytics.com/practical-time-series-forecasting-deterministic-stochastic-trend/" target="_blank" rel="noopener"><strong>Part 7 &#8211; Practical Time Series Forecasting &#8211; To Difference or Not to Difference</strong></a></p>
<p>The post <a href="https://www.kddanalytics.com/practical-times-series-forecasting-rolling-holdout-sample/">Practical Time Series Forecasting – Know When to Roll ‘em</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1322</post-id>	</item>
	</channel>
</rss>
