<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Statistics Archives - KDD Analytics</title>
	<atom:link href="https://www.kddanalytics.com/category/statistics/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.kddanalytics.com/category/statistics/</link>
	<description>Data to Decisions</description>
	<lastBuildDate>Mon, 28 Jun 2021 00:37:12 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9</generator>

<image>
	<url>https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2016/08/cropped-imageedit_1_7939659602.png?fit=32%2C32&#038;ssl=1</url>
	<title>Statistics Archives - KDD Analytics</title>
	<link>https://www.kddanalytics.com/category/statistics/</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">114932494</site>	<item>
		<title>If odds are not odd, what about odds ratios?</title>
		<link>https://www.kddanalytics.com/if-odds-are-not-odd-what-about-odds-ratios/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Mon, 28 Jun 2021 00:37:12 +0000</pubDate>
				<category><![CDATA[Categorical Data Analysis]]></category>
		<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[case_control]]></category>
		<category><![CDATA[meta-analysis]]></category>
		<category><![CDATA[odds]]></category>
		<category><![CDATA[odds ratio]]></category>
		<category><![CDATA[prospective]]></category>
		<category><![CDATA[retrospective]]></category>
		<guid isPermaLink="false">https://www.kddanalytics.com/?p=2022</guid>

					<description><![CDATA[<p>What are the odds of developing a brain tumor from long-term use of cell phones? This is an evolving area of research.  Some studies have found an association and others have not. But two recent meta-analyses suggest that the odds are about 33 to 44% greater due to long-term cell phone usage. Got your attention?&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/if-odds-are-not-odd-what-about-odds-ratios/">If odds are not odd, what about odds ratios?</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>What are the <strong>odds of developing a brain tumor from long-term use of cell phones</strong>?</p>
<p>This is an evolving area of research.  Some studies have found an association and others have not.</p>
<p>But two recent <strong><em><a href="https://en.wikipedia.org/wiki/Meta-analysis" target="_blank" rel="noopener">meta-analyses</a></em></strong> suggest that the odds are about <strong>33 to 44%</strong> <strong>greater</strong> due to long-term cell phone usage.</p>
<p>Got your attention?</p>
<p>“But what does this do to my odds of developing a brain tumor?” you may ask.</p>
<p>Before we answer that, we need to explain how the meta-analyses derive this 33 to 44% figure.  Which introduces us to <strong><em>odds ratios</em></strong>.</p>
<h2>Case-control studies</h2>
<p>Studies of the association between cell phone usage and brain tumor are typically <strong><em>case-control</em></strong> studies.</p>
<p>Such studies are <em><strong>retrospective</strong></em>, as opposed to <em><strong>prospective</strong></em>.<a href="#_ftn1" name="_ftnref1">[1]</a> They combine a sample of patients (<strong><em>cases</em></strong>) already diagnosed with a brain tumor with a random sample of non-patients (<strong><em>controls</em></strong>) drawn from the general population. Study investigators match controls to each case based on key demographics such as sex, age, and region.</p>
<p>The studies then measure and test for the existence of an association between <strong><em>exposure</em></strong> (cell phone usage) and <strong><em>outcome</em></strong> (brain tumor).<a href="#_ftn2" name="_ftnref2">[2]</a></p>
<p>Typically, these case-control studies report their <strong>estimated effects</strong>, not in terms of odds, but in terms of <strong>odds ratios</strong>.</p>
<p>So, what is an odds ratio?</p>
<h2>Odds ratios</h2>
<p>An odds ratio is a <strong>measure of association strength.</strong> In this case, between cell phone usage and the diagnosis of a brain tumor.</p>
<p>As an example, we can use the results from one of the <strong><em>high-quality</em></strong> <strong><a href="https://pubmed.ncbi.nlm.nih.gov/16023098/" target="_blank" rel="noopener">studies</a></strong> used in the meta-analyses mentioned above to show how odds ratios are calculated.<a href="#_ftn3" name="_ftnref3">[3]</a></p>
<p>The data shown in the following table are from a case-control study conducted in Sweden between 2000 and 2003.<a href="#_ftn4" name="_ftnref4">[4]</a>  The data are for long term cell phone usage (&gt;= 10 years). The reference category is no cell phone usage.<a href="#_ftn5" name="_ftnref5">[5]</a></p>
<p><img data-recalc-dims="1" decoding="async" loading="lazy" class="alignnone size-full wp-image-2023 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Odds-ratios.png?resize=399%2C120&#038;ssl=1" alt="cell phones and brain tumors" width="399" height="120" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Odds-ratios.png?w=399&amp;ssl=1 399w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Odds-ratios.png?resize=300%2C90&amp;ssl=1 300w" sizes="auto, (max-width: 399px) 100vw, 399px" /><br />
In an earlier <strong><a href="https://www.kddanalytics.com/odds-and-probability-two-sides-of-the-same-coin/">article</a></strong> we learned that the odds of an event occurring are the number of events divided by the number of non-events.</p>
<p>Thus, the <strong>odds of a long-term cell phone user in this sample being diagnosed with a brain tumor</strong> are (16 / 232) or 0.069; about 1 to 14.</p>
<p>The <strong>odds of a non-cell phone user being diagnosed with a brain tumor</strong> are (18 / 674) or 0.027; about 1 to 37.</p>
<p>The <strong>odds ratio is simply the ratio of the two odds</strong>:  (0.069 / 0.027) or 2.582 when computed from the unrounded odds.</p>
<p>So, the odds of a long-term cell phone user being diagnosed with a brain tumor are <strong>2.582 times those of a non-cell phone user</strong>.</p>
<p>Alternatively, this can be stated in <strong>terms of a % difference</strong>. The odds of a long-term cell phone user being diagnosed with a brain tumor are <strong>158% greater compared to a non-cell phone user</strong> ((2.582 – 1) * 100).</p>
<p>That is a pretty large effect.<a href="#_ftn6" name="_ftnref6">[6]</a></p>
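<p>The arithmetic above can be reproduced in a few lines. This is only a sketch: it assumes the four counts quoted in the text are the events and non-events for each group, and the variable and function names are ours, not the study's.</p>

```python
# Counts quoted in the text: events (diagnosed) and non-events for each group.
exposed_events, exposed_non_events = 16, 232      # long-term cell phone users
unexposed_events, unexposed_non_events = 18, 674  # non-users (reference category)


def odds(events: int, non_events: int) -> float:
    """Odds = number of events divided by number of non-events."""
    return events / non_events


odds_exposed = odds(exposed_events, exposed_non_events)
odds_unexposed = odds(unexposed_events, unexposed_non_events)

# The odds ratio is the ratio of the two (unrounded) odds.
odds_ratio = odds_exposed / odds_unexposed
pct_greater = (odds_ratio - 1) * 100

print(f"odds, long-term users: {odds_exposed:.3f}")
print(f"odds, non-users:       {odds_unexposed:.3f}")
print(f"odds ratio:            {odds_ratio:.3f}")
print(f"percent greater odds:  {pct_greater:.0f}%")
```

<p>Note that dividing the rounded odds (0.069 / 0.027) gives a slightly different figure; the 2.582 comes from the unrounded values.</p>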
<h2>Meta-studies</h2>
<p><strong>Now this is just one study</strong>.  The two meta-studies alluded to above each combined the results of 7 different, high-quality studies.</p>
<p>They found that the overall odds (across the studies) of a long-term cell phone user (&gt;= 10 years) being diagnosed with a brain tumor (any tumor type) are <a href="https://pubmed.ncbi.nlm.nih.gov/28213724/" target="_blank" rel="noopener"><strong>33%</strong></a> and (with respect to <a href="https://www.mayoclinic.org/diseases-conditions/glioma/symptoms-causes/syc-20350251" target="_blank" rel="noopener"><strong>glioma</strong></a>, a common type of tumor) <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5417432/" target="_blank" rel="noopener"><strong>44%</strong></a> <strong>greater compared to a non-cell phone user</strong>.<a href="#_ftn7" name="_ftnref7">[7]</a></p>
<p>These meta-studies found no effect due to cell phone usage over a shorter period (i.e., &lt; 10 years).</p>
<p>So, it appears that the risk, if it exists, is associated with long-term usage.  Moreover, using a cell phone on the same side of the head is associated with <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5417432/" target="_blank" rel="noopener"><strong>46%</strong></a> greater odds of developing a glioma on that side of the head.<a href="#_ftn8" name="_ftnref8">[8]</a></p>
<h2>Odds of developing a brain tumor</h2>
<p>So, <strong>back to our original question</strong>.  What are the odds of developing a brain tumor from long term cell phone usage?</p>
<p>The odds of developing a brain tumor among the general population are very low to start with.  Annual <strong><a href="https://seer.cancer.gov/statfacts/html/brain.html" target="_blank" rel="noopener">incidence</a></strong> in the US (2018) is 6.5 per 100,000 or 0.0065%.  In terms of odds, this is about 1 to 15,000.</p>
<p>So, a 44% increase in the odds would mean 9.4 per 100,000 or about 1 to 10,700.  Still quite low.<a href="#_ftn9" name="_ftnref9">[9]</a></p>
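<p>As a rough check on these figures, the sketch below converts the quoted incidence to odds, applies the 44% increase on the odds scale, and converts back. It treats the annual incidence as a probability, which is our simplifying assumption; the helper names are illustrative.</p>

```python
baseline_per_100k = 6.5   # annual US incidence quoted in the text
odds_increase = 0.44      # 44% greater odds from the glioma meta-analysis


def prob_to_odds(p: float) -> float:
    return p / (1 - p)


def odds_to_prob(o: float) -> float:
    return o / (1 + o)


baseline_p = baseline_per_100k / 100_000
baseline_odds = prob_to_odds(baseline_p)

# Apply the increase to the odds, then map back to a rate per 100,000.
raised_odds = baseline_odds * (1 + odds_increase)
raised_per_100k = odds_to_prob(raised_odds) * 100_000

print(f"baseline: 1 to {1 / baseline_odds:,.0f}")
print(f"raised:   {raised_per_100k:.1f} per 100,000, or 1 to {1 / raised_odds:,.0f}")
```

<p>Because the event is so rare, odds and probability are nearly identical here, which is why simply multiplying 6.5 by 1.44 gives essentially the same answer.</p>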
<p>As one <a href="https://academic.oup.com/jnci/article/103/15/1146/2516666" target="_blank" rel="noopener"><strong>researcher</strong></a> put it, “Your chance of being hurt by distracted driving because you’re using your cell phone wipes out the risk of getting cancer.”</p>
<p>However, in 2011 the World Health Organization’s International Agency for Research on Cancer (<a href="https://iarc.who.int/" target="_blank" rel="noopener"><strong>IARC</strong></a>) <strong><u>did</u> classify</strong> cell phones as a Group 2B <strong>carcinogen</strong> (i.e., possibly causes cancer).</p>
<p>And there continues to be a healthy debate in both the statistical and public arenas.</p>
<p><a href="https://ehtrust.org/scientific-documentation-cell-phone-radiation-associated-brain-tumor-rates-rising/" target="_blank" rel="noopener"><strong>Studies</strong></a> continue to be released that purportedly find evidence that recent increases in rates of <a href="https://en.wikipedia.org/wiki/Glioblastoma" target="_blank" rel="noopener"><strong>glioblastomas</strong></a>, an aggressive type of cancer, are tied to cell phone usage.</p>
<p><a href="https://www.forbes.com/sites/geoffreykabat/2017/12/23/are-brain-cancer-rates-increasing-and-do-changes-relate-to-cell-phone-use/" target="_blank" rel="noopener"><strong>Skeptics</strong></a> argue that changes in the WHO classification of what is considered a glioblastoma may be responsible for any uptick in brain tumor incidence. And that the large increase in risk reported by studies, like the meta-studies discussed above, is <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4057143/" target="_blank" rel="noopener"><strong>inconsistent</strong></a> with the historical trend in brain tumor incidence.<a href="#_ftn10" name="_ftnref10">[10]</a></p>
<p><strong>As we said at the outset, this is an evolving area of research, with lots of issues to untangle.</strong></p>
<p>One thing to keep in mind, though, is <strong>who is funding the research</strong>.  A topic we will cover in a later article.</p>
<h2>We have odds ratios to thank</h2>
<p><strong>Back to the main point of this article.</strong></p>
<p>Odds facilitate the measurement of the <strong><span style="text-decoration: underline;">relative</span> likelihood of events</strong>.  Retrospective epidemiological studies commonly use the <strong>odds ratio as this relative measurement of association strength</strong>.</p>
<p>So, the next time you hear that your favorite dietary choice increases your chances of developing cancer, it is probably the result of that not-so-oddity, the odds ratio.</p>
<p>&nbsp;</p>
<p><a href="#_ftnref1" name="_ftn1">[1]</a> Prospective cohort studies have also been used (i.e., studies which track subjects over time).  See <strong><a href="https://www.cognibrain.com/retrospective-vs-prospective-study-advantages-types-and-differences/" target="_blank" rel="noopener">here</a></strong> for a summary of the advantages and disadvantages of retrospective and prospective studies.</p>
<p><a href="#_ftnref2" name="_ftn2">[2]</a> Exposure is determined by answers to a lengthy questionnaire. Hence, one of the criticisms levied against case-control studies is respondent <strong><a href="https://catalogofbias.org/biases/recall-bias/" target="_blank" rel="noopener">recall bias</a></strong>. That is, whether respondents accurately recall their cell phone usage, particularly over a long period of time.</p>
<p><a href="#_ftnref3" name="_ftn3">[3]</a> Studies are <a href="http://www.ohri.ca/programs/clinical_epidemiology/oxford.asp" target="_blank" rel="noopener"><strong>graded</strong></a> on a quality scale considering such factors as <strong>selection</strong> of cases and controls, <strong>comparability</strong> of cases and controls based on study design, and proper assessment/measurement of <strong>exposure</strong>.</p>
<p><a href="#_ftnref4" name="_ftn4">[4]</a> The results shown in the table are taken from a <a href="https://pubmed.ncbi.nlm.nih.gov/28213724/" target="_blank" rel="noopener"><strong>meta-study</strong></a> which considered this <a href="https://pubmed.ncbi.nlm.nih.gov/16023098/" target="_blank" rel="noopener"><strong>Hardell et al</strong></a> (2006) study.</p>
<p><a href="#_ftnref5" name="_ftn5">[5]</a> As cell phone usage becomes more ubiquitous, and fewer people who have never used a cell phone are available in the population, the exposure will need to be increasingly measured in terms of levels/frequency of usage.</p>
<p><a href="#_ftnref6" name="_ftn6">[6]</a> The additional risk derived using an odds ratio is closely related to the concept of <strong><em>efficacy</em></strong>, which is derived directly from the concept of <strong><em>relative risk</em></strong> (ratio of probabilities). We covered efficacy in an earlier <strong><a href="https://www.kddanalytics.com/covid-vaccine-efficacy-effectiveness/" target="_blank" rel="noopener">article</a></strong>. Epidemiologists typically use relative risk to measure association strength in prospective (cohort) studies; odds ratios in case-control studies.</p>
<p><a href="#_ftnref7" name="_ftn7">[7]</a> Meta-studies start with a larger number of studies.  They then cull studies from the final sample for various reasons, such as data availability and the quality grade they receive.</p>
<p><a href="#_ftnref8" name="_ftn8">[8]</a> All these studies on brain tumors controlled for whether cell phones were being used next to users’ heads.</p>
<p><a href="#_ftnref9" name="_ftn9">[9]</a> <strong><a href="https://seer.cancer.gov/statfacts/" target="_blank" rel="noopener">See here</a></strong> for US cancer incidence rates as of 2018.</p>
<p><a href="#_ftnref10" name="_ftn10">[10]</a> See also <a href="https://www.forbes.com/sites/geoffreykabat/2017/12/27/what-the-best-u-s-data-have-to-say-about-brain-cancer-rates/" target="_blank" rel="noopener"><strong>Geoffrey Kabat</strong></a> (2017).</p>
<p>&nbsp;</p>
<p>The post <a href="https://www.kddanalytics.com/if-odds-are-not-odd-what-about-odds-ratios/">If odds are not odd, what about odds ratios?</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">2022</post-id>	</item>
		<item>
		<title>Odds and probability&#8230;two sides of the same coin</title>
		<link>https://www.kddanalytics.com/odds-and-probability-two-sides-of-the-same-coin/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Fri, 04 Jun 2021 16:59:03 +0000</pubDate>
				<category><![CDATA[Categorical Data Analysis]]></category>
		<category><![CDATA[Data Analysis]]></category>
		<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[odds]]></category>
		<category><![CDATA[probability]]></category>
		<guid isPermaLink="false">https://www.kddanalytics.com/?p=1993</guid>

					<description><![CDATA[<p>What are the lifetime odds of dying from being hit by a meteorite? 1 in 1,600,000. Yep, not very likely.  You are much more likely to die from a dog attack (1 in 86,781) or from a lightning strike (1 in 138,849). But why odds? Why not express these likelihoods in terms of probabilities?  Seems&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/odds-and-probability-two-sides-of-the-same-coin/">Odds and probability&#8230;two sides of the same coin</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>What are the lifetime odds of dying from being hit by a <a href="https://www.tulane.edu/~sanelson/Natural_Disasters/impacts.htm" target="_blank" rel="noopener"><strong>meteorite</strong></a>?</p>
<p>1 in 1,600,000.</p>
<p>Yep, not very likely.  You are much more likely to die from a <a href="https://injuryfacts.nsc.org/all-injuries/preventable-death-overview/odds-of-dying/" target="_blank" rel="noopener"><strong>dog attack</strong></a> (1 in 86,781) or from a <strong><a href="https://injuryfacts.nsc.org/all-injuries/preventable-death-overview/odds-of-dying/" target="_blank" rel="noopener">lightning strike</a></strong> (1 in 138,849).</p>
<p>But why odds?</p>
<p>Why not express these likelihoods in terms of probabilities?  Seems like a more natural way to express the chance of an event occurring, doesn’t it?</p>
<p>Odds, however, are commonly used to express event risk.  And of course, the chances of winning a sporting event.</p>
<p>As we write, the odds of the <strong><a href="https://www.mlb.com/dodgers" target="_blank" rel="noopener">Los Angeles Dodgers</a></strong> repeating as World Series champions in 2021 are 1 to 3.<a href="#_ftn1" name="_ftnref1">[1]</a></p>
<h2>So, what are odds?</h2>
<p style="text-align: center;"><strong>The number of times an event occurs divided by the number of times it does not occur</strong>.</p>
<p>In the case of a meteorite strike, for every person that dies, 1.6 million do not.  In the case of the Dodgers, we would expect the Dodgers to win 1 World Series for every 3 they lose.</p>
<p>But this still raises the question, <strong>why odds and not probabilities?</strong></p>
<h2>Odds and probabilities</h2>
<p>It is true that the probability of a low likelihood event is so small that stating it as a % requires a lot of zeros after the decimal (0.0000625% in the case of dying from a meteorite strike).</p>
<p>But that is not an insurmountable objection. For example, the risk of disease is often expressed in terms of rates per 100,000 to make the chances of low likelihood events easier to comprehend. Or we could state the probability of the non-event&#8230;not dying from a meteorite strike (99.9999375%).</p>
<p>A more important reason for using odds is that they facilitate <strong>multiplicative comparisons</strong>.</p>
<p>A simple example makes clear how probabilities can fall short.</p>
<p>Suppose the probability that Beth will go out to dinner this weekend is 75%. We <strong>cannot</strong> say, then, that the probability of Jose doing the same is 3 times that of Beth’s probability.</p>
<p>Why?  Probabilities are constrained to lie between 0 and 1. And 3 * 0.75 &gt; 1.0.</p>
<p>So, what do we do?  Enter odds.</p>
<h3>Odds are unconstrained</h3>
<p>Odds are only bounded on the low end, by 0.  Let&#8217;s return to Beth and Jose.</p>
<p>The odds of Beth going out to dinner are 3 or 3/1.  Why 3/1?</p>
<p>Remember odds are the ratio of the events to non-events. Beth is 75% likely to go out. So, if she is faced with 4 opportunities to go out, she will do so 3 times.  In other words, she will go out (event) 3 times for every time she stays home (non-event).  3 to 1.</p>
<p>Now, if Jose&#8217;s odds of going out are 3 times Beth&#8217;s, his odds are simply 3 * 3 or 9.  Equivalently, we can express his odds as 9 to 1 or 9/1.</p>
<p>On the odds scale, odds can be 2, 10, 50 times greater…there is no upper limit. And this makes them very useful when we wish to compare the <strong>relative likelihood</strong> of events occurring.</p>
<h3>Two sides of the same coin</h3>
<p>It turns out that if we are still interested in the probability, we can easily derive it from the odds.  <strong>Odds and probability are two sides of the same coin</strong>.</p>
<p>Odds (o) are related to probability (p) by the following:</p>
<p style="text-align: center;"><em><strong>o = p / (1 &#8211; p) = (probability of event / probability of non-event)</strong></em></p>
<p>Rearranging we find the “other side of the coin” (for an event):</p>
<p style="text-align: center;"><em><strong>p = o / (1 + o) = (odds of event) / (1 + odds of event)</strong></em></p>
<p>So, in the case of Beth and Jose we get:</p>
<p><img data-recalc-dims="1" decoding="async" loading="lazy" class="size-full wp-image-2083 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Odds-to-probability.png?resize=429%2C103&#038;ssl=1" alt="odds probability same coin different sides" width="429" height="103" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Odds-to-probability.png?w=429&amp;ssl=1 429w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/05/Odds-to-probability.png?resize=300%2C72&amp;ssl=1 300w" sizes="auto, (max-width: 429px) 100vw, 429px" /></p>
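<p>The two formulas above are easy to check in code. The sketch below is illustrative only; the function names are ours, and it uses the Beth and Jose figures from the text (Beth at 75%, Jose at 3 times Beth's odds).</p>

```python
def prob_to_odds(p: float) -> float:
    """o = p / (1 - p)"""
    return p / (1 - p)


def odds_to_prob(o: float) -> float:
    """p = o / (1 + o)"""
    return o / (1 + o)


beth_p = 0.75
beth_odds = prob_to_odds(beth_p)    # 3 to 1

jose_odds = 3 * beth_odds           # 3 times Beth's odds: 9 to 1
jose_p = odds_to_prob(jose_odds)    # back on the probability scale

print(f"Beth: probability {beth_p:.2f}, odds {beth_odds:.0f} to 1")
print(f"Jose: odds {jose_odds:.0f} to 1, probability {jose_p:.2f}")
```

<p>Tripling Beth's odds keeps Jose's probability below 1 (it comes out to 0.90), whereas tripling her probability directly would not.</p>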
<p>The relationship between odds and probability is shown graphically below.</p>
<p><img data-recalc-dims="1" decoding="async" loading="lazy" class="aligncenter wp-image-1995 " src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Odds-vs-probability.png?resize=683%2C451&#038;ssl=1" alt="what are odds" width="683" height="451" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Odds-vs-probability.png?w=856&amp;ssl=1 856w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Odds-vs-probability.png?resize=300%2C198&amp;ssl=1 300w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Odds-vs-probability.png?resize=768%2C507&amp;ssl=1 768w" sizes="auto, (max-width: 683px) 100vw, 683px" /></p>
<p>As the odds increase, the probability also increases but in a non-linear manner.  As shown above, the probability &#8220;increases at a decreasing rate&#8221; and approaches 1.0 “asymptotically” (i.e., as the odds get very large, the probability approaches but never quite reaches 1.0).<a href="#_ftn2" name="_ftnref2">[2]</a></p>
<p>But any finite odds will map to a probability between 0 and 1.</p>
<h2>Odds are preferred</h2>
<p>When comparing the relative chances of events (or sports teams), odds are the preferred way of expressing how much more likely one event is than another.  We can always derive the associated probability.  But since odds are unconstrained, there is no issue with saying the Los Angeles Dodgers are <a href="https://www.oddsshark.com/mlb/world-series-odds" target="_blank" rel="noopener"><strong>11 times</strong></a> as likely (as of May 31, 2021) to win the World Series in 2021 as the Chicago Cubs.</p>
<p>So, the next time someone tells you the odds of rain during your camping trip this weekend are 5 to 2, you might want to sleep in a tent.</p>
<p>&nbsp;</p>
<p><a href="#_ftnref1" name="_ftn1">[1]</a> As of May 31, 2021, the reported <a href="https://www.oddsshark.com/mlb/world-series-odds" target="_blank" rel="noopener"><strong>odds</strong></a> of the Dodgers repeating are 3 to 1 or <a href="https://www.oddsshark.com/tools/odds-calculator" target="_blank" rel="noopener"><strong>3/1</strong></a>.  In the betting world, this is referred to as <a href="https://www.sportsbettingdime.com/guides/betting-101/how-to-read-sports-odds/" target="_blank" rel="noopener"><strong>fractional odds</strong></a>. The <strong>number on the left</strong> or numerator is typically the <strong>number of times</strong> the team is <strong>expected to lose</strong>. 3/1 yields an implied probability of losing 3 times out of 4 or 75%.  Thus, the probability of the Dodgers repeating is (1 – 0.750) or 25%.  Expressing this as odds yields 1/3.</p>
<p><a href="#_ftnref2" name="_ftn2">[2]</a> In the limit, if the odds = infinity, then probability = 1.</p>
<p>&nbsp;</p>
<p>The post <a href="https://www.kddanalytics.com/odds-and-probability-two-sides-of-the-same-coin/">Odds and probability&#8230;two sides of the same coin</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1993</post-id>	</item>
		<item>
		<title>Curse of Big Data</title>
		<link>https://www.kddanalytics.com/curse-of-big-data/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Mon, 03 May 2021 11:31:59 +0000</pubDate>
				<category><![CDATA[Categorical Data Analysis]]></category>
		<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[efficacy]]></category>
		<category><![CDATA[hypothesis testing]]></category>
		<category><![CDATA[practical significance]]></category>
		<category><![CDATA[statistical significance]]></category>
		<category><![CDATA[statistics]]></category>
		<guid isPermaLink="false">https://www.kddanalytics.com/?p=1961</guid>

					<description><![CDATA[<p>“Big data.” We checked in with Google search trends recently. Appears that “Big Data” has lost its luster search-wise…started trending down about 4 years ago. Nowadays, everything is big data? Implications of big data However, this does not mean we should lose sight of certain statistical implications associated with being “big”. Yes, large amounts of&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/curse-of-big-data/">Curse of Big Data</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>“Big data.”</p>
<p>We checked in with <strong><a href="https://trends.google.com/trends/?geo=US" target="_blank" rel="noopener">Google</a></strong> search trends recently. Appears that “Big Data” has lost its luster search-wise…started trending down about 4 years ago.</p>
<p><img data-recalc-dims="1" decoding="async" loading="lazy" class="alignnone size-large wp-image-1963" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Big-Data-Trend.png?resize=1024%2C673&#038;ssl=1" alt="curse of big data" width="1024" height="673" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Big-Data-Trend.png?resize=1024%2C673&amp;ssl=1 1024w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Big-Data-Trend.png?resize=300%2C197&amp;ssl=1 300w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Big-Data-Trend.png?resize=768%2C505&amp;ssl=1 768w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Big-Data-Trend.png?w=1203&amp;ssl=1 1203w" sizes="auto, (max-width: 1000px) 100vw, 1000px" /></p>
<p>Nowadays, everything is big data?</p>
<h2>Implications of big data</h2>
<p>However, this does not mean we should lose sight of certain <strong>statistical implications</strong> associated with being “big”. Yes, large amounts of data can help us estimate relationships (<strong>effects</strong>) with a high degree of precision.</p>
<p>And help us uncover low occurrence events such as the blood clotting cases associated with the <strong><a href="https://www.nytimes.com/2021/04/16/health/johnson-vaccine-blood-clot-case.html" target="_blank" rel="noopener">Johnson &amp; Johnson</a></strong> COVID-19 vaccine.</p>
<p>But massive amounts of data can reveal patterns that are not always meaningful or happen by <a href="https://www.analyticbridge.datasciencecentral.com/profiles/blogs/the-curse-of-big-data" target="_blank" rel="noopener"><strong>chance</strong></a>.</p>
<p>Additionally, from a <strong><a href="https://en.wikipedia.org/wiki/Statistical_inference" target="_blank" rel="noopener">statistical inference </a></strong>perspective, with big data, <strong>even small, uninteresting effects can be statistically significant</strong>.</p>
<p>This has important implications for inferential conclusions about the associations we are studying.</p>
<p>And it does not take all that much data for this to happen.</p>
<h3>Small clinical trial example</h3>
<p>As an example, consider the following hypothetical results from a clinical trial of a &#8220;common&#8221; cold vaccine:</p>
<p><img data-recalc-dims="1" decoding="async" loading="lazy" class="size-full wp-image-1969 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Cold-Vaccine-Trial-Small-Sample.png?resize=391%2C159&#038;ssl=1" alt="curse of big data" width="391" height="159" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Cold-Vaccine-Trial-Small-Sample.png?w=391&amp;ssl=1 391w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Cold-Vaccine-Trial-Small-Sample.png?resize=300%2C122&amp;ssl=1 300w" sizes="auto, (max-width: 391px) 100vw, 391px" /></p>
<p>The table shows the number of subjects who had either a positive outcome (no infection) or a negative outcome (infection) across the two types of treatment. A standard statistical test of association, the <a href="https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test" target="_blank" rel="noopener"><strong>Pearson chi-squared</strong></a>, indicates <strong>we cannot say there is any difference in outcomes</strong> across the two treatment types.</p>
<p>That is, we <strong>cannot reject the &#8220;null&#8221; hypothesis of no association</strong> at the 95% level of confidence (i.e., <em>X<sup>2</sup></em> = 0.024, well below the critical value of 3.841 for one degree of freedom).</p>
<p>The <strong>strength of the association</strong>, or <strong><em>effect size,</em></strong> is obtained from the <a href="https://www.cdc.gov/csels/dsepd/ss1978/lesson3/section5.html" target="_blank" rel="noopener"><strong>relative risk</strong></a>, the ratio of the infection risks in the two groups.</p>
<p>The probability of a vaccinated subject getting sick is (24 / 59) or 0.407 (40.7%) while that for the placebo group is (29 / 69) or 0.420 (42.0%).</p>
<p>So the relative risk ratio is (0.407 / 0.420) or 0.968.<a href="#_ftn1" name="_ftnref1">[1]</a></p>
<p>Thus, we would expect that when applied to the population, <strong><span style="text-decoration: underline;">under the same conditions as the study</span></strong>, there would be 3.2% fewer infections among those who received the vaccine (i.e., (1 &#8211; 0.968) * 100).</p>
<p>This 3.2% is known as the <a href="https://www.kddanalytics.com/covid-vaccine-efficacy-effectiveness/" target="_blank" rel="noopener"><em><strong>efficacy rate</strong></em></a> of the vaccine.</p>
<p>The 95% confidence interval for the relative risk ratio is wide (i.e., 0.639 to 1.465) indicating a lack of precision in the <strong><em>point estimate</em></strong> of 0.968.</p>
<p>The study investigators conclude that the effect of the vaccine is <strong>neither <span style="text-decoration: underline;">statistically</span> nor <a href="https://statisticsbyjim.com/hypothesis-testing/practical-statistical-significance/" target="_blank" rel="noopener"><em>practically</em></a> significant</strong>.</p>
<p>Aside from its statistical insignificance, an efficacy rate of just 3.2% is not nearly large enough to justify starting production of the vaccine.</p>
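<p>For readers who want to reproduce the small-trial numbers, here is a minimal sketch using only the counts quoted in the text (24 of 59 vaccinated and 29 of 69 placebo subjects infected). The chi-squared statistic is computed without a continuity correction, and the confidence interval uses the common log-scale approximation for a relative risk; both choices are our assumptions, though they do reproduce the figures reported above.</p>

```python
import math

# Infected / total per arm, as quoted in the text.
vac_inf, vac_n = 24, 59
pla_inf, pla_n = 29, 69

# 2x2 table cells: a, b = vaccinated (infected, not); c, d = placebo (infected, not).
a, b = vac_inf, vac_n - vac_inf
c, d = pla_inf, pla_n - pla_inf
n = a + b + c + d

# Pearson chi-squared for a 2x2 table (no continuity correction).
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
CRITICAL_95 = 3.841  # chi-squared critical value, 1 degree of freedom, 5% level

# Relative risk and efficacy.
rr = (a / vac_n) / (c / pla_n)
efficacy_pct = (1 - rr) * 100

# 95% confidence interval on the log scale (assumed method).
se_log_rr = math.sqrt(1 / a - 1 / vac_n + 1 / c - 1 / pla_n)
ci_low = rr * math.exp(-1.96 * se_log_rr)
ci_high = rr * math.exp(1.96 * se_log_rr)

print(f"chi-squared: {chi2:.3f} (reject null: {chi2 > CRITICAL_95})")
print(f"relative risk: {rr:.3f}, efficacy: {efficacy_pct:.1f}%")
print(f"95% CI: {ci_low:.3f} to {ci_high:.3f}")
```

<p>The interval comfortably includes 1.0, which is another way of seeing that the data cannot distinguish the vaccine from the placebo.</p>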
<h3>Large clinical trial example</h3>
<p>Contrast this with the following study results based on a much larger sample of 44,800 subjects:<a href="#_ftn2" name="_ftnref2">[2]</a></p>
<p><img data-recalc-dims="1" decoding="async" loading="lazy" class="size-full wp-image-1970 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Cold-Vaccine-Trial-Large-Sample.png?resize=398%2C155&#038;ssl=1" alt="curse of big data" width="398" height="155" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Cold-Vaccine-Trial-Large-Sample.png?w=398&amp;ssl=1 398w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Cold-Vaccine-Trial-Large-Sample.png?resize=300%2C117&amp;ssl=1 300w" sizes="auto, (max-width: 398px) 100vw, 398px" /></p>
<p>The Pearson chi-squared statistic (<em>X<sup>2</sup></em>) is now 8.375. Thus, the <strong>hypothesis of no association <u>can be rejected</u> </strong>at the 95% level of confidence.</p>
<p>And the <strong>95% confidence interval </strong>for the relative risk ratio (i.e., 0.947 to 0.990) is<strong> much narrower, indicating a much higher level of precision</strong>.<a href="#_ftn3" name="_ftnref3">[3]</a></p>
<p>The study investigators now conclude that there is a <strong>statistically significant</strong> association between receiving the vaccine and avoiding a cold infection (positive outcome).</p>
<p><strong>But, </strong>the <strong>relative risk ratio</strong> of a positive outcome from receiving the vaccine is<strong> identical </strong>to that obtained from the smaller study,<strong> 0.968. </strong></p>
<p>Implying the <strong>efficacy rate is also the same, 3.2%</strong>.</p>
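<p>As a rough illustration of how the two trial examples can disagree on statistical significance while agreeing on effect size, the sketch below computes the Pearson chi-squared statistic and the relative risk for a 2x2 table, then multiplies every cell by the same factor. The cell counts and the helper names (<code>pearson_chi2</code>, <code>relative_risk</code>, <code>scale</code>) are hypothetical, chosen only to give risks close to those discussed above; they are not the counts from the study tables.</p>

```python
# Hypothetical 2x2 table: rows = (vaccine, placebo), columns = (infected, not infected).
# These counts are illustrative assumptions, not the study's actual data.

def pearson_chi2(table):
    """Pearson chi-squared statistic for a 2x2 table of observed counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the hypothesis of no association.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def relative_risk(table):
    """Risk of infection in row 0 divided by risk of infection in row 1."""
    (a, b), (c, d) = table
    return (a / (a + b)) / (c / (c + d))

def scale(table, k):
    """Multiply every cell count by k, keeping all proportions fixed."""
    return [[cell * k for cell in row] for row in table]

CRITICAL_VALUE_95 = 3.841  # chi-squared critical value, 1 degree of freedom, 5% level

small = [[40, 60], [42, 58]]  # 200 hypothetical subjects
large = scale(small, 200)     # 40,000 hypothetical subjects, same proportions

for name, table in (("small", small), ("large", large)):
    chi2 = pearson_chi2(table)
    rr = relative_risk(table)
    print(name, "chi2 =", round(chi2, 3), "RR =", round(rr, 3),
          "reject null:", chi2 > CRITICAL_VALUE_95)
```

<p>Multiplying every cell by the same factor multiplies the chi-squared statistic by that factor and leaves the relative risk untouched, which is the pattern the small and large trial examples show.</p>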
<h2>Practical vs statistical significance</h2>
<p>What are we to make of this?</p>
<p>From the perspective of <strong>effect size</strong>, do the larger study results carry more weight <strong>simply because</strong> the hypothesis of no association can be rejected? Even though the <strong><em>practical </em>significance has remained the same</strong>?</p>
<p>We can turn a very small, 3.2% effect into a <span style="text-decoration: underline;"><strong>statistically</strong></span> significant effect by simply increasing the sample size.</p>
<p>But does this <strong>change</strong> the <strong><span style="text-decoration: underline;">practical</span> </strong>significance of the 3.2%?</p>
<p><span style="font-size: 14pt;"><strong><span style="font-size: 12pt;">No</span>.</strong></span></p>
<p>If 3.2% was deemed by the study investigators to be <strong>practically insignificant</strong>, it<strong> remains practically insignificant.</strong> Despite the larger sample size and despite it now being statistically significant.<a href="#_ftn4" name="_ftnref4">[4]</a></p>
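<p>The link between sample size and the standard error can be sketched in a few lines. This is a minimal example, assuming the common large-sample formula for the standard error of the log relative risk; the post does not say which interval method it used. The counts and the function name <code>relative_risk_ci</code> are hypothetical.</p>

```python
import math

def relative_risk_ci(events_treated, n_treated, events_control, n_control, z=1.96):
    """Relative risk with an approximate 95% interval computed on the log scale.

    Assumption: the standard large-sample formula for the standard error of
    log(RR) is used here; the post does not state which method it applied.
    """
    rr = (events_treated / n_treated) / (events_control / n_control)
    se_log_rr = math.sqrt(
        1 / events_treated - 1 / n_treated + 1 / events_control - 1 / n_control
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical counts with the same risks (40% vs 42%) at two sample sizes.
small_rr, small_lo, small_hi = relative_risk_ci(40, 100, 42, 100)
large_rr, large_lo, large_hi = relative_risk_ci(8000, 20000, 8400, 20000)

print("small:", round(small_rr, 3), round(small_lo, 3), round(small_hi, 3))
print("large:", round(large_rr, 3), round(large_lo, 3), round(large_hi, 3))
```

<p>With these made-up counts the point estimate is the same in both cases, but only the larger sample gives an interval that excludes 1.0.</p>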
<h2>A curse of data &#8220;bigness&#8221;</h2>
<p style="text-align: center;"><strong>With a large enough sample, everything is statistically significant, <span style="text-decoration: underline;">even associations that are neither practically significant nor very interesting</span>.</strong></p>
<p>The implication is that, as sample sizes increase, the focus should <strong>shift</strong> away from hypothesis testing and <strong>towards</strong> the<strong> size of the estimated effect</strong>, whether the<strong> estimated effect is “practically” important,</strong> and <strong>“sensitivity analysis”</strong> (i.e., how the estimated effect changes when <em><strong>control variables</strong></em> are added and dropped).<a href="#_ftn5" name="_ftnref5">[5]</a></p>
<p><strong>Confidence intervals</strong> can and should play a role. But they will get narrower and narrower as sample sizes grow. And everything within the confidence interval could still be deemed not practically important.</p>
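<p>One way to make this concrete is to convert the interval on the relative risk into an interval on the efficacy rate and compare it with a threshold for practical importance. The sketch below uses the point estimate and interval reported for the large trial; the 50% threshold (<code>MIN_USEFUL_EFFICACY</code>) is purely a hypothetical value for illustration, not one given by the study investigators.</p>

```python
def efficacy_percent(relative_risk):
    """Convert a relative risk into an efficacy rate in percent."""
    return (1 - relative_risk) * 100

# Point estimate and 95% interval for the relative risk, as quoted for the large trial.
rr_point, rr_lower, rr_upper = 0.968, 0.947, 0.990

# A lower relative risk means a higher efficacy, so the interval bounds swap.
eff_point = efficacy_percent(rr_point)
eff_lower = efficacy_percent(rr_upper)
eff_upper = efficacy_percent(rr_lower)

# Hypothetical threshold: the smallest efficacy worth acting on (an assumption).
MIN_USEFUL_EFFICACY = 50.0

# Statistically significant if the interval for the relative risk excludes 1.0.
statistically_significant = 1.0 > rr_upper or rr_lower > 1.0

# Practically important only if some value in the interval reaches the threshold.
could_be_practically_important = eff_upper >= MIN_USEFUL_EFFICACY

print("efficacy:", round(eff_point, 1), "interval:",
      round(eff_lower, 1), "to", round(eff_upper, 1))
print("statistically significant:", statistically_significant)
print("could be practically important:", could_be_practically_important)
```

<p>Under that assumed threshold, the whole interval (roughly 1.0% to 5.3% efficacy) sits far below what would count as practically important, even though it excludes zero effect.</p>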
<p>In sum, <strong>as data get bigger</strong> (and it does not take massive amounts of data for this to be an issue), <strong>we need to guard against concluding that a small effect is <span style="text-decoration: underline;">practically</span> significant just because the <a href="https://statisticsbyjim.com/hypothesis-testing/interpreting-p-values/" target="_blank" rel="noopener">p-value</a> is very small</strong> (i.e., the effect is statistically significant).</p>
<p><strong>The curse of big data is still very much with us.</strong></p>
<p>&nbsp;</p>
<p><a href="#_ftnref1" name="_ftn1">[1]</a> A ratio of 1.0 would mean no difference in effect between the treatment types.</p>
<p><a href="#_ftnref2" name="_ftn2">[2]</a> As a point of comparison, the 2020 Moderna and Pfizer COVID-19 vaccine trials consisted of about 30,000 and 40,000 subjects, respectively.</p>
<p><a href="#_ftnref3" name="_ftn3">[3]</a> A more complicated technique than the one used here is used to calculate confidence intervals for actual clinical trial results, and it typically results in wider intervals. For example, in 2020, Moderna <a href="https://www.modernatx.com/covid19vaccine-eua/providers/clinical-trial-data" target="_blank" rel="noopener"><strong>reported</strong></a> an efficacy rate of 94.1% for its COVID-19 vaccine with a 95% confidence interval of 89.3% to 96.8%.</p>
<p><a href="#_ftnref4" name="_ftn4">[4]</a> Since the standard error of the relative risk ratio estimate is based on the cell counts in the <a href="https://www.kddanalytics.com/covid-vaccine-efficacy-effectiveness/" target="_blank" rel="noopener"><strong>contingency table</strong></a>, increasing the size of the sample lowers the standard error, making it more likely we can reject the null hypothesis at a given level of confidence.</p>
<p><a href="#_ftnref5" name="_ftn5">[5]</a> The paper <a href="https://www.galitshmueli.com/system/files/Print%20Version.pdf" target="_blank" rel="noopener"><strong>Too Big to Fail</strong></a> presents a nice discussion of these issues. Additionally, the American Statistical Association released <strong><a href="https://amstat.tandfonline.com/doi/pdf/10.1080/00031305.2016.1154108" target="_blank" rel="noopener">recommendations</a> </strong>on the reporting of p-values.</p>
<p>&nbsp;</p>
<p>The post <a href="https://www.kddanalytics.com/curse-of-big-data/">Curse of Big Data</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1961</post-id>	</item>
		<item>
		<title>Efficacy vs Effectiveness of the COVID Vaccines…&#8221;tomato, tomahto&#8221;?</title>
		<link>https://www.kddanalytics.com/covid-vaccine-efficacy-effectiveness/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Thu, 08 Apr 2021 18:15:48 +0000</pubDate>
				<category><![CDATA[Categorical Data Analysis]]></category>
		<category><![CDATA[Data Analysis]]></category>
		<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[categorical data]]></category>
		<category><![CDATA[contingency table]]></category>
		<category><![CDATA[COVID]]></category>
		<category><![CDATA[efficacy]]></category>
		<category><![CDATA[relative risk]]></category>
		<category><![CDATA[statistics]]></category>
		<guid isPermaLink="false">https://www.kddanalytics.com/?p=1939</guid>

					<description><![CDATA[<p>You like potato and I like potahto You like tomato and I like tomahto Potato, potahto, tomato, tomahto Let&#8217;s call the whole thing off But oh, if we call the whole thing off Then we must part And oh, if we ever part then that might break my heart &#8212;Ira Gershwin The eye-popping efficacy rates&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/covid-vaccine-efficacy-effectiveness/">Efficacy vs Effectiveness of the COVID Vaccines…&#8221;tomato, tomahto&#8221;?</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p style="text-align: center;"><em>You like potato and I like potahto</em><br />
<em>You like tomato and I like tomahto</em><br />
<em>Potato, potahto, tomato, tomahto</em><br />
<em>Let&#8217;s call the whole thing off</em></p>
<p style="text-align: center;"><em>But oh, if we call the whole thing off</em><br />
<em>Then we must part</em><br />
<em>And oh, if we ever part</em><br />
<em>then that might break my heart</em></p>
<p style="text-align: center;"><em>&#8212;Ira Gershwin</em></p>
<p>The eye-popping efficacy rates reported for the Moderna (<a href="https://www.cdc.gov/coronavirus/2019-ncov/vaccines/different-vaccines/Moderna.html"><strong>94%</strong></a>), Pfizer (<a href="https://www.cdc.gov/coronavirus/2019-ncov/vaccines/different-vaccines/Pfizer-BioNTech.html"><strong>95%</strong></a>) and, to a lesser extent, the Johnson &amp; Johnson (<a href="https://www.cdc.gov/coronavirus/2019-ncov/vaccines/different-vaccines/janssen.html"><strong>66%</strong></a>) COVID-19 vaccines have undoubtedly not escaped your attention.</p>
<p>But what is vaccine <em><strong>efficacy</strong></em> and how is it calculated? And how does it differ from vaccine <em><strong>effectiveness</strong></em>?</p>
<h2>Moderna vaccine efficacy</h2>
<p>First, consider efficacy. Using Moderna’s reported clinical trial results as an example, we see that it is a straightforward calculation.</p>
<p>Moderna <strong><a href="https://www.modernatx.com/covid19vaccine-eua/providers/clinical-trial-data">reported</a></strong> results from its COVID-19 vaccine trial in November 2020. The results are shown below in a 2&#215;2 “<a href="https://en.wikipedia.org/wiki/Contingency_table"><strong><em>contingency</em></strong></a>” or “<strong><em>cross-tabulation</em></strong>” table. The columns show the number of subjects who were infected (or not); the rows show the number who received the vaccine (or the placebo). And the cells show the intersection of those two events.</p>
<h4><img data-recalc-dims="1" decoding="async" loading="lazy" class="size-full wp-image-1954 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Moderna-COVID-Clinical-Trial-Contingency-Table-v2.png?resize=458%2C164&#038;ssl=1" alt="Efficacy of Moderna COVID vaccine" width="458" height="164" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Moderna-COVID-Clinical-Trial-Contingency-Table-v2.png?w=458&amp;ssl=1 458w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Moderna-COVID-Clinical-Trial-Contingency-Table-v2.png?resize=300%2C107&amp;ssl=1 300w" sizes="auto, (max-width: 458px) 100vw, 458px" /></h4>
<h3>Relative risk</h3>
<p>The <strong>strength of the association, </strong>or the<strong><em> effect size</em>,</strong> between receiving the vaccine and not getting infected is measured by the <em><strong>relative risk</strong></em>.</p>
<p>The <em><strong>probability</strong></em> or <em><strong>risk</strong></em> of a vaccinated subject being infected is 0.08%. That is, (11 / 14,134), or the number of events divided by the sum of events and non-events. For a subject receiving the placebo, the probability of infection is higher at 1.31% (i.e., 185 / 14,073).</p>
<p>So, using the placebo group as the reference group, the <em><strong>relative risk</strong></em> is (11 / 14,134) / (185 / 14,073) or 0.059.<a href="#_ftn1" name="_ftnref1">[1]</a></p>
<p>In other words, <strong>the risk of a vaccinated person being infected is 94.1% <span style="text-decoration: underline;">lower</span> compared to a subject who received the placebo</strong> (i.e., (1 – 0.059) * 100).</p>
<p>It is this calculation of 94.1% that was reported by Moderna as the vaccine&#8217;s <strong><em><a href="https://www.cdc.gov/csels/dsepd/ss1978/lesson3/section6.html">efficacy rate</a></em>.</strong><a href="#_ftn2" name="_ftnref2">[2]</a></p>
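<p>The arithmetic above can be reproduced in a few lines. This is a minimal sketch using the counts quoted in the text; the helper name <code>risk</code> is just an illustrative choice.</p>

```python
def risk(events, total):
    """Share of subjects in a group who experienced the event."""
    return events / total

# Counts as quoted in the text: infections / group size.
vaccine_risk = risk(11, 14134)
placebo_risk = risk(185, 14073)

relative_risk = vaccine_risk / placebo_risk  # placebo is the reference group
efficacy = (1 - relative_risk) * 100         # percent reduction in risk

print(f"vaccine risk:  {vaccine_risk:.2%}")
print(f"placebo risk:  {placebo_risk:.2%}")
print(f"relative risk: {relative_risk:.3f}")
print(f"efficacy:      {efficacy:.1f}%")
```

<p>Running it gives 0.08%, 1.31%, 0.059 and 94.1%, matching the figures in the text.</p>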
<h2>Vaccine effectiveness</h2>
<p>So, what about <em><strong>vaccine effectiveness</strong></em>? The term effectiveness refers to <strong>how the vaccine performs in the real world</strong>. Efficacy refers to how the vaccine performs under the “optimal” conditions of a clinical trial.</p>
<p>Clinical trials are based on a sample of subjects who may not be fully representative of the general population (e.g., not all <a href="https://www.verywellhealth.com/comorbidity-5081615"><strong>comorbidities</strong></a> are controlled for). In addition, the COVID strain that existed in the population during the clinical trial period may not be the same as the one circulating when the vaccine is released. Also, vaccine transportation, storage and delivery may differ from the more controlled environment of the clinical trial. Thus, the effectiveness of the vaccine may be different from what was found during the clinical trial.</p>
<h3>Studies on COVID vaccine effectiveness</h3>
<p>So, do we have any data yet on the real-world effectiveness of the COVID vaccines? It takes time to collect data, but <strong>we do have some indication that vaccine effectiveness is very high.</strong></p>
<p>An early <a href="https://www.nejm.org/doi/full/10.1056/NEJMoa2101765"><strong>study</strong></a> appeared February 24, 2021 in the New England Journal of Medicine. The study examined the Pfizer vaccine performance in Israel. The sample was matched data from over 1 million people, half of whom were vaccinated between December 2020 and February 2021 and half of whom were not. The results of the study suggest a <strong>symptomatic infection effectiveness rate of 94%</strong> 7+ days after the second dose.</p>
<p>A more recent <a href="https://www.cdc.gov/mmwr/volumes/70/wr/mm7013e3.htm"><strong>study</strong></a> released by the CDC on April 2 examined both the Pfizer and Moderna vaccines.  This study used US data from December 2020 to March 2021. The sample consisted of 3,950 health care personnel, first responders, and other front-line workers.  The study found that the <strong>vaccines were 90% effective against COVID infection</strong> 14+ days after the second dose. <strong>Even 14+ days after the <span style="text-decoration: underline;">first</span> dose the vaccines were 80% effective.</strong></p>
<p>As a point of comparison, according to the <a href="https://www.cdc.gov/flu/vaccines-work/vaccineeffect.htm"><strong>CDC</strong></a>, effectiveness of the annual flu vaccination ranges between 40 and 60%.<a href="#_ftn3" name="_ftnref3">[3]</a></p>
<p><strong>So, the effectiveness rate, after 2 doses of the Pfizer and Moderna vaccines, appears to be very close in magnitude to the efficacy rate.</strong></p>
<p>Very good news indeed!</p>
<p>Tomato, tomahto?</p>
<p>&nbsp;</p>
<p><a href="#_ftnref1" name="_ftn1">[1]</a> A relative risk ratio of 1.0 would mean no difference in effect between the treatment types.</p>
<p><a href="#_ftnref2" name="_ftn2">[2]</a> A summary of efficacy rates across the range of current COVID vaccines can be found <a href="http://www.healthdata.org/covid/covid-19-vaccine-efficacy-summary">here</a>.</p>
<p><a href="#_ftnref3" name="_ftn3">[3]</a> One reason for the range is that the flu strain that is in circulation can differ from what was predicted when the annual flu vaccine was developed earlier in the year.</p>
<p>The post <a href="https://www.kddanalytics.com/covid-vaccine-efficacy-effectiveness/">Efficacy vs Effectiveness of the COVID Vaccines…&#8221;tomato, tomahto&#8221;?</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1939</post-id>	</item>
	</channel>
</rss>
