<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>statistics Archives - KDD Analytics</title>
	<atom:link href="https://www.kddanalytics.com/tag/statistics/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.kddanalytics.com/tag/statistics/</link>
	<description>Data to Decisions</description>
	<lastBuildDate>Fri, 21 May 2021 18:44:36 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9</generator>

<image>
	<url>https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2016/08/cropped-imageedit_1_7939659602.png?fit=32%2C32&#038;ssl=1</url>
	<title>statistics Archives - KDD Analytics</title>
	<link>https://www.kddanalytics.com/tag/statistics/</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">114932494</site>	<item>
		<title>Curse of Big Data</title>
		<link>https://www.kddanalytics.com/curse-of-big-data/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Mon, 03 May 2021 11:31:59 +0000</pubDate>
				<category><![CDATA[Categorical Data Analysis]]></category>
		<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[efficacy]]></category>
		<category><![CDATA[hypothesis testing]]></category>
		<category><![CDATA[practical significance]]></category>
		<category><![CDATA[statistical significance]]></category>
		<category><![CDATA[statistics]]></category>
		<guid isPermaLink="false">https://www.kddanalytics.com/?p=1961</guid>

					<description><![CDATA[<p>“Big data.” We checked in with Google search trends recently. Appears that “Big Data” has lost its luster search-wise…started trending down about 4 years ago. Nowadays, everything is big data? Implications of big data However, this does not mean we should lose sight of certain statistical implications associated with being “big”. Yes, large amounts of&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/curse-of-big-data/">Curse of Big Data</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>“Big data.”</p>
<p>We checked in with <strong><a href="https://trends.google.com/trends/?geo=US" target="_blank" rel="noopener">Google</a></strong> search trends recently. Appears that “Big Data” has lost its luster search-wise…started trending down about 4 years ago.</p>
<p><img data-recalc-dims="1" decoding="async" loading="lazy" class="alignnone size-large wp-image-1963" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Big-Data-Trend.png?resize=1024%2C673&#038;ssl=1" alt="curse of big data" width="1024" height="673" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Big-Data-Trend.png?resize=1024%2C673&amp;ssl=1 1024w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Big-Data-Trend.png?resize=300%2C197&amp;ssl=1 300w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Big-Data-Trend.png?resize=768%2C505&amp;ssl=1 768w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Big-Data-Trend.png?w=1203&amp;ssl=1 1203w" sizes="auto, (max-width: 1000px) 100vw, 1000px" /></p>
<p>Nowadays, everything is big data?</p>
<h2>Implications of big data</h2>
<p>However, this does not mean we should lose sight of certain <strong>statistical implications</strong> associated with being “big”. Yes, large amounts of data can help us estimate relationships (<strong>effects</strong>) with a high degree of precision.</p>
<p>And help us uncover low occurrence events such as the blood clotting cases associated with the <strong><a href="https://www.nytimes.com/2021/04/16/health/johnson-vaccine-blood-clot-case.html" target="_blank" rel="noopener">Johnson &amp; Johnson</a></strong> COVID-19 vaccine.</p>
<p>But massive amounts of data can reveal patterns that are not always meaningful or happen by <a href="https://www.analyticbridge.datasciencecentral.com/profiles/blogs/the-curse-of-big-data" target="_blank" rel="noopener"><strong>chance</strong></a>.</p>
<p>Additionally, from a <strong><a href="https://en.wikipedia.org/wiki/Statistical_inference" target="_blank" rel="noopener">statistical inference </a></strong>perspective, with big data, <strong>even small, uninteresting effects can be statistically significant</strong>.</p>
<p>This has important implications for inferential conclusions about the associations we are studying.</p>
<p>And it does not take all that much data for this to happen.</p>
<h3>Small clinical trial example</h3>
<p>As an example, consider the following hypothetical results from a clinical trial of a &#8220;common&#8221; cold vaccine:</p>
<p><img data-recalc-dims="1" decoding="async" loading="lazy" class="size-full wp-image-1969 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Cold-Vaccine-Trial-Small-Sample.png?resize=391%2C159&#038;ssl=1" alt="curse of big data" width="391" height="159" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Cold-Vaccine-Trial-Small-Sample.png?w=391&amp;ssl=1 391w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Cold-Vaccine-Trial-Small-Sample.png?resize=300%2C122&amp;ssl=1 300w" sizes="auto, (max-width: 391px) 100vw, 391px" /></p>
<p>The table shows the number of subjects who had either a positive outcome (no infection) or a negative outcome (infection) under each of the two treatment types. A standard statistical test of association, the <a href="https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test" target="_blank" rel="noopener"><strong>Pearson chi-squared</strong></a> test, indicates <strong>we cannot say there is any difference in outcomes</strong> across the two treatment types.</p>
<p>That is, we <strong>cannot reject the &#8220;null&#8221; hypothesis of no association</strong> at the 95% level of confidence (i.e., <em>X<sup>2</sup></em> = 0.024, well below the critical value of 3.841 for one degree of freedom).</p>
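<p>As a rough sketch of the test above, the statistic can be computed directly from the four cell counts. The &#8220;not infected&#8221; counts (35 and 40) are an assumption here, inferred from the infected counts and group totals quoted in this post (24 of 59 and 29 of 69); the function name is purely illustrative and no continuity correction is applied.</p>

```python
def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for a 2x2 table with rows (a, b) and (c, d)."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Critical value of the chi-squared distribution, 1 degree of freedom, alpha = 0.05
CRITICAL_95_DF1 = 3.841

# Rows: vaccine (infected, not infected), placebo (infected, not infected).
# The not-infected counts are inferred (59 - 24 and 69 - 29), not quoted in the text.
chi2_small = pearson_chi2_2x2(24, 35, 29, 40)

print(round(chi2_small, 3))          # 0.024
print(chi2_small > CRITICAL_95_DF1)  # False, so the null hypothesis is not rejected
```

<p>Because the statistic falls far short of 3.841, the data give no grounds to reject the hypothesis of no association.</p>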
<p>The <strong>strength of the association</strong>, or <strong><em>effect size</em></strong>, is obtained from the <a href="https://www.cdc.gov/csels/dsepd/ss1978/lesson3/section5.html" target="_blank" rel="noopener"><strong>relative risk ratio</strong></a>, that is, the ratio of the two groups&#8217; risks of infection.</p>
<p>The probability of a vaccinated subject getting sick is (24 / 59) or 0.407 (40.7%) while that for the placebo group is (29 / 69) or 0.420 (42.0%).</p>
<p>So the relative risk ratio is (0.407 / 0.420) or 0.968.<a href="#_ftn1" name="_ftnref1">[1]</a></p>
<p>Thus, we would expect that when applied to the population, <strong><span style="text-decoration: underline;">under the same conditions as the study</span></strong>, there would be 3.2% fewer infections among those who received the vaccine (i.e., (1 &#8211; 0.968) * 100).</p>
<p>This 3.2% is known as the <a href="https://www.kddanalytics.com/covid-vaccine-efficacy-effectiveness/" target="_blank" rel="noopener"><em><strong>efficacy rate</strong></em></a> of the vaccine.</p>
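<p>The arithmetic above can be checked in a few lines. This is only an illustrative sketch using the counts quoted in the text; the function and variable names are made up for the example.</p>

```python
def relative_risk(events_treated, n_treated, events_control, n_control):
    """Risk in the treated group divided by risk in the control group."""
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    return risk_treated / risk_control

# 24 of 59 vaccinated subjects and 29 of 69 placebo subjects were infected
rr_small = relative_risk(24, 59, 29, 69)
efficacy_pct = (1 - rr_small) * 100

print(round(rr_small, 3))      # 0.968
print(round(efficacy_pct, 1))  # 3.2
```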
<p>The 95% confidence interval for the relative risk ratio is wide (i.e., 0.639 to 1.465) indicating a lack of precision in the <strong><em>point estimate</em></strong> of 0.968.</p>
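<p>The post does not say how this interval was computed. One common approach, assumed here, is the log-based (Katz) method, which builds the interval on the log scale and then exponentiates. Under that assumption the sketch below reproduces the interval quoted above; the function name is illustrative.</p>

```python
import math

def relative_risk_ci(a, n1, c, n2, z=1.96):
    """Relative risk with a log-method confidence interval.

    a of n1 treated subjects and c of n2 control subjects had the event.
    z = 1.96 corresponds to a 95% interval.
    """
    rr = (a / n1) / (c / n2)
    # Standard error of log(RR), based on the cell counts
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

rr, lo, hi = relative_risk_ci(24, 59, 29, 69)
print(round(lo, 3), round(hi, 3))  # 0.639 1.465
```

<p>The interval comfortably contains 1.0 (no difference), which is consistent with the chi-squared result.</p>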
<p>The study investigators conclude that the effect of the vaccine is <strong>neither <span style="text-decoration: underline;">statistically</span> nor <a href="https://statisticsbyjim.com/hypothesis-testing/practical-statistical-significance/" target="_blank" rel="noopener"><em>practically</em></a> significant</strong>.</p>
<p>Aside from its statistical insignificance, an efficacy rate of just 3.2% is not nearly large enough to justify starting production of the vaccine.</p>
<h3>Large clinical trial example</h3>
<p>Contrast this with the following study results based on a much larger sample of 44,800 subjects:<a href="#_ftn2" name="_ftnref2">[2]</a></p>
<p><img data-recalc-dims="1" decoding="async" loading="lazy" class="size-full wp-image-1970 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Cold-Vaccine-Trial-Large-Sample.png?resize=398%2C155&#038;ssl=1" alt="curse of big data" width="398" height="155" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Cold-Vaccine-Trial-Large-Sample.png?w=398&amp;ssl=1 398w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Cold-Vaccine-Trial-Large-Sample.png?resize=300%2C117&amp;ssl=1 300w" sizes="auto, (max-width: 398px) 100vw, 398px" /></p>
<p>The Pearson chi-squared statistic (<em>X<sup>2</sup></em>) is now 8.375, which exceeds the critical value of 3.841 for one degree of freedom. Thus, the <strong>hypothesis of no association <u>can be rejected</u> </strong>at the 95% level of confidence.</p>
<p>And the <strong>95% confidence interval </strong>for the relative risk ratio is<strong> much narrower indicating a much higher level of precision</strong> (i.e., 0.947 to 0.990).<a href="#_ftn3" name="_ftnref3">[3]</a></p>
<p>The study investigators now conclude that there is a <strong>statistically significant</strong> association between receiving the vaccine and avoiding a cold infection (positive outcome).</p>
<p><strong>But </strong>the <strong>relative risk ratio</strong> of infection for vaccinated versus placebo subjects is<strong> identical </strong>to that obtained from the smaller study,<strong> 0.968</strong>.</p>
<p>Implying the <strong>efficacy rate is also the same, 3.2%</strong>.</p>
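<p>To see how sample size alone drives the result, the sketch below compares the two trials side by side. The large-trial counts are an assumption: they are taken to be the small-trial counts multiplied by 350 (44,800 / 128), which happens to reproduce the chi-squared statistic and confidence interval reported above. The log-method interval is likewise an assumption, and all names are illustrative.</p>

```python
import math

SCALE = 350  # assumption: 44,800 / 128; the large-trial cell counts are not given in the text

def summarize(a, n1, c, n2, z=1.96):
    """Return (chi-squared, relative risk, (ci_lower, ci_upper)) for a 2x2 trial."""
    b, d = n1 - a, n2 - c  # subjects without the event in each group
    n = n1 + n2
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    rr = (a / n1) / (c / n2)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    ci = (math.exp(math.log(rr) - z * se_log), math.exp(math.log(rr) + z * se_log))
    return chi2, rr, ci

small = summarize(24, 59, 29, 69)
large = summarize(24 * SCALE, 59 * SCALE, 29 * SCALE, 69 * SCALE)

for label, (chi2, rr, ci) in (("small", small), ("large", large)):
    print(label, round(chi2, 3), round(rr, 3), round(ci[0], 3), round(ci[1], 3))
# small 0.024 0.968 0.639 1.465
# large 8.375 0.968 0.947 0.99
```

<p>The relative risk ratio is unchanged. Only the test statistic and the width of the interval respond to the larger sample.</p>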
<h2>Practical vs statistical significance</h2>
<p>What are we to make of this?</p>
<p>From the perspective of <strong>effect size</strong>, do the larger study results carry more weight <strong>simply because</strong> the hypothesis of no association can be rejected? Even though the <strong><em>practical </em>significance has remained the same</strong>?</p>
<p>We can turn a very small, 3.2% effect into a <span style="text-decoration: underline;"><strong>statistically</strong></span> significant effect by simply increasing the sample size.</p>
<p>But does this <strong>change</strong> the <strong><span style="text-decoration: underline;">practical</span> </strong>significance of the 3.2%?</p>
<p><span style="font-size: 14pt;"><strong><span style="font-size: 12pt;">No</span>.</strong></span></p>
<p>If 3.2% was deemed by the study investigators to be <strong>practically insignificant</strong>, it<strong> remains practically insignificant.</strong> Despite the larger sample size and despite it now being statistically significant.<a href="#_ftn4" name="_ftnref4">[4]</a></p>
<h2>A curse of data &#8220;bigness&#8221;</h2>
<p style="text-align: center;"><strong>With a large enough sample, almost any nonzero association becomes statistically significant, <span style="text-decoration: underline;">even associations that are neither practically significant nor very interesting</span>.</strong></p>
<p>The implication is that rather than focusing on hypothesis testing as sample sizes increase, the focus should <strong>shift towards</strong> the<strong> size of the estimated effect</strong>, whether the<strong> estimated effect is “practically” important,</strong> and <strong>“sensitivity analysis”</strong> (i.e., how does the estimated effect change when <em><strong>control variables</strong></em> are added and dropped).<a href="#_ftn5" name="_ftnref5">[5]</a></p>
<p><strong>Confidence intervals</strong> can and should play a role. But they will get narrower and narrower as sample sizes grow. And everything within the confidence interval could still be deemed not practically important.</p>
<p>In sum, <strong>as data get bigger</strong> (and it does not take massive amounts of data for this to be an issue), <strong>we need to guard against concluding that a small effect is <span style="text-decoration: underline;">practically</span> significant just because the <a href="https://statisticsbyjim.com/hypothesis-testing/interpreting-p-values/" target="_blank" rel="noopener">p-value</a> is very small</strong> (i.e., the effect is statistically significant).</p>
<p><strong>The curse of big data is still very much with us.</strong></p>
<p>&nbsp;</p>
<p><a href="#_ftnref1" name="_ftn1">[1]</a> A ratio of 1.0 would mean no difference in effect between the treatment types.</p>
<p><a href="#_ftnref2" name="_ftn2">[2]</a> As a point of comparison, the 2020 Moderna and Pfizer COVID-19 vaccine trials consisted of about 30,000 and 40,000 subjects, respectively.</p>
<p><a href="#_ftnref3" name="_ftn3">[3]</a> Actual clinical trials use a more complicated technique to calculate confidence intervals than the one used here, and it typically results in wider intervals. For example, in 2020, Moderna <a href="https://www.modernatx.com/covid19vaccine-eua/providers/clinical-trial-data" target="_blank" rel="noopener"><strong>reported</strong></a> an efficacy rate of 94.1% for its COVID-19 vaccine with a 95% confidence interval of 89.3% to 96.8%.</p>
<p><a href="#_ftnref4" name="_ftn4">[4]</a> Since the standard error of the relative risk ratio estimate is based on the cell counts in the <a href="https://www.kddanalytics.com/covid-vaccine-efficacy-effectiveness/" target="_blank" rel="noopener"><strong>contingency table</strong></a>, increasing the size of the sample lowers the standard error, making it more likely we can reject the null hypothesis at a given level of confidence.</p>
<p><a href="#_ftnref5" name="_ftn5">[5]</a> The paper <a href="https://www.galitshmueli.com/system/files/Print%20Version.pdf" target="_blank" rel="noopener"><strong>Too Big to Fail</strong></a> presents a nice discussion of these issues. Additionally, the American Statistical Association released <strong><a href="https://amstat.tandfonline.com/doi/pdf/10.1080/00031305.2016.1154108" target="_blank" rel="noopener">recommendations</a> </strong>on the reporting of p-values.</p>
<p>&nbsp;</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1961</post-id>	</item>
		<item>
		<title>Efficacy vs Effectiveness of the COVID Vaccines…&#8221;tomato, tomahto&#8221;?</title>
		<link>https://www.kddanalytics.com/covid-vaccine-efficacy-effectiveness/</link>
		
		<dc:creator><![CDATA[KDD]]></dc:creator>
		<pubDate>Thu, 08 Apr 2021 18:15:48 +0000</pubDate>
				<category><![CDATA[Categorical Data Analysis]]></category>
		<category><![CDATA[Data Analysis]]></category>
		<category><![CDATA[Data Analytics Methods]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[categorical data]]></category>
		<category><![CDATA[contingency table]]></category>
		<category><![CDATA[COVID]]></category>
		<category><![CDATA[efficacy]]></category>
		<category><![CDATA[relative risk]]></category>
		<category><![CDATA[statistics]]></category>
		<guid isPermaLink="false">https://www.kddanalytics.com/?p=1939</guid>

					<description><![CDATA[<p>You like potato and I like potahto You like tomato and I like tomahto Potato, potahto, tomato, tomahto Let&#8217;s call the whole thing off But oh, if we call the whole thing off Then we must part And oh, if we ever part then that might break my heart &#8212;Ira Gershwin The eye-popping efficacy rates&#8230;</p>
<p>The post <a href="https://www.kddanalytics.com/covid-vaccine-efficacy-effectiveness/">Efficacy vs Effectiveness of the COVID Vaccines…&#8221;tomato, tomahto&#8221;?</a> appeared first on <a href="https://www.kddanalytics.com">KDD Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p style="text-align: center;"><em>You like potato and I like potahto</em><br />
<em>You like tomato and I like tomahto</em><br />
<em>Potato, potahto, tomato, tomahto</em><br />
<em>Let&#8217;s call the whole thing off</em></p>
<p style="text-align: center;"><em>But oh, if we call the whole thing off</em><br />
<em>Then we must part</em><br />
<em>And oh, if we ever part</em><br />
<em>then that might break my heart</em></p>
<p style="text-align: center;"><em>&#8212;Ira Gershwin</em></p>
<p>The eye-popping efficacy rates reported for the Moderna (<a href="https://www.cdc.gov/coronavirus/2019-ncov/vaccines/different-vaccines/Moderna.html"><strong>94%</strong></a>), Pfizer (<a href="https://www.cdc.gov/coronavirus/2019-ncov/vaccines/different-vaccines/Pfizer-BioNTech.html"><strong>95%</strong></a>) and, to a lesser extent, the Johnson &amp; Johnson (<a href="https://www.cdc.gov/coronavirus/2019-ncov/vaccines/different-vaccines/janssen.html"><strong>66%</strong></a>) COVID-19 vaccines have undoubtedly not escaped your attention.</p>
<p>But what is vaccine <em><strong>efficacy</strong></em> and how is it calculated? And how does it differ from vaccine <em><strong>effectiveness</strong></em>?</p>
<h2>Moderna vaccine efficacy</h2>
<p>First, consider efficacy. Using Moderna’s reported clinical trial results as an example, we see that it is a straightforward calculation.</p>
<p>Moderna <strong><a href="https://www.modernatx.com/covid19vaccine-eua/providers/clinical-trial-data">reported</a></strong> results from its COVID-19 vaccine trial in November 2020. The results are shown below in a 2&#215;2 “<a href="https://en.wikipedia.org/wiki/Contingency_table"><strong><em>contingency</em></strong></a>” or “<strong><em>cross-tabulation</em></strong>” table. The columns show the number of subjects who were infected (or not); the rows show the number who received the vaccine (or the placebo). And the cells show the intersection of those two events.</p>
<h4><img data-recalc-dims="1" decoding="async" loading="lazy" class="size-full wp-image-1954 aligncenter" src="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Moderna-COVID-Clinical-Trial-Contingency-Table-v2.png?resize=458%2C164&#038;ssl=1" alt="Efficacy of Moderna COVID vaccine" width="458" height="164" srcset="https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Moderna-COVID-Clinical-Trial-Contingency-Table-v2.png?w=458&amp;ssl=1 458w, https://i0.wp.com/www.kddanalytics.com/wp-content/uploads/2021/04/Moderna-COVID-Clinical-Trial-Contingency-Table-v2.png?resize=300%2C107&amp;ssl=1 300w" sizes="auto, (max-width: 458px) 100vw, 458px" /></h4>
<h3>Relative risk</h3>
<p>The <strong>strength of the association, </strong>or the<strong><em> effect size</em>,</strong> between receiving the vaccine and not getting infected is measured by the <em><strong>relative risk</strong></em>.</p>
<p>The <em><strong>probability</strong></em> or <em><strong>risk</strong></em> of a vaccinated subject being infected is 0.08%. That is, (11 / 14,134), or the number of events divided by the sum of events and non-events. For a subject receiving the placebo, the probability of infection is higher at 1.31% (i.e., 185 / 14,073).</p>
<p>So, using the placebo group as the reference group, the <em><strong>relative risk</strong></em> is (11 / 14,134) / (185 / 14,073) or 0.059.<a href="#_ftn1" name="_ftnref1">[1]</a></p>
<p>In other words, <strong>the risk of a vaccinated person being infected is 94.1% <span style="text-decoration: underline;">lower</span> compared to a subject who received the placebo</strong> (i.e., (1 – 0.059) * 100).</p>
<p>It is this calculation of 94.1% that was reported by Moderna as the vaccine&#8217;s <strong><em><a href="https://www.cdc.gov/csels/dsepd/ss1978/lesson3/section6.html">efficacy rate</a></em>.</strong><a href="#_ftn2" name="_ftnref2">[2]</a></p>
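<p>The efficacy calculation above can be reproduced with the counts quoted in this post. This is a minimal sketch of the arithmetic only, not of Moderna&#8217;s actual statistical analysis; the variable names are illustrative.</p>

```python
# Counts quoted in the text: infected subjects and group totals
vaccinated_infected, vaccinated_total = 11, 14_134
placebo_infected, placebo_total = 185, 14_073

risk_vaccinated = vaccinated_infected / vaccinated_total
risk_placebo = placebo_infected / placebo_total

# Relative risk with the placebo group as the reference group
moderna_rr = risk_vaccinated / risk_placebo
moderna_efficacy_pct = (1 - moderna_rr) * 100

print(round(risk_vaccinated * 100, 2))  # 0.08
print(round(risk_placebo * 100, 2))     # 1.31
print(round(moderna_rr, 3))             # 0.059
print(round(moderna_efficacy_pct, 1))   # 94.1
```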
<h2>Vaccine effectiveness</h2>
<p>So, what about <em><strong>vaccine effectiveness</strong></em>? The term effectiveness refers to <strong>how the vaccine performs in the real world</strong>.  Efficacy refers to how the vaccine performs under “optimal” conditions of a clinical trial.</p>
<p>Clinical trials are based on a sample of subjects who may not be fully representative of the general population (e.g., not all <a href="https://www.verywellhealth.com/comorbidity-5081615"><strong>comorbidities</strong></a> are controlled for). In addition, the COVID strain that existed in the population during the clinical trial period may not be the same as the one circulating when the vaccine is released. Also, vaccine transportation, storage and delivery may differ from the more controlled environment of the clinical trial. Thus, the effectiveness of the vaccine may be different from what was found during the clinical trial.</p>
<h3>Studies on COVID vaccine effectiveness</h3>
<p>So, do we have any data yet on the real-world effectiveness of the COVID vaccines? It takes time to collect data, but <strong>we do have some indication that vaccine effectiveness is very high.</strong></p>
<p>An early <a href="https://www.nejm.org/doi/full/10.1056/NEJMoa2101765"><strong>study</strong></a> appeared February 24, 2021 in the New England Journal of Medicine. The study examined the Pfizer vaccine performance in Israel. The sample was matched data from over 1 million people, half of whom were vaccinated between December 2020 and February 2021 and half of whom were not. The results of the study suggest a <strong>symptomatic infection effectiveness rate of 94%</strong> 7+ days after the second dose.</p>
<p>A more recent <a href="https://www.cdc.gov/mmwr/volumes/70/wr/mm7013e3.htm"><strong>study</strong></a> released by the CDC on April 2 examined both the Pfizer and Moderna vaccines.  This study used US data from December 2020 to March 2021. The sample consisted of 3,950 health care personnel, first responders, and other front-line workers.  The study found that the <strong>vaccines were 90% effective against COVID infection</strong> 14+ days after the second dose. <strong>Even 14+ days after the <span style="text-decoration: underline;">first</span> dose the vaccines were 80% effective.</strong></p>
<p>As a point of comparison, according to the <a href="https://www.cdc.gov/flu/vaccines-work/vaccineeffect.htm"><strong>CDC</strong></a>, effectiveness of the annual flu vaccination ranges between 40 and 60%.<a href="#_ftn3" name="_ftnref3">[3]</a></p>
<p><strong>So, the effectiveness rate, after 2 doses of the Pfizer and Moderna vaccines, appears to be very close in magnitude to the efficacy rate.</strong></p>
<p>Very good news indeed!</p>
<p>Tomato, tomahto?</p>
<p>&nbsp;</p>
<p><a href="#_ftnref1" name="_ftn1">[1]</a> A relative risk ratio of 1.0 would mean no difference in effect between the treatment types.</p>
<p><a href="#_ftnref2" name="_ftn2">[2]</a> A summary of efficacy rates across the range of current COVID vaccines can be found <a href="http://www.healthdata.org/covid/covid-19-vaccine-efficacy-summary">here</a>.</p>
<p><a href="#_ftnref3" name="_ftn3">[3]</a> One reason for the range is that the flu strain that is in circulation can differ from what was predicted when the annual flu vaccine was developed earlier in the year.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1939</post-id>	</item>
	</channel>
</rss>
