<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: The Madness of Methods</title>
	<atom:link href="http://chicagoboyz.net/archives/2555.html/feed" rel="self" type="application/rss+xml" />
	<link>http://chicagoboyz.net/archives/2555.html</link>
	<description>Some Chicago Boyz know each other from student days at the University of Chicago. Others are Chicago boys in spirit. The blog name is also intended as a good-humored gesture of admiration for distinguished Chicago boys including those pictured above.</description>
	<lastBuildDate>Thu, 24 May 2012 03:04:48 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.4</generator>
	<item>
		<title>By: Skip Smith</title>
		<link>http://chicagoboyz.net/archives/2555.html/comment-page-1#comment-8016</link>
		<dc:creator>Skip Smith</dc:creator>
		<pubDate>Thu, 04 Nov 2004 18:43:06 +0000</pubDate>
		<guid isPermaLink="false">http://www390.pair.com/chicagob/blog/002555.php#comment-8016</guid>
		<description>All replies are in the original thread on the subject.
</description>
		<content:encoded><![CDATA[<p>All replies are in the original thread on the subject.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: incog</title>
		<link>http://chicagoboyz.net/archives/2555.html/comment-page-1#comment-8015</link>
		<dc:creator>incog</dc:creator>
		<pubDate>Thu, 04 Nov 2004 10:31:38 +0000</pubDate>
		<guid isPermaLink="false">http://www390.pair.com/chicagob/blog/002555.php#comment-8015</guid>
		<description>Shannon,

would you agree that in the case of a small sample, and where the entities studied are relatively rare and skewed, then the MOST LIKELY result is that those rare entities are underrepresented in the sample?</description>
		<content:encoded><![CDATA[<p>Shannon,</p>
<p>would you agree that in the case of a small sample, and where the entities studied are relatively rare and skewed, then the MOST LIKELY result is that those rare entities are underrepresented in the sample?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dsquared</title>
		<link>http://chicagoboyz.net/archives/2555.html/comment-page-1#comment-8014</link>
		<dc:creator>dsquared</dc:creator>
		<pubDate>Thu, 04 Nov 2004 06:43:38 +0000</pubDate>
		<guid isPermaLink="false">http://www390.pair.com/chicagob/blog/002555.php#comment-8014</guid>
		<description>&lt;i&gt;As I&#039;ve said before, statistical estimators are not a magical black box that correct for bad samples.&lt;/i&gt;

Now this is purest rubbish.  I never said they were.  However, the Iraqi clusters, excluding Fallujah, are &lt;i&gt;not&lt;/i&gt; a bad sample, and constantly saying that they were doesn&#039;t make them one.  They were a small sample relative to the population, but as luck would have it (or rather, unfortunately) the effect they were measuring was a very significant one indeed.  In all the governorates except one, the death rate post invasion was between 150% and three times the pre-invasion death rate.

This means that, when due allowance is made for the size of the sample, including reducing its effective size to take account of clustering, the effect is still there.

If the overall effect was only about a 25% increase in the death rate, then I would have agreed with you that this sample was not really big enough to say with confidence that the effect had been picked up.  However, large effects, present in nearly every cluster, are really quite an improbable outcome under the null hypothesis that the underlying death rate was unchanged.

&lt;a href=&quot;http://www.mori.com/pubinfo/aiz/cluster-sampling-a-false-economy.shtml&quot; rel=&quot;nofollow&quot;&gt;Here&#039;s&lt;/a&gt; a reference for you on cluster sampling and design effects.  Perhaps you&#039;d now be able to explain to me precisely why you think that this methodology is not valid, or whether you believe (incorrectly) that it was not followed.

I think that what you&#039;re trying to do by talking about a &quot;bad sample&quot; is to equivocate between a nonrandom sample (which of course could not be statistically corrected, but is not what the JHU team actually took) and a sample which is merely clustered (in which case the variance inflation method described produces valid estimates).  Do be very sure that I&#039;m not going to be fooled by this one.</description>
		<content:encoded><![CDATA[<p><i>As I&#8217;ve said before, statistical estimators are not a magical black box that correct for bad samples.</i></p>
<p>Now this is purest rubbish.  I never said they were.  However, the Iraqi clusters, excluding Fallujah, are <i>not</i> a bad sample, and constantly saying that they were doesn&#8217;t make them one.  They were a small sample relative to the population, but as luck would have it (or rather, unfortunately) the effect they were measuring was a very significant one indeed.  In all the governorates except one, the death rate post invasion was between 150% and three times the pre-invasion death rate.</p>
<p>This means that, when due allowance is made for the size of the sample, including reducing its effective size to take account of clustering, the effect is still there.</p>
<p>If the overall effect was only about a 25% increase in the death rate, then I would have agreed with you that this sample was not really big enough to say with confidence that the effect had been picked up.  However, large effects, present in nearly every cluster, are really quite an improbable outcome under the null hypothesis that the underlying death rate was unchanged.</p>
<p><a href="http://www.mori.com/pubinfo/aiz/cluster-sampling-a-false-economy.shtml" rel="nofollow">Here&#8217;s</a> a reference for you on cluster sampling and design effects.  Perhaps you&#8217;d now be able to explain to me precisely why you think that this methodology is not valid, or whether you believe (incorrectly) that it was not followed.</p>
<p>I think that what you&#8217;re trying to do by talking about a &#8220;bad sample&#8221; is to equivocate between a nonrandom sample (which of course could not be statistically corrected, but is not what the JHU team actually took) and a sample which is merely clustered (in which case the variance inflation method described produces valid estimates).  Do be very sure that I&#8217;m not going to be fooled by this one.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Skip Smith</title>
		<link>http://chicagoboyz.net/archives/2555.html/comment-page-1#comment-8013</link>
		<dc:creator>Skip Smith</dc:creator>
		<pubDate>Thu, 04 Nov 2004 05:35:03 +0000</pubDate>
		<guid isPermaLink="false">http://www390.pair.com/chicagob/blog/002555.php#comment-8013</guid>
		<description>As I&#039;ve said before, statistical estimators are not a magical black box that correct for bad samples.  See the references on survey methodology I cite in the original thread on this subject if you want more details on why a cluster sample is likely to be misleading here.  

Of course, I&#039;m beginning to get the impression you don&#039;t actually care what the right answer is ...</description>
		<content:encoded><![CDATA[<p>As I&#8217;ve said before, statistical estimators are not a magical black box that correct for bad samples.  See the references on survey methodology I cite in the original thread on this subject if you want more details on why a cluster sample is likely to be misleading here.  </p>
<p>Of course, I&#8217;m beginning to get the impression you don&#8217;t actually care what the right answer is &#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dsquared</title>
		<link>http://chicagoboyz.net/archives/2555.html/comment-page-1#comment-8012</link>
		<dc:creator>dsquared</dc:creator>
		<pubDate>Wed, 03 Nov 2004 19:55:06 +0000</pubDate>
		<guid isPermaLink="false">http://www390.pair.com/chicagob/blog/002555.php#comment-8012</guid>
		<description>&lt;i&gt;Think of it this way: clustering effectively reduces your sample size&lt;/i&gt;

Think of it this way; the words &quot;design effect&quot; reappear frequently in the paper, and they are there for a reason.</description>
		<content:encoded><![CDATA[<p><i>Think of it this way: clustering effectively reduces your sample size</i></p>
<p>Think of it this way; the words &#8220;design effect&#8221; reappear frequently in the paper, and they are there for a reason.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shannon Love</title>
		<link>http://chicagoboyz.net/archives/2555.html/comment-page-1#comment-8011</link>
		<dc:creator>Shannon Love</dc:creator>
		<pubDate>Wed, 03 Nov 2004 17:20:37 +0000</pubDate>
		<guid isPermaLink="false">http://www390.pair.com/chicagob/blog/002555.php#comment-8011</guid>
		<description>dsquared, 

You keep assuming that a cluster sample is always equivalent to a random sample of the same size. That is true only if the distribution is random. You also assume that you can iterate the experiment to arrive at the correct value. In the case of the Iraq study, we can&#039;t yet do so; we just have one snapshot.

Try this: I hand you a container with 5 purple balls and 5 yellow balls in it. I tell you that the container is 1 of 10 containers each containing 10 balls. That&#039;s all you know. Now what can you tell me statistically about the distribution of the other balls?

If you assume that the distribution of balls is random you can make a good guess, but what if I tell you the distribution of balls in the containers is non-random?

The problem with cluster sampling is that you are grabbing balls not one at a time at random but in chunks of ten. Your deviation will always be higher because the theoretical minimum deviation is 10. Random samples (from a thousand) of a hundred balls could produce numbers like (1 white, 99 black), (43 white, 57 black), (72 white, 28 black) etc. Cluster samples from a non-random sample (each container all one color) would produce numbers like (0 white, 100 black), (30 white, 70 black), (50 white, 50 black), (100 white, zero black) etc.

Because the composition of balls within each container (cluster) is non-random you MAGNIFY YOUR VARIANCE.

Think of it this way: clustering effectively reduces your sample size. Selecting 10 containers, each either all white or all black, out of 100 containers (1000 balls total) is the statistical equivalent of selecting 10 balls from a population of 100. It&#039;s obvious that you&#039;re far more likely to get an extreme value than if you chose 100 balls from a random sample.</description>
		<content:encoded><![CDATA[<p>dsquared, </p>
<p>You keep assuming that a cluster sample is always equivalent to a random sample of the same size. That is true only if the distribution is random. You also assume that you can iterate the experiment to arrive at the correct value. In the case of the Iraq study, we can&#8217;t yet do so; we just have one snapshot. </p>
<p>Try this: I hand you a container with 5 purple balls and 5 yellow balls in it. I tell you that the container is 1 of 10 containers each containing 10 balls. That&#8217;s all you know. Now what can you tell me statistically about the distribution of the other balls?</p>
<p>If you assume that the distribution of balls is random you can make a good guess, but what if I tell you the distribution of balls in the containers is non-random? </p>
<p>The problem with cluster sampling is that you are grabbing balls not one at a time at random but in chunks of ten. Your deviation will always be higher because the theoretical minimum deviation is 10. Random samples (from a thousand) of a hundred balls could produce numbers like (1 white, 99 black), (43 white, 57 black), (72 white, 28 black) etc. Cluster samples from a non-random sample (each container all one color) would produce numbers like (0 white, 100 black), (30 white, 70 black), (50 white, 50 black), (100 white, zero black) etc. </p>
<p>Because the composition of balls within each container (cluster) is non-random you MAGNIFY YOUR VARIANCE.</p>
<p>Think of it this way: clustering effectively reduces your sample size. Selecting 10 containers, each either all white or all black, out of 100 containers (1000 balls total) is the statistical equivalent of selecting 10 balls from a population of 100. It&#8217;s obvious that you&#8217;re far more likely to get an extreme value than if you chose 100 balls from a random sample.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dsquared</title>
		<link>http://chicagoboyz.net/archives/2555.html/comment-page-1#comment-8010</link>
		<dc:creator>dsquared</dc:creator>
		<pubDate>Wed, 03 Nov 2004 15:26:15 +0000</pubDate>
		<guid isPermaLink="false">http://www390.pair.com/chicagob/blog/002555.php#comment-8010</guid>
		<description>And finally finally, this numerical example doesn&#039;t work.  If you have 1000 balls in containers of ten, then a lower bound on how unrepresentative your sample can be would be the case where the containers are either all black or all white.  But this would be just the same as having 100 balls and drawing ten from them.  And I don&#039;t think anyone would seriously argue that drawing ten balls from a hundred isn&#039;t a perfectly reasonable way to estimate the proportion of black and white balls, or that your measured standard errors would be utterly perverse.

The error here is the statement that &quot;in case B the deviation would be zero&quot;.  This would only be the case if you pulled out ten cases, all containing only black balls.  If the true proportion of black balls was, say, 70%, then the chance of this happening would be only about 1 in 40.</description>
		<content:encoded><![CDATA[<p>And finally finally, this numerical example doesn&#8217;t work.  If you have 1000 balls in containers of ten, then a lower bound on how unrepresentative your sample can be would be the case where the containers are either all black or all white.  But this would be just the same as having 100 balls and drawing ten from them.  And I don&#8217;t think anyone would seriously argue that drawing ten balls from a hundred isn&#8217;t a perfectly reasonable way to estimate the proportion of black and white balls, or that your measured standard errors would be utterly perverse.</p>
<p>The error here is the statement that &#8220;in case B the deviation would be zero&#8221;.  This would only be the case if you pulled out ten cases, all containing only black balls.  If the true proportion of black balls was, say, 70%, then the chance of this happening would be only about 1 in 40.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dsquared</title>
		<link>http://chicagoboyz.net/archives/2555.html/comment-page-1#comment-8009</link>
		<dc:creator>dsquared</dc:creator>
		<pubDate>Wed, 03 Nov 2004 14:59:06 +0000</pubDate>
		<guid isPermaLink="false">http://www390.pair.com/chicagob/blog/002555.php#comment-8009</guid>
		<description>And finally, I don&#039;t understand why you say that the Lancet figure is &quot;at least double every other estimate&quot;, when there are no other estimates.  The Iraq Body Count estimate is not an estimate of total casualties; it&#039;s a lower bound on the number of civilian casualties of acts of violence.  And to think that you are using the phrase &quot;lying by omission&quot; to describe the Lancet&#039;s discussion of infant mortality rates!  For shame!</description>
		<content:encoded><![CDATA[<p>And finally, I don&#8217;t understand why you say that the Lancet figure is &#8220;at least double every other estimate&#8221;, when there are no other estimates.  The Iraq Body Count estimate is not an estimate of total casualties; it&#8217;s a lower bound on the number of civilian casualties of acts of violence.  And to think that you are using the phrase &#8220;lying by omission&#8221; to describe the Lancet&#8217;s discussion of infant mortality rates!  For shame!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dsquared</title>
		<link>http://chicagoboyz.net/archives/2555.html/comment-page-1#comment-8008</link>
		<dc:creator>dsquared</dc:creator>
		<pubDate>Wed, 03 Nov 2004 14:54:20 +0000</pubDate>
		<guid isPermaLink="false">http://www390.pair.com/chicagob/blog/002555.php#comment-8008</guid>
		<description>Furthermore, you are wrong in your claim about what Table 1 shows.  You appear to be correct about clusters moving out of Kurdish areas into Sunni ones; the movement from &quot;North&quot; into &quot;Centre&quot; is a result of the pairing of Dehuk and Ninawa governorates; this would tend to oversample the violent province of Ninawa relative to Dehuk.  

However, against this, there is a net movement out of the region &quot;Upper South&quot; and into &quot;Lower South&quot; as the regions of Qadisiyah and Dhi Qar were paired.  This adds one cluster to the Shiite province of Dhi Qar and means that Qadisiyah (which contains Samarra, which saw significant violence) is not sampled at all.  Improvements in Shiite areas would be exaggerated, not minimised by this effect.

(and I maintain that your ball/urn story is entirely a priori, while inspection of the data does not give grounds for believing that the sample is heterogeneous)</description>
		<content:encoded><![CDATA[<p>Furthermore, you are wrong in your claim about what Table 1 shows.  You appear to be correct about clusters moving out of Kurdish areas into Sunni ones; the movement from &#8220;North&#8221; into &#8220;Centre&#8221; is a result of the pairing of Dehuk and Ninawa governorates; this would tend to oversample the violent province of Ninawa relative to Dehuk.  </p>
<p>However, against this, there is a net movement out of the region &#8220;Upper South&#8221; and into &#8220;Lower South&#8221; as the regions of Qadisiyah and Dhi Qar were paired.  This adds one cluster to the Shiite province of Dhi Qar and means that Qadisiyah (which contains Samarra, which saw significant violence) is not sampled at all.  Improvements in Shiite areas would be exaggerated, not minimised by this effect.</p>
<p>(and I maintain that your ball/urn story is entirely a priori, while inspection of the data does not give grounds for believing that the sample is heterogeneous)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dsquared</title>
		<link>http://chicagoboyz.net/archives/2555.html/comment-page-1#comment-8007</link>
		<dc:creator>dsquared</dc:creator>
		<pubDate>Wed, 03 Nov 2004 14:35:10 +0000</pubDate>
		<guid isPermaLink="false">http://www390.pair.com/chicagob/blog/002555.php#comment-8007</guid>
		<description>But surely this numerical example just goes to show that a clustered sample is far more likely to underestimate than to overestimate rare events like deaths?</description>
		<content:encoded><![CDATA[<p>But surely this numerical example just goes to show that a clustered sample is far more likely to underestimate than to overestimate rare events like deaths?</p>
]]></content:encoded>
	</item>
</channel>
</rss>

