Would you agree that with a small sample, where the entities studied are relatively rare and skewed, the MOST LIKELY result is that those rare entities are underrepresented in the sample?

]]>Now this is purest rubbish. I never said they were. However, the Iraqi clusters, excluding Fallujah, are *not* a bad sample, and constantly saying that they were doesn’t make them one. They were a small sample relative to the population, but as luck would have it (or rather, unfortunately) the effect they were measuring was a very significant one indeed. In all the governorates except one, the death rate post-invasion was between one and a half and three times the pre-invasion death rate.

This means that, when due allowance is made for the size of the sample, including reducing its effective size to take account of clustering, the effect is still there.

If the overall effect was only about a 25% increase in the death rate, then I would have agreed with you that this sample was not really big enough to say with confidence that the effect had been picked up. However, large effects, present in nearly every cluster, are really quite an improbable outcome under the null hypothesis that the underlying death rate was unchanged.
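
To put a rough number on "quite an improbable outcome", here is a minimal sketch of a sign-test style calculation. It assumes that, if the underlying death rate were unchanged, each region would be equally likely to show a rise or a fall, and that regions are independent. The count of 11 regions is a hypothetical figure for illustration, not one taken from the study.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical illustration: 11 regions, all but one showing a rise.
n_regions = 11
p_value = prob_at_least(n_regions - 1, n_regions)
print(f"P(at least {n_regions - 1} of {n_regions} rise by chance) = {p_value:.4f}")
```

This only counts the direction of the change in each region. It ignores how large the rises were, so if anything it understates the evidence against the null.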

Here’s a reference for you on cluster sampling and design effects. Perhaps you’d now be able to explain to me precisely why you think that this methodology is not valid, or whether you believe (incorrectly) that it was not followed.

I think that what you’re trying to do by talking about a “bad sample” is to equivocate between a nonrandom sample (which of course could not be statistically corrected, but is not what the JHU team actually took) and a sample which is merely clustered (in which case the variance inflation method described produces valid estimates). Do be very sure that I’m not going to be fooled by this one.
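
For anyone following along, the variance inflation adjustment can be sketched with the standard Kish approximation, deff = 1 + (m - 1) * rho, where m is the cluster size and rho is the intra-cluster correlation. The sample size, cluster size and rho values below are assumptions chosen for illustration, not figures from the paper.

```python
from math import sqrt


def design_effect(cluster_size: float, icc: float) -> float:
    """Kish approximation: deff = 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n: int, cluster_size: float, icc: float) -> float:
    """Size of the simple random sample with the same variance as the cluster sample."""
    return n / design_effect(cluster_size, icc)


# Hypothetical survey: 900 households in clusters of 30.
n, m = 900, 30
for rho in (0.0, 0.05, 0.2, 1.0):
    deff = design_effect(m, rho)
    n_eff = effective_sample_size(n, m, rho)
    # Standard errors are inflated by sqrt(deff) relative to a simple random sample.
    print(f"rho={rho:.2f}  deff={deff:.2f}  n_eff={n_eff:.1f}  SE x{sqrt(deff):.2f}")
```

With rho = 0 the cluster sample is as good as a random sample of the same size; with rho = 1 it is worth only as much as the number of clusters. The real case lies somewhere in between, which is exactly what the correction accounts for.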

]]>Of course, I’m beginning to get the impression you don’t actually care what the right answer is …

]]>Think of it this way; the words “design effect” reappear frequently in the paper, and they are there for a reason.

]]>You keep assuming that a cluster sample is always equivalent to a random sample of the same size. That is true only if the distribution is random. You also assume that you can iterate the experiment to arrive at the correct value. In the case of the Iraq study, we can’t yet do so; we have just one snapshot.

Try this: I hand you a container with 5 purple balls and 5 yellow balls in it. I tell you that the container is 1 of 10 containers each containing 10 balls. That’s all you know. Now what can you tell me statistically about the distribution of the other balls?

If you assume that the distribution of balls is random you can make a good guess, but what if I tell you the distribution of balls in the containers is non-random?

The problem with cluster sampling is that you are grabbing balls not one at a time at random but in chunks of ten. Your deviation will always be higher because the theoretical minimum deviation is 10. Random samples (from a thousand) of a hundred balls could produce numbers like (1 white, 99 black), (43 white, 57 black), (72 white, 28 black) etc. Cluster samples from a non-random sample (each container all one color) would produce numbers like (0 white, 100 black), (30 white, 70 black), (50 white, 50 black), (100 white, zero black) etc.

Because the composition of balls within each container (cluster) is non-random you MAGNIFY YOUR VARIANCE.

Think of it this way: clustering effectively reduces your sample size. Selecting 10 containers, each either all white or all black, out of 100 containers (1000 balls total) is the statistical equivalent of selecting 10 balls from a population of 100. It is obvious that you’re far more likely to get an extreme value than if you chose 100 balls at random.
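
The urn example can be checked with a quick simulation. This is a rough sketch that assumes 100 containers of 10 balls, each container all one color, with half of the containers white; the 50/50 split, the seed and the number of trials are arbitrary choices for illustration.

```python
import random
import statistics


def simulate(trials: int = 2000, seed: int = 0) -> tuple[float, float]:
    """Compare the spread of the estimated white fraction under two sampling schemes."""
    rng = random.Random(seed)
    # 1 marks an all-white container, 0 an all-black one (assumed 50/50 split).
    containers = [1] * 50 + [0] * 50
    # Flatten into 1000 individual balls, 10 per container.
    balls = [color for color in containers for _ in range(10)]

    random_estimates, cluster_estimates = [], []
    for _ in range(trials):
        # Simple random sample: 100 individual balls out of 1000.
        random_estimates.append(sum(rng.sample(balls, 100)) / 100)
        # Cluster sample: 10 whole containers out of 100, also 100 balls.
        picked = rng.sample(containers, 10)
        cluster_estimates.append(sum(picked) * 10 / 100)
    return statistics.pstdev(random_estimates), statistics.pstdev(cluster_estimates)


sd_random, sd_cluster = simulate()
print(f"sd of estimate, random sample of 100 balls: {sd_random:.3f}")
print(f"sd of estimate, 10 containers of 10 balls:  {sd_cluster:.3f}")
```

In this extreme case, where every container is a single color, the cluster estimate should come out roughly three times as variable as the random one, which is what an effective sample size of 10 rather than 100 implies.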

]]>The error here is the statement that “in case B the deviation would be zero”. This would only be the case if you pulled out ten cases, all containing only black balls. If the true proportion of black balls was, say 70%, then the chance of this happening would be no more than 1 in 50.

]]>However, against this, there is a net movement out of the region “Upper South” and into “Lower South”, as the regions of Qadisiyah and Dhi Qar were paired. This adds one cluster to the Shiite province of Dhi Qar and means that Qadisiyah (which contains Samarra, which saw significant violence) is not sampled at all. Improvements in Shiite areas would be exaggerated, not minimised, by this effect.

(and I maintain that your ball/urn story is entirely a priori, while inspection of the data does not give grounds for believing that the sample is heterogeneous)

]]>