Courtesy of Amac come letters just published in The Lancet concerning the Iraqi Mortality Survey.
Stephen Apfelroth, Department of Pathology, Albert Einstein College of Medicine, writes:
In their Article on mortality before and after the 2003 invasion of Iraq (Nov 20, p 1857),1 Les Roberts and colleagues use several questionable sampling techniques that should have been more thoroughly examined before publication.
Although sampling of 988 households randomly selected from a list of all households in a country would be routinely acceptable for a survey, this was far from the method actually used–a point basically lost in the news releases such a report inevitably engenders. The survey actually only included 33 randomised selections, with 30 households interviewed surrounding each selected cluster point. Again, this technique would be adequate for rough estimates of variables expected to be fairly homogeneous within a geographic region, such as political opinion or even natural mortality, but it is wholly inadequate for variables (such as violent death) that can be expected to show extreme local variation within each geographic region. In such a situation, multiple random sample points are required within each geographic region, not one per 739 000 individuals.
“In my opinion, such a flaw by itself is fatal, and should have precluded publication in a peer-reviewed journal.”
Glad I’m not the only nut job out here.
The rest of the letter concerns the details of sampling each individual cluster. I think he makes some minor points about the sampling technique itself, but I think those are largely peripheral issues unlikely to affect the major outlines of the study. I don’t think that Apfelroth has realized that the inclusion or exclusion of the Falluja cluster swamps every other effect in the study save the cluster methodology itself.
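To show what I mean by one cluster swamping everything else, here is a minimal leave-one-out sketch. The counts are invented for illustration and are not the study's data: 32 ordinary clusters plus one extreme cluster standing in for an outlier like Falluja. The function names are my own.

```python
def rate(counts):
    """Mean number of deaths per cluster."""
    return sum(counts) / len(counts)


def leave_one_out(counts):
    """Recompute the estimate with each cluster dropped in turn."""
    return [rate(counts[:i] + counts[i + 1:]) for i in range(len(counts))]


# Made-up counts (an assumption, not survey data):
# 32 ordinary clusters followed by one extreme cluster.
counts = [1, 0, 2, 1] * 8 + [50]

full = rate(counts)
loo = leave_one_out(counts)

# The cluster whose removal lowers the estimate the most.
most_influential = min(range(len(counts)), key=lambda i: loo[i])

print(f"all 33 clusters: {full:.2f} deaths per cluster")
print(f"without cluster {most_influential}: {loo[most_influential]:.2f} deaths per cluster")
```

With these invented numbers, dropping any ordinary cluster barely moves the estimate, while dropping the extreme one cuts it by more than half. An estimate that behaves this way rests on a single data point.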
Les Roberts replies:
“The ability of a 33-neighbourhood sample to portray adequately the mortality experience of an entire country has seemed problematic to many critics of our study in Iraq. However, most mortality surveys in war zones only contain 30 clusters, as do the recommended approaches of United Nations agencies and the US Agency for International Development.1”
Except no one has ever attempted to conduct such a study under wartime conditions. Moreover, except for the Kosovo attempt, nobody has tried to use cluster sampling to measure deaths by violence. The results of the Kosovo study have not been replicated by other means, so we have no means of assessing whether it succeeded or not. The general consensus right now seems to be that it did not (see below).
“Ample evidence suggests that 30 locations are reasonably adequate for measuring the level of malnutrition or immunisation coverage in an entire country.2”
If violence had the same distribution as malnutrition or immunization, we would have no disagreement. Malnutrition and immunization don’t clump together like incidents of violence do. I think this actually reveals Roberts’s main weakness as a researcher in this case. I don’t think he actually knows anything about warfare. He really seems to believe that the patterns produced by conscious acts of violence will be close to the patterns produced by malnutrition, disease or accident.
“Unfortunately, as Stephen Apfelroth rightly points out, our study and a similar one in Kosovo,3 suggest that in settings where most deaths are from bombing-type events, the standard 30-cluster approach might not produce a high level of precision in the death toll”
Thank you, although he is avoiding Apfelroth’s main criticism, i.e., that cluster sampling fails with highly heterogeneous (unevenly distributed) phenomena. He implies that the problem is restricted purely to bombing. In fact, cluster sampling will fail for any phenomenon with a highly heterogeneous distribution. Choosing cluster sampling to study the incidence of violence, which he knew beforehand to be highly geographically concentrated, was bad, bad design. It was so bad that it might invalidate the entire study.
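The heterogeneity point can be sketched with a toy simulation. Everything in it is an assumption made for illustration, not data from the study: a country of 1,000 neighbourhoods, two invented death-rate patterns with the same true average of 10 per 1,000, and simple random selection of 33 neighbourhoods standing in for the survey's cluster selection. Only the cluster count of 33 comes from the paper.

```python
import random
import statistics

N_NEIGHBOURHOODS = 1000   # assumed size of the toy country
N_CLUSTERS = 33           # cluster count taken from the study
TRIALS = 2000             # number of repeated surveys to simulate

# Two made-up countries with the same true average rate (10 deaths per 1,000).
# "even": every neighbourhood is close to the average.
even = [8.0, 12.0] * (N_NEIGHBOURHOODS // 2)
# "clumped": 2% of neighbourhoods are hot spots, the rest are nearly quiet.
clumped = [402.0] * 20 + [2.0] * 980


def survey_spread(rates, seed=0):
    """Simulate repeated 33-cluster surveys; return (true mean, std. dev. of estimates)."""
    rng = random.Random(seed)
    estimates = [
        statistics.fmean(rng.sample(rates, N_CLUSTERS)) for _ in range(TRIALS)
    ]
    return statistics.fmean(rates), statistics.pstdev(estimates)


def miss_fraction(rates, hot=100.0, seed=0):
    """Fraction of simulated surveys that sample no hot-spot neighbourhood at all."""
    rng = random.Random(seed)
    misses = sum(
        1 for _ in range(TRIALS)
        if max(rng.sample(rates, N_CLUSTERS)) < hot
    )
    return misses / TRIALS


even_mean, even_spread = survey_spread(even)
clumped_mean, clumped_spread = survey_spread(clumped)
missed = miss_fraction(clumped)

print(f"even:    true mean {even_mean:.1f}, spread of estimates {even_spread:.2f}")
print(f"clumped: true mean {clumped_mean:.1f}, spread of estimates {clumped_spread:.2f}")
print(f"surveys that hit no hot spot at all: {missed:.0%}")
```

With these invented numbers, a survey that misses every hot spot estimates 2 per 1,000, and a survey that hits even one estimates about 14 or more, so no single survey lands near the true 10. The same 33 clusters that work well for the evenly spread pattern give wildly scattered answers for the clumped one.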
“But the key public-health findings of this study are robust despite this imprecision.”
How can the key findings, all of which refer to deaths from violence, be considered robust if the sampling method is highly imprecise? Precision is important not only in determining the scale of the problem but in determining the causes of death and the profiles of the victims.
“These findings include: a higher death rate after the invasion;”
Which came as an utter shock to everyone, I am sure. I am going to go out on a limb here and say that when armies fight, the death rate always goes up.
“a 58-fold increase in death from violence, making it the main cause of death; and most violent deaths being caused by air-strikes from Coalition Forces.”
Only true if the Falluja outlier cluster is included. It’s nice to see Roberts up to his old tricks. Check out the next line.
“Whether the true death toll is 90 000 or 150 000…”
Or maybe 300,000 or higher! Golly, Roberts, why did you settle on those low numbers? Your actual study shows the death toll is much, much higher. Why downplay it? (I also like the way he makes 90,000 the implied floor for the estimate instead of being near the mainline.) Got to love that precision. It doesn’t matter if the actual number is 90,000 or 150,000. What’s 60,000 dead people give or take? What is important is who to blame.
“these three findings give ample guidance towards understanding what must happen to reduce civilian deaths.”
Putting aside the fact that the three findings are wholly dependent on the results from a single cluster and that the combined findings from the other 32 clusters are radically different, what exactly in the study could actually be used to reduce civilian deaths? The study just assigns blame for the deaths and then stops. How does that help in any practical way?
Let me recap this paragraph:
Roberts admits that cluster sampling does not measure violence well (at least in regard to airstrikes). Nevertheless, he then asserts that his results are “robust” even though they are grounded on an imprecise sampling methodology. He continues his tactic of switching back and forth from findings including the Falluja cluster to those not including the cluster without any indication he is doing so.
“As we noted in the paper, by storing the randomly picked point in a global positioning system (GPS) unit and visiting the nearest 30 households as defined by the GPS, there was little subjectivity in the choice of households”
I think Roberts is on firm ground in this case. Unlike Apfelroth, I don’t have a problem with this part of the methodology. I do think Roberts should have pointed out that all his main conclusions are in fact based on one single cluster in Falluja, defined using a different method and collected under highly adverse conditions[p6 pg6-7].
“We also stated in the paper that people had to have been sleeping under the same roof with a family for 2 months before their death to be considered a household death. This strict definition of household member may have prevented the recording of some deaths, particularly among former military members who did not live with any household in the weeks before their death, but it ensured that the type of overestimation that concerns Apfelroth did not occur.”
This two-month window means that men in the military and those living transiently, like insurgents and jihadists, would never be counted in the survey[p6 pg2]. It would skew the age and gender of the victims of violence away from adult males and toward women, children and the elderly. It might also skew the cause of death away from small arms and toward airstrikes. Basically, this attribute of the study means the study was designed to measure the deaths of individuals that occurred in or near their own homes. It is therefore hardly surprising that the study shows a large percentage of non-combatant deaths.
“Before publication, the article was critically reviewed by many leading authorities in statistics and public health and their suggestions were incorporated into the paper.”
Trust us, we have many unnamed authority figures backing us up. Honest, some even have lab coats and clipboards!
“The death toll estimated by our study is indeed imprecise…”
Nice of them to admit it. Funny, is it not, that this imprecision doesn’t actually affect the overall quality of the study? If this were a study of disease or drug safety, the question of whether 30,000, 90,000, 150,000 or 300,000+ people died would be considered rather relevant. (And just to repeat myself, the precision also affects the measurements of who died and how.)
“and those interested in international law and historical records should not be content with our study”
Neither should those interested in scientific integrity.
“In the interim, we feel this study, as well as the only other published sample survey we know of on the subject,5 point to violence from the Coalition Forces as the main cause of death and remind us that the number of Iraqi deaths is certainly many times higher than reported by passive surveillance methods or in press accounts”
“We declare that we have no conflict of interest.”
Beyond the political, that is.
To sum up: Apfelroth asserts that cluster sampling was a poor methodology to study a heterogeneous (unevenly clumped together) phenomenon like military violence (making the same point I made in my original critique of the study). It was such a poor choice as to fatally undermine the entire study. Roberts never refutes this point but merely pays it lip service, and plows on making statements based wholly on the Falluja cluster data.
I am more convinced than ever that Roberts is not being honest.