Judging Methodology

Short of reproducing it, how can one judge the likely accuracy of a study?

Statistics won’t help. Statistics only tell you the odds that the results spring from sheer chance, not whether the original measurements were valid in the first place. You get the same statistics from the same data set whether the data represent colored ping-pong balls, car wrecks or the lengths of salamander penises.

About the only way to calibrate a study is to see whether it measures the same phenomenon that other studies have measured and gets the same answer. If the study’s methodology returns results consistent with other studies for one measurement, then we can be more confident that its other measurements are accurate.

The Johns Hopkins-funded study of Iraqi mortality before and after the war (published with much media attention in The Lancet) has many critics and defenders. Is there any means of judging the study’s likely accuracy without reproducing it?

I think there is.

The Johns Hopkins study replicated one measurement, pre-war Iraqi infant mortality, that was extensively studied by multiple sources long before the war, and indeed long before 9/11. We can compare the Johns Hopkins study’s measurement of pre-war Iraqi infant mortality with the measurements from those other studies, and that will give us at least a rough idea of the likely accuracy of its methodology.

So how does the JH study compare? Not too well, actually. The JH study (paragraph 4, page 8) reported a pre-war infant mortality rate of 29/1000. (That’s 29 deaths of children less than 1 year old per 1000 live births.) After the war, the study says, the rate jumped to 57/1000. Much of the study’s increased death toll not attributed to violence comes from this near-doubling of the infant mortality rate.

By comparison, a Unicef report published in 1999 showed that, in the 1984-1989 time frame, Iraqi infant mortality was 47/1000, and rose to 108/1000 for 1994-1999. Unicef’s last pre-war report in 2002 put the infant mortality rate at 102/1000.

A paper published in The Lancet in 2000 (pdf) reported an increase in infant mortality in the southern 85% of the country from 47/1000 to 108/1000 over the same period as the Unicef study. (In the northern autonomous zone, where Saddam did not rule, infant mortality fell from 64/1000 to 59/1000.)

So the Johns Hopkins study reports a pre-war infant mortality rate well under the 47/1000 level from the halcyon days of the ’80s, when there were no sanctions but there was an ongoing war. Worse, the study’s rate is just over a quarter of the pre-war infant mortality rates that the other studies found.

The Johns Hopkins study deviates so far from the other studies that its post-war infant mortality figure of 57/1000 represents a vast improvement over the other studies’ pre-war rates. With these numbers, one could argue that, by nearly halving the infant mortality rate, the war saved the lives of thousands or even tens of thousands of babies.
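
The arithmetic behind these comparisons is easy to check. A quick sketch in Python (rates per 1000 live births, as reported above):

```python
# Infant mortality rates per 1000 live births, as reported above.
jh_pre, jh_post = 29, 57      # Johns Hopkins study, pre- and post-war
unicef_94_99 = 108            # Unicef, 1994-1999
unicef_2002 = 102             # Unicef's last pre-war report
print(jh_pre / unicef_2002)   # ~0.28: just over a quarter of the 2002 rate
print(jh_pre / unicef_94_99)  # ~0.27
print(jh_post / unicef_2002)  # ~0.56: the post-war rate nearly halves it
```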

There are many possible reasons for this divergence, but political contamination of both sets of studies is the most likely answer. Prior to 9/11, Saddam sought to use infant mortality to undermine support for the sanctions regime. He carried out an orchestrated campaign both to falsify infant deaths and to deny care to areas and groups hostile to his regime. Unfortunately, there is evidence that the UN and human-rights groups played along with this manipulation. Even so, the true infant mortality rate was most likely in the 70-100/1000 range. It definitely wasn’t anywhere near 29/1000.

In the post-war Johns Hopkins study, the pre-war infant mortality rate was measured by the same method, household self-reports, that was used to measure deaths from violence. Since the purpose of the study was made very clear to those being interviewed, it would be easy for interviewees to lie about pre-war deaths, perhaps in hopes of undermining the war effort. Since the actual number of reported infant deaths unrelated to violence is very small (table 2, page 4), it would take only two or three unreported infant deaths to seriously skew the results.
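
To make that fragility concrete, here is a minimal sketch. The assumed counts, 8 pre-war infant deaths among roughly 275 recorded births, come from the discussion in the comments below rather than from the study text, so treat them as illustrative:

```python
# How a handful of unreported deaths moves a rate built on tiny counts.
# Assumed counts (8 deaths in 275 births) are taken from the comment thread.
births = 275
for deaths in (8, 10, 11, 13, 28):
    print(f"{deaths} deaths -> {1000 * deaths / births:.0f}/1000")
# 8 deaths reproduce the study's 29/1000; two or three more push the rate
# to 36-40/1000; 13 would match the 47/1000 baseline; 28 would match
# Unicef's 102/1000.
```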

In a nutshell, since the study’s methodology doesn’t come anywhere close to reproducing the infant mortality rates measured by other studies, under much better conditions, we can safely assume that its methodology doesn’t accurately measure other deaths either.

23 thoughts on “Judging Methodology”

  1. Shannon,

    You are the biggest loser ever to be born in Chicago. After Crooked Timber has demolished your stupid critique of methodology, you still have the fake bravado to actually defend your foolishness. You guys suck big time.

    Love,

    Milton

  2. I don’t think salamanders have penises. And is the plural for penis penii?

    The biggest loser ever to be born in Chicago is me. So get your facts straight.

  3. Shannon,

    You may wish to perform a follow-up analysis dealing with Milton’s methodology in calculating “biggest loser”. I was unable to ascertain whether “biggest” was a quantitative, qualitative or metaphysical measure. Perhaps you could help Milton develop his measures? “Loser”, of course, has innumerable definitions, none of which was determinable from Milton’s overly succinct analysis.

    Milton, alas, seems to be a representative product of a government education – he “feels” his way to his mathematical verities.

    Perhaps he will accept tutoring?

  4. You might also address Milton’s assumption that you were born in Chicago, which I find rather questionable given that

    1) the “Chicago” part of “Chicagoboyz” refers to the University of Chicago, not all of whose alumni were in fact born in that city, and

    2) the Chicagoboyz website explicitly states that not all Chicagoboyz went to U of C.

    Perhaps cluster sampling was involved.

  5. Pingback: CounterPundit
  6. It doesn’t apply in this case, but isn’t there a statistical analysis of variance that is used as an indicator that data have been smoothed? I thought a drug company and a couple of AIDS NGOs had been caught by it, and that it’s part of the reason even scientists are pushing the journals to make clinical trial data available on the web for others to analyze. (One such check is sketched below.)

    Thanks,

    Matya no baka
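
    One such check is a simple dispersion test: counts produced by a genuine counting process have variance comparable to their mean, while smoothed or fabricated data are often under-dispersed. A minimal sketch with illustrative numbers, not data from any study discussed here:

    ```python
    # Dispersion check: under a Poisson model the variance of counts should
    # be close to their mean; data that are "too regular" are suspicious.
    counts = [12, 13, 12, 13, 12, 13, 12, 13]  # suspiciously smooth
    mean = sum(counts) / len(counts)
    stat = sum((c - mean) ** 2 for c in counts) / mean  # ~chi-square, n-1 df
    print(f"dispersion statistic {stat:.2f}, expected about {len(counts) - 1}")
    # Here the statistic is 0.16 against an expectation near 7: far too
    # little variation for plausible raw counts.
    ```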

  7. In a nutshell, since the study’s methodology doesn’t come anywhere close to reproducing the infant mortality rates measured by other studies, under much better conditions, we can safely assume that its methodology doesn’t accurately measure other deaths either

    I cannot believe that someone claiming to have experience with statistics would make this argument. It’s ludicrous.

    In actual fact, if we accept your view that the 108 figure was politically biased and the real number was 47 per 1000 live births, then you would expect to get a result of exactly 8 infant deaths in a sample of 275 births 4.6% of the time by pure chance (it’s a binomial distribution). Given the small sample and even a small bias, that probability might be much higher, without affecting the overall result to anything like the degree you seem to want to affect it.
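
    That 4.6% figure is easy to reproduce; a quick sketch using the numbers in this comment:

    ```python
    from math import comb

    n, k = 275, 8   # births in the sample, observed infant deaths
    p = 47 / 1000   # hypothesized true pre-war rate
    prob = comb(n, k) * p**k * (1 - p)**(n - k)
    print(f"{prob:.3f}")  # ~0.046, i.e. about 4.6%
    ```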

    In any case, I will take your bald unsupported assertion “it definitely wasn’t anywhere near 29/1000” more seriously when I see Mesopotamian dust on your boots. You are now trying to claim that the entire cluster study methodology is unusable, simply in order to challenge a result you don’t like? For shame.

  8. I wish this blog used Trackbacks or had otherwise alerted readers to the rebuttal of critiques of Roberts et al. at the partisan-left intellectual blog Crooked Timber, found here as “Talking Rubbish About Epidemiology.”

    Concerning Shannon Love’s earlier critique and its 140 (as of this writing) follow-on comments, Daniel of C.T. wrote:

    The Chicago Boyz blog post is an excellent example of the “Devastating Critique”. Surprise surprise, estimating civilian casualties is a difficult business. That’s why the confidence interval is so wide. They don’t actually raise any principled reasons why the confidence interval ought to be wider than the one published, and therefore they aren’t raising any questions which would make us think that this confidence interval should include zero.

    The last sentence refers to an argument that Daniel earlier imputed to critics of the Roberts et al. study: that we fault its findings because it rules out zero civilian casualties during the invasion and its aftermath. There is a name for such arguments.

    Some of the many comments posted there are worth reading, though it’s heavy sailing.

  9. There’s a name for putting arguments in my mouth that I never made, as well.

    The argument I made was that the survey’s results are interesting despite the wide confidence interval, because that confidence interval is not wide enough to include zero. Why you lot don’t believe that is your own affair, though I hope nobody thinks I am silly enough to believe you would be raising any of these quibbles if the survey had found 100,000 fewer deaths as a result of the invasion.

    Factually, none of Shannon’s arguments gives a reason for thinking that the confidence interval should be wider than the one reported; the objections raised to the cluster sampling methodology are all taken account of in the standard errors and design effects estimated.
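
    For readers unfamiliar with the jargon: the design effect measures how much cluster sampling inflates variance relative to a simple random sample of the same size. A standard back-of-envelope form (Kish’s approximation), with illustrative values rather than the study’s actual estimates:

    ```python
    # Design effect for cluster sampling: deff = 1 + (m - 1) * rho, where m
    # is the cluster size and rho the intra-cluster correlation.
    m = 30        # households per cluster, as described later in this thread
    rho = 0.05    # assumed intra-cluster correlation, NOT the study's estimate
    deff = 1 + (m - 1) * rho
    print(deff)   # 2.45: effective sample size = raw count / deff
    ```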

  10. I haven’t looked at the methodology of the report (I have a bias against registering to view reports in the public domain). I have seen elsewhere that the sampling error is rather large, which in turn suggests certain features of the design. The first is that few first-stage sampling units were selected, perhaps only two, and that there was no stratification employed (for example, they might have taken clusters from Sunni areas in stratum 1, more from Kurdish areas in stratum 2, and so on).

    In this case, it might just be that the base numbers are correct, but the survey team was unlucky and got an outlier cluster in their sample. Alternatively, they may not have performed a careful outlier review, which would have down-weighted the results from the suspect cluster, and so they ended up with these numbers.

    In short, I would rather suggest a naive design than impute political motives.

    Green Bear

  11. There were 33 clusters of 30 households each (minus Fallujah), selected by randomly assigning clusters to governorates, then picking cluster locations at random within the governorate from a GPS grid (this didn’t happen in Sadr City or Fallujah, where it was unsafe to carry a GPS).

    You can get a user ID and password from http://www.bugmenot.com if you want to see the report without registering.
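
    A minimal sketch of that first stage, allocating 33 clusters to governorates with probability proportional to population (the populations below are illustrative placeholders, not the study’s figures):

    ```python
    import random

    # First stage: assign 33 clusters to governorates with probability
    # proportional to population. Placeholder populations, in millions.
    populations = {"Baghdad": 6.5, "Ninawa": 2.6, "Basra": 1.8, "others": 13.5}
    assignments = random.choices(list(populations),
                                 weights=list(populations.values()), k=33)
    for gov in populations:
        print(gov, assignments.count(gov))
    # The second stage (a random GPS grid point within each governorate,
    # then the 30 nearest households) is not sketched here.
    ```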

  12. To his credit, Milton probably knows what he’s talking about when it comes to the bigger losers.

    I actually think Kerry will make it, 52 to 49.

    And yes, I have some crow on hand in the fridge.

  13. dsquared,

    I can safely question the study’s infant mortality figure not because I have “Mesopotamian dust” on my boots but because the study’s estimate of pre-war infant mortality diverges strongly from ALL the other pre-war studies of infant mortality. That does not mean, in and of itself, that the study is inaccurate, but it is a fact that requires explanation. That the study’s authors did not acknowledge this problem (even though their own citations revealed it) nor attempt to explain it, and that the peer reviewers and editors of The Lancet did not do so either, should raise red flags for everybody interested in good science.

    The pre-war infant mortality rate of 29/1000 is the same rate as Jordan’s in the same period. Jordan does not have a police state or UN sanctions, and has not had a war or major civil disturbance in the last 25 years. Just as a gut hunch, does it seem plausible that the Iraq of 2002 had the same infant mortality rate?

    If it did, that means that the Unicef and other studies were massively inaccurate. But if we had to guess which studies were inaccurate, would you pick the Unicef study, with its thousands of samples, conducted in a time of relative peace, or the Lancet study, with its 16 samples?

    You don’t need a PhD in statistics to see there is an anomaly here that requires an explanation, or at least notice.

  14. Since the estimate of excess deaths is based on the change in the death rate in the sample rather than the point estimate, this is much less of a problem than you imply (you have, of course, not bothered to mention this issue, and I draw my own conclusions about your scientific honesty from that).

    You also do not mention the specific problems with respect to Saddam-era estimates of child mortality, despite the fact that I know you are aware of them.

    Furthermore, you are not being completely honest when you claim that the authors of the survey did not mention the issue of infant deaths; page 6 of the report notes that there is a commonly known problem with respect to underreporting of neonatal deaths in similar surveys; you would never have known about this issue if the paper didn’t mention it. The authors actually use the phrase “we acknowledge the potential for recall bias to create an apparent increase in infant mortality”. In my view, this is an extremely generous concession for them to make; I don’t see why underreporting of infant mortality should necessarily affect the change in the reported death rate.

    And you are now claiming that the study had “16 samples”, when it actually had 33 clusters of 30 households each, covering 7500 individuals.

    All of these issues raise red flags, fireworks and Mighty Wurlitzer Organs playing “Sound Science Hacks Are Here Again” to me, with respect to your own credibility.

  15. dsquared,

    “…you claim that the authors of the survey did not mention the issue of infant death…”

    You misunderstood. My complaint is that the study’s authors (1) did not acknowledge that their study’s measurement of pre-war infant mortality differed significantly from all other published sources, and (2) did not seek to explain the difference. They did so even though their own citations list vastly different numbers.

    “And you are now claiming that the study had “16 samples”, when it actually had 33 clusters of 30 households each, covering 7500 individuals”

    Actually, I was referring to table 2 (p. 4) and its listing for “Neonatal and unexplained infant”, with 6 deaths listed pre-war and 10 listed afterward. I took this as the best data for the number of infant deaths reportedly counted. I could be wrong. In any case, the Unicef and other studies counted hundreds or thousands of individuals, whereas this study counted a maximum of 65 deaths.

    “Since the estimate of excess deaths is based on the change in the death rate in the sample rather than the point estimate, this is much less of a problem than you imply”

    Sorry, but I think you have that backwards. The estimate of deaths rests on an extrapolation of the difference between pre-war and post-war mortality rates. For example, if you substitute the Unicef 2002 rate of 102/1000 for the study’s 29/1000, the study shows that the war dramatically improved the infant mortality rate to 57/1000, saving the lives of thousands of children.

    The infant mortality anomaly doesn’t mean, in and of itself, that the study is in error, but it does strongly suggest it. The authors, reviewers and publisher should have acknowledged this, and in a different context I am sure they would have.

  16. (2) they did not seek to explain the difference

    Yes they did; I even gave you the words they used so I have no idea (other than the obvious one) why you continue to make this claim. To coin a phrase, in a different context, I’m sure you would not have.

    Your penultimate paragraph is horribly confused. If you substitute estimates from a different study into this one, you are ignoring the fact that this has been structured as a cohort study; it relates to changes in the death experience of this one sample as a result of an intervention. Of course you are going to get spurious results if you throw away the information that this is the same sample measured at two different times, which is why the authors didn’t do it.
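
    The cohort point can be put in a few lines; a sketch using the rates discussed above:

    ```python
    # A cohort comparison uses the change within one sample, measured the
    # same way at two times. Substituting an external baseline mixes two
    # different surveys and yields an artifact, not a finding.
    pre_rate, post_rate = 29, 57     # per 1000, both measured in this sample
    print(post_rate / pre_rate)      # ~2.0: the within-sample relative risk

    external_pre = 102               # Unicef 2002, a different survey entirely
    print(post_rate / external_pre)  # ~0.56: an apples-to-oranges "improvement"
    ```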

  17. dsquared wrote (6:02am), in reference to my (AMac’s) earlier (5:22am) post:

    >There’s a name for putting arguments in my mouth that I never made, as well.

    I see that you and “Daniel” of Crooked Timber are the same person. You ended your C.T. criticism of Love’s post with:

    They don’t actually raise any principled reasons why the confidence interval ought to be wider than the one published, and therefore they aren’t raising any questions which would make us think that this confidence interval should include zero.

    As best I could tell, you were saying that any valid critique of the confidence interval must “make us think that this confidence interval should include zero,” i.e. no excess civilian casualties due to the invasion. (On writing this, I have inserted the word “excess” in the preceding sentence.)

    I don’t see that any of the numerous criticisms by Love invoke a claim that “the confidence interval should include zero,” however “zero” is defined. I thus made reference to this sentence as a straw-man. If, indeed, this claim was made, it was not particularly important to Love’s point, or to the points made by those of us who expressed skepticism of Roberts et al. in her comments.

    Your response above is:

    >There’s a name for putting arguments in my mouth that I never made, as well.

    So you are implying that I am a liar.

    I recall your exchange with Steven Den Beste. Your conduct in that affair was not a model of civil discourse. Nor is your conduct here.

    Readers can use this link to your Crooked Timber post to see if they can discover what your “zero” comment meant. Or they can read your 6:02 am explanation, reposted below:

    The argument I made was that the survey’s results are interesting despite the wide confidence interval, because that confidence interval is not wide enough to include zero. Why you lot don’t believe that is your own affair, though I hope nobody thinks I am silly enough to believe you would be raising any of these quibbles if the survey had found 100,000 fewer deaths as a result of the invasion.

    Factually, none of Shannon’s arguments gives a reason for thinking that the confidence interval should be wider than the one reported; the objections raised to the cluster sampling methodology are all taken account of in the standard errors and design effects estimated.

    This lengthy response doesn’t make your original assertion any less of a straw man. Nor does readily imputing base motives to other commenters.

    To do so, you would need to provide a link to a “ChicagoBoyz” post that makes the “zero” point that you have presumed to rebut. Kudos if you can; further applause if you acknowledge that, contra your original Crooked Timber post, this is at most a minor part of the credible concerns about this study that have been discussed on this web-log.

  18. dsquared,

    Just to make myself clear, I said that:

    “(1) did not acknowledge that their study’s measurement of pre-war infant mortality differed significantly from all other published sources, and (2) did not seek to explain the difference.”

    In (2) I meant that they did not seek to explain the difference between the results of their study and the results of other studies of infant mortality for the same period.

    Here is the paragraph from p 6 where they discuss infant mortality:

    “It is possible that deaths were not reported, because families might wish to conceal the death or because neonatal deaths might go without mention. In other settings, under-reporting of neonatal and infant deaths in similar surveys has been documented. 18,19 In particular, the further back in time the infant death occurred, the less likely it was to be reported. The recall period of this survey, 2·7 years, was longer than most surveys of crude mortality. Thus, infant deaths from earlier periods might be under-reported, and recent infant deaths might be more readily reported, producing an apparent but spurious increase in infant mortality. We do not think that this is a major factor in this survey for two reasons. First, the preconflict infant mortality rate (29 deaths per 1000 livebirths) we recorded is similar to estimates from neighbouring countries. 20 Second, the January, 2002, to March, 2003, rate applied to the 366 births recorded in the interview households post-invasion would project 10·4 infant deaths, whereas we noted 21 to have happened. Of these, three were attributed to coalition bombings and three to births at home when security concerns prevented travel to hospital for delivery. Thus, most of the increase in infant mortality is plausibly linked to the conflict, although we acknowledge the potential for recall bias to create an apparent increase in infant mortality. “
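
    As an aside, the projection in that paragraph is easy to reproduce approximately (the paper used the exact January 2002 to March 2003 rate; the rounded 29/1000 is used here):

    ```python
    # The pre-invasion infant mortality rate applied to the 366 post-invasion
    # births recorded in the sample, versus the 21 deaths actually noted.
    births_post = 366
    rate_pre = 29 / 1000           # rounded; the paper's exact rate gives 10.4
    print(births_post * rate_pre)  # ~10.6 expected, versus 21 observed
    ```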

    The study’s authors list reasons why their measurements might be inaccurate, but at no time do they acknowledge that the result they actually got for pre-war infant mortality is strongly at odds with all other published studies. In fact, they pat themselves on the back for getting a rate the same as neighboring countries. (It’s exactly the same as Jordan’s rate.) Since we know for a fact that Saddam denied medical care, electricity and clean water to vast swaths of the Shia population, does it really seem credible that pre-war Iraq had an infant mortality rate on par with Jordan’s?

    The researchers, reviewers and publisher should have mentioned that the study’s pre-war rate was extreme. They did not. In fact, they sought to imply that the study’s rate was reasonable because it compared well to neighboring nations. Such lies of omission might be acceptable in other arenas of discourse, but they are not in the sciences.

  19. This is the Milloy-style “devastating critique” in full cry; invent a bogus standard, apply it only to arguments whose conclusions you dislike, and hope that uninformed onlookers will think that there’s something going on.

    The only basis you have for asserting that “the researchers, reviewers and publisher should have mentioned that the study’s pre-war rate was extreme” is your own say-so. The issue was discussed in significant detail, a fact which you should have made clear, but did not. They decided that the comparison of their 2003-4 estimate to a UNESCO number based on work in 2000, and a CPA number based on nothing very clear at all, was not a good comparison to make. Since one of the authors was Richard Garfield, who knows vastly more about Iraqi child mortality numbers than anyone else, I am inclined to accept their judgement. You apparently think otherwise, but you are able to express that view because they mentioned the issue in the first place.

    Your criticism is hollow. And you should be very wary of raising slurs like “lying by omission” and “not acceptable in the sciences”, because you would not look very good if held to these standards yourself.

  20. Thanks Shannon for your article. I am one reader who values your input and research on the numbers.

    Here’s my input. When I read the headline, approximately a week ago on my Yahoo! browser, that 100,000 Iraqis had been killed, my response was, “WHAT?”

    I have learned through the years that if you are either persistent or patient enough, the truth usually surfaces. Hence, I read the whole Yahoo column. Usually, reading leftist material (i.e., Yahoo, MSN, the New York Times, etc.), I have found that if I read “da hole thang” I can get the truth.

    You see I only have a High School Education which limits my thinking powers, but then again I have many years of experience which sometimes helps with my intellectual infirmity.

    My point is this; the more you read (persistence), the more you understand (truth). The less education that you have, the less you read, the more experience is required (patience), to get to the truth (read more).

    So here………If you stop reading at the headline “100,000 Iraqis killed” you can make the assumption that “Bush is a madman and he just loves killing people” and that you should vote for Kerry. I would assume that people in this category only have a 3rd grade education.

    If you continue reading, as most people of substance would do, you would realize that an esteemed professor from a highly regarded institution did in fact pull the data together, and you would like to be associated with that level of intelligence……you probably have a high school education, but would like to be linked to the elitists on the left—and vote for Kerry.

    Now what separates me from the other high school grads is this; I believe for every 10 years you live past the age of 20, it is equivalent to 1 year of college. So if you live to be 60 and have a high school education, it is the same as a 4 year college degree. I have a B.A. in B.S. This has allowed me to pursue and recognize the truth, and has given me the ability to wade through the sludge of political waste.

    One disturbing item I have not resolved is the problematic educational level of some of those who oppose your critique of the BS. I can only assume that some of your detractors hold doctorate degrees, because they use words like “penultimate paragraph”, yet their arguments lead me to conclude they didn’t read “da hole thang”. Or could it mean he only has a 3rd grade education and is in fact 190 years old, having obtained the Doctorate Equivalency Degree? If in fact he holds the DED, it could be concluded he suffers from a physical impairment versus the assumed intellectual impairment. Sad, but he couldn’t read “da hole thang” to get to the truth. Shame, he probably voted for sKerry.

    I’m truly thankful for the opportunity I’ve had to obtain my BA in BS, if for no other reason than my years of service. I also would like to take this opportunity to thank persons like yourself who continue to literally “flush” out the BS.

    Not to be confused with “William Hung”; I have never had writing lessons!
