Number Gut

When I was in college, one of my professors used to complain that too many of his students had no “number gut.”

A number gut is an intuitive feel for the possible magnitude of a particular number that describes a particular phenomenon. A good number gut tells you if the results of some calculation are at least in the ballpark. People develop number guts through experience with particular phenomena, but they also develop them just by doing a lot of math by hand. When you do math by hand, you have to do more physical writing to deal with very large numbers, so you develop a kind of visceral sense of scale. The coming of calculators, however, destroyed this physical relationship, leading many budding scientists to make gross errors of magnitude without realizing it.

The lack of a number gut destroys any sense of context for numbers that describe a phenomenon, leading people to casually accept as valid statements that a little double-checking would show to be just plain silly.

For example, a news story published back in the late 80s reported that the state of New Jersey produced 50 billion used tires every year, which caused a huge environmental problem. The story got widely disseminated before somebody pointed out that since New Jersey had a population of only around 8 million, 50 billion tires a year came out to 6,250 tires per capita per year. The story got play because the editors had no intuitive feel for the significance of a roughly 4 orders of magnitude difference between the size of the population and the tire consumption.
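The double-check that nobody ran takes a few lines. Here is a minimal sketch in Python using only the two figures from the story as quoted above; the variable names are mine, chosen for illustration.

```python
import math

# Figures as reported in the story (both approximate).
tires_per_year = 50e9   # claimed used tires produced per year
population = 8e6        # approximate New Jersey population

tires_per_capita = tires_per_year / population
# log10 of the ratio tells us how many orders of magnitude separate the two.
orders_of_magnitude = math.log10(tires_per_capita)

print(f"{tires_per_capita:,.0f} tires per person per year")
print(f"about {orders_of_magnitude:.1f} orders of magnitude above the population")
```

Anything that prints thousands of tires per person per year should set off the alarm without further analysis.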

Which brings me to the subject of the Lancet Iraqi Mortality Survey (LIMS) [free reg].

A lot of people who would know better in another context seem perfectly willing to swallow the estimate of 300,000+ dead that LIMS reports with the Falluja cluster included. Examined in detail, LIMS reports that of those 300,000, roughly 250,000 died from violence, and of those something like 220,000 died from Coalition airstrikes. The LIMS authors even suggest [p6 pg7] that this is likely an underestimate.

Anyone with a good number gut for such phenomena would immediately recognize such numbers as implausible.

Why couldn’t 250,000 be dead from violence? Well, the first clue is that the total population of Iraq is around 25 million, so 250,000 dead represents 1% of the entire population. That means if LIMS is accurate then 1 in every 100 Iraqis was killed in the war up to Sept 2004. So what? After all, it’s a war and lots of people die in wars, right? Well, not as many as most people think.

For example, during WWII the Japanese mainland suffered the most extensive aerial bombardment in history. Every major urban area save one (Kyoto) was burned to the ground. On March 10th, 1945 the great Tokyo fire raid burned down a third of the city and killed 100,000 people. Two major cities were nuked. Japan at the time had a population of 78 million, so 1% of the population would have been around 780,000. Now, what is your guess as to the number of Japanese killed on the Japanese mainland?

Did you guess around 500,000? Under 1%? Well, that is in fact the number (note: that’s only dead, not dead-and-wounded).
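To put the two cases side by side, here is a rough sketch in Python. It uses only the round figures quoted in this post (78 million and 500,000 for Japan, 25 million and 250,000 for Iraq under LIMS); the variable names are mine.

```python
# Round figures as quoted in this post; all are approximations.
japan_population = 78e6
japan_dead = 500e3
iraq_population = 25e6
iraq_dead_lims = 250e3   # violent deaths implied by LIMS with Falluja included

japan_rate = japan_dead / japan_population
iraq_rate = iraq_dead_lims / iraq_population

print(f"Japan, WWII mainland: {japan_rate:.2%} of the population")
print(f"Iraq, per LIMS:       {iraq_rate:.2%} of the population")
print(f"Iraq/Japan ratio:     {iraq_rate / japan_rate:.2f}")
```

On these inputs the LIMS figure implies a proportional death toll roughly one and a half times that of mainland Japan.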

So, with the Falluja cluster included, LIMS asks us to believe that Iraq has suffered a worse proportional aerial bombardment than did Japan during WWII. Common sense compels us to ask: does Iraq look like it suffered such a fate? Where are the mass graves? Where are the leveled cities? Where are the hundreds of thousands of walking wounded? Where are the millions of refugees that such intense fighting must have inevitably produced?

Worse still, given the known geographical areas where the fighting occurred, most of the deaths would have had to be concentrated in an area of 100 klicks or so from Baghdad, which would have meant an even higher percentage of the local population killed and the physical evidence even more obvious. (After the recent publication of the ILCS, it also means that the deaths would have to have been compressed in time as well. The ILCS reported only 24,000 war-related deaths up until May 25, 2004. For the LIMS to be true, the additional 200,000 deaths would have to have occurred between then and early Sept 2004, when LIMS was conducted. That comes out to roughly 2,000 deaths per day.)
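The per-day figure can be checked with a short sketch in Python. The ILCS cutoff date and the 200,000 figure come from the paragraph above; taking "early Sept" to mean September 1 is my assumption, and shifting it by a week or two does not change the order of magnitude.

```python
from datetime import date

ilcs_cutoff = date(2004, 5, 25)   # end of the period covered by the ILCS
lims_survey = date(2004, 9, 1)    # ASSUMPTION: "early Sept 2004" taken as Sept 1
additional_deaths = 200_000       # deaths LIMS would need beyond the ILCS count

days = (lims_survey - ilcs_cutoff).days
deaths_per_day = additional_deaths / days

print(f"{days} days, about {deaths_per_day:,.0f} deaths per day")
# prints "99 days, about 2,020 deaths per day"
```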

Even the most casual student of military history or, indeed, just a curious person with access to Google, should instantly know that the 250,000 figure would be far too high based on the direct observation of facts on the ground. You can’t kill that many people without leaving massive physical evidence. There is absolutely no precedent for killing that high a percentage of the population with air power (or even ground forces) but leaving so few clues that the information could only be teased out by an epidemiological study.

People are doing the same thing with LIMS that they did with the New Jersey tire story. They hear a number from an authority figure and then accept it without thinking about its real-world implications. Had similar results been offered in a less political context they would have been rejected immediately. People’s number gut or even just their basic BS detectors would have gone off when they heard the 300,000 number. That in turn would have led the more experienced to question the basic methodology of the study. Everyone should have wondered whether it was good practice to use the experiences of one little neighborhood of 30 houses as representative of the wider experience of everyone in Iraq or even of just one governorate. At the very least, it should have caused the complete removal of the data from the cluster that produced the nonsensical results.

That so many swallow such implausible numbers without any other supporting evidence (and indeed actively ignore opposing evidence) indicates how desperately they want the liberation of Iraq to be judged a failure. They are willing to corrupt our scientific institutions, and to provide powerful propaganda for the fascists, in their selfish pursuit of their own narrow political goals.

21 thoughts on “Number Gut”

  1. Bravo. A nice, clear statement of the obvious, for the many who need a map to find it.

    If our air attacks had done to Iraq, proportionally, what we did to Japan between March and August 1945, the whole world would have seen it very clearly on TV. There would be no quibbling about methodology in journal articles. Baghdad would be a field of craters. There would have been columns of trucks stacked high with thousands of charred civilian corpses, and swarms of back-hoes and bulldozers digging the pits to put them in. There would be tens of thousands with burns or other traumatic injuries, and hundreds of thousands de-housed and living in home-made dugouts in the rubble. What the US did in Iraq with aerial bombardment is simply not on the same scale of destruction, and that is obvious at a glance. Here is downtown Tokyo after the fire-raid. The standing buildings are gutted shells. That is what it looks like when you kill 1% of the population by means of air attack. It didn’t happen in Iraq. (Here is an eyewitness account of what it was like on the ground during the Tokyo raid.)

  2. Or the women beaten on Superbowl Sunday or kidnapped children or . . .
    Your observation about pencil calculations seems really good – we “feel” the numbers differently. Of course, it is also related to the sense there is no real truth so any truth is approximate.

  3. Nevermind. The muslim faithful are intensively trying to kill many hundreds of thousands of innocent Iraqis. It is the way. Death, you know.

  4. Ah, sweet estimation!

    ‘Number gut’ is why in the late 1980s my math teacher made us first use sliderules and then, later, calculators.

    I can certainly tell the difference with folks now who haven’t ever done lots of calculation by hand. Oh and of course before we were let anywhere near the sliderules, we had to first learn how to do everything totally by hand with log tables and interpolation (of several kinds!)

    THEN we could use sliderules (the one I have now is a nice circular one), then calculators, and then computer software.

    Did you know that it takes about 6 hours to calculate and plot a natal astrological chart using only a log table, an ephemeris, a pencil and a straightedge? Less than a second on even ‘slow’ computers, and the output is nicer, and the math is more accurate. All of which is now starting to get harnessed, to the tune of several percent productivity increase a year now that we have all these computers talking to each other.

    Oy, can you tell I’m a math major?

  5. I have found that trying to do calculations in my head is the best way to get a feel for their magnitude. Beyond that it’s just practice, practice, practice.

  6. Pingback: Pseudo-Polymath
  7. My best professor in college was an Astrophysics prof who spent some time on the valuable skill of order-of-magnitude estimation. (In many cases in astrophysics, estimation is about as likely to be accurate as more formal calculation) The skill has proven to be very useful since.

    One curious tool for such estimation:

    The number of seconds in a year is within about 1/2% of pi x 10^7.

  8. We used to be offered “Fermi Problems” of this sort: How many piano tuners in Chicago?

    Enrico Fermi was famous for being able to estimate the neutron capture probability of newly discovered isotopes based on the skimpiest of data. He estimated the mega-tonnage of the Trinity nuclear explosion by scattering bits of tissue paper in the air at the “flash” and measuring how far they were carried by the power of the later-arriving “bang”. And he demanded his students practice problems like piano-tuner estimations.

    You don’t know the population of Chicago, but you know it’s smaller than New York and you know THAT. You don’t know the percentage of families with pianos — but take a guess. Less than half, more than one percent, somewhere. You don’t know how often a piano has to be tuned — take another guess: less than once a week, surely? How many pianos can a piano tuner tune if the tuner turns tunes every day? Etc. The Fermi notion was that guesses tended to be just as likely high as low, and offset.

    Another principle of the Fermi problem was that the math should never be better than the data. Guess that the population of Chicago is 3 million and the average family size is 4: 3 million people divided by 4 people per family equals about 1 million families. If about 3 percent own pianos then 1 million times .03 is, call it, 36500 pianos. If one piano tuner tunes one a day for a year it would take him 100 years to tune them all. If a piano holds a tune for, what? two years, and all tuners stay busy, then there must be about 50 tuners in Chicago.

    Doing Fermi problems is like sit-ups — builds a strong gut.

  9. In the realm of the “number gut”, doesn’t the US military keep detailed records of the number of sorties they fly? We know how many bombs and missiles each aircraft returns with. Knowing this, we could figure how many people would have had to have been killed (based on the LIMS) per munition used (based on US military numbers).

    Would this ratio come close to believability?

    I wouldn’t know where to go to locate the military’s numbers on this, but we heard again and again during the active ground campaign how much ordnance was used, and how often our aircraft were coming back with almost full complements of missiles and bombs because of the lack of targets.

  10. I haven’t heard of any such gallery. It would be pretty funny, though. Tokyo was literally a scorch mark. Berlin, which had much more solidly built buildings, was smashed all to Hell. Baghdad from everything I’ve seen is largely intact, though with patches of severe damage here and there. Not downplaying how much Baghdad has gone through, but it is nothing like what happened to Berlin or Tokyo. Similarly good would be to compare pictures of Falluja with Dresden. I have seen some people make that comparison. Falluja is pretty banged up, but Dresden, the whole place was incinerated. No comparison whatsoever.

  11. My “number gut” for this was figuring how many My Lais per year the Lancet study implied and then asking if it would really be possible to hide that many “events”.

    The Lancet study is the equivalent of 500 My Lais over two years, or almost two every three days.

  12. I was recently reading Heinlein’s Expanded Universe collection, which included a couple of non-fiction articles he did about his visit to the Soviet Union. One of the things that he pointed out was a sheer “number gut” calculation— the actual vs. the reported population of the USSR and its major cities.

    Heinlein and his wife did their calculations based on observation— i.e. ‘we went to these cities, and saw and talked to this many people, and based on the physical size of the place, with the density we observed, the city has this many people.’ But the reported number of people was five to ten times that— and they knew that couldn’t be right.

    So Heinlein talked to a friend of his, a military strategist, and asked him the population of Moscow. The friend thought a minute and came up with a number very like Heinlein’s estimate. When asked, he said he’d envisioned the physical map of the place, and from a knowledge of infrastructure such as roads and sewers, determined that the city could withstand no more than the number he quoted before it started breaking down. Which only reinforced the idea that the atlases and other world information sources were using bogus information, given out by the USSR.

    (One can, of course, figure out many reasons for such a misleading set of figures.)

    Of course, now I have to wonder how they went about correcting those figures, and how much of a problem it was when the correction was made. (“The former Soviet Republic lost over half its population last year?”)

  13. I believe your professor’s ‘number gut’ was, in former times, called ‘a sense of proportion’. This was, in its turn, considered part of a ‘common sense’ referred to in classical sources as the virtue of ‘prudence’. Given the current fascination with extreme cases these faculties have, regrettably, gone somewhat out of fashion.

  14. Applying the “number gut” to your numbers gets entertaining. 100,000 dead in a single night in Tokyo, 62,000 from the bomb on Hiroshima, and 38,000 from Nagasaki gets us 200,000 in 3 days. That means the rest of the war, only about 200 civilian deaths per day when we bombed every major city save one. That doesn’t feel right in my gut. Perhaps your 500,000 is low?

  15. Actually reading the Lancet link, I don’t see any claims of 200,000 or 300,000 dead. The abstract summary clearly states 100,000 people. That number is within my Number Gut. That many dead all over the country, yeah, it’s plausible to me.

  16. The UN est. 25 k seems much closer than the 100 k of the Lancet. But a huge issue is determining whether the death was really due to the war or not; I can believe lots more Iraqis die NOT due to the war directly, but with some influence. Bad equipment at a hospital? Lack of medicine or doctors? Bad transportation/ ambulance? … a death that “might” have been prevented.

    Having a number gut is really important, but it’s also part of general range number “truth”. It’s interesting to look at the Japan-China flap over the Rape of Nanking. In the newer nationalist book, the number of casualties is “disputed”, and no range is given. 300 000? 500 000? 50 000? The fact that we will never have calculator 9 digit accuracy for war deaths means all such numbers are estimates. The 6 million/10 million Jews/all murdered in Nazi death camps is a similar issue — without 11 digit accuracy, it’s ‘only an estimate’.

    Most human truths about mass events are only estimates; even the 2500-3000 killed in the WTC. The media has failed to emphasize and explain this fact so that more folk understand.

    (When I read that the Lancet 100 k estimate included a range of between 8 000 and 200 000, it was clearly of little value to me. Just another stick for the Bush-haters.)

  17. Forwarded message
    From: Andy Barenberg
    Date: May 31, 2005 11:05 AM
    Subject: [lbo-talk] Lancet Study (was Basic Social Science)

    Wow – what a terrible review of that study – let’s fact check it:
    > By serendipity via link on MyDD, a Democratic Partyesque blog,
    > October 29, 2004
    > Bogus Lancet Study
    > Via The Command Post comes this study published in Lancet (free reg)
    > which purports that 100,000 Iraqi have died from violence, most of it
    > caused by Coalition air strikes, since the invasion of Iraq.

    Wrong – The study calculates excess deaths FROM ALL CAUSES postwar as
    opposed to prewar. The 95% Confidence Interval is wide, from 8,000
    to 194,000. Luckily we are starting to get larger sample studies with
    smaller CIs, such as Iraq Living Quality Study – which largely
    vindicate the lancet study.

    > to say, this study will become an article of faith in certain circles
    > but the study is obviously bogus on its face.
    > First, even without reading the study, alarm bells should go off. The
    > study purports to show civilian casualties 5 to 6 times higher than
    > any other reputable source. Most other sources put total combined
    > civilian and military deaths from all causes at between 15,000 to
    > 20,000. The Lancet study is a degree of magnitude higher. Why the
    > difference?

    What source is this? If it’s from Iraq body count (that has numbers in
    that range) then that would be a study of violent deaths – not deaths
    from all sources. The Iraq Living Quality Survey has recently been touted as
    disproving the Lancet study because it only shows 24,000 violent
    deaths from warfare. The Lancet study number of deaths from war
    violence is 33,000 – however the Lancet study covered 18 months after
    the invasion while the ILQS covered only one year after the war – the
    monthly average of war deaths is similar but slightly higher in the much
    larger ILQS study.

    > Moreover, just rough calculations should call the figure into doubt.
    > 100,000 deaths over roughly a year and a half equates to 183 deaths
    > per day. Seen anything like that on the news? With that many people
    > dying from air strikes every day we would expect to have at least one
    > or two incidents where several hundred or even thousands of people
    > died. Heard of anything like that? In fact, heard of any air strikes
    > at all where more than a couple of dozen people died total?

    Again – the Lancet study covers all causes of deaths not just war
    related violence – Comparing the Lancet study numbers to Iraq Body
    Count one can have a rough estimate that roughly one out of three
    deaths are not reported in the media – a not so unlikely proposition.

    > Where did this suspicious number come from? Bad methodology.
    From the leading figures in the field of epidemiology. For example, the
    Chronicle of Higher Education notes:

    “Les has used, and consistently uses, the best possible methodology,”
    says Bradley A. Woodruff, a medical epidemiologist at the U.S. Centers
    for Disease Control and Prevention.

    Indeed, the United Nations and the State Department have cited
    mortality numbers compiled by Mr. Roberts on previous conflicts as
    fact — and have acted on those results.

    > From the summary:
    > Mistake One:
    > “A cluster sample survey was undertaken throughout Iraq during September, 2004”
    > It is bad practice to use a cluster sample for a distribution known to
    > be highly asymmetrical. Since all sources agree that violence in Iraq
    > is highly geographically concentrated, this means a cluster sample has
    > a very high chance of exaggerating the number of deaths. If one or two
    > of your clusters just happen to fall in a contended area it will skew
    > everything. In fact, the study inadvertently suggests that this
    > happened when it points out later that:

    > “Violent deaths were widespread, reported in 15 of 33 clusters…”
    > In fact, this suggests that violent deaths were not “widespread” as 18
    > of the 33 clusters reported zero deaths. If 54% of the clusters had no
    > deaths then all the other deaths occurred in 46% of the clusters. If
    > the deaths in those clusters followed a standard distribution most of
    > the deaths would have occurred in less than 15% of the total clusters.
    > And bingo we see that:
    > “Two-thirds of all violent deaths were reported in one cluster in the
    > city of Falluja”
    Falluja was considered an outlier and was excluded when deriving the
    conclusions; had they included it, the final result would have been
    higher. The Economist on this issue:
    “The Fallujah data-point highlights how the variable distribution of
    deaths in a war can make it difficult to make estimates. But Scott
    Zeger, the head of the department of biostatistics at Johns Hopkins,
    who performed the statistical analysis in the study, points out that
    clustered sampling is the rule rather than the exception in
    public-health studies, and that the patterns of deaths caused by
    epidemics are also very variable by location.”

    > (They also used a secondary grouping system (page 2, paragraph 3) that
    > would cause further skewing.)
    > Mistake Two:
    > “33 clusters of 30 households each were interviewed about household
    > composition, births, and deaths since January, 2002.”
    > Self-reporting in third-world countries is notoriously unreliable. In
    > the guts of the paper (page 3, paragraph 2) they say they tried to get
    > death certificates for at least two deaths for each cluster but they
    > never say how many of the deaths, if any, they actually verified. It
    > is probable that many of the deaths, especially the oddly high number
    > of deaths of children by violence, never actually occurred.
    They state that they got 81% of the death certificates they asked for,
    those who didn’t have them had believable reasons why, and the
    interviewers were asked to judge if they thought the person was lying,
    which they didn’t find in any case.

    > So we have a sampling method that fails for diverse distributions, at
    > least one tremendously skewed cluster and unverified reports of
    > deaths.
    > Looking at the raw data they provide doesn’t inspire any confidence
    > whatsoever. Table 2 (page 4) shows the actual number of deaths
    > reported. The study recorded 142 post-invasion deaths total, with
    > 73 (51%) due to violence. Of those 73 deaths from violence, 52
    > occurred in Falluja. That means that all the other 21 deaths occurred
    > in one of the 14 clusters where somebody died, or 1.5 deaths per
    > cluster. Given what we know of the actual combat I am betting that
    > most of the deaths occurred in three or four clusters and the rest had
    > 1 death each. Given the low numbers of samples, one or two fabricated
    > reports of deaths could seriously warp the entire study.
    > At the very end of the paper (page 7, paragraph 1) they concede that:
    > “We suspect that a random sample of 33 Iraqi locations is likely to
    > encounter one or a couple of particularly devastated areas.
    > Nonetheless, since 52 of 73 (71%) violent deaths and 53 of 142 (37%)
    > deaths during the conflict occurred in one cluster, it is possible
    > that by extraordinary chance, the survey mortality estimate has been
    > skewed upward. ”

    How many times do we need to be reminded that Falluja was an excluded outlier!

    > Gee, you think? It’s almost as if military violence is not randomly
    > distributed across the population of Iraq but is instead intelligently
    > directed at specific areas, rendering a statistical extrapolation of
    > deaths totally useless.
    > In the next paragraph they admit:
    > “Removing half the increase in infant deaths and the Falluja data
    > still produces a 37% increase in estimated mortality.”

    Again! How many times do we need to be reminded that Falluja was an excluded outlier!

    In short, this review cherry-picks quotes to distort what the Lancet
    study said. Of course the wide confidence interval makes larger
    studies a necessity, but let’s not shoot the messenger.

    Andy Barenberg

    PS. More in-depth discussion of the Lancet study here:


    Michael Pugliese
