15 thoughts on “Five Thought-Provoking Statistics Problems”

  1. Good article.

    While fascinating, Abraham Wald’s memo is not an example of statistics; no statistics were required. It is a concrete example of a joke I heard as a child:

    “A plane crashes on the US-Canadian border. On which side were the survivors buried?”

    “A plane crashes in England with damage to its fuel tank. Where should armor be increased on the plane?”

  2. I think you actually do need statistics for the Wald problem. I haven’t read his detailed analysis, but it seems that the distribution of hits on the aircraft is important.

    Suppose the enemy flak and fighters were, for whatever reason, successful only at hitting the fuselage and the fuel system (which I assume means tanks, fuel lines, etc.), and suppose that these hits brought down a plane 10% and 30% of the time, respectively. Then, even though 70% or more of the planes receiving such hits made it home, you’d still want to armor those components (to the extent this could be done without harming aircraft performance too much).

    But if hitting the fuel system and the tail assembly were equally probable, and NONE of the planes that came home had hits on the tail assembly, that would be telling you something about the vulnerability of this part of the aircraft, and would lead you to conclude that it should be protected…if that were indeed possible.
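Comment 2's point can be checked with a small calculation. This is a sketch under loudly hypothetical assumptions: every plane takes exactly one hit, and the fuselage, fuel system, and tail are equally likely to be hit. The loss rates are the 10% and 30% from the comment, plus an assumed 100% for the tail to model the "no tail hits came home" case. The sketch computes what share of the hits counted on returning planes falls on each section.

```python
# Hypothetical inputs: one hit per plane, each section equally likely to be hit.
HIT_PROB = {"fuselage": 1 / 3, "fuel system": 1 / 3, "tail": 1 / 3}
# Chance that a hit on this section brings the plane down (tail = assumed 100%).
LOSS_RATE = {"fuselage": 0.10, "fuel system": 0.30, "tail": 1.00}


def returning_hit_shares(hit_prob, loss_rate):
    """Share of hits per section as counted on the planes that made it home."""
    # Probability a plane is hit in section s AND still returns.
    survived = {s: hit_prob[s] * (1 - loss_rate[s]) for s in hit_prob}
    total = sum(survived.values())
    return {s: p / total for s, p in survived.items()}


if __name__ == "__main__":
    for section, share in returning_hit_shares(HIT_PROB, LOSS_RATE).items():
        print(f"{section:12} {share:.1%}")
```

Under these assumptions the tail never shows up among the returning planes, even though it is hit as often as anything else. The fuselage also looks more "popular" than the fuel system only because fuselage hits are survived more often. This is the selection effect behind Wald's memo.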

  3. The Wald problem sure got me. Then, reading the solution, I slapped myself on the head: it was obvious, but outside the box. Of course it would depend on the plane. A plane like a Mustang, with a liquid-cooled V12, could be brought down by a hit to the cooling system or an oil line.

    A lot of P-47s, with their radial engines, came back with one cylinder shot off.

  4. This is used all the time in medical diagnostics. Bayes’ theorem is the basis of most diagnostic tests. Many practicing doctors don’t understand it, and that leads to wrong diagnoses or too many tests. Another important diagnostic concept is the ROC, or Receiver Operating Characteristic.

    The first can be done with a four-cell spreadsheet. The cells are true positives and true negatives, plus false positives and false negatives. The ROC does the same calculation in graphical form.

    In signal detection theory, a receiver operating characteristic (ROC), or simply ROC curve, is a graphical plot which illustrates the performance of a binary classifier system as its discrimination threshold is varied. It is created by plotting the fraction of true positives out of the positives (TPR = true positive rate) vs. the fraction of false positives out of the negatives (FPR = false positive rate), at various threshold settings. TPR is also known as sensitivity, and FPR is one minus the specificity or true negative rate.

    All diagnostic tests are based on these ideas. Ask a general surgeon what the probability is that a positive mammogram means a positive biopsy. Most will get it wrong.
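The four-cell spreadsheet from comment 4 answers the mammogram question directly. The numbers below are hypothetical and chosen only for illustration:

- 1% of screened women have cancer.
- The test catches 90% of cancers (sensitivity).
- The test clears 91% of healthy women (specificity).

The sketch fills in the four cells for 10,000 women. It then divides true positives by all positives, which is Bayes’ theorem in table form.

```python
def four_cells(population, prevalence, sensitivity, specificity):
    """Expected counts in each cell of the 2x2 diagnostic table."""
    sick = population * prevalence
    healthy = population - sick
    tp = sick * sensitivity        # sick and test positive
    tn = healthy * specificity     # healthy and test negative
    return {"TP": tp, "FN": sick - tp, "FP": healthy - tn, "TN": tn}


def positive_predictive_value(cells):
    """P(condition | positive test) = TP / (TP + FP)."""
    return cells["TP"] / (cells["TP"] + cells["FP"])


if __name__ == "__main__":
    # Hypothetical screening numbers, for illustration only.
    cells = four_cells(10_000, prevalence=0.01, sensitivity=0.90, specificity=0.91)
    print(cells)
    print(f"P(cancer | positive mammogram) = {positive_predictive_value(cells):.1%}")
```

With these made-up inputs there are about 90 true positives against about 891 false positives. A positive result therefore means cancer only about 9% of the time (90 / 981). The answer swings heavily with prevalence, which is what trips people up.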

  5. Mike K,

    In meteorology and ocean modeling we use ROC curves frequently, but there we call them Relative Operating Characteristics. We measure the accuracy of our weather models by looking at the hit rate (forecasting a weather event that then happens) vs. the false alarm rate (forecasting an event that doesn’t happen) for various probability thresholds.
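The threshold sweep described in comments 4 and 5 can be sketched in a few lines. The scores stand in for forecast probabilities (or a diagnostic test's output), and the labels record whether the event actually happened. Both are made-up illustration data. Calling an event whenever the score is at or above the threshold is an assumption of this sketch. Each point pairs the false positive (false alarm) rate, FP / negatives, with the true positive (hit) rate, TP / positives.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs, sweeping the threshold from strictest to most lenient."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    for threshold in sorted(set(scores), reverse=True):
        flagged = [y for s, y in zip(scores, labels) if s >= threshold]
        tp = sum(flagged)          # events correctly flagged (hits)
        fp = len(flagged) - tp     # non-events flagged (false alarms)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Hypothetical scores; label 1 = the event actually happened.
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
    labels = [1, 1, 0, 1, 0, 0, 1, 0]
    pts = roc_points(scores, labels)
    print(pts)
    print("AUC:", auc(pts))
```

For this made-up data the area under the curve comes out to 0.75. That equals the fraction of (event, non-event) pairs that the scores rank in the right order (12 of 16).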

  6. }}} “A plane crashes in England with damage to its fuel tank. Where should armor be increased on the plane?”

    Erm… depends on whether it crashed because it ran out of fuel due to leaks, don’t it? :-D

  7. Abraham Wald’s memo is an important illustration of selection bias. People forget that statistics don’t describe reality itself, merely measurements of reality taken by inherently imperfect means. Something that isn’t captured by the measurement tools, e.g. planes that never came back, simply won’t show up in the stats.

    Another good example is crimes prevented by the presence of a gun even though the gun was not used. The FBI doesn’t even track incidents in which guns were displayed to deter a crime, and it’s likely that the majority of those incidents are never reported. Even if it did, it wouldn’t capture the crimes that never occur because criminals are afraid the target is armed. Before cell phones and GPS, rural households should have been easy pickings for roving gangs of criminals, who could ambush a household in the night, commit their crimes, and then drive off into the darkness. That was tried in the late 1920s and early 1930s, but it proved short-lived because of the odds of being shot by a farm family and their impressive arsenal.

    Bayes’ theorem is a major reason why screening for terrorists at airports and other places is so hard. The number of actual terrorists moving about or moving to attack is probably under 100 on any given day, while there are several million airline passengers worldwide. Even if the detection system is 99.9999% accurate, the false positives will swamp the system.

  8. Can someone explain to me why (assuming an honest game) the goat/car game results would be better if one switched after being shown a door with a goat behind it? I can’t see anything but a one-in-two chance whether you switch or not.

  9. The Monty Hall problem is highly counterintuitive. The first time I encountered it, I actually wrote a little script to run it several thousand times to confirm it to myself.

    You have to think of the problem “backwards”: instead of calculating the odds that the door you picked is correct, concentrate on the odds that the prize lies behind all the other unchosen doors as a set.

    In the initial statement of the problem, the probability of winning by choosing one door A from three doors A,B,C is 1/3. However, the chance that car is behind either B or C is 1/2. People intuit that door A retains its 1 in 3 chance after door C is removed but they miss that door B also retains its initial 1/2 probability. Since 1/2 > 1/3 switching to door B usually pays off.

    I think this paradox arises because of a quirk in our information processing. It has a formal name, which escapes me for the moment, but the quirk is that we will make different decisions based on the sequence of the information presented to us and the emotional connection of that information.

    For example, give people a choice between two military plans for an army of 1,000 men. Say, “In plan A, 600 of the soldiers survive, and in plan B, 400 will be killed.” Asked to respond quickly, most people will choose plan A, presumably keying off the 600 survivors and its first place in the sequence.

    You see this effect exploited in attacks on pharmaceuticals. Trial lawyers and activists will say, “This drug harms 5% of the people who take it!” while the drug companies will say, “95% of the people who take this drug benefit from it with no harm.” Both are telling the truth, but individuals’ responses to each statement will be different. After hearing the former, people will believe the drug is dangerous; after hearing the latter, they won’t.

    Our brains are wired for speed first and accuracy second. That works most of the time, especially under primitive conditions, but it can really trip us up in more complex situations.
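The “little script” mentioned at the start of comment 9 isn’t shown, so the following is only a hypothetical reconstruction. It plays the standard three-door game many times, with the host always opening an unpicked door that hides a goat. It then reports how often staying and switching win.

```python
import random


def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d not in (pick, car)])
    if switch:
        # Move to the only door that is neither the original pick nor opened.
        pick = next(d for d in doors if d not in (pick, opened))
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    rng = random.Random(seed)  # seeded so runs are reproducible
    return sum(play(switch, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print("stay:  ", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

Staying wins close to 1/3 of the time and switching close to 2/3.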

  10. Shannon: “In the initial statement of the problem, the probability of winning by choosing one door A from three doors A,B,C is 1/3. However, the chance that car is behind either B or C is 1/2. People intuit that door A retains its 1 in 3 chance after door C is removed but they miss that door B also retains its initial 1/2 probability. Since 1/2 > 1/3 switching to door B usually pays off. ”

    No, I don’t think so. The original probability (1/3) is immaterial. The problem lies in the probability after a goat is revealed. At that point both doors have a 1/2 chance (unless some psychological tendency is being played or preyed upon). My poppycock radar is going off!

  11. David, I followed the link to the wiki article. The vos Savant solution appears to treat the door that has been opened (and a goat revealed) as still being an option. That open door is not, sanely, an option. Clearly (in the article linked in the post), the goat door remains open and the contestant chooses between the two remaining doors.

    Perhaps Business Insider misstates the case?

  12. Call the door picked by the contestant Door A.

    The car is behind A, B, or C.

    But Monty Hall is not, by the rules of the game, going to open the door that the contestant has picked; therefore, he must open B or C.

    If a goat was actually behind A, then Monty MUST pick the single door that has the other goat…call it C. In this scenario, the probability of a car behind A is zero, and the probability of a car behind B is one. So the value of “stay” under this scenario is 0, and the value of “switch” is 1.

    If the CAR was actually behind A, then Monty can pick either of the two remaining doors, because they both have goats. So the value of “stay” under this scenario is 1, and the value of “switch” is 0.

    BUT the probability that we are in the goat-behind-A scenario is 2/3 (two goats and only one car), so the chance of getting a car with a “stay” strategy is (2/3)*0 + (1/3)*1, which is 1/3. And the chance of getting a car with a “switch” strategy is (2/3)*1 + (1/3)*0, which is 2/3.

    I think that logic is right…it still feels counterintuitive, though.
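Comment 12’s scenario analysis can also be done exactly rather than by simulation. This sketch enumerates every (car position, door Monty opens) pair with exact fractions. It assumes the car is equally likely to be behind each door and that Monty chooses at random when both unpicked doors hide goats. The contestant’s pick is fixed as door A.

```python
from fractions import Fraction

DOORS = ("A", "B", "C")


def exact_win_probabilities(pick="A"):
    """Return (P(win by staying), P(win by switching)) as exact fractions."""
    stay = switch = Fraction(0)
    for car in DOORS:  # car equally likely behind each door
        # Monty may open any door that is neither the pick nor the car.
        openable = [d for d in DOORS if d not in (pick, car)]
        for opened in openable:  # chosen uniformly at random
            p = Fraction(1, len(DOORS)) * Fraction(1, len(openable))
            other = next(d for d in DOORS if d not in (pick, opened))
            if pick == car:
                stay += p
            if other == car:
                switch += p
    return stay, switch


if __name__ == "__main__":
    stay, switch = exact_win_probabilities()
    print("stay:", stay, "switch:", switch)  # prints: stay: 1/3 switch: 2/3
```

The car-behind-A branch splits its 1/3 across Monty’s two possible doors (1/6 each), and both halves favor staying. Each goat-behind-A branch contributes its full 1/3 to switching, which reproduces the 1/3 vs. 2/3 result above.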

  13. Tyouth,

    Shannon: “In the initial statement of the problem, the probability of winning by choosing one door A from three doors A,B,C is 1/3. However, the chance that car is behind either B or C is 1/2. People intuit that door A retains its 1 in 3 chance after door C is removed but they miss that door B also retains its initial 1/2 probability. Since 1/2 > 1/3 switching to door B usually pays off. ”

    No, I don’t think so. The original probability (1/3) is immaterial. The problem lies in the probability after a goat is revealed. At that point both doors have a 1/2 chance (unless some psychological tendency is being played or preyed upon). My poppycock radar is going off!

    Sorry, typo. I should have written:

    In the initial statement of the problem, the probability of winning by choosing one door A from three doors A,B,C is 1/3. However, the chance that the car is behind either B or C is 2/3.

    Suppose the door removed is C. If the door removed is always a non-winning door, then door B “inherits” the summed chances of winning of all the doors the contestant didn’t pick: 2/3 in the case of three doors, 9/10 in the case of 10 doors, and so on.
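Comment 13’s “inheritance” argument for more doors can be checked by simulation. This sketch assumes the generalized rules the comment implies: with n doors, after the contestant picks, the host opens every other door except one and never reveals the car. Switching then wins exactly when the first pick was wrong, so the rate should approach (n - 1)/n.

```python
import random


def switch_wins(n_doors, rng):
    """One round of the n-door game; True if switching gets the car."""
    car = rng.randrange(n_doors)
    pick = rng.randrange(n_doors)
    unpicked = [d for d in range(n_doors) if d != pick]
    # The host opens all unpicked doors but one and never reveals the car:
    # if the car is unpicked, its door is the one left closed;
    # otherwise he leaves a random goat door closed.
    left_closed = car if car != pick else rng.choice(unpicked)
    return left_closed == car


def switch_win_rate(n_doors, trials=100_000, seed=1):
    rng = random.Random(seed)  # seeded so runs are reproducible
    return sum(switch_wins(n_doors, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for n in (3, 10):
        print(n, "doors: switching wins", switch_win_rate(n))
```

For 3 doors the switching rate lands near 2/3, and for 10 doors near 9/10, matching the figures in the comment.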

  14. then door B “inherits” all the summed chances

    Dude, “inherits”?

    Fellows, the question is what he should do. Forget 3 doors. One door is completely eliminated. 2 doors, 1 car when the choice must be made. You two aren’t having me on, are you?
