The Fatal Difficulties of Medical Studies

Megan McArdle posts about the shallow and imprecise study done to justify the insane symbolic tradeoff between zero preservation of the ozone layer and dead asthma sufferers. One commenter so precisely explains the real-world challenges of creating a statistically valid study that I feel justified in reproducing the comment here in its entirety.


Megan said,
■Small sample: the smaller the sample, the harder it is to find an adverse effect. That’s why drugs like Vioxx made it to market: distinguishing problems from background statistical noise needed a lot of patients. I know more than one analyst who argues that medical studies are generally too small–because humans are so variable, they don’t reliably pick up any but the strongest effects.
Just curious – Since finding and examining enough patients is the biggest expense behind developing drugs, are you willing to increase the cost of future drugs by 20-30% to have a larger sample size? 40-50%?
I’m intimately involved in this process, and it’s unbelievable to watch these study protocols get designed….as an example:

Study X is for Chronic Obstructive Pulmonary Disease. It needs 2000 patients with moderate-to-severe COPD who will fail a POST-bronchodilator breathing test (i.e. even after they get a drug, they still can’t breathe at 70% of normal).
The patients can’t have any confounding major medical conditions.
Patients in this population tend to have received a lot of drugs which start to cause eye problems. But patients in this study can’t have any existing eye problems, so that they can study whether the investigational drug causes such problems.
The patients have to agree to several 12-hour-long study visits, and agree to come off all of their current medications for at least 4 weeks prior to receiving the study drug. Or placebo. Let’s not forget that they might be required to take a placebo.
They might have an exacerbation while they stop taking their medication, and the exacerbation might be enough to hospitalize them. But since they haven’t started taking the study drug yet, their hospitalization wouldn’t be covered by the study sponsor.
They won’t really get any compensation for their participation, because it can’t be sufficient to make them WANT to participate. That’d be unethical.
And they won’t be able to get the study drug after their participation ends, because it’s investigational.
So basically, you need to find people with a significant illness who are willing to put themselves through hardship, sacrifice, and possible severe health problems for the sake of altruism.
Do you have any idea how hard it is to find 2000 people fitting that profile? The time and money involved in simply IDENTIFYING those people?
To give you an example, I just spent about $1 million finding 50 such individuals. That doesn’t include conducting the actual research. That’s just creating awareness, pre-screening them, and connecting them with the doctor’s office.
And this is just one study….you need several such studies to get a drug approved. And many more studies before that to show the drug is worth taking into a 2000-person study.
My point is: People have no idea what it takes to conduct such research, and why drugs are so commensurately expensive. If you want less statistical noise, be willing to pay for drugs that are much more expensive.
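
To put rough numbers on the commenter's example, here is a back-of-the-envelope sketch in Python. The dollar and patient figures come from the comment above. The assumption that the cost per patient stays flat as recruitment scales from 50 to 2000 is mine, and real costs could easily run higher or lower.

```python
# Figures quoted in the comment above.
recruiting_spend = 1_000_000   # dollars spent just finding patients
patients_found = 50            # patients identified for that money
patients_needed = 2000         # enrollment target for "Study X"

cost_per_patient = recruiting_spend / patients_found

# Assumption (not from the comment): cost per patient stays constant
# as recruitment scales up to the full enrollment target.
projected_recruiting_cost = cost_per_patient * patients_needed

print(f"Cost per recruited patient: ${cost_per_patient:,.0f}")
print(f"Projected recruiting cost for {patients_needed} patients: "
      f"${projected_recruiting_cost:,.0f}")
```

Under that flat-cost assumption, recruitment alone works out to $20,000 per patient and $40 million for the full study, before any actual research is conducted.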

The ugly secret of modern medicine is that for many drugs and procedures we don’t have statistically significant population sizes to conduct truly valid studies of them. Neither do we have the ability to test for all of the possible drug combinations that real-world patients have to take. We’re guessing a lot more than we would like to admit in many cases. 

We get bombarded by conflicting medical information, in large part due to the practical limitations of medical studies. The statistical significance of many studies is dubious due to their population size even before you get into questions of individual variations of the real people in the study. For example, few studies actually track the race of individuals involved and just assume that all races respond to medications the same, even though we know that is often not true. Yet chopping up the already limited study population into even smaller groups based on race completely destroys any statistical significance the study might start with. 
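
Here is a minimal sketch of why slicing a study into subgroups hurts, using a standard normal-approximation power calculation for comparing two proportions. The numbers are hypothetical and not taken from any real trial: a 2% adverse-event rate on placebo versus 4% on the drug, 1000 patients per arm, and four equal-sized subgroups. The function name `power_two_proportions` is my own.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between the two observed rates.
    se = sqrt(p_control * (1 - p_control) / n_per_arm
              + p_treated * (1 - p_treated) / n_per_arm)
    effect = abs(p_treated - p_control)
    # Chance the observed difference clears the critical value
    # (ignores the tiny chance of a significant result in the wrong direction).
    return z.cdf(effect / se - z_crit)


# Hypothetical rates and sizes, chosen only for illustration.
full_study = power_two_proportions(0.02, 0.04, n_per_arm=1000)
one_subgroup = power_two_proportions(0.02, 0.04, n_per_arm=250)

print(f"Power with 1000 per arm: {full_study:.2f}")
print(f"Power with 250 per arm:  {one_subgroup:.2f}")
```

Under these assumed numbers, the full study would catch the effect roughly three times out of four, while each subgroup analysis would catch it only about one time in four.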

We need to reexamine how we conduct studies and how we process their information in medical and political decision making. For starters, we could start paying subjects to compensate them for the risk they take for the common good. After all, we pay soldiers. Why are people who put their lives on the line battling disease any less deserving of compensation than those who battle human evil? I doubt that the desire for financial gain will distort studies more than does the dearth of subjects we currently have. We also need an easy-to-understand system of grading studies for the media based on their statistical and methodological soundness. For example, we could simply grade studies as “A”, “B” or “C”, starting with large, long-term studies and going down to studies with a dozen subjects over the course of a week. That way, people could see at a glance how sound a study’s results are likely to be.

I’ll say it again. Having bad data is worse than no data at all. This is especially true when political symbolism drives the creation of the data. We can expect to see more and more of this as the Left tries to use a veneer of shallow science to justify policies they’ve already decided on. 

13 thoughts on “The Fatal Difficulties of Medical Studies”

  1. Having bad data is worse than no data at all.

    Well, leaving aside the pharmaceutical example and addressing this as an abstract epistemological concern, I think that this depends on how bad the data is. Isn’t it better to know that we don’t know something than to know something that isn’t true? That is, isn’t it better to know that we’re guessing about something than to confidently make decisions that are based on a faulty premise?

  2. John,

    That is, isn’t it better to know that we’re guessing about something than to confidently make decisions that are based on a faulty premise?

    Well, yes, that is why bad data is worse than no data. With no data, you think, “I don’t understand this very well, so I should be cautious.” With bad data, you think, “Okay, I understand this, so I don’t need to be cautious.”

    If you don’t have a map, you travel cautiously always getting your bearings. If you have an incorrect map, you follow the map blithely until something goes badly wrong.

    It’s like Will Rogers said, “It’s not what we don’t know that’s the problem, it’s the things we know that ain’t so.”

  3. I cannot understand the need for a placebo for testing drugs in a well researched field where at least twenty studies have already been done where a placebo has been tested. There is absolutely no problem in using placebo data from another study if the sample requirements for the current and the placebo test are the same.

    It is unethical to trick people into taking a placebo when the placebo results already exist from many, many other tests. Actually, it is cold-blooded murder which is committed only because it is required by the laws written by the FDA.

  4. If you don’t have a map, you travel cautiously always getting your bearings. If you have an incorrect map, you follow the map blithely until something goes badly wrong.

    Oh, yes. Duh. I’m not sure how upon rereading the post, but somehow I thought you were saying the opposite of this. Thanks for the clarification.

  5. Sol Vason,

    I cannot understand the need for a placebo for testing drugs in a well researched field where at least twenty studies have already been done where a placebo has been tested.

    That’s called an “historical” control and it is used in the cases of treatments for acute terminal illnesses. For example, if you want to try a new cancer treatment you don’t eliminate all other treatments and then put half the subjects on placebo.

    The problem with historical controls is that they introduce more variables and therefore reduce the sensitivity of the study. To compensate, you have to conduct a larger study and then you’re right back where you started data-wise.

  6. There are a couple of interesting studies, done years ago, that bear on the question of placebo. Doing randomized trials in surgery is extremely difficult because of the issue of placebo effect. Sham surgery is required and ethical issues usually ban such controls. However, one such trial was conducted when I was a medical student. It was in the days before coronary bypass (although I still wonder why no one thought of it until Favaloro) and surgery was uniformly unsuccessful although many operations, like the Vineberg procedure, were in vogue for a while. There was a report from Italy that ligating the internal mammary arteries would increase collateral circulation to the heart via the arteries that supply the pericardium. It was difficult to explain why this should work as the arteries involved are tiny but there was a lot of interest including articles in Reader’s Digest. Several surgery departments performed randomized trials because the procedure was a minor one and lent itself to a sham surgery control. Two small incisions were made on either side of the sternum. Anyway, the U of Washington study showed considerable improvement, including one patient whose EKG reverted to normal. Then they broke the code and the fellow whose EKG improved was in the sham group.

    That was a classic example of randomized trials in surgery but was duplicated in recent years with arthroscopy of the knee. There was a vogue for scoping arthritic knees and washing them out and snipping a few spots but with no major procedure. Many patients felt better. A study was done that included a sham surgery control series. A discussion is here.

    Placebo effect is powerful. It will heal 30% of ulcers. Eliminating it in studies of drugs that may add only a 10% benefit over present therapy can be a challenge.

  7. Some drugs with a popular off-label use may never have been subjected to a study.

    I take primidone and nadolol to control essential tremor. (Nadolol also serves as a migraine prophylactic.) Because tremor isn’t a medical problem that must be treated, only a damned nuisance, no one’s going to do an expensive and time-consuming study to find a safe, effective drug. Rather, what happens is that someone is being treated for epilepsy or high blood pressure, and happens to say to the doc, “hey, my hands don’t shake anymore”. And the docs apparently have a mechanism for comparing notes, so there we are. Works for me.

  8. Many of the basic drugs we use were the result of accidental observation. The drugs for type II diabetes were discovered when patients being treated for infection with sulfa drugs in France developed dizziness and sweating. They were found to have low blood sugar. This was eventually used in the development of the oral agents for diabetes. Likewise, the use of an anti-tuberculosis drug related to isoniazid was found to elevate mood in depressed patients. It led to the tricyclic antidepressant drugs. Lithium for bipolar disorder was found by accident when the researcher was testing guinea pigs for a theory of depression that was proven invalid. Still, the lithium improved depression. Lots of serendipity in medical research.

  9. Karl Gallagher,

    That is true of new vaccines for diseases which themselves have a low level of complications. It’s not true in the case of vaccines which have been in use for decades for diseases with high rates of serious complications.

  10. Michael Kennedy,

    Lots of serendipity in medical research.

    Related to the other post on medical records, detailed medical records could help uncover more unsuspected useful treatments.

  11. The biggest gap in vaccine research is looking at the cumulative impact of multiple vaccines. No one’s looking to see if 24 doses in 18 months is too much for an immature immune system. Likewise, grouping three vaccines into a single injection is being approved without seeing if the impact would be easier if they were spread over more time.

    There’s also the sample size problem. I’ve seen interesting speculation that vaccines may have worse impacts on children with a family history of autoimmune disorders. The existing studies aren’t large enough to find such effects, and there’s no funding for specific studies.

  12. I do not doubt that a placebo effect exists. It is the basis of faith healing. A charismatic doctor, shaman or evangelist can cure true believers by force of will alone. This is why the double blind protocols exist. My argument is that if you have 20 or more studies on exactly the same ailment, and if the effectiveness of the placebo remains the same across all 20 studies, and if the samples used in all 20 tests are the same, and if these “ifs” are confirmed at the 95% confidence level by the proper statistical tests, then one may assume that in the 21st test the placebo effect will be the same as it was in the previous 20 tests, provided the ailment, the sample and the placebo are the same.

    Remember, drugs are tested against very carefully defined ailments. The samples are carefully assembled to represent the universe of victims suffering from this ailment. The FDA has specific formulae for placebos. Therefore, these conditions can be met if the 20+ tests were themselves acceptable to the FDA. Obviously the methodologies should be compared and replicated in the current test.

    Under these conditions I do not think it is necessary to experimentally confirm in the 21st test that the placebo effect is unchanged from the previous 20 tests. It is reasonable to conclude that for this specific ailment, a specific placebo, a given sample composition, and a given test design and methodology, the placebo effect is known.

    After all, the placebo has been tested 20 times and has produced the same results 20 times. It will behave the same in test 21. Indeed, if the placebo effect in the 21st test is statistically different from the previous 20 tests, then this would indicate a fatal flaw in the design of the 21st test.
