Jim Miller, discussing customer-satisfaction surveys, highlights a common error of inference:
Consumer Reports does not seem to understand that all its surveys, not just those on cars, have a systematic problem; the respondents are self-selected, which often biases the results, as any good survey researcher can tell you.
So (following Jim’s example) if the Consumer Reports survey shows the Camry as more reliable than the Corvette, is this because the Camry is really more reliable or is it because people who buy Corvettes tend to drive them hard? The reliability data provided by Consumer Reports do not provide enough information to answer this question.
Similar cases of flawed inference abound and undoubtedly contribute to policy errors. For example, Congressmen often say that their constituents’ letters and phone calls to them indicate support for their position on this or that issue. But if few people contact a Congressman to object to his position on an issue, does that really mean that few constituents object? It might mean that most of the constituents who object assume, perhaps for good reason, that the Congressman will ignore their opinions, and therefore do not bother to contact him. (A friend of mine once telephoned the office of the late Sidney Yates, D-IL, to complain about Yates’s position on some issue. The staff person who took my friend’s call asked him if he was one of those people who listened to Rush Limbaugh, then hung up on him. I am guessing that my friend never called back and that Yates received few critical calls overall.)
There are many other examples of this phenomenon. In my experience, people who work in medicine often overestimate the risks of motorcycling and other risky activities. It’s easy to understand why they do this, since they see many people who have suffered terrible injuries while engaging in such activities. However, the population samples that they see are not representative, and therefore their conclusions are unreliable. (I assume that motorcyclists, skydivers et al. who have been involved in their activities for some time generally have a good idea of how much risk they face, and have decided that on balance the risk is worth taking. Problems come from outsiders who overestimate risks and/or underestimate benefits, and who think it’s their prerogative to calculate other people’s tradeoffs.)
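The selection effect described above can be made concrete with a short simulation. All the numbers here are invented for illustration: an emergency-room worker only ever meets the injured riders, so the injury rate in the sample they see tells them nothing about the rate in the riding population.

```python
import random

random.seed(42)

# Hypothetical population of 100,000 motorcyclists.
# The 2% injury rate is an invented number, purely for illustration.
TRUE_INJURY_RATE = 0.02

injured = [random.random() < TRUE_INJURY_RATE for _ in range(100_000)]

# An ER worker's "sample" consists only of riders who were injured.
seen_by_er = [r for r in injured if r]

population_rate = sum(injured) / len(injured)
er_sample_rate = sum(seen_by_er) / len(seen_by_er)  # always 1.0 by construction

print(f"injury rate in the population:  {population_rate:.3f}")
print(f"injury rate in the ER 'sample': {er_sample_rate:.3f}")
```

The second rate is 100% no matter what the true rate is, which is exactly why conditioning on the outcome makes the sample worthless for estimating overall risk.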
Public education about basic statistics may be the only way to reduce the negative consequences of such widespread errors of inference. There are already plenty of activists who seek to “educate” us and “raise our awareness” about various issues, many of them frivolous. It’s too bad that we do not have statistics activists as well.
UPDATE: See also David Foster’s post, How Not to Do Market Research.
10 thoughts on “Problems With Self-Selected Survey Data”
Related: See my post How Not to do Market Research.
You are of course correct about the respondents. However, I am more likely to trust those who respond and regularly go to a magazine like Consumers (middle class and educated) than a random sampling of the American public. In fact, respondents regularly cite Toyota and Honda as long-lasting and substantial cars that last for many miles. Used car lots, the owners I know, and resale values confirm the findings of the “selective” surveys done by that magazine.
I am more likely to trust those who respond and regularly go to a magazine like Consumers (middle class and educated) than a random sampling of the American public.
Nice bit of elitism. Your selection criteria bias your sample, because people drive different cars or buy other products based on their economic class. By your criteria, Consumer Reports will not be able to determine the reliability of a car that few middle-class people drive.
Self-selected reports usually overstate negatives. People will go out of their way to complain but if things work, they stay silent.
But of course I am an elitist! I have a number of degrees and see no reason to dumb down to appear populist. If you know the Consumers ratings, then you know that ALL cars are rated, from the least expensive to the most expensive…the reports come from those who drive low-end Fords as well as those with Lexus SUVs, hardly middle class exclusive. And if the “bias” is in my camp, what do I care? I am buying for me, not for the rest of the nation.
It is impossible to find a review in the automotive press that was not bought and paid for by the manufacturer whose product was reviewed. Even statistics such as interior measurements, curb weight, top speed, horsepower, 0-60, 0-100, stopping distance cannot be duplicated by a new owner measuring his new car. Every fact has been cooked. Every pig has been lipsticked.
In spite of all the truth in advertising, we expect dishonesty when it comes to buying a car and we get it. It is my personal experience that the Consumer Reports surveys give a more accurate view of a particular make/model than does the automotive press or all the brochures in all the showrooms in all the countries on this planet.
The self-selected surveys cannot represent the universe of respondents who may or may not own cars, nor the smaller universe of car owners, nor the still smaller universe of owners of a particular make/model. However, a self-selected survey can represent the views of those who actually participated, and by extension those who enjoy self-selecting because they like filling out these surveys.
It is statistically sound to compare the results from one self-selected sample for make/model X with a self-selected sample for make/model Y. An acceptable conclusion is that people who care enough about the car they own to fill out a questionnaire, buy a stamp, and put the envelope in the mailbox have rated make/model X differently from the way a similar group of make/model Y owners rate make/model Y.
Sample size is vital. Samples of fewer than 100 are useless.
Also remember that the auto manufacturers hire people to fill out and mail in bogus questionnaires. Consumer Reports has 50 years’ experience in weeding out these bad questionnaires (they usually have identical answers and usually arrive in the same 30-day period), but some slip through. A large sample of honest replies can wash out a few bogus ones.
Implicit in the reporting method used by CR is a strict significance test, which means that when they show a difference, it is statistically meaningful among people inclined to fill out questionnaires about a car they own.
If you are buying tires, a retailer called the TireRack publishes self-selected surveys of its customers, who rate the tires they’ve bought after using them. Again, here is a survey that is unpolluted by tire manufacturers, with big sample sizes for each make/model; a survey which is more reliable than articles in the automotive press or even Consumer Reports.
And if the “bias” is in my camp, what do I care?
Bias is not a social process but a distortion in our own internal thinking that, unchecked, can kill us by causing us to take foolish actions.
I have no problem with Consumer Reports. They probably provide the best source of information about cars and other products available. However, a lot of times “the best” is still crap.
No matter how hard you try, self-selected reports are fraught with error. When I worked at Apple Technical Support we received ratings based on self-selected survey cards. After a few years, someone thought to do a random survey. The random survey showed that the respondents were mostly two types of customers, together a small minority: (1) people who had a fantastic experience and (2) people who had an awful experience. The latter responded more often by a factor of (IIRC) five or more. People got promoted or demoted based partially on a flawed measuring system.
I suspect that Consumer Reports suffers the same distorting effect. They probably also have problems with sample size on low cost or older cars.
It’s just important to remember that methodology sets the outer bounds of accuracy.
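The distortion the Apple commenter describes is easy to reproduce in a toy simulation. All the rates below are invented (the five-to-one response gap is taken from the comment above, the rest is made up): even when 90% of customers are satisfied, differential response rates can make the unhappy look like a third of the feedback.

```python
import random

random.seed(0)

# Invented numbers: 90% of customers are satisfied, but unhappy
# customers mail back the survey card five times as often.
N = 100_000
SATISFIED_SHARE = 0.90
RESPONSE_RATE = {"satisfied": 0.02, "unhappy": 0.10}

responses = []
for _ in range(N):
    mood = "satisfied" if random.random() < SATISFIED_SHARE else "unhappy"
    if random.random() < RESPONSE_RATE[mood]:
        responses.append(mood)

unhappy_share_true = 1 - SATISFIED_SHARE
unhappy_share_survey = responses.count("unhappy") / len(responses)

print(f"unhappy in the population:   {unhappy_share_true:.1%}")
print(f"unhappy among respondents:   {unhappy_share_survey:.1%}")
```

With these made-up rates the unhappy share among respondents comes out around 36% against a true 10%, which is the "overstated negatives" effect in miniature.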
CR should publish its raw data.
I’ve had experience with Tire Rack and other self-selecting rating systems. They are all highly flawed, but many of the merchants’ systems are superior to that of CR, because they report the raw data — actual customer reviews. So you can tell at a glance how many responses they received, and you can read the unfiltered responses. (NewEgg and Amazon, for example, have good systems of this type, and Amazon’s commenting system even allows other reviewers to highlight reviews that seem to be fraudulent or otherwise grossly flawed.) With CR you have to accept their conclusions on trust, and knowing that they made a conscious decision to withhold data.
Not to make this into a thread about Consumer Reports, but CR does seem to have a lot of loyal readers who become offended if you suggest that CR is less than entirely objective.
I thought some readers might be interested in this piece, in the Saturday edition of the NY Times. It is all about the Consumers magazine (mentioned above).
I think you’ve missed the point. I agree that self-selected samples cannot project to the universe of all people. However, such a sample can answer the question of whether or not one make/model of car, tire, or alarm clock is perceived as different from another make/model; please note how carefully I worded that statement. And, given a sample size of 100+, it can show how they compare on specific attributes prelisted on the questionnaire. Questions like “What did you like? What did you dislike?” are useless for this purpose. Likes and dislikes are useful only to make sure the list of attributes is complete.
The data Apple collected can be used only to establish a trend for the group and not for individual evaluations.
In your personal case at Apple, the customer service person (CSP) can help the customer only if the customer can explain the problem and only if the problem can be solved. Some CSPs are simply more knowledgeable than others, and their co-workers often refer the toughest problems to them. This results in the best CSPs getting the worst ratings. Surgeons and hospitals have the same problem.
The solution is simple. The manager, personally, should randomly monitor phone calls and then review each call with the CSP, heaping praise where indicated and suggesting behavior changes where required.