Sometimes forgotten lessons get rediscovered. While writing up a comment on the breast cancer guidelines brouhaha, I dredged up what turned out to be an inappropriate analogy, but one that is useful elsewhere.
Remember Lynxgate? The allegation at the time (early 2000s) was that forest service employees falsely added lynx hairs to collection samples in order to get habitat declared protected when it should not have been. After investigation, a more complicated story emerged, one of false consensus and unauthorized controls/faked samples, along with a general finding that there was no conspiracy.
The 1998 Weaver survey, at the time considered reliable but since discredited, showed a much more extensive lynx habitat than the federal three-year survey was detecting. Independently, a couple of government employees decided to submit control samples of lynx hair, one obvious, the other less so, without going through the normal process for creating such controls, a process meant to ensure that control data would not get mixed in with the rest of the survey results. The intention, as reported to the investigators, was to ensure that lynx hair was not being misidentified as domestic cat hair (feral domestic cats do live in the woods sometimes).
The lesson that a false consensus can make scientists skip certain safeguard protocols got buried as the right found itself embarrassed and the left uninterested in any sort of blood sport against people on its side.
Fast forward to today’s Climategate. In the Harry Read Me file, about 40% of the way in, we find:
If an update station matches a ‘master’ station by WMO code, but the data is unpalatably
inconsistent, the operator is given three choices:
You have failed a match despite the WMO codes matching.
This must be resolved!! Please choose one:
1. Match them after all.
2. Leave the existing station alone, and discard the update.
3. Give existing station a false code, and make the update the new WMO station.
Enter 1,2 or 3:
You can’t imagine what this has cost me – to actually allow the operator to assign false
WMO codes!! But what else is there in such situations? Especially when dealing with a ‘Master’
database of dubious provenance (which, er, they all are and always will be).
False codes will be obtained by multiplying the legitimate code (5 digits) by 100, then adding
1 at a time until a number is found with no matches in the database. THIS IS NOT PERFECT but as
there is no central repository for WMO codes – especially made-up ones – we’ll have to chance
duplicating one that’s present in one of the other databases. In any case, anyone comparing WMO
codes between databases – something I’ve studiously avoided doing except for tmin/tmax where I
had to – will be treating the false codes with suspicion anyway. Hopefully.
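The false-code rule Harry describes is simple enough to sketch. What follows is my own minimal Python illustration, not the CRU code: the function name and the station numbers are made up, and I am assuming the search starts at the code times 100 itself, which the text does not spell out.

```python
def make_false_code(wmo_code: int, existing_codes: set) -> int:
    """Derive an unused made-up code from a legitimate 5-digit WMO code."""
    # Assumption: the search starts at wmo_code * 100 itself.
    candidate = wmo_code * 100
    # Add 1 at a time until nothing in THIS database matches.
    while candidate in existing_codes:
        candidate += 1
    return candidate


# Hypothetical station codes, for illustration only.
local_db = {12345, 1234500, 1234501}
other_db = {1234502}  # a second database that the check never sees

false_code = make_false_code(12345, local_db)
print(false_code)              # 1234502
print(false_code in other_db)  # True: the duplicate Harry says he has to chance
```

The uniqueness check only ever looks at one database, which is exactly the weakness Harry flags as "NOT PERFECT".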
One of the things that happened in Lynxgate was that the “obvious” control being sent in was not so obvious to the lab, which in other contexts had seen plenty of legitimate samples that were just as sloppy. The lab treated it as legitimate data.
So what happens if somebody randomly decides to give the CRU at the UEA a bit of control data with not-so-unusual but falsely high values? In two of the three choices (options 1 and 3) the control will be included with the rest of the data. In option 3, a false station would be added to the list of WMO stations and used going forward. This is part of the process of good databases going bad and bad ones not being corrected that Harry famously complained about just a little bit later in the same file.
Somebody will, if they haven’t already, claim that nobody would ever just submit false data, that this can all be explained away by climate station central offices not keeping up with new stations in the field. And that would sound plausible, unless you remember that obscure scandal that wasn’t, Lynxgate, where they did just that based on the mistaken conclusions of a soon-to-be-discredited study.
But this isn’t the only past scandal that illustrates the large potential problems facing CRU. Pull in Briffa’s suspect Yamal chronology and you have an additional difficulty. It seems that some data points are more equal than others in the climate game. Any unusually influential data points now also have to be traced back to an actual station, something that hasn’t been done for any of them.
And how good are those actual stations? Anthony Watts’ experiment over at surfacestations.org is pointing to the answer “not very”. If you look at a global map of stations, it’s amazing how many of them are in the USA. Watts’ survey of all USHCN stations is 82% complete, and only 10% of stations have an NOAA error rating of less than 1°C.
So without any conspiracy we seem to be betting trillions on science that does not adequately coordinate to prevent control data from entering real data sets, that has practices inadequate to guard against undue weight on individual data points, and that takes large chunks of its data from weather stations whose error bars far exceed the global warming signal we’re all supposed to be worried about.
At this point a finding of “no conspiracy” would not reassure me. It should not reassure us at all.