The real shocking revelation in the Climategate incident isn’t the emails that show influential scientists possibly engaging in the disruption of the scientific process and possibly even committing legal fraud. Those emails might be explained away.
No, the real shocking revelation lies in the computer code and data that were dumped along with the emails. Arguably, these are the most important computer programs in the world. These programs generate the data that is used to create the climate models which purport to show an inevitable catastrophic warming caused by human activity. It is on the basis of these programs that we are supposed to massively reengineer the entire planetary economy and technology base.
The dumped files revealed that those critical programs are complete and utter train wrecks.
It’s hard to explain to non-programmers just how bad the code is, but I will try. Suppose the code were a motorcycle. Based on the repeated statements that Catastrophic Anthropogenic Global Warming was “settled science,” you would expect that the computer code that helped settle the science would look like this…
…when in reality it looks like this:
Yes, it’s that bad.
Programmers all over the world have begun wading through the code and they have been stunned by how bad it is. It’s quite clearly amateurish and nothing but an accumulation of seat-of-the-pants hacks and patches.
How did this happen?
I don’t think it resulted from any conscious fraud or deception on the part of the scientists. Instead, I think the problem arose from the simple fact that scientists do not know how to manage a large, long-term software project.
Scientists are not engineers. More importantly, they are not engineering managers. It would be stupid to put a scientist in charge of building a bridge. Yes, the scientist might understand all the basic principles and math required to build a bridge but he would have no experience in all the real-world minutiae of actually cobbling steel and concrete together to span a body of water. The scientist wouldn’t even understand the basic terminology of the profession. Few if any scientists have any experience in managing long-term, large-scale projects of any kind. They don’t understand the management principles and practices that experience has taught make a project successful.
(As an aside, this cuts both ways. Engineers are not scientists and when they try to act like scientists the results are often ugly. When you see some pseudo-scientific nonsense being peddled, an engineer is often behind it.)
The design, production and maintenance of large pieces of software require project management skills greater than those required for large material construction projects. Computer programs are the most complicated pieces of technology ever created. By several orders of magnitude they have more “parts” and more interactions between those parts than any other technology.
Software engineers and software project managers have created procedures for managing that complexity. It begins with seemingly trivial things like style guides that regulate what names programmers can give to attributes of software and the associated datafiles. Then you have version control in which every change to the software is recorded in a database. Programmers have to document absolutely everything they do. Before they write code, there is extensive planning by many people. After the code is written comes the dreaded code review in which other programmers and managers go over the code line by line and look for faults. After the code reaches its semi-complete form, it is handed over to Quality Assurance which is staffed by drooling, befanged, malicious sociopaths who live for nothing more than to take a programmer’s greatest, most elegant code and rip it apart and possibly sexually violate it. (Yes, I’m still bitter.)
Institutions pay for all this oversight and double-checking and programmers tolerate it because it is impossible to create a large, reliable and accurate piece of software without such procedures firmly in place. Software is just too complex to wing it.
Clearly, nothing like these established procedures was used at CRU. Indeed, the code seems to have been written overwhelmingly by just two people (one at a time) over the past 30 years. Neither of these individuals was a formally trained programmer and there appears to have been no project planning or even formal documentation. Indeed, the comments of the second programmer, the hapless “Harry”, as he struggled to understand the work of his predecessor are now being read as a kind of programmer’s Icelandic saga describing a death march through an inexplicable maze of ineptitude and boobytraps.
CRU isn’t that unusual. Few scientific teams use any kind of formal software-project management. Why? Well, most people doing scientific programming are not educated as programmers. They’re usually just scientists who taught themselves programming. Moreover, most custom-written scientific software, no matter how large, doesn’t begin as a big, planned project. Instead the software evolves from some small, simple programs written to handle relatively trivial tasks. After a small program proves useful, the scientist finds another related processing task so instead of rewriting everything from scratch, he bolts that new task onto an existing program. Then he does it again and again and again…
Most people who use spreadsheets a lot have seen this process firsthand. You start with a simple one-sheet spreadsheet with a few dozen cells. It’s small, quick and useful, but then you discover another calculation that requires the data in the sheet, so you add that new calculation and any necessary data to the initial sheet. Then you add another and another. Over the years, you end up with a gargantuan monster. It’s not uncommon for systems analysts brought in to overhaul a company’s information technology to find that some critical node in the system is a gigantic, byzantine spreadsheet that only one person knows how to use and which started life as a now long-dead manager’s to-do list. (Yes, I’m still bitter.)
This is clearly the ad hoc process by which the CRU software evolved. It began as some simple Fortran programs back in the early ’80s which were progressively grafted onto until they became an incoherent rat’s nest of interlocking improvisations. The process encapsulates a dangerous mindset which the sociologist Diane Vaughan, in her study of NASA and the Challenger disaster, termed “the normalization of deviance”. It sets in when you do something statistically risky several times without any ill effect. After a time, people forget that the risky act was even dangerous. This is how two space shuttles came to be destroyed.
A lot of the CRU code is clearly composed of hacks. Hacks are informal, off-the-cuff solutions that programmers think up on the spur of the moment to fix some little problem. Sometimes they are so elegant as to be awe inspiring and they enter programming lore. More often, however, they are crude, sloppy and dangerously unreliable. Programmers usually use hacks as a temporary quick solution to a bottleneck problem. The intention is always to come back later and replace the hack with a more well-thought-out and reliable solution, but when there is no formal project management and deadlines are pressing, it’s easy to forget to do so. After a time, more code evolves that depends on the existence of the hack, so replacing it becomes a much bigger task than just replacing the initial hack would have been.
(One hack in the CRU software will no doubt become famous. The programmer needed to calculate the distance and overlapping effect between weather monitoring stations. The non-hack way to do so would be to break out the trigonometry and write a planned piece of code to calculate the spatial relationships. Instead, the CRU programmer noticed that the visualization software that displayed the program’s results already plotted the stations’ locations, so he sampled individual pixels on the screen and used the color of the pixels between the stations to determine their location and overlap! This is a fragile hack because if the visualization changes the colors it uses, the components that depend on the hack will fail silently.)
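For the curious, here is roughly what “breaking out the trigonometry” could look like. This is only a minimal sketch in Python, not anything taken from the CRU code: it uses the standard haversine formula for great-circle distance, and the function names, the fixed Earth radius and the idea of a circular “influence radius” around each station are all my own assumptions for illustration.

```python
import math

# Mean Earth radius in kilometres (an assumed constant for this sketch).
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push asin out of its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def stations_overlap(station_a, station_b, radius_km):
    """True if two stations' circles of influence touch or intersect.

    Each station is a (latitude, longitude) pair; radius_km is a
    hypothetical influence radius shared by both stations.
    """
    return great_circle_km(*station_a, *station_b) <= 2 * radius_km
```

Whatever its other shortcomings, a calculation like this depends only on the stations’ coordinates, so changing a plot’s color scheme can’t quietly break it.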
Regardless of how smart the scientists who wrote the CRU code were or how accomplished they were in their scientific area of specialization, they clearly were/are amateur programmers and utterly clueless software project managers. They let their software get away from them.
Of course the obvious question is why no one external to CRU realized they had a problem with their software. After all, isn’t this the type of problem that scientific peer review is supposed to catch? Yes, it is but contemporary science has a dirty little secret:
As far as I can tell, none of the software on which the entire concept of Catastrophic Anthropogenic Global Warming (CAGW) is based has been examined, reviewed or tested by anyone save the people who wrote the code in the first place. This is a staggering omission of scientific oversight and correction. Nothing like it has happened in the history of science.
For now, we can safely say all the data produced by this CRU code is highly suspect. By the ancient and proven rule in computing of “Garbage In, Garbage Out,” this means that all the climate simulations by other teams that make predictions using this dubious data are likewise corrupted. Given that literally hundreds of millions of lives over the next century will depend on getting the climate models correct, we have to start all our climate modeling over from scratch.