Since the recent Amtrak crash, I’ve seen people in several places…including here…suggesting that engineers should be eliminated and trains operated entirely by automatic control. Here is a cautionary tale about such automation, which I originally posted about a year ago under the title Blood on the Tracks. I’ve added a few thoughts at the end.
Kevin Meyer has a thought-provoking post (referencing, among other things, the Asiana Flight 214 crash) on achieving the right balance between manual and automatic control of systems. His post reminded me of something that has been lurking in my queue of things-to-blog-about for a long time.
On January 6, 1996, Washington Metrorail train T-111 departed the Rockville (MD) station northbound. Operating under automatic control, as was standard practice, the train accelerated to a speed of 75 mph, and then began slowing for a station stop at Shady Grove. The speed was still too great for the icy rail conditions, however, and T-111 slid into a stopped train at the station, killing the driver.
What happened? I think the answer to this question is relevant not only to the specific topics of mass transit and railroad safety, but also to the more general issues of manual and automatic operation in system design, and perhaps even to the architecture of organizations and political systems.
Here is the NTSB report. Metrorail had decided to employ automatic operation in all but emergency conditions, the stated reason being that some operators were using the brakes too aggressively and flattening the wheels. Train operators were not allowed to switch to manual operation without permission from controllers, and the controllers and supervisors were given little discretion to approve such requests. The automatic control system had a hardwired maximum speed for each section of track, based on the track geometry, but this could be overridden downward (because of weather conditions, for example) by electronic commands from Metro’s Operations Control Center. When T-111 left Rockville station, it was supposed to have received a command limiting its speed to Level 3 (59 mph), but missed the command (because the first car was past the station platform)…and in the absence of further information, the automatic train controller set the target speed at the full 75 mph, which was the maximum limiting speed ever to be used for that segment. This goes against 100 years of railroad practice for vital systems (“vital” in this context meaning not only “very important,” but “life-critical”), which is that these systems may occasionally fail, but they must fail to the safest possible condition…which in this case, would have meant either the minimum speed for the segment or a refusal to move at all until the speed situation was clarified.
Even given this serious flaw in the system design, however, it seems clear that the accident could likely have been avoided by a timely switch to manual operation. Even after the most restrictive performance level had been entered for the track segment leading into Rockville (and apparently received correctly in this case), T-111 still overran that station; one might think this was grounds for switching to manual, yet automatic operation continued. Even when the T-111 driver reported that his target speed had been set to 75 mph, automatic operation continued.
There seems to have been in Metro an almost religious level of belief, enforced by the threat of job loss, in the goodness of strictly automatic operation. NTSB says that "Such was the confidence in the automated system that controllers were directed to defer to the ATC system even when it displayed anomalies such as allowing trains to exceed programmed speeds. Even under those circumstances, human controllers and train operators were told to stand aside and allow the ATC system to 'do what it is supposed to do.'" The radio controller who was on duty at the time of the accident said, "I had…asked questions before, if a train was running 75 mph, should we stop it and go mode 2? I was told, 'Negative, mode 1 operation; let the train do what it's supposed to do.'"
Regarding T-111, the radio controller told NTSB "At this time I had a feeling the system was doing what it was supposed to do. It was slowing the train down to make the stop at Shady Grove. At this time, I didn't feel I had an emergency where I could step in and overrule—*put my job on the line*—and tell the man to go manual." (emphasis added)
In its analysis, NTSB made the following point:
The investigation revealed that many, if not most, important Metrorail policies and operating procedures were derived from rigid, top-down, highly centralized management processes. Apparently, many important decisions emanated from the deputy general manager for operations and were dutifully, and generally without question, carried out by senior and middle-level managers. This was apparently the case with the decision to require exclusive automatic train operation. Metrorail engineering personnel had access to information regarding the incompatibility between the ATC block design and the stopping profile of trains in inclement weather, but this information was not sought out or considered during the decisionmaking process. In fact, before the accident, the decision to require exclusive automatic train operation was known only to senior management and OCC personnel. That decision, with its defects, thus went into effect without the knowledge of train operators or key individuals in the safety, training, or engineering departments who could have pointed out its shortcomings. The inflexibility of this highly centralized structure also explains the adherence of OCC controllers to the exclusive automatic train operation decision after train overruns began occurring on the night of the accident.
I would assert that in effect, Metro had established the following decision-making priority levels:
*The judgment of the individual actually on the train, and with a direct view of the situation, was less to be counted on than…
*The judgment of the controllers and supervisors at the OCC in downtown DC…BUT the judgment of these individuals, who were not geographically present on the train but were involved in the situation in real time, was less to be counted on than…
*The judgment of the designers of the control system and the Metro operating policies…who, by the very nature of their work, could not be aware of the specific context of an individual situation.
Of how many other systems throughout America could a similar conclusion be drawn?
Update 5/19/15: I posted this at Ricochet a couple of days ago (members-only section), leading to some interesting discussion. One commenter drew an analogy with the movie War Games, and remarked that "In short, since humans create the automation, it is error prone, so you still need humans to potentially override it even though humans themselves are error prone." To which I responded that "The kinds of errors made by humans and by automation are, generally speaking, pretty different. The human errors will be a matter of inattention, inability to process sensory data quickly enough, etc. The automation errors will be due to applying a generic set of rules to a situation where those rules don't fit and/or lead to unforeseen results of a malignant kind" and also:
“An excellent example of automation for safety can be found in railroad interlocking signaling systems, first introduced in the late 1800s. The idea is that the human operator *physically can not* cause a setting of switches and signals which will lead to an accident…he can achieve settings which will achieve a god-awful traffic snarl and bring the RR to a halt, but can’t cause a wreck. The logical checks were first enforced by mechanical means, later by relays and electronics. (This all assumes that the train engineer does obey the signal indications)”
Someone else commented that “The problem in this parable (the Washington Metrorail crash) isn’t the automation, it’s the centralization. If you can’t trust the guy on the ground then you’re already waiting for disaster.” My response:
“Automation isn’t *inherently* linked to centralization, but the way it is implemented in practice is so linked more often than not. For example, a chain store outlet in Florida received a shipment of snow blowers, because the company’s centralized ERP system was deciding what each store “needed” without any involvement of store management.”
“Of course, I’m not arguing against automation. But improved communications & computation frequently *enables* centralization that just wouldn’t have been possible before. For example, diplomats had far more authority to act on their own prior to undersea cables and radio. Selection of individual bombing targets, as practiced by LBJ and apparently also Obama, would not have been practical during the WWII era. Too often, the human nature of those running the organization seems to make them think that because centralizing control in their hands is *possible*, it is also *desirable*.”
See also my post When Humans and Robots Communicate