Kevin Meyer has a thought-provoking post (referencing, among other things, the Asiana Flight 214 crash) on achieving the right balance between manual and automatic control of systems. His post reminded me of something that has been lurking in my queue of things-to-blog-about for a long time.
On January 6, 1996, Washington Metrorail train T-111 departed the Rockville (MD) station northbound. Operating under automatic control as was standard practice, the train accelerated to a speed of 75 mph, and then began slowing for a station stop at Shady Grove. The speed was still too great for the icy rail conditions, however, and T-111 slid into a stopped train at the station, killing the driver.
What happened? I think the answer to this question is relevant not only to the specific topics of mass transit and railroad safety, but also to the more general issues of manual and automatic operation in system design, and perhaps even to the architecture of organizations and political systems.
Here is the NTSB report. Metrorail had decided to employ automatic operation in all but emergency conditions, the stated reason being that some operators were using the brakes too aggressively and flattening the wheels. Train operators were not allowed to switch to manual operation without permission from controllers, and the controllers and supervisors were given little discretion to approve such requests. The automatic control system had a hardwired maximum speed for each section of track, based on the track geometry, but this could be overridden downward (because of weather conditions, for example) by electronic commands from Metro’s Operations Control Center. When T-111 left Rockville station, it was supposed to have received a command limiting its speed to Level 3 (59 mph), but missed the command (because the first car was past the station platform)…and in the absence of further information, the automatic train controller set the target speed at the full 75 mph, which was the maximum limiting speed ever to be used for that segment. This goes against 100 years of railroad practice for vital systems (“vital” in this context meaning not only “very important,” but “life-critical”), which is that these systems may occasionally fail, but they must fail to the safest possible condition…which in this case, would have meant either the minimum speed for the segment or a refusal to move at all until the speed situation was clarified.
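The difference between the behavior described above and the fail-safe convention can be shown in a few lines. This is only a minimal sketch, not Metrorail's actual control logic: the function names, the use of `None` to stand for a missed command, and the `restrictive_mph` parameter are all assumptions made for illustration. Only the 75 mph segment maximum and the 59 mph Level 3 limit come from the account above.

```python
from typing import Optional


def target_speed_as_described(segment_max_mph: int,
                              commanded_mph: Optional[int]) -> int:
    """Hypothetical model of the behavior described in the post.

    A missed command (None) falls back to the segment maximum.
    """
    if commanded_mph is None:
        return segment_max_mph
    # A command can only lower the speed, never raise it above the maximum.
    return min(commanded_mph, segment_max_mph)


def target_speed_fail_safe(segment_max_mph: int,
                           commanded_mph: Optional[int],
                           restrictive_mph: int = 0) -> int:
    """Hypothetical fail-safe variant.

    A missed command (None) falls back to the most restrictive value.
    restrictive_mph=0 is an assumed default meaning "do not move until
    the speed situation is clarified".
    """
    if commanded_mph is None:
        return min(restrictive_mph, segment_max_mph)
    return min(commanded_mph, segment_max_mph)


if __name__ == "__main__":
    # Command missed: the described logic runs at the full segment maximum.
    print(target_speed_as_described(75, None))
    # Command missed: the fail-safe logic holds the train.
    print(target_speed_fail_safe(75, None))
    # Command received: both versions honor the Level 3 limit.
    print(target_speed_fail_safe(75, 59))
```

The two functions differ only in what they do when no command has arrived. Which restrictive value is appropriate (a low crawl speed or a full stop) is a design decision this sketch leaves as a parameter.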
Even given this serious flaw in the system design, however, it seems likely that the accident could have been avoided by a timely switch to manual operation. Even after the most restrictive performance level had been entered for the track segment leading into Rockville (and apparently received correctly in this case), T-111 still overran that station; one might think this was grounds for switching to manual. Yet automatic operation continued. And when the T-111 driver reported that his target speed had been set to 75 mph, automatic operation still continued.
There seems to have been within Metro an almost religious level of belief, enforced by the threat of job loss, in the goodness of strictly automatic operation. NTSB says that “Such was the confidence in the automated system that controllers were directed to defer to the ATC system even when it displayed anomalies such as allowing trains to exceed programmed speeds. Even under those circumstances, human controllers and train operators were told to stand aside and allow the ATC system to ‘do what it is supposed to do.’” The radio controller who was on duty at the time of the accident said, “I had…asked questions before, if a train was running 75 mph, should we stop it and go mode 2? I was told, ‘Negative, mode 1 operation; let the train do what it’s supposed to do.’”
Regarding T-111, the radio controller told NTSB “At this time I had a feeling the system was doing what it was supposed to do. It was slowing the train down to make the stop at Shady Grove. At this time, I didn’t feel I had an emergency where I could step in and overrule—put my job on the line—and tell the man to go manual.” (emphasis added)
In its analysis, NTSB made the following point:
The investigation revealed that many, if not most, important Metrorail policies and operating procedures were derived from rigid, top-down, highly centralized management processes. Apparently, many important decisions emanated from the deputy general manager for operations and were dutifully, and generally without question, carried out by senior and middle-level managers. This was apparently the case with the decision to require exclusive automatic train operation. Metrorail engineering personnel had access to information regarding the incompatibility between the ATC block design and the stopping profile of trains in inclement weather, but this information was not sought out or considered during the decisionmaking process. In fact, before the accident, the decision to require exclusive automatic train operation was known only to senior management and OCC personnel. That decision, with its defects, thus went into effect without the knowledge of train operators or key individuals in the safety, training, or engineering departments who could have pointed out its shortcomings. The inflexibility of this highly centralized structure also explains the adherence of OCC controllers to the exclusive automatic train operation decision after train overruns began occurring on the night of the accident.
I would assert that in effect, Metro had established the following decision-making priority levels:
*The judgment of the individual actually on the train, and with a direct view of the situation, was less to be counted on than…
*The judgment of the controllers and supervisors at the OCC in downtown DC…BUT the judgment of these individuals, who were not geographically present on the train but were involved in the situation in real time, was less to be counted on than…
*The judgment of the designers of the control system and the Metro operating policies…who, by the very nature of their work, could not be aware of the specific context of an individual situation.
Of how many other systems throughout America could a similar conclusion be drawn?