Blood on the Tracks

Kevin Meyer has a thought-provoking post (referencing, among other things, the Asiana Flight 214 crash)  on achieving the right balance between manual and automatic control of systems.  His post reminded me of something that has been lurking in my queue of things-to-blog-about for a long time.

On January 6, 1996, Washington Metrorail train T-111 departed the Rockville (MD) station northbound.  Operating under automatic control as was standard practice, the train accelerated to a speed of 75 mph, and then began slowing for a station stop at Shady Grove. The speed was still too great for the icy rail conditions, however, and T-111 slid into a stopped train at the station, killing the driver.

What happened?  I think the answer to this question is relevant not only to the specific topics of mass transit and railroad safety, but also to the more general issues of manual and automatic operation in system design, and perhaps even to the architecture of organizations and political systems.

 

Here is the NTSB report.  Metrorail had decided to employ automatic operation in all but emergency conditions, the stated reason being that some operators were using the brakes too aggressively and flattening the wheels. Train operators were not allowed to switch to manual operation without permission from controllers, and the controllers and supervisors were given little discretion to approve such requests. The automatic control system had a hardwired maximum speed for each section of track, based on the track geometry, but this could be overridden downward (because of weather conditions, for example) by electronic commands from Metro’s Operations Control Center.  When T-111 left Rockville station, it was supposed to have received a command limiting its speed to Level 3 (59 mph), but missed the command (because the first car was past the station platform)…and in the absence of further information, the automatic train controller set the target speed at the full 75 mph, which was the maximum limiting speed ever to be used for that segment.  This goes against 100 years of railroad practice for vital systems (“vital” in this context meaning not only “very important,” but “life-critical”), which is that these systems may occasionally fail, but they must fail to the safest possible condition…which in this case, would have meant either the minimum speed for the segment or a refusal to move at all until the speed situation was clarified.
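To make the fail-safe point concrete, here is a minimal sketch in Python of the two fallback policies. It is not Metro's actual ATC logic: the function names and the `most_restrictive_mph` parameter are hypothetical, and only the 75 mph segment maximum and the 59 mph Level 3 figure come from the account above.

```python
from typing import Optional

# Segment maximum cited in the post; everything else here is illustrative.
HARDWIRED_MAX_MPH = 75


def target_speed_as_built(commanded_mph: Optional[int]) -> int:
    """Behavior described above: a missed command falls back to the segment maximum."""
    if commanded_mph is None:  # command never received
        return HARDWIRED_MAX_MPH
    return min(commanded_mph, HARDWIRED_MAX_MPH)


def target_speed_fail_safe(commanded_mph: Optional[int],
                           most_restrictive_mph: int = 0) -> int:
    """Fail-safe alternative: a missed command falls back to the most
    restrictive value (assumed 0 here, i.e. do not move until clarified)."""
    if commanded_mph is None:
        return most_restrictive_mph
    return min(commanded_mph, HARDWIRED_MAX_MPH)


# T-111 leaving Rockville: the Level 3 (59 mph) command was missed.
print(target_speed_as_built(None))   # segment maximum
print(target_speed_fail_safe(None))  # most restrictive value
print(target_speed_fail_safe(59))    # command received normally
```

The only difference between the two functions is what happens when no command arrives, which is exactly the case that mattered that night.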

Even given this serious flaw in the system design, however, it seems clear that the accident could likely have been avoided by a timely switch to manual operation.  Even after the most restrictive performance level had been entered for the track segment leading into Rockville (and apparently received correctly in this case), T-111 still overran that station; one might think this was grounds for switching to manual. Yet automatic operation was still continued. When the T-111 driver reported that his target speed had been set to 75 mph, automatic operation was still continued.

There seems to have been in Metro an almost religious level of belief, enforced by the threat of job loss, in the goodness of strictly automatic operation. NTSB says that “Such was the confidence in the automated system that controllers were directed to defer to the ATC system even when it displayed anomalies such as allowing trains to exceed programmed speeds. Even under those circumstances, human controllers and train operators were told to stand aside and allow the ATC system to ‘do what it is supposed to do.’” The radio controller who was on duty at the time of the accident said, “I had…asked questions before, if a train was running 75 mph, should we stop it and go mode 2? I was told, ‘Negative, mode 1 operation; let the train do what it’s supposed to do.’”

Regarding T-111, the radio controller told NTSB  “At this time I had a feeling the system was doing what it was supposed to do. It was slowing the train down to make the stop at Shady Grove. At this time, I didn’t feel I had an emergency  where I could step in and overrule—put my job on the line—and tell the man to go manual.” (emphasis added)

In its analysis, NTSB made the following point:

The investigation revealed that many, if not most, important Metrorail policies and operating procedures were derived from rigid, top-down, highly centralized management processes. Apparently, many important decisions emanated from the deputy general manager for operations and were dutifully, and generally without question, carried out by senior and middle-level managers. This was apparently the case with the decision to require exclusive automatic train operation. Metrorail engineering personnel had access to information regarding the incompatibility between the ATC block design and the stopping profile of trains in inclement weather, but this information was not sought out or considered during the decisionmaking process. In fact, before the accident, the decision to require exclusive automatic train operation was known only to senior management and OCC personnel. That decision, with its defects, thus went into effect without the knowledge of train operators or key individuals in the safety, training, or engineering departments who could have pointed out its shortcomings. The inflexibility of this highly centralized structure also explains the adherence of OCC controllers to the exclusive automatic train operation decision after train overruns began occurring on the night of the accident.

I would assert that in effect, Metro had established the following decision-making priority levels:

*The judgment of the individual actually on the train, and with a direct view of the situation, was less to be counted on than…

*The judgment of the controllers and supervisors at the OCC in downtown DC…BUT the judgment of these individuals, who were not geographically present on the train but were involved in the situation in real time, was less to be counted on than…

*The judgment of the designers of the control system and the Metro operating policies…who, by the very nature of their work, could not be aware of the specific context of an individual situation.

Of how many other systems throughout America could a similar conclusion be drawn?

 

19 thoughts on “Blood on the Tracks”

  1. I think a mandatory rudimentary programming course should be required of these bureaucrats. It teaches humility, if nothing else.

  2. I’ve been reviewing Casey Jones’ accident (http://en.wikipedia.org/wiki/Casey_Jones) and realized that this new Positive Train Control system (PTC) being implemented under an expensive Congressional mandate for US railroads would likely not have prevented Casey’s famous passenger train wreck.

    Casey’s accident was caused by four cars separating from the rear of a freight train that was being cleared off the main line at the Vaughn, MS station in 1900, directly in the path of the Cannonball Express. The air brake system, a fail-safe system, worked as designed, immobilizing the four cars when they broke the air connection to the rest of the train.

    Before the cars could be cleared, Casey, who was probably speeding trying to make up lost time, on finally seeing the cars on his track, hit his air brake. He became famous because he stayed in the cab and manually applied the locomotive brake (a separate system) that helped slow his train considerably. Casey was the only fatality or serious injury.

    PTC works by maintaining adequate stopping distances between trains. One identified problem is that the system transmitter is in the lead locomotive and can’t really monitor in real time where the end of the train is. Losing four end cars would have been invisible to PTC. It would have helped in limiting Casey’s speed though.

    I have to grant your point that humans on the spot need over-ride capability. In my nuclear power plant design experience, recent designs have gone uncomfortably far in automation at the expense of wise operator intervention.

  3. Good metaphor. Regrettably, the people who could most benefit from studying it are also least likely to see it as applying to them.

  4. The wise operator costs money. When the first moon landing was made, all done on autopilot, I knew that automation was coming to the airline industry. I knew we would shrink the crew from three (Captain, co-pilot, & flight engineer) to two.

    We had auto-pilots, and the capability to couple the auto-pilot to the ILS signal for what was called a coupled approach. Landings were always manual. The DC-10 came out with an early auto-flight guidance system. It was designed to be coupled up shortly after takeoff and be guided by inputs to a controller on the panel. It included auto-throttles, which were notoriously bad, but usable. You could fly the entire flight from shortly after takeoff down to 100 feet above the runway on auto-pilot and auto-throttles. The only way to be proficient with both, flying manually and with the flight guidance system, was to fly segments where you alternated between the two modes. Even the flight guidance system required situational awareness. It was necessary to be able to recognize when you might be getting into the weeds and to go manual.

    Hand flying an aircraft is a hand, eye, deep muscle coordination exercise that requires you to be thinking out ahead of the aircraft. Being in top form for hand flying requires regular practice. Also, flying a big aircraft on a visual approach without electronic aids (the Asiana 214 approach) is one of the more demanding maneuvers for airline pilots. Particularly when the approach is over water or over uneven terrain. (Sloping down to the runway threshold, or up to the runway threshold, or over undulating hills.) This requires knowing how to make a manual, (or at least be on the throttle and know how to program the proper descent profile into the auto-pilot) visual approach where the pilot is always aware of the situation and in control of both the profile and the throttles to maintain proper descent rate and airspeed.

    Heavy automation is desired by the industry because theoretically it cuts down on training costs and pilot error. However, as in the case of Asiana 214, there are hidden dangers.

  5. Actually Jimmy, Armstrong overrode the auto pilot and did the landing himself when he and Aldrin realized the autopilot had them headed for an area that was covered with large rocks. He took manual control and shifted the Eagle over to a clear area and landed with only a few seconds of fuel left.

  6. Was the deputy general manager for operations charged with the manslaughter of the driver? Only if absurd jacks-in-office get held responsible for the results of their idiocies will there be any hope of improvement.

  7. Jimmy,

    I’ve also read claims that flying a commercial airliner on auto-pilot also saves fuel.

  8. Jonathan, I intended this post both ways–as a metaphor for political (and organizational) overcentralization, but also in its more literal meaning having to do with the design and operation of control systems doing important things.

    I think it would be interesting to study business failures and economic disasters that have occurred because of excessive trust in the output of a computer system. The Halloween candy debacle at Hershey is well known…there are certainly many more such.

    I wish I could find the original article somewhere, but I read many years ago a piece by a US Army colonel who observed that prior to automation of parts ordering, the supply sergeants considered themselves responsible for parts inventory levels…post-automation, they considered themselves responsible for transmitting the usage data to the systems which decided on the reorders. The same sort of thing happens in many chain stores.

    The problem of using automated business systems while allowing (and encouraging) employee discretion is an important one; of course, there exist organizations that have so little respect for their employees that they can’t imagine that any such discretion would ever be desirable.

  9. In the Metrorail example: if Metro were concerned about the drivers braking too quickly and flattening the wheels (which excessive braking would have also been uncomfortable for passengers), they could have simply fitted the trains with “g” recorders, and fired the drivers who were perpetual bad actors from this standpoint. (They might have even been able to just use the data which was already being captured from the track circuits.) But this approach would probably have involved them in unpleasant HR situations, litigation, EEO complaints, etc.

    If judging people on performance is made very difficult (and I’m merely speculating that this was the case with Metro at the time), then management will tend to minimize individual discretion. OTOH, if people can be rewarded or gotten rid of as appropriate, then one can assume a reasonable level of employee rational behavior, and it will make sense to establish systems and policies in a way which enable and take advantage of broader employee discretion.

  10. See Stuart and Hubert Dreyfus’ model of skill acquisition or their book Mind Over Machine.

    The over-centralized enterprise is reluctant to entrust too much responsibility to human agency, initiative, and discretion because in order to reach the level of expertise, people have to transcend the hierarchy and in doing so disrupt the status quo. In the lower skill levels, results are separated from personal responsibility, which may be desirable for the less competent and certainly is for the micromanager or control freak. At the higher skill levels, responsibility becomes ingrained because they have an emotional attachment to doing a good job because they feel the benefits and advantages of the optimal experience.

  11. I have had an interesting series of experiences this week which may be related. I moved to a new, larger home 7/1. I had ordered a new bed and a new refrigerator to be delivered that day. A week ago, I got an automated call from “Sit N Sleep,” the vendor for the bed, which cost about $3200, that it would be delivered “tomorrow,” that is, a week before I moved, to the address of a house I sold four years ago. I called the salesman and he promised to correct the information but it was finally delivered on 7/2.

    I ordered a new refrigerator from Sears on 6/24 to be delivered to the new house on 7/1/14, or so it says on the receipt. However, the buyer’s name had a different middle initial, and the street address and telephone number were different from the ones I gave the salesman. On Monday 6/30, I called to check on the delivery after I noticed the discrepancy, and discovered the mixup. Sears promised it would be delivered on 7/3 as it was too late for 7/1 or 7/2. Today, I called again and was told it would not be delivered until 7/5. I am using an ice chest in the meantime.

    I went to a local discount appliance store today and asked if I could get delivery on Saturday, if I bought the refrigerator now. The salesman said, “You can have it tomorrow if you want.” Private enterprise (other than Sears) is great. I then called Sears and had an interesting conversation with a fellow in the Tempe AZ call center. He says he gets calls like this all the time. He also told me the wrong personal information and address were still in the system. Who knows where it would have gone on Saturday?

    I wonder if the delivery woes are related to these issues of central control and incompetence. The local store seems on top of the matter.

    As I was typing this, my daughter in Tucson called to say one of her four brand-new tires had had a blowout and she was on the side of the road. She plans to drive to California for the holiday and was leaving. I told her to call Enterprise and rent a car for the trip.

    Does anything work anymore?

  12. Whitehall: “I’ve also read claims that flying a commercial airliner on auto-pilot also saves fuel.”

    It’s possible in the new automated systems that the flight guidance computer is always optimizing fuel burn. I’m long out of the business (21 years) so not really up to date on that. From the days of the Arab oil embargoes on, fuel savings was a top priority. The problem was that the company I worked for computed fuel savings as amount of fuel burned per minute of flight. As an example let’s say the flight plan called for a 1 hour flight burning 6000 pounds of fuel with a 4000 pound reserve left at the gate. If you actually flew that profile you burned 100 pounds of fuel per minute. If you managed to shave 5 minutes off the flight time and arrived at the gate with 4,200 pounds they claimed you were burning too much fuel because that was a burn of 105 pounds per minute.
    Yet, you had saved 200 pounds and got to your destination with a few minutes extra for the ground crew to turn you around. I went around and around with management on that and they never accepted my reasoning. Sigh!

    I used to save a lot of fuel and time when flying east to west when the jet stream was very strong on the route. Our flight plans always called for us to fly at the highest altitude for our weight because, all things being equal, that was the best fuel burn altitude. When you have a 150 knot headwind at 36,000 feet and can reduce that to 75 knots at 28,000 feet, you can beat the flight time and save fuel although your fuel burn per minute will be higher. For example: A 1000 mile flight with an average 150 knot headwind. Your true air speed at 36,000 would be about 488, but your groundspeed would be 338. Time enroute would be 2 hours and 57 minutes. Fuel burn at 100 pounds per minute would be 17,700 pounds. Alternatively, fly at 28,000 with a 75 knot headwind and a ground speed of 413. Time enroute is 2 hours and 25 minutes. Your fuel burn at that altitude would be about 120 pounds per minute. Your total fuel burned would be 17,400 pounds for a saving of 300 pounds and 32 minutes on schedule. Yet I would be told I burned too much fuel because my burn per minute rate was higher.

    As a co-pilot I flew with a Captain who played their game. It was maddening. He would never arrive early, if he could avoid it, because it made his fuel burn look worse. I swore I would not play that game when I moved into the Captain’s seat. And I didn’t.

  13. Here’s an interesting (and depressing) example of a business-harming behavior created by system rigidity:

    “One of my favorite examples of how the provisioning of an ERP system can go wrong was the inventory management portion of an ERP system at an airline’s maintenance depot. The new system – designed by accountants and auditors – followed a standard, “common sense” process of requiring the defective part to be checked in before a new one could be checked out. However, the system it replaced allowed pilots to radio ahead when some piece of equipment or component failed so that replacement could start as soon as the plane arrived at the gate. From the perspective of a pilot or maintenance personnel, this approach was sensible. But when the airline changed over to the new system with a process designed for inventory control, very expensive aircraft and many grumpy passengers were left waiting at the gate while the old part was shuttled to the inventory cage and the replacement part was found and reissued.”

    http://smartdatacollective.com/robert-kugel/182146/challenge-making-erp-systems-more-configurable

  14. The NTSB investigation calls out Washington Metro for putting too much faith in fully-automatic train operation. Note also that the motorman is relatively new to Metro (thus, still developing route knowledge?) and still unfamiliar with automatic operation at track speed (thus a supervisor might trust the machinery rather than the motorman)?

    In railroading, every line of the Book of Rules is written in somebody’s blood. There are three rules from my 1967 Consolidated Code of Operating Rules that apply here.

    First, the discretion clause, Rule 109. “In case of doubt or uncertainty, the safe course must be taken.” The motorman sought the advice of the supervisor, and the supervisor judged reliance on the automatic train operation to be the safe course. Oops.

    Second, the motorman is complying with Rule 106. “The conductor and engineer and anyone acting as pilot are equally responsible for the safety of the train and the observance of the rules, and under conditions not provided for by the rules, must take every precaution for protection.” A rapid-transit system might not have the same operating rules as the railroad. The motorman seeking advice from the supervisor is complying with the spirit of this rule. Disregarding a direct instruction and overriding the automatic train operation is ex post the correct course, but a good way to be disciplined.

    There is, however, a provision to the contrary in Rule 101. “Trains and engines must be fully protected against any known condition which interferes with their safe passage at normal speed.” In the Board’s recommendations, apparently the effect of wet tracks on the efficacy of the automatic train operation is a known unknown, but supervisors might have exercised better judgement with better training, or a better understanding of what the automatic train operation could handle, or not.

  15. I see in this morning’s Drudge Report a link to a Mercedes-Benz claim that driver-less trucks will use less fuel.

    God help us!

  16. Whitehall: I see driverless vehicles every time I am out on the highway. There is always a human being in the front left seat behind the steering wheel. Yesterday a woman was putting on make-up. They also eat, talk on the phone, etc.

    Having a computer drive the car will be a big step forward.
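For anyone who wants to check the fuel arithmetic in comment 12, here is a rough sketch in Python. The function names are made up for illustration, the "1000 mile" leg is assumed to be in nautical miles (since the speeds are in knots), and flight time is truncated to whole minutes to match the figures quoted in the comment.

```python
def burn_rate(fuel_burned_lb: float, minutes: float) -> float:
    """Pounds of fuel per minute: the metric the airline reportedly used."""
    return fuel_burned_lb / minutes


def flight(distance_nm: float, true_airspeed_kt: float,
           headwind_kt: float, burn_lb_per_min: float) -> tuple[int, int]:
    """Return (minutes enroute, total fuel burned in pounds)."""
    ground_speed_kt = true_airspeed_kt - headwind_kt
    minutes = int(distance_nm / ground_speed_kt * 60)  # truncated to whole minutes
    return minutes, minutes * burn_lb_per_min


# First example: 10,000 lb loaded (6,000 planned burn + 4,000 reserve).
planned_rate = burn_rate(6000, 60)          # flown exactly as planned
actual_rate = burn_rate(10000 - 4200, 55)   # 5 minutes early, 4,200 lb left
print(planned_rate, round(actual_rate, 1))

# Second example: same true airspeed, different headwind and burn rate.
high = flight(1000, 488, 150, 100)  # 36,000 ft
low = flight(1000, 488, 75, 120)    # 28,000 ft
print(high, low)
print("minutes saved:", high[0] - low[0], "pounds saved:", high[1] - low[1])
```

In both cases the per-minute rate goes up while total fuel burned goes down, which is the commenter's complaint: the metric penalized the outcome the airline actually wanted.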
