Automation and Safety

Since the recent Amtrak crash, I’ve seen people in several places…including here…suggesting that engineers should be eliminated and trains operated entirely by automatic control. Here is a cautionary tale about such automation, which I originally posted about a year ago under the title Blood on the Tracks. I’ve added a few thoughts at the end.

Kevin Meyer has a thought-provoking post (referencing, among other things, the Asiana Flight 214 crash) on achieving the right balance between manual and automatic control of systems.  His post reminded me of something that has been lurking in my queue of things-to-blog-about for a long time.

On January 6, 1996, Washington Metrorail train T-111 departed the Rockville (MD) station northbound.  Operating under automatic control as was standard practice, the train accelerated to a speed of 75 mph, and then began slowing for a station stop at Shady Grove. The speed was still too great for the icy rail conditions, however, and T-111 slid into a stopped train at the station, killing the driver.

What happened?  I think the answer to this question is relevant not only to the specific topics of mass transit and railroad safety, but also to the more general issues of manual and automatic operation in system design, and perhaps even to the architecture of organizations and political systems.

 

Here is the NTSB report.  Metrorail had decided to employ automatic operation in all but emergency conditions, the stated reason being that some operators were using the brakes too aggressively and flattening the wheels. Train operators were not allowed to switch to manual operation without permission from controllers, and the controllers and supervisors were given little discretion to approve such requests. The automatic control system had a hardwired maximum speed for each section of track, based on the track geometry, but this could be overridden downward (because of weather conditions, for example) by electronic commands from Metro’s Operations Control Center.  When T-111 left Rockville station, it was supposed to have received a command limiting its speed to Level 3 (59 mph), but missed the command (because the first car was past the station platform)…and in the absence of further information, the automatic train controller set the target speed at the full 75 mph, which was the maximum limiting speed ever to be used for that segment.  This goes against 100 years of railroad practice for vital systems (“vital” in this context meaning not only “very important,” but “life-critical”), which is that these systems may occasionally fail, but they must fail to the safest possible condition…which in this case, would have meant either the minimum speed for the segment or a refusal to move at all until the speed situation was clarified.
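To make the distinction concrete, here is a minimal sketch of the two design choices, in Python (the speed levels other than Level 3 and all names are hypothetical, not Metro's actual ATC logic): when no valid speed command is received, a vital system should fall back to the most restrictive speed rather than to the track maximum.

```python
# Hypothetical illustration of fail-safe command handling -- not WMATA's actual ATC code.
# Speed levels for a track segment, from most restrictive to the engineered maximum (mph).
SPEED_LEVELS = {1: 15, 2: 40, 3: 59, 4: 75}
TRACK_MAXIMUM_LEVEL = 4
MOST_RESTRICTIVE_LEVEL = 1

def target_speed_as_built(commanded_level):
    """What the accident report describes: a missed command defaults to the track maximum."""
    level = commanded_level if commanded_level in SPEED_LEVELS else TRACK_MAXIMUM_LEVEL
    return SPEED_LEVELS[level]

def target_speed_fail_safe(commanded_level):
    """Traditional vital-system practice: a missed command fails to the safest condition."""
    level = commanded_level if commanded_level in SPEED_LEVELS else MOST_RESTRICTIVE_LEVEL
    return SPEED_LEVELS[level]

# T-111 missed the Level 3 (59 mph) command leaving Rockville:
print(target_speed_as_built(None))   # 75 -- the behavior described in the report
print(target_speed_fail_safe(None))  # 15 -- degraded service, but a safe failure
```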

Even given this serious flaw in the system design, however, it seems clear that the accident could likely have been avoided by a timely switch to manual operation.  Even after the most restrictive performance level had been entered for the track segment leading into Rockville (and apparently received correctly in this case), T-111 still overran that station; one might think this would have been grounds for switching to manual. Yet automatic operation continued. Even when the T-111 driver reported that his target speed had been set to 75 mph, automatic operation continued.

There seems to have been in Metro an almost religious level of belief, enforced by the threat of job loss, in the goodness of strictly automatic operation. NTSB says that "Such was the confidence in the automated system that controllers were directed to defer to the ATC system even when it displayed anomalies such as allowing trains to exceed programmed speeds. Even under those circumstances, human controllers and train operators were told to stand aside and allow the ATC system to 'do what it is supposed to do.'" The radio controller who was on duty at the time of the accident said "I had…asked questions before, if a train was running 75 mph, should we stop it and go mode 2? I was told, 'Negative, mode 1 operation; let the train do what it's supposed to do.'"

Regarding T-111, the radio controller told NTSB  “At this time I had a feeling the system was doing what it was supposed to do. It was slowing the train down to make the stop at Shady Grove. At this time, I didn’t feel I had an emergency where I could step in and overrule—put my job on the line—and tell the man to go manual.” (emphasis added)

In its analysis, NTSB made the following point:

The investigation revealed that many, if not most, important Metrorail policies and operating procedures were derived from rigid, top-down, highly centralized management processes. Apparently, many important decisions emanated from the deputy general manager for operations and were dutifully, and generally without question, carried out by senior and middle-level managers. This was apparently the case with the decision to require exclusive automatic train operation. Metrorail engineering personnel had access to information regarding the incompatibility between the ATC block design and the stopping profile of trains in inclement weather, but this information was not sought out or considered during the decisionmaking process. In fact, before the accident, the decision to require exclusive automatic train operation was known only to senior management and OCC personnel. That decision, with its defects, thus went into effect without the knowledge of train operators or key individuals in the safety, training, or engineering departments who could have pointed out its shortcomings. The inflexibility of this highly centralized structure also explains the adherence of OCC controllers to the exclusive automatic train operation decision after train overruns began occurring on the night of the accident.

I would assert that in effect, Metro had established the following decision-making priority levels:

*The judgment of the individual actually on the train, and with a direct view of the situation, was less to be counted on than…

*The judgment of the controllers and supervisors at the OCC in downtown DC…BUT the judgment of these individuals, who were not geographically present on the train but were involved in the situation in real time, was less to be counted on than…

*The judgment of the designers of the control system and the Metro operating policies…who, by the very nature of their work, could not be aware of the specific context of an individual situation.

Of how many other systems throughout America could a similar conclusion be drawn?

Update 5/19/15:  I posted this at Ricochet a couple of days ago (members-only section), leading to some interesting discussion.  One commenter drew an analogy with the movie War Games, and remarked that “In short, since humans create the automation, it is error prone, so you still need humans to potentially override it even though humans themselves are error prone.”  To which I responded that “The kinds of errors made by humans and by automation are, generally speaking, pretty different.  The human errors will be a matter of inattention, inability to process sensory data quickly enough, etc.  The automation errors will be due to applying a generic set of rules to a situation where those rules don’t fit and/or lead to unforeseen results of a malignant kind” and also:

“An excellent example of automation for safety can be found in railroad interlocking signaling systems, first introduced in the late 1800s.  The idea is that the human operator *physically can not* cause a setting of switches and signals which will lead to an accident…he can achieve settings which will produce a god-awful traffic snarl and bring the RR to a halt, but can’t cause a wreck. The logical checks were first enforced by mechanical means, later by relays and electronics. (This all assumes that the train engineer does obey the signal indications.)”
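To illustrate the kind of logical check involved, here is a minimal sketch in Python (the route and switch names are invented, not any real railroad's vital logic): a signal can be cleared only if no conflicting route is already cleared and the switches are lined for the requested route, so no sequence of operator commands can set up a collision.

```python
# Hypothetical sketch of an interlocking check -- invented route and switch names.
CONFLICTS = {            # routes that may never be cleared at the same time
    "A_to_B": {"C_to_B", "B_to_A"},
    "C_to_B": {"A_to_B"},
    "B_to_A": {"A_to_B"},
}
REQUIRED_SWITCHES = {    # switch positions each route depends on
    "A_to_B": {"switch_7": "normal"},
    "C_to_B": {"switch_7": "reverse"},
    "B_to_A": {"switch_7": "normal"},
}

def try_clear_signal(route, cleared_routes, switch_positions):
    """Clear the signal for `route` only if the interlocking conditions hold.
    The operator can be refused (a traffic snarl), but cannot create a conflict."""
    if CONFLICTS[route] & cleared_routes:
        return False                      # a conflicting route is already cleared
    for switch, position in REQUIRED_SWITCHES[route].items():
        if switch_positions.get(switch) != position:
            return False                  # switches are not lined for this route
    cleared_routes.add(route)
    return True

cleared = set()
switches = {"switch_7": "normal"}
print(try_clear_signal("A_to_B", cleared, switches))  # True  -- safe, signal clears
print(try_clear_signal("C_to_B", cleared, switches))  # False -- conflict, signal stays at stop
```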

Someone else commented that “The problem in this parable (the Washington Metrorail crash) isn’t the automation, it’s the centralization. If you can’t trust the guy on the ground then you’re already waiting for disaster.”  My response:

“Automation isn’t *inherently* linked to centralization, but the way it is implemented in practice is so linked more often than not. For example, a chain store outlet in Florida received a shipment of snow blowers, because the company’s centralized ERP system was deciding what each store “needed” without any involvement of store management.”

and

“Of course, I’m not arguing against automation.  But improved communications & computation frequently *enables* centralization that just wouldn’t have been possible before.  For example, diplomats had far more authority to act on their own prior to undersea cables and radio.  Selection of individual bombing targets, as practiced by LBJ and apparently also Obama, would not have been practical during the WWII era.  Too often, the human nature of those running the organization seems to make them think that because centralizing control in their hands is *possible*, it is also *desirable*.”

See also my post When Humans and Robots Communicate

15 thoughts on “Automation and Safety”

  1. I think that programmer of the ATC system on that train later got a job working on the design of Obamacare.

  2. I have posted somewhere a description of the system that made me an enthusiast for the Electronic Medical Record 30 years ago. I was a member of this organization and went to annual meetings for years. I was even a presenter at one.

    The study was on this clinical trial and some of the results are here.

    Briefly, Adult Respiratory Distress Syndrome or “ARDS” is a condition of lung failure seen especially in young trauma patients. In 1994, when the study was begun and I first heard about it, the mortality of this condition at the Mass General, maybe the best hospital in the world for respiratory care, was 85%. The proposal was to use the membrane oxygenator on these patients, a technique called ECMO (extracorporeal membrane oxygenation). It was estimated that each case would cost $100,000 at the initiation of treatment, and could cost more with prolonged use. This machine does not damage the red cells the way heart-lung bypass does in heart surgery, and it can be used for a month. The institution was originally the LDS Hospital in Salt Lake City, but the program was expanded to Intermountain Health Systems, the chain.

    Before embarking on this expensive program, they decided to optimize care without the ECMO. They set up a system where all orders were done by a computer EHR and a decision support system was used to optimize the care. An algorithm was set up by the pulmonary medicine staff to create the orders for the respirator, the IV fluids, and all the meds. Doctors could override the prompts but had to sign any such override. If results were better with the override, the algorithm was modified. Within two or three weeks, the algorithm was writing 95% of the orders.

    About three or four months later, they realized they were getting a massive improvement in survival over the previous best care.

    After a year or so, they had reduced the mortality of ARDS to about 45% WITHOUT the ECMO. I was really excited by this result.

    I’ve never seen results this good achieved by another EHR. I’m convinced it’s the programming.

  3. It is unbelievable that the system defaulted to “Track Maximum Speed” in the absence of command information, but that seems to be what the NTSB report says.

    But even then, if the Metro management had been more interested in listening to employees and their concerns it seems almost certain that the accident could have been avoided.

    If there was a serious problem with drivers in manual mode using the brakes too aggressively, recording accelerometers could have been placed in the cars without too much added expense.

  4. “a serious problem with drivers in manual mode using the brakes too aggressively”

    I vaguely remember something about this and the “flattening wheels” excuse. Bureaucracy in action.

  5. In the recent train crash on the East Coast, there has been discussion of a “projectile” hitting the train.

    From the news, the train’s speed jumped while traveling through the curve, which should not have happened. It might be due to the driver’s mental state rather than a “projectile” strike, much like what happened with the German airliner and its co-pilot.

  6. Automatic systems produce both a lack of experience in the operator and a lack of attention. The Asiana crash at SFO last year was caused by the Korean pilots relying so much on the autopilots that they were both reluctant to use, and poorly skilled at using, the older manual landing system (the automated one was being upgraded, but the weather was sunny and calm on the approach). A good operator/driver needs to run the machine a lot to keep their skills, attention level, and interest high. A 19th century engineer had less reliable equipment but knew its capabilities.

  7. The base case of these problems is what’s called ‘modeling deficiency’. People use simplified models to understand complex systems, in the belief that the excess complex features that were left out of the model weren’t necessary to understanding the overall aspects that they want to see.
    In this approach, a climate modeler who doesn’t understand function relationships makes sweeping pronouncements about weather shifts, a railway manager who is more concerned about maintenance costs decides what track speeds are appropriate, and a politician who decides agricultural policy assumes rainfall will always remain the same.
    There’s a superb textbook on this effect, “Seeing Like a State”, that looks at three different examples, going back to the English Middle Ages, where the very best people basically managed their way into poverty by making all the ‘right’ choices.

  8. These very issues of what factors are modeled when the automated systems are programmed are what concern me most with the coming of self driving trucks. Several test vehicles are currently operating worldwide. What happens when you have a sensor or radar unit go bad in transit? What happens with black ice? What happens when a tire blows out? And so on.

    An 80,000-pound projectile, possibly loaded with hazardous materials, travelling 65 or 70 miles per hour down the highway. The truck thinks everything is okay. The driver doesn’t hear or see any alarms. He’s been tasked with working on his logs and paperwork when the truck’s on autopilot so as to increase his efficiency, and isn’t paying close attention to driving conditions or the truck’s operating behavior. Scary thought.

    I know of many instances where a truck’s electrical and light systems are inspected and functioning perfectly when it leaves the lot. A couple of miles down the road a light is out. What happened? A pothole, or a railroad crossing, or just the crumbling infrastructure of our roads and bridges. I suspect the sensors needed in these systems are probably more delicate than a simple light bulb.

  9. “These very issues of what factors are modeled when the automated systems are programmed are what concern me most with the coming of self driving trucks.”

    Here’s one way they’re looking at it

    http://www.automotiveitnews.org/articles/share/641232/

    The idea here is to have a fleet of cars, each with a mix of 12 cameras and radar or lidar sensors, driven around by humans, and data recorded by the team uploaded to a specialized cloud service.

    This data can be used to train a neural network to understand life on the road, and how to react to situations, objects and people, from the way humans behave while driving around. The Drive PX features two beefy Nvidia Tegra X1 processors, aimed at self-aware cars.

    Nvidia engineers have already apparently captured at least 40 hours of video, and used the Amazon Mechanical Turk people-for-hire service to get other humans to classify 68,000 objects in the footage. This information – the classification and what they mean in the context of driving – is fed into the neural network. Now the prototype-grade software can pick out signs in the rain, know to avoid cyclists, and so on.

    So the self-learning neural network will get constantly updated by a legion of crowd-sourcing human slaves… er Turks who will help identify 99% of all those situations and conditions that typical drivers encounter.

  10. Jeff,

    I suspect the sensors needed in these systems are probably more delicate than a simple light bulb.

    You suspect wrong: a ‘simple light bulb’, as long as by ‘simple’ you mean ‘incandescent’, is just about the most fragile thing on any vehicle.

  11. “Seeing Like a State”

    D*mn you, Ed! My reading list is already going to outlive me… and then you go and add another very interesting title to my field of view. Dang dang dang…

  12. Kirk,

    Actually, the electrical wiring to the various lights fails as often as the bulb. My point is the same. The beating the equipment takes on the road is very hard on the truck. Stuff that seems like it shouldn’t break sometimes does. What happens then?

  13. We have similar debates in the operation of nuclear power plants. The push is on for “passively safe reactors” which might not be worth the effort and the compromises.

    “Positive Train Control” or PTC is being imposed on the railroads. I think it interesting that PTC would not have prevented the most famous American train wreck – Casey Jones.

  14. The story of the Air France crash in the Atlantic is an interesting one. The best account I have found is The Vanity Fair one.

    Even today—with the flight recorders recovered from the sea floor, French technical reports in hand, and exhaustive inquests under way in French courts—it remains almost unimaginable that the airplane crashed. A small glitch took Flight 447 down, a brief loss of airspeed indications—the merest blip of an information problem during steady straight-and-level flight. It seems absurd, but the pilots were overwhelmed.

    There were human factors, especially with the captain, but it was a failure of technology and a failure of training.

    To put it briefly, automation has made it more and more unlikely that ordinary airline pilots will ever have to face a raw crisis in flight—but also more and more unlikely that they will be able to cope with such a crisis if one arises. Moreover, it is not clear that there is a way to resolve this paradox. That is why, to many observers, the loss of Air France 447 stands out as the most perplexing and significant airline accident of modern times.

  15. Another interesting aspect is the push to retire the experienced pilots when automation came in. This is also addressed in the article.

    First, you put the Clipper Skipper out to pasture, because he has the unilateral power to screw things up. You replace him with a teamwork concept—call it Crew Resource Management—that encourages checks and balances and requires pilots to take turns at flying. Now it takes two to screw things up.

    and:

    Nonetheless there are worries even among the people who invented the future. Boeing’s Delmar Fadden explained, “We say, ‘Well, I’m going to cover the 98 percent of situations I can predict, and the pilots will have to cover the 2 percent I can’t predict.’ This poses a significant problem. I’m going to have them do something only 2 percent of the time. Look at the burden that places on them. First they have to recognize that it’s time to intervene, when 98 percent of the time they’re not intervening. Then they’re expected to handle the 2 percent we couldn’t predict. What’s the data? How are we going to provide the training? How are we going to provide the supplementary information that will help them make the decisions? There is no easy answer. From the design point of view, we really worry about the tasks we ask them to do just occasionally.”

    I said, “Like fly the airplane?”

    Yes, that too. Once you put pilots on automation, their manual abilities degrade and their flight-path awareness is dulled: flying becomes a monitoring task, an abstraction on a screen, a mind-numbing wait for the next hotel.

    This reminds me a bit of “managed care” when it came in the 80s. Older physicians were often pushed out as “Unsuited for managed care.” That means they used independent judgement that didn’t always arrive at the cheapest solution.

    Well, I’m flying to Charleston Tuesday and then on to Chicago Friday so I hope the computer and Flight Management System doesn’t burp.
