Rehearsing Your Next Nightmare

By Rob Austin

In November 2002, the computer systems at Boston's renowned Beth Israel Deaconess Medical Center (BIDMC) lost all ability to communicate. As is often the case with such extensive failures, the cause was far from obvious, but the effects were clear: nothing worked.
 
Computerized patient histories, systems to prevent drug interactions and allergic reactions, the means of accessing and enhancing x-rays and other diagnostic images -- all these and many other critical resources became unavailable. According to John Halamka, BIDMC's CIO, who has contributed greatly to learning in our industry by discussing this incident so candidly, "In an instant, we became a hospital of the 1970s."
 
Talk about a business continuity problem. Patients were expected to be quite patient -- the event at BIDMC lasted three and a half days. It showed everyone at the hospital just how dependent on IT systems healthcare operations had become. No one had deliberately decided the hospital's systems would become so vital. But as the hospital added world-class systems one by one, dependence had grown. At the same time, the network had grown too, little by little, until it was well out of spec.
 
The technical causes of the outage at BIDMC are complex. In essence, the out-of-spec network caused a problem for the algorithm used to compute new communication routes when a component fails. But the lessons we can draw from the BIDMC event are less technical than managerial.
 
Interestingly, the problem at BIDMC arose not from a lack of redundancy or inadequate investment in technology assets -- the hospital's network components were redundant. Ironically, the problem that caused the outage could happen only to redundant networks. The complexity of the network, which had evolved over time through a combination of incremental growth and deliberate decisions to add redundancy, was a direct cause of the system's unpredictable behavior. Thus, this story argues for a formal management process to monitor the aggregate effects of incremental systems changes.
 
But there is a deeper point here, too. Charles Perrow -- whose highly recommended 1984 book Normal Accidents explains how the very actions we take to make our systems more robust contribute to overall complexity and increase the likelihood of what he calls "normal" accidents -- suggests that accidents of this kind are inevitable. What then should we do?
 
The medical staff at BIDMC reacted brilliantly to the outage with a combination of added staffing and heroic individual effort (no adverse incidents resulted from the outage), while the IT staff and Cisco Systems got the system back up and running (and back in spec). One major success factor: the Y2K plans the hospital had prepared three years earlier. But those plans had to be dug out of deep storage. Once located, they proved critical in a way that managers had never anticipated after January 1, 2000, had passed.
 
The BIDMC experience argues, of course, for serious planning for possible incidents and reminds us of the need to invest in accident avoidance and response. The "normal accidents" character of this incident, however, teaches us something else: It's important to rehearse the response to failures, especially those you can't precisely predict. Organizations don't like to rehearse potential disasters. Rehearsals are costly and inconvenient. But without them, readiness for incidents -- which will happen sooner or later -- is incomplete, however earnestly a company invests in redundancy or makes plans on paper.
 
Rob Austin is a professor at Harvard Business School and chair of "Delivering Information Services," the school's CIO Executive Education program.

CIO Strategy Center is a daily editorial resource offering innovative insights and strategies for building an integrated, secure and resilient IT infrastructure.
