A few weeks ago, Southwest Airlines suffered a major service disruption that lasted several days. It blamed the failure on "lingering disruptions following performance issues across multiple technology systems", apparently triggered by a power outage.
Today it's Delta's turn.
Click below for the latest update on our system and operation: https://t.co/bqV1qwahmz— Southwest Airlines (@SouthwestAir) July 21, 2016
New statement from Delta - power outage caused IT failure pic.twitter.com/trkQbpym05— Rory Cellan-Jones (@ruskin147) August 8, 2016
@ruskin147 A power outage *triggered* this issue, but poor planning & no HA *caused* it. Why can Netflix get this right but airlines cant?— Richard Price (@RichardPrice) August 8, 2016
I am no computer expert but it seems like a whole system crashing (3 separate airlines) points to bad design (single point of failure)? 3/— Dan DePodwin (@WxDepo) August 8, 2016
The concept of "single point of failure" is widely known and understood. And the airline industry is rightly obsessed by safety. They wouldn't fly a plane without backup power for all systems. So what idiot runs a whole company without backup power?
We might speculate about what degree of complacency or technical debt could account for this pattern of adverse incidents. I haven't worked with any of these organizations myself. However, my guess is that some people within each organization were aware of the vulnerability, but this awareness somehow didn't penetrate the management hierarchy. (In terms of orgintelligence, a short-sighted board of directors becomes the single point of failure!) I'm also guessing it's not quite as simple and straightforward as the press reports and public statements imply, but that's no excuse. Management is paid (among other things) to manage complexity. (Hopefully with the help of system architects.)
If you are the boss of one of the many airlines not mentioned in this post, you might want to schedule a conversation with a system architect. Just a suggestion.
American Airlines Gradually Restores Service After Yesterday's Power Outage (PR Newswire, 15 August 2003)
Delta: ‘Large-scale cancellations’ after crippling power outage (CNN Wire, 8 August 2016)
Gatwick Airport Christmas Eve chaos a 'wake-up call' (BBC News, 11 April 2014)
Jon Cox, Ask the Captain: Do vital functions on planes have backup power? (USA Today, 6 May 2013)
Jad Mouawad, American Airlines Resumes Flights After a Computer Problem (New York Times, 16 April 2013)
Marni Pyke, Southwest Airlines apologizes for delays as it rebounds from outage (Daily Herald, 20 July 2016)
Alexandra Zaslow, Outdated Technology Likely Culprit in Southwest Airlines Outage (NBC News, 12 October 2015)