Showing posts with label aviation. Show all posts

Friday, April 09, 2021

Near Miss

A serious aviation incident in the news today. A plane took off from Birmingham last year with slightly less take-off thrust than it needed, because the weight of the passengers was incorrectly estimated. This is being described as an IT error.

As Cathy O'Neil's maxim reminds us, algorithms are opinions embedded in code. The opinion in this case was the assumption that the prefix Miss referred to a female child. According to the official report, published this week, this is how the prefix is used in the country where the system was programmed.

On this particular flight, 38 adult women were classified as Miss, so the algorithm estimated their weight as 35 kg instead of 69 kg.
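Taking only the figures above, the misclassification understated the passenger weight by 38 × (69 - 35) = 1,292 kg. The sketch below reproduces that arithmetic, assuming a simple lookup from title to standard weight. The `TITLE_TO_CATEGORY` mapping and the function name are hypothetical; they are not taken from the airline's system or from the official report.

```python
# Standard weights (kg) quoted in the post; other categories omitted for brevity.
STANDARD_WEIGHT_KG = {"adult_female": 69, "child": 35}

# Hypothetical mapping that embeds the faulty opinion: "Miss" means a child.
TITLE_TO_CATEGORY = {"Mrs": "adult_female", "Ms": "adult_female", "Miss": "child"}


def estimated_passenger_weight(titles: list[str]) -> int:
    """Sum the standard weights assigned to passengers, inferred from their titles."""
    return sum(STANDARD_WEIGHT_KG[TITLE_TO_CATEGORY[t]] for t in titles)


# 38 adult women who checked in with the title "Miss".
titles = ["Miss"] * 38
estimated = estimated_passenger_weight(titles)    # 38 * 35 = 1330
actual = 38 * STANDARD_WEIGHT_KG["adult_female"]  # 38 * 69 = 2622
underestimate = actual - estimated

print(f"Estimated {estimated} kg, actual {actual} kg, short by {underestimate} kg")
# Estimated 1330 kg, actual 2622 kg, short by 1292 kg
```

The code contains no arithmetic error at all: every number it produces follows correctly from the opinion baked into the mapping.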

The calculation error was apparently compounded by several human factors.

  • A smaller discrepancy had been spotted and corrected on a previous flight. 
  • The pilot noticed that there seemed to be an unusually high number of children on the flight, but took no action because the pandemic had disrupted normal expectations of passenger numbers.
  • The software was being upgraded, but the status of the fix at the time of the flight was unclear. There were other system-wide changes being implemented at the same time, which may have complicated the fix.
  • Guidance to ground staff to double-check the classification of female passengers was not properly communicated and followed, possibly due to weekend shift patterns.

As Dan Nguyen points out, there have been previous incidents resulting from incorrect assumptions about passenger weight. But I think we need to distinguish between factual errors (what is the average weight of an adult passenger) and classification errors (what exactly does the Miss prefix signify).

There is an important lesson for data management here. You may have a business glossary or data dictionary that defines an attribute called Prefix and provides a list of permitted values. But if different people (different parts of your organization, different external parties) understand and use these values to mean different things, there is still scope for semantic confusion unless you make the meanings explicit.
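A minimal sketch of what making the meanings explicit might look like, assuming a hypothetical `PREFIX_DEFINITIONS` data dictionary. Each permitted value of Prefix carries a stated meaning, and the passenger category used for weight calculations is captured as a separate attribute rather than inferred from the prefix.

```python
from dataclasses import dataclass
from enum import Enum


class PassengerCategory(Enum):
    """Captured explicitly (e.g. from age at check-in), never inferred from Prefix."""
    ADULT = "adult"
    CHILD = "child"


# Hypothetical data dictionary: permitted Prefix values, each with its meaning stated.
PREFIX_DEFINITIONS = {
    "Mr": "Courtesy title; carries no information about age",
    "Mrs": "Courtesy title; carries no information about age",
    "Ms": "Courtesy title; carries no information about age",
    "Miss": "Courtesy title used by adults and children alike; carries no information about age",
}


@dataclass(frozen=True)
class Passenger:
    prefix: str
    category: PassengerCategory

    def __post_init__(self) -> None:
        # Reject values that the data dictionary does not define.
        if self.prefix not in PREFIX_DEFINITIONS:
            raise ValueError(f"Undefined prefix: {self.prefix!r}")


passenger = Passenger("Miss", PassengerCategory.ADULT)
print(passenger.category)  # PassengerCategory.ADULT
```

The point is less the code than the separation: a downstream weight calculation can only ask for `category`, so it has no reason to guess from the title.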



AAIB Bulletin 4/2021 (April 2021) https://www.gov.uk/government/publications/air-accident-monthly-bulletin-april-2021

Tui plane in ‘serious incident’ after every ‘Miss’ on board was assigned child’s weight (Guardian, 9 April 2021)

For further discussion and related examples, see Dan Nguyen's Twitter thread https://twitter.com/dancow/status/1380188625401434115



Monday, April 22, 2019

When the Single Version of Truth Kills People

@Greg_Travis has written an article on the Boeing 737 Max Disaster, which @jjn1 describes as "one of the best pieces of technical writing I’ve seen in ages". He explains why normal airplane design includes redundant sensors.

"There are two sets of angle-of-attack sensors and two sets of pitot tubes, one set on either side of the fuselage. Normal usage is to have the set on the pilot’s side feed the instruments on the pilot’s side and the set on the copilot’s side feed the instruments on the copilot’s side. That gives a state of natural redundancy in instrumentation that can be easily cross-checked by either pilot. If the copilot thinks his airspeed indicator is acting up, he can look over to the pilot’s airspeed indicator and see if it agrees. If not, both pilot and copilot engage in a bit of triage to determine which instrument is profane and which is sacred."

and redundant processors, to guard against a Single Point of Failure (SPOF).

"On the 737, Boeing not only included the requisite redundancy in instrumentation and sensors, it also included redundant flight computers—one on the pilot’s side, the other on the copilot’s side. The flight computers do a lot of things, but their main job is to fly the plane when commanded to do so and to make sure the human pilots don’t do anything wrong when they’re flying it. The latter is called 'envelope protection'."

But ...

"In the 737 Max, only one of the flight management computers is active at a time—either the pilot’s computer or the copilot’s computer. And the active computer takes inputs only from the sensors on its own side of the aircraft."

As a result of this design error, 346 people are dead. Travis doesn't pull his punches.

"It is astounding that no one who wrote the MCAS software for the 737 Max seems even to have raised the possibility of using multiple inputs, including the opposite angle-of-attack sensor, in the computer’s determination of an impending stall. As a lifetime member of the software development fraternity, I don’t know what toxic combination of inexperience, hubris, or lack of cultural understanding led to this mistake."
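To make Travis's point concrete, here is a minimal sketch contrasting a stall check that reads one angle-of-attack sensor with one that cross-checks both, in the spirit of the pilots' triage he describes. It is not MCAS logic: the function names and both thresholds are illustrative assumptions.

```python
from typing import Optional

# Illustrative thresholds in degrees; assumptions, not Boeing's actual values.
STALL_AOA_DEG = 14.0
MAX_DISAGREEMENT_DEG = 5.0


def single_sensor_stall(own_side_aoa: float) -> bool:
    """The design Travis criticises: trust whichever sensor is on the active side."""
    return own_side_aoa > STALL_AOA_DEG


def cross_checked_stall(left_aoa: float, right_aoa: float) -> Optional[bool]:
    """Use both sensors; return None when they disagree too much to trust either."""
    if abs(left_aoa - right_aoa) > MAX_DISAGREEMENT_DEG:
        return None  # disagreement: caller should alert the crew rather than act automatically
    # Only report an impending stall when both sensors indicate one.
    return min(left_aoa, right_aoa) > STALL_AOA_DEG


# A faulty left sensor reading 22 degrees while the right reads 3 degrees:
print(single_sensor_stall(22.0))       # True  (acts on bad data)
print(cross_checked_stall(22.0, 3.0))  # None  (disagreement detected)
```

Returning `None` rather than guessing mirrors the cockpit cross-check: when the instruments disagree, the software should not decide on its own which one is sacred.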

He may not know what led to this specific mistake, but he can certainly see some of the systemic issues that made this mistake possible. These include the widespread idea that software provides a cheaper and quicker fix than getting the hardware right, together with what he calls cultural laziness.

"Less thought is now given to getting a design correct and simple up front because it’s so easy to fix what you didn’t get right later."

Agile, huh?


Update: CNN finds an unnamed Boeing spokesman to defend the design.

"Single sources of data are considered acceptable in such cases by our industry".

OMG, does that mean that there are more examples of SSOT elsewhere in the Boeing design!?




How a Single Point of Failure (SPOF) in the MCAS software could have caused the Boeing 737 Max crash in Ethiopia (DMD Solutions, 5 April 2019) - provides a simple explanation of Fault Tree Analysis (FTA) as a technique to identify SPOF.

Mike Baker and Dominic Gates, Lack of redundancies on Boeing 737 MAX system baffles some involved in developing the jet (Seattle Times, 26 March 2019)

Curt Devine and Drew Griffin, Boeing relied on single sensor for 737 Max that had been flagged 216 times to FAA (CNN, 1 May 2019) HT @marcusjenkins

George Leopold, Boeing 737 Max: Another Instance of ‘Go Fever’? (29 March 2019)

Mary Poppendieck, What If Your Team Wrote the Code for the 737 MCAS System? (4 April 2019) HT @CharlesTBetz with reply from @jpaulreed

Gregory Travis, How the Boeing 737 Max Disaster Looks to a Software Developer (IEEE Spectrum, 18 April 2019) HT @jjn1 @ruskin147

And see my other posts on the Single Source of Truth.


Updated 2 May 2019

Monday, August 08, 2016

Single Point of Failure (Airlines)

Large business-critical systems can be brought down by power failure. Who knew?

In July 2016, Southwest Airlines suffered a major disruption to service, which lasted several days. It blamed the failure on "lingering disruptions following performance issues across multiple technology systems", apparently triggered by a power outage.
In August 2016 it was Delta's turn.

Then there were major problems at British Airways (Sept 2016) and United (Oct 2016).



The concept of "single point of failure" is widely known and understood. And the airline industry is rightly obsessed by safety. They wouldn't fly a plane without backup power for all systems. So what idiot runs a whole company without backup power?
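The check the concept implies can be sketched in a few lines: fail each component in turn and see whether the whole system goes down. This is a simplified version of what Fault Tree Analysis calls single-component minimal cut sets. The component names and the two system models below are hypothetical, and a real analysis would also consider combinations of failures and common causes.

```python
from typing import Callable, Iterable

SystemModel = Callable[[set[str]], bool]  # True if the system works with these components


def single_points_of_failure(components: Iterable[str], is_up: SystemModel) -> list[str]:
    """Return the components whose failure alone brings the whole system down."""
    everything = set(components)
    if not is_up(everything):
        raise ValueError("System is down even with every component working")
    return sorted(c for c in everything if not is_up(everything - {c}))


def no_backup(working: set[str]) -> bool:
    # Hypothetical data centre: one power feed, one network link.
    return "mains_power" in working and "network" in working


def with_backup(working: set[str]) -> bool:
    # Hypothetical data centre: either power source and either network link will do.
    powered = "mains_power" in working or "backup_generator" in working
    connected = "network_a" in working or "network_b" in working
    return powered and connected


print(single_points_of_failure({"mains_power", "network"}, no_backup))
# ['mains_power', 'network']
print(single_points_of_failure(
    {"mains_power", "backup_generator", "network_a", "network_b"}, with_backup))
# []
```

The second model only looks safe because of how it is drawn. If the generator and the mains feed share a single switch, modelling that switch as its own component would expose it as a new single point of failure.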

We might speculate about what degree of complacency or technical debt can account for this pattern of adverse incidents. I haven't worked with any of these organizations myself. However, my guess is that some people within each organization were aware of the vulnerability, but this awareness somehow didn't penetrate the management hierarchy. (In terms of orgintelligence, a short-sighted board of directors becomes the single point of failure!) I'm also guessing it's not quite as simple and straightforward as the press reports and public statements imply, but that's no excuse. Management is paid (among other things) to manage complexity. (Hopefully with the help of system architects.)

If you are the boss of one of the many airlines not mentioned in this post, you might want to schedule a conversation with a system architect. Just a suggestion.


American Airlines Gradually Restores Service After Yesterday's Power Outage (PR Newswire, 15 August 2003)

British Airways computer outage causes flight delays (Guardian, 6 Sept 2016)

Delta: ‘Large-scale cancellations’ after crippling power outage (CNN Wire, 8 August 2016)

Gatwick Airport Christmas Eve chaos a 'wake-up call' (BBC News, 11 April 2014)

Simon Calder, Dozens of flights worldwide delayed by computer systems meltdown (Independent, 14 October 2016)

Jon Cox, Ask the Captain: Do vital functions on planes have backup power? (USA Today, 6 May 2013)

Jad Mouawad, American Airlines Resumes Flights After a Computer Problem (New York Times, 16 April 2013)

Marni Pyke, Southwest Airlines apologizes for delays as it rebounds from outage (Daily Herald, 20 July 2016)

Alexandra Zaslow, Outdated Technology Likely Culprit in Southwest Airlines Outage (NBC News, 12 October 2015)


Related posts: Single Point of Failure (Comms) (September 2016), The Cruel World of Paper (September 2016), When the Single Version of Truth Kills People (April 2019)


Updated 14 October 2016. Link added 26 April 2019

Friday, February 19, 2010

Situation Awareness - Airline Example

Here is an example of Operational Awareness at Southwest Airlines, presented at a TIBCO user conference in 2008 or 2009.

Southwest Airlines is venturing into the rules development space with the Early Alert System. EAS enables SWA to have a real-time model of its entire aircraft fleet, tracking such activities as taxi in, taxi out, and in gate turn. It does this by maintaining a data structure representing physical assets and the activities they perform. Incoming data from those assets update the data structure, and rules react to the changes. We hope to use this paradigm going forward to use rules to monitor other aspects of the enterprise, enabling a more agile and efficient response to the airline's daily operating challenges. Our main points will be the Overview, Feature Review, Design, Other Uses of Rules at present by SWA and the future of rules at SWA. Greg Barton: Southwest Airlines, Senior Software Engineer
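The paradigm described in this abstract (a data structure representing assets and their activities, updated by incoming data, with rules reacting to each change) can be sketched as follows. This is not Southwest's EAS or any TIBCO API; the class names, activity labels, tail number and 30-minute threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AircraftState:
    """One physical asset in the fleet model and what it is currently doing."""
    tail: str
    activity: str = "in_gate_turn"  # e.g. "taxi_out", "airborne", "taxi_in"
    minutes_in_activity: int = 0


Rule = Callable[[AircraftState], Optional[str]]  # returns an alert message or None


def long_taxi_out(state: AircraftState) -> Optional[str]:
    """Hypothetical rule: flag an aircraft that has been taxiing out for over 30 minutes."""
    if state.activity == "taxi_out" and state.minutes_in_activity > 30:
        return f"{state.tail}: taxi-out exceeds 30 minutes"
    return None


class FleetModel:
    """Real-time model of the fleet; rules react whenever an update changes it."""

    def __init__(self, rules: list[Rule]) -> None:
        self.aircraft: dict[str, AircraftState] = {}
        self.rules = rules

    def on_update(self, tail: str, activity: str, minutes: int) -> list[str]:
        state = self.aircraft.setdefault(tail, AircraftState(tail))
        state.activity = activity
        state.minutes_in_activity = minutes
        # Evaluate every rule against the changed state and collect any alerts.
        alerts = [rule(state) for rule in self.rules]
        return [a for a in alerts if a is not None]


model = FleetModel(rules=[long_taxi_out])
print(model.on_update("N123XX", "taxi_out", 12))  # []
print(model.on_update("N123XX", "taxi_out", 35))  # ['N123XX: taxi-out exceeds 30 minutes']
```

Keeping rules as separate functions over a shared state model is what would let the same paradigm extend to "other aspects of the enterprise", as the abstract hopes: new monitoring means new rules, not a new system.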

Flight Explorer was offering situational and operational awareness to airlines and airports back in 2002:

  • real-time flight tracking information
  • keeping the operations staff abreast of pending arrivals and departures
  • delivering improved airfield safety and efficiency, as well as supporting billing

 

Kansas City, Dayton and Grand Rapids - Ford International Airports Take Off With Flight Explorer® to Track Aircraft and Provide Higher Level of Customer Service (Flight Explorer, 16 May 2002)

Thursday, March 27, 2008

Heathrow Terminal 5

Going Live

"... technical difficulties ... staff familiarisation ... teething problems ... technical defect ... brief system fault ... a few minor problems ... time to bed down ..."
A complex bundle of services relocated into a purpose-built space. Mutual recriminations between the collaborating parties (airport and airline). What lessons for SOA design, implementation and deployment here?

Identity, Privacy and Security

"BAA has had to drop controversial plans to fingerprint domestic passengers after the information commissioner expressed concerns about the move. The airport operator said fingerprinting was needed for border security. Instead it will take photographs while the proposal is discussed with the commissioner's office."
Some aspects of the design not approved by key stakeholders and regulators. Apparently the reason for this controversial requirement is that domestic and international travellers are mixed into the same space, rather than being kept separate, and this creates additional security risks. It was assumed that technology (fingerprinting or photography) would substitute for physical barriers. Perhaps BAA chiefs have been reading articles on "deperimeterization".

Meanwhile, one technical solution is quickly substituted for another, although I'm not sure I understand why taking (and presumably transmitting and storing) photographs of passengers is any less of an invasion of privacy than taking fingerprints.

Sources

BBC News, March 27th 2008

Related Posts 

Service-Oriented Security (August 2006)
Travel Hopefully (March 2008)
For Whose Benefit Are Airports Designed? (January 2013)

Tuesday, September 26, 2006

Lost Bags

I am sure I don't want to compete with Redmonk analyst Stephen O'Grady in the travel disruption stakes ...

... but the airlines have managed to lose my bag on three consecutive flights home from the USA. Once from Las Vegas via Los Angeles, and twice from St Louis via Chicago.

Is this a record? And what's it got to do with the service-oriented business?

On two of the three occasions, the problem was a tight connection. Well, it wasn't a tight connection when I checked in, but when I got to the gate for the first leg I discovered that the flight was seriously delayed. Helpful ground staff managed to get me onto a different flight on both occasions, in one case as the doors were closing, but of course the bag was already checked in for the delayed flight. So it was hardly a surprise that the bag got left behind.

Trying to outwit the airlines and their propensity for delay, I booked an extremely long connection at Chicago so that there was no chance of missing it. But what about my bag joining me on the same flight? Perhaps having too much time for a process leaves room for a different class of error.

Apart from the slight uncertainty about whether and when, it's quite nice not to have to carry a heavy bag home from the airport, and have someone deliver it to your door instead. I'm getting quite used to the procedure.

But I got some interesting glimpses of what happens when a complicated process across multiple organizations goes wrong.

1. Collaboration failures (and consequent mistrust) between airlines. When there is an alliance between two airlines, it's always the other airline's fault when something goes wrong. An employee of airline A couldn't change the status of my ticket on airline B's system; an employee of airline B rolled his eyes when I said that I had flown with airline A for the first part of my journey.

2. Supply chain visibility and trust. When I got onto one plane, I asked whether my bag was on the same flight. The ground staff looked up the tag on the computer and assured me it was. At the other end, a customer service rep said this information might have been unreliable. (Perhaps the ground staff had lied to me in order to get me onto the plane without a fuss - the lost bag would then be someone else's problem.)

3. The baggage recovery system works after a fashion, but it's highly inefficient. It surely can't be economic to send a delivery van to take bags to individual passengers.

4. Airline schedules are generally designed around hubs. So there are lots of connecting flights. But connections get delayed, and bags get lost. This must surely affect the overall economics of the hub-and-spoke model of air travel.


I might add some more notes later ...

Sunday, February 20, 2005

Electronic Flight Bag

Some initial remarks about the concept of the Electronic Flight Bag, first posted in February 2005.

The electronic flight bag (EFB) is a concept for reducing or replacing an airline pilot’s flight bag with an electronic equivalent. ICM has implemented this concept in the form of an integrated product, for sale to commercial airlines, in cooperation with Intel, Jeppesen and IONA.
IONA marketing sheet (pdf)

I have not looked at the detail of this product, so I cannot comment on it. What I want to explore in this post are some of the more general implications of this kind of product, in an SOA context.

From the product marketing material, I extract three points about the flight bag.
  1. It is a familiar object in commercial aviation.
  2. Commercial airlines would benefit from automating this object. ("The aviation industry has long recognized the benefits of replacing paper in the cockpit with an electronic document delivery system.")
  3. However, this is a complex object that has previously resisted automation. (Reference to limitations in technology along with regulatory and standardization issues.)
In previous waves of automation we saw isolated paperwork systems being colonized by the computer, and then we saw links being built between these isolated systems. Complex and highly connected objects like the flight bag missed out on these waves of automation, for several interlinked reasons.

To automate the flight bag, we need to conceptualize it as the living centre of a highly complex system - with secure and reliable links into many other operational systems both inside the airline and within other organizations (airport? plane maintenance? air traffic control? GIS?). The electronic flight bag therefore becomes a centre within a network of centres. It is this kind of automation that is characteristic of the SOA wave.

And this kind of automation calls for a different kind of process. We may start with a familiar concept (the paperless xyz - in this case, the paperless cockpit). But we then need to model the implications of this paperlessness in terms of the multiple functions of paper in a given ecosystem. We then have to form a collaboration between several suppliers to compose a solution that can then be composed in turn with the context of use in a given airline.

I have a mental image of something like a brain transplant - the transplanted brain will only work if you make all the right connections. Obviously there are technological advances (notably web services) that make this task a bit easier - but you have to think about the organizational implications as well, so it remains a challenge.

Although some of the marketing material describes the product as "complete", this appears to conflict with the statement that it is "designed for built-in integration and growth". From an SOA perspective, I hope the latter is true.

The electronic flight bag is a good step towards an SOA world (continuous network of services) if it satisfies four criteria:
  1. It creates a new service in its own right.
  2. It is composed of smaller services.
  3. It helps to complete one or more larger-scale services.
  4. It hints at future larger-scale services - it opens up future possibilities that we haven't yet fully formulated.
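One way to picture these criteria is as service composition. In the minimal sketch below, the service names, the flight number and the dictionary-of-strings interface are hypothetical, not the ICM/IONA product. The flight bag is assembled from smaller services (criteria 1 and 2), is itself used by a larger dispatch service (criterion 3), and can be extended later by composing in a new service (one reading of criterion 4).

```python
from typing import Callable

Service = Callable[[str], dict[str, str]]  # takes a flight number, returns named documents


def charts(flight: str) -> dict[str, str]:
    return {"charts": f"charts for {flight}"}


def weather(flight: str) -> dict[str, str]:
    return {"weather": f"weather briefing for {flight}"}


def compose(*services: Service) -> Service:
    """Build a new service whose result merges the results of smaller services."""
    def composite(flight: str) -> dict[str, str]:
        result: dict[str, str] = {}
        for service in services:
            result.update(service(flight))
        return result
    return composite


# Criteria 1 and 2: a service in its own right, composed of smaller services.
electronic_flight_bag = compose(charts, weather)


def dispatch(flight: str) -> dict[str, str]:
    # Criterion 3: the flight bag helps complete a larger-scale service.
    return {**electronic_flight_bag(flight), "status": f"{flight} ready for dispatch"}


# Criterion 4: new services can be composed in later without modifying existing ones.
def notams(flight: str) -> dict[str, str]:
    return {"notams": f"NOTAMs for {flight}"}


extended_flight_bag = compose(electronic_flight_bag, notams)
print(sorted(extended_flight_bag("XY123")))  # ['charts', 'notams', 'weather']
```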


Update (June 2020)

Since I wrote this post, the Electronic Flight Bag has been developed and adopted across many parts of the aviation business. See Wikipedia: Electronic Flight Bag (article first created January 2006)