Sunday, October 03, 2010

Single cause of failure

In May 2010, there was a mysterious plunge in US share prices. At the time, an SEC investigation found that no single cause was to blame [BBC News 12 May 2010]. Following further investigation, the SEC now believes that the "flash crash" did indeed have a single cause, which it has traced to an algorithmic trade of around $4 billion, executed by a single trader's computer program [BBC News 1 October 2010]. The authorities have since introduced "circuit breakers", which may help to mitigate the effects of such algorithmic trades in future, and there are hints that further measures could be on the way.

I haven't looked at the detail of these circuit breakers, but my architectural instincts make me uncomfortable with the idea of bolting an additional mechanism onto an already complicated system in order to mitigate a single point of failure. Surely these circuit breakers will themselves have perverse consequences, and will be anticipated (and perhaps even deliberately triggered) by ever more sophisticated algorithms.

One of the key principles of distributed systems architecture is the avoidance of a single point of failure. Technical architects tend to focus on technical failure, although security experts often remind them of the equal dangers of socially engineered points of failure in technical systems. Meanwhile, enterprise architects need to pay attention to the possible failure modes of the business and its ecosystem, from a business and sociotechnical perspective.

In the case of the "flash crash", the key question for market regulators and market players is about the resilience and intelligence of the market in the face of certain classes of activity. Although the finger of blame is now pointing to a piece of software, the architectural question here concerns not software architecture but market architecture: the market regarded as a complex sociotechnical system in which pieces of software interact with other social and economic actors. Architects should beware of thinking that "single point of failure" is merely a technical question.
