Showing posts with label distributed systems. Show all posts

Monday, December 22, 2008

SOA in an Offline World

There is a discussion on LinkedIn entitled SOA in an Offline World. The discussion has a technical focus: what kind of technology architecture to use with unreliable or intermittent network communications. Some design patterns, such as decoupling and asynchronous communications, can tolerate intermittent connection, and these patterns may be appropriate in a range of situations including military and medical.
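As a rough illustration of what decoupling and asynchronous communication can mean here, the following is a minimal store-and-forward sketch in Python. The names `Outbox` and `FlakyLink` are hypothetical and do not come from the LinkedIn discussion. It assumes the transport signals an outage by raising `ConnectionError`.

```python
from collections import deque


class Outbox:
    """Store-and-forward buffer: callers never wait on the network."""

    def __init__(self, send):
        # `send` is any callable that raises ConnectionError while offline.
        self._send = send
        self._pending = deque()

    def submit(self, message):
        """Queue a message, then try to deliver whatever is pending."""
        self._pending.append(message)
        return self.flush()

    def flush(self):
        """Deliver in order, stopping at the first failure.

        Returns the number of messages still waiting.
        """
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break
            # Remove a message only after the send has succeeded.
            self._pending.popleft()
        return len(self._pending)


class FlakyLink:
    """Hypothetical stand-in for an intermittent connection."""

    def __init__(self):
        self.up = False
        self.delivered = []

    def send(self, message):
        if not self.up:
            raise ConnectionError("link is down")
        self.delivered.append(message)
```

Because a message leaves the queue only after `send` returns, a send that fails partway through could be repeated later, so a receiver in a design like this would probably need to tolerate duplicates.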

A broader architectural question is whether we can use layering to hide these technical issues from the business-facing services in the layers above. In an ideal world, we would have a disruption-tolerant service platform, and the core business services and applications can then operate as if we had perfect and permanent connectivity.

In the 1990s, Peter Deutsch and James Gosling identified Eight Fallacies of Distributed Computing - invalid assumptions that inexperienced designers make when designing distributed systems. See Arnon Rotem Gal-Oz and Paul Vincent (TIBCO). These include the assumption of perfect and permanent connectivity.

By the way, Tim Bass questions whether anyone nowadays suffers from these fallacies. I think he's got a point, but perhaps the problem here is in the word "fallacy". If you ask a designer if he believes connectivity is going to be perfect, he will almost certainly say no. But if you inspect his designs, you may well find that he has failed to allow adequately for imperfect connectivity. Not so much a failure of belief as a failure of attention.

So the important question here is - can we compensate for the imperfections of distributed systems by having a really clever architecture, supported by really clever technology, so that the applications can operate as if we didn't have any of the problems of distribution at all? Some people may well believe that to be possible, either now or in the foreseeable future.

I don't think it's so easy. I have long argued that SOA needs to embrace three-valued logic. If this is done properly, it would make the whole architecture disruption-tolerant, not just the underlying layers. We also need to understand how disruption-tolerance affects the behaviour of the whole business-facing system. Not just technology then.
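To make the idea of three-valued logic a little more concrete, here is a small Python sketch. The post does not say which three-valued logic is intended; this assumes Kleene's strong logic, with UNKNOWN standing for "the service could not be reached". The names `Tri`, `from_reply` and `decide`, and the shipping scenario, are illustrative assumptions.

```python
from enum import Enum
from typing import Optional


class Tri(Enum):
    # Ordered so that FALSE < UNKNOWN < TRUE.
    FALSE = 0
    UNKNOWN = 1
    TRUE = 2


def tri_and(a: Tri, b: Tri) -> Tri:
    """Kleene AND: the weaker of the two values."""
    return Tri(min(a.value, b.value))


def tri_or(a: Tri, b: Tri) -> Tri:
    """Kleene OR: the stronger of the two values."""
    return Tri(max(a.value, b.value))


def tri_not(a: Tri) -> Tri:
    """Kleene NOT: swaps TRUE and FALSE, leaves UNKNOWN alone."""
    return Tri(2 - a.value)


def from_reply(reply: Optional[bool]) -> Tri:
    """Map a service reply to Tri; None means no reply arrived."""
    if reply is None:
        return Tri.UNKNOWN
    return Tri.TRUE if reply else Tri.FALSE


def decide(in_stock: Tri, credit_ok: Tri) -> str:
    """Hypothetical business rule with an explicit third outcome."""
    verdict = tri_and(in_stock, credit_ok)
    return {Tri.TRUE: "ship", Tri.FALSE: "reject", Tri.UNKNOWN: "hold"}[verdict]
```

The point of the sketch is that the business-facing rule itself has a third branch ("hold"), rather than relying on a lower layer to turn every missing answer into a yes or a no. Note also that a definite FALSE still settles the matter even when the other answer is unknown.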

Wednesday, March 12, 2008

Distributed Event Processing

Went to a good talk at the BCS SPA last week by Paul Vincent of TIBCO entitled "Advanced CEP and EDA - Why the buzz on Wall Street?"
I was particularly interested in Paul's emphasis on the distributed aspects of event processing.
  • Distributed event source - "A series of produced items fails at various QA stages, and their common attribute was a storage location - Multiple suppliers for a subcomponent are reporting delivery delays"
  • Distributed event cache - This is a key component of TIBCO's advanced CEP architecture
  • Distributed event consumption (destination) - Delivering rich situation awareness to field operations. This is related to the military doctrine of "Power to the Edge".
At my prompting, Paul talked briefly and circumspectly about the use of CEP in the military. He averred that at present the military are mostly using hand-built event processing rather than commercial products (but see later comment by Tim Bass).

Saturday, January 19, 2008

How Many Events?

Homeward bound, delayed in Zurich by the consequences of a crashed Boeing at Heathrow, Opher Etzion blogs On events in flight management. It seems to him "that more events related to flights happened relative to previous years".

For regular business travellers, the best thing we can say about a flight is that it was "uneventful". However good the catering, however large and comfortable the seats, however charming and sexy the air staff, none of this can make up for the inconvenience of delays or lost baggage. So when Opher counts the number of events, I assume he is referring to adverse events.

Of course there are countless events in flight management that are completely invisible to passengers - or even to the air crew - unless something goes wrong, and perhaps even then. In a distributed man-machine system, different parts of the system will be paying attention to different types of event, at different levels of granularity. (An air traffic controller deals with the event PlaneAwaitingLandingSlot; his manager deals with the aggregate event NumberOfPlanesAwaitingLandingSlotIsGreaterThanX.) We can think of this in terms of the architecture of attention - this calls for accurate modelling of events, leading to clear system design.
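The two levels of granularity in that example can be sketched in a few lines of Python. The event names are taken from the paragraph above; the fields, the `aggregate` function and the idea of counting distinct flights are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PlaneAwaitingLandingSlot:
    """Fine-grained event: what the controller attends to."""
    flight: str


@dataclass(frozen=True)
class NumberOfPlanesAwaitingLandingSlotIsGreaterThanX:
    """Aggregate event: what the manager attends to."""
    count: int
    threshold: int


def aggregate(
    waiting: Iterable[PlaneAwaitingLandingSlot], threshold: int
) -> Optional[NumberOfPlanesAwaitingLandingSlotIsGreaterThanX]:
    """Raise the aggregate event only when the threshold is exceeded."""
    # Count distinct flights so a repeated report is not counted twice.
    count = len({event.flight for event in waiting})
    if count > threshold:
        return NumberOfPlanesAwaitingLandingSlotIsGreaterThanX(count, threshold)
    return None
```

In this reading, the manager never sees individual planes at all; the aggregate event exists only when the situation crosses a level that deserves attention.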

Furthermore, as Opher points out, there is an important distinction between attention (detection) and action. "In some cases ... the detection is very easy, the complexity is in the response." Last week, Opher apparently experienced a failure of response. "The captain told us several times that he is pushing them to send buses, but they are not responsive ..." It is often easy to blame Them, but it isn't always clear who exactly is responsible.

So the other challenge in designing complex event-driven systems-of-systems is to specify the architecture of response. There are many people and systems and organizations involved in flight management, and a complex event may call for a complex and collaborative response.

So is there a universal and homogeneous event model shared by all the participants in flight management? I don't think we can reasonably insist on this. Instead, we have to allow for some kind of amplification and attenuation - where different subsystems may have different event models, and there is some mechanism for translating and coordinating across these different models. I think this approach is more flexible, more robust, and compatible with loosely coupled SOA.
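One possible reading of amplification and attenuation between event models, sketched in Python. Everything here is hypothetical: the event types, field names, flight identifier and recipients are invented for illustration, and a real translation mechanism would need far more than field filtering and fan-out.

```python
from typing import Dict, List

Event = Dict[str, object]


def attenuate(event: Event, keep: List[str]) -> Event:
    """Pass on only the fields the receiving subsystem attends to."""
    return {key: value for key, value in event.items() if key in keep}


def amplify(event: Event, recipients: List[str]) -> List[Event]:
    """Fan one event out into one addressed copy per recipient."""
    return [{**event, "to": recipient} for recipient in recipients]


# The airline's model carries detail that ground handling does not need.
airline_event: Event = {
    "type": "FlightDiverted",
    "flight": "XY123",  # hypothetical flight identifier
    "fuel_kg": 4200,
    "crew_hours": 9.5,
}

# Attenuation: ground handling sees a reduced view of the same event.
ground_view = attenuate(airline_event, ["type", "flight"])

# Amplification: one need becomes several requests for a response.
requests = amplify(
    {"type": "BusesNeeded", "flight": ground_view["flight"]},
    ["bus_operator", "terminal_desk"],
)
```

Each subsystem keeps its own event model, and the only shared commitment is the small translation step between neighbours, which is what keeps the coupling loose.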

Monday, July 23, 2007

Scale and Self-Organization

Werner Vogels, CTO of Amazon, knows a thing or two about implementing large-scale distributed SOA. So it's worth taking seriously when he tells us that

"For any truly scalable agile environment, self-organization is essential."

In his post Reading References, he goes on to recommend some papers and books on scalability and self-organization.
My own favourite book on self-organization remains Kevin Kelly's Out of Control, now available online on KK's website. See also the entry on Self-Organization in Principia Cybernetica.

If we accept that self-organization can be effective for addressing some classes of complex problem, the next step is to develop practical methods for engineering self-organizing systems. There are some promising research projects in this area, and there's a conference in Leipzig in September (SOAS 2007). I probably won't have time to attend the conference myself, but I shall look out for the proceedings. I wonder whether Amazon will be represented?

See also Werner Vogels on Scalability (April 2006)

Monday, May 21, 2007

Clouds and Clocks 2

Pat Helland regrets ...

Pat Helland has resumed blogging, having recently returned to Microsoft from a sojourn at Amazon. In his latest post SOA and Newton's Universe, he renounces the quasi-Newtonian paradigm of distributed systems to which he adhered for most of his 30-year career, and outlines an alternative paradigm with some resemblance to the Special Theory of Relativity.

The Newtonian paradigm of distributed systems is that we are trying to make many systems appear as the One System. This paradigm may be linked with the idea of the Global Schema or Universal Ontology. Helland contrasts this with an alternative paradigm of distributed systems, in which the systems are entirely dependent on the point of view of the observer, and there is no Universal Ontology. (I'd have wanted to use the term relativistic semantics here, but it has already been bagsied for academic linguistics - see for example Catfood and Buses.)

Helland sees this in terms of a relaxation of consistency. I disagree. Distributed systems-of-systems (including SOA) must follow a consistent logic - but not necessarily a traditional two-valued logic. Flexibility comes from being underdetermined (clouds) rather than overdetermined (clocks).

See my earlier posts Beyond Binary Logic and On Clouds and Clocks. See also Philip Boxer on Modelling Structure-Determining Processes.