Friday, March 14, 2008

Clouds and Clocks 3

Some CEP vendors (Aleri, Coral8, RuleCore) are boasting that their CEP software is "deterministic" or "consistent".

In other words, "clockwork" rather than "cloudlike". Mark Tsimelzon, President & CTO of Coral8, insists that we shouldn't think of an event cloud at all - he prefers the concept of an event stream.

But what do "determinism" and "consistency" actually mean here, and why are they important?

Both Aleri and Coral8 define "determinacy" to mean that a given set of input data always produces the same output data.

Aleri adds that the order in which the data arrives doesn't matter. This only follows from the definition of determinism if we assume that "set" means a pure set, without sequence. Perhaps a better word for this property is "commutativity". There is also the related property of idempotence, in which it doesn't matter if the same event or message appears more than once.

Aleri also defines "consistency" to mean that the output is predictable from the input - which sounds equivalent to their definition of "determinacy" - but they go on to say that consistency means that "the results are the ones we'd expect" - which sounds more like correctness to me.
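To make the distinction between order-independence and idempotence concrete, here is a minimal Python sketch. It is not taken from Aleri or Coral8; the event format (id, symbol, quantity) and the function names are my own assumptions for illustration only.

```python
from itertools import permutations

# Hypothetical event format: (event_id, symbol, quantity).
events = [(1, "ABC", 10), (2, "XYZ", 5), (3, "ABC", 7)]


def total_by_symbol(stream):
    """Sum quantities per symbol, counting each event id only once."""
    seen = set()
    totals = {}
    for event_id, symbol, qty in stream:
        if event_id in seen:
            # Idempotence: a repeated event is ignored.
            continue
        seen.add(event_id)
        totals[symbol] = totals.get(symbol, 0) + qty
    return totals


def last_quantity(stream):
    """Return the quantity of the last event to arrive (order-dependent)."""
    return stream[-1][2]


baseline = total_by_symbol(events)

# "Commutativity": every arrival order gives the same totals.
order_independent = all(
    total_by_symbol(list(p)) == baseline for p in permutations(events)
)

# Idempotence: delivering the first event twice changes nothing.
duplicate_safe = total_by_symbol(events + [events[0]]) == baseline

# A counter-example: the same events in a different order give a different answer.
order_matters = last_quantity(events) != last_quantity(events[::-1])

print(order_independent, duplicate_safe, order_matters)
```

Both functions are deterministic in the narrow sense (the same sequence in, the same result out), but only the first has the order-independence that Aleri seems to be claiming.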

Meanwhile, Mark of Coral8 introduces three degrees of determinism: non-deterministic, single-stream and multiple-stream determinism. He hopes his post has demystified the notion of determinism a little. Er, thanks Mark.

But why does determinism matter? Mark explains that this has to do with testing, especially regression testing. If we cannot compare results from two runs of the same stream of events, then our traditional approach to software testing breaks down. At least, that's the only reason he mentions in his blog.
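The regression-testing argument can be sketched in a few lines of Python. This is only an illustration under my own assumptions: detect_bursts is a made-up stand-in for a CEP rule, not Coral8's engine, and the recorded timestamps are invented. The point is that the rule reads time from the events themselves rather than from the wall clock, so replaying a recorded stream gives comparable results.

```python
def detect_bursts(timestamps, window=10, threshold=3):
    """Emit (timestamp, count) whenever `threshold` or more events
    fall within `window` seconds, using the event timestamps only."""
    recent = []
    alerts = []
    for ts in timestamps:
        # Keep only the events still inside the window, then add this one.
        recent = [t for t in recent if ts - t < window] + [ts]
        if len(recent) >= threshold:
            alerts.append((ts, len(recent)))
    return alerts


# A recorded stream of event timestamps (seconds), replayed twice.
recorded = [0, 2, 4, 30, 31, 50, 51, 52]

first_run = detect_bursts(recorded)
second_run = detect_bursts(recorded)

# The regression check: two runs over the same stream must agree.
print(first_run == second_run)
```

If the rule consulted the system clock or depended on thread scheduling, the two runs could differ, and a difference would no longer point reliably to a change in the software.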

This is rather like the value of the controlled experiment in science. A laboratory provides a controlled environment, in which most of the variables can be fixed, so that cause and effect can be isolated. Software testing also requires a controlled environment, so that any unexpected variation in results can be reliably traced to a specific cause - such as a software bug.

But complex systems are not particularly amenable to laboratory experiments, because the requisite complexity cannot be properly contained and controlled. And complex internet systems raise a similar challenge.

Determinism may be useful from a software engineering perspective, but it seems to deny some of the technological potential of complex event processing, including machine learning. Surely the whole point of machine learning is that the system doesn't always produce the same results on the same input, but is capable of producing improved results on the same input.

So I'd like to see a wider debate about the limits of determinism. I've been talking about this for a while in an SOA context - see my earlier post on Determinism.



Consistency and Determinacy (Aleri March 2008)

Marco Seiriö, Context Management (RuleCore June 2009)

Mark Tsimelzon, Determinism in CEP (Coral8 October 2007)

Mark Tsimelzon, Unclouding and streamlining your thinking about CEP use cases (Coral8 October 2007)


Links via Wayback Machine. Updated 2 December 2014.
