Wednesday, November 26, 2008

From Complex Events to Predictive Analytics

Beth Gold-Bernstein (eBizQ) reckons that predictive analytics, which she describes as "a capability based on complex event processing (CEP)", is A Good Investment for a Down Economy.

Hans Gilde is scornful. In Predictive analytics, CEP and your local mechanic, he points out that the key capability for an effective predictive system is "a solid foundation for ongoing critical analysis of the effectiveness of your analytics".

I would extend his scorn to anyone who uses the term "real-time" in a sloppy and confused manner. I keep reading about "real-time complex event processing", but in many cases it looks as if much of the complex processing is actually done at design-time, leaving the real-time portion fairly quick and simple. And that's probably as much as the current state-of-the-art technology can manage.

To do predictive analytics properly, as Hans says, you need critical analysis - what is sometimes called double-loop learning. How does a CEP system learn new patterns? How does a CEP system get recalibrated as the environment evolves? How do we control (and reduce) the level of false positives and false negatives? And how much of this analysis does it make sense to even attempt in real-time?
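To make the false positive and false negative question concrete, here is a minimal sketch of the kind of offline check a review loop might run over past alerts. The names (Outcome, error_rates, needs_review) and the threshold values are my own illustrative assumptions, not taken from any CEP product, and the sketch assumes someone has judged after the fact whether each alert was justified.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    alerted: bool  # did the rule fire for this case?
    actual: bool   # did the condition really occur (judged after the fact)?


def error_rates(outcomes):
    """Return (false_positive_rate, false_negative_rate) for reviewed cases."""
    fp = sum(1 for o in outcomes if o.alerted and not o.actual)
    fn = sum(1 for o in outcomes if not o.alerted and o.actual)
    negatives = sum(1 for o in outcomes if not o.actual)
    positives = sum(1 for o in outcomes if o.actual)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


def needs_review(outcomes, max_fpr=0.05, max_fnr=0.10):
    """Flag a rule for human recalibration; the thresholds are arbitrary examples."""
    fpr, fnr = error_rates(outcomes)
    return fpr > max_fpr or fnr > max_fnr
```

Note that nothing here runs in real-time: the labels only exist once somebody has looked back at what actually happened, and the flag is a prompt for a person to look at the rule rather than an automatic fix.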

So I looked at a recent paper in the IBM Systems Journal, Generating real-time complex event-processing applications (Volume 47, Number 2, 2008).

"Complex event processing (CEP) enables the extraction of meaningful and actionable information from event streams. The CEP technology is intended to provide applications with a flexible and scalable mechanism for constructing condensed, refined views of the data. CEP correlates the data (viewed as event streams) in order to detect and report meaningful predefined patterns, thus supplying the application with an effective view of the accumulated incoming data (events), and allowing the application to react to the detections by executing actions."
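As a rough illustration of what "predefined patterns" can mean in practice, here is a toy detector of my own devising; it is not the IBM approach or any vendor's API. The class name, the parameters and the example event type are all hypothetical, and it assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque


class ThresholdPattern:
    """Fires when `count` events of one type occur for a key within a time window."""

    def __init__(self, event_type, count, window_seconds):
        # Everything "complex" is fixed here, at design-time.
        self.event_type = event_type
        self.count = count
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # key -> timestamps of recent matches

    def on_event(self, timestamp, key, event_type):
        """Process one event; return True if the pattern is detected."""
        if event_type != self.event_type:
            return False
        times = self._recent[key]
        times.append(timestamp)
        # Discard matches that have fallen out of the window.
        while times and timestamp - times[0] > self.window_seconds:
            times.popleft()
        if len(times) >= self.count:
            times.clear()  # reset so a single burst is reported once
            return True
        return False
```

For example, ThresholdPattern("login_failed", 3, 60) would report a detection on the third failed login for the same key within a minute. The runtime step is a filter, a queue update and a comparison, while the choice of event type, count and window was made by a person beforehand.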

The IBM paper talks about some of the challenges of achieving genuine real-time data extraction based on predefined patterns, and talks about the possibility of real-time CEP applications, but is careful not to make any larger claims.

Other writers are not so careful, and you can find lots of websites promising real-time predictive analytics or real-time decision-making. But there are not only technological but conceptual limits to what you can do in real-time.


Marco Seiriö said...

I think many CEP vendors have this as their next item on the todo list. Many customers tell us that they really don't know what rules to create, they just "want to know when something unexpected" happens.

I have been looking into ideas from military information fusion, Neural Networks, Kalman filters and other kinds of useful tools to automatically create and tune rules based on historical data and the current situation.

The good news is that there is lots of mature technology that you can use in the CEP context to solve this problem, provided that you gave this some thought when designing your CEP product in the first place.

I'm not aware of any CEP vendor with this capability yet, but I'm pretty sure there's more to expect in this area in the coming years.

One other area which I hope will see more action is formal verification or model checking of the rules/queries in the CEP tools. With a couple of thousand rules it's not very easy to verify that everything is working as you'd expect.

Hans said...

Hey there, well that post came out a little more scornful than I intended.

A couple comments on your post:

When it comes to analytics, the only distinction that matters is some performance criterion. For prediction, it will have something to do with metrics on residuals. For optimization, it can be tougher, but there are various techniques.

But is having more complex processing done in real-time really a goal? I would think that it's only a goal to the extent that it does a better job. And who says that will be the case? This seems like an unjustified generalization.

Also about Marco's comments on self-maintaining algorithms. We must remember that there is no solution or algorithm that will work for every problem. A prediction algorithm that stays efficient and unbiased for one problem can go radically wrong for another.

And unfortunately, understanding why an algorithm works in one case but not in another takes a lot of training. Any product that comes with learning algorithms will take a lot of training to use. No one knows whether there will ever be a product that just sucks up your data and spits out good stuff without requiring a degree to operate properly.

Richard Veryard said...

Hans says

But is having more complex processing done in real-time really a goal? I would think that it's only a goal to the extent that it does a better job. And who says that will be the case? This seems like an unjustified generalization.

Absolutely. So why this fetish about "real-time complex event processing" - mostly from people who don't seem to know or care what "real-time" really means.

My point is that analytics is something that people do with the aid of tools (a sociotechnical system) rather than something that tools do for you. I'd like to see good tools making analytics accessible to a broader range of intelligent users with a relatively small amount of training, rather than being restricted to a limited number of highly trained specialists. But I don't believe that any interesting class of analytical problems can do without human intelligence for significant periods of time. Putting even the most sophisticated systems onto autopilot introduces risk, as we saw in the recent meltdown in the financial markets, and I have seen no evidence to support the naive claim that this meltdown would have been avoided if only the automatic trading systems had been faster and cleverer.

Richard Veryard said...

Marco says

Many customers tell us that they really don't know what rules to create, they just "want to know when something unexpected" happens.

Maybe that's what they are asking for, but that doesn't mean it is possible. However, incoherent or logically impossible requirements can be an excellent starting point for further analysis.

John Dobson and I once wrote a paper on Third Order Requirements Engineering. See brief extract on slideshare

Reasoning about the unexpected is a fascinating challenge, and I should be delighted to talk to any vendor who wants to build better tools for intelligent users.

Hans said...

Yes, I think I agree with you then.

Note that the analytics causing the meltdown were not in the trading systems. They were in the risk systems that (a) improperly judged certain risks of default and (b) failed to consider the fact that no matter how good the debt is, if it's priced by the market then it's only worth what the market will pay.

If by "clever" you meant "not ignoring obvious problems" then these analytics systems (also known as Excel spreadsheets) could have prevented the meltdown AFAIK.

Richard Veryard said...

Hans, you are repeating the claim that some kind of systems could have prevented the meltdown. But where's the evidence for this? Crashes like this happen from time to time in economic history: how are a few spreadsheets going to stop them? I'm sorry, but that's a bit like saying that the French Revolution could have been prevented if only the French aristocracy had had Twitter.

Even if you could produce a computer system that perfectly calculated the True Value of every asset in the world, traders wouldn't use them because they wouldn't make enough money. What traders want is a recursive system that calculates what every other trader thinks the value is, in order to spot market movements a few moments before everyone else. But why would you expect the widespread possession of such systems to make the global economy any less unstable?

Hans said...

I meant that if people had not deluded themselves into ignoring obvious risks, they would have put more realistic calculations in their spreadsheets, resulting in better judgment that would, AFAIK, have prevented the credit crisis. Although it would have also had other effects, so who can say which path was better.

As you say, the decision is always up to people.