Wednesday, September 03, 2008

Analyzing the Rusty Lawnmower

In my previous post on Responding to Uncertainty, I imagined a garden equipment company responding to an event, based on the appearance (via satellite image) of a rusty lawnmower in John's garden. In that post I was addressing a conceptual question about the nature of the event. In this post I want to discuss a more practical question about the structure and formation of an appropriate event processing network for the garden equipment company.

My starting point is the JDL model of information fusion, via Tim Bass, which would indicate the following event flow architecture.

0. Data collection.
    Obtain satellite images.

1. Situation Picture/Event Refinement.
    Identify and track some objects, provisionally labelling them "garden", "grass" and "rusty lawnmower".

2. Situation Refinement.
    By comparing the map references of these objects with John's address, we associate these objects with John. Furthermore, we can look at the history of these objects over an extended series of images, to see how long they've been there. We can also look at John's history, to see how long he has been living at this address and whether he has bought any gardening stuff in recent years. At this point, we may revise the object labels, because we realise that the rusty object could possibly equate to some other item John bought three years ago.

3. Opportunity/Impact Assessment.
    Infer John's intentions about gardening, and the possibility of selling him a new lawnmower.

4. Process Refinement.
    Monitor how many lawnmowers we have sold on this basis. Ideally, the process refinement should have some basis for monitoring false negatives (where we didn't spot the rusty lawnmower because it was half-hidden behind an overgrown bush) as well as false positives (where the rusty object was actually an expensive item of sculpture).
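
The five stages above can be sketched as a chain of functions. This is only an illustration under stated assumptions, not the JDL model's own definition: the function names, the `TrackedObject` type, the `REF-42` map reference and the hard-coded detections are all hypothetical, and treating an unconverted lead as a false positive is a simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrackedObject:
    label: str                   # provisional label, may be revised at stage 2
    map_ref: str                 # hypothetical map reference
    first_seen: int              # id of the earliest image showing the object
    owner: Optional[str] = None  # filled in by situation refinement


def collect_images():
    # Stage 0: stand-in for obtaining satellite images (hard-coded sample data).
    return [
        {"image_id": 1, "detections": [("rusty lawnmower", "REF-42")]},
        {"image_id": 2, "detections": [("rusty lawnmower", "REF-42"),
                                       ("grass", "REF-42")]},
    ]


def refine_events(images):
    # Stage 1: identify objects and track them across the series of images.
    tracked = {}
    for image in images:
        for label, map_ref in image["detections"]:
            key = (label, map_ref)
            if key not in tracked:
                tracked[key] = TrackedObject(label, map_ref, image["image_id"])
    return list(tracked.values())


def refine_situation(objects, addresses):
    # Stage 2: associate objects with customers by comparing map references.
    for obj in objects:
        obj.owner = addresses.get(obj.map_ref)
    return objects


def assess_opportunity(objects):
    # Stage 3: treat every customer with a rusty lawnmower as a sales lead.
    return sorted({o.owner for o in objects
                   if o.owner and o.label == "rusty lawnmower"})


def refine_process(leads, sales):
    # Stage 4: compare leads with actual sales (assumption: a lead that
    # did not result in a sale is counted as a false positive).
    hits = [lead for lead in leads if lead in sales]
    return {"leads": len(leads), "sales": len(hits),
            "false_positives": len(leads) - len(hits)}


def run_pipeline(addresses, sales):
    objects = refine_situation(refine_events(collect_images()), addresses)
    leads = assess_opportunity(objects)
    return leads, refine_process(leads, sales)
```

Calling `run_pipeline({"REF-42": "John"}, {"John"})` would associate the tracked objects with John and report one lead and one sale. False negatives are deliberately absent from the sketch, since counting them needs some independent source of truth about lawnmowers we failed to spot.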

The original JDL model refers to these stages as "levels", but the Data Fusion website suggests that this creates confusion because "a hierarchy does not exist from a conceptual point of view". (They might possibly be "levels" in the cybernetic sense, similar to how the term was used by Bateson.)

The model is interesting for several reasons.
  • The model supposedly reflects a "natural" cognitive process, including both realtime and historical situational processing.
  • The event flows can converge at each stage: thus we might have many (possibly heterogeneous) sources of data flowing into one situation picture, or many (possibly contradictory) situation pictures flowing into one impact assessment.
  • Each stage uses different quantities and qualities of knowledge and interpretation.
  • Therefore each stage has different degrees of mission sensitivity. An organization may be comfortable (or may have no choice) with using third party services as event sources, and may be willing to share basic situation pictures with its partners, but may regard the impact assessment as highly confidential.

From a system development perspective, we now have two views of the event processing network: a component-oriented view (featuring the concept of the rusty lawnmower, together with patterns and rules for recognizing and responding to instances of this concept) and a flow-oriented view (featuring the stages of the data fusion model). Opher Etzion talks about these two views in his post On Flow-Oriented and Component-Oriented Development. But we probably also need a business/mission view, which drives the requirements in the first place.

The garden equipment company probably didn't start the analysis from the concept of a rusty lawnmower. One possibility is that they started from the idea of some highly generic class of events that would prompt sales and marketing activity. This class of events is then decomposed (top-down) to identify a range of detailed events, some of which might be recognized from satellite images. Another possibility is that they worked backwards from the history of successful lawnmower sales, and then analyzed the relevant satellite images to try and identify any patterns that could be statistically correlated (bottom-up) with this sales history.
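
As a rough illustration of the bottom-up route, the association between "pattern seen in the image" and "lawnmower sold" can be measured with a phi coefficient over a 2x2 table. The function name and the sample data here are hypothetical, and a real analysis would need far more data and some care about confounding factors.

```python
from math import sqrt


def phi_coefficient(pattern_seen, sale_made):
    """Phi coefficient between two equal-length sequences of 0/1 flags."""
    pairs = list(zip(pattern_seen, sale_made))
    n11 = sum(1 for p, s in pairs if p and s)          # pattern and sale
    n10 = sum(1 for p, s in pairs if p and not s)      # pattern, no sale
    n01 = sum(1 for p, s in pairs if not p and s)      # sale, no pattern
    n00 = sum(1 for p, s in pairs if not p and not s)  # neither
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # one of the flags never varies, so no association is measurable
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical data: one entry per customer.
pattern_seen = [1, 1, 0, 0]
sale_made = [1, 1, 0, 0]
print(phi_coefficient(pattern_seen, sale_made))
```

A value near 1 suggests the pattern and the sales tend to occur together; a value near 0 suggests no association in this sample.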

In both cases, we need to decouple the data from the explanation of the data. Let's say we find a visual pattern that correlates to lawnmower sales. Our hypothesis is that the pattern indicates a rusty lawnmower, but even if our hypothesis turns out to be incorrect, the pattern still somehow works! So we don't throw away the pattern; we just have to look for a different explanation.
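
One way to picture this decoupling is to store the explanation as a replaceable annotation on the pattern, separate from the detection logic. The `Pattern` type, the `P-001` name and the `brown_blob_on_grass` feature are hypothetical names chosen purely for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Pattern:
    name: str
    matches: Callable[[dict], bool]  # the detector: what actually correlates with sales
    explanation: str                 # our current hypothesis, which may be wrong


# Assumption: an image is summarised as a dict of boolean features.
rusty = Pattern(
    name="P-001",
    matches=lambda image: image.get("brown_blob_on_grass", False),
    explanation="rusty lawnmower",
)

# The hypothesis is refuted, so only the explanation changes.
revised = replace(rusty, explanation="unexplained; still correlates with sales")

print(revised.matches({"brown_blob_on_grass": True}))
print(revised.explanation)
```

The detector carries over unchanged when the explanation is revised, which is the point: the pattern keeps working while we look for a better account of why.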

This is of course how science has always worked. Many important scientific advances have been based on incorrect explanations. Galileo misunderstood how the telescope worked, but that didn't stop him making some important discoveries. Business networks can be very complex, and sometimes we don't fully understand what is going on, but we still need to build systems that respond as intelligently as possible to what is going on. The point is not to construct a universal theory of lawnmowers; the point is to sell more lawnmowers.

Thanks to Opher and Tim for private discussion.
