
Monday, March 06, 2023

Trusting the Schema

A long time ago, I did some work for a client that had an out-of-date and inflexible billing system. The software would send invoices and monthly statements to the customers, who were then expected to remit payment to clear the balance on their account.

The business had recently introduced a new direct debit system. Customers who had signed a direct debit mandate no longer needed to send payments.

But faced with the challenge of introducing this change into an old and inflexible software system, the accounts department came up with an ingenious and elaborate workaround. The address on the customer record was changed to the address of the internal accounts department. The computer system would print and mail the statement, but instead of going straight to the customer it arrived back at the accounts department. The accounts clerk stamped each statement PAID BY DIRECT DEBIT, and then mailed it to the real customer address, which was stored in the Notes field of the customer record.

Although this may be an extreme example, there are several important lessons that follow from this story.

Firstly, business can't always wait for software systems to be redeveloped, and can often show high levels of ingenuity in bypassing the constraints imposed by an unimaginative design.

Secondly, the users were able to take advantage of a Notes field that had been deliberately underdetermined to allow for future expansion.

Furthermore, users may find clever ways of using and extending a system that were not considered by the original designers of the system. So there is a divergence between technology-as-designed and technology-in-use.

Now let's think about what happens when the IT people finally get around to replacing the old billing system. They will want to migrate customer data into the new system. But if they simply follow the official documentation of the legacy system (schema etc), there will be lots of data quality problems.

And by documentation, I don't just mean human-generated material but also schemas automatically extracted from program code and data stores. Just because a field is called CUSTADDR doesn't mean we can guess what it actually contains.


Here's another example of an underdetermined data element, which I presented at a DAMA conference in 2008 under the title SOA Brings New Opportunities to Data Management.

In this example, we have a sales system containing a Business Type called SALES PROSPECT. But the content of the sales system depends on the way it is used - the way SALES PROSPECT is interpreted by different sales teams.

  • Sales Executive 1 records only the primary decision-maker in the prospective organization. The decision-maker’s assistant is recorded as extra information in the NOTES field. 
  • Sales Executive 2 records the assistant as a separate instance of SALES PROSPECT. There is a cross-reference between the assistant and the boss.

Now both Sales Executives can use the system perfectly well - in isolation. But we get interoperability problems under various conditions.

  • When we want to compare data between executives
  • When we want to reuse the data for other purposes
  • When we want to migrate to a new sales system

(And problems like these can occur with packaged software and software as a service just as easily as with bespoke software.)
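To make the divergence concrete, here is a minimal sketch of the two recording styles side by side, and of how a naive count across them goes wrong. All field names (prospect_id, notes, xref) are invented for illustration.

```python
# Hypothetical illustration of the SALES PROSPECT divergence described above.

# Sales Executive 1: the assistant is hidden in the free-text NOTES field.
exec1_prospects = [
    {"prospect_id": "P001", "name": "Alice Boss",
     "notes": "PA is Bob Jones, ext 4571", "xref": None},
]

# Sales Executive 2: the assistant is a separate SALES PROSPECT instance,
# cross-referenced to the boss.
exec2_prospects = [
    {"prospect_id": "P002", "name": "Carol Boss", "notes": "", "xref": None},
    {"prospect_id": "P003", "name": "Dave Aide", "notes": "", "xref": "P002"},
]

# A naive comparison or migration treats every record as one decision-maker:
all_records = exec1_prospects + exec2_prospects
print("records:", len(all_records))                   # 3 records ...
decision_makers = [p for p in all_records if p["xref"] is None]
print("decision-makers:", len(decision_makers))       # ... only 2 decision-makers,
# and Sales Executive 1's assistant is invisible to any structured query.
```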

So how did this mess happen? Obviously the original designer / implementer never thought about assistants, or never had the time to implement or document them properly. Is that so unusual? 

And this again shows the persistent ingenuity of users - finding ways to enrich the data - to get the system to do more than the original designers had anticipated. 


And there are various other complications. Sometimes not all the data in a system was created there; some of it was brought in from an even earlier system with a significantly different schema. And sometimes there are major data quality issues, perhaps linked to a post-before-processing paradigm.


Both data migration and data integration are plagued by such issues. Because the data content diverges from the designed schemas, you can't rely on the schemas of the source data; you have to inspect the actual data content. Or undertake a massive data reconstruction exercise, often misleadingly labelled "data cleansing".
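Here is a minimal sketch of what inspecting the content might look like, with invented column names and values; the point is that profiling the actual values, rather than trusting the schema, is what exposes workarounds like the rubber-stamped statements above.

```python
from collections import Counter

# Hypothetical customer rows; in practice these would be read from the legacy store.
rows = [
    {"custno": 1, "custaddr": "1 High St, Leeds"},
    {"custno": 2, "custaddr": "Accounts Dept, Internal Mail Room"},
    {"custno": 3, "custaddr": "Accounts Dept, Internal Mail Room"},
    {"custno": 4, "custaddr": "Accounts Dept, Internal Mail Room"},
]

# The schema says CUSTADDR holds a customer address. Profiling the content
# tells a different story: one value dominates suspiciously.
freq = Counter(row["custaddr"] for row in rows)
for value, count in freq.most_common():
    if count > 1:
        print(f"{count} customers share the address {value!r} - investigate")
```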


There are several tools nowadays that can automatically populate your data dictionary or data catalogue from the physical schemas in your data store. This can be really useful, provided you understand the limitations of what it is telling you. So there are a few important questions to ask before you trust the physical schema as providing a complete and accurate picture of the actual contents of your legacy data store.

  • Was all the data created here, or was some of it mapped or translated from elsewhere? 
  • Is the business using the system in ways that were not anticipated by the original designers of the system? 
  • What does the business do when something is more complex than the system was designed for, or when it needs to capture additional parties or other details?
  • Are classification types and categories used consistently across the business? For example, if some records are marked as "external partner" does this always mean the same thing? 
  • Do all stakeholders have the same view on data quality - what "good data" looks like?
  • And more generally, is there (and has there been through the history of the system) a consistent understanding across the business as to what the data elements mean and how to use them?


Related posts: Post Before Processing (November 2008), Ecosystem SOA 2 (June 2010), Technology in Use (March 2023)


Saturday, September 04, 2021

Metadata as a Process

Data sharing and collaboration between different specialist areas requires agreement and transparency about the structure and meaning of the data. This is one of the functions of metadata.

I've been reading a paper (by Professor Paul Edwards and others) about the challenges this poses in interdisciplinary scientific research. They identify four characteristic features of scientific metadata, noting that these features can be found within a single specialist discipline as well as cross-discipline.

  • Fragmented - many people contributing, no overall control
  • Divergent - multiple conflicting versions (often in Excel spreadsheets)
  • Iterative - rarely right first time, lots of effort to repair misunderstandings and mistakes
  • Localized - each participant is primarily focused on their own requirements rather than the global picture

They make two important distinctions, which will be relevant to enterprise data management as well.

Firstly between product and process. Instead of trying to create a static, definitive set of data definitions and properties, which will completely eliminate the need for any human interaction between the data creator and data consumer, assume that an ongoing channel of communication will be required to resolve emerging issues dynamically. (Some of the more advanced data management tools can support this.)

Secondly between precision and lubrication. Tight coupling between two systems requires exact metadata, but interoperability might also be achievable with inexact metadata plus something else to reduce any friction. (Metadata as the new oil, perhaps?)

Finally, they observe that metadata typically falls into the category of almost standards.

Everyone agrees they are a good idea, most have some such standards, yet few deploy them completely or effectively.

Does that sound familiar? 



J Bates, The politics of data friction (Journal of Documentation, 2017)

Paul Edwards, A Vast Machine (MIT Press 2010). I haven't read this book yet, but I found a review by Danny Yee (2011)

Paul Edwards, Matthew Mayernik, Archer Batcheller, Geoffrey Bowker and Christine Borgman, Science Friction: Data, Metadata and Collaboration (Social Studies of Science 41/5, October 2011), pp. 667-690. 

Martin Thomas Horsch, Silvia Chiacchiera, Welchy Leite Cavalcanti and Björn Schembera, Research Data Infrastructures and Engineering Metadata. In Data Technology in Materials Modelling (Springer 2021) pp 13-30

Jillian Wallis, Data Producers Courting Data Reusers: Two Cases from Modeling Communities (International Journal of Digital Curation, 9/1, 2014) pp 98–109

Friday, April 09, 2021

Near Miss

A serious aviation incident in the news today. A plane took off from Birmingham last year with insufficient fuel, because the weight of the passengers was incorrectly estimated. This is being described as an IT error.

As Cathy O'Neil's maxim reminds us, algorithms are opinions embedded in code. The opinion in this case was the assumption that the prefix Miss referred to a female child. According to the official report, published this week, this is how the prefix is used in the country where the system was programmed.

On this particular flight, 38 adult women were classified as Miss, so the algorithm estimated their weight as 35 kg instead of 69 kg.
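As a rough illustration, here is the kind of prefix-to-weight lookup that could produce such an error. The 35 kg and 69 kg standard weights come from the story above; the function and table names are invented.

```python
# A minimal sketch of the embedded opinion, not the airline's actual code.

STANDARD_WEIGHTS_KG = {"adult_female": 69, "child": 35}

def estimated_weight(prefix: str) -> int:
    # The embedded opinion: "Miss" is assumed to denote a female child.
    category = "child" if prefix == "Miss" else "adult_female"
    return STANDARD_WEIGHTS_KG[category]

# The 38 adult women on the flight, all recorded with the prefix Miss:
estimate = sum(estimated_weight("Miss") for _ in range(38))
actual = 38 * STANDARD_WEIGHTS_KG["adult_female"]
print(f"estimated {estimate} kg, actual {actual} kg, shortfall {actual - estimate} kg")
# shortfall: 38 * (69 - 35) = 1292 kg of unaccounted weight
```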

The calculation error was apparently compounded by several human factors.

  • A smaller discrepancy had been spotted and corrected on a previous flight. 
  • The pilot noticed that there seemed to be an unusually high number of children on the flight, but took no action because the pandemic had disrupted normal expectations of passenger numbers.
  • The software was being upgraded, but the status of the fix at the time of the flight was unclear. There were other system-wide changes being implemented at the same time, which may have complicated the fix.
  • Guidance to ground staff to double-check the classification of female passengers was not properly communicated and followed, possibly due to weekend shift patterns.

As Dan Nguyen points out, there have been previous incidents resulting from incorrect assumptions about passenger weight. But I think we need to distinguish between factual errors (what is the average weight of an adult passenger) and classification errors (what exactly does the Miss prefix signify).

There is an important lesson for data management here. You may have a business glossary or data dictionary that defines an attribute called Prefix and provides a list of permitted values. But if different people (different parts of your organization, different external parties) understand and use these values to mean different things, there is still scope for semantic confusion unless you make the meanings explicit.



AAIB Bulletin 4/2021 (April 2021) https://www.gov.uk/government/publications/air-accident-monthly-bulletin-april-2021

Tui plane in ‘serious incident’ after every ‘Miss’ on board was assigned child’s weight (Guardian, 9 April 2021)

For further discussion and related examples, see Dan Nguyen's Twitter thread https://twitter.com/dancow/status/1380188625401434115



Friday, May 15, 2015

BankSpeak

#WorldBank #DataModel I recently went through a data modelling exercise, underlining and classifying the nouns in a set of functional design documents for a large client project. So I was interested to read an article that analyses World Bank reports over the last fifty years using a similar technique. Some of the authors' key findings resonated with me, because I have seen similar trends in the domain of enterprise architecture.


The article looks at the changes in language and style during the history of the World Bank. For the first couple of decades, its reports were factual and concrete, and the nouns were specific - investments created assets and produced measurable outcomes, grounded in space and time. The dominant note is of factual precision - demarcating past accomplishments, current actions, necessary policies and future projects - with a clear sense of cause and effect.

"A clear link is established between empirical knowledge, money flows and industrial constructions: knowledge is associated with physical presence in situ, and with calculations conducted in the Bank's headquarters; money flows involve the negotiation of loans and investments with individual states; and the construction of ports, energy plants, etc., is the result of the whole process. In this eminently temporal sequence, a strong sense of causality links expertise, loans, investments, and material realizations."

In recent decades, the Bank's language has changed, becoming more abstract, more distant from concrete social life. The focus has shifted from physical assets (hydroelectric dams) to financial ones (loans, guarantees), and from projects to 'strategies'. Both objectives (such as 'poverty reduction') and solutions (such as 'education', 'structural adjustment') are disengaged from any specificity: they are the same for everybody, everywhere. The authors refer to this as a 'bureaucratization' of the Bank's discourse.

"This recurrent transmutation of social forces into abstractions turns the World Bank Reports into strangely metaphysical documents, whose protagonists are often not economic agents, but principles—and principles of so universal a nature, it's impossible to oppose them. Levelling the playing field on global issues: no one will ever object to these words (although, of course, no one will ever be able to say what they really mean, either). They are so general, these ideas, they're usually in the singular: development, governance, management, cooperation. ... There is only one way to do things: one development path; one type of management; one form of cooperation."

I have seen architectural documents that could be described in similar terms - full of high-level generalizations and supposedly universal principles, which provide little real sense of the underlying business and its requirements. Of course, there is sometimes a need for models that abstract away from the specifics of space and time: for example, a global organization may wish to establish a global set of capabilities and common services, which will support local variations in market conditions and business practices. But architects are not always immune to the lure of abstract bureaucracy.

In Bankspeak, causality and factuality are replaced by an accumulation of what the authors (citing Boltanski and Chiapello) call management discourse. For example, the term 'poverty' is linked to terms you might expect: 'population', 'employment', 'agriculture' and 'resources'. However the term 'poverty reduction' is linked with a flood of management terms: 'strategies', 'programmes', 'policies', 'focus', 'key', 'management', 'report', 'goals', 'approach', 'projects', 'frameworks', 'priorities', 'papers'.

We could doubtless find a similar flood of management terms in certain enterprise architecture writings. However, while these management terms do have a proper role in architectural discourse, we must be careful not to let them take precedence over the things that really matter. We need to pay attention to business goals, and not just to the concept of "business goal".


Franco Moretti and Dominique Pestre, BankSpeak - The Language of World Bank Reports (New Left Review 92, March-April 2015)

Related post: Deconstructing the Grammar of Business (June 2009)

Thursday, June 18, 2009

Deconstructing The Grammar of Business

@JohnIMM (John Owens) trots out a familiar piece of advice about data modelling today.

"Want to know what data entities your business needs? Start with the nouns in the business function names."


Starting with the nouns is a very old procedure. I can remember sitting through courses where the first exercise was to underline the nouns in a textual description of some business process. So when I started teaching data modelling, I decided to make this procedure more interesting. I took an extract from George Orwell's essay on Hop-Picking, and got the students to underline the nouns. Then we worked out what these nouns actually signified. For example, some of them were numbers and units of measure, some of them were instances, and some of them were reifications. (I'll explain shortly what I mean by reification.) Only a minority of the nouns in this passage passed muster as data entities. Another feature of the extract was that it used a lot of relatively unfamiliar terms - few of us had experience measuring things in bushels, for example - and I was able to show how this analytical technique provided a way of getting into the unfamiliar terminology of a new business area. I included this example in my first book, Pragmatic Data Analysis, published in 1984 and long out of print.

One problem with using this procedure in a training class is that it gives a false impression of what modelling is all about. Modelling is not about translating a clear written description into a clear diagrammatic structure; in the real world you don't have George Orwell doing your observation and writing up your interview notes for you.

Now let me come on to the problem of reification. The Zachman camp has started to use this word (in my view incorrectly) as a synonym of realisation - in other words, the translation and transformation of Ideas into Reality. (They claim this notion can be traced back to the ancient Greeks, but they do not provide any references to support this claim. As far as I am aware, this is a mediaeval notion; it can for example be found in the work of the Arab philosopher ibn Arabi, who talks about entification in apparently this sense.) However, modern philosophers of language use the word "reification" to refer to the elevation of abstract ideas (such as qualities) to Thingness. One of the earliest critics of reification was Ockham, who objected to the mediaeval habit of multiplying abstract ideas and reified universals; his principle of simplicity is now known as Ockham's Razor.

In our time, Quine showed how apparently innocent concepts often contained hidden reification, and my own approach to information modelling has been strongly influenced by Quine. For example, I am wary of taking "customer" as a simple concept, and prefer to deconstruct it into a bundle of bits of intentionality and behaviour and other stuff. (See my post on Customer Orientation.) As for business concepts like "competitor" or "prospect", I generally regard these as reifications resulting from business intelligence.

Reification tends to obscure the construction processes - tempting us to fall into the fallacy of regarding the reifications as if they directly reflected some real world entities. (See my posts on Responding to Uncertainty 1 and 2.) So I like to talk about ratification as a counterbalance to reification - making the construction process explicit.

Of course, John Owens is right insofar as the grammar of the data model should match the grammar of the process model. And of course for service-oriented modelling, the grammar of the capabilities must match that of the core business services. But what is the grammar of the business itself? Merely going along with the existing nouns and verbs may leave us short of discovering the deep structural patterns. 


Update May 2024. The distinction I'm making here between reification and its opposite, which I've called ratification, can be compared to Simondon's distinction between ontology and ontogenesis, so I shall need to write more about that. Meanwhile, I now acknowledge the possibility that some notion of reification might be found among the Neoplatonists but that's several hundred years after Plato himself.


Related posts: Reification and Ratification (November 2003), Business Concepts and Business Types (May 2009), Business Rule Concepts (December 2009), The Topography of Enterprise Architecture (September 2011), Conceptual Modelling - Why Theory (November 2011), From AS-IS to TO-BE (October 2012), BankSpeak (May 2015), Mapping out the entire world of objects (July 2020)

Sunday, May 03, 2009

Business Concepts and Business Types

cross-posted from SOA Process blog


One thing I keep getting asked is why the CBDI Service Architecture and Engineering method distinguishes between the Business Concept Model (describing things as they are in the real world) and the Business Type Model (describing things as they are represented within the enterprise systems).

Here's a simple example. As I mention in my post Will Libraries Survive? there is a conceptual difference between book title and book copy. Some libraries identify each physical copy individually, while other libraries merely count the physical copies of a given book title. The physical copies are distinct in the real world, but may be indistinguishable in the library systems. Information strategy includes making this kind of choice.

Let's suppose a library buys ten copies of a book. To start with these copies are physically indistinguishable. The library then attaches some identifying marks to each copy, including an ID number. This marking depends on the information strategy - the library could choose to give all ten copies the same number, or to give each copy a different number.

If we assume that this ID number is used throughout the library systems, then it is clearly important to know whether the ID number identifies the book title or the book copy. So this is a critical element of the Business Type Model, which reflects a whole series of strategic decisions of this nature, and then feeds into the identification of core business services.

One important form of business improvement is to increase the differentiation in the Business Type Model. For example, a library that previously didn't distinguish between physical copies could decide to make this distinction in future. But this business improvement doesn't change the underlying reality of books, so the Business Concept Model stays the same.
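As a sketch of this strategic choice, here are the two identification strategies side by side. The types and identifiers are invented for illustration; the point is that the type model varies while the underlying concepts (title versus physical copy) stay fixed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BookTitle:
    isbn: str        # identifies the title, not any physical object
    title: str

@dataclass(frozen=True)
class BookCopy:
    copy_id: str     # identifies one physical copy
    isbn: str        # cross-reference to the title it is a copy of

war_and_peace = BookTitle(isbn="978-0140447934", title="War and Peace")

# Strategy A: the library merely counts copies per title.
holdings_by_title = {war_and_peace.isbn: 10}

# Strategy B: each physical copy is individually identified.
holdings_by_copy = [BookCopy(copy_id=f"C{i:03d}", isbn=war_and_peace.isbn)
                    for i in range(10)]

# Moving from A to B increases differentiation in the Business Type Model;
# the Business Concept Model (titles exist, physical copies exist) is unchanged.
```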

Extract from Business Information Improvement (CBDI Journal, May 2009)

Tuesday, February 24, 2009

Modelling Complex Classification

Andrea Westerinen (Microsoft) posts some modelling guidelines, and she was told that some people's heads exploded when reading them. She identifies three fundamental modelling concepts, which she draws from the work of Guarino and Welty.
  • Essence - these are properties that are true for all instances of a class and that "define" the semantics of the class, such as the "property" of "being human"
  • Identity - properties that determine the equality of instances
  • Unity - properties that define where the boundary of an instance is, or distinguish the parts of a "whole"

In my work on information modelling (for example in my 1992 book) I have long emphasized the importance of understanding semantic identity (how does something count as being "the same again") and semantic unity (which I tend to call membership - how does something count as inside or outside). 

But I have been critical of the assumption that we always define a class in terms of essential properties. This is known as monothetic classification, and can be contrasted with polythetic classification, which defines a class in terms of characteristic properties. As I teach in my information modelling workshops, many important objects of business attention are not amenable to simple monothetic classification (for example, how does a commercial firm decide who counts as a COMPETITOR, how do the police decide who counts as a SUSPECT) and require a more complex logic.
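To make the contrast concrete, here is a minimal sketch of monothetic versus polythetic classification for a hypothetical COMPETITOR class. The characteristic properties and the threshold are invented for illustration.

```python
# Monothetic: every defining property is essential.
def is_competitor_monothetic(firm: dict) -> bool:
    return firm.get("same_market", False) and firm.get("overlapping_products", False)

# Polythetic: enough characteristic properties must hold,
# but no single property is essential (family resemblance).
COMPETITOR_TRAITS = ["same_market", "overlapping_products",
                     "bids_on_same_tenders", "poaches_our_staff"]

def is_competitor_polythetic(firm: dict, threshold: int = 2) -> bool:
    score = sum(1 for trait in COMPETITOR_TRAITS if firm.get(trait, False))
    return score >= threshold

firm = {"same_market": False, "overlapping_products": True,
        "bids_on_same_tenders": True, "poaches_our_staff": False}
print(is_competitor_monothetic(firm))   # False - fails an "essential" property
print(is_competitor_polythetic(firm))   # True  - enough family resemblance
```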

If you are just building transaction-based systems and services, you may be able to fudge these semantic issues. But if you want information services to support business intelligence and decision-support as well as transaction systems, then you have to get the semantics right. 

Of course I can see why Guarino, Welty and Westerinen want to insist on monothetic classification (which they call rigidity). But then how can they model the more fluid and fuzzy (and I think more interesting) business requirements? 

 (Sometimes the practitioners of "ontology" like to think that they are dealing with supremely abstract and generalized stuff, but if they make too many simplifying assumptions of that kind then their ontologies aren't sufficiently generalized after all.)



Rodney Needham, Polythetic Classification: Convergence and Consequences (Man, 10:3, September 1975), pp. 349-369. 

Richard Veryard, Information Modelling - Practical Guidance (Prentice-Hall 1992) pp 99-100

Stanford Encyclopedia of Philosophy: Wittgenstein on Family Resemblance

Wikipedia: Family Resemblance


Sunday, January 04, 2009

Market Predictions - CEP

What lies in store?

Complex Event Processing (CEP) guru David Luckham asks What are your predictions for 2009?

What challenges?

Prediction is hard, especially for experts. What is easier is to identify some of the challenges that have to be addressed. For example:

CEP Products or Services?

David's starting point was a Forrester estimate of the market for CEP software and services. But what kind of services are these? Is Forrester just talking about conventional systems integration services - e.g. paying consultants to build and install your CEP systems? Or are we starting to see a genuine service economy based around the trading and collaborative processing of events?

A certain amount of this kind of thing goes on in the security world, with specialist firms performing what is essentially collective event processing for a number of customers (Monitoring-as-a-Service). But apart from that, I haven't seen much evidence of a distributed economy of complex events.

But why might this kind of thing be particularly interesting in 2009? Because the prevailing economic environment may make it harder to justify people going it alone, building large and complex event-processing applications for their own use.

Shortening the lead

Most people are expecting 2009 to be a tough year. In such circumstances, there is a widespread reluctance to invest scarce resources on remote gains; any vendors trying to sell solutions to business will need to find innovative ways of shortening and tightening the lead between investment and return.

For example, instead of vendors offering an elaborate set of CEP products, together with consultants skilled in wiring them together, there may be demand for out-of-the-box hosted solutions.

Shared Event Semantics

But in order to share events and event processing, firms will need to have a common event model. Which brings us back to some of the challenges identified by Opher, Paul and Tim ...

And Not Just Complex Event Processing ...

These arguments don't just apply to complex event processing, but to many areas of new technology. I'll look at some other areas in subsequent posts ...

Monday, December 22, 2008

How many events?

Marco writes about Event Models - Identity, and points out that there may not be a simple mapping between events and the reporting of events. A single collision may result in multiple notifications. He concludes:
"Something as simple as a event id could have a rather complex semantics and you should be aware of this when designing your event model. Or is it just me complicating things?"


I agree that event semantics may be complex. If you have a three-car pile-up, does that count as one collision or two, given that the third car hits a few seconds after the first two? Or three collisions, if the third car hits both of the first two cars?

So you have to have some semantic model that provides a basis for counting how many collisions there are in the real world, let alone how many event reports are received. There is a semantic rule (sometimes called an identity rule, after Frege) as to when two things are the same, or the-same-again.

An identity rule (together with a membership rule - which things count as collisions at all) should be part of your definition of “collision”.
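As an illustration, here is one possible identity rule sketched in code: two reports count as the same collision if they name the same location and arrive within a short time window. Both the rule and the window are modelling decisions invented for this sketch, not givens.

```python
from datetime import datetime, timedelta

# Hypothetical event reports about collisions.
reports = [
    {"location": "J4 M25", "time": datetime(2008, 12, 22, 8, 0, 0)},
    {"location": "J4 M25", "time": datetime(2008, 12, 22, 8, 0, 5)},   # same pile-up?
    {"location": "J4 M25", "time": datetime(2008, 12, 22, 9, 30, 0)},  # separate incident
]

WINDOW = timedelta(minutes=5)  # the identity rule's tolerance

collisions = []  # each entry stands for one real-world collision
for report in sorted(reports, key=lambda r: r["time"]):
    for collision in collisions:
        same_place = collision["location"] == report["location"]
        close_in_time = report["time"] - collision["last_seen"] <= WINDOW
        if same_place and close_in_time:
            collision["reports"].append(report)
            collision["last_seen"] = report["time"]
            break
    else:
        collisions.append({"location": report["location"],
                           "last_seen": report["time"],
                           "reports": [report]})

print(len(reports), "reports,", len(collisions), "collisions")  # 3 reports, 2 collisions
```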

See my earlier posts: How many products? and The Semantics of Security

Monday, September 01, 2008

Semantic Ambiguity

Here's a neat little example of semantic ambiguity.

The UK supermarket Tesco has bowed to a campaign by grammatical purists, who object to people saying "less" when they should say "fewer". Tesco boss Sir Terry Leahy has written to the Plain English Campaign announcing the intention to replace checkout signs reading "Ten Items Or Less" [Tesco Checks Out Wording Change, BBC News 31st August 2008].

The BBC claims that this will "avoid any linguistic dispute".

Actually, the biggest ambiguity remains unaddressed, because this is not in the choice of "less"/"fewer", but in the word "item". The Plain English spokesman uses plain english apples as an example. Tesco sells plain english apples by weight. So if I buy a dozen loose apples, does this count as one item (allowing me to go to the "up to ten items" checkout) or twelve items (forcing me to queue behind someone with a full trolley)?

I am hopeful that the humans in Tesco would take the common sense view, and allow me to regard a dozen apples as a single item. But would the pedants at the Plain English Campaign, or for that matter a fully computerized system, take the same view?

What's particularly interesting here is the fact that even the plain english campaigners can't spot a simple ambiguity. So what chance for the rest of us?

(Meanwhile, the BBC has apparently given up distinguishing between "less" and "fewer". See my recent comment Follow Me Follow on Twitter)

Monday, June 23, 2008

Homogeneous Business Vocabulary?

Nick Malik asks whether a common vocabulary is a blessing or curse.

Who benefits from a common vocabulary - whose agenda is it? IT architects tend to like a common vocabulary because it means they can more easily use the same systems and data stores; bureaucrats tend to like a common vocabulary because it means they can impose the same kinds of procedures and performance targets.

Let's look at the bureaucratic agenda first. In the old days, the bus company had passengers, hospitals had patients, prisons had prisoners, and universities had students. Now they are all supposed to be regarded as "customers", and driven by the same things: "choice" and "customer satisfaction". This reframing has had mixed results - perhaps a few beneficial effects in places, but also some damaging or absurd consequences.


Meanwhile, IT organizations want to deploy similar solutions across a range of business domains.

IT architects, for their part, are often lumpers rather than splitters, and so they like to produce information models with a relatively small number of highly generalized objects like PARTY, which mean absolutely nothing to a real business person.

So in some enterprises, and especially in the public sector, the IT architects may be aligned with the central bureaucrats against the line-of-business. Maybe sometimes there really is a good reason for the diversity of business vocabulary, not just idiot managers being obstinate.

With a stratified Service-Oriented Architecture, it becomes possible to get the best of both worlds - building some highly generic services in one layer, which support a range of different specialized and context-specific ontologies in the layer above. So it becomes possible to accommodate a broader range of requirements without imposing a common vocabulary. Of course this raises some complexity issues, which many IT architects would prefer not to have to deal with. For more on these complexity issues, see the Asymmetric Design blog.
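Here is a rough sketch of what such stratification might look like, with all names invented: a generic PARTY service in the lower layer, and context-specific facades above it that speak each domain's own vocabulary.

```python
class PartyService:
    """Generic lower-layer service: knows only about PARTY records."""
    def __init__(self):
        self._parties = {}

    def register(self, party_id: str, attributes: dict) -> None:
        self._parties[party_id] = attributes

    def get(self, party_id: str) -> dict:
        return self._parties[party_id]

class PatientService:
    """Upper-layer facade speaking the hospital's vocabulary."""
    def __init__(self, parties: PartyService):
        self._parties = parties

    def admit_patient(self, hospital_number: str, ward: str) -> None:
        self._parties.register(hospital_number, {"role": "patient", "ward": ward})

class PassengerService:
    """Upper-layer facade speaking the bus company's vocabulary."""
    def __init__(self, parties: PartyService):
        self._parties = parties

    def issue_pass(self, pass_number: str, route: str) -> None:
        self._parties.register(pass_number, {"role": "passenger", "route": route})

# One generic layer is shared, but nobody in the business has to say "party":
core = PartyService()
PatientService(core).admit_patient("H-4471", ward="B2")
PassengerService(core).issue_pass("BP-10442", route="38")
```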

Sunday, March 09, 2008

Animation

There really is something on the Internet for everyone. While my son was laughing his head off at the singing kittens on RatherGood.com, I was watching a simple and elegant animation explaining how SOA benefits the education and research world, courtesy of JISC/e-Framework (via Jack van Hoof).

Jack reckons the animation "perfectly shows the principles and benefits of a Canonical Data Model". Well, up to a point. It shows how SOA supports information sharing and interoperability, but an innocent viewer might interpret this as merely a question of syntax (represented by different templates in the animation) and might not appreciate the importance of semantics. Jack himself has written well on semantics in his blog, for example in his post How to mediate semantics in an EDA.

But for a 5-minute animated film - if we don't expect too much from it - excellent!

Monday, November 26, 2007

The Semantics of Security

Adam Shostack (Is 2,100 breaches of security a lot?) raises some good questions about the latest security breach.

When I hear that HMRC has had 2,100 breaches reported, I'm forced to ask, "is that a lot?" To put the number in context, we need three things:

  • What is a breach? Does it include, for example, leaving your screen unlocked when you go to the restroom? We can't understand what 2,100 breaches mean without knowing what is being counted.
  • How big is the department? If it's 10 people, then that's a breach a day. If it's 2,100 people, then it's a breach a year. ...
  • How does this compare to other organizations? ... That seems lower than the US Government reported rate of one per hour, but actually, 2,100 breaches is about one per hour per business day for HMRC. So does HMRC leak at the same rate as all of the US government, or are we seeing different definitions of breaches?

Our ability to count things is a good indicator of whether we know what we are talking about. This is an important element of semantics - having a membership rule (is this a breach or not) and an identity rule (is this the same breach as the one we've already counted, or a new one). So we need a semantics of security.

Sunday, June 17, 2007

Just Tidying Up?

In a recent post on my other blog, Gordon Brown, Enterprise Architect, I pointed out the resemblance between an enterprise architecture and a written constitution, and suggested that the new UK prime minister might be open to both of these initiatives.

However, the current UK foreign secretary may not be with him on this one. Mrs Beckett has spoken out against formal constitutional reform for the EU, and appears to believe that nothing more is required than tidying up the rule book. [BBC News, June 17th, 2007]

There are of course semantic issues here, as my friend Robin Wilton pointed out in his blog: Semantics Invictus. There is a political incentive to avoid anything that counts as constitutional change because this would have governance implications (specifically, a referendum would be required).

How does this apply to Service-Oriented Architecture? Clearly there are some people within the IT world who would take the Margaret Beckett line. They believe they don't need a full enterprise architecture, together with all the governance that implies - there are just a few systems that need a bit of tidying-up.

For example, the CEO of a company called Health Watch Technologies avers that "the first step toward many innovations is simply tidying up systems" [Special Report on Medicaid 2006]. And a recent FT article describes Huge Benefits from Tidying Up.

But the examples quoted in the FT article go way beyond just tidying up, and include the work that is going on at BT to develop a service-oriented architecture, with some impressive results already.

There may be some resistance to formal enterprise architecture in some organizations, but there is no arguing with the benefits of doing things properly.

Monday, May 21, 2007

Clouds and Clocks 2

Pat Helland regrets ...

Pat Helland has resumed blogging, having recently returned to Microsoft from a sojourn at Amazon. In his latest post SOA and Newton's Universe, he renounces the quasi-Newtonian paradigm of distributed systems to which he adhered for most of his 30-year career, and outlines an alternative paradigm with some resemblance to the Special Theory of Relativity.

The Newtonian paradigm of distributed systems is that we are trying to make many systems appear as the One System. This paradigm may be linked with the idea of the Global Schema or Universal Ontology. Helland contrasts this with an alternative paradigm of distributed systems, in which the systems are entirely dependent on the point of view of the observer, and there is no Universal Ontology. (I'd have wanted to use the term relativistic semantics here, but it has already been bagsied for academic linguistics - see for example Catfood and Buses.)

Helland sees this in terms of a relaxation of consistency. I disagree. Distributed systems-of-systems (including SOA) must follow a consistent logic - but not necessarily a traditional two-valued logic. Flexibility comes from being underdetermined (clouds) rather than overdetermined (clocks).
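For a concrete example of a logic that is consistent but not two-valued, here is a minimal sketch of Kleene's strong three-valued logic, in which a proposition may remain underdetermined. Using Python's None for "unknown" is my own illustrative convention.

```python
# Kleene's strong three-valued logic: True, False, or None ("unknown").

def k_and(a, b):
    if a is False or b is False:
        return False        # one False settles the conjunction
    if a is None or b is None:
        return None         # otherwise an unknown leaves it unknown
    return True

def k_or(a, b):
    if a is True or b is True:
        return True         # one True settles the disjunction
    if a is None or b is None:
        return None
    return False

# An underdetermined value need not be forced either way:
print(k_and(True, None))    # None - still open
print(k_or(True, None))     # True - already determined despite the unknown
```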

See my earlier posts Beyond Binary Logic and On Clouds and Clocks. See also Philip Boxer on Modelling Structure-Determining Processes.

Sunday, November 12, 2006

Cross-Purposes

Seth Godin reports an interesting juxtaposition.
  • Pain reliever recalled after metal found in pills.
  • Google finds a matching image to illustrate the story.
Google is trying to be helpful here - repurposing an image (from elsewhere) to add value to a service.

Pain Reliever Recalled ...

Unfortunately, it's the right drug but the wrong brand. For many purposes, Google's semantics would be adequate. For recall purposes, however, the difference between DRUG and BRAND is very important.

This kind of semantic mismatch is what causes some of the trickiest interoperability risks.


Thursday, November 02, 2006

Semantic Coupling

In a recent post, Semantic Coupling, the Elephant in the SOA Room, Rocky Lhotka identified semantic coupling as one of the challenges of SOA. Udi Dahan agrees that semantic coupling is harder, but adds that in his view SOA is all about addressing this issue. Meanwhile Fergal Somers, chief architect at Cape Clear, doesn't think it is so hard in practice, although he acknowledges that the relevant standards are not yet mature.
Any systems that are linked together as part of a broader workflow involves semantic-coupling as defined above, but so what? We have been building these systems for some time.

Although I wouldn't go as far as saying SOA is all about any one thing in particular (see my earlier post on Ambiguity), I also agree that semantic coupling (and semantic interoperability) are important.

Rocky's argument is based on a manufacturing analogy.
  • In simple manufacturing, the economics of scale involves long production runs, so that you can spread the setup costs across a large volume.
  • In agile manufacturing, the economics of scope involves minimizing the setup costs, so that you can have shorter production runs without affecting the economics of scale.
  • I interpret Rocky's argument as saying that a major element of the setup costs for services involves matching the semantics.
Part of the economic argument for SOA is that it can deliver economics of scope (adaptability, repurposing) as well as economics of scale (productivity).

But there's more. If we combine SOA with some other management innovations, we may also be able to improve the economics of alignment. I don't think this is illustrated by Rocky's manufacturing analogy.

However, Kenneth LeFebvre reads more into Rocky's post than I did.
There is meaning to the interaction between a consumer and a service. What does this mean? SOA is all about making the connections between applications using “services” but it does not bridge the gap between the real world of business and the “virtual” world that runs within our software. This is precisely the problem object-oriented design was intended to solve, and was just beginning to do so, until too much of the development population abandoned it in search of the next holy grail: SOA.

At my request, Kenneth has elaborated on this statement in a subsequent post SOA OOA and Bridging the Gap. I agree with him that the rhetoric of OO was as he describes. But I still don't see much evidence that "it was just beginning to do so", and I remain unconvinced by his argument that some things are better represented by objects than by services. (More concrete examples please Kenneth.)

For a definition of the economics of scale, scope and alignment, see Philip Boxer's post Creating Economies of Alignment (October 2006).


Note: earlier material used the term Economics of Governance. For various reasons, we now prefer the term Economics of Alignment.

Updated 25 October 2013

Tuesday, February 07, 2006

Context and Purpose

Adam Shostack's latest post reminds us that It Depends What The Meaning of "Credit Report" Is.

For what purpose were social security numbers originally created - was it perhaps something to do with social security?

Social security numbers have been widely reused and repurposed as general personal identifiers, especially in the context of financial services. For this reason, many people are thinking of identity theft as something executed for the purposes of financial fraud.

But someone called Pablo is apparently using Margaret's social security number for an entirely different purpose - to pose as a legal migrant. This interferes (not surprisingly) with Margaret's ability to claim unemployment benefit.

Any piece of data - and especially an identifier - changes its meaning when it is used for a different purpose in a different context. This is of course nothing new - but the opportunities to repurpose data are hugely amplified by the latest service-oriented protocols including XML and web services.

This story should remind us that we need to be purpose-agnostic, not just when we are designing service-oriented data systems, but also when we are thinking of security threats against such systems.

See also

Purpose-Agnostic (July 2005)
Collaboration and Context (January 2006)
Context and Presence (Category)

Saturday, July 09, 2005

Purpose-Agnostic

Sean McGrath (Propylon) thinks purpose-agnosticism is one of the really useful bits of SOA. He refers to a posting he made to the Yahoo SOA group in June 2003, where he wrote:
The real trick with EAI I think, is to get purpose-agnostic data representations of business level concepts like person, invoice, bill of lading etc., flowing around processing nodes.

The purpose-agnostic bit is the most important bit. OO is predicated on a crystal ball - developers will have sufficient perfect foresight to expose everything you might need via an API.

History has shown that none of us have such crystal balls.

Now if I just send you the data I hold, and you just send me the data you hold - using as thin an API as we can muster - we don't have to doubleguess each other.
Propylon has been implementing this idea in some technologies for the Irish Government.
However, Sean believes that purpose agnosticism can only be pushed so far. He cites the service example “test for eligibility for free telephone allowance”, which is evidently purpose-specific. Thus there are some areas where very prescriptive messages (which Sean calls "perlocutionary") are more appropriate.

Let's push this example back a little. Why does B need to know if a particular customer is entitled to a free telephone allowance? Only because it alters the way B acts in relation to this customer. We should be able to derive what B needs to know from (a model of) the required variety of B's behaviour. Let's suppose B delivers a telephone to the customer and also produces an invoice. And let's suppose the free telephone allowance affects the invoice but not the delivery. Then we can decompose B into B1 and B2, where only B2 needs to know about the free telephone allowance, allowing us to increase the decoupling between B1 and A. Furthermore, the invoice production may draw upon a range of possible allowances and surcharges, of which the free telephone allowance is just one instance.
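Here is a hypothetical sketch of that decomposition: B1 (delivery) never sees allowance data at all, while B2 (invoicing) consumes a generalized list of allowances and surcharges, of which the free telephone allowance is just one case. All names and figures are invented.

```python
def deliver_telephone(customer_address: str) -> str:
    # B1: needs only the address; fully decoupled from eligibility rules.
    return f"telephone dispatched to {customer_address}"

def produce_invoice(base_charge: float, adjustments: list) -> float:
    # B2: applies whatever allowances and surcharges it is given,
    # without knowing or caring where they came from.
    return base_charge + sum(adj["amount"] for adj in adjustments)

address = "4 Main St, Dublin"
adjustments = [{"name": "free telephone allowance", "amount": -20.0}]

print(deliver_telephone(address))
print("invoice total:", produce_invoice(50.0, adjustments))  # 30.0
```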

On a cursory view, it appears that the technologies Sean has been building for the Irish Government support a programme of deconfliction - using some aspects of SOA (or whatever you want to call it) to separate sociotechnical systems into loosely coupled subsystems. If this is done right, it should deliver both IT benefits (productivity, reuse) and business benefits. This is all good stuff.


Joined-Up Services - Trust and Semantics

But the deconfliction agenda leads to a new set of questions about joined-up services - what emerges when the services interact, sometimes in complex ways? How do you impose constraints on the possible interactions? For example, there may be system-level requirements relating to privacy and trust.

One aspect of trust is that you only give data to people who aren't going to abuse it. Look at the current fuss in the US about data leakage and identity theft involving such agencies as ChoicePoint. The problem with Pat Helland's notion of trust is that it deals with people who come into your systems to steal data, but doesn't deal with people who steal data elsewhere. So you have to start thinking about protecting data (e.g. by encryption) rather than protecting systems. (This is an aspect of what the Jericho Forum calls deperimeterization.) Even without traditional system integration, an SOA solution such as PSB will presumably still need complex trust relationships (to determine who gets the key to which data), but this won't look like the Helland picture.

A further difficulty comes with the semantics of the interactions. If we take away the assumption that everyone in the network uses the same reference model (universal ontology), then we have to allow for translation/conversion between local reference models. As Sean points out elsewhere, this is far from a trivial matter.


Technology Adoption

My expectation is that the deployment of technologies such as the Public Service Broker will experience organizational resistance to the extent that certain kinds of problem emerge out of the business requirements. In my role as a technology change consultant, I am particularly concerned with the adoption of SOA and related technologies, and how this is aligned with the business strategy of the target organization. I invite readers of this blog to tell me (in confidence) about the adoption of SOA in their organizations, or their customers' organizations, and the specific barriers to adoption it has encountered.


See also

Collaboration and Context (January 2006)
Context and Purpose (February 2006)
Context and Presence (Category)

Friday, May 06, 2005

Repurposing Data and Services

Does it make sense to talk about reusing services, or should we talk instead about repurposing?

The word repurpose is largely being pushed from the data/metadata side, especially the XML/XSL crowd. XML is certainly relevant to technical reformatting and interoperability, but may also support data being put to new uses.
Meanwhile, some examples of data repurposing look like old-fashioned data sharing. Look at this abstract, which (when you strip away the fashionable technology such as intelligent agents) is just finding new uses for existing data.
  • Using Intelligent Agents to Repurpose Administrative Data ... (Jan 2004) (abstract)
The word also exposes the bandwagon-jumping antics of some product vendors. For example, following 9/11, Siebel repurposed its CRM software to deal with Homeland Security. (Government Computer News, Sept 2002) What's terrorism got to do with customer relationship management, I hear you ask. Well, it has, in the same sense that an FBI agent might say "He's a tricky customer."

XML is very good for this kind of repurposing, because it operates at a level of semantic vagueness where it doesn't really matter whether "customer" means "customer" or "terrorist". To my mind this is both a strength and a weakness of XML. It seems to me that if we want to promote the repurposing of services, we need to explain how to design services that can operate with a calculated lack of semantic specificity, with weak preconditions. (But strong postconditions.)
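Here is a rough sketch of what weak preconditions with strong postconditions might look like in a service interface. The canonical fields and the alternative key names are invented for illustration.

```python
def canonicalize_party(raw: dict) -> dict:
    # Weak precondition: almost any dict will do; we accept the name
    # under several possible keys, without insisting what a "party" is.
    name = raw.get("name") or raw.get("customer_name") or raw.get("subject")
    if name is None:
        raise ValueError("cannot canonicalize: no recognizable name field")

    record = {"name": str(name).strip().title(),
              "source_keys": sorted(raw.keys())}

    # Strong postcondition: every record we emit satisfies these guarantees.
    assert record["name"], "postcondition: non-empty canonical name"
    assert record["source_keys"], "postcondition: provenance recorded"
    return record

print(canonicalize_party({"customer_name": "  pat o'brien "}))
# {'name': "Pat O'Brien", 'source_keys': ['customer_name']}
```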


See also Reuse or Repurpose (May 2005)