
Thursday, August 12, 2021

On the performativity of data

The philosopher J.L. Austin observed that words sometimes do more than describe reality: they enact something. A commonly cited example is the marriage ceremony: when a suitably authorized person pronounces a couple married, it is the speaking of the words that makes them true. Austin called this a performative utterance; later writers usually refer to the phenomenon as performativity.

In this post, I want to explore some ways in which data and information may be performative. 

 

In my previous post on Data as Pictures, I mentioned the self-fulfilling power of labels. For example, when a person is labelled and treated as a potential criminal, this may make it more difficult for them to live as a law-abiding citizen, and they are therefore steered towards a life of crime. Thus the original truth of the data becomes almost irrelevant, because the data creates its own truth. Or as Bowker and Star put it, "classifications ... have material force in the world" (p39).

Many years ago, I gave a talk at King's College London which included some half-formed thoughts on the philosophy of information. I included some examples where it might seem rational to use information even if you don't believe it.

Keynes attributed the waves of optimism and pessimism that sweep through a market to something he called animal spirits. Where there is little real information, even false information may be worth acting upon. So imagine that a Wall Street astrologer publishes a daily star chart of the US president, and this regularly affects the stock market. Not because many people actually believe in astrology, but because many people want to be one step ahead of the few people who do believe in astrology. Even if nobody takes astrology seriously, as long as they all think that other people might take it seriously, they will collectively act as if they do. Fiction functioning as truth.

(There was an astrologer in the White House during the Reagan administration, so this example didn't seem so far-fetched at that time. And I have now found a paper that suggests a correlation between astrology and stock markets.)

For my second example, I imagined the head of a sugar corporation going on television to warn the public about a possible shortage of sugar. Consumers typically respond to this kind of warning by stockpiling, leaving the supermarket shelves empty of sugar. So this is another example of a self-fulfilling prophecy - a speech act that creates its own truth.

I then went on to imagine the converse. Suppose the head of the sugar corporation went on television to reassure the public that there was no possibility of a sugar shortage. A significant number of consumers could reason either that the statement is false, or that even if the statement is true many consumers won't believe it. So to be on the safe side, better buy a few extra bags of sugar. Result - sugar shortage.

So here we seem to have a case where two opposite statements can appear to produce exactly the same result.


Back in the 1980s I was talking about opinions from a person with a known status or reputation, published or broadcast in what we now call traditional media. So what happens when these opinions are disconnected from the person and embedded in dashboards and algorithms?

It's not difficult to find examples where data produces its own reality. If a recommendation algorithm identifies a new item as a potential best-seller, this item will be recommended to a lot of people and - not surprisingly - it becomes a best-seller. Obviously this doesn't work all the time, but it is hard to deny that these algorithms contribute significantly to the outcomes that they appear to predict. Meanwhile YouTube identifies people who may be interested in extreme political content, some of whom then become interested in extreme political content. And then there's Facebook's project to "connect the world". There are real-world effects here, generated by patterns of data.
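
To make that feedback loop concrete, here is a toy simulation. Everything in it is invented for the purpose - the item "quality" scores, the share of shoppers shown the recommendation, the purchase model - so treat it as a sketch of the mechanism, not a description of any real recommender.

```python
import random

# Toy simulation of a self-fulfilling recommendation (illustrative only).
random.seed(42)
items = {i: {"quality": random.random(), "sales": 0} for i in range(100)}

# A noisy "prediction" picks one item as the likely best-seller.
predicted_hit = max(items, key=lambda i: items[i]["quality"] + random.gauss(0, 0.3))

for _ in range(10_000):              # simulated shoppers
    if random.random() < 0.5:        # half of them are shown the recommendation
        candidate = predicted_hit
    else:
        candidate = random.choice(list(items))
    if random.random() < items[candidate]["quality"]:  # quality = buy probability
        items[candidate]["sales"] += 1

best_seller = max(items, key=lambda i: items[i]["sales"])
# best_seller == predicted_hit: the recommendation, as much as the
# prediction, manufactured the outcome it appeared to foresee.
```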

 

Another topic to consider is the effects produced by measurement and targets. On the one hand, there is a view that measuring performance helps to motivate improvements, which is why you often see performance dashboards prominently displayed in offices. On the other hand, there is a widespread concern that excessive focus on narrowly defined targets ("target culture") distorts or misdirects performance - for example, teachers teaching to the test. Hannah Fry's article contains several examples of this effect, which is sometimes known as Goodhart's Law. Either way, there is an expectation that measuring something has a real-world effect, whether positive or negative.

If you can think of any other examples of the performativity of data, please comment below. 



Geoffrey Bowker and Susan Leigh Star, Sorting Things Out (MIT Press, 1999)

Hannah Fry, What Data Can't Do (New Yorker, 22 March 2021)

Wilfred M. McClay, Performative: How the meaning of a word became corrupted (Hedgehog Review 23/2, Summer 2021)

Aurora Murgea, Mercury Retrograde Effect in Capital Markets: Truth or Illusion? (Timisoara Journal of Economics and Business, 13 October 2016) 

Richard Veryard, Speculation and Information: The Epistemology of Stock Market Fluctuations (Invited presentation, King's College London, 16 November 1988). Warning - the theory needs a complete overhaul, but the examples are interesting.

Wikipedia: Animal Spirits, Goodhart's Law, Performativity, Target Culture

Stanford Encyclopedia of Philosophy: J.L. Austin, Speech Acts

Related posts: Target Setting: What You Measure Is What You Get (April 2005), Ethical Communication in a Digital Age (November 2018), Algorithms and Governmentality (July 2019), Data as Pictures (August 2021), Can Predictions Create Their Own Reality (August 2021), Does the algorithm have the last word? (February 2022). Rob Barratt of Bodmin kindly contributed a poem on target culture in the comments below my Target Setting post.

Links added 27 August 2021, astrology link added 3 April 2022

Friday, April 09, 2021

Near Miss

A serious aviation incident in the news today. A plane took off from Birmingham last year with insufficient fuel, because the weight of the passengers was incorrectly estimated. This is being described as an IT error.

As Cathy O'Neil's maxim reminds us, algorithms are opinions embedded in code. The opinion in this case was the assumption that the prefix Miss referred to a female child. According to the official report, published this week, this is how the prefix is used in the country where the system was programmed.

On this particular flight, 38 adult women were classified as Miss, so the algorithm estimated each of their weights as 35 kg instead of 69 kg.
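
As a sketch of how such an opinion gets embedded, here is roughly the logic the report describes. The code is a hypothetical reconstruction, not the airline's actual system; only the standard weights (35 kg for a child, 69 kg for an adult female) come from the report.

```python
# Hypothetical reconstruction of the embedded opinion - not the real code.
# Standard weights per the AAIB report: child 35 kg, adult female 69 kg.
STANDARD_WEIGHT_KG = {"child": 35, "adult_female": 69}

def passenger_weight(prefix: str) -> int:
    # The assumption baked in at programming time: "Miss" means a female child.
    if prefix == "Miss":
        return STANDARD_WEIGHT_KG["child"]
    return STANDARD_WEIGHT_KG["adult_female"]

# 38 adult women booked as "Miss" on this flight:
shortfall_kg = 38 * (STANDARD_WEIGHT_KG["adult_female"] - STANDARD_WEIGHT_KG["child"])
# 38 * 34 = 1292 kg missing from the load calculation
```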

The calculation error was apparently compounded by several human factors.

  • A smaller discrepancy had been spotted and corrected on a previous flight. 
  • The pilot noticed that there seemed to be an unusually high number of children on the flight, but took no action because the pandemic had disrupted normal expectations of passenger numbers.
  • The software was being upgraded, but the status of the fix at the time of the flight was unclear. There were other system-wide changes being implemented at the same time, which may have complicated the fix.
  • Guidance to ground staff to double-check the classification of female passengers was not properly communicated, and so was not followed, possibly due to weekend shift patterns.

As Dan Nguyen points out, there have been previous incidents resulting from incorrect assumptions about passenger weight. But I think we need to distinguish between factual errors (what is the average weight of an adult passenger) and classification errors (what exactly does the Miss prefix signify).

There is an important lesson for data management here. You may have a business glossary or data dictionary that defines an attribute called Prefix and provides a list of permitted values. But if different people (different parts of your organization, different external parties) understand and use these values to mean different things, there is still scope for semantic confusion unless you make the meanings explicit.
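
For example, one way of making the meanings explicit (a sketch with invented system names, not a feature of any particular data dictionary product) is to record what each permitted value is taken to mean in each context that uses it, and flag any divergence:

```python
# Sketch only: system names and attribute structure are invented.
PREFIX_SEMANTICS = {
    "Miss": {
        "uk_booking_system":  {"sex": "female", "age_group": "any"},
        "vendor_load_system": {"sex": "female", "age_group": "child"},
    },
    "Mrs": {
        "uk_booking_system":  {"sex": "female", "age_group": "adult"},
        "vendor_load_system": {"sex": "female", "age_group": "adult"},
    },
}

def divergent_values(semantics: dict) -> list:
    """Return permitted values whose meaning differs across contexts."""
    flagged = []
    for value, contexts in semantics.items():
        meanings = list(contexts.values())
        if any(m != meanings[0] for m in meanings[1:]):
            flagged.append(value)
    return flagged

print(divergent_values(PREFIX_SEMANTICS))   # ['Miss']
```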



AAIB Bulletin 4/2021 (April 2021) https://www.gov.uk/government/publications/air-accident-monthly-bulletin-april-2021

Tui plane in ‘serious incident’ after every ‘Miss’ on board was assigned child’s weight (Guardian, 9 April 2021)

For further discussion and related examples, see Dan Nguyen's Twitter thread https://twitter.com/dancow/status/1380188625401434115



Friday, October 08, 2010

Defining Enterprise Architecture

#entarch Another big discussion on Twitter yesterday about the correct definition (yawn) of Enterprise Architecture (EA). This one was about whether EA should be defined as "Architecture of Enterprise" or "Architecture of Enterprise Technology". I mostly saw tweets voting for the former, so it looked to me like a pretty one-sided discussion, but maybe that's just because of the selection of people I follow on Twitter.

Clearly the difference between these two definitions is in the word "Technology". I wonder how many people who contributed to this discussion thought much about this word.

A narrow definition of "technology" from an IT perspective might limit the term to computer hardware and software; a broader definition might include everything from production technologies (e.g. Kanban) to accounting technologies (e.g. double-entry book-keeping), covering the strategic choices made in everything from transportation (oil tanker versus pipeline) to warfare (air bombardment versus ground forces). Followers of Lewis Mumford might define technology even more broadly (see for example his "Myth of the Machine") while followers of Bruno Latour would have a more subtle take on the whole subject. For my part, I have no wish to produce a single definition of technology; I merely wish to point out that the boundaries of "technology" may be as debatable as the boundaries of "enterprise architecture". The game of definition has no end.

But in any case, I wonder about the purpose and usefulness of this kind of definition. People often put an extraordinary amount of energy into definitions as if they thought that the simple act of defining something made it true. There are different forms of authority underpinning such definitions - for example the reputation of a well-known writer or organization, the emerging consensus of a group or community, or the negotiated standards of some industry body - and we may sometimes be able to achieve some kind of convergence as to what some term is supposed to mean.

But even if we could achieve a universal definition of what the term "enterprise architecture" is supposed to mean, that would be a pretty empty victory if the definition failed to reflect reality - either what enterprise architects actually do, or what they are capable of doing.

The reason I think this kind of definition can never satisfactorily reflect reality is that it is monothetic - in other words, it defines a concept in terms of specific features it must or mustn't have. Inspired by Wittgenstein, the anthropologist Rodney Needham introduced the concept of polythetic definition - defining a concept in terms of characteristic features it might have. Thus instead of debating endlessly whether enterprise architecture should be either A or B, and whether to adopt a narrow or broad definition of technology, we can start to make useful (and hopefully less dogmatic) statements about enterprise architecture and technology in the real world and the relationship between them.



Rodney Needham, Polythetic Classification: Convergence and Consequences (Man, 10:3, September 1975), pp. 349-369. 

Richard Veryard, Information Modelling: Practical Guidance (Prentice-Hall, 1992) pp. 99-100

Stanford Encyclopedia of Philosophy: Wittgenstein on Family Resemblance

Wikipedia: Family Resemblance

Tuesday, February 24, 2009

Modelling Complex Classification

Andrea Westerinen (Microsoft) posts some modelling guidelines, and was told that some people's heads exploded on reading them. She identifies three fundamental modelling concepts, which she draws from the work of Guarino and Welty.
  • Essence - these are properties that are true for all instances of a class and that "define" the semantics of the class, such as the "property" of "being human"
  • Identity - properties that determine the equality of instances
  • Unity - properties that define where the boundary of an instance is, or distinguish the parts of a "whole"

In my work on information modelling (for example in my 1992 book) I have long emphasized the importance of understanding semantic identity (how does something count as being "the same again") and semantic unity (which I tend to call membership - how does something count as inside or outside). 

But I have been critical of the assumption that we always define a class in terms of essential properties. This is known as monothetic classification, and can be contrasted with polythetic classification, which defines a class in terms of characteristic properties. As I teach in my information modelling workshops, many important objects of business attention are not amenable to simple monothetic classification (for example, how does a commercial firm decide who counts as a COMPETITOR, and how do the police decide who counts as a SUSPECT?) and require a more complex logic.
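
The difference in logic is easy to show in code. The competitor indicators below are invented purely for illustration:

```python
# Invented indicators, for illustration only.
COMPETITOR_INDICATORS = [
    "sells_substitute_products",
    "targets_same_customers",
    "bids_on_same_contracts",
    "recruits_from_same_talent_pool",
]

def is_competitor_monothetic(firm_features: set) -> bool:
    # Monothetic: every defining property must hold.
    return all(f in firm_features for f in COMPETITOR_INDICATORS)

def is_competitor_polythetic(firm_features: set, threshold: int = 2) -> bool:
    # Polythetic: enough characteristic properties, none individually essential.
    return sum(f in firm_features for f in COMPETITOR_INDICATORS) >= threshold

firm = {"targets_same_customers", "bids_on_same_contracts"}
print(is_competitor_monothetic(firm))   # False
print(is_competitor_polythetic(firm))   # True
```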

If you are just building transaction-based systems and services, you may be able to fudge these semantic issues. But if you want information services to support business intelligence and decision-support as well as transaction systems, then you have to get the semantics right. 

Of course I can see why Guarino, Welty and Westerinen want to insist on monothetic classification (which they call rigidity). But then how can they model the more fluid and fuzzy (and I think more interesting) business requirements? 

 (Sometimes the practitioners of "ontology" like to think that they are dealing with supremely abstract and generalized stuff, but if they make too many simplifying assumptions of that kind then their ontologies aren't sufficiently generalized after all.)

 


Rodney Needham, Polythetic Classification: Convergence and Consequences (Man, 10:3, September 1975), pp. 349-369. 

Richard Veryard, Information Modelling: Practical Guidance (Prentice-Hall, 1992) pp. 99-100

Stanford Encyclopedia of Philosophy: Wittgenstein on Family Resemblance

Wikipedia: Family Resemblance


Saturday, July 12, 2008

It's Not All About

Following my posts criticizing simplistic accounts of what SOA is “all about” (Ambiguity, Zapthink Bashing Microsoft), Burton Group blogger Chris Haddad kindly added a comment pointing to a couple of posts by his colleague Anne Thomas Manes, which trace back via Nick Gall to Andrew McAfee.

If I keep looking, I can probably find almost as many people saying what SOA (and related stuff) isn't all about as people saying what it is all about. (If you find any more good ones, please add them to the comments.)

There is a “negative” tradition within theology saying that God can never be described or defined, so the best we can do is make statements about what God isn’t. I don’t think SOA is quite as deep and mystical as God (although you could easily get the wrong impression from some of the more inflated commentators). 

Meanwhile, many people (especially in IT) believe that a class or category must have hard boundaries – there is some specific feature or feature-set that everything in the class possesses and everything outside the class lacks. Anthropologists call this monothetic classification.

But as Wittgenstein pointed out, there are many familiar concepts and classes that don't have such hard-and-fast boundaries. Wittgenstein's best-known example was the concept of "game". There are various features that characterize games: all games have some of these features, but hardly any games have all of them. Defining a class in terms of characteristic features is known as polythetic classification.
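
Here is a toy illustration of Wittgenstein's point, with feature sets invented for the purpose: no single feature is shared by all the games, yet each game resembles some of the others.

```python
# Invented feature sets - a toy model of family resemblance.
games = {
    "chess":    {"rules", "skill", "competition", "two_players"},
    "patience": {"rules", "skill", "luck"},
    "roulette": {"rules", "luck", "competition"},
    "catch":    {"two_players", "physical", "amusement"},
    "ring_a_ring_o_roses": {"physical", "amusement"},
}

print(set.intersection(*games.values()))   # set(): no feature shared by all

# Yet every game shares at least one feature with some other game:
for name, features in games.items():
    overlaps = {other for other, fs in games.items()
                if other != name and features & fs}
    print(name, "resembles", overlaps)
```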

(I talk about this in my information modelling workshops, and it is included in my 1992 book on Information Modelling). 

So instead of finding the single feature whose presence or absence defines SOA, I think it makes more sense to look for the characteristic features of SOA and not-SOA. Which leads us to another debate between Dave Linthicum and Joe McKendrick on how to tell it’s not SOA, based loosely on James Governor's post about cloud computing. I also found a post by Robert McIlree suggesting ten ways to tell it's not architecture.

Tell-tale clues include the following words or phrases: "agility", "application", "consultant", "enterprise architecture", "paradigm" and "vendor". Each of those is worth a separate post ...

 


Rodney Needham, Polythetic Classification: Convergence and Consequences (Man, 10:3, September 1975), pp. 349-369. 

Richard Veryard, Information Modelling: Practical Guidance (Prentice-Hall, 1992) pp. 99-100

Stanford Encyclopedia of Philosophy: Wittgenstein on Family Resemblance

Wikipedia: Family Resemblance, Negative Theology