Sunday, February 19, 2023

Customer Multiple

A friend of mine shares an email thread from his organization discussing the definition of CUSTOMER, in which participants disagree as to which categories of stakeholder should be included and which excluded.

Why is this important? Why does it matter how the CUSTOMER label is used? Well, if you are going to call yourself a customer-centric organization, improve customer experience and increase customer satisfaction, it would help to know whose experience, whose satisfaction matters. And how many customers are there actually?

The organization provides services to A, which are experienced by B and paid for by C, based on a contractual agreement with D. This is a complex network of actors with overlapping roles, and the debate is about which of these count as customers and which don't. I have often seen similar confusion elsewhere.

My friend asks: Am I supposed to have a different customer definition for different teams (splitter), or one customer definition across the whole business (lumper)? As an architect, my standard response to this kind of question is: it depends.

One possible solution is to prefix everything - CONTRACT CUSTOMER, SERVICE CUSTOMER, and so on. But although that may help sort things out, the real challenge is to achieve a joined-up strategy across the various capabilities, processes, data, systems and teams that are focused on the As, the Bs, the Cs and the Ds, rather than arguing as to which of these overlapping groups best deserves the CUSTOMER label.

Sometimes there is no correct answer, but a best fit across the board. That's architecture for you!

 

Many business concepts are not amenable to simple definition but have fuzzy boundaries. In my 1992 book, I explain the difference between monothetic classification (here is a single defining characteristic that all instances possess) and polythetic classification (here is a set of characteristics that instances mostly possess). See also my post Modelling Complex Classification (February 2009).
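The distinction can be sketched in code. A monothetic class admits an instance only if it possesses the single defining characteristic; a polythetic class admits an instance that possesses most of a set of characteristics, none of which is individually required. (The characteristics and threshold below are invented for illustration, not taken from the book.)

```python
# Sketch: monothetic vs polythetic classification.
# Characteristics and threshold are illustrative assumptions.

CHARACTERISTICS = ["pays_us", "uses_service", "holds_contract", "receives_invoices"]

def monothetic_customer(entity: dict) -> bool:
    # One defining characteristic that all instances must possess.
    return entity.get("holds_contract", False)

def polythetic_customer(entity: dict, threshold: int = 3) -> bool:
    # Instances must possess most (here: at least 3 of 4) characteristics,
    # but no single characteristic is mandatory.
    score = sum(1 for c in CHARACTERISTICS if entity.get(c, False))
    return score >= threshold

payer = {"pays_us": True, "uses_service": True, "receives_invoices": True}
print(monothetic_customer(payer))  # False - no contract
print(polythetic_customer(payer))  # True - three of four characteristics
```

The same entity can thus fall inside the polythetic boundary while failing the monothetic test, which is exactly why the two styles of definition produce different customer counts.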

But my friend's problem is a slightly different one: how to deal with multiple conflicting monothetic definitions. One possibility is to lump all the As, Bs, Cs and Ds into a single overarching CUSTOMER class, and then provide different views (or frames) for different teams. But this still leaves some important questions open, such as which of these types of customer should be included in the Customer Satisfaction Survey, whether they all carry equal weight in the overall scores, and whose responsibility it is to improve these scores.
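One way to implement the lumper approach is a single overarching class carrying role flags, with team-specific views that filter it. (The role names below echo the A/B/C/D example and are purely illustrative.)

```python
from dataclasses import dataclass, field

@dataclass
class Customer:
    # Single overarching class: any party playing at least one customer role.
    name: str
    roles: set = field(default_factory=set)  # e.g. {"service", "contract", "payer"}

customers = [
    Customer("Alpha", {"service"}),
    Customer("Beta", {"service", "payer"}),
    Customer("Gamma", {"contract"}),
]

def view(customers, role):
    # A team-specific frame: each team sees only the customers in its role.
    return [c.name for c in customers if role in c.roles]

print(view(customers, "service"))   # ['Alpha', 'Beta']
print(view(customers, "contract"))  # ['Gamma']
```

Note that the open questions remain open: the code says nothing about which view feeds the satisfaction survey, or how the views are weighted.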

 

In her book on medical ontology, Annemarie Mol develops Marilyn Strathern's notion of partial connections as a way of overcoming an apparent fragmentation of identity - in our example, between the Contract Customer and the Service Customer - when these are sometimes the same person.

Being one shapes and informs the other while they are also different identities. ... Not two different persons or one person divided into two. But they are partially connected, more than one, and less than two. Mol pp 80-82

Mol argues that frictions are vital elements of wholes,

... a tension that comes about inevitably from the fact that, somehow, we have to share the world. There need not be a single victor as soon as we do not manage to smooth all our differences away into consensus. Mol p 114


Mol's book is about medical practice rather than commercial business, but much of what she says about patients and their conditions applies also to customers. For example, there are some elements that generally belong to "the patient", and although in some cases there may be a different person (for example a parent or next-of-kin) who stands proxy for the patient and speaks on their behalf, it is usually not considered necessary to mention this complication except when it is specifically relevant.

Similar complexities can be found in commercial organizations. Let's suppose most customers pay their own bills but some customers have more complicated arrangements. It should be possible to hide this kind of complexity most of the time.

Human beings can generally cope with these elisions, ambiguities and tensions in practice, but machines (by which I mean bureaucracies as well as algorithms) cope much less well. Organizations tend to impose standard performance targets, monitored and controlled through standard reports and dashboards, which fail to allow for these complexities. My friend's problem is then ultimately a political one: how is responsibility for "customers" distributed and governed, who needs to see what, and what consequences may follow?


(As it happens, I was talking to another friend yesterday, a doctor, about the way performance targets are defined, measured and improved in the National Health Service. Some related issues, which I may try to cover in a future post.)



Annemarie Mol, The Body Multiple: Ontology in Medical Practice (Duke University Press 2002)

Richard Veryard, Information Modelling - Practical Guidance (Prentice-Hall 1992) 

  • Polythetic Classification Section 3.4.1 pp 99-100
  • Lumpers and Splitters Section 6.3.1, pp 169-171

Sunday, September 11, 2022

Pitfalls of Data-Driven

@Jon_Ayre questions whether an organization's being data-driven drives the right behaviours. He identifies a number of pitfalls.

  • It's all too easy to interpret data through a biased viewpoint
  • Data is used to justify a decision that has already been made
  • Data only tells you what happens in the existing environment, so may have limited value in predicting the consequences of making changes to this environment

In a comment below Jon's post, Matt Ballentine suggests that this is about evidence-based decision making, and notes the prevalence of confirmation bias. Which can generate a couple of additional pitfalls.

  • Data is used selectively - data that supports one's position is emphasized, while conflicting data is ignored.
  • Data is collected specifically to provide evidence for the chosen position - thus resulting in policy-based evidence instead of evidence-based policy.

A related pitfall is availability bias - using data that is easily available, or satisfies some quality threshold, and overlooking the possibility that other data (so-called dark data) might reveal a different pattern. In science and medicine, this can take the form of publication bias. In the commercial world, this might mean analysing successful sales and ignoring interrupted or abandoned transactions.
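A toy illustration of the sales example: averaging only the completed sales gives a rosier picture than the full set of transactions, because the abandoned ones never make it into the easily available dataset. (All numbers are invented.)

```python
# Toy illustration of availability bias: completed sales vs all transactions.
transactions = [
    {"status": "completed", "value": 120},
    {"status": "completed", "value": 80},
    {"status": "abandoned", "value": 0},
    {"status": "abandoned", "value": 0},
]

completed = [t["value"] for t in transactions if t["status"] == "completed"]
all_values = [t["value"] for t in transactions]

print(sum(completed) / len(completed))    # 100.0 - the easily available data
print(sum(all_values) / len(all_values))  # 50.0 - including the dark data
```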

It's not difficult to find examples of these pitfalls, both in the corporate world and in public affairs. See my analysis of Mrs May's Immigration Targets. See also Jonathan Wilson's piece on the limits of a data-driven approach in football, in which he notes low sample size, the selective nature of the data, and an absence of nuance.

One of the false assumptions that leads to these pitfalls is the idea that the data speaks for itself. (This idea was asserted by the editor of Wired Magazine in 2008, and has been widely criticized since. See my post Big Data and Organizational Intelligence.) In which case, being data-driven simply means following the data.

During the COVID pandemic, there was much talk about following the data, or perhaps following the science. But given that there was often disagreement about which data, or which science, some people adopted an ultra-sceptical position, reluctant to accept any data or any science. Or they felt empowered to do their own research. (Francesca Tripodi sees parallels between the idea that one should research a topic oneself rather than relying on experts, and the Protestant ethic of bible study and scriptural inference. See my post Thinking with the majority - a new twist.)

But I don't think being data-driven entails blindly following some data. There should be space for critical evaluation and sense-making: questioning the strength and relevance of the data, remaining open to alternative interpretations, and always staying hungry for new sources of data that might provide new insight or a different perspective. Experiments, tests.

Jon talks about Amazon running experiments instead of relying on historical data alone. And in my post Rhyme or Reason I talked about the key importance of A/B testing at Netflix. If Amazon and Netflix don't count as data-driven organizations, I don't know what does.

So Matt asks if we should be talking about "experiment-driven" instead. I agree that experiment is important and useful, but I wouldn't put it in the driving seat. I think we need multiple tools for situation awareness (making sense of what is going on and where it might be going) and action judgement (thinking through the available action paths), and experimentation is just one of these tools.

 


Jonathan Wilson, Football tacticians bowled over by quick-fix data risk being knocked for six (Guardian, 17 September 2022)

Related posts: From Dodgy Data to Dodgy Policy - Mrs May's Immigration Targets (March 2017), Rhyme or Reason (June 2017), Big Data and Organizational Intelligence (November 2018), Dark Data (February 2020), Business Science and its Enemies (November 2020), Thinking with the majority - a new twist (May 2021), Data-Driven Reasoning (COVID) (April 2022)

My new book on Data Strategy now available on LeanPub: How To Do Things With Data.

Wednesday, August 03, 2022

From Data to Doing

One of the ideas running through my work on #datastrategy is to see data as a means to an end, rather than an end in itself. As someone might once have written, 

Data scientists have only interpreted the world in various ways. The point however is to change it.

Many people in the data world are focussed on collecting, processing and storing data, rendering and analysing the data in various ways, and making it available for consumption or monetization. In some instances, what passes for a data strategy is essentially a data management strategy.

I agree that this is important and necessary, but I don't think it is enough.

I am currently reading a brilliant book by Annemarie Mol on Medical Ontology. In one chapter, she describes the uses of test data by different specialists in a hospital. The researchers in the hospital laboratory want to understand a medical condition in great detail - what causes it, how it develops, what it looks like, how to detect it and measure its progress, how it responds to various treatments in different kinds of patient. The clinicians on the other hand are primarily interested in interventions - what can we do to help this patient, what are the prospects and risks.

In the corporate world, senior managers often use data as a monitoring tool - screening the business for areas that might need intervention. Highly aggregated data can provide them with a thin but panoramic view of what is going on, but may not provide much guidance on corrective or preventative action. See my post on OrgIntelligence in the Control Room (October 2010).

Meanwhile, suppose your data strategy calls for a 360 view of key data domains, such as CUSTOMER and PRODUCT. If these initiatives are to be strategically meaningful to the business, and not merely exercises in technical plumbing, they need to be closely aligned with the business strategy - for example delivering on customer centricity and/or product leadership.

In other words, it's not enough just to have a lot of good quality data and generate a lot of analytic insight. Hence the title of my new book - How To Do Things With Data.



Annemarie Mol, The Body Multiple: Ontology in Medical Practice (Duke University Press 2002)

My book on Data Strategy is now available in beta version. https://leanpub.com/howtodothingswithdata/

Monday, August 01, 2022

New Book - How to do things with data

#datastrategy My latest book has been published by @Leanpub 

 

Book cover: How to do things with data


This is a beta version, and I intend to add more material as well as responding to feedback from readers and making general improvements. Subscribers will always have access to the latest version.

Sunday, July 31, 2022

COVID-19 - Anarchy or Panarchy?

In September 2005, we had reason to worry about the ability of a tightly coupled world to withstand shocks. At that time this included Hurricane Katrina and SARS. More recent crises, including the COVID-19 pandemic and the war in Ukraine have arguably outshocked these.

In his analysis of the economic sanctions imposed against Russia following its 2022 invasion of Ukraine, Simon Jenkins comments that the interdependence of the world’s economies, so long seen as an instrument of peace, has been made a weapon of war.

As the global economy becomes more tightly coupled, the chances of one event having a catastrophic impact on the entire system increase, notes an account called @Forrest. Forrest, billed as anti-tech, right-wing dissident thought, uses this statement as part of an argument against geoengineering remedies to climate change, on two grounds. Firstly, because meddling with complex systems is likely to have unforeseen consequences, and secondly because these supposed remedies represent a further power-shift towards global technological elites. (Bill Gates obviously, who else?)

Other thinkers see this as an opportunity to shift from deterministic systems to more adaptive and resilient systems (Wieland) or to shift from technological capitalism to a different sociopolitical system (Zhang).

 


Tony Dutzik, Defusing a rigged to blow economy: Rebuilding resilience in a suddenly fragile world (Frontier Group, 30 March 2020), reprinted (Strong Towns, 1 April 2020)

Nick Gall, Panarchitecture: Architecting a Network of Resilient Renewal (Gartner, 24 January 2011)

Tim Harford, Why the crisis is a test of our capacity to adapt (Financial Times, 20 March 2020)

Simon Jenkins, The rouble is soaring and Putin is stronger than ever - our sanctions have backfired (The Guardian, 29 July 2022)

Andreas Wieland, Dancing the Supply Chain: Toward Transformative Supply Chain Management (The Journal of Supply Chain Management, January 2021, 57(1), pp 58-73)

Yanzhu Zhang, Is panarchy relevant in the COVID-19 pandemic times? (Blavatnik School of Government, 10 June 2020)

Related blogposts: Efficiency and Robustness - On Tight Coupling (September 2005)

Updated 18 February 2023

Friday, July 29, 2022

Testing Multiples

From engineering to medicine, professionals are often forced to rely on tests that are not always completely accurate. In this post, I shall look at the tests that millions of people were obliged to use during the pandemic, to check whether they had COVID-19. The two most common tests were Lateral Flow and PCR. Lateral Flow was quicker and more convenient, while PCR took longer (because the sample had to be sent to a lab) and was supposedly more accurate.

There was also a difference in the data collected from these tests. Whereas all the results from the PCR tests should have been available in the labs, the results from the lateral flow tests were only reported under certain circumstances. There was no obligation to report a negative test unless you needed access to something, and people sometimes chose not to report positive tests because of the restrictions that might follow. And of course people only took the tests when they had to, or wanted to. When people had to pay for the tests, this obviously made a big difference.

To compensate for these limitations, some random screening was carried out, which was designed to produce more reliable and representative datasets. However, these datasets were much smaller.
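A toy simulation shows why the two data sources behave so differently: the large self-selected dataset overstates prevalence because infected people are far more likely to test and report, while the small random sample is unbiased but noisier. (All rates and sizes below are invented for illustration.)

```python
import random
random.seed(42)  # fixed seed so the sketch is reproducible

true_prevalence = 0.05
population = [random.random() < true_prevalence for _ in range(100_000)]

# Self-selected testing: infected people are much more likely to test and report.
reported = [x for x in population if random.random() < (0.8 if x else 0.02)]
print(sum(reported) / len(reported))  # biased positivity, far above 0.05

# Random screening: a small but representative sample.
sample = random.sample(population, 1000)
print(sum(sample) / len(sample))      # unbiased but noisier estimate near 0.05
```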

 

So what can we do with this kind of data? Firstly, it tells us something about the disease - whether it is distributed evenly across the country or concentrated in certain places, how quickly it is spreading. If we can combine the test results with other information about the test subjects, we may be able to get some demographic information - for example, how is the disease affecting people of different age, gender or race, how is it affecting different job categories. And if we have information from the health service, we can estimate how many of those testing positive end up in hospital.

This kind of information allows us to make predictions - for example, future demand for hospital beds, possible shortages of key workers. It also allows us to assess the effects of various protective measures - for example, to what extent does mask-wearing, social distancing and working from home reduce the rate of transmission.

Besides telling us about the disease, the data should also be able to tell us something about the tests. And the accuracy of the predictions provides a feedback loop, which may enable us to reassess either the test data or the predictive models.

 

In her book The Body Multiple, Annemarie Mol discusses the differences between two alternative tests for atherosclerosis, and describes how clinicians deal with cases where the two tests appear to provide conflicting results, as well as cases where there may be other reasons to question the test results. Instead of having a single view of the disease, she talks about its multiplicity or manyfoldedness.

But questioning the test results in a particular case, or highlighting particular issues with a given test, does not mean denying the overall value of the test. Most of the time we can continue to regard a test as useful, even as we are considering ways of improving it.

If and when we introduce a new or improved test, we may then wish to translate data between tests. In other words, if test A produced result X, then we would have expected test B to produce result Y. While this kind of translation may be useful for statistical purposes, we need to be careful about its use in individual cases.
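A sketch of what such a translation might look like at the statistical level: if paired samples show that test B's results run systematically higher than test A's, a simple calibration can map between them for population-level comparison, while remaining unreliable case by case. (The paired measurements and the assumed linear relationship are invented for illustration.)

```python
# Sketch: translating results between two tests at the statistical level.
# Assume (for illustration) a roughly linear relationship on paired samples.

paired = [(10, 22), (20, 41), (30, 62), (40, 79)]  # invented (test_a, test_b) pairs

# Least-squares fit of test_b = slope * test_a + intercept.
n = len(paired)
mean_a = sum(a for a, _ in paired) / n
mean_b = sum(b for _, b in paired) / n
slope = sum((a - mean_a) * (b - mean_b) for a, b in paired) / \
        sum((a - mean_a) ** 2 for a, _ in paired)
intercept = mean_b - slope * mean_a

def expected_b(test_a_result):
    # Useful for statistical comparison between datasets;
    # treat with caution when applied to an individual case.
    return slope * test_a_result + intercept

print(round(expected_b(25), 1))  # 51.0
```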

For many people, the second discourse (questioning the tests) appears to undermine the first (relying on their results). If we can't always trust the data, can we ever trust the data? During the COVID pandemic, many rival interpretations of the data emerged; some people chose interpretations that confirmed their preconceptions, while others turned away from any kind of data-driven reasoning.

 

The COVID pandemic became a politically contentious field, so what if we look at other kinds of testing? In safety engineering, components and whole products are subjected to a range of tests, which assess the risk of certain kinds of failure. Obviously there are manufacturers and service providers with a commercial interest in how (and by whom) these tests are carried out, and there may be regulators and researchers looking at how these tests can be improved, or to detect various forms of cheating, but ordinary consumers don't generally spend hours on YouTube complaining about their accuracy and validity.

Meanwhile even basic corporate reporting may be subject to this kind of multiplicity, as illustrated in my recent post on Data Estimation (July 2022).

So there is a level of complexity here, which not all data users may feel comfortable with, but which data professionals may not feel comfortable about hiding. In a traditional report, these details are often pushed into footnotes, and in an online dashboard there may be symbols inviting the user to drill down for further detail. But is that good enough?


Annemarie Mol, The Body Multiple: Ontology in Medical Practice (Duke University Press 2002)

Wikipedia: COVID-19 testing

Related posts: Data-Driven Reasoning - COVID (April 2022), Data Estimation (July 2022)

Wednesday, July 20, 2022

Boundary Objects and Artful Integration

I once did some data architecture and modelling for NHS Blood and Transplant. This is a UK-wide agency responsible for blood, organs and other body parts, managing transfers from donors to recipients.

One of the interesting challenges for this kind of organization is the need for collaboration between different specialist disciplines. Some teams are responsible for engaging with potential and regular donors, encouraging and arranging donation sessions for blood and plasma. Meanwhile there are other teams who need an extremely precise biomedical profile of each donor, to ensure safety as well as identifying people with rare blood types. While there is a conceptual boundary between these two sets of concerns, the teams need to collaborate effectively and reliably across this boundary.

So in terms of data and interoperability, we have an entity (in this case the donor) that is viewed in significantly different ways, but with a common identity. In the past, I've talked about two-faced entities or hinge entities, but the term that is generally used nowadays is Boundary Object.

We define boundary object as those objects that both inhabit several communities of practice and satisfy the informational requirements of each of them. Bowker and Star p 16

Boundary objects are the canonical forms of all objects in our built and natural environments. Bowker and Star p 307

In general, such boundary objects tend to be weakly structured, while being linked to much stronger structures in each separate domain. While boundary objects are considerably more than just data objects, they raise important questions for the data architect, who needs a critical eye for possible multiplicity or misfit that might compromise interoperability.

Simple multiplicity occurs when there are different levels of granularity each side of the boundary - one side lumping things together, the other side splitting them apart. In a library, for example, the people responsible for the catalogue may understand BOOK to refer to the title, while the people responsible for managing loans may want each physical copy to be represented as a separate instance of BOOK. And while people outside a warehouse may be happy with a notion of STOCK LOCATION that simply points to the warehouse as a single location, people inside the warehouse will want a more fine-grained notion, telling them more precisely where the stock can be found - for example, which shelf in which aisle.
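The library example can be sketched as two linked entities: a lumped TITLE on the catalogue side and split COPY instances on the loans side, sharing a common identifier across the boundary. (The attribute names are invented for illustration.)

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BookTitle:
    # Catalogue view: one instance per title (lumper).
    isbn: str
    title: str

@dataclass(frozen=True)
class BookCopy:
    # Loans view: one instance per physical copy (splitter).
    isbn: str        # links back to the shared identity
    copy_number: int

catalogue = [BookTitle("978-0", "Sorting Things Out")]
copies = [BookCopy("978-0", 1), BookCopy("978-0", 2), BookCopy("978-0", 3)]

# "How many books do we have?" - the answer depends on which side of the
# boundary you ask.
print(len(catalogue))  # 1
print(len(copies))     # 3
```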

Misfits can occur when there are competing notions of inclusion or classification - for example, does PRODUCT include the products and services of our partners as well as our own, does it include legacy products we no longer sell, does it include products that haven't been launched yet?

And with people, it may not be clear whether we are interested in the person or the role.

One way of detecting these issues is simply to ask how many there are. If you get widely different answers, this is a pretty good indicator that people aren't talking about the same thing. But sometimes the discrepancies can be more subtle and harder to detect.
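The "ask how many" heuristic can even be automated as a crude consistency check: wildly different counts for the "same" entity across systems suggest the teams are not talking about the same thing. (The system names, counts and tolerance below are invented.)

```python
# Crude consistency check: compare entity counts reported by different teams.
# System names, counts and tolerance are invented for illustration.

counts = {"crm": 12000, "billing": 8500, "contracts": 950}

def count_discrepancy(counts, tolerance=0.2):
    # Flag when the spread of counts exceeds the tolerance (as a fraction
    # of the largest count) - a hint of competing definitions.
    lo, hi = min(counts.values()), max(counts.values())
    return (hi - lo) / hi > tolerance

print(count_discrepancy(counts))  # True - probably not the same "customer"
```

Of course this only flags the obvious cases; as the text notes, the subtler discrepancies are harder to detect.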

And resolving (brokering) these issues can often be a political challenge, as Kimble et al argue, not merely a technical one. Specialists may be reluctant to share even a highly simplified version of their view of an object, for fear that this information might be misunderstood and misapplied. And yet there may be some value in sharing some of this information. Furthermore, there may be considerable resistance to relaxing any of the strong constraints on either side of the boundary. So the data architect needs to negotiate exactly what the boundary object will be and how much it should contain.

In his earlier writings, Étienne Wenger described this as a broker role.

The job of brokering is complex. It involves processes of translation, coordination and alignment between perspectives. It requires enough legitimacy to influence the development of a practice ... it also requires the ability to link practices by facilitating transactions between them and to cause learning by introducing into a practice, elements of another. Wenger 1998, p 109

He now talks more generally about system convening, which combines and reinterprets several different roles, including that of broker.

If a group needs some scaffolding and enabling, call a facilitator. A broker is ideal for helping to translate ideas from one practice to another. A weaver will join the dots, strategically connecting people into new networks. An inspiring visionary with charisma is not necessarily a systems convener. Nor is a person who convenes an event or manages systems change or multi-stakeholder processes. None of these roles in themselves are systems convening, although systems conveners often play some of them and it is quite possible that a person reinterpreting one of these roles ends up adopting a systems-convening approach. Wenger-Trayner 2021, p 28

The creation of boundary objects seems to require a combination of these roles. Lucy Suchman talks about artful integration, attempting to shift the frame of design practice and its objects from the figure of the heroic designer and associated next new thing, to ongoing, collective practices of sociomaterial configuration, and reconfiguration in use.

 


 

Geoffrey Bowker and Susan Leigh Star, Sorting Things Out (MIT Press, 1999) - online extract

Mary L Darking, Integrating on line learning technologies into higher education (LSE 2004) 

Chris Kimble, Corinne Grenier and Karine Goglio-Primard, Innovation and Knowledge Sharing Across Professional Boundaries: Political Interplay between Boundary Objects and Brokers (International Journal of Information Management, October 2010)

Lucy Suchman, Located Accountabilities in Technology Production, (Centre for Science Studies, Lancaster University, 2000-2003)

Etienne Wenger, Communities of Practice: Learning, Meaning, and Identity (New York: Cambridge University Press, 1998)

Etienne and Beverly Wenger-Trayner, System Convening: A crucial form of leadership for the 21st century (Social Learning Lab, 2021)

Wikipedia: Boundary Object