Architecture, Data and Intelligence

Saturday, February 22, 2025

Data-Driven Data Strategy

One of the things I've been pushing for years is the idea that data strategy should itself be data driven. In other words, if we are claiming that all these expensive data and analytics initiatives are driving business improvement, let's see the evidence, and let's have a feedback loop that allows us to increase the cost-effectiveness of these initiatives. This is becoming increasingly important as people start to pay attention to the environmental cost as well as the monetary cost.

This idea can be found in my ebook How To Do Things With Data and my articles for the Cutter Journal, as well as on this blog.

I doubt anyone will be surprised by Gartner's recent survey, showing that although over 90% of the respondents acknowledged the importance of being value-focused and outcome-focused, only 22% were measuring business impact. So they clearly aren't eating their own dog food.

And the same thing applies to the current hype around AI. Tech journalist @LindsAI Clark asks will we be back in another 10 years wondering who is measuring the business value of all that AI in which organizations have invested billions?

I think we already know the answer to that one.

Lindsay Clark, Data is very valuable, just don't ask us to measure it, leaders say (The Register, 21 Feb 2025)

Richard Veryard, How To Do Things With Data (LeanPub)

Richard Veryard, Understanding the Value of Data (Cutter Business Technology Journal, 11 May 2020)

Wikipedia: Eating your own dog food

Monday, April 01, 2024

As Shepherds Watched

We can find a useful metaphor for data ethics in Tolkein's Lord of the Rings. Palantíri are indestructable stones or crystal balls that enable events to be seen from afar. They also allow communication between two stones. The word comes from Tolkein's invented language Quenya - palan means far, tir means to watch over.

The stones are powerful but dangerously unreliable. Even in the hands of an evil wizard such as Sauron, the stones cannot present a completely false image, but they can conceal enough to mislead, and at one point in the story Sauron himself is deceived.

This links to my oft-repeated point about data and dashboards: along with the illusion that what the data tells you is true, there are two further illusions: that what the data tells you is important, and that what the data doesn't tell you is not important. (See my eBook How To Do Things With Data.)

Joseph Pearce notes the parallel between palantíri and another device whose name also denotes watching from afar - television. "The palantir stones, the seeing stones employed by Sauron, the Dark Lord, to broadcast propaganda and sow the seeds of despair among his enemies, are uncannily similar in their mode of employment to the latest technology in mass communication media" Pearce p244.

The big data company Palantir Technologies was named after Tolkein's stones. It has pitched itself as "providing the power to see the world, without becoming corrupted by that power" Maus. Not everyone is convinced.

Denis Campbell, NHS England gives key role in handling patient data to US spy tech firm Palantir (Guardian, 20 November 2023)

Maus Strategic Consulting, A (Pretty) Complete History of Palantir (27 April 2014)

Joseph Pearce, Catholic Literary Giants: A Field Guide to the Catholic Literary Landscape (Ignatius Press 2014)

Wikipedia: Palantír, Palantir Technologies

Thursday, November 30, 2023

Dynamic Pricing Update

The concept of Dynamic Pricing has been around for at least 25 years. I first encountered it in Kevin Kelly's 1998 book, New Rules for the New Economy.

Five years ago, when I was consulting to a large UK supermarket, this concept was starting to be taken seriously. The old system of sending people around the store putting yellow stickers on items that were reaching their sell-by date was seen as labour-intensive and error-prone. There were also some trials with electronic shelf-edge labels to address the related challenge of managing special offers and discounts for a specific stock-keeping unit (SKU). At the time, however, they were not ready to invest in implementing these technologies across the whole business.

The BBC reports that these systems are now widely used in other European countries, and there are further trials in the UK. This is being promoted as a way of reducing food waste. According to Matthias Guffler of EY Germany, around 400,000 tonnes of food is wasted every year, costing German retailers over 2 billion euros. Obviously no system will completely eliminate waste, but even reducing this by 10% would represent a significant saving.

Saving for whom? Clearly some consumers will benefit from a more efficient system of marking down items for quick sale, but there are concerns that other consumers will be disadvantaged by the potential uncertainty and lack of transparency, especially if retailers start using this technology also to increase prices in response to high demand.

Update October 2024

Dynamic pricing has recently hit the headlines following some extreme examples of surge pricing for concert tickets. See commentary from Hannah Downes of the Consumers' Association.

Mabel Banfield-Nwachi, Happy hour in reverse: where dynamic pricing may creep further (Guardian18 November 2024)

MaryLou Costa, Why food discount stickers may be a thing of the past (BBC News, 30 November 2023)

Hannah Downes, Oasis tickets: Ticketmaster's 'in demand' pricing could be in breach of consumer law (Which? 10 September 2024) Dynamic pricing: how does it work and is it legal? (Which? 2 October 2024)

Matthias Guffler, Wie der Handel das Problem der Lebensmittelverschwendung lösen kann (EY-Parthenon, 28 March 2023)

Related posts: Dynamic Pricing (April 2006), The Price of Ev,erything (May 2017)

Wikipedia: Dynamic Pricing

Wednesday, June 21, 2023

Documenting Business Requirements

A couple of questions came in on my phone, and I thought I'd share my answers here.

When is a Business Requirements Document good enough?

Here are a few considerations to start with

Fit for purpose - What is the document going to be used for - planning, estimating, vendor selection, design or whatever? Does it contain just enough for this purpose, or does it dive (prematurely) into solution design or technical detail?

SMART - are the requirements specific, measurable, and so on, or are they just handwaving? How will you know when (if) the requirements have been satisfied?

Transparency and governance - is it clear whose requirements are being prioritized (FOR WHOM)? Who wrote it and what is their agenda? (Documents written by a vendor or consultancy may suit their interets more than yours.)

Future-proof - are these requirements short-term and tactical, are all the urgent requirements equally important?

Complete - ah, but what does that mean?

When is a Business Requirements Document complete?

Scope completeness - covering a well-defined area of the business in terms of capability, process, org structure, ...

Viewpoint completeness - showing requirements for people, process, data, technology, ...

Constraint completeness - identifying requirements relating to privacy, security, compliance, risk, ...

Transition plan - not just data migration but also process change, big bang versus phased, ...

Quality and support plan - verification, validation and testing, monitoring and improvement. What is critical before going live versus what can be refined later.

RAID - risks, assumptions, issues and dependencies

Important note - this is not to say that a business requirements document must always be complete in all respects, as it's usually okay for requirements to develop and evolve over time. But it is important to be reasonably clear about those aspects of the requirements that are assumed to be more or less complete and stable within a time horizon appropriate for the stated purpose, and to make these assumptions explicit. For example, if you are using this document to select a COTS product, and that product needs to be a good fit for requirements for at least n years, then you would want to have a set of requirements that is likely to be broadly valid for this length of time. (And avoiding including design decisions tied to today's technology.)

Saturday, June 03, 2023

Netflix and Algorithms

Following my previous posts on Netflix, I have been reading a detailed analysis in Ed Finn's book, What Algorithms Want (2017).

Finn's answer to my question Does Big Data Drive Netflix Content? is no, at least not directly. Although Netflix had used data to commission new content as well as recommend existing content (Finn's example was House of Cards) it had apparently left the content itself to the producers, and then used data and algorithmic data to promote it.

After making the initial decision to invest in House of Cards, Netflix was using algorithms to micromanage distribution, not production. Finn p99

Obviously something written in 2017 doesn't say anything about what Netflix has been doing more recently, but Finn seems to have been looking at the same examples as the other pundits I referenced in my previous post.

Finn also makes some interesting points about the transition from the original Cinematch algorithm to what he calls Algorithm 2.0.

The 1.0 model gave way to a more nuanced, ambiguity-laden analytical environment, a more reflexive attempt to algorithmically comprehend Netflix as a culture machine. ... Netflix is no longer constructing a model of abstract relationships between movies based on ratings, but a model of live user behavior in their various apps Finn p90-91

The coding system relies on a large but hidden human workforce, hidden to reinforce the illusion of pure algorithmic recommendations (p96) and perfect personalization (p107). As Finn sees it, algorithm 1.0 had a lot of data but no meaning, and was not able to go from data to desire (p93). Algorithm 2.0 has vastly more data, thanks to this coding system - but even the model of user behaviour still relies on abstraction. So exactly where is the data decoded and meaning reinserted (p96)?

As Netflix executives acknowledge, so-called ghosts can emerge (p95), revealing a fundamental incompleteness (lack) in symbolic agency (p96).

Ed Finn, What Algorithms Want: Imagination in the Age of Computing (MIT Press, 2017)

Alexis C. Madrigal, How Netflix Reverse-Engineered Hollywood (Atlantic, 2 January 2014)

Previous posts: Rhyme or Reason - The Logic of Netflix (June 2017), Does Big Data Drive Netflix Content? (January 2021)

Monday, March 06, 2023

Trusting the Schema

A long time ago, I did some work for a client that had an out-of-date and inflexible billing system. The software would send invoices and monthly statements to the customers, who were then expected to remit payment to clear the balance on their account.

The business had recently introduced a new direct debit system. Customers who had signed a direct debit mandate no longer needed to send payments.

But faced with the challenge of introducing this change into an old and inflexible software system, the accounts department came up with an ingenious and elaborate workaround. The address on the customer record was changed to the address of the internal accounts department. The computer system would print and mail the statement, but instead of going straight to the customer it arrived back at the accounts department. The accounts clerk used a rubber stamp PAID BY DIRECT DEBIT, and would then mail the statement to the real customer address, which was stored in the Notes field on the customer record.

Although this may be an extreme example, there are several important lessons that follow from this story.

Firstly, business can't always wait for software systems to be redeveloped, and can often show high levels of ingenuity in bypassing the constraints imposed by an unimaginative design.

Secondly, the users were able to take advantage of a Notes field that had been deliberately underdetermined to allow for future expansion.

Furthermore, users may find clever ways of using and extending a system that were not considered by the original designers of the system. So there is a divergence between technology-as-designed and technology-in-use.

Now let's think what happens when the IT people finally get around to replacing the old billing system. They will want to migrate customer data into the new system. But if they simply follow the official documentation of the legacy system (schema etc), there will lots of data quality problems.

And by documentation, I don't just mean human-generated material but also schemas automatically extracted from program code and data stores. Just because a field is called CUSTADDR doesn't mean we can guess what it actually contains.

Here's another example of an underdetermined data element, which I presented at a DAMA conference in 2008. SOA Brings New Opportunities to Data Management.

In this example, we have a sales system containing a Business Type called SALES PROSPECT. But the content of the sales system depends on the way it is used - the way SALES PROSPECT is interpreted by different sales teams.

Sales Executive 1 records only the primary decision-maker in the prospective organization. The decision-maker’s assistant is recorded as extra information in the NOTES field.
Sales Executive 2 records the assistant as a separate instance of SALES PROSPECT. There is a cross-reference between the assistant and the boss

Now both Sales Executives can use the system perfectly well - in isolation. But we get interoperability problems under various conditions.

When we want to compare data between executives
When we want to reuse the data for other purposes
When we want to migrate to new sales system

(And problems like these can occur with packaged software and software as a service just as easily as with bespoke software.)

So how did this mess happen? Obviously the original designer / implementer never thought about assistants, or never had the time to implement or document them properly. Is that so unusual?

And this again shows the persistent ingenuity of users - finding ways to enrich the data - to get the system to do more than the original designers had anticipated.

And there are various other complications. Sometimes not all the data in a system was created there, some of it was brought in from an even earlier system with a significantly different schema. And sometimes there are major data quality issues, perhaps linked to a post before processing paradigm.

Both data migration and data integration are plagued by such issues. Since the data content diverges from the designed schemas, it means you can't rely on the schemas of the source data but you have to inspect the actual data content. Or undertake a massive data reconstruction exercise, often misleadingly labelled "data cleansing".

There are several tools nowadays that can automatically populate your data dictionary or data catalogue from the physical schemas in your data store. This can be really useful, provided you understand the limitations of what this is telling you. So there a few important questions to ask before you should trust the physical schema as providing a complete and accurate picture of the actual contents of your legacy data store.

Was all the data created here, or was some of it mapped or translated from elsewhere?
Is the business using the system in ways that were not anticipated by the original designers of the system?
What does the business do when something is more complex than the system was designed for, or when it needs to capture additional parties or other details?
Are classification types and categories used consistently across the business? For example, if some records are marked as "external partner" does this always mean the same thing?
Do all stakeholders have the same view on data quality - what "good data" looks like?
And more generally, is there (and has there been through the history of the system) a consistent understanding across the business as to what the data elements mean and how to use them?

Related posts: Post Before Processing (November 2008), Ecosystem SOA 2 (June 2010), Technology in Use (March 2023)

Sunday, February 19, 2023

Customer Multiple

A friend of mine shares an email thread from his organization discussing the definition of CUSTOMER, disagreeing as to which categories of stakeholder should be included and which should be excluded.

Why is this important? Why does it matter how the CUSTOMER label is used? Well, if you are going to call yourself a customer-centric organization, improve customer experience and increase customer satisfaction, it would help to know whose experience, whose satisfaction matters. And how many customers are there actually?

The organization provides services to A, which are experienced by B and paid for by C, based on a contractual agreement with D. This is a complex network of actors with overlapping roles, and the debate is about which of these count as customers and which don't. I have often seen similar confusion elsewhere.

My friend asks: Am I supposed to have a different customer definition for different teams (splitter), or one customer definition across the whole business (lumper)? As an architect, my standard response to this kind of question is: it depends.

One possible solution is to prefix everything - CONTRACT CUSTOMER, SERVICE CUSTOMER, and so on. But although that may help sort things out, the real challenge is to achieve a joined-up strategy across the various capabilities, processes, data, systems and teams that are focused on the As, the Bs, the Cs and the Ds, rather than arguing as to which of these overlapping groups best deserves the CUSTOMER label.

Sometimes there is no correct answer, but a best fit across the board. That's architecture for you!

Many business concepts are not amenable to simple definition but have fuzzy boundaries. In my 1992 book, I explain the difference between monothetic classification (here is a single defining characteristic that all instances possess) and polythetic classification (here is a set of characteristics that instances mostly possess). See also my post Modelling Complex Classification (February 2009).

But my friend's problem is a slightly different one: how to deal with multiple conflicting monothetic definitions. One possibility is to lump all the As, Bs, Cs and Ds into a single overarching CUSTOMER class, and then provide different views (or frames) for different teams. But this still leaves some important questions open, such as which of these types of customer should be included in the Customer Satisfaction Survey, whether they all carry equal weight in the overall scores, and whose responsibility is it to improve these scores.

In her book on medical ontology, Annemarie Mol develops Marilyn Strathern's notion of partial connections as a way of overcoming an apparent fragmentation of identity - in our example, between the Contract Customer and the Service Customer - when these are sometimes the same person.

Being one shapes and informs the other while they are also different identities. ... Not two different persons or one person divided into two. But they are partially connected, more than one, and less than two. Mol pp 80-82

Mol argues that frictions are vital elements of wholes,

... a tension that comes about inevitably from the fact that, somehow, we have to share the world. There need not be a single victor as soon as we do not manage to smooth all our differences away into consensus. Mol p 114

Mol's book is about medical practice rather than commercial business, but much of what she says about patients and their conditions applies also to customers. For example, there are some elements that generally belong to "the patient", and although in some cases there may be a different person (for example a parent or next-of-kin) who stands proxy for the patient and speaks on their behalf, it is usually not considered necessary to mention this complication except when it is specifically relevant.

Similar complexities can be found in commercial organizations. Let's suppose most customers pay their own bills but some customers have more complicated arrangements. It should be possible to hide this kind of complexity most of the time.

Human beings can generally cope with these elisions, ambiguities and tensions in practice, but machines (by which I mean bureaucracies as well as algorithms) not so well. Organizations tend to impose standard performance targets, monitored and controlled through standard reports and dashboards, which fail to allow for these complexities. My friend's problem is then ultimately a political one, how is responsibility for "customers" distributed and governed, who needs to see what, and what consequences may follow.

(As it happens, I was talking to another friend yesterday, a doctor, about the way performance targets are defined, measured and improved in the National Health Service. Some related issues, which I may try to cover in a future post.)

Annemarie Mol, The Body Multiple: Ontology in Medical Practice (Duke University Press 2002)

Richard Veryard, Information Modelling - Practical Guidance (Prentice-Hall 1992)

Polythetic Classification Section 3.4.1 pp 99-100
Lumpers and Splitters Section 6.3.1, pp 169-171

Architecture, Data and Intelligence

Pages

Saturday, February 22, 2025

Data-Driven Data Strategy

Monday, April 01, 2024

As Shepherds Watched

Thursday, November 30, 2023

Dynamic Pricing Update

Wednesday, June 21, 2023

Documenting Business Requirements

When is a Business Requirements Document good enough?

When is a Business Requirements Document complete?

Saturday, June 03, 2023

Netflix and Algorithms

Monday, March 06, 2023

Trusting the Schema

Sunday, February 19, 2023

Customer Multiple

Blog Archive

Creative Commons

or by email

Pages

Saturday, February 22, 2025

Data-Driven Data Strategy

Monday, April 01, 2024

As Shepherds Watched

Thursday, November 30, 2023

Dynamic Pricing Update

Wednesday, June 21, 2023

Documenting Business Requirements

When is a Business Requirements Document good enough?

When is a Business Requirements Document complete?

Saturday, June 03, 2023

Netflix and Algorithms

Monday, March 06, 2023

Trusting the Schema

Sunday, February 19, 2023

Customer Multiple

Blog Archive

Creative Commons

Subscribe

or by email