Showing posts with label DataDriven. Show all posts

Saturday, February 22, 2025

Data-Driven Data Strategy

One of the things I've been pushing for years is the idea that data strategy should itself be data driven. In other words, if we are claiming that all these expensive data and analytics initiatives are driving business improvement, let's see the evidence, and let's have a feedback loop that allows us to increase the cost-effectiveness of these initiatives. This is becoming increasingly important as people start to pay attention to the environmental cost as well as the monetary cost.
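What that feedback loop might look like in practice can be sketched in a few lines. The figures and initiative names below are entirely invented for illustration; the point is simply that each initiative's cost is set against a measured outcome, so the portfolio can be rebalanced on evidence rather than faith.

```python
# Hypothetical portfolio of data initiatives, each with a measured outcome.
# All names and figures are invented for illustration.
initiatives = [
    {"name": "churn model",   "cost": 250_000, "measured_benefit": 600_000},
    {"name": "data lake v2",  "cost": 900_000, "measured_benefit": 300_000},
    {"name": "pricing tests", "cost": 120_000, "measured_benefit": 480_000},
]

# Compute return on investment for each initiative.
for i in initiatives:
    i["roi"] = (i["measured_benefit"] - i["cost"]) / i["cost"]

# Rank by ROI: the feedback loop that lets us rebalance the portfolio.
ranked = sorted(initiatives, key=lambda i: i["roi"], reverse=True)
for i in ranked:
    print(f"{i['name']}: ROI {i['roi']:.0%}")
```

Even a crude ranking like this forces the question the Gartner respondents apparently aren't asking: which initiatives are actually paying their way?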

This idea can be found in my ebook How To Do Things With Data and my articles for the Cutter Journal, as well as on this blog.

I doubt anyone will be surprised by Gartner's recent survey, showing that although over 90% of the respondents acknowledged the importance of being value-focused and outcome-focused, only 22% were measuring business impact. So they clearly aren't eating their own dog food.

And the same thing applies to the current hype around AI. Tech journalist Lindsay Clark asks: will we be back in another ten years, wondering who is measuring the business value of all that AI in which organizations have invested billions?

I think we already know the answer to that one.

Lindsay Clark, Data is very valuable, just don't ask us to measure it, leaders say (The Register, 21 Feb 2025)

Richard Veryard, How To Do Things With Data (LeanPub) 

Richard Veryard, Understanding the Value of Data (Cutter Business Technology Journal, 11 May 2020)

Wikipedia: Eating your own dog food

 

Sunday, September 11, 2022

Pitfalls of Data-Driven

@Jon_Ayre questions whether an organization's being data-driven drives the right behaviours. He identifies a number of pitfalls.

  • It's all too easy to interpret data through a biased viewpoint
  • Data is used to justify a decision that has already been made
  • Data only tells you what happens in the existing environment, so may have limited value in predicting the consequences of making changes to this environment

In a comment below Jon's post, Matt Ballentine suggests that this is about evidence-based decision making, and notes the prevalence of confirmation bias, which can generate a couple of additional pitfalls.

  • Data is used selectively - data that supports one's position is emphasized, while conflicting data is ignored.
  • Data is collected specifically to provide evidence for the chosen position - thus resulting in policy-based evidence instead of evidence-based policy.

A related pitfall is availability bias - using data that is easily available, or satisfies some quality threshold, and overlooking the possibility that other data (so-called dark data) might reveal a different pattern. In science and medicine, this can take the form of publication bias. In the commercial world, this might mean analysing successful sales and ignoring interrupted or abandoned transactions.
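The commercial example can be simulated. In the sketch below (invented numbers throughout), high prices drive abandonment, so the easily available data - completed sales - shows a different pattern from the full population, exactly the availability trap described above.

```python
import random

random.seed(42)

# 1000 shoppers each see a price; pricier baskets are abandoned more often.
# Only completed sales show up in the "easy" data. Figures are invented.
prices = [random.uniform(10, 100) for _ in range(1000)]
completed = [p for p in prices if random.random() > p / 150]

mean_all = sum(prices) / len(prices)
mean_completed = sum(completed) / len(completed)

print(f"mean price offered to everyone: {mean_all:.2f}")
print(f"mean price in completed sales:  {mean_completed:.2f}")
# Cheap baskets survive to checkout, so analysing successful sales alone
# hides the price sensitivity of the customers who walked away.
```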

It's not difficult to find examples of these pitfalls, both in the corporate world and in public affairs. See my analysis of Mrs May's Immigration Targets. See also Jonathan Wilson's piece on the limits of a data-driven approach in football, in which he notes low sample size, the selective nature of the data, and an absence of nuance.

One of the false assumptions that leads to these pitfalls is the idea that the data speaks for itself. (This idea was asserted by the editor of Wired Magazine in 2008, and has been widely criticized since. See my post Big Data and Organizational Intelligence.) In which case, being data driven simply means following the data.

During the COVID pandemic, there was much talk about following the data, or perhaps following the science. But given that there was often disagreement about which data, or which science, some people adopted an ultra-sceptical position, reluctant to accept any data or any science. Or they felt empowered to do their own research. (Francesca Tripodi sees parallels between the idea that one should research a topic oneself rather than relying on experts, and the Protestant ethic of bible study and scriptural inference. See my post Thinking with the majority - a new twist.)

But I don't think being data-driven entails simply blindly following some data. There should be space for critical evaluation and sense-making: questioning the strength and relevance of the data, remaining open to alternative interpretations, and staying hungry for new sources of data that might provide new insight or a different perspective. Experiments, tests.

Jon talks about Amazon running experiments instead of relying on historical data alone. And in my post Rhyme or Reason I talked about the key importance of A/B testing at Netflix. If Amazon and Netflix don't count as data-driven organizations, I don't know what does.
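The kind of A/B test this refers to can be sketched with a simple two-proportion z-test. The trailer numbers below are invented for illustration, not taken from Netflix.

```python
from math import sqrt

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Z statistic for the difference between two conversion rates."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical trailer test: A converts 1200 of 20000 viewers to pressing
# play, B converts 1380 of 20000.
z = two_proportion_z(1200, 20000, 1380, 20000)
print(f"z = {z:.2f}")  # |z| > 1.96 suggests a real difference at the 5% level
```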

So Matt asks if we should be talking about "experiment-driven" instead. I agree that experiment is important and useful, but I wouldn't put it in the driving seat. I think we need multiple tools for situation awareness (making sense of what is going on and where it might be going) and action judgement (thinking through the available action paths), and experimentation is just one of these tools.

 


Jonathan Wilson, Football tacticians bowled over by quick-fix data risk being knocked for six (Guardian, 17 September 2022)

Related posts: From Dodgy Data to Dodgy Policy - Mrs May's Immigration Targets (March 2017), Rhyme or Reason (June 2017). Big Data and Organizational Intelligence (November 2018), Dark Data (February 2020), Business Science and its Enemies (November 2020), Thinking with the majority - a new twist (May 2021), Data-Driven Reasoning (COVID) (April 2022)

My new book on Data Strategy now available on LeanPub: How To Do Things With Data.

Friday, January 01, 2021

Does Big Data Drive Netflix Content?

One thing that contributes to the success of Netflix is its recommendation engine, originally based on an algorithm called CineMatch. I discussed this in my earlier post Rhyme or Reason (June 2017).

But that's not the only way Netflix uses data. According to several pundits (Bikker, Dans, Delger, FrameYourTV, Selerity), Netflix also uses big data to create content. However, it's not always clear to what extent these assertions are based on inside information rather than just intelligent speculation.

According to Enrique Dans
The latest Netflix series is not being made because a producer had a divine inspiration or a moment of lucidity, but because a data model says it will work.
Craig Delger's example looks pretty tame - analysing the intersection between existing content to position new content. 

The data collected by Netflix indicated there was a strong interest for a remake of the BBC miniseries House of Cards. These viewers also enjoyed movies by Kevin Spacey, and those directed by David Fincher. Netflix determined that the overlap of these three areas would make House of Cards a successful entry into original programming.
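The overlap analysis described here is essentially a set intersection over viewer segments. A toy sketch, with viewer IDs invented for illustration:

```python
# Three audience segments; the intersection suggests a core audience for
# new content combining all three interests. Viewer IDs are invented.
bbc_house_of_cards = {"u1", "u2", "u3", "u5", "u8"}
kevin_spacey_films = {"u2", "u3", "u4", "u5", "u9"}
david_fincher_films = {"u2", "u3", "u5", "u7"}

core_audience = bbc_house_of_cards & kevin_spacey_films & david_fincher_films
print(sorted(core_audience))  # viewers shared by all three segments
```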

This is the kind of thing risk-averse producers have always done, and although data analytics might enable Netflix to do this a bit more efficiently, it doesn’t seem to represent a massive technological innovation. Thomas Davenport and Jeanne Harris discuss some more advanced use of data in the second edition of their book Competing on Analytics.

Netflix ... has used analytics to predict whether a TV show will be a hit with audiences. ... It has used attribute analysis ... to predict whether customers would like a series, and has identified as many as seventy thousand attributes of movies and TV shows, some of which it drew on for the decision whether to create it.

One of the advantages of a content delivery platform is that you can track the consumption of your content. Amazon used the Kindle to monitor how many chapters people actually read, at what times of day, where and when they get bored. Games platforms (Nintendo, PlayStation, X-Box) can track how far people get with the games, where they get stuck, and where they might need some TLC or DLC. So Netflix knows where you pause or give up, which scenes you rewind to watch again. Netflix can also experiment with alternative trailers for the same content.
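This kind of telemetry analysis can be sketched very simply - for example, counting where viewers give up on a title. The event data below is invented for illustration.

```python
from collections import Counter

# Hypothetical playback telemetry: (user, minute at which they abandoned).
abandon_events = [
    ("u1", 12), ("u2", 12), ("u3", 47), ("u4", 12),
    ("u5", 33), ("u6", 12), ("u7", 47), ("u8", 12),
]

# Count abandonments per minute to find the weakest point in the content.
drop_offs = Counter(minute for _, minute in abandon_events)
worst_minute, count = drop_offs.most_common(1)[0]
print(f"most viewers abandon around minute {worst_minute} "
      f"({count} of {len(abandon_events)})")
```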

In theory, this kind of information can then be used not just by Netflix to decide where to invest, but also by content producers to produce more engaging content. But it's difficult to get clear evidence how much influence this actually has on content creation.

How much other (big) data does Netflix actually collect about its consumers? Craig Delger assumes they operate much like most other data-hungry companies.

Netflix user account data provides verified personal information (sex, age, location), as well as preferences (viewing history, bookmarks, Facebook likes).

 However, in a 2019 interview (reported by @dadehayes), Ted Sarandos denied this.

We don’t collect your data. I don’t know how old you are when you join Netflix. I don’t know if you’re black or white. We know your credit card, but that’s just for payment and all that stuff is anonymized.

Sarandos, who is Chief Content Officer at Netflix, also downplayed the role that data (big or otherwise) played in driving content.

Picking content and working with the creative community is a very human function. The data doesn’t help you on anything in that process. It does help you size the investment. … Sometimes we’re wrong on both ends of that, even with this great data. I really think it’s 70, 80% art and 20, 30% science.

But perhaps that's what you'd expect him to say, given that Netflix has always tried to attract content producers with the promise of complete creative freedom. Amazon Studios has made similar claims. See report by Roberto Baldwin.

While there may be conflicting narratives about the difference data makes to content creation, there are some observations that seem relevant if inconclusive.

Firstly, the long tail argument. The original business model for Amazon and Netflix was based on having a vast catalogue, in which most of the entries are of practically no interest to anyone, because the cost of adding something to the catalogue was trivial. Even if the tail doesn't actually contribute as much revenue as the early proponents of the long tail theory suggested, it helps to mitigate uncertainty and risk - not knowing in advance which titles are going to be hits.

But this effect is countered by the trend towards vertical integration. Amazon and Netflix have gone from distribution to producing their own content, while Disney has moved into streaming. This encourages (but doesn't prove) the hypothesis that there may be some data synergies as well as commercial synergies.

And finally, an apparent preference for conventional non-disruptive content, as noted by Alex Shephard, which is pretty much what we would expect from a data-driven approach.

Netflix is content to replicate television as we know it—and the results are deliberately less than spectacular.

Update (June 2023)

I have been reading a detailed analysis in Ed Finn's book, What Algorithms Want (2017).

Finn's answer to my question about data-driven content is no, at least not directly. Although Netflix had used data to commission new content as well as to recommend existing content (Finn's example was House of Cards), it had apparently left the content itself to the producers, and then used data and algorithms to promote it.

After making the initial decision to invest in House of Cards, Netflix was using algorithms to micromanage distribution, not production. Finn p99

Obviously that doesn't say anything about what Netflix has been doing more recently, but Finn seems to have been looking at the same examples as the other pundits I referenced above.


Roberto Baldwin, With House of Cards, Netflix Bets on Creative Freedom (Wired, 1 February 2013)

Yannick Bikker, How Netflix Uses Big Data to Build Mountains of Money (7 July 2020)

Enrique Dans, How Analytics Has Given Netflix The Edge Over Hollywood (Forbes, 27 May 2018), Netflix: Big Data And Playing A Long Game Is Proving A Winning Strategy (Forbes, 15 January 2020)

Thomas Davenport and Jeanne Harris, Competing on Analytics (Second edition 2017) - see extract here https://www.huffpost.com/entry/how-netflix-uses-analytics-to-thrive_b_5a297879e4b053b5525db82b

Ed Finn, What Algorithms Want: Imagination in the Age of Computing (MIT Press, 2017)

FrameYourTV, How Netflix uses Big Data to Drive Success via Inside BigData (20 January 2018) 

Daniel G. Goldstein and Dominique C. Goldstein, Profiting from the Long Tail (Harvard Business Review, June 2006)

Dade Hayes, Netflix’s Ted Sarandos Weighs In On Streaming Wars, Agency Production, Big Tech Breakups, M+A Outlook (Deadline, 22 June 2019)

Alexis C. Madrigal, How Netflix Reverse-Engineered Hollywood (Atlantic, 2 January 2014)

Selerity, How Netflix used big data and analytics to generate billions (5 April 2019)

Alex Shephard, What Netflix’s Obama Deal Says About the Future of Streaming (New Republic 23 May 2018)

Related posts: Competing on Analytics (May 2010), Rhyme or Reason - the Logic of Netflix (June 2017)

Sunday, November 08, 2020

Business Science and its Enemies

#FollowingTheScience As politicians around the world struggle to contain and master the Covid-19 pandemic, the complex role of science in guiding decision and policy has been brought into view. Not only the potential tension between science and policy, but also the tension between different branches of science. (For example, medical science versus behavioural science.)

In this post, I want to look at the role of science in guiding business decisions and policies. Professor Donoho traces the idea of data science back to a paper by John Tukey in the early 1960s, and the idea of management science, which Stafford Beer described as the business use of operations research, is at least as old as that. More recently, people have started talking about business science. These sciences are all described as interdisciplinary.

Operations research itself is even older. It was originally established during the second world war as a multi-disciplinary exercise, perhaps similar to what is now being called business science, but it lost its way in the years after the war and was eventually reduced to a set of computer programming techniques with no real impact on organization and culture. 

In a recent webinar on Business Science, Steve Fox asked what business science enabled leaders to do better, and identified three main areas. 

Firstly system-related - to anticipate requirements and resources, identify issues, including risk and compliance issues, and fix problems. 

And secondly people-related - to tell the story, influence stakeholders and negotiate improvements. Focusing on message and communications to the various audiences we need to influence is a key part of business science. 

And thirdly, thinking-related. When business science is applied correctly, it changes the way we think. 

I agree with these three, but I'd like to add a fourth: organizational learning and agility. This is an essential component of my approach to organizational intelligence, which is based on the premise that serious business challenges require a combination of human/social intelligence and machine intelligence.


Steve Fox also stated that the biggest obstacles to creating data-driven business aren't technical; they're cultural and behavioural. So in this post, I also want to look at some of the obstacles of following the science in the context of business and organizational management.

  • Poor Data - Inadequate Measurement and Testing - Ignoring Critical Signals
  • Too Much Data - Overreliance on Technology - Abdication
  • Silo Culture - Someone Else's Problem
  • Linear Thinking - Denial of Complexity
  • Premature Attempts to Eliminate Uncertainty
  • Quantity becomes Quality

After I had initially drawn up this list, I went back to Tukey's original paper and found many of them clearly stated in there. 



Poor Data - Inadequate Measurement and Testing - Ignoring Critical Signals

Empirical science relies heavily on a combination of observation, experiment and measurement. 


Too Much Data - Overreliance on Technology - Abdication

Tukey: Danger only comes from mathematical optimizing when the results are taken too seriously.

Adrian Chiles reminds us that all the data in the world is no use to you if you don’t know what to do with it. He quotes Aron F. Sørensen (via Chris Baraniuk): Maybe today there’s a bit of a fixation on instruments.

And in many situations, people overrely on algorithms. For example, judges relying on algorithms to decide probation or sentencing, without applying any of their own judgement or common sense. If a judge doesn't bother doing any actual judging, and lets the algorithm do all the work, what exactly are we paying them for?


Linear Thinking - Denial of Complexity

Tukey: If it were generally understood that the great virtue of neatness was that it made it easier to make things complex again, there would be little to say against a desire for neatness.

One of the best-known examples of linear thinking was a false theory about the vulnerabilities of aircraft during the second world war, based on the location of holes in planes that returned to base. People assumed that the vulnerabilities were where the holes were, and this led to efforts to reinforce planes at those points.

Non-linear thinking turns this theory on its head. If a plane makes it back to base with a hole at a particular location, this should be taken as evidence that the plane was NOT vulnerable at that point. What you really want to know is the location of the holes in the planes that did NOT make it back to base.
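This survivorship effect is easy to simulate. In the sketch below, hits on the engine are usually fatal and hits on the fuselage usually survivable (the fatality rates are invented), so the data visible at base points the wrong way.

```python
import random

random.seed(1)

# Invented fatality rates: engine hits are usually fatal, fuselage hits rarely.
SECTIONS = ["engine", "fuselage"]
FATALITY = {"engine": 0.8, "fuselage": 0.1}

returned_hits = {"engine": 0, "fuselage": 0}
lost_hits = {"engine": 0, "fuselage": 0}

for _ in range(10_000):
    hit = random.choice(SECTIONS)
    if random.random() < FATALITY[hit]:
        lost_hits[hit] += 1        # never observed back at base
    else:
        returned_hits[hit] += 1    # the data we can easily see

print("holes on returned planes:", returned_hits)
print("holes on planes that never returned:", lost_hits)
# Returned planes show mostly fuselage holes, yet the engine is the weak spot.
```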

In 1979, C West Churchman wrote a book called The Systems Approach and its Enemies, about how people and organizations resist the ways of thinking that Churchman and others were championing. Among other things, he noted the way people often preferred to latch onto simplistic one-dimensional/linear solutions rather than thinking holistically.



Chris Baraniuk, Why it’s not surprising that ship collisions still happen (BBC News 22nd August 2017)

Christa Case Bryant and Story Hinckley, In a polarized world, what does follow the science mean? (Christian Science Monitor, 12 August 2020)

Adrian Chiles, In a data-obsessed world, the power of observation must not be forgotten (The Guardian, 5 November 2020)

C West Churchman, The Systems Approach and its Enemies (1979)

David Donoho, 50 years of Data Science (18 September 2015)

John Dupré, Following the science in the COVID-19 pandemic (Nuffield Council of Bioethics, 29 April 2020)

Faye Flam, Follow the Science Isn’t a Covid-19 Strategy: Policy makers can follow the same facts to different conclusions (Bloomberg, 10 September 2020)

Steve Fox, A better framework is needed: From Data Science to Business Science (Consider.Biz, 17 September 2020) via YouTube

Matt Mathers, Ministers using following the science defence to justify decision-making during pandemic, says Prof Brian Cox (Independent, 19 May 2020) 

Megan Rosen, Fighting the COVID-19 Pandemic Through Testing (Howard Hughes Medical Institute, 18 June 2020)

John Tukey, The future of data analysis (Annals of Mathematical Statistics, 33:1, 1962)

Wikipedia: Data Science, Management Science 


Related posts: Enemies of Intelligence (May 2010), Changing how we think (May 2010), Data-Driven Reasoning - COVID (April 2022)

Monday, August 03, 2020

A Cybernetics View of Data-Driven

Cybernetics helps us understand dynamic systems that are driven by a particular type of data. Here are some examples:

  • Many economists see markets as essentially driven by price data.
  • On the Internet (especially social media) we can see systems that are essentially driven by click data.
  • Stan culture is a system essentially driven by review scores, with hardcore fans ganging up on critics who fail to give the latest album a perfect score

In a recent interview with Alice Pearson of CRASSH, Professor Will Davies explains the process as follows:

For Hayek, the advantage of the market was that it was a space in which stimulus and response could be in a constant state of interactivity: that prices send out information to people, which they respond to either in the form of consumer decisions or investment decisions or new entrepreneurial strategies.

Davies argued that this is now managed on screens, with traders on Wall Street and elsewhere constantly interacting with (as he says) flashing numbers that are rising and falling.

The way in which the market is visualized to people, the way it presents itself to people, the extent to which it is visible on a single control panel, is absolutely crucial to someone's ability to play the market effectively.

Davies attributes to cybernetics a particular vision of human agency: to think of human beings as black boxes which respond to stimuli in particular ways that can be potentially predicted and controlled. (In market trading, this thought leads naturally to replacing human beings with algorithmic trading.)

Davies then sees this cybernetic vision encapsulated in the British government approach to the COVID-19 pandemic.

What you see now with this idea of Stay Alert ... is a vision of an agent or human being who is constantly responsive and constantly adaptable to their environment, and will alter their behaviour depending on what types of cues are coming in from one moment to the next. ... The ideological vision being presented is of a society in which the rules of everyday conduct are going to be constantly tweaked in response to different types of data, different things that are appearing on the control panels at the Joint Biosecurity Centre.

The word alert originally comes from an Italian military term all'erta - to the watch. So the slogan Stay Alert implies a visual idea of agency. But as Alice Pearson pointed out, that which is supposed to be the focus of our alertness is invisible. And it is not just the virus itself that is invisible, but (given the frequency of asymptomatic carriers) which people are infectious and should be avoided.

So what visual or other signals is the Government expecting us to be alert to? If we can't watch out for symptoms, perhaps we are expected instead to watch out for significant shifts in the data - ambiguous clues about the effectiveness of masks or the necessity of quarantine. Or perhaps significant shifts in the rules.

Most of us only see a small fraction of the available data - Stafford Beer's term for this is attenuation, and Alice Pearson referred to hyper-attenuation. So we seem to be faced with a choice. On the one hand, a shifting set of rules based on the official interpretation of the data - assuming that the powers-that-be have a richer set of data than we do, and a more sophisticated set of tools for managing it. On the other hand, an increasingly strident set of activists encouraging people to rebel against the official rules, essentially setting up a rival set of norms in which, for example, mask-wearing is seen as a sign of capitulation to a socialist regime run by Bill Gates, or whatever.
 
Later in the interview, and also in his New Statesman article, Davies talks about a shifting notion of rules, from a binding contract to mere behavioural nudges.

Rules morph into algorithms, ever-more complex sets of instructions, built around an if/then logic. By collecting more and more data, and running more and more behavioural tests, it should in principle be possible to steer behaviour in the desired direction. ... The government has stumbled into a sort of clumsy algorithmic mentality. ... There is a logic driving all this, but it is one only comprehensible to the data analyst and modeller, while seeming deeply weird to the rest of us. ... To the algorithmic mind, there is no such thing as rule-breaking, only unpredicted behaviour.
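The if/then logic Davies describes can be caricatured in a few lines of code. The rule, the thresholds and the categories below are all invented for illustration - which is rather the point: to the algorithmic mind, conduct is just branching over incoming data.

```python
# A caricature of conduct rules as if/then branches over incoming data.
# All thresholds and categories are invented for illustration.
def conduct_rule(cases_per_100k: float, masked: bool, indoors: bool) -> str:
    if cases_per_100k > 50 and indoors and not masked:
        return "non-compliant"
    if cases_per_100k > 50:
        return "compliant"
    return "no restriction"

print(conduct_rule(80, masked=False, indoors=True))
print(conduct_rule(80, masked=True, indoors=True))
print(conduct_rule(20, masked=False, indoors=True))
```

Tweak a threshold and yesterday's rule-follower becomes today's rule-breaker - or, in Davies's terms, merely "unpredicted behaviour".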

One of the things that differentiates the British government from more accomplished practitioners of data-driven biopower (such as Facebook and WeChat) is the apparent lack of fast and effective feedback loops. If what the British government is practising counts as cybernetics at all, it seems to be a very primitive and broken version of first-order cybernetics.

When Norbert Wiener introduced the term cybernetics over seventy years ago, describing thinking as a kind of information processing and people as information processing organisms, this was a long way from simple behaviourism. Instead, he emphasized learning and creativity, and insisted on the liberty of each human being to develop in his freedom the full measure of the human possibilities embodied in him.
 
In a talk on the entanglements of bodies and technologies, Lucy Suchman draws on an article by Geoff Bowker to describe the universal aspirations of cybernetics.
 
Cyberneticians declared a new age in which Darwin's placement of man as one among the animals would now be followed by cybernetics' placement of man as one among the machines.
 
However, as Suchman reminds us
 
Norbert Wiener himself paid very careful attention to questions of labour, and actually cautioned against the too-broad application of models that were designed in relation to physical or computational systems to the social world.

Even if sometimes seeming outnumbered, there have always been some within the cybernetics community who are concerned about epistemology and ethics. Hence second-order (or even third-order) cybernetics.



Footnote (July 2021)

Thanks to Claire Song, I have been looking at Wiener's 1943 paper on Behaviour, Purpose and Teleology, and now realise I should be more precise about behaviourism. While I still hold that Wiener does not subscribe to classical behaviourism, he does seem to follow a form of teleological behaviourism, although this term is nowadays associated with Howard Rachlin.
 
Having also been looking at Simondon and Deleuze, I'm also noticing how Will Davies is hinting at the notion of modulation. But this is a topic for another day.



Ben Beaumont-Thomas, Hardcore pop fans are abusing critics – and putting acclaim before art (The Guardian, 3 August 2020)

Geoffrey Bowker, How to be universal: some cybernetic strategies, 1943-1970 (Social Studies of Science 23, 1993) pp 107-127
 
Philip Boxer & Vincent Kenny, The economy of discourses - a third-order cybernetics (Human Systems Management, 9/4 January 1990) pp 205-224
 

Will Davies, Coronavirus and the Rise of Rule-Breakers (New Statesman, 8 July 2020)
 
 
Arturo Rosenblueth, Norbert Wiener and Julian Bigelow, Behaviour, Purpose and Teleology (Philosophy of Science, Vol. 10, No. 1, Jan 1943) pp. 18-24

Lucy Suchman, Restoring Information’s Body: Remediations at the Human-Machine Interface (Medea, 20 October 2011) Recording via YouTube
 
Norbert Wiener, The Human Use of Human Beings (1950, 1954)

Stanford Encyclopedia of Philosophy: A cybernetic view of human nature

Saturday, December 07, 2019

Developing Data Strategy

The concepts of net-centricity, information superiority and power to the edge emerged out of the US defence community about twenty years ago, thanks to some thought leadership from the Command and Control Research Program (CCRP). One of the routes of these ideas into the civilian world was through a company called Groove Networks, which was acquired by Microsoft in 2005 along with its founder, Ray Ozzie. The Software Engineering Institute (SEI) provided another route. And from the mid 2000s onwards, a few people were researching and writing on edge strategies, including Philip Boxer, John Hagel and myself.

Information superiority is based on the idea that the ability to collect, process, and disseminate an uninterrupted flow of information will give you operational and strategic advantage. The advantage comes not only from the quantity and quality of information at your disposal, but also from processing this information faster than your competitors and/or fast enough for your customers. TIBCO used to call this the Two-Second Advantage.

And by processing, I'm not just talking about moving terabytes around or running up large bills from your cloud provider. I'm talking about enterprise-wide human-in-the-loop organizational intelligence: sense-making (situation awareness, model-building), decision-making (evidence-based policy), rapid feedback (adaptive response and anticipation), organizational learning (knowledge and culture). For example, the OODA loop. That's my vision of a truly data-driven organization.
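The OODA loop can be sketched as a feedback cycle in code. Everything below - the world model, the baseline, the actions - is a placeholder invented for illustration; the structure (observe, orient, decide, act, repeat) is the point.

```python
# Placeholder OODA loop: observe raw signals, orient them against a model,
# decide on an action path, act, and feed the result back into the world.
def observe(world):
    return world["signal"]                 # collect raw signals

def orient(observation, model):
    # sense-making: interpret the observation against a model of normal
    return "rising" if observation > model["baseline"] else "steady"

def decide(situation):
    # decision-making: pick an action path
    return "intervene" if situation == "rising" else "monitor"

def act(world, action):
    # adaptive response feeds back into the environment
    if action == "intervene":
        world["signal"] -= 5
    return world

world, model = {"signal": 12}, {"baseline": 10}
for _ in range(3):                         # each pass is a learning opportunity
    action = decide(orient(observe(world), model))
    world = act(world, action)
print(world["signal"])
```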

There are four dimensions of information superiority which need to be addressed in a data strategy: reach, richness, agility and assurance. I have discussed each of these dimensions in a separate post:





Philip Boxer, Asymmetric Leadership: Power to the Edge

Leandro DalleMule and Thomas H. Davenport, What’s Your Data Strategy? (HBR, May–June 2017) 

John Hagel III and John Seely Brown, The Agile Dance of Architectures – Reframing IT Enabled Business Opportunities (Working Paper 2003)

Vivek Ranadivé and Kevin Maney, The Two-Second Advantage: How We Succeed by Anticipating the Future--Just Enough (Crown Books 2011). Ranadivé was the founder and former CEO of TIBCO.

Richard Veryard, Building Organizational Intelligence (LeanPub 2012)

Richard Veryard, Information Superiority and Customer Centricity (Cutter Business Technology Journal, 9 March 2017) (registration required)

Wikipedia: CCRP, OODA Loop, Power to the Edge

Related posts: Microsoft and Groove (March 2005), Power to the Edge (December 2005), Two-Second Advantage (May 2010), Enterprise OODA (April 2012), Reach Richness Agility and Assurance (August 2017)

Sunday, December 01, 2019

Data Strategy - Reach

This is one of a series of posts looking at the four key dimensions of Data and Information that must be addressed in a data strategy - reach, richness, agility and assurance.



Data strategy nowadays is dominated by the concept of big data, whatever that means. Every year our notions of bigness are being stretched further. So instead of trying to define big, let me talk about reach.

Firstly, this means reaching into more sources of data. Instead of just collecting data about the immediate transactions, enterprises now expect to have visibility up and down the supply chain, as well as visibility into the world of the customers and end-consumers. Data and information can be obtained from other organizations in your ecosystem, as well as picked up from external sources such as social media. And the technologies for monitoring (telemetrics, internet of things) and surveillance (face recognition, tracking, etc) are getting cheaper, and may be accurate enough for some purposes.

Obviously there are some ethical as well as commercial issues here. I'll come back to these.

Reach also means reaching more destinations. In a data-driven business, data and information need to get to where they can be useful, both inside the organization and across the ecosystem, to drive capabilities and processes, to support sense-making (also known as situation awareness), policy and decision-making, and intelligent action, as well as organizational learning. These are the elements of what I call organizational intelligence. Self-service (citizen) data and intelligence tools, available to casual as well as dedicated users, improve reach; and the tool vendors have their own reasons for encouraging this trend.

In many organizations, there is a cultural divide between the specialists in Head Office and the people at the edge of the organization. If an organization is serious about being customer-centric, it needs to make sure that relevant and up-to-date information and insight reaches those dealing with awkward customers and other immediate business challenges. This is the power-to-the-edge strategy.

Information and insight may also have value outside your organization - for example to your customers and suppliers, or other parties. Organizations may charge for access to this kind of information and insight (direct monetization), may bundle it with other products and services (indirect monetization), or may distribute it freely for the sake of wider ecosystem benefits.

And obviously there will be some data and intelligence that must not be shared, for security or other reasons. Many organizations will adopt a defensive data strategy, protecting all information unless there is a strong reason for sharing; others may adopt a more offensive data strategy, seeking competitive advantage from sharing and monetization except for those items that have been specifically classified as private or confidential.

How are your suppliers and partners thinking about these issues? To what extent are they motivated or obliged to share data with you, or to protect the data that you share with them? I've seen examples where organizations lack visibility of their own assets, because they have outsourced the maintenance of these assets to an external company, and the external company fails to provide sufficiently detailed or accurate information. (When implementing your data strategy, make sure your contractual agreements cover your information sharing requirements.)

Data protection introduces further requirements. Under GDPR, data controllers are supposed to inform data subjects how far their personal data will reach, although many of the privacy notices I've seen have been so vague and generic that they don't significantly constrain the data controller's ability to share personal data. Meanwhile, GDPR Article 28 specifies some of the aspects of data sharing that should be covered in contractual agreements between data controllers and data processors. But compliance with GDPR or other regulations doesn't fully address ethical concerns about the collection, sharing and use of personal data. So an ethical data strategy should be based on what the organization thinks is fair to data subjects, not merely what it can get away with.

There are various specific issues that may motivate an organization to improve the reach of data as part of its data strategy. For example:
  • Critical data belongs to third parties
  • Critical business decisions lack robust data
  • "I know the data is in there, but I can't get it out."
  • Lack of transparency – I can see the result, but I don't know how it has been calculated
  • Analytic insight is narrowly controlled by a small group of experts, not easily available to general management
  • Data and/or insight would be worth a lot to our customers, if only we had a way of getting it to them
In summary, your data strategy needs to explain how you are going to get data and intelligence
  • From a wide range of sources
  • Into a full range of business processes at all touchpoints
  • To the edge, where your organization engages with its customers


Next post Richness

Related posts

Power to the Edge (December 2005)
Reach, Richness, Agility and Assurance (August 2017)
Setting off towards the data-driven business (August 2019)
Beyond Trimodal - Citizens and Tourists (November 2019)

Monday, September 30, 2019

Towards Data Model Harmony

Last week, someone asked me how I go about producing a data model. I found out afterwards that my answer was considered too brief. So here's a longer version of my answer.


The first thing to consider is the purpose of the modelling. Sometimes there is a purely technical agenda - for example, updating or reengineering some data platform - but usually there are some business requirements and opportunities - for example, making the organization more data-driven. I prefer to start by looking at the business model: what services does it provide to its customers, what capabilities or processes are critical to the organization, what decisions and policies need to be implemented, and what kind of evidence and feedback loops can improve things. From all this, we can produce a high-level set of data requirements - what concepts, how interconnected, at what level of granularity, etc. - and then work top-down from conceptual data models to more detailed logical and physical models.


But there are usually many data models in existence already - which may be conceptual, logical or physical. Some of these may be formally documented as models, whether using a proper data modelling tool or just contained in various office tools (e.g. Excel, PowerPoint, Visio, Word). Some of them are implicit in other documents, such as written policies and procedures, or can be inferred ("reverse engineered") from existing systems and from the structure and content of data stores. Some concepts and the relationships between them are buried in people's heads and working practices, and may need to be elicited.

And that's just inside the organization. When we look outside, there may be industry models and standards, such as ACORD (insurance) and GS1 (groceries). There may also be models pushed by vendors and service/platform providers - IBM has been in this game longer than most. There may also be models maintained by external stakeholders - e.g., suppliers, customers, regulators.

There are several points to make about this collection of data models.
  • There will almost certainly be conflicts between these models - not just differences in scope and level/granularity, but direct contradictions.
  • And some of these models will be internally inconsistent. Even the formal ones may not be perfectly consistent, and the inferred/elicited ones may be very muddled. The actual content of a data store may not conform to the official schema (data quality issues).
  • You probably don't have time to wade through all of them, although there are some tools that may be able to process some of these automatically for you. So you will have to be selective, and decide which ones are more important.
  • In general, your job is not simply to reproduce these models (minus the inconsistencies) but to build models that will support the needs of the business and its stakeholders. So looking at the existing models is necessary but not sufficient.


So why do you need to look at the "legacy" models at all?  Here are the main reasons.
  • Problems and issues that people may be experiencing with existing systems and processes can often be linked to problems with the underlying data models.
  • Inflexibility in these data models may constrain future business strategies and tactics.
  • New systems and processes typically need to transition from existing ones - not just data migration but also conceptual migration (people learning and adopting a revised set of business concepts and working practices) - and/or interoperate with them (data integration, joined-up business).
  • Some of the complexity in the legacy models may be redundant, but some of it may provide clues about complexity in the real world. (The fallacy of eliminating things just because you don't understand why they're there is known as Chesterton's Fence. See my post on Low-Hanging Fruit.) The requirements elicitation process typically finds a lot of core requirements, but often misses many side details. So looking at the legacy models provides a useful completeness check.

If your goal is to produce a single, consistent, enterprise-wide data model, good luck with that. I'll check back with you in ten years to see how far you've got. Meanwhile, the pragmatic approach is to work at multiple tempos in parallel - supporting short term development sprints, refactoring and harmonizing in the medium term, while maintaining steady progress towards a longer-term vision. Accepting that all models are wrong, and prioritizing the things that matter most to the organization.

The important issues tend to be convergence and unbundling. Firstly, while you can't expect to harmonize everything in one go, you don't want things to diverge any further. And secondly, where two distinct concepts have been bundled together, trying to tease them apart - at least for future systems and data stores - for the sake of flexibility.
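As a toy illustration of unbundling, consider a legacy record that fuses two distinct concepts - the customer and their delivery address - which can be teased apart so that each can vary independently. This is only a sketch, and all the class names and data are invented:

```python
# Toy illustration of unbundling two concepts that a legacy model fused
# together. All names and data are invented.

from dataclasses import dataclass, field

@dataclass
class LegacyCustomer:          # bundled: identity and address in one record,
    name: str                  # so a second address forces duplication
    street: str
    city: str

@dataclass
class Party:                   # unbundled: who the customer is
    name: str

@dataclass
class Address:                 # unbundled: where things are delivered
    street: str
    city: str

@dataclass
class Customer:                # an association links the two concepts
    party: Party
    addresses: list = field(default_factory=list)

alice = Customer(Party("Alice"),
                 [Address("1 High St", "London"), Address("2 Low Rd", "Leeds")])
```

The unbundled shape costs a little more structure up front, but a customer with two addresses no longer forces a duplicated record.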

Finally, how do I know whether the model is any good? On the one hand, I need to be able to explain it to the business, so it had better not be too complicated or abstract. On the other hand, it needs to be able to reflect the real complexity of the business, which means testing it against a range of scenarios to make sure I haven't embedded any false or simplistic assumptions.



Longer answers are also available. Would you like me to run a workshop for you?



Wikipedia: All Models are Wrong, Chesterton's Fence

Related posts

How Many Products? (October 2004), Modelling Complex Classification (February 2009), Deconstructing the Grammar of Business (June 2009), Conceptual Modelling - Why Theory (November 2011)


Declaration of interest - in 2008(?) I wrote some white papers for IBM concerning the use of their industry models.

Thursday, August 22, 2019

Setting off Towards the Data-Driven Business

In an earlier post Towards the Data-Driven Business, I talked about the various roles that data and intelligence can play in the business. But where do you start? In this post, I shall talk about the approach that I have developed and used in a number of large organizations.


To build a roadmap that takes you into the future from where you are today, you need three things.


Firstly an understanding of the present. This includes producing AS-IS models of your current (legacy) systems: what data you have, and how you are currently managing and using it. We need to know about the perceived pain points, not because we only want to fix the symptoms, but because these will help us build a consensus for change. Typically we find a fair amount of duplicated and inconsistent data, crappy or non-existent interfaces, slow process loops and data bottlenecks, and general inflexibility.

This is always complicated by the fact that there are already numerous projects underway to fix some of the problems, or to build additional functionality, so we need to understand how these projects are expected to alter the landscape, and in what timescale. It sometimes becomes apparent that these projects are not ideally planned and coordinated from a data management perspective. If we find overlapping or fragmented responsibility in some critical data areas, we may need to engage with programme management and governance to support greater consistency and synergy.


Secondly a vision of the future opportunities for data and intelligence (and automation based on these). In general terms, these are outlined in my earlier post. To develop a vision for a specific organization, we need to look at their business model - what value do they provide to customers and other stakeholders, how is this value delivered (as business services or otherwise), and how do the capabilities and processes of the organization and its partners support this.

For example, I worked with an organization that had done a fair amount of work on modelling their internal processes and procedures, but lacked the outside-in view. So I developed a business service architecture that showed how the events and processes in their customers' world triggered calls on their services, and what this implied for delivering a seamless experience to their customers.

Using a capability-based planning approach, we can then look at how data, intelligence and automation could improve not only individual business services, processes and underlying capabilities, but also the coordination and feedback loops between these. For example in a retail environment, there are typically processes and capabilities associated with both Buying and Selling, and you may be able to use data and intelligence to make each of them more efficient and effective. But more importantly, you can improve the alignment between Buying and Selling.

(In some styles of business capability model, coordination is shown explicitly as a capability in its own right, but this is not a common approach.)

The business model also identifies which areas are strategically important to the business. At one organization, when we mapped the IT costs against the business model, we found that a disproportionate amount of effort was being devoted to non-strategic stuff, and surprisingly little effort for the customer-facing (therefore more strategically important) activities. (A colour-coded diagram can be very useful in presenting such issues to senior management.)

Most importantly, we find that a lot of stakeholders (especially within IT) have a fairly limited vision of what is possible, often focused on the data they already have rather than the data they could or should have. The double-diamond approach to design thinking works well here, combining creative scenario planning with highly focused practical action. I've often found senior business people much more receptive to this kind of discussion than the IT folk.

We should then be able to produce a reasonably future-proof and technology independent TO-BE data and information architecture, which provides a loosely-coupled blueprint for data collection, processing and management.


Thirdly, how to get from A to B. In a large organization, this is going to take several years. A complete roadmap cannot just be a data strategy, but will usually involve some elements of business process and organizational change, as well as application, integration and technology strategy. It may also involve outside stakeholders - for example, providing direct access to suppliers and business partners via portals and APIs, and sharing data and intelligence with them, while obtaining consent from data subjects and addressing any other privacy, security and compliance issues. There are always dependencies between different streams of activity within the programme as well as with other initiatives, and these dependencies need to be identified and managed, even if we can avoid everything being tightly coupled together.

The roadmap will typically contain a mix of different kinds of project. There may need to be some experimental ("pioneer") projects as well as larger development and infrastructure ("settler", "town planner") projects.

To gain consensus and support, you need a business case. Although different organizations may have different ways of presenting and evaluating the business case, and some individuals and organizations are more risk-averse than others, a business case will always involve an argument that the benefits (financial and possibly non-financial) outweigh the costs and risks.

Generally, people like to see some short-term benefits ("quick wins" or the dreaded "low-hanging fruit") as well as longer-term benefits. A well-balanced roadmap spreads the benefits across the phases - if you manage to achieve 80% of the benefits in phase 1, then your roadmap probably wasn't ambitious enough, so don't be surprised if nobody wants to fund phase 2. 


Finally, you have to implement your roadmap. This means getting the funding and resources, kicking off multiple projects as well as connecting with relevant projects already underway, managing and coordinating the programme. It also means being open to feedback and learning, responding to new emerging challenges (such as regulation and competition), maintaining communication with stakeholders, and keeping the vision and roadmap alive and up-to-date.




Saturday, August 03, 2019

Towards the Data-Driven Business

If we want to build a data-driven business, we need to appreciate the various roles that data and intelligence can play in the business - whether improving a single business service, capability or process, or improving the business as a whole. The examples in this post are mainly from retail, but a similar approach can easily be applied to other sectors.


Sense-Making and Decision Support

The traditional role of analytics and business intelligence is helping the business interpret and respond to what is going on.

Once upon a time, business intelligence always operated with some delay. Data had to be loaded from the operational systems into the data warehouse before they could be processed and analysed. I remember working with systems that generated management information based on yesterday's data, or even last month's data. Of course, such systems don't exist any more (!?), because people expect real-time insight, based on streamed data.

Management information systems are supposed to support individual and collective decision-making. People often talk about actionable intelligence, but of course it doesn't create any value for the business until it is actioned. Creating a fancy report or dashboard isn't the real goal, it's just a means to an end.

Analytics can also be used to calculate complicated chains of effects on a what-if basis. For example, if we change the price of this product by this much, what effect is this predicted to have on the demand for other products, what are the possible responses from our competitors, how does the overall change in customer spending affect supply chain logistics, do we need to rearrange the shelf displays, and so on. How sensitive is Y to changes in X, and what is the optimal level of Z?
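A toy version of such a what-if calculation can be sketched in a few lines of Python. This assumes a constant-elasticity demand model, and the products, prices and elasticity values are all invented for illustration:

```python
# Illustrative what-if calculation using a constant-elasticity demand
# model; products, prices and elasticities are invented.

def predicted_demand(base_demand, base_price, new_price, elasticity):
    """Demand scales with (new_price / base_price) ** elasticity."""
    return base_demand * (new_price / base_price) ** elasticity

# Own-price effect: a 10% price cut on a product with elasticity -1.5
own = predicted_demand(1000, 2.00, 1.80, elasticity=-1.5)

# Cross-price effect on a substitute (cross-elasticity +0.4): cutting
# our price is predicted to reduce demand for the substitute.
substitute = predicted_demand(500, 2.00, 1.80, elasticity=0.4)

print(round(own), round(substitute))   # → 1171 479
```

A real model would chain many such effects together (competitor response, logistics, shelf space), but the principle of propagating a hypothetical change through a model is the same.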

Analytics can also be used to support large-scale optimization - for example, solving complicated scheduling problems.

 
Automated Action

Increasingly, we are looking at the direct actioning of intelligence, possibly in real-time. The intelligence drives automated decisions within operational business processes, often without a human in the loop; human supervision and control may be remote or retrospective. A good example of this is dynamic retail pricing, where an algorithm adjusts the prices of goods and services according to some model of supply and demand. In some cases, even optimized plans and schedules can be implemented in this way.

So the data doesn't just flow from the operational systems into the data warehouse, but there is a control flow back into the operational systems. We can call this closed loop intelligence.
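To make this concrete, here is a deliberately naive sketch of a closed-loop pricing rule. It is illustrative only - the step size, bounds and demand figures are invented, and a real dynamic pricing system would be driven by a proper demand model:

```python
# Illustrative closed-loop pricing rule (not a real pricing algorithm).
# Observed sales feed back into the operational system as an automated
# price adjustment; all numbers here are invented.

def adjust_price(price, units_sold, target, step=0.05, floor=1.0, cap=100.0):
    """Nudge the price up when demand exceeds target, down when it falls short."""
    if units_sold > target:
        price *= 1 + step
    elif units_sold < target:
        price *= 1 - step
    return max(floor, min(cap, price))

price = 10.00
for units in [120, 130, 90, 80]:   # observed demand each period; target is 100
    price = adjust_price(price, units, target=100)

print(round(price, 2))   # → 9.95
```

The essential point is the control flow: each period's observed outcome becomes an input to the next period's automated decision.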

(If it takes too much time to process the data and generate the action, the action may no longer be appropriate. A few years ago, one of my clients wanted to use transaction data from the data warehouse to generate emails to customers - but with their existing architecture there would have been a 48 hour delay from the transaction to the email, so we needed to find a way to bypass this.)


Managing Complexity

If you have millions of customers buying hundreds of thousands of products, you need ways of aggregating the data in order to manage the business effectively. Customers can be grouped into segments, products can be grouped into categories, and many organizations use these groupings as a basis for dividing responsibilities between individuals and teams. However, these groupings are typically inflexible and sometimes seem perverse.

For example, in a large supermarket, after failing to find maple syrup next to the honey as I expected, I was told I should find it next to the custard. There may well be a logical reason for this grouping, but this logic was not apparent to me as a customer.

But the fact that maple syrup is in the same product category as custard doesn't just affect the shelf layout, it may also mean that it is automatically included in decisions affecting the custard category and excluded from decisions affecting the honey category. For example, pricing and promotion decisions.

A data-driven business is able to group things dynamically, based on affinity or association, and then allows simple and powerful decisions to be made for this dynamic group, at the right level of aggregation.

Automation can then be used to cascade the action to all affected products, making the necessary price, logistical and other adjustments for each product. This means that a broad plan can be quickly and consistently implemented across thousands of products.
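The idea of dynamic grouping and cascaded action can be sketched as follows. The baskets, products and affinity threshold are invented for illustration:

```python
# Sketch of dynamic grouping by purchase affinity, with one decision
# cascaded across the whole group. All data here is invented.

from collections import Counter
from itertools import combinations

baskets = [
    {"honey", "maple syrup", "pancake mix"},
    {"honey", "maple syrup"},
    {"custard", "tinned fruit"},
    {"maple syrup", "pancake mix"},
]

# Count how often each pair of products is bought together.
pair_counts = Counter()
for basket in baskets:
    pair_counts.update(combinations(sorted(basket), 2))

# Form a dynamic group around a seed product: everything bought with it
# at least twice, regardless of which fixed category it was filed under.
seed = "maple syrup"
group = {seed} | {
    other
    for (a, b), n in pair_counts.items()
    if n >= 2 and seed in (a, b)
    for other in (a, b)
    if other != seed
}

# Cascade a single pricing decision to every product in the group.
prices = {"honey": 3.00, "maple syrup": 4.00, "pancake mix": 2.00, "custard": 1.50}
discounted = {p: round(prices[p] * 0.9, 2) for p in group}
```

Here maple syrup groups with honey and pancake mix, because that is what the baskets say - not with custard, whatever the shelf layout claims.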


Experimentation and Learning

In a data-driven business, every activity is designed for learning as well as doing. Feedback is used in the cybernetic sense - collecting and interpreting data to control and refine business rules and algorithms.

In a dynamic world, it is necessary to experiment constantly. A supermarket or online business is a permanent laboratory for testing the behaviour of its customers. For example, A/B testing where alternatives are presented to different customers on different occasions to test which one gets the best response. As I mentioned in an earlier post, Netflix declares themselves "addicted" to the methodology of A/B testing.

In a simple controlled experiment, you change one variable and leave everything else the same. But in a complex business world, everything is changing. So you need advanced statistics and machine learning, not only to interpret the data, but also to design experiments that will produce useful data.
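For the simple two-variant case, the arithmetic can be sketched with the standard library alone: a two-proportion z-test, with invented conversion numbers:

```python
# Two-proportion z-test for a simple A/B test, standard library only.
# The conversion numbers are invented.

from math import erf, sqrt

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Z statistic and two-sided p-value for the difference between
    two observed conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))   # standard normal CDF
    return z, 2 * (1 - phi)

# Variant B converted 120/1000 against A's 100/1000 - real or noise?
z, p = two_proportion_z(100, 1000, 120, 1000)
print(round(z, 2), round(p, 2))   # → 1.43 0.15  (not significant at 5%)
```

A 20% apparent uplift that isn't statistically significant is exactly the kind of result that machine learning and good experimental design have to protect you from acting on.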


Managing Organization

A traditional command-and-control organization likes to keep the intelligence and insight in the head office, close to top management. An intelligent organization on the other hand likes to mobilize the intelligence and insight of all its people, and encourage (some) local flexibility (while maintaining global consistency). With advanced data and intelligence tools, power can be driven to the edge of the organization, allowing for different models of delegation and collaboration. For example, retail management may feel able to give greater autonomy to store managers, but only if the systems provide faster feedback and more effective support. 


Transparency

Related to the previous point, data and intelligence can provide clarity and governance to the business, and to a range of other stakeholders. This has ethical as well as regulatory implications.

Among other things, transparent data and intelligence reveal their provenance and derivation. (This isn't the same thing as explanation, but it probably helps.)




Obviously most organizations already have many of the pieces of this, but there are typically major challenges with legacy systems and data - especially master data management. Moving onto the cloud, and adopting advanced integration and robotic automation tools may help with some of these challenges, but it is clearly not the whole story.

Some organizations may be lopsided or disconnected in their use of data and intelligence. They may have very sophisticated analytic systems in some areas, while other areas are comparatively neglected. There can be a tendency to over-value the data and insight you've already got, instead of thinking about the data and insight that you ought to have.

Making an organization more data-driven doesn't always entail a large transformation programme, but it does require a clarity of vision and pragmatic joined-up thinking.


Related posts: Rhyme or Reason: The Logic of Netflix (June 2017), Setting off towards the Data-Driven Business (August 2019)


Updated 13 September 2019

Thursday, June 29, 2017

Rhyme or Reason - The Logic of Netflix

@GuyLongworth, who teaches philosophy at Warwick, is puzzled by the Netflix recommendation algorithm, linking Annie Hall with Son of Saul.


Philosopher Guy's appeal to rhyme rather than reason seems to be based on the view that the two films have nothing else in common. But this is rather contradicted by the fact that he has actually seen both. Netflix has correctly surmised that people like Guy might possibly be interested in both films.

The first thing to understand about recommendation algorithms is that they are not solely (if at all) based on the intrinsic similarity of two products, but on what we might call relational similarity. If I tell you that people who like pizza also like ice-cream, that is primarily a statement about the "people who like". You might try to explain this statement by observing that pizza and ice-cream both have a high fat content, but then so do lots of other foods.

And when someone has just eaten a pizza, it is perhaps more likely that they will go on to eat ice-cream next, rather than eating another pizza straightaway.
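A toy version of relational similarity can be sketched as co-occurrence counting over viewing histories. The histories below are invented, and real recommenders are vastly more sophisticated, but the principle is the same: two films are linked because the same people watch both.

```python
# Relational similarity by co-occurrence: films are "similar" because
# the same viewers watch both. The viewing histories are invented.

from collections import Counter
from itertools import combinations

histories = {
    "guy":   {"Annie Hall", "Son of Saul", "Manhattan"},
    "alice": {"Annie Hall", "Manhattan"},
    "bob":   {"Son of Saul", "Ida"},
    "carol": {"Annie Hall", "Son of Saul"},
}

# Count how many viewers watched each pair of films.
co_views = Counter()
for films in histories.values():
    co_views.update(combinations(sorted(films), 2))

def recommend(film):
    """Films most often co-watched with the given film, best first."""
    scores = {
        (b if a == film else a): n
        for (a, b), n in co_views.items()
        if film in (a, b)
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Notice that nothing in this sketch looks at what the films are about; the recommendation comes entirely from the "people who like" relation.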



The second thing to understand is that recommendation algorithms work by trial and error. Netflix wants to know if Guy will accept its suggestion to re-watch Annie Hall, and this feedback will add to its knowledge of Guy as well as its knowledge of relational similarity between films.

Trial and error works better if you have a diverse range of trials. If you watch a couple of films in a particular genre, and then Netflix only ever shows you suggestions within that genre, it will never discover that you might be interested in a completely different genre as well. And you will never discover the full range of Netflix offerings, which could result in your abandoning Netflix altogether.
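One standard way of keeping the trials diverse is an epsilon-greedy strategy: mostly recommend from the best-performing genre, but explore at random some fraction of the time. The genres and acceptance rates below are invented, and this is a sketch of the general idea, not of Netflix's actual system:

```python
# Epsilon-greedy exploration: mostly exploit the best-observed genre,
# but explore at random 20% of the time. All data here is invented.

import random

random.seed(42)

genres = ["comedy", "drama", "documentary"]
true_acceptance = {"comedy": 0.3, "drama": 0.5, "documentary": 0.7}

shown = {g: 0 for g in genres}
accepted = {g: 0 for g in genres}

def choose(epsilon=0.2):
    if random.random() < epsilon or not any(shown.values()):
        return random.choice(genres)                       # explore
    rates = {g: accepted[g] / shown[g] if shown[g] else 0.0 for g in genres}
    return max(rates, key=rates.get)                       # exploit

for _ in range(2000):
    g = choose()
    shown[g] += 1
    if random.random() < true_acceptance[g]:               # did the viewer accept?
        accepted[g] += 1

# Even if early viewing leaned towards one genre, exploration eventually
# reveals the stronger documentary preference.
best = max(genres, key=lambda g: accepted[g] / shown[g])
```

With epsilon set to zero, the algorithm would lock onto whichever genre happened to do well first, and never discover anything else - which is exactly the trap described above.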

Diversity of suggestion adds to the richness of the experimental data that are generated. How many members of the "people like Guy" category respond positively to suggestion A, and how many to suggestion B? Todd Yellin, Netflix VP of Product, told journalists in March that "we are addicted to the methodology of A/B testing".

What is genre anyway? In the past, genres (in book publishing, music, film, video games) were defined by the industry or by experts. In 2013, Netflix employed over 40 people hand-tagging TV shows and movies. But a data-driven approach allows genres to emerge organically from the patterns of consumption. Netflix (and Amazon and the rest) will be much more interested in data-defined genres than in industry-defined genres.

In her rant against the Netflix algorithm, @mehreenkasana makes two apparently contrary complaints. On the one hand, Netflix offers her content that is nothing like anything she has ever watched. She dismisses one suggestion with the words "I’ve never watched a show in a remotely similar vein." On the other hand, she doesn't see how Netflix can offer her challenging experiences. "Intensely curated experiences, whether you’re looking to explore movies or to meet people to date, remove one of the most critical aspects of a rich experience: risk, as in going out of your comfort zone."

But as @larakiara explains, "personalization is key to ensuring users keep coming back. But there's also the problem of over-personalization, so Netflix has to introduce variants."

Thus we can see Netflix as an embodiment of at least three of @kevin2kelly's Nine Laws of God.
  • Control from the bottom up
  • Maximize the fringes
  • Honor your errors
"A trick will only work for a while, until everyone else is doing it." (Remember Blockbuster.)




Mehreen Kasana, Netflix’s recommendation algorithm sucks (The Outline, 24 March 2017)

Kevin Kelly, Nine Laws of God. Chapter 24 of Out of Control (1994)

Lara O'Reilly, Netflix lifted the lid on how the algorithm that recommends you titles to watch actually works (Business Insider, 26 February 2016)

Janko Roettgers, Netflix Replacing Star Ratings With Thumbs Ups and Thumbs Downs (Variety, 16 March 2017), How Netflix tests Netflix: The story behind the service’s new two-thumbs-up feature (Protocol, 11 April 2022)

Tom Vanderbilt, The Science Behind the Netflix Algorithms That Decide What You’ll Watch Next (Wired, 7 August 2013)

Wikipedia: A/B Testing

Related posts: Competing on Analytics (May 2010), Emergent Similarity (February 2012), The Nature of Platforms (July 2017), Towards the Data-Driven Business (August 2019), Does Big Data Drive Netflix Content? (January 2021)

Sunday, January 11, 2015

From Coincidensity to Consilience

In my post From Convenience to Consilience - “Technology Alone Is Not Enough"  (October 2011), I praised Steve Jobs for his role in the design of the Pixar campus, whose physical layout was intended to bring different specialists together in serendipitous interactions.

Thanks to @jhagel and @CoCreatr, I have just read a blogpost by @StoweBoyd commenting on a related project at Google to build a new Googleplex. Because this is Google, it is a bottom-up, data-driven project: it is based on a predicted metric of coincidensity, which is sometimes defined as the likelihood of serendipity.

With the right technology (for example, electronic monitoring of the corridors and/or tagging of employees), a corporation like Google can easily monitor and control “casual collisions of the work force”.

But as Ilkka Kakko (@Serendipitor) points out, such measures of coincidensity cannot be equated with true serendipity. I wonder whether Google will be able to correlate casual meetings with enhanced knowledge and understanding, and measure the consequent quantity and quality of innovation? And then reconfigure the campus to improve the results? Hm.


However, the principle of designing physical spaces for human activity rather than for visual elegance is a good one, as is the notion of evidence-based design. Form following function.



Stowe Boyd, Building From The Inside Out (February 2013)

Paul Goldberger, Exclusive Preview: Google’s New Built-from-Scratch Googleplex (Vanity Fair, February 2013)

Ilkka Kakko, Are we reducing the magic of serendipity to the logic of coincidence? (April 2013)