
Saturday, February 22, 2025

Data-Driven Data Strategy

One of the things I've been pushing for years is the idea that data strategy should itself be data driven. In other words, if we are claiming that all these expensive data and analytics initiatives are driving business improvement, let's see the evidence, and let's have a feedback loop that allows us to increase the cost-effectiveness of these initiatives. This is becoming increasingly important as people start to pay attention to the environmental cost as well as the monetary cost.

This idea can be found in my ebook How To Do Things With Data and my articles for the Cutter Journal, as well as on this blog.

I doubt anyone will be surprised by Gartner's recent survey, showing that although over 90% of the respondents acknowledged the importance of being value-focused and outcome-focused, only 22% were measuring business impact. So they clearly aren't eating their own dog food.

And the same thing applies to the current hype around AI. Tech journalist @LindsAI Clark asks whether, in another ten years, we will be back wondering who is measuring the business value of all that AI in which organizations have invested billions.

I think we already know the answer to that one.

Lindsay Clark, Data is very valuable, just don't ask us to measure it, leaders say (The Register, 21 Feb 2025)

Richard Veryard, How To Do Things With Data (LeanPub) 

Richard Veryard, Understanding the Value of Data (Cutter Business Technology Journal, 11 May 2020)

Wikipedia: Eating your own dog food

 

Wednesday, August 03, 2022

From Data to Doing

One of the ideas running through my work on #datastrategy is to see data as a means to an end, rather than an end in itself. As someone might once have written, 

Data scientists have only interpreted the world in various ways. The point however is to change it.

Many people in the data world are focussed on collecting, processing and storing data, rendering and analysing the data in various ways, and making it available for consumption or monetization. In some instances, what passes for a data strategy is essentially a data management strategy.

I agree that this is important and necessary, but I don't think it is enough.

I am currently reading a brilliant book by Annemarie Mol on Medical Ontology. In one chapter, she describes the uses of test data by different specialists in a hospital. The researchers in the hospital laboratory want to understand a medical condition in great detail - what causes it, how it develops, what it looks like, how to detect it and measure its progress, how it responds to various treatments in different kinds of patient. The clinicians on the other hand are primarily interested in interventions - what can we do to help this patient, what are the prospects and risks.

In the corporate world, senior managers often use data as a monitoring tool - screening the business for areas that might need intervention. Highly aggregated data can provide them with a thin but panoramic view of what is going on, but may not provide much guidance on corrective or preventative action. See my post on OrgIntelligence in the Control Room (October 2010).

Meanwhile, what if your data strategy calls for a 360° view of key data domains, such as CUSTOMER and PRODUCT? If these initiatives are to be strategically meaningful to the business, and not merely exercises in technical plumbing, they need to be closely aligned with the business strategy - for example delivering on customer centricity and/or product leadership.

In other words, it's not enough just to have a lot of good quality data and to generate a lot of analytic insight. Hence the title of my new book - How To Do Things With Data.



Annemarie Mol, The Body Multiple: Ontology in Medical Practice (Duke University Press 2002)

My book on Data Strategy is now available in beta version. https://leanpub.com/howtodothingswithdata/

Monday, August 01, 2022

New Book - How to do things with data

#datastrategy My latest book has been published by @Leanpub 

 

Book cover: How to do things with data


This is a beta version, and I intend to add more material as well as responding to feedback from readers and making general improvements. Subscribers will always have access to the latest version.

Saturday, March 26, 2022

Information Superiority in Ukraine

In my work on Data Strategy, I have drawn heavily on the concept of information superiority / information advantage, which was originally developed in a military / defence context. Like many other innovations, there is significant potential for peaceful / civilian use of this concept.

There are now some early hints of information advantage / disadvantage emerging from the war in Ukraine. One obvious area is Intelligence, Surveillance, Target Acquisition and Reconnaissance (ISTAR).

By utilizing its intelligence and surveillance assets in Eastern Europe, the United States was able to build a picture of Russia’s movements and strategically release information about Russia’s plans. US source quoted in Insinna

Another area where information is important is logistics. As already predicted before the invasion started (for example Vershinin), Russia's troops appear to have suffered significant problems in this area, at least initially.

In other areas, however, predictive computer models of the conflict seem to have been way off the mark, mainly because they failed to adequately represent the human factor.

John Naughton wonders why Russia appears to have abandoned the Gerasimov doctrine of new generation (nonlinear) warfare, which appears to have been executed successfully in the occupation of Crimea in 2014, and reverted to an older style of warfare. Ukraine, on the other hand, appears to have applied the Gerasimov doctrine more successfully, protecting against cyber attack while easily managing to intercept insecure Russian communications. By a strange historical irony, it is reported that one of the Russian generals killed after his geolocation was intercepted by the Ukrainians was called Vitaly Gerasimov. Wikipedia advises us not to confuse him with Valery Gerasimov, the author of the doctrine.


No doubt there will be more clues emerging from this horrible conflict.


 

Valerie Insinna, Top American generals on three key lessons learned from Ukraine (Breaking Defense, 11 March 2022) 

Martin Murphy, Understanding Russia’s Concept for Total War in Europe (Heritage Foundation, 12  September 2016)

John Naughton, Putin has a 21st-century digital battle plan, so why is he fighting like it’s 1939? (Guardian, 26 March 2022)

Dan Sabbagh, Russia solving logistics problems and could attack Kyiv within days – experts (Guardian, 8 March 2022)

Alex Vershinin, Feeding the Bear: A Closer Look at Russian Army Logistics and the Fait Accompli (War on the Rocks, 23 November 2021)

Wikipedia: ISTAR, New Generation Warfare, Valery Gerasimov, Vitaly Gerasimov

Previous posts on Information Superiority: Information Superiority and Customer Centricity (March 2017), Developing Data Strategy (December 2019), Information Advantage (not necessarily) in Air and Space (July 2020)

Tuesday, August 10, 2021

Data as pictures?

Many people believe that data should provide a faithful representation or picture of the real world. While this is often a helpful simplification, it can sometimes mislead.

Firstly, the picture theory isn't very good at handling probability and uncertainty. When faced with alternative pictures (facts), people may try to pick the most likely or attractive one, and then act as if this were the truth. 

As I see it, the problem of knowledge and uncertainty fundamentally disrupts our conventional assumptions about representation, in much the same way that quantum physics disrupts our assumptions about reality. See previous posts on Uncertainty.

Secondly, the picture theory misrepresents judgements (whether human or algorithmic) as descriptions. When a person is classified as a poor credit risk, or as a potential criminal or terrorist, this is a speculative judgement about the future, which is often sadly self-fulfilling. For example, when a person is labelled and treated as a potential criminal, this may make it more difficult for them to live as a law-abiding citizen, and they are therefore steered towards a life of crime. Data of this kind may therefore be performative, in the sense that it creates the reality that it claims to describe.

Thirdly, the picture theory assumes that any two facts must be consistent, and simple facts can easily be combined to produce more complex facts. Failures of consistency or composition can then only be explained (and fixed) in terms of data quality and governance. See my post on Three Responses to Inconsistency (December 2003).

Furthermore, a good picture is one that can be verified. Nothing wrong with verification, of course, but the picture theory can sometimes lead to a narrow-minded approach to validation and verification. There may also be an assumption of completeness, treating a dataset as if it provided a complete picture of some clearly delineated domain. (The world is determined by the facts, and by their being all the facts.)


However, although there are some serious limitations with the picture theory, it may sometimes be an acceptable simplification, or even an enabling prejudice. One of the dimensions of data strategy is reach - developing a broad data culture across the organization and its ecosystem by making more data and tools available to a wider community of people. And if some form of the picture theory helps people get started on the ladder towards data mastery, that may not be a bad thing after all. (Hopefully they can throw away the ladder after they have climbed up it.)



 

Daniel C. Dennett, A Difference That Makes a Difference: A Conversation (Edge, 22 November 2017) 

Aaron Sloman, What Did Bateson Mean? (originally posted January 2011, revised October 2018)


See also Architecture and Reality (November 2012), From Sedimented Principles to Enabling Prejudices (March 2013), Data Strategy - Reach (December 2019), On the performativity of data (August 2021)

Thursday, November 26, 2020

Assembling your Data Strategy - walk in the way of insight

How many pillars (or components or building blocks) does your data strategy need? I found lots of different answers, from random bloggers to the UK Government.

3

An anonymous blogger who writes under the pen-name Beautiful Data identifies three pillars of data strategy.

  • Data Management - managing data as an asset
  • Data Democratization - putting data into the hands of the business
  • Data Monetization - driving direct and indirect business benefit

4

SnapAnalytics identifies People, Process, Data and Technology as its four pillars, and that's a popular approach for many things.

For a different approach, we have four pillars of data strategy from Aleksander Velkoski, the Director of Data Science at the National Association of Realtors.

  • Data Literacy
  • Data Acquisition and Governance
  • Knowledge Mining
  • Business Implementation

Olga Lagunova, Chief Data Analytics Officer at Pitney Bowes, identifies four pillars that are roughly similar.

  • Business Outcome - knowing what you want to achieve
  • Mature Data Ecosystem - including data sourcing and data governance
  • Data Science - practices and organization
  • Culture that values data-driven decision

In his conversation with her, Anthony Scriffignano, Chief Data Scientist at Dun & Bradstreet, replies that "we have many of those same elements". Perhaps because he is in the business of selling data, Anthony looks at data strategy from two directions, which broadly correspond to Olga's first two pillars.

  • Customer-centric - addressing customer needs, solving ever more complex business problems
  • Data-centric - data supply chain, including sourcing, quality assurance and governance

The UK National Data Strategy also has four pillars.

  • Data Foundations
  • Data Skills
  • Data Availability
  • Responsible Data

5

A white paper from SAS defines five essential components of a data strategy - Identify, Store, Provision, Process and Govern. But a component isn't a pillar. So the editors of Ingenium magazine have turned these into five pillars - Identify, Store, Provision, Integrate and Govern.

(The SAS paper talks a lot about integration, so the Ingenium modification of the SAS list seems fair.)

6

For six pillars, we can turn to Cynozure, a UK-based data and analytics strategy consultancy.

  • Vision and Value
  • People and Culture
  • Operating Model
  • Technology and Architecture
  • Data Governance
  • Roadmap

Cynozure has also published seven building blocks.

  • Data Vision
  • Data Sources
  • Data Governance and Management
  • Data Analysis
  • Data Team
  • Tech Stack
  • Measuring Success

7

At last we get to the magic number seven, thanks to @EvanLevy.

  • The Questions (aka Problems) - the more valuable your question, the more valuable analytics is to the company
  • Technical Implementation - he argues that the most valuable datasets require high levels of customization
  • The Users - access and control (this links to the Data Democratization pillar mentioned above)
  • Data Storage and Structure - including data retention
  • Data Security - risk and compliance
  • Personally Identifiable Information (PII) - privacy
  • Visualization and Analysis Needs - flexibility and timeliness

 

Lawrence of Arabia's autobiography was entitled Seven Pillars of Wisdom, and this is of course a reference to the Bible. 

Wisdom has built her house; she has set up its seven pillars. ...
Leave your simple ways and you will live; walk in the way of insight.

Proverbs 9:1,6

n

Maybe it doesn't matter how many pillars your data strategy has, as long as it gets you walking in the way of insight. (Whatever that means.)

Obviously not everyone is using the pillar metaphor in the same way - there is presumably some difference between a foundation, a pillar and a building block - but there is a lot of commonality here as well, with a widely shared emphasis on business value and people, as well as a few interesting outliers. 

While most of the sources listed in this blogpost are fairly brief, the UK National Data Strategy contains a lot of detail. While it deserves credit for the attention devoted to ethics and accountability in the Responsible Data pillar, it is not yet clear to me how it addresses some of the other concerns mentioned in this blogpost. I plan to post a more thorough review in a separate blogpost.

 


 

"Beautiful Data", Three Pillars of a Data Strategy (19 Sept ??)

Cynozure, Building A Data Strategy For Business Success (Cynozure, 29 May 2019)

Jason Foster, The Six Pillars of a Data Strategy (Cynozure via YouTube, 19 April 2019)

Ingenium, The 5 Pillars of a Data Strategy (Ingenium Magazine, 24 August 2017)

Evan Levy, 7 Pillars of Data Strategy (HighFive, 1 March 2018)

SAS, The 5 Essential Components of a Data Strategy (SAS 2018)

Anthony Scriffignano and Olga Lagunova, Data Strategy - Key Pillars That Define Success (Dun & Bradstreet via YouTube, 29 March 2018)

UK Government, UK National Data Strategy (Department for Digital, Culture, Media and Sport, 9 September 2020)

Aleksander Velkoski, The Four Pillars of Data and Analytics Strategy (Business Quick, 24 August 2020)

Tuesday, August 04, 2020

Data by Design

If your #datastrategy involves collecting and harvesting more data, then it makes sense to check this requirement at an early stage of a new project or other initiative, rather than adding data collection as an afterthought.

For requirements such as security and privacy, the not-as-afterthought heuristic is well established in the practices of security-by-design and privacy-by-design. I have also spent some time thinking and writing about technology ethics, under the heading of responsibility-by-design. In my October 2018 post on Responsibility by Design, I suggested that all of these could be regarded as instances of a general pattern of X-by-design, outlining What, Why, When, For Whom, Who, How and How Much for a given concern X.

In this post, I want to look at three instances of the X-by-design pattern that could support your data strategy:

  • data collection by design
  • data quality by design
  • data governance by design


Data Collection by Design

Here's a common scenario. Some engineers in your organization have set up a new product or service or system or resource. This is now fully operational, and appears to be working properly. However, the system is not properly instrumented.

Thought should always be given to the self instrumentation of the prime equipment, i.e. design for test from the outset. Kev Judge

In the past, it was common for a system to be instrumented during the test phase, but once the tests were completed, data collection was switched off for performance reasons.

If there is concern that the self instrumentation can add unacceptable processing overheads then why not introduce a system of removing the self instrumentation before delivery? Kev Judge

This applies not just to operational testing and monitoring but also to business intelligence. And for IBM, it is an essential component of digital advantage:

Digitally reinvented electronics organizations pursue new approaches to products, processes and ecosystem participation. They design products with attention toward the types of information they need to collect to design the right customer experiences. IBM

The point here is that a new system or service needs to have data collection designed in from the start, rather than tacked on later.
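By way of illustration, here is a minimal Python sketch of what "designed in from the start" might look like: the operational code publishes structured, timestamped events as it does its work, rather than having data collection bolted on later. The event names, fields and emit_event helper are hypothetical, and a real system would send events to a message queue or event store rather than printing them.

```python
import json, time, uuid

def emit_event(event_type, **attributes):
    """Publish a structured, timestamped event (here: just print JSON).
    In a real system this might write to a message queue or event store."""
    event = {
        "event_id": str(uuid.uuid4()),
        "event_type": event_type,
        "timestamp": time.time(),
        **attributes,
    }
    print(json.dumps(event))

def register_customer(name, channel):
    """Business operation with instrumentation designed in, not tacked on."""
    customer_id = str(uuid.uuid4())
    # ... persist the customer record here ...
    emit_event("customer_registered",
               customer_id=customer_id,
               channel=channel)   # records which touchpoint produced the data
    return customer_id

register_customer("Alice", channel="web")
```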


Data Quality by Design

The next pitfall I want to talk about arises when a new system or service is developed: the data migration/integration is done in a big rush towards the end of the project, and then - surprise, surprise - the data quality isn't good enough.

This is particularly relevant when data is being repurposed. During the pandemic, there was a suggestion of using Bluetooth connection strength as a proxy for the distance between two phones, and therefore as an indicator of the distance between the owners of the phones. Although this data might have been adequate for statistical analysis, it was not good enough to justify putting a person into quarantine.
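To see why such repurposed data might be adequate for aggregate statistics but not for decisions about individuals, here is an illustrative sketch of the log-distance path-loss model commonly used to turn received signal strength into an approximate distance. The parameter values are assumptions for the sake of the example; the point is how sensitive the estimate is to a few decibels of noise.

```python
def estimated_distance(rssi_dbm, tx_power_at_1m=-59, path_loss_exponent=2.0):
    """Log-distance path-loss model: rough distance in metres from signal strength.
    tx_power_at_1m and path_loss_exponent are illustrative assumptions and vary
    with device, orientation, pockets, bodies and walls."""
    return 10 ** ((tx_power_at_1m - rssi_dbm) / (10 * path_loss_exponent))

# A few dB of measurement noise moves the estimate across a 2-metre threshold
for rssi in (-65, -69, -73):
    print(rssi, "dBm ->", round(estimated_distance(rssi), 2), "m")
# -65 dBm -> 2.0 m, -69 dBm -> 3.16 m, -73 dBm -> 5.01 m
```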


Data Governance by Design

Finally, there is the question of the sociotechnical organization and processes needed to manage and support the data - not only data quality but all other aspects of data governance.

The pitfall here is to believe you can sort out the IT plumbing first, leaving the necessary governance and controls to be added in later. 




Scott Burnett, Reza Firouzbakht, Cristene Gonzalez-Wertz and Anthony Marshall, Using Data by Design (IBM Institute for Business Value, 2018)

Kev Judge, Self Instrumentation and S.I. (undated, circa 2007)

Wednesday, July 29, 2020

Information Advantage (not necessarily) in Air and Space

Some good down-to-earth points from #ASPC20 @airpowerassn 's Air and Space Power Conference earlier this month. Although the material was aimed at a defence audience, much of the discussion is equally relevant to civilian and commercial organizations interested in information superiority (US) or information advantage (UK).

Professor Dame Angela McLean, who is the Chief Scientific Advisor to the MOD, defined information advantage thus:

The credible advantage gained through the continuous, decisive and resilient employment of information and information systems. It involves exploiting information of all kinds to improve every aspect of operations: understanding, decision-making, execution, assessment and resilience.

She noted the temptation for the strategy to jump straight to technology (technology push); the correct approach is to set out ambitious, enduring capability outcomes (capability pull), although this may be harder to communicate. Nevertheless, technology push may make sense in those areas where technologies could contribute to multiple outcomes.

She also insisted that it was not enough just to have good information, it was also necessary to use this information effectively, and she called for cultural change to drive improved evidence-based decision-making. (This chimes with what I've been arguing myself, including the need for intelligence to be actioned, not just actionable.)

In his discussion of multi-domain integration, General Sir Patrick Sanders reinforced some of the same points.
  • Superiority in information (is) critical to success
  • We are not able to capitalise on the vast amounts of data our platforms can deliver us, as they are not able to share, swap or integrate data at a speed that generates tempo and advantage
  • (we need) Faster and better decision making, rooted in deeper understanding from all sources and aided by data analytics and supporting technologies

See my previous post on Developing Data Strategy (December 2019) 


Professor Dame Angela McLean, Orienting Defence Research to anticipate and react to the challenges of a future information-dominated operational environment (Video)

General Sir Patrick Sanders, Cohering Joint Forces to deliver Multi Domain Integration (Air and Space Power Conference, 15 July 2020) (Video, Official Transcript)

For the full programme, see https://www.airpower.org.uk/air-space-power-conference-2020/programme/

Wednesday, July 22, 2020

Encouraging Data Innovation

@BCSDMSG and @DAMAUK ran an online conference last month, entitled Delivering Value Through Data. Videos are now available on YouTube.

The conference opened with a very interesting presentation by Peter Thomas (Prudential Regulation Authority, part of the Bank of England). Some key takeaways:

The Bank of England is a fairly old-fashioned institution. The data programme was as much a cultural shift as a technology shift, and this was reflected by a change in the language – from data management to data innovation.

Challenges: improve the cadence of situation awareness, sense-making and decision-making.

One of Peter's challenges was to wean the business off Excel. The idea was to get data straight into Tableau, bypassing Excel. Peter referred to this as straight-through processing, and said this was the biggest bang for the buck.

Given the nature of his organization, the link between data governance and decision governance is particularly important. Peter described making governance more effective/efficient by reducing the number of separate governance bodies, and outlined a stepwise approach for persuading people in the business to accept data ownership:
  1. You are responsible for your decisions
  2. You are responsible for your interpretation of the data used in your decisions
  3. You are responsible for your requests and requirements for data.
Some decisions need to be taken very quickly, in crisis management mode. (This is a particular characteristic of a regulatory organization, but also relevant to anyone dealing with COVID-19.) Then if they can cut through the procrastination in such situations, this should create a precedent for doing things more quickly in Business-As-Usual mode.

Finally, Peter reported some tension between two camps – those who want data and decision management to be managed according to strict rules, and those who want the freedom to experiment. Enterprise-wide innovation needs to find a way to reconcile these camps.

Plenty more insights in the video, including the Q&A at the end - well worth watching.

Peter Thomas, Encouraging Data Innovation (BCS via YouTube, 15 June 2020)

Friday, March 27, 2020

Data Strategy - More on Agility

Continuing my exploration of the four dimensions of Data Strategy. In this post, I bring together some earlier themes, including Pace Layering and Trimodal.

The first point to emphasize is that there are many elements to your overall data strategy, and these don't all work at the same tempo. Data-driven design methodologies such as Information Engineering (especially the James Martin version) were based on the premise that the data model was more permanent than the process model, but it turns out that this is only true for certain categories of data.

So one of the critical requirements for your data strategy is to manage both the slow-moving stable elements and the fast-moving agile elements. This calls for a layered approach, where each layer has a different rate of change, known as pace-layering.

The concept of pace-layering was introduced by Stewart Brand. In 1994, he wrote a brilliant and controversial book about architecture, How Buildings Learn, which among other things contained a theory about evolutionary change in complex systems based on earlier work by the architect Frank Duffy. Although Brand originally referred to the theory as Shearing Layers, by the time of his 1999 book he had switched to calling it Pace Layering. If there is a difference between the two, Shearing Layers is primarily a descriptive theory about how change happens in complex systems, while Pace Layering is primarily an architectural principle for the design of resilient systems-of-systems.
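As a rough sketch of what pace layering can mean for data, the fragment below classifies types of data by their expected rate of change and attaches a different governance cadence to each layer. The layer names, examples and cadences are illustrative assumptions, not a standard taxonomy.

```python
# Illustrative only: each layer changes at a different tempo and
# therefore warrants a different review and governance cadence.
PACE_LAYERS = {
    "reference":     {"examples": ["country codes", "chart of accounts"],
                      "typical_change": "years",   "review": "annual governance board"},
    "master":        {"examples": ["customer", "product"],
                      "typical_change": "months",  "review": "data stewardship"},
    "transactional": {"examples": ["orders", "payments"],
                      "typical_change": "days",    "review": "operational monitoring"},
    "behavioural":   {"examples": ["clickstream", "sensor readings"],
                      "typical_change": "seconds", "review": "automated quality checks"},
}

for layer, props in PACE_LAYERS.items():
    print(f"{layer:14} changes over {props['typical_change']:8} -> {props['review']}")
```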

In 2006, I was working as a software industry analyst, specializing in Service-Oriented Architecture (SOA). Microsoft invited me to Las Vegas to participate in a workshop with other industry analysts, where (among other things) I drew the following layered picture.

SPARK Workshop Day 2

Here's how I now draw the same picture for data strategy. It also includes a rough mapping to the Trimodal approach.


Giles Slinger and Rupert Morrison, Will Organization Design Be Affected By Big Data? (J Org Design Vol 3 No 3, 2014)

Wikipedia: Information Engineering, Shearing Layers 

Related Posts: Layering Principles (March 2005), SPARK 2 - Innovation or Trust (March 2006), Enterprise Tempo (October 2010), Beyond Bimodal (May 2016), Data Strategy - Agility (December 2019)

Saturday, December 07, 2019

Developing Data Strategy

The concepts of net-centricity, information superiority and power to the edge emerged out of the US defence community about twenty years ago, thanks to some thought leadership from the Command and Control Research Program (CCRP). One of the routes of these ideas into the civilian world was through a company called Groove Networks, which was acquired by Microsoft in 2005 along with its founder, Ray Ozzie. The Software Engineering Institute (SEI) provided another route. And from the mid 2000s onwards, a few people were researching and writing on edge strategies, including Philip Boxer, John Hagel and myself.

Information superiority is based on the idea that the ability to collect, process, and disseminate an uninterrupted flow of information will give you operational and strategic advantage. The advantage comes not only from the quantity and quality of information at your disposal, but also from processing this information faster than your competitors and/or fast enough for your customers. TIBCO used to call this the Two-Second Advantage.

And by processing, I'm not just talking about moving terabytes around or running up large bills from your cloud provider. I'm talking about enterprise-wide human-in-the-loop organizational intelligence: sense-making (situation awareness, model-building), decision-making (evidence-based policy), rapid feedback (adaptive response and anticipation), organizational learning (knowledge and culture). For example, the OODA loop. That's my vision of a truly data-driven organization.

There are four dimensions of information superiority which need to be addressed in a data strategy: reach, richness, agility and assurance. I have discussed each of these dimensions in a separate post:
  • Data Strategy - Reach (December 2019)
  • Data Strategy - Richness (December 2019)
  • Data Strategy - Agility (December 2019)
  • Data Strategy - Assurance (December 2019)

Philip Boxer, Asymmetric Leadership: Power to the Edge

Leandro DalleMule and Thomas H. Davenport, What’s Your Data Strategy? (HBR, May–June 2017) 

John Hagel III and John Seely Brown, The Agile Dance of Architectures – Reframing IT Enabled Business Opportunities (Working Paper 2003)

Vivek Ranadivé and Kevin Maney, The Two-Second Advantage: How We Succeed by Anticipating the Future--Just Enough (Crown Books 2011). Ranadivé was the founder and former CEO of TIBCO.

Richard Veryard, Building Organizational Intelligence (LeanPub 2012)

Richard Veryard, Information Superiority and Customer Centricity (Cutter Business Technology Journal, 9 March 2017) (registration required)

Wikipedia: CCRP, OODA Loop, Power to the Edge

Related posts: Microsoft and Groove (March 2005), Power to the Edge (December 2005), Two-Second Advantage (May 2010), Enterprise OODA (April 2012), Reach Richness Agility and Assurance (August 2017)

Wednesday, December 04, 2019

Data Strategy - Assurance

This is one of a series of posts looking at the four key dimensions of data and information that must be addressed in a data strategy - reach, richness, agility and assurance.



In previous posts, I looked at Reach (the range of data sources and destinations), Richness (the complexity of data) and Agility (the speed and flexibility of response to new opportunities and changing requirements). Assurance is about Trust.

In 2002, Microsoft launched its Trustworthy Computing Initiative, which covered security, privacy, reliability and business integrity. If we look specifically at data, this means two things.
  1. Trustworthy data - the data are reliable and accurate.
  2. Trustworthy data management - the processor is a reliable and responsible custodian of the data, especially in regard to privacy and security
Let's start by looking at trustworthy data. To understand why this is important (both in general and specifically to your organization), we can look at the behaviours that emerge in its absence. One very common symptom is the proliferation of local information. If decision-makers and customer-facing staff across the organization don't trust the corporate databases to be complete, up-to-date or sufficiently detailed, they will build private spreadsheets, to give them what they hope will be a closer version of the truth.

This is of course a data assurance nightmare - the data are out of control, and it may be easier for hackers to get the data out than it is for legitimate users. And good luck handling any data subject access request!

But in most organizations, you can't eliminate this behaviour simply by telling people they mustn't. If your data strategy is to address this issue properly, you need to look at the causes of the behaviour, and understand what level of reliability and accessibility you have to give people before they will be willing to rely on your version of the truth rather than theirs.

DalleMule and Davenport have distinguished two types of data strategy, which they call offensive and defensive. Offensive strategies are primarily concerned with exploiting data for competitive advantage, while defensive strategies are primarily concerned with data governance, privacy and security, and regulatory compliance.

As a rough approximation then, assurance can provide a defensive counterbalance to the offensive opportunities offered by reach, richness and agility. But it's never quite as simple as that. A defensive data quality regime might install strict data validation, to prevent incomplete or inconsistent data from reaching the database. In contrast, an offensive data quality regime might install strict labelling, with provenance data and confidence ratings, to allow incomplete records to be properly managed, enriched if possible, and appropriately used. This is the basis for the NetCentric strategy of Post Before Processing.
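Here is a minimal sketch of the contrast, with hypothetical field names and an invented confidence-scoring rule: the defensive regime rejects an incomplete record at the door, while the offensive regime accepts it but labels it with its provenance and a confidence rating, so that downstream consumers can decide whether it is fit for their purpose.

```python
from dataclasses import dataclass, field

REQUIRED = ("account_number", "sort_code", "amount")

def defensive_validate(record: dict) -> dict:
    """Strict validation: incomplete records never reach the database."""
    missing = [f for f in REQUIRED if not record.get(f)]
    if missing:
        raise ValueError(f"Rejected: missing {missing}")
    return record

@dataclass
class LabelledRecord:
    """Offensive regime: keep the record, but say where it came from and how much
    we trust it, so it can be enriched if possible and used appropriately."""
    payload: dict
    source: str
    confidence: float                       # 0.0 - 1.0; illustrative scoring only
    missing_fields: list = field(default_factory=list)

def offensive_ingest(record: dict, source: str) -> LabelledRecord:
    missing = [f for f in REQUIRED if not record.get(f)]
    confidence = 1.0 - 0.3 * len(missing)   # invented rule for the example
    return LabelledRecord(record, source, max(confidence, 0.0), missing)

txn = {"account_number": "12345678", "amount": 250.0}    # sort_code missing
try:
    defensive_validate(txn)
except ValueError as e:
    print(e)                                             # defensive regime: rejected
print(offensive_ingest(txn, source="partner_feed"))      # offensive regime: kept, flagged
```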

Because of course there isn't a single view of data quality. If you want to process a single financial transaction, you obviously need to have a complete, correct and confirmed set of bank details. But if you want aggregated information about upcoming financial transactions, you don't want any large transactions to be omitted from the total because of a few missing attributes. And if you are trying to learn something about your customers by running a survey, it's probably not a good idea to limit yourself to those customers who had the patience and loyalty to answer all the questions.

Besides data quality, your data strategy will need to have a convincing story about privacy and security. This may include certification (e.g. ISO 27001) as well as regulation (GDPR etc.). You will need to have proper processes in place for identifying risks, and ensuring that relevant data projects follow privacy-by-design and security-by-design principles. You may also need to look at the commercial and contractual relationships governing data sharing with other organizations.

All of this should add up to establishing trust in your data management - reassuring data subjects, business partners, regulators and other stakeholders that the data are in safe hands. And hopefully this means they will be happy for you to take your offensive data strategy up to the next level.

Next post: Developing Data Strategy



Leandro DalleMule and Thomas H. Davenport, What’s Your Data Strategy? (HBR, May–June 2017)

Richard Veryard, Microsoft's Trustworthy Computing (CBDI Journal, March 2003)

Wikipedia: Trustworthy Computing

Tuesday, December 03, 2019

Data Strategy - Agility

This is one of a series of posts looking at the four key dimensions of data and information that must be addressed in a data strategy - reach, richness, agility and assurance.



In previous posts, I looked at Reach, which is about the range of data sources and destinations, and Richness, which is about the complexity of data. Now let me turn to Agility - the speed and flexibility of response to new opportunities and changing requirements.

Not surprisingly, lots of people are talking about data agility, including some who want to persuade you that their products and technologies will help you to achieve it. Here are a few of them.
Data agility is when your data can move at the speed of your business. For companies to achieve true data agility, they need to be able to access the data they need, when and where they need it. Pinckney

Collecting first-party data across the customer lifecycle at speed and scale. Jones

Keep up with an explosion of data. ... For many enterprises, their ability to collect data has surpassed their ability to organize it quickly enough for analysis and action. Scott

How quickly and efficiently you can turn data into accurate insights. Tuchen

But before we look at technological solutions for data agility, we need to understand the requirements. The first thing is to empower, enable and encourage people and teams to operate at a good tempo when working with data and intelligence, with fast feedback and learning loops.

Under a trimodal approach, for example, pioneers are expected to operate at a faster tempo, setting up quick experiments, so they should not be put under the same kind of governance as settlers and town planners. Data scientists often operate in pioneer mode, experimenting with algorithms that might turn out to help the business, but often don't. Obviously that doesn't mean zero governance, but appropriate governance. People need to understand what kinds of risk-taking are accepted or even encouraged, and what should be avoided. In some organizations, this will mean a shift in culture.

Beyond trimodal, there is a push towards self-service ("citizen") data and intelligence. This means encouraging and enabling active participation from people who are not doing this on a full-time basis, and may have lower levels of specialist knowledge and skill.

Besides knowledge and skills, there are other important enablers that people need to work with data. They need to be able to navigate and interpret, and this calls for meaningful metadata, such as data dictionaries and catalogues. They also need proper tools and platforms. Above all, they need an awareness of what is possible, and how it might be useful.

Meanwhile, enabling people to work quickly and effectively with data is not just about giving them relevant information, along with decent tools and training. It's also about removing the obstacles.

Obstacles? What obstacles?

In most large organizations, there is some degree of duplication and fragmentation of data across enterprise systems. There are many reasons why this happens, and the effects may be felt in various areas of the business, degrading the performance and efficiency of various business functions, as well as compromising the quality and consistency of management information. System interoperability may be inadequate, resulting in complicated workflows and error-prone operations.

But perhaps the most important effect is on inhibiting innovation. Any new IT initiative will need either to plug into the available data stores or create new ones. If this is to be done without adding further to technical debt, then the data engineering (including integration and migration) can often be more laborious than building the new functionality the business wants.

Depending on whom you talk to, this challenge can be framed in various ways - data engineering, data integration and integrity, data quality, master data management. The MDM vendors will suggest one approach, the iPaaS vendors will suggest another approach, and so on. Before you get lured along a particular path, it might be as well to understand what your requirements actually are, and how these fit into your overall data strategy.

And of course your data strategy needs to allow for future growth and discovery. It's no good implementing a single source of truth or a universal API to meet your current view of CUSTOMER or PRODUCT, unless this solution is capable of evolving as your data requirements evolve, with ever-increasing reach and richness. As I've often discussed on this blog before, one approach to building in flexibility is to use appropriate architectural patterns, such as loose coupling and layering, which should give you some level of protection against future variation and changing requirements, and such patterns should probably feature somewhere in your data strategy.
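As a small illustration of the loose-coupling point, consumers can be written against a stable interface rather than against today's physical data store, so that the store (or the single source of truth behind it) can evolve without breaking every consumer. The names below are hypothetical.

```python
from typing import Protocol

class CustomerRepository(Protocol):
    """Stable interface layer: consumers depend on this, not on the storage."""
    def get_customer(self, customer_id: str) -> dict: ...

class WarehouseCustomerRepository:
    """Today's implementation reads a warehouse table; tomorrow's might call an API
    or join several sources - consumers don't need to change."""
    def get_customer(self, customer_id: str) -> dict:
        return {"id": customer_id, "segment": "retail"}   # stand-in for a real query

def churn_report(repo: CustomerRepository, customer_ids):
    return [repo.get_customer(cid) for cid in customer_ids]

print(churn_report(WarehouseCustomerRepository(), ["C001", "C002"]))
```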

Next post - Assurance


Richard Jones, Agility and Data: The Heart of a Digital Experience Strategy (WayIn, 22 November 2018)

Tom Pinckney, What's Data Agility Anyway (Braze Magazine, 25 March 2019)

Jim Scott, Why Data Agility is a Key Driver of Big Data Technology Development (24 March 2015)

Mike Tuchen, Do You Have the Data Agility Your Business Needs? (Talend, 14 June 2017)

Related posts: Enterprise OODA (April 2012), Beyond Trimodal: Citizens and Tourists (November 2019)

Sunday, December 01, 2019

Data Strategy - Richness

This is one of a series of posts looking at the four key dimensions of data and information that must be addressed in a data strategy - reach, richness, agility and assurance.



In my previous post, I looked at Reach, which is about the range of data sources and destinations. Richness of data addresses the complexity of data - in particular the detailed interconnections that can be determined or inferred across data from different sources.

For example, if a supermarket is tracking your movements around the store, it doesn't only know that you bought lemons and fish and gin, it knows whether you picked up the lemons from the basket next to the fish counter, or from the display of cocktail ingredients. And can therefore guess how you are planning to use the lemons, leading to various forms of personalized insight and engagement.

Richness often means finer-grained data collection, possibly continuous streaming. It also means being able to synchronize data from different sources, possibly in real-time. For example, being able to correlate visits to your website with the screening of TV advertisements, which not only gives you insight and feedback on the effectiveness of your marketing, but also allows you to guess which TV programmes this customer is watching.
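Here is a minimal sketch of that kind of correlation, using invented column names and a ten-minute attribution window: an as-of join matches each website visit to the most recent TV spot within the window, which is one simple way of connecting the two sources before attempting anything more sophisticated.

```python
import pandas as pd

tv_spots = pd.DataFrame({
    "aired_at": pd.to_datetime(["2020-06-01 20:15", "2020-06-01 21:45"]),
    "channel": ["ITV", "C4"],
})
web_visits = pd.DataFrame({
    "visited_at": pd.to_datetime(["2020-06-01 20:18", "2020-06-01 20:55",
                                  "2020-06-01 21:47"]),
    "visitor_id": ["v1", "v2", "v3"],
})

# Match each visit to the most recent TV spot no more than 10 minutes earlier
attributed = pd.merge_asof(
    web_visits.sort_values("visited_at"),
    tv_spots.sort_values("aired_at"),
    left_on="visited_at", right_on="aired_at",
    direction="backward", tolerance=pd.Timedelta("10min"),
)
print(attributed)   # v1 and v3 attributed to a spot; v2 falls outside the window
```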

Artificial intelligence and machine learning algorithms should help you manage this complexity, picking weak signals from a noisy data environment, as well as extracting meaningful data from unstructured content. From quantity to quality.

In the past, when data storage and processing was more expensive than today, it was a common practice to remove much of the data richness when passing data from the operational systems (which might contain detailed transactions from the past 24 hours) to the analytic systems (which might only contain aggregated information over a much longer period). Not long ago, I talked to a retail organization where only the basket and inventory totals reached the data warehouse. (Hopefully they've now fixed this.) So some organizations are still faced with the challenge of reinstating and preserving detailed operational data, and making it available for analysis and decision support.

Richness also means providing more subtle intelligence, instead of expecting simple answers or trying to apply one-size-fits all insight. So instead of a binary yes/no answer to an important business question, we might get a sense of confidence or uncertainty, and an ability to take provisional action while actively seeking confirming or disconfirming data. (If you can take corrective action quickly, then the overall risk should be reduced.)

Next post: Agility

Data Strategy - Reach

This is one of a series of posts looking at the four key dimensions of Data and Information that must be addressed in a data strategy - reach, richness, agility and assurance.



Data strategy nowadays is dominated by the concept of big data, whatever that means. Every year our notions of bigness are being stretched further. So instead of trying to define big, let me talk about reach.

Firstly, this means reaching into more sources of data. Instead of just collecting data about the immediate transactions, enterprises now expect to have visibility up and down the supply chain, as well as visibility into the world of the customers and end-consumers. Data and information can be obtained from other organizations in your ecosystem, as well as picked up from external sources such as social media. And the technologies for monitoring (telemetrics, internet of things) and surveillance (face recognition, tracking, etc) are getting cheaper, and may be accurate enough for some purposes.

Obviously there are some ethical as well as commercial issues here. I'll come back to these.

Reach also means reaching more destinations. In a data-driven business, data and information need to get to where they can be useful, both inside the organization and across the ecosystem, to drive capabilities and processes, to support sense-making (also known as situation awareness), policy and decision-making, and intelligent action, as well as organizational learning. These are the elements of what I call organizational intelligence. Self-service (citizen) data and intelligence tools, available to casual as well as dedicated users, improve reach; and the tool vendors have their own reasons for encouraging this trend.

In many organizations, there is a cultural divide between the specialists in Head Office and the people at the edge of the organization. If an organization is serious about being customer-centric, it needs to make sure that relevant and up-to-date information and insight reaches those dealing with awkward customers and other immediate business challenges. This is the power-to-the-edge strategy.

Information and insight may also have value outside your organization - for example to your customers and suppliers, or other parties. Organizations may charge for access to this kind of information and insight (direct monetization), may bundle it with other products and services (indirect monetization), or may distribute it freely for the sake of wider ecosystem benefits.

And obviously there will be some data and intelligence that must not be shared, for security or other reasons. Many organizations will adopt a defensive data strategy, protecting all information unless there is a strong reason for sharing; others may adopt a more offensive data strategy, seeking competitive advantage from sharing and monetization except for those items that have been specifically classified as private or confidential.

How are your suppliers and partners thinking about these issues? To what extent are they motivated or obliged to share data with you, or to protect the data that you share with them? I've seen examples where organizations lack visibility of their own assets, because they have outsourced the maintenance of these assets to an external company, and the external company fails to provide sufficiently detailed or accurate information. (When implementing your data strategy, make sure your contractual agreements cover your information sharing requirements.)

Data protection introduces further requirements. Under GDPR, data controllers are supposed to inform data subjects how far their personal data will reach, although many of the privacy notices I've seen have been so vague and generic that they don't significantly constrain the data controller's ability to share personal data. Meanwhile, GDPR Article 28 specifies some of the aspects of data sharing that should be covered in contractual agreements between data controllers and data processors. But compliance with GDPR or other regulations doesn't fully address ethical concerns about the collection, sharing and use of personal data. So an ethical data strategy should be based on what the organization thinks is fair to data subjects, not merely what it can get away with.

There are various specific issues that may motivate an organization to improve the reach of data as part of its data strategy. For example:
  • Critical data belongs to third parties
  • Critical business decisions lacking robust data
  • I know the data is in there, but I can't get it out.
  • Lack of transparency – I can see the result, but I don’t know how it has been calculated.
  • Analytic insight narrowly controlled by a small group of experts – not easily available to general management
  • Data and/or insight would be worth a lot to our customers, if only we had a way of getting it to them.
In summary, your data strategy needs to explain how you are going to get data and intelligence
  • From a wide range of sources
  • Into a full range of business processes at all touchpoints
  • Delivered to the edge – where your organization engages with your customers


Next post Richness

Related posts

Power to the Edge (December 2005)
Reach, Richness, Agility and Assurance (August 2017)
Setting off towards the data-driven business (August 2019)
Beyond Trimodal - Citizens and Tourists (November 2019)

Thursday, August 22, 2019

Setting off Towards the Data-Driven Business

In an earlier post Towards the Data-Driven Business, I talked about the various roles that data and intelligence can play in the business. But where do you start? In this post, I shall talk about the approach that I have developed and used in a number of large organizations.


To build a roadmap that takes you into the future from where you are today, you need three things.


Firstly, an understanding of the present. This includes producing AS-IS models of your current (legacy) systems: what data you have got, and how you are currently managing and using it. We need to know about the perceived pain points, not because we only want to fix the symptoms, but because these will help us build a consensus for change. Typically we find a fair amount of duplicated and inconsistent data, crappy or non-existent interfaces, slow process loops and data bottlenecks, and general inflexibility.

This is always complicated by the fact that there are already numerous projects underway to fix some of the problems, or to build additional functionality, so we need to understand how these projects are expected to alter the landscape, and in what timescale. It sometimes becomes apparent that these projects are not ideally planned and coordinated from a data management perspective. If we find overlapping or fragmented responsibility in some critical data areas, we may need to engage with programme management and governance to support greater consistency and synergy.


Secondly, a vision of the future opportunities for data and intelligence (and automation based on these). In general terms, these are outlined in my earlier post. To develop a vision for a specific organization, we need to look at their business model - what value do they provide to customers and other stakeholders, how is this value delivered (as business services or otherwise), and how do the capabilities and processes of the organization and its partners support this.

For example, I worked with an organization that had done a fair amount of work on modelling their internal processes and procedures, but lacked the outside-in view. So I developed a business service architecture that showed how the events and processes in their customers' world triggered calls on their services, and what this implied for delivering a seamless experience to their customers.

Using a capability-based planning approach, we can then look at how data, intelligence and automation could improve not only individual business services, processes and underlying capabilities, but also the coordination and feedback loops between these. For example in a retail environment, there are typically processes and capabilities associated with both Buying and Selling, and you may be able to use data and intelligence to make each of them more efficient and effective. But more importantly, you can improve the alignment between Buying and Selling.

(In some styles of business capability model, coordination is shown explicitly as a capability in its own right, but this is not a common approach.)

The business model also identifies which areas are strategically important to the business. At one organization, when we mapped the IT costs against the business model, we found that a disproportionate amount of effort was being devoted to non-strategic stuff, and surprisingly little effort for the customer-facing (therefore more strategically important) activities. (A colour-coded diagram can be very useful in presenting such issues to senior management.)

Most importantly, we find that a lot of stakeholders (especially within IT) have a fairly limited vision about what is possible, often focused on the data they already have rather than the data they could or should have. The double-diamond approach to design thinking works here, to combine creative scenario planning with highly focused practical action. I've often found senior business people much more receptive to this kind of discussion than the IT folk.

We should then be able to produce a reasonably future-proof and technology independent TO-BE data and information architecture, which provides a loosely-coupled blueprint for data collection, processing and management.


Thirdly, how to get from A to B. In a large organization, this is going to take several years. A complete roadmap cannot just be a data strategy, but will usually involve some elements of business process and organizational change, as well as application, integration and technology strategy. It may also involve outside stakeholders - for example, providing direct access to suppliers and business partners via portals and APIs, and sharing data and intelligence with them, while obtaining consent from data subjects and addressing any other privacy, security and compliance issues. There are always dependencies between different streams of activity within the programme as well as with other initiatives, and these dependencies need to be identified and managed, even if we can avoid everything being tightly coupled together.

The roadmap will typically contain a mix of different kinds of project. There may need to be some experimental ("pioneer") projects as well as larger development and infrastructure ("settler", "town planner") projects.

To gain consensus and support, you need a business case. Although different organizations may have different ways of presenting and evaluating the business case, and some individuals and organizations are more risk-averse than others, a business case will always involve an argument that the benefits (financial and possibly non-financial) outweigh the costs and risks.

Generally, people like to see some short-term benefits ("quick wins" or the dreaded "low-hanging fruit") as well as longer-term benefits. A well-balanced roadmap spreads the benefits across the phases - if you manage to achieve 80% of the benefits in phase 1, then your roadmap probably wasn't ambitious enough, so don't be surprised if nobody wants to fund phase 2. 


Finally, you have to implement your roadmap. This means getting the funding and resources, kicking off multiple projects as well as connecting with relevant projects already underway, managing and coordinating the programme. It also means being open to feedback and learning, responding to new emerging challenges (such as regulation and competition), maintaining communication with stakeholders, and keeping the vision and roadmap alive and up-to-date.




Saturday, August 03, 2019

Towards the Data-Driven Business

If we want to build a data-driven business, we need to appreciate the various roles that data and intelligence can play in the business - whether improving a single business service, capability or process, or improving the business as a whole. The examples in this post are mainly from retail, but a similar approach can easily be applied to other sectors.


Sense-Making and Decision Support

The traditional role of analytics and business intelligence is helping the business interpret and respond to what is going on.

Once upon a time, business intelligence always operated with some delay. Data had to be loaded from the operational systems into the data warehouse before they could be processed and analysed. I remember working with systems that generated management information based on yesterday's data, or even last month's data. Of course, such systems don't exist any more (!?), because people expect real-time insight, based on streamed data.

Management information systems are supposed to support individual and collective decision-making. People often talk about actionable intelligence, but of course it doesn't create any value for the business until it is actioned. Creating a fancy report or dashboard isn't the real goal, it's just a means to an end.

Analytics can also be used to calculate complicated chains of effects on a what-if basis. For example, if we change the price of this product by this much, what effect is this predicted to have on the demand for other products, what are the possible responses from our competitors, how does the overall change in customer spending affect supply chain logistics, do we need to rearrange the shelf displays, and so on. How sensitive is Y to changes in X, and what is the optimal level of Z?
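As a toy illustration of the "how sensitive is Y to changes in X" question, here is a constant-elasticity demand model; the elasticity value and baseline volume are purely illustrative assumptions.

```python
def demand_after_price_change(base_demand, price_change_pct, elasticity=-1.8):
    """Constant-elasticity approximation: a 1% price rise changes demand by
    roughly `elasticity` percent. The value -1.8 is an illustrative assumption."""
    return base_demand * (1 + price_change_pct / 100) ** elasticity

base = 10_000   # units per week at the current price
for change in (-10, -5, 5, 10):
    print(f"{change:+}% price -> {demand_after_price_change(base, change):,.0f} units")
```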

Analytics can also be used to support large-scale optimization - for example, solving complicated scheduling problems.

 
Automated Action

Increasingly, we are looking at the direct actioning of intelligence, possibly in real-time. The intelligence drives automated decisions within operational business processes, often without a human-in-the-loop, where human supervision and control may be remote or retrospective. A good example of this is dynamic retail pricing, where an algorithm adjusts the prices of goods and services according to some model of supply and demand. In some cases, optimized plans and schedules can be implemented without a human in the loop.
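To make the dynamic pricing example concrete, here is a deliberately crude rule-based sketch; real pricing engines use richer demand models, business constraints and human oversight, and all the thresholds and numbers below are illustrative.

```python
def adjust_price(current_price, stock_on_hand, expected_demand,
                 floor=0.8, ceiling=1.2, step=0.05):
    """Nudge the price up when demand outstrips stock, down when stock outstrips
    demand, within guard-rails so the algorithm cannot run away (illustrative rule)."""
    if expected_demand > stock_on_hand * 1.1:
        factor = 1 + step
    elif expected_demand < stock_on_hand * 0.9:
        factor = 1 - step
    else:
        factor = 1.0
    new_price = current_price * factor
    return max(current_price * floor, min(current_price * ceiling, new_price))

print(adjust_price(10.00, stock_on_hand=50, expected_demand=80))   # 10.50
print(adjust_price(10.00, stock_on_hand=50, expected_demand=30))   # 9.50
```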

So the data doesn't just flow from the operational systems into the data warehouse, but there is a control flow back into the operational systems. We can call this closed loop intelligence.

(If it takes too much time to process the data and generate the action, the action may no longer be appropriate. A few years ago, one of my clients wanted to use transaction data from the data warehouse to generate emails to customers - but with their existing architecture there would have been a 48 hour delay from the transaction to the email, so we needed to find a way to bypass this.)


Managing Complexity

If you have millions of customers buying hundreds of thousands of products, you need ways of aggregating the data in order to manage the business effectively. Customers can be grouped into segments, products can be grouped into categories, and many organizations use these groupings as a basis for dividing responsibilities between individuals and teams. However, these groupings are typically inflexible and sometimes seem perverse.

For example, in a large supermarket, after failing to find maple syrup next to the honey as I expected, I was told I should find it next to the custard. There may well be a logical reason for this grouping, but this logic was not apparent to me as a customer.

But the fact that maple syrup is in the same product category as custard doesn't just affect the shelf layout; it may also mean that maple syrup is automatically included in decisions affecting the custard category (pricing and promotion decisions, for example) and excluded from decisions affecting the honey category.

A data-driven business can group things dynamically, based on affinity or association, and then make simple and powerful decisions for this dynamic group, at the right level of aggregation.

Automation can then be used to cascade the action to all affected products, making the necessary price, logistical and other adjustments for each product. This means that a broad plan can be quickly and consistently implemented across thousands of products.
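
Here is a small sketch in Python of what dynamic grouping and cascading might look like, using co-purchase counts as a crude affinity measure over invented basket data. Real retailers would use far richer association models, but the principle is the same: derive the group from the data, then apply one decision across the whole group.

```python
from collections import Counter
from itertools import combinations

# Hypothetical basket data: which products were bought together.
baskets = [
    {"maple syrup", "pancake mix", "honey"},
    {"maple syrup", "pancake mix"},
    {"custard", "tinned fruit"},
    {"honey", "tea"},
    {"maple syrup", "honey"},
]

# Count co-occurrence of product pairs across baskets.
pair_counts = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        pair_counts[(a, b)] += 1

def affinity_group(seed: str, min_support: int = 2) -> set:
    """Dynamic group: everything bought with the seed product often enough."""
    group = {seed}
    for (a, b), n in pair_counts.items():
        if n >= min_support and seed in (a, b):
            group |= {a, b}
    return group

group = affinity_group("maple syrup")
print("promotion group:", group)

# Cascade a single decision (e.g. a 10% promotion) across the whole group.
prices = {"maple syrup": 3.20, "pancake mix": 1.10, "honey": 2.40, "custard": 0.90}
promo_prices = {p: round(prices[p] * 0.9, 2) for p in group if p in prices}
print("promo prices:", promo_prices)
```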


Experimentation and Learning

In a data-driven business, every activity is designed for learning as well as doing. Feedback is used in the cybernetic sense - collecting and interpreting data to control and refine business rules and algorithms.

In a dynamic world, it is necessary to experiment constantly. A supermarket or online business is a permanent laboratory for testing the behaviour of its customers - for example, A/B testing, where alternatives are presented to different customers on different occasions to see which one gets the better response. As I mentioned in an earlier post, Netflix declares itself "addicted" to the methodology of A/B testing.

In a simple controlled experiment, you change one variable and leave everything else the same. But in a complex business world, everything is changing. So you need advanced statistics and machine learning, not only to interpret the data, but also to design experiments that will produce useful data.
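
For the simplest case, here is a sketch in Python of evaluating a two-variant test with a standard two-proportion z-test, using made-up conversion counts. As noted above, real experiment programmes go well beyond this, with sequential testing, many simultaneous variants, and machine-learnt experiment designs.

```python
from math import sqrt
from statistics import NormalDist

def ab_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference in conversion rates between A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical results: variant B converts 5.8% vs 5.0% for A, 10,000 customers each.
p_value = ab_test(conv_a=500, n_a=10_000, conv_b=580, n_b=10_000)
print(f"p-value: {p_value:.4f}")
```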


Managing Organization

A traditional command-and-control organization likes to keep the intelligence and insight in the head office, close to top management. An intelligent organization, on the other hand, likes to mobilize the intelligence and insight of all its people, and to encourage (some) local flexibility while maintaining global consistency. With advanced data and intelligence tools, power can be driven to the edge of the organization, allowing for different models of delegation and collaboration. For example, retail management may feel able to give greater autonomy to store managers, but only if the systems provide faster feedback and more effective support.


Transparency

Related to the previous point, data and intelligence can provide clarity and support governance, both for the business itself and for a range of other stakeholders. This has ethical as well as regulatory implications.

Among other things, transparent data and intelligence reveal their provenance and derivation. (This isn't the same thing as explanation, but it probably helps.)
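
One possible way of carrying provenance with the data is to attach it as structured metadata to every derived figure. The field names in this Python sketch are invented, but the idea is that the derivation travels with the number rather than living only in someone's head.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Illustrative shape for a derived figure that carries its own provenance.
@dataclass
class DerivedMetric:
    name: str
    value: float
    sources: list            # upstream datasets or systems
    derivation: str           # how the value was computed
    produced_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

metric = DerivedMetric(
    name="weekly_like_for_like_sales_growth",
    value=0.031,
    sources=["pos_transactions", "store_master"],
    derivation="(this week - same week last year) / same week last year, comparable stores only",
)
print(metric)
```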




Obviously most organizations already have many of the pieces of this, but there are typically major challenges with legacy systems and data - especially master data management. Moving onto the cloud, and adopting advanced integration and robotic automation tools may help with some of these challenges, but it is clearly not the whole story.

Some organizations may be lopsided or disconnected in their use of data and intelligence. They may have very sophisticated analytic systems in some areas, while other areas are comparatively neglected. There can be a tendency to over-value the data and insight you've already got, instead of thinking about the data and insight that you ought to have.

Making an organization more data-driven doesn't always entail a large transformation programme, but it does require a clarity of vision and pragmatic joined-up thinking.


Related posts: Rhyme or Reason: The Logic of Netflix (June 2017), Setting off towards the Data-Driven Business (August 2019)


Updated 13 September 2019

Wednesday, August 16, 2017

Digital Disruption, Delivery and Differentiation in Fast Food

What are the differentiating forces in the fast food sector? Stuart Lauchlan hears some contrasting opinions from a couple of industry leaders.

In the short term, those fast food outlets that offer digital experience and delivery may get some degree of competitive advantage by reaching more customers, with greater convenience. Denny Marie Post, CEO at Red Robin Gourmet Burgers, sees the expansion of third-party delivery services as a strategic priority. So from agility to reach.

But Lenny Comma, CEO of Jack in the Box, argues that this advantage will be short-lived. Longer-term competitive advantage will depend on the quality of the brand. So from assurance to richness.




Stuart Lauchlan, Digital and delivery – which ‘D’ matters most to the fast food industry? Two contrasting views (Diginomica, 16 August 2017)

Related post: Reach, Richness, Agility and Assurance (Aug 2017)


Thursday, March 09, 2017

Information Superiority and Customer Centricity

The business value of consumer analytics and big data is not just about what you can discover or infer about the consumer, but also about how you can use this insight promptly and effectively across multiple touchpoints (including e-commerce systems and CRM) to create a powerful and truly personalized consumer experience.

In a new article for the Cutter Business Technology Journal, I explore how the concept of information superiority interacts with the concept of customer centricity, and look at three modes of information superiority: conventional, adaptive, and collaborative.


Richard Veryard, Information Superiority and Customer Centricity (Cutter Business Technology Journal, March 2017)


Monday, February 25, 2013

Who owns data management strategy?

@joel_schectman exposes an apparent divergence of opinion among #Gartner analysts as to whether the CEO or the CIO should be in charge of data management strategy.


@ted_friedman says that taking out IT as the gatekeeper of centrally stored data can promote “better fact based decision making across the organization”.

@merv Adrian says that bypassing the CIO can have unintended side effects, such as risks to privacy and to the quality of the analysis.


Merv explains further: “If you don’t have to go through a procurement process and IT, you’re a lot freer to do what you want,” said Mr. Adrian. “But all of that carefully constructed governance is completely undermined, you can be drawing incorrect conclusions, and exposing risks to privacy because they are doing things IT hasn’t vetted.”

Merv's concern about quality also applies to the widespread and often uncontrolled use of spreadsheets and other end-user tools. For example, we can find @JamesYKwak and @alexhern discussing whether we can blame Microsoft Excel for the $9bn losses at JPMorgan.

What exactly do we mean by data management strategy? Joel says it includes how best to utilize customer information to leverage growth. Most CIOs seem to think their responsibility for data finishes when they deliver data and information to the user's device. They seem uninterested in how these users actually use the data, and in whether better or faster data genuinely improve decisions and policies and produce better business outcomes.

In other words, the CIO doesn't operate as a Chief Information Officer but as a Chief Information Systems and Technology Officer.

True information strategy includes a closed feedback and learning loop, so that the use of the information can be monitored. Are these expensively collected and elaborately processed data analytics actually influencing decisions, or are the users mostly ignoring them?



Alex Hern, Is Excel the most dangerous piece of software in the world? (New Statesman Feb 2013)

James Kwak, The Importance of Excel (Baseline Scenario Feb 2013)

Joel Schectman, Democratizing Data Analysis Has Risk (WSJ Feb 2013)


Updated 20 February 2016