Wednesday, January 01, 2020

Assemblage and Plottage

John Reilly of @RealTown explains the terms Assemblage and Plottage.
Assemblage is the process of joining several parcels to form a larger parcel; the resulting increase in value is called plottage.

Applied to real estate, which appears to be Mr Reilly's primary domain of expertise, the term parcel refers to a parcel of land or other property. He explains why combining such parcels increases the total value.

However, Mr Reilly posted these definitions in a blog entitled The Data Advocate, in which he and his colleagues promote the use of data in the real estate business. So we might reasonably use the same terms in the data domain as well. Joining several parcels of data to form a larger parcel (assemblage) is widely recognized as a way of increasing the total value of the data.

While calculation of plottage in the real estate business can be grounded in observations of exchange value or use value, calculation of plottage in the data domain may be rather more difficult. Among other things, there is much greater diversity in the range of potential uses for a large parcel of data than for a large parcel of land, and a large parcel of data can often be used for multiple purposes simultaneously.
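To make the real-estate calculation concrete, here is a minimal sketch, with made-up figures, of plottage as the difference between the value of the assembled parcel and the sum of the separate parcel values. The difficulty in the data domain is that the assembled value is rarely observable as a single figure, precisely because the combined data can serve many purposes at once.

```python
# Minimal sketch with made-up figures: plottage as the increase in value
# when several parcels are assembled into one larger parcel.

def plottage(separate_values, assembled_value):
    """Value of the assembled parcel minus the sum of the separate parcels."""
    return assembled_value - sum(separate_values)

# Real estate: three adjacent plots worth 100,000 each, worth 350,000 combined.
print(plottage([100_000, 100_000, 100_000], 350_000))  # 50000

# Data: the assembled_value argument is the hard part, because a combined
# dataset can be put to many uses at once and has no single market price.
```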

Nevertheless, even in the absence of accurate monetary estimates of data plottage, the concept of data plottage could be useful for data strategy and management. We should at least be able to argue that some course of action generates greater levels of plottage than some other course of action.



By the way, although the idea that the whole is greater than the sum of its parts is commonly attributed to Aristotle, @sentantiq argues that this attribution is incorrect.



John Reilly, Assemblage vs Plottage (The Data Advocate, 10 July 2014)

Sententiae Antiquae, No, Aristotle Didn’t Write A Whole is Greater Than the Sum of Its Parts (6 July 2018)

Tuesday, December 10, 2019

Is there a Single Version of Truth about Stents?

Clinical trials are supposed to generate reliable data to support healthcare decisions and policies at several levels. Regulators use the data to control the marketing and use of medicines and healthcare products. Clinical practice guidelines are produced by healthcare organizations (from the WHO downwards) as well as professional bodies. Clinicians apply and interpret these guidelines for individual patients, as well as prescribing medicines, products and procedures, both on-label and off-label.

Given the importance of these decisions and policies for patients, there are some critical issues concerning the quality of clinical trial data, and the ability of clinicians, researchers, regulators and others to make sense of these data. Obviously there are significant commercial interests involved, and some players may be motivated to be selective about the publication of trial data. Hence the AllTrials campaign for clinical trial transparency.

But there is a more subtle issue, to do with the way the data are collected, coded and reported. The BBC has recently uncovered an example that is both fascinating and troubling. It concerns a clinical trial comparing the use of stents with heart bypass surgery. The trial was carried out in 2016, funded by a major manufacturer of stents, and published in a prestigious medical journal. According to the article, the two alternatives were equally effective in protecting against future heart attacks.

But this is where the controversy begins. Researchers disagree about the best way of measuring heart attacks, and the authors of the article used a particular definition. Other researchers prefer the so-called Universal Definition, or more precisely the Fourth Universal Definition (there having been three previous attempts). Some experts believe that if you use the Universal Definition instead of the definition used in the article, the results are much more one-sided: stents may be the right solution for many patients, but are not always as good as surgery.

Different professional bodies interpret matters differently. The European Association for Cardio-thoracic Surgery (EACTS) told the BBC that this raised serious concerns about the current guidelines based on the 2016 trial, while the European Society of Cardiology stands by these guidelines. The BBC also notes the potential conflicts of interests of researchers, many of whom had declared financial relationships with stent manufacturers.

I want to draw a more general lesson from this story, which is about the much-vaunted Single Version of Truth (SVOT). By limiting the clinical trial data to a single definition of heart attack, some of the richness and complexity of the data are lost or obscured. For some purposes at least, it would seem appropriate to make multiple versions of the truth available, so that they can be properly analysed and interpreted. SVOT not always a good thing, then.
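To illustrate what making multiple versions available might look like in practice, here is a minimal sketch. The field names and classification rules are placeholders of my own invention, not the criteria actually used in the trial or in the Universal Definition; the point is simply that the underlying observations can be recorded once and then classified under more than one definition, so that analysts can compare the results.

```python
# Minimal sketch (hypothetical fields and rules): record the raw observations
# once, and classify the event under more than one endpoint definition, rather
# than baking a single definition into the data.

event = {
    "patient_id": "P123",
    "troponin_elevated": True,
    "ecg_changes": False,
    "procedure_related": True,
}

definitions = {
    # Placeholder rules only - not the real clinical criteria.
    "protocol_definition": lambda e: e["troponin_elevated"] and e["ecg_changes"],
    "universal_definition": lambda e: e["troponin_elevated"],
}

classified = {name: rule(event) for name, rule in definitions.items()}
print(classified)  # {'protocol_definition': False, 'universal_definition': True}
```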

See my previous blogposts on Single Source of Truth.



Deborah Cohen and Ed Brown, Surgeons withdraw support for heart disease advice (BBC Newsnight, 9 December 2019) See also https://www.youtube.com/watch?v=_vGfJKMbpp8

Debabrata Mukherjee, Fourth Universal Definition of Myocardial Infarction (American College of Cardiology, 25 Aug 2018)

See also Off-Label (March 2005), Is there a Single Version of Truth about Statins? (April 2019), Ethics of Transparency and Concealment (October 2019)

Saturday, December 07, 2019

Developing Data Strategy

The concepts of net-centricity, information superiority and power to the edge emerged out of the US defence community about twenty years ago, thanks to some thought leadership from the Command and Control Research Program (CCRP). One of the routes of these ideas into the civilian world was through a company called Groove Networks, which was acquired by Microsoft in 2005 along with its founder, Ray Ozzie. The Software Engineering Institute (SEI) provided another route. And from the mid 2000s onwards, a few people were researching and writing on edge strategies, including Philip Boxer, John Hagel and myself.

Information superiority is based on the idea that the ability to collect, process, and disseminate an uninterrupted flow of information will give you operational and strategic advantage. The advantage comes not only from the quantity and quality of information at your disposal, but also from processing this information faster than your competitors and/or fast enough for your customers. TIBCO used to call this the Two-Second Advantage.

And by processing, I'm not just talking about moving terabytes around or running up large bills from your cloud provider. I'm talking about enterprise-wide human-in-the-loop organizational intelligence: sense-making (situation awareness, model-building), decision-making (evidence-based policy), rapid feedback (adaptive response and anticipation), organizational learning (knowledge and culture). For example, the OODA loop. That's my vision of a truly data-driven organization.

There are four dimensions of information superiority which need to be addressed in a data strategy: reach, richness, agility and assurance. I have discussed each of these dimensions in a separate post (see below).





Philip Boxer, Asymmetric Leadership: Power to the Edge

Leandro DalleMule and Thomas H. Davenport, What’s Your Data Strategy? (HBR, May–June 2017) 

John Hagel III and John Seely Brown, The Agile Dance of Architectures – Reframing IT Enabled Business Opportunities (Working Paper 2003)

Vivek Ranadivé and Kevin Maney, The Two-Second Advantage: How We Succeed by Anticipating the Future--Just Enough (Crown Books 2011). Ranadivé was the founder and former CEO of TIBCO.

Richard Veryard, Building Organizational Intelligence (LeanPub 2012)

Richard Veryard, Information Superiority and Customer Centricity (Cutter Business Technology Journal, 9 March 2017) (registration required)

Wikipedia: CCRP, OODA Loop, Power to the Edge

Related posts: Microsoft and Groove (March 2005), Power to the Edge (December 2005), Two-Second Advantage (May 2010), Enterprise OODA (April 2012), Reach Richness Agility and Assurance (August 2017)

Wednesday, December 04, 2019

Data Strategy - Assurance

This is one of a series of posts looking at the four key dimensions of data and information that must be addressed in a data strategy - reach, richness, agility and assurance.



In previous posts, I looked at Reach (the range of data sources and destinations), Richness (the complexity of data) and Agility (the speed and flexibility of response to new opportunities and changing requirements). Assurance is about Trust.

In 2002, Microsoft launched its Trustworthy Computing Initiative, which covered security, privacy, reliability and business integrity. If we look specifically at data, this means two things.
  1. Trustworthy data - the data are reliable and accurate.
  2. Trustworthy data management - the processor is a reliable and responsible custodian of the data, especially in regard to privacy and security.
Let's start by looking at trustworthy data. To understand why this is important (both in general and specifically to your organization), we can look at the behaviours that emerge in its absence. One very common symptom is the proliferation of local information. If decision-makers and customer-facing staff across the organization don't trust the corporate databases to be complete, up-to-date or sufficiently detailed, they will build private spreadsheets, to give them what they hope will be a closer version of the truth.

This is of course a data assurance nightmare - the data are out of control, and it may be easier for hackers to get the data out than it is for legitimate users. And good luck handling any data subject access request!

But in most organizations, you can't eliminate this behaviour simply by telling people they mustn't. If your data strategy is to address this issue properly, you need to look at the causes of the behaviour and understand what level of reliability and accessibility you need to give people before they will be willing to rely on your version of the truth rather than theirs.

DalleMule and Davenport have distinguished two types of data strategy, which they call offensive and defensive. Offensive strategies are primarily concerned with exploiting data for competitive advantage, while defensive strategies are primarily concerned with data governance, privacy and security, and regulatory compliance.

As a rough approximation then, assurance can provide a defensive counterbalance to the offensive opportunities offered by reach, richness and agility. But it's never quite as simple as that. A defensive data quality regime might install strict data validation, to prevent incomplete or inconsistent data from reaching the database. In contrast, an offensive data quality regime might install strict labelling, with provenance data and confidence ratings, to allow incomplete records to be properly managed, enriched if possible, and appropriately used. This is the basis for the NetCentric strategy of Post Before Processing.
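As a rough illustration of the difference between the two regimes, here is a minimal sketch; the field names and the confidence calculation are purely illustrative.

```python
# Minimal sketch: a defensive regime rejects incomplete records at the gate,
# while an offensive regime accepts them but labels them with provenance and
# a confidence rating ("post before processing"). Field names are illustrative.

REQUIRED = {"account_number", "sort_code", "amount"}

def defensive_ingest(record):
    """Reject anything incomplete before it reaches the database."""
    missing = REQUIRED - record.keys()
    if missing:
        raise ValueError(f"rejected: missing {sorted(missing)}")
    return record

def offensive_ingest(record, source):
    """Accept the record, labelled so downstream uses can decide how to treat it."""
    missing = REQUIRED - record.keys()
    return {
        **record,
        "_source": source,
        "_missing": sorted(missing),
        "_confidence": 1.0 - len(missing) / len(REQUIRED),
    }
```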

Because of course there isn't a single view of data quality. If you want to process a single financial transaction, you obviously need to have a complete, correct and confirmed set of bank details. But if you want aggregated information about upcoming financial transactions, you don't want any large transactions to be omitted from the total because of a few missing attributes. And if you are trying to learn something about your customers by running a survey, it's probably not a good idea to limit yourself to those customers who had the patience and loyalty to answer all the questions.
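Here is an equally minimal sketch of that point, again with invented records: the same incomplete transaction that a payment process should reject still belongs in the forward-looking total.

```python
# Minimal sketch: fitness for purpose depends on the question being asked.

transactions = [
    {"amount": 1_000_000, "account_number": None},   # incomplete but material
    {"amount": 250, "account_number": "12345678"},
]

# Aggregate view: include everything, or the total will be badly misleading.
total_upcoming = sum(t["amount"] for t in transactions)

# Payment processing: only complete records are fit for use.
payable_now = [t for t in transactions if t["account_number"]]

print(total_upcoming)    # 1000250
print(len(payable_now))  # 1
```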

Besides data quality, your data strategy will need to have a convincing story about privacy and security. This may include certification (e.g. ISO 27001) as well as regulation (GDPR etc.). You will need to have proper processes in place for identifying risks and for ensuring that relevant data projects follow privacy-by-design and security-by-design principles. You may also need to look at the commercial and contractual relationships governing data sharing with other organizations.

All of this should add up to establishing trust in your data management - reassuring data subjects, business partners, regulators and other stakeholders that the data are in safe hands. And hopefully this means they will be happy for you to take your offensive data strategy up to the next level.

Next post: Developing Data Strategy



Leandro DalleMule and Thomas H. Davenport, What’s Your Data Strategy? (HBR, May–June 2017)

Richard Veryard, Microsoft's Trustworthy Computing (CBDI Journal, March 2003)

Wikipedia: Trustworthy Computing

Tuesday, December 03, 2019

Data Strategy - Agility

This is one of a series of posts looking at the four key dimensions of data and information that must be addressed in a data strategy - reach, richness, agility and assurance.



In previous posts, I looked at Reach, which is about the range of data sources and destinations, and Richness, which is about the complexity of data. Now let me turn to Agility - the speed and flexibility of response to new opportunities and changing requirements.

Not surprisingly, lots of people are talking about data agility, including some who want to persuade you that their products and technologies will help you to achieve it. Here are a few of them.
"Data agility is when your data can move at the speed of your business. For companies to achieve true data agility, they need to be able to access the data they need, when and where they need it." (Pinckney)
"Collecting first-party data across the customer lifecycle at speed and scale." (Jones)
"Keep up with an explosion of data. ... For many enterprises, their ability to collect data has surpassed their ability to organize it quickly enough for analysis and action." (Scott)
"How quickly and efficiently you can turn data into accurate insights." (Tuchen)
But before we look at technological solutions for data agility, we need to understand the requirements. The first thing is to empower, enable and encourage people and teams to operate at a good tempo when working with data and intelligence, with fast feedback and learning loops.

Under a trimodal approach, for example, pioneers are expected to operate at a faster tempo, setting up quick experiments, so they should not be put under the same kind of governance as settlers and town planners. Data scientists often operate in pioneer mode, experimenting with algorithms that might turn out to help the business, but often don't. Obviously that doesn't mean zero governance, but appropriate governance. People need to understand what kinds of risk-taking are accepted or even encouraged, and what should be avoided. In some organizations, this will mean a shift in culture.

Beyond trimodal, there is a push towards self-service ("citizen") data and intelligence. This means encouraging and enabling active participation from people who are not doing this on a full-time basis, and may have lower levels of specialist knowledge and skill.

Besides knowledge and skills, there are other important enablers that people need to work with data. They need to be able to navigate and interpret, and this calls for meaningful metadata, such as data dictionaries and catalogues. They also need proper tools and platforms. Above all, they need an awareness of what is possible, and how it might be useful.
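As a sketch of the kind of metadata that supports this navigation, here is one possible shape for a catalogue entry. The fields are illustrative, not any particular product's schema.

```python
# Minimal sketch (illustrative fields): the metadata a catalogue entry might
# carry so that self-service users can find, interpret and trust a dataset.

from dataclasses import dataclass, field

@dataclass
class CatalogueEntry:
    name: str
    description: str
    owner: str
    source_system: str
    refresh_frequency: str
    fields: dict = field(default_factory=dict)  # column name -> business meaning

orders = CatalogueEntry(
    name="retail_orders",
    description="One row per customer order line",
    owner="Sales Operations",
    source_system="ERP",
    refresh_frequency="hourly",
    fields={"order_id": "Unique order identifier", "sku": "Stock-keeping unit"},
)
```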

Meanwhile, enabling people to work quickly and effectively with data is not just about giving them relevant information, along with decent tools and training. It's also about removing the obstacles.

Obstacles? What obstacles?

In most large organizations, there is some degree of duplication and fragmentation of data across enterprise systems. There are many reasons why this happens, and the effects may be felt in various areas of the business, degrading the performance and efficiency of various business functions, as well as compromising the quality and consistency of management information. System interoperability may be inadequate, resulting in complicated workflows and error-prone operations.

But perhaps the most important effect is on inhibiting innovation. Any new IT initiative will need either to plug into the available data stores or to create new ones. If this is to be done without adding further to technical debt, the data engineering work (including integration and migration) can often be more laborious than building the new functionality the business wants.

Depending on whom you talk to, this challenge can be framed in various ways - data engineering, data integration and integrity, data quality, master data management. The MDM vendors will suggest one approach, the iPaaS vendors will suggest another approach, and so on. Before you get lured along a particular path, it might be as well to understand what your requirements actually are, and how these fit into your overall data strategy.

And of course your data strategy needs to allow for future growth and discovery. It's no good implementing a single source of truth or a universal API to meet your current view of CUSTOMER or PRODUCT unless this solution is capable of evolving as your data requirements evolve, with ever-increasing reach and richness. As I've often discussed on this blog before, one approach to building in flexibility is to use appropriate architectural patterns, such as loose coupling and layering. These patterns should give you some level of protection against future variation and changing requirements, and should probably feature somewhere in your data strategy.
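By way of illustration, here is a minimal sketch of the layering idea: consumers are written against a narrow interface rather than the physical schema, so the storage underneath can change without breaking them. The names and the connection object are hypothetical.

```python
# Minimal sketch: callers depend on a narrow access layer, not on the schema.

from typing import Protocol

class CustomerRepository(Protocol):
    def get_customer(self, customer_id: str) -> dict: ...

class WarehouseCustomerRepository:
    """One possible implementation; could later be replaced by an API or event store."""

    def __init__(self, connection):
        # `connection` is a hypothetical object providing a fetch_one method.
        self.connection = connection

    def get_customer(self, customer_id: str) -> dict:
        row = self.connection.fetch_one(
            "SELECT id, name, segment FROM customer WHERE id = %s", customer_id
        )
        return {"id": row[0], "name": row[1], "segment": row[2]}

# Calling code written against CustomerRepository is insulated from changes
# to the underlying storage technology or schema.
```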

Next post: Assurance


Richard Jones, Agility and Data: The Heart of a Digital Experience Strategy (WayIn, 22 November 2018)

Tom Pinckney, What's Data Agility Anyway (Braze Magazine, 25 March 2019)

Jim Scott, Why Data Agility is a Key Driver of Big Data Technology Development (24 March 2015)

Mike Tuchen, Do You Have the Data Agility Your Business Needs? (Talend, 14 June 2017)

Related posts: Enterprise OODA (April 2012), Beyond Trimodal: Citizens and Tourists (November 2019)

Sunday, December 01, 2019

Data Strategy - Richness

This is one of a series of posts looking at the four key dimensions of data and information that must be addressed in a data strategy - reach, richness, agility and assurance.



In my previous post, I looked at Reach, which is about the range of data sources and destinations. Richness of data addresses the complexity of data - in particular the detailed interconnections that can be determined or inferred across data from different sources.

For example, if a supermarket is tracking your movements around the store, it doesn't only know that you bought lemons and fish and gin; it knows whether you picked up the lemons from the basket next to the fish counter or from the display of cocktail ingredients. It can therefore guess how you are planning to use the lemons, leading to various forms of personalized insight and engagement.

Richness often means finer-grained data collection, possibly continuous streaming. It also means being able to synchronize data from different sources, possibly in real-time. For example, being able to correlate visits to your website with the screening of TV advertisements, which not only gives you insight and feedback on the effectiveness of your marketing, but also allows you to guess which TV programmes this customer is watching.
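A minimal sketch of that kind of correlation, with invented timestamps and an arbitrary ten-minute attribution window:

```python
# Minimal sketch: count website visits that fall within a short window after
# each TV spot. Timestamps and window length are invented for illustration.

from datetime import datetime, timedelta

ad_spots = [datetime(2019, 12, 1, 20, 15), datetime(2019, 12, 1, 21, 45)]
web_visits = [datetime(2019, 12, 1, 20, 17), datetime(2019, 12, 1, 22, 30)]

WINDOW = timedelta(minutes=10)

attributed = {
    spot: [v for v in web_visits if spot <= v <= spot + WINDOW]
    for spot in ad_spots
}
for spot, visits in attributed.items():
    print(spot, len(visits))  # the 20:15 spot gets 1 visit, the 21:45 spot gets 0
```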

Artificial intelligence and machine learning algorithms should help you manage this complexity, picking weak signals from a noisy data environment, as well as extracting meaningful data from unstructured content. From quantity to quality.

In the past, when data storage and processing were more expensive than they are today, it was common practice to remove much of the data richness when passing data from the operational systems (which might contain detailed transactions from the past 24 hours) to the analytic systems (which might only contain aggregated information over a much longer period). Not long ago, I talked to a retail organization where only the basket and inventory totals reached the data warehouse. (Hopefully they've now fixed this.) So some organizations are still faced with the challenge of reinstating and preserving detailed operational data, and making it available for analysis and decision support.

Richness also means providing more subtle intelligence, instead of expecting simple answers or trying to apply one-size-fits all insight. So instead of a binary yes/no answer to an important business question, we might get a sense of confidence or uncertainty, and an ability to take provisional action while actively seeking confirming or disconfirming data. (If you can take corrective action quickly, then the overall risk should be reduced.)
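Here is a minimal sketch of what a more subtle answer might look like; the thresholds and the confidence proxy are arbitrary placeholders, not a statistical method.

```python
# Minimal sketch: return an estimate with a confidence level rather than a
# bare yes/no, so the caller can take provisional action and keep collecting
# evidence. The thresholds below are arbitrary placeholders.

def churn_signal(positive_signals: int, total_signals: int) -> dict:
    if total_signals == 0:
        return {"answer": None, "confidence": 0.0, "action": "collect more data"}
    score = positive_signals / total_signals
    confidence = min(1.0, total_signals / 20)  # crude proxy: more evidence, more confidence
    action = "act now" if confidence > 0.7 else "provisional action, keep monitoring"
    return {"answer": score > 0.5, "confidence": confidence, "action": action}

print(churn_signal(3, 5))  # low confidence (0.25) -> provisional action
```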

Next post: Agility

Data Strategy - Reach

This is one of a series of posts looking at the four key dimensions of data and information that must be addressed in a data strategy - reach, richness, agility and assurance.



Data strategy nowadays is dominated by the concept of big data, whatever that means. Every year our notions of bigness are being stretched further. So instead of trying to define big, let me talk about reach.

Firstly, this means reaching into more sources of data. Instead of just collecting data about the immediate transactions, enterprises now expect to have visibility up and down the supply chain, as well as visibility into the world of the customers and end-consumers. Data and information can be obtained from other organizations in your ecosystem, as well as picked up from external sources such as social media. And the technologies for monitoring (telemetrics, internet of things) and surveillance (face recognition, tracking, etc) are getting cheaper, and may be accurate enough for some purposes.

Obviously there are some ethical as well as commercial issues here. I'll come back to these.

Reach also means reaching more destinations. In a data-driven business, data and information need to get to where they can be useful, both inside the organization and across the ecosystem, to drive capabilities and processes, to support sense-making (also known as situation awareness), policy and decision-making, and intelligent action, as well as organizational learning. These are the elements of what I call organizational intelligence. Self-service (citizen) data and intelligence tools, available to casual as well as dedicated users, improve reach; and the tool vendors have their own reasons for encouraging this trend.

In many organizations, there is a cultural divide between the specialists in Head Office and the people at the edge of the organization. If an organization is serious about being customer-centric, it needs to make sure that relevant and up-to-date information and insight reaches those dealing with awkward customers and other immediate business challenges. This is the power-to-the-edge strategy.

Information and insight may also have value outside your organization - for example to your customers and suppliers, or other parties. Organizations may charge for access to this kind of information and insight (direct monetization), may bundle it with other products and services (indirect monetization), or may distribute it freely for the sake of wider ecosystem benefits.

And obviously there will be some data and intelligence that must not be shared, for security or other reasons. Many organizations will adopt a defensive data strategy, protecting all information unless there is a strong reason for sharing; others may adopt a more offensive data strategy, seeking competitive advantage from sharing and monetization except for those items that have been specifically classified as private or confidential.

How are your suppliers and partners thinking about these issues? To what extent are they motivated or obliged to share data with you, or to protect the data that you share with them? I've seen examples where organizations lack visibility of their own assets, because they have outsourced the maintenance of these assets to an external company, and the external company fails to provide sufficiently detailed or accurate information. (When implementing your data strategy, make sure your contractual agreements cover your information sharing requirements.)

Data protection introduces further requirements. Under GDPR, data controllers are supposed to inform data subjects how far their personal data will reach, although many of the privacy notices I've seen have been so vague and generic that they don't significantly constrain the data controller's ability to share personal data. Meanwhile, GDPR Article 28 specifies some of the aspects of data sharing that should be covered in contractual agreements between data controllers and data processors. But compliance with GDPR or other regulations doesn't fully address ethical concerns about the collection, sharing and use of personal data. So an ethical data strategy should be based on what the organization thinks is fair to data subjects, not merely what it can get away with.

There are various specific issues that may motivate an organization to improve the reach of data as part of its data strategy. For example:
  • Critical data belongs to third parties
  • Critical business decisions lacking robust data
  • I know the data is in there, but I can't get it out.
  • Lack of transparency – I can see the result, but I don’t know how it has been calculated.
  • Analytic insight narrowly controlled by a small group of experts – not easily available to general management
  • Data and/or insight would be worth a lot to our customers, if only we had a way of getting it to them.
In summary, your data strategy needs to explain how you are going to get data and intelligence
  • From a wide range of sources
  • Into a full range of business processes at all touchpoints
  • Delivered to the edge – where your organization engages with your customers


Next post: Richness

Related posts

Power to the Edge (December 2005)
Reach, Richness, Agility and Assurance (August 2017)
Setting off towards the data-driven business (August 2019)
Beyond Trimodal - Citizens and Tourists (November 2019)