Tuesday, December 10, 2019

Is there a Single Version of Truth about Stents?

Clinical trials are supposed to generate reliable data to support healthcare decisions and policies at several levels. Regulators use the data to control the marketing and use of medicines and healthcare products. Clinical practice guidelines are produced by healthcare organizations (from the WHO downwards) as well as professional bodies. Clinicians apply and interpret these guidelines for individual patients, as well as prescribing medicines, products and procedures, both on-label and off-label.

Given the importance of these decisions and policies for patients, there are some critical issues concerning the quality of clinical trial data, and the ability of clinicians, researchers, regulators and others to make sense of these data. Obviously there are significant commercial interests involved, and some players may be motivated to be selective about the publication of trial data. Hence the AllTrials campaign for clinical trial transparency.

But there is a more subtle issue, to do with the way the data are collected, coded and reported. The BBC has recently uncovered an example that is both fascinating and troubling. It concerns a clinical trial comparing the use of stents with heart bypass surgery. The trial was carried out in 2016, funded by a major manufacturer of stents, and published in a prestigious medical journal. According to the article, the two alternatives were equally effective in protecting against future heart attacks.

But this is where the controversy begins. Researchers disagree about the best way of measuring heart attacks, and the authors of the article used a particular definition. Other researchers prefer the so-called Universal Definition, or more precisely the Fourth Universal Definition (there having been three previous attempts). Some experts believe that if you use the Universal Definition instead of the definition used in the article, the results are much more one-sided: stents may be the right solution for many patients, but are not always as good as surgery.

Different professional bodies interpret matters differently. The European Association for Cardio-thoracic Surgery (EACTS) told the BBC that this raised serious concerns about the current guidelines based on the 2016 trial, while the European Society of Cardiology stands by these guidelines. The BBC also notes the potential conflicts of interest of researchers, many of whom had declared financial relationships with stent manufacturers.

I want to draw a more general lesson from this story, which is about the much-vaunted Single Version of Truth (SVOT). By limiting the clinical trial data to a single definition of heart attack, some of the richness and complexity of the data are lost or obscured. For some purposes at least, it would seem appropriate to make multiple versions of the truth available, so that they can be properly analysed and interpreted. SVOT is not always a good thing, then.

See my previous blogposts on Single Source of Truth.

Deborah Cohen and Ed Brown, Surgeons withdraw support for heart disease advice (BBC Newsnight, 9 December 2019) See also https://www.youtube.com/watch?v=_vGfJKMbpp8

Debabrata Mukherjee, Fourth Universal Definition of Myocardial Infarction (American College of Cardiology, 25 Aug 2018)

See also Off-Label (March 2005), Is there a Single Version of Truth about Statins? (April 2019), Ethics of Transparency and Concealment (October 2019)

Saturday, December 07, 2019

Developing Data Strategy

The concepts of net-centricity, information superiority and power to the edge emerged out of the US defence community about twenty years ago, thanks to some thought leadership from the Command and Control Research Program (CCRP). One of the routes of these ideas into the civilian world was through a company called Groove Networks, which was acquired by Microsoft in 2005 along with its founder, Ray Ozzie. The Software Engineering Institute (SEI) provided another route. And from the mid 2000s onwards, a few people were researching and writing on edge strategies, including Philip Boxer, John Hagel and myself.

Information superiority is based on the idea that the ability to collect, process, and disseminate an uninterrupted flow of information will give you operational and strategic advantage. The advantage comes not only from the quantity and quality of information at your disposal, but also from processing this information faster than your competitors and/or fast enough for your customers. TIBCO used to call this the Two-Second Advantage.

And by processing, I'm not just talking about moving terabytes around or running up large bills from your cloud provider. I'm talking about enterprise-wide human-in-the-loop organizational intelligence: sense-making (situation awareness, model-building), decision-making (evidence-based policy), rapid feedback (adaptive response and anticipation), organizational learning (knowledge and culture). For example, the OODA loop. That's my vision of a truly data-driven organization.

There are four dimensions of information superiority which need to be addressed in a data strategy: reach, richness, agility and assurance. I have discussed each of these dimensions in a separate post: Reach, Richness, Agility and Assurance.

Philip Boxer, Asymmetric Leadership: Power to the Edge

Leandro DalleMule and Thomas H. Davenport, What’s Your Data Strategy? (HBR, May–June 2017) 

John Hagel III and John Seely Brown, The Agile Dance of Architectures – Reframing IT Enabled Business Opportunities (Working Paper 2003)

Vivek Ranadivé and Kevin Maney, The Two-Second Advantage: How We Succeed by Anticipating the Future--Just Enough (Crown Books 2011). Ranadivé was the founder and former CEO of TIBCO.

Richard Veryard, Building Organizational Intelligence (LeanPub 2012)

Richard Veryard, Information Superiority and Customer Centricity (Cutter Business Technology Journal, 9 March 2017) (registration required)

Wikipedia: CCRP, OODA Loop, Power to the Edge

Related posts: Microsoft and Groove (March 2005), Power to the Edge (December 2005), Two-Second Advantage (May 2010), Enterprise OODA (April 2012), Reach Richness Agility and Assurance (August 2017)

Wednesday, December 04, 2019

Data Strategy - Assurance

This is one of a series of posts looking at the four key dimensions of data and information that must be addressed in a data strategy - reach, richness, agility and assurance.

In previous posts, I looked at Reach (the range of data sources and destinations), Richness (the complexity of data) and Agility (the speed and flexibility of response to new opportunities and changing requirements). Assurance is about Trust.

In 2002, Microsoft launched its Trustworthy Computing Initiative, which covered security, privacy, reliability and business integrity. If we look specifically at data, this means two things.
  1. Trustworthy data - the data are reliable and accurate.
  2. Trustworthy data management - the processor is a reliable and responsible custodian of the data, especially in regard to privacy and security.

Let's start by looking at trustworthy data. To understand why this is important (both in general and specifically to your organization), we can look at the behaviours that emerge in its absence. One very common symptom is the proliferation of local information. If decision-makers and customer-facing staff across the organization don't trust the corporate databases to be complete, up-to-date or sufficiently detailed, they will build private spreadsheets, to give them what they hope will be a closer version of the truth.

This is of course a data assurance nightmare - the data are out of control, and it may be easier for hackers to get the data out than it is for legitimate users. And good luck handling any data subject access request!

But in most organizations, you can't eliminate this behaviour simply by telling people they mustn't. If your data strategy is to address this issue properly, you need to look at the causes of the behaviour, and understand what level of reliability and accessibility you have to give people before they will be willing to rely on your version of the truth rather than theirs.

DalleMule and Davenport have distinguished two types of data strategy, which they call offensive and defensive. Offensive strategies are primarily concerned with exploiting data for competitive advantage, while defensive strategies are primarily concerned with data governance, privacy and security, and regulatory compliance.

As a rough approximation then, assurance can provide a defensive counterbalance to the offensive opportunities offered by reach, richness and agility. But it's never quite as simple as that. A defensive data quality regime might install strict data validation, to prevent incomplete or inconsistent data from reaching the database. In contrast, an offensive data quality regime might install strict labelling, with provenance data and confidence ratings, to allow incomplete records to be properly managed and appropriately used. This is the basis for the NetCentric strategy of Post Before Processing.

Because of course there isn't a single view of data quality. If you want to process a single financial transaction, you obviously need to have a complete, correct and confirmed set of bank details. But if you want aggregated information about upcoming financial transactions, you don't want any large transactions to be omitted from the total because of a few missing attributes. And if you are trying to learn something about your customers by running a survey, it's probably not a good idea to limit yourself to those customers who had the patience and loyalty to answer all the questions.
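To make the contrast concrete, here is a minimal sketch in Python. The record layout, the confidence values and the function names are all invented for illustration and do not come from any particular system. The idea is that incomplete records are kept and labelled with provenance and confidence, and each use of the data then decides what is good enough for its own purpose.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Payment:
    # Hypothetical record layout, for illustration only.
    amount: float
    account: Optional[str]   # bank details may be missing
    source: str              # provenance: where the record came from
    confidence: float        # 0.0 to 1.0, assigned when the record is posted


def payable(payments, min_confidence=0.9):
    """Processing a single transaction needs complete, confirmed details."""
    return [
        p for p in payments
        if p.account is not None and p.confidence >= min_confidence
    ]


def forecast_total(payments):
    """An aggregate keeps incomplete records, so large items are not lost."""
    return sum(p.amount for p in payments)


payments = [
    Payment(120.0, "GB00-1111", "billing", 1.0),
    Payment(50_000.0, None, "sales-pipeline", 0.6),   # large but incomplete
    Payment(75.0, "GB00-2222", "billing", 0.95),
]

print(len(payable(payments)))     # 2
print(forecast_total(payments))   # 50195.0
```

A strict validation gate at the point of entry would have discarded the second record altogether, and the forecast would have been wrong by 50,000.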

Besides data quality, your data strategy will need to have a convincing story about privacy and security. This may include certification (e.g. ISO 27001) as well as regulation (GDPR etc.). You will need to have proper processes in place for identifying risks, and ensuring that relevant data projects follow privacy-by-design and security-by-design principles. You may also need to look at the commercial and contractual relationships governing data sharing with other organizations.

All of this should add up to establishing trust in your data management - reassuring data subjects, business partners, regulators and other stakeholders that the data are in safe hands. And hopefully this means they will be happy for you to take your offensive data strategy up to the next level.

Next post: Developing Data Strategy

Leandro DalleMule and Thomas H. Davenport, What’s Your Data Strategy? (HBR, May–June 2017)

Richard Veryard, Microsoft's Trustworthy Computing (CBDI Journal, March 2003)

Wikipedia: Trustworthy Computing

Tuesday, December 03, 2019

Data Strategy - Agility

This is one of a series of posts looking at the four key dimensions of data and information that must be addressed in a data strategy - reach, richness, agility and assurance.

In previous posts, I looked at Reach, which is about the range of data sources and destinations, and Richness, which is about the complexity of data. Now let me turn to Agility - the speed and flexibility of response to new opportunities and changing requirements.

Not surprisingly, lots of people are talking about data agility, including some who want to persuade you that their products and technologies will help you to achieve it. Here are a few of them.
  • "Data agility is when your data can move at the speed of your business. For companies to achieve true data agility, they need to be able to access the data they need, when and where they need it." (Pinckney)
  • "Collecting first-party data across the customer lifecycle at speed and scale." (Jones)
  • "Keep up with an explosion of data. ... For many enterprises, their ability to collect data has surpassed their ability to organize it quickly enough for analysis and action." (Scott)
  • "How quickly and efficiently you can turn data into accurate insights." (Tuchen)

But before we look at technological solutions for data agility, we need to understand the requirements. The first thing is to empower, enable and encourage people and teams to operate at a good tempo when working with data and intelligence, with fast feedback and learning loops.

Under a trimodal approach, for example, pioneers are expected to operate at a faster tempo, setting up quick experiments, so they should not be put under the same kind of governance as settlers and town planners. Data scientists often operate in pioneer mode, experimenting with algorithms that might turn out to help the business, but often don't. Obviously that doesn't mean zero governance, but appropriate governance. People need to understand what kinds of risk-taking are accepted or even encouraged, and what should be avoided. In some organizations, this will mean a shift in culture.

Beyond trimodal, there is a push towards self-service ("citizen") data and intelligence. This means encouraging and enabling active participation from people who are not doing this on a full-time basis, and may have lower levels of specialist knowledge and skill.

Besides knowledge and skills, there are other important enablers that people need to work with data. They need to be able to navigate and interpret, and this calls for meaningful metadata, such as data dictionaries and catalogues. They also need proper tools and platforms. Above all, they need an awareness of what is possible, and how it might be useful.

Meanwhile, enabling people to work quickly and effectively with data is not just about giving them relevant information, along with decent tools and training. It's also about removing the obstacles.

Obstacles? What obstacles?

In most large organizations, there is some degree of duplication and fragmentation of data across enterprise systems. There are many reasons why this happens, and the effects may be felt in various areas of the business, degrading the performance and efficiency of various business functions, as well as compromising the quality and consistency of management information. System interoperability may be inadequate, resulting in complicated workflows and error-prone operations.

But perhaps the most important effect is to inhibit innovation. Any new IT initiative will need either to plug into the available data stores or create new ones. If this is to be done without adding further to technical debt, then the data engineering (including integration and migration) can often be more laborious than building the new functionality the business wants.

Depending on whom you talk to, this challenge can be framed in various ways - data engineering, data integration and integrity, data quality, master data management. The MDM vendors will suggest one approach, the iPaaS vendors will suggest another approach, and so on. Before you get lured along a particular path, it might be as well to understand what your requirements actually are, and how these fit into your overall data strategy.

And of course your data strategy needs to allow for future growth and discovery. It's no good implementing a single source of truth or a universal API to meet your current view of CUSTOMER or PRODUCT, unless this solution is capable of evolving as your data requirements evolve, with ever-increasing reach and richness. As I've often discussed on this blog before, one approach to building in flexibility is to use appropriate architectural patterns, such as loose coupling and layering, which should give you some level of protection against future variation and changing requirements, and such patterns should probably feature somewhere in your data strategy.

Next post - Assurance

Richard Jones, Agility and Data: The Heart of a Digital Experience Strategy (WayIn, 22 November 2018)

Tom Pinckney, What's Data Agility Anyway (Braze Magazine, 25 March 2019)

Jim Scott, Why Data Agility is a Key Driver of Big Data Technology Development (24 March 2015)

Mike Tuchen, Do You Have the Data Agility Your Business Needs? (Talend, 14 June 2017)

Related posts: Enterprise OODA (April 2012), Beyond Trimodal: Citizens and Tourists (November 2019)

Sunday, December 01, 2019

Data Strategy - Richness

This is one of a series of posts looking at the four key dimensions of data and information that must be addressed in a data strategy - reach, richness, agility and assurance.

In my previous post, I looked at Reach, which is about the range of data sources and destinations. Richness of data addresses the complexity of data - in particular the detailed interconnections that can be determined or inferred across data from different sources.

For example, if a supermarket is tracking your movements around the store, it doesn't only know that you bought lemons and fish and gin, it knows whether you picked up the lemons from the basket next to the fish counter, or from the display of cocktail ingredients. It can therefore guess how you are planning to use the lemons, leading to various forms of personalized insight and engagement.

Richness often means finer-grained data collection, possibly continuous streaming. It also means being able to synchronize data from different sources, possibly in real-time. For example, being able to correlate visits to your website with the screening of TV advertisements, which not only gives you insight and feedback on the effectiveness of your marketing, but also allows you to guess which TV programmes this customer is watching.
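As a rough illustration of the kind of synchronization involved, the following Python sketch counts website visits that arrive within a short window after each advertisement is screened. The function name, the ten-minute window and the sample timestamps are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def visits_after_ads(ad_times, visit_times, window=timedelta(minutes=10)):
    """For each ad screening, count site visits within `window` after it."""
    return {
        ad: sum(1 for v in visit_times if ad <= v <= ad + window)
        for ad in ad_times
    }


# Invented sample data: two screenings and four visits on the same evening.
ads = [datetime(2019, 12, 1, 20, 0), datetime(2019, 12, 1, 21, 30)]
visits = [
    datetime(2019, 12, 1, 19, 55),   # before the first ad
    datetime(2019, 12, 1, 20, 2),
    datetime(2019, 12, 1, 20, 7),
    datetime(2019, 12, 1, 21, 45),   # outside the second ad's window
]

counts = visits_after_ads(ads, visits)
for ad, n in counts.items():
    print(ad.strftime("%H:%M"), n)
```

A real analysis would also need a baseline of normal traffic for the same time of day, since a raw count says nothing about how many of those visits would have happened anyway.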

Artificial intelligence and machine learning algorithms should help you manage this complexity, picking weak signals from a noisy data environment, as well as extracting meaningful data from unstructured content. From quantity to quality.

In the past, when data storage and processing was more expensive than today, it was a common practice to remove much of the data richness when passing data from the operational systems (which might contain detailed transactions from the past 24 hours) to the analytic systems (which might only contain aggregated information over a much longer period). Not long ago, I talked to a retail organization where only the basket and inventory totals reached the data warehouse. (Hopefully they've now fixed this.) So some organizations are still faced with the challenge of reinstating and preserving detailed operational data, and making it available for analysis and decision support.

Richness also means providing more subtle intelligence, instead of expecting simple answers or trying to apply one-size-fits-all insight. So instead of a binary yes/no answer to an important business question, we might get a sense of confidence or uncertainty, and an ability to take provisional action while actively seeking confirming or disconfirming data. (If you can take corrective action quickly, then the overall risk should be reduced.)
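One simple way to picture the move away from a binary answer is a three-way decision based on a confidence level. This is only a sketch: the thresholds and the wording of the outcomes are arbitrary assumptions, and a real organization would need to calibrate them.

```python
def decide(confidence, act_above=0.8, reject_below=0.3):
    """Map a confidence level (0.0 to 1.0) to one of three responses."""
    if confidence >= act_above:
        return "act"
    if confidence < reject_below:
        return "do not act"
    # The middle band is where richer data earns its keep.
    return "act provisionally and seek more data"


for c in (0.9, 0.5, 0.1):
    print(c, "->", decide(c))
```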

Next post: Agility

Data Strategy - Reach

This is one of a series of posts looking at the four key dimensions of data and information that must be addressed in a data strategy - reach, richness, agility and assurance.

Data strategy nowadays is dominated by the concept of big data, whatever that means. Every year our notions of bigness are being stretched further. So instead of trying to define big, let me talk about reach.

Firstly, this means reaching into more sources of data. Instead of just collecting data about the immediate transactions, enterprises now expect to have visibility up and down the supply chain, as well as visibility into the world of the customers and end-consumers. Data and information can be obtained from other organizations in your ecosystem, as well as picked up from external sources such as social media. And the technologies for monitoring (telemetrics, internet of things) and surveillance (face recognition, tracking, etc) are getting cheaper, and may be accurate enough for some purposes.

Obviously there are some ethical as well as commercial issues here. I'll come back to these.

Reach also means reaching more destinations. In a data-driven business, data and information need to get to where they can be useful, both inside the organization and across the ecosystem, to drive capabilities and processes, to support sense-making (also known as situation awareness), policy and decision-making, and intelligent action, as well as organizational learning. These are the elements of what I call organizational intelligence. Self-service (citizen) data and intelligence tools, available to casual as well as dedicated users, improve reach; and the tool vendors have their own reasons for encouraging this trend.

In many organizations, there is a cultural divide between the specialists in Head Office and the people at the edge of the organization. If an organization is serious about being customer-centric, it needs to make sure that relevant and up-to-date information and insight reaches those dealing with awkward customers and other immediate business challenges. This is the power-to-the-edge strategy.

Information and insight may also have value outside your organization - for example to your customers and suppliers, or other parties. Organizations may charge for access to this kind of information and insight (direct monetization), may bundle it with other products and services (indirect monetization), or may distribute it freely for the sake of wider ecosystem benefits.

And obviously there will be some data and intelligence that must not be shared, for security or other reasons. Many organizations will adopt a defensive data strategy, protecting all information unless there is a strong reason for sharing; others may adopt a more offensive data strategy, seeking competitive advantage from sharing and monetization except for those items that have been specifically classified as private or confidential.

How are your suppliers and partners thinking about these issues? To what extent are they motivated or obliged to share data with you, or to protect the data that you share with them? I've seen examples where organizations lack visibility of their own assets, because they have outsourced the maintenance of these assets to an external company, and the external company fails to provide sufficiently detailed or accurate information. (When implementing your data strategy, make sure your contractual agreements cover your information sharing requirements.)

Data protection introduces further requirements. Under GDPR, data controllers are supposed to inform data subjects how far their personal data will reach, although many of the privacy notices I've seen have been so vague and generic that they don't significantly constrain the data controller's ability to share personal data. Meanwhile, GDPR Article 28 specifies some of the aspects of data sharing that should be covered in contractual agreements between data controllers and data processors. But compliance with GDPR or other regulations doesn't fully address ethical concerns about the collection, sharing and use of personal data. So an ethical data strategy should be based on what the organization thinks is fair to data subjects, not merely what it can get away with.

There are various specific issues that may motivate an organization to improve the reach of data as part of its data strategy. For example:
  • Critical data belongs to third parties
  • Critical business decisions lacking robust data
  • I know the data is in there, but I can't get it out.
  • Lack of transparency – I can see the result, but I don’t know how it has been calculated.
  • Analytic insight narrowly controlled by a small group of experts – not easily available to general management
  • Data and/or insight would be worth a lot to our customers, if only we had a way of getting it to them.
In summary, your data strategy needs to explain how you are going to get data and intelligence:
  • From a wide range of sources
  • Into a full range of business processes at all touchpoints
  • Delivered to the edge – where your organization engages with your customers

Next post: Richness

Related posts

Power to the Edge (December 2005)
Reach, Richness, Agility and Assurance (August 2017)
Setting off towards the data-driven business (August 2019)
Beyond Trimodal - Citizens and Tourists (November 2019)

Friday, November 15, 2019

Beyond Trimodal - Citizens and Tourists

I've been hearing a lot about citizens recently. The iPaaS vendors are all talking about citizen integration, and at the Big Data London event this week I heard several people talking about citizen data science.

Gartner's view is that development can be divided into two modes - Mode 1 (conventional) and Mode 2 (self-service), with the citizen directed towards Mode 2. For a discussion of how this applies to Business Intelligence, see my post From Networked BI to Collaborative BI (April 2016).

There is a widespread view that Gartner's bimodal approach is outdated, and that at least three modes are required - a trimodal approach. Simon Wardley's version of the trimodal approach characterizes the roles as pioneers, settlers and town planners. My initial idea on citizen integration was that the citizen-expert spectrum could be roughly fitted into this approach as follows. If the town planners had set things up properly, this would enable easy integration by pioneers and settlers. See my recent posts on DataOps - Organizing the Data Value Chain and Strategy and Requirements for the API ecosystem.

But even if this works for citizen integration, it doesn't seem to work for data science. In her keynote talk this week, Cassie Kozyrkov of Google discussed how TensorFlow had developed from version 1.x (difficult to use, and only suitable for data science pioneers) to version 2.x (much easier to use and suitable for the citizen). And in his talk on the other side of the hall, Chris Williams of IBM Watson also talked about advances in tools that made data science much easier.

That doesn't mean that everyone can do data science, at least not yet, nor that the existing data science skills will become obsolete. Citizen data scientists are those who take data science seriously, who are able and willing to acquire some relevant knowledge and skill, but are performing data science in support of their main job role rather than as an occupation in its own right. 

We may therefore draw a distinction between two types of user - the citizen and the tourist. The tourist may have a casual interest but no serious commitment or responsibility. An analytics or AI platform may well provide some self-service support for tourists as well as citizens, but these will need to be highly constrained in their scope and power.

Now if we add the citizen and the tourist to the pioneer, settler and town planner, we get a pentamodal approach. The tourists may visit the pioneers and the towns, but probably aren't very interested in the settlements. Whereas the citizens mainly occupy the settlements - in other words, the places built by the settlers.

I wonder what Simon will make of this idea?

Andy Callow, Exploring Pioneers, Settlers and Town Planners (3 January 2017)

Jen Underwood, Responsible Citizen Data Science. Yes, it is Possible (9 July 2019)

For further discussion and references on the trimodal approach, see my post Beyond Bimodal (May 2016)

Thursday, November 07, 2019

On Magic Numbers - Privacy and Security

People and organizations often adopt a metrical approach to sensemaking, decision and policy. They attach numbers to things, perhaps using a weighted scorecard or other calculation method, and then make judgements about status or priority or action based on these numbers. Sometimes called triage.

In the simplest version, a single number is produced. More complex versions may involve producing several numbers (sometimes called a vector). For example, if an item can be represented by a pair of numbers, these can be used to position the item on a 2x2 quadrant. See my post Into The Matrix.
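For readers who have not met this kind of calculation, here is a minimal Python sketch of a weighted scorecard and a 2x2 placement. The factor names, weights, cut-off and axis labels are all invented for the example and are not taken from any published scoring system.

```python
def weighted_score(ratings, weights):
    """Weighted average of the ratings, using the supplied weights."""
    total_weight = sum(weights.values())
    return sum(ratings[k] * weights[k] for k in weights) / total_weight


def quadrant(x, y, cut=5.0):
    """Place a pair of scores on a 2x2 grid split at `cut` on both axes."""
    horizontal = "high" if x >= cut else "low"
    vertical = "high" if y >= cut else "low"
    return f"{horizontal} likelihood / {vertical} impact"


# Hypothetical factors rated 0-10, with hypothetical weights.
ratings = {"exposure": 8, "complexity": 4, "history": 6}
weights = {"exposure": 5, "complexity": 3, "history": 2}

likelihood = weighted_score(ratings, weights)
print(likelihood)                   # 6.4
print(quadrant(likelihood, 3.0))    # high likelihood / low impact
```

Everything interesting is hidden in the choice of factors, weights and cut-off, which is exactly where the trouble discussed in this post comes from.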

In this post, I shall look at how this approach works for managing risk, security and privacy.

A typical example of security scoring is the Common Vulnerability Scoring System (CVSS), which assigns numbers to security vulnerabilities. These numbers may determine or influence the allocation of resources within the security field.

Scoring systems are sometimes used within the privacy field as part of Privacy by Design (PbD) or Data Protection Impact Assessment (DPIA). The resultant numbers are used to decide whether something is acceptable, unacceptable or borderline. And in 2013, two researchers at ENISA published a scoring system for assessing the severity of data breaches. Scores less than 2 indicated low severity, scores higher than 4 indicated very high severity.

The advantage of these systems is that they are (relatively) quick and repeatable, especially across large diverse organizations with variable levels of subject matter expertise. The results are typically regarded as objective, and may therefore be taken more seriously by senior management and other stakeholders.

However, these systems are merely indicative, and the scores may not always provide a reliable or accurate view. For example, I doubt whether any Data Protection Officer would be justified in disregarding a potential data breach simply on the basis of a low score from an uncalibrated calculation.

Part of the problem is that these scoring systems operate a highly simplistic algebra, assuming you can break a complex situation into a number of separate factors (e.g. vulnerabilities), and then add them back together with some appropriate weightings. The weightings can be pretty arbitrary, and may not be valid for your organization. More importantly, as Marc Rogers argues (as reported by Shaun Nichols), the more sophisticated attacks rely on combinations of vulnerabilities, so assessing each vulnerability separately completely misses the point.

Thus although two minor bugs may have low CVSS ratings, interaction between them could allow a high severity attack. "It is complex, but there is nothing in the assessment process to deal with that," Rogers said. "It has lulled us into a false sense of security where we look at the score, and so long as it is low we don't allocate the resources."
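The gap can be shown with a small Python sketch. The bug names, scores and threshold are hypothetical, and the combined severity is assumed to come from an analyst's judgement, since nothing described in this post suggests that CVSS itself provides such a figure.

```python
def triage(scores, threshold=7.0):
    """Flag only the items whose individual score reaches the threshold."""
    return sorted(name for name, s in scores.items() if s >= threshold)


def triage_with_chains(scores, chains, threshold=7.0):
    """Also flag every item that takes part in a severe combination."""
    flagged = set(triage(scores, threshold))
    for combo, severity in chains.items():
        if severity >= threshold and all(name in scores for name in combo):
            flagged.update(combo)
    return sorted(flagged)


# Hypothetical per-bug scores, each well below the threshold.
scores = {"bug_a": 3.1, "bug_b": 4.0}
# Hypothetical analyst assessment of the two bugs exploited together.
chains = {("bug_a", "bug_b"): 9.0}

print(triage(scores))                      # []
print(triage_with_chains(scores, chains))  # ['bug_a', 'bug_b']
```

Scoring each item separately flags nothing, so no resources are allocated. The second function only helps if somebody has already thought about the combination, which is a matter of expertise rather than arithmetic.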

One organization that has moved away from the scorecard approach is the Electronic Frontier Foundation. In 2014, they released a Secure Messaging Scorecard for evaluating messaging apps. However, they later decided that the scorecard format dangerously oversimplified the complex question of how various messengers stack up from a security perspective, so they archived the original scorecard and warned people against relying on it.

Nate Cardozo, Gennie Gebhart and Erica Portnoy, Secure Messaging? More Like A Secure Mess (Electronic Frontier Foundation, 26 March 2018)

Clara Galan Manso and Sławomir Górniak, Recommendations for a methodology of the assessment of severity of personal data breaches (ENISA 2013)

Shaun Nichols, We're almost into the third decade of the 21st century and we're still grading security bugs out of 10 like kids. Why? (The Register, 7 Nov 2019)

Wikipedia: Common Vulnerability Scoring System (CVSS)

Related posts: Into The Matrix (October 2015), False Sense of Security (June 2019)

Friday, October 25, 2019

Strategy and Requirements for the API ecosystem

Is there a framework or methodology for establishing the business / ecosystem requirements to drive API strategy and development?

At an industry event I attended recently, hosted by a company that sells tools and technologies for the API ecosystem, some of the speakers advised that when presenting to non-technical stakeholders, you need to talk about service value/benefit rather than APIs. But this raises an important question: how to identify and quantify service benefit, and how to negotiate the share of value between different players in the ecosystem?

One of the ideas of the API economy is that you don't have to maintain all the capabilities yourself, but you find other enterprises that can provide complementary capabilities. So you need to identify and understand what capabilities are available, and map combinations of these capabilities against the demands and unfulfilled needs of potential customers. Then having identified in broad terms what capabilities you wish to combine with your own, and worked out where the service boundaries should be, you may select organizations to partner with and agree business and commercial terms, or create a platform to which many third parties can connect. The technical design of the API should then reflect the service boundaries and commercial arrangements.

In the early days of service-oriented software engineering, people always wanted us to tell them how large their services should be. Not just macro versus micro, but broad (generic) versus narrow (specific). To what extent should a service be completely purpose-agnostic - in other words, with no restrictions on how or where it may be used - or does this conflict with other design goals such as reliability or data protection?

The answer is that it depends not only on what you are trying to do, but also on how you want to manage and govern your service architecture. A broadly scoped, purpose-agnostic service (or service platform) can achieve wide usage and economies of scale, but may be more complex to configure, test and use, whereas a more narrowly scoped context-specific service might be easier to use but with lower reuse potential. Among other things, this affects how much of the service composition and orchestration can be done by the service provider (supply side), and how much is left to the service consumer (demand side). And even on the supply side, it affects how much work needs to be done by the integration experts ("town planners"), and how much can be left to citizen integration ("pioneers" and "settlers").

One version of this challenge can be found in large global organizations, working out exactly what functionality should be provided centrally as shared services, and what functionality should be left to local operations. Ideally, the service architecture should be aligned with the business and organizational architecture.

The word "economy" also implies attention to accounting issues - sharing costs and benefits between different players. Although we may regard cloud as almost infinitely extensible, this doesn't come without cost: if the number of service calls goes through the roof, someone has to pay the cloud provider's bill. This is already an issue within large organizations, where we commonly find arguments about whose budget will pay for something. And I have seen some great ideas come to nothing, because the benefits were spread too thinly and nobody was able to fund them.
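To make the accounting point concrete, here is a minimal sketch of one possible chargeback rule: splitting a shared cloud bill in proportion to the number of service calls each consumer makes. The function name, the consumer names and the figures are all hypothetical, chosen only for illustration, and proportional allocation is just one of many rules an organization might agree on.

```python
def allocate_cost(total_bill, calls_by_consumer):
    """Split a shared bill across consumers in proportion to their service calls.

    Hypothetical helper for illustration only; not taken from any real
    billing system.
    """
    total_calls = sum(calls_by_consumer.values())
    if total_calls == 0:
        raise ValueError("no service calls recorded, nothing to allocate")
    return {
        name: total_bill * calls / total_calls
        for name, calls in calls_by_consumer.items()
    }


# Assumed figures: a monthly bill of 1000.0 shared by three internal consumers.
usage = {"sales": 600_000, "logistics": 300_000, "finance": 100_000}
shares = allocate_cost(1000.0, usage)

for name, share in shares.items():
    print(f"{name}: {share:.2f}")
```

A rule like this settles who pays for the calls, but it does not by itself solve the problem of thinly spread benefits: a consumer with a small share of the calls may still be unwilling to fund its share of the initial investment.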

So although vague appeals to innovation and imagination might be good enough for a marketing pitch, serious strategic thinking is about discovering where there is untapped value in your business and its environment, and working out exactly how an API strategy is going to help you unlock this value.

At the CBDI Forum, we were talking about these issues many years ago: our Service Architecture and Engineering® methodology is still available from Everware-CBDI. Here are some of the articles I wrote for the CBDI Journal.
More at http://everware-cbdi.com/cbdi-journal-archive

Tuesday, October 15, 2019

DataOps - Organizing the Data Value Chain

At #TalendConnect today there was frequent mention of #DataOps, although according to a post I found on the Talend blog from earlier this year, Talend prefers the term collaborative data management.

"Data Preparation ... should be envisioned as a game-changing technology for information management due to its ability to enable potentially anyone to participate. Armed with innovative technologies, enterprises can organize their data value chain in a new collaborative way." (Talend)

I've always insisted that the data value chain should end not with delivering insight (so-called actionable intelligence) but with delivering business outcomes (actioned intelligence), and I was pleased to hear some of today's speakers making the same point. However, there are still voices within the industry that have a narrower view of DataOps, and I note with concern that the DataOps Manifesto identifies the goal of DataOps in terms of the early and continuous delivery of valuable analytic insights.

Although there will always be a place for analytic reports and dashboards, I always expected that these would gradually make way for analytic insights being rendered as services and integrated into operational business systems and processes, to create closed-loop business intelligence. There are many good examples of this today, especially in the manufacturing world. There are also systems that deliver insights directly to customers or end-users, perhaps in the form of recommendations. But a lot of the discussion of the data-driven enterprise still seems to be based on a dashboard mindset.

And who actually does the DataOps? A presentation from Virtusa showed a three-step DataOps process - pipeline, innovation and value - which suggests a trimodal approach. So the Town Planners would do the pipeline (building generic and highly customizable data preparation frameworks), Pioneers would do the innovation (experimental proof of concept), and the Settlers would roll out the value. I shall be interested to see some practical implementations of this approach.

Meanwhile, simplistic notions of democratization (or "citizen integration") often divide people into two camps - experts and citizens - and this polarization is encouraged by Gartner's promotion of Bimodal IT. But this leads people to believe that you can have either trust or speed/agility but not both. And as Jonathan Gill of Talend emphasized in his keynote today, digital leaders don't recognize this dichotomy.

Jean-Michel Franco, 3 Key Takeaways from the 2019 Gartner Market Guide for Data Preparation (Talend, 26 April 2019)

Wikipedia: DataOps

Related posts: Service-Oriented Business Intelligence (September 2005), SPARK 2 Innovation or Trust (March 2006), Analytics for Adults (January 2013), From Networked BI to Collaborative BI (April 2016), Beyond Bimodal (May 2016), Towards the Data-Driven Business (August 2019)