Thursday, November 07, 2019

On Magic Numbers - Privacy and Security

People and organizations often adopt a metrical approach to sensemaking, decision and policy. They attach numbers to things, perhaps using a weighted scorecard or other calculation method, and then make judgements about status or priority or action based on these numbers. This is sometimes called triage.

In the simplest version, a single number is produced. More complex versions may involve producing several numbers (sometimes called a vector). For example, if an item can be represented by a pair of numbers, these can be used to position the item on a 2x2 quadrant. See my post Into The Matrix.
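As a rough sketch of the scorecard idea, the following Python combines several ratings into one weighted score and places a pair of scores on a 2x2 quadrant. The factor names, the weights, the 0-10 scale, the axis labels and the midpoint of 5 are all illustrative assumptions, not taken from any particular scoring system.

```python
def weighted_score(ratings, weights):
    """Combine factor ratings into a single number using a weighted average."""
    total_weight = sum(weights.values())
    return sum(ratings[name] * weights[name] for name in weights) / total_weight


def quadrant(x, y, midpoint=5.0):
    """Place a pair of scores in one of the four cells of a 2x2 matrix."""
    horizontal = "high" if x >= midpoint else "low"
    vertical = "high" if y >= midpoint else "low"
    return f"{horizontal} likelihood / {vertical} impact"


# Hypothetical ratings on a 0-10 scale for a single item.
ratings = {"exposure": 8, "sensitivity": 4, "volume": 6}
weights = {"exposure": 0.5, "sensitivity": 0.3, "volume": 0.2}

print(round(weighted_score(ratings, weights), 2))
print(quadrant(7, 3))
```

Changing the weights changes the ranking of items, which is one reason the resulting numbers should be read as indicative rather than objective.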

In this post, I shall look at how this approach works for managing risk, security and privacy.

A typical example of security scoring is the Common Vulnerability Scoring System (CVSS), which assigns numbers to security vulnerabilities. These numbers may determine or influence the allocation of resources within the security field.

Scoring systems are sometimes used within the privacy field as part of Privacy by Design (PbD) or Data Protection Impact Assessment (DPIA). The resultant numbers are used to decide whether something is acceptable, unacceptable or borderline. And in 2013, two researchers at ENISA published a scoring system for assessing the severity of data breaches. Scores less than 2 indicated low severity, scores higher than 4 indicated very high severity.

The advantage of these systems is that they are (relatively) quick and repeatable, especially across large diverse organizations with variable levels of subject matter expertise. The results are typically regarded as objective, and may therefore be taken more seriously by senior management and other stakeholders.

However, these systems are merely indicative, and the scores may not always provide a reliable or accurate view. For example, I doubt whether any Data Protection Officer would be justified in disregarding a potential data breach simply on the basis of a low score from an uncalibrated calculation.

Part of the problem is that these scoring systems operate a highly simplistic algebra, assuming you can break a complex situation into a number of separate factors (e.g. vulnerabilities), and then add them back together with some appropriate weightings. The weightings can be pretty arbitrary, and may not be valid for your organization. More importantly, as Marc Rogers argues (as reported by Shaun Nichols), the more sophisticated attacks rely on combinations of vulnerabilities, so assessing each vulnerability separately completely misses the point.

Thus although two minor bugs may have low CVSS ratings, interaction between them could allow a high severity attack. "It is complex, but there is nothing in the assessment process to deal with that," Rogers said. "It has lulled us into a false sense of security where we look at the score, and so long as it is low we don't allocate the resources."
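The point about combinations can be illustrated with a small sketch. All the numbers here are hypothetical: the individual scores are not real CVSS ratings, the threshold of 7.0 is an assumed cut-off for allocating resources, and the score for the chained attack stands in for an analyst's judgement rather than anything CVSS itself calculates.

```python
THRESHOLD = 7.0  # hypothetical cut-off above which resources are allocated

# Hypothetical individual scores on a 0-10 scale.
individual_scores = {"bug_a": 3.1, "bug_b": 2.8, "bug_c": 4.0}

# Hypothetical analyst assessment of what an attacker could achieve
# by chaining specific bugs together.
chain_scores = {frozenset({"bug_a", "bug_b"}): 9.0}


def flagged_individually(scores, threshold=THRESHOLD):
    """Return the bugs that would be prioritized when each is scored alone."""
    return {bug for bug, score in scores.items() if score >= threshold}


def flagged_with_chains(scores, chains, threshold=THRESHOLD):
    """Also flag every bug that takes part in a high-scoring combination."""
    flagged = flagged_individually(scores, threshold)
    for bugs, score in chains.items():
        if score >= threshold:
            flagged |= bugs
    return flagged


print(sorted(flagged_individually(individual_scores)))
print(sorted(flagged_with_chains(individual_scores, chain_scores)))
```

Scored one at a time, nothing crosses the threshold; once the combination is assessed, two of the "minor" bugs become priorities.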

One organization that has moved away from the scorecard approach is the Electronic Frontier Foundation. In 2014, they released a Secure Messaging Scorecard for evaluating messaging apps. However, they later decided that the scorecard format dangerously oversimplified the complex question of how various messengers stack up from a security perspective, so they archived the original scorecard and warned people against relying on it.

Nate Cardozo, Gennie Gebhart and Erica Portnoy, Secure Messaging? More Like A Secure Mess (Electronic Frontier Foundation, 26 March 2018)

Clara Galan Manso and Sławomir Górniak, Recommendations for a methodology of the assessment of severity of personal data breaches (ENISA 2013)

Shaun Nichols, We're almost into the third decade of the 21st century and we're still grading security bugs out of 10 like kids. Why? (The Register, 7 Nov 2019)

Wikipedia: Common Vulnerability Scoring System (CVSS)

Related posts: Into The Matrix (October 2015), False Sense of Security (June 2019)

Friday, October 25, 2019

Strategy and Requirements for the API ecosystem

Is there a framework or methodology for establishing the business / ecosystem requirements to drive API strategy and development?

At an industry event I attended recently, hosted by a company that sells tools and technologies for the API ecosystem, some of the speakers advised that when presenting to non-technical stakeholders, you need to talk about service value/benefit rather than APIs. But this raises an important question: how do we identify and quantify service benefit, and how do we negotiate the share of value between different players in the ecosystem?

One of the ideas of the API economy is that you don't have to maintain all the capabilities yourself, but you find other enterprises that can provide complementary capabilities. So you need to identify and understand what capabilities are available, and map combinations of these capabilities against the demands and unfulfilled needs of potential customers. Then having identified in broad terms what capabilities you wish to combine with your own, and worked out where the service boundaries should be, you may select organizations to partner with and agree business and commercial terms, or create a platform to which many third parties can connect. The technical design of the API should then reflect the service boundaries and commercial arrangements.

In the early days of service-oriented software engineering, people always wanted us to tell them how large their services should be. Not just macro versus micro, but broad (generic) versus narrow (specific). To what extent should a service be completely purpose-agnostic - in other words, with no restrictions on how or where it may be used - or does this conflict with other design goals such as reliability or data protection?

The answer is that it depends not only on what you are trying to do, but how you want to manage and govern your service architecture. A broadly scoped, purpose-agnostic service (or service platform) can achieve wide usage and economies of scale, but may be more complex to configure, test and use, whereas a more narrowly scoped context-specific service might be easier to use but with lower reuse potential. Among other things, this affects how much of the service composition and orchestration can be done by the service provider (supply side), and how much is left to the service consumer (demand side). And even on the supply side, it affects how much work needs to be done by the integration experts ("town planners"), and how much can be left to citizen integration ("pioneers" and "settlers").

One version of this challenge can be found in large global organizations, working out exactly what functionality should be provided centrally as shared services, and what functionality should be left to local operations. Ideally, the service architecture should be aligned with the business and organizational architecture.

The word "economy" also implies attention to accounting issues - sharing costs and benefits between different players. Although we may regard cloud as almost infinitely extensible, this doesn't come without cost: if the number of service calls goes through the roof, someone has to pay the cloud provider's bill. This is already an issue within large organizations, where we commonly find arguments about whose budget will pay for something. And I have seen some great ideas come to nothing, because the benefits were spread too thinly and nobody was able to fund them.
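One simple way of sharing a usage-driven cost is to split the bill in proportion to the number of service calls each consumer makes. The sketch below assumes this proportional rule and uses hypothetical consumer names and figures; real arrangements may be based on value received rather than volume.

```python
def allocate_bill(total_bill, calls_by_consumer):
    """Split a bill between consumers in proportion to their service calls."""
    total_calls = sum(calls_by_consumer.values())
    if total_calls == 0:
        return {consumer: 0.0 for consumer in calls_by_consumer}
    return {
        consumer: total_bill * calls / total_calls
        for consumer, calls in calls_by_consumer.items()
    }


# Hypothetical monthly figures.
calls = {"retail": 600_000, "logistics": 300_000, "finance": 100_000}
shares = allocate_bill(12_000.0, calls)
print(shares)
```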

So although vague appeals to innovation and imagination might be good enough for a marketing pitch, serious strategic thinking is about discovering where there is untapped value in your business and its environment, and working out exactly how an API strategy is going to help you unlock this value.

At the CBDI Forum, we were talking about these issues many years ago: our Service Architecture and Engineering® methodology is still available from Everware-CBDI. Here are some of the articles I wrote for the CBDI Journal.

Tuesday, October 15, 2019

DataOps - Organizing the Data Value Chain

At #TalendConnect today there was frequent mention of #DataOps, although according to a post I found on the Talend blog from earlier this year, Talend prefers the term collaborative data management.
"Data Preparation ... should be envisioned as a game-changing technology for information management due to its ability to enable potentially anyone to participate. Armed with innovative technologies, enterprises can organize their data value chain in a new collaborative way." (Talend)
I've always insisted that the data value chain should end not with delivering insight (so-called actionable intelligence) but with delivering business outcomes (actioned intelligence), and I was pleased to hear some of today's speakers making the same point. However, there are still voices within the industry that have a narrower view of DataOps, and I note with concern that the DataOps Manifesto identifies the goal of DataOps in terms of the early and continuous delivery of valuable analytic insights.

Although there will always be a place for analytic reports and dashboards, I always expected that these would gradually make way for analytic insights being rendered as services and integrated into operational business systems and processes, to create closed-loop business intelligence. There are many good examples of this today, especially in the manufacturing world. There are also systems that deliver insights directly to customers or end-users, perhaps in the form of recommendations. But a lot of the discussion of the data-driven enterprise still seems to be based on a dashboard mindset.

And who actually does the DataOps? A presentation from Virtusa showed a three-step DataOps process - pipeline, innovation and value - which suggests a trimodal approach. So the Town Planners would do the pipeline (building generic and highly customizable data preparation frameworks), Pioneers would do the innovation (experimental proof of concept), and the Settlers would roll out the value. I shall be interested to see some practical implementations of this approach.

Meanwhile, simplistic notions of democratization (or "citizen integration") often divide people into two camps - experts and citizens - and this polarization is encouraged by Gartner's promotion of Bimodal IT. But this leads people to believe that you can have either trust or speed/agility but not both. And as Jonathan Gill of Talend emphasized in his keynote today, digital leaders don't recognize this dichotomy.

Jean-Michel Franco, 3 Key Takeaways from the 2019 Gartner Market Guide for Data Preparation (Talend, 26 April 2019)

Wikipedia: DataOps

Related posts: Service-Oriented Business Intelligence (September 2005), SPARK 2 Innovation or Trust (March 2006), Analytics for Adults (January 2013), From Networked BI to Collaborative BI (April 2016), Beyond Bimodal (May 2016), Towards the Data-Driven Business (August 2019)

Monday, September 30, 2019

Towards Data Model Harmony

Last week, someone asked me how I go about producing a data model. I found out afterwards that my answer was considered too brief. So here's a longer version of my answer.

The first thing to consider is the purpose of the modelling. Sometimes there is a purely technical agenda - for example, updating or reengineering some data platform - but usually there are some business requirements and opportunities - for example to make the organization more data-driven. I prefer to start by looking at the business model - what services does it provide to its customers, what capabilities or processes are critical to the organization, what decisions and policies need to be implemented, and what kind of evidence and feedback loops can improve things. From all this, we can produce a high-level set of data requirements - what concepts, how interconnected, at what level of granularity, etc. - and then work top-down from conceptual data models to produce more detailed logical and physical models.

But there are usually many data models in existence already - which may be conceptual, logical or physical. Some of these may be formally documented as models, whether using a proper data modelling tool or just contained in various office tools (e.g. Excel, PowerPoint, Visio, Word). Some of them are implicit in other documents, such as written policies and procedures, or can be inferred ("reverse engineered") from existing systems and from the structure and content of data stores. Some concepts and the relationships between them are buried in people's heads and working practices, and may need to be elicited.

And that's just inside the organization. When we look outside, there may be industry models and standards, such as ACORD (insurance) and GS1 (groceries). There may also be models pushed by vendors and service/platform providers - IBM has been in this game longer than most. There may also be models maintained by external stakeholders - e.g., suppliers, customers, regulators.

There are several points to make about this collection of data models.
  • There will almost certainly be conflicts between these models - not just differences in scope and level/granularity, but direct contradictions.
  • And some of these models will be internally inconsistent. Even the formal ones may not be perfectly consistent, and the inferred/elicited ones may be very muddled. The actual content of a data store may not conform to the official schema (data quality issues).
  • You probably don't have time to wade through all of them, although there are some tools that may be able to process some of these automatically for you. So you will have to be selective, and decide which ones are more important.
  • In general, your job is not simply to reproduce these models (minus the inconsistencies) but to build models that will support the needs of the business and its stakeholders. So looking at the existing models is necessary but not sufficient.
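One of the points above, that the actual content of a data store may not conform to the official schema, can be checked mechanically. The following is a minimal sketch, assuming records are available as Python dictionaries; the schema, column names and sample record are hypothetical.

```python
# Hypothetical official schema: column name -> (expected type, nullable).
SCHEMA = {
    "customer_id": (int, False),
    "email": (str, False),
    "segment": (str, True),
}


def conformance_issues(row, schema=SCHEMA):
    """List the ways in which one stored record departs from the schema."""
    issues = []
    for column, (expected_type, nullable) in schema.items():
        if column not in row:
            issues.append(f"missing column: {column}")
        elif row[column] is None:
            if not nullable:
                issues.append(f"unexpected null: {column}")
        elif not isinstance(row[column], expected_type):
            issues.append(f"wrong type: {column}")
    # Columns that exist in the data but not in the documented model.
    for column in row:
        if column not in schema:
            issues.append(f"undocumented column: {column}")
    return issues


record = {"customer_id": "17", "email": None, "loyalty_tier": "gold"}
for issue in conformance_issues(record):
    print(issue)
```

Undocumented columns are often the most interesting finding, because they may point to concepts the business is using that the formal model has not caught up with.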

So why do you need to look at the "legacy" models at all?  Here are the main reasons.
  • Problems and issues that people may be experiencing with existing systems and processes can often be linked to problems with the underlying data models.
  • Inflexibility in these data models may constrain future business strategies and tactics.
  • New systems and processes typically need to transition from existing ones - not just data migration but also conceptual migration (people learning and adopting a revised set of business concepts and working practices) - and/or interoperate with them (data integration, joined-up business).
  • Some of the complexity in the legacy models may be redundant, but some of it may provide clues about complexity in the real world. (The fallacy of eliminating things just because you don't understand why they're there is known as Chesterton's Fence. See my post on Low-Hanging Fruit.) The requirements elicitation process typically finds a lot of core requirements, but often misses many side details. So looking at the legacy models provides a useful completeness check.

If your goal is to produce a single, consistent, enterprise-wide data model, good luck with that. I'll check back with you in ten years to see how far you've got. Meanwhile, the pragmatic approach is to work at multiple tempos in parallel - supporting short term development sprints, refactoring and harmonizing in the medium term, while maintaining steady progress towards a longer-term vision. Accepting that all models are wrong, and prioritizing the things that matter most to the organization.

The important issues tend to be convergence and unbundling. Firstly, while you can't expect to harmonize everything in one go, you don't want things to diverge any further. And secondly, where two distinct concepts have been bundled together, you should try to tease them apart - at least for future systems and data stores - for the sake of flexibility.

Finally, how do I know whether the model is any good? On the one hand, I need to be able to explain it to the business, so it had better not be too complicated or abstract. On the other hand, it needs to be able to reflect the real complexity of the business, which means testing it against a range of scenarios to make sure I haven't embedded any false or simplistic assumptions.

Longer answers are also available. Would you like me to run a workshop for you?

Wikipedia: All Models are Wrong, Chesterton's Fence

Related posts

How Many Products? (October 2004), Modelling Complex Classification (February 2009), Deconstructing the Grammar of Business (June 2009), Conceptual Modelling - Why Theory (November 2011)

Declaration of interest - in 2008(?) I wrote some white papers for IBM concerning the use of their industry models.

Thursday, August 22, 2019

Setting off Towards the Data-Driven Business

In an earlier post Towards the Data-Driven Business, I talked about the various roles that data and intelligence can play in the business. But where do you start? In this post, I shall talk about the approach that I have developed and used in a number of large organizations.

To build a roadmap that takes you into the future from where you are today, you need three things.

Firstly an understanding of the present. This includes producing AS-IS models of your current (legacy) systems, what data you have and how you are currently managing and using it. We need to know about the perceived pain points, not because we only want to fix the symptoms, but because these will help us build a consensus for change. Typically we find a fair amount of duplicated and inconsistent data, crappy or non-existent interfaces, slow process loops and data bottlenecks, and general inflexibility.

This is always complicated by the fact that there are already numerous projects underway to fix some of the problems, or to build additional functionality, so we need to understand how these projects are expected to alter the landscape, and in what timescale. It sometimes becomes apparent that these projects are not ideally planned and coordinated from a data management perspective. If we find overlapping or fragmented responsibility in some critical data areas, we may need to engage with programme management and governance to support greater consistency and synergy.

Secondly a vision of the future opportunities for data and intelligence (and automation based on these). In general terms, these are outlined in my earlier post. To develop a vision for a specific organization, we need to look at their business model - what value do they provide to customers and other stakeholders, how is this value delivered (as business services or otherwise), and how do the capabilities and processes of the organization and its partners support this.

For example, I worked with an organization that had done a fair amount of work on modelling their internal processes and procedures, but lacked the outside-in view. So I developed a business service architecture that showed how the events and processes in their customers' world triggered calls on their services, and what this implied for delivering a seamless experience to their customers.

Using a capability-based planning approach, we can then look at how data, intelligence and automation could improve not only individual business services, processes and underlying capabilities, but also the coordination and feedback loops between these. For example in a retail environment, there are typically processes and capabilities associated with both Buying and Selling, and you may be able to use data and intelligence to make each of them more efficient and effective. But more importantly, you can improve the alignment between Buying and Selling.

(In some styles of business capability model, coordination is shown explicitly as a capability in its own right, but this is not a common approach.)

The business model also identifies which areas are strategically important to the business. At one organization, when we mapped the IT costs against the business model, we found that a disproportionate amount of effort was being devoted to non-strategic stuff, and surprisingly little effort for the customer-facing (therefore more strategically important) activities. (A colour-coded diagram can be very useful in presenting such issues to senior management.)

Most importantly, we find that a lot of stakeholders (especially within IT) have a fairly limited vision about what is possible, often focused on the data they already have rather than the data they could or should have. The double-diamond approach to design thinking works here, to combine creative scenario planning with highly focused practical action. I've often found senior business people much more receptive to these kinds of discussions than the IT folk.

We should then be able to produce a reasonably future-proof and technology independent TO-BE data and information architecture, which provides a loosely-coupled blueprint for data collection, processing and management.

Thirdly, how to get from A to B. In a large organization, this is going to take several years. A complete roadmap cannot just be a data strategy, but will usually involve some elements of business process and organizational change, as well as application, integration and technology strategy. It may also involve outside stakeholders - for example, providing direct access to suppliers and business partners via portals and APIs, and sharing data and intelligence with them, while obtaining consent from data subjects and addressing any other privacy, security and compliance issues. There are always dependencies between different streams of activity within the programme as well as with other initiatives, and these dependencies need to be identified and managed, even if we can avoid everything being tightly coupled together.

Following the roadmap will typically involve a mix of different kinds of project. There may need to be some experimental ("pioneer") projects as well as larger development and infrastructure ("settler", "town planner") projects.

To gain consensus and support, you need a business case. Although different organizations may have different ways of presenting and evaluating the business case, and some individuals and organizations are more risk-averse than others, a business case will always involve an argument that the benefits (financial and possibly non-financial) outweigh the costs and risks.

Generally, people like to see some short-term benefits ("quick wins" or the dreaded "low-hanging fruit") as well as longer-term benefits. A well-balanced roadmap spreads the benefits across the phases - if you manage to achieve 80% of the benefits in phase 1, then your roadmap probably wasn't ambitious enough, so don't be surprised if nobody wants to fund phase 2. 

Finally, you have to implement your roadmap. This means getting the funding and resources, kicking off multiple projects as well as connecting with relevant projects already underway, managing and coordinating the programme. It also means being open to feedback and learning, responding to new emerging challenges (such as regulation and competition), maintaining communication with stakeholders, and keeping the vision and roadmap alive and up-to-date.


Wednesday, August 07, 2019

Process Automation and Intelligence

What kinds of automation are there, and is there a natural progression from basic to advanced? Do the terms intelligent automation and cognitive automation actually mean anything useful, or are they merely vendor hype? In this blogpost, I shall attempt an answer.

Robotic Automation

The simplest form of automation is known as robotic automation or robotic process automation (RPA). The word robot (from the Czech word for forced labour, robota) implies a pre-programmed response to a set of incoming events. The incoming events are represented as structured data, and may be held in a traditional database. The RPA tools also include the connectivity and workflow technology to receive incoming data, interrogate databases and drive action, based on a set of rules.
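The "pre-programmed response to a set of incoming events" can be sketched as an ordered list of rules applied to structured event data. This is a toy illustration, not how any particular RPA product works; the event fields, thresholds and action names are hypothetical.

```python
# Each rule pairs a condition on a structured event with an action name.
# The event fields and action names here are hypothetical.
RULES = [
    (lambda e: e["type"] == "invoice" and e["amount"] <= 1000, "auto_approve"),
    (lambda e: e["type"] == "invoice" and e["amount"] > 1000, "route_to_manager"),
    (lambda e: e["type"] == "address_change", "update_customer_record"),
]


def respond(event, rules=RULES, default="escalate_to_human"):
    """Return the pre-programmed action for the first rule that matches."""
    for condition, action in rules:
        if condition(event):
            return action
    return default


events = [
    {"type": "invoice", "amount": 250},
    {"type": "invoice", "amount": 4800},
    {"type": "complaint", "text": "late delivery"},
]
for event in events:
    print(event["type"], "->", respond(event))
```

Anything the rules do not anticipate falls through to the default, which is exactly the limitation that the more adaptive forms of automation try to address.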

Cognitive Automation

People talk about cognitive technology or cognitive computing, but what exactly does this mean? In its marketing material, IBM uses these terms to describe whatever features of IBM Watson they want to draw our attention to – including adaptability, interactivity and persistence – but IBM’s usage of these terms is not universally accepted.

I understand cognition to be all about perceiving and making sense of the world, and we are now seeing man-made components that can achieve some degree of this, sometimes called Cognitive Agents.

Cognitive agents can also be used to detect patterns in vast volumes of structured and unstructured data and interpret their meaning. This is known as Cognitive Insight, which Thomas Davenport and Rajeev Ronanki refer to as “analytics on steroids”. The general form of the cognitive agent is as follows.

Cognitive agents can be wrapped as a service and presented via an API, in which case they are known as Cognitive Services. The major cloud platforms (AWS, Google Cloud, Microsoft Azure) provide a range of these services, including textual sentiment analysis.

At the current state-of-the-art, cognitive services may be of variable quality. Image recognition may be misled by shadows, and even old-fashioned OCR may struggle to generate meaningful text from poor resolution images – but of course human cognition is also fallible.

Intelligent Automation

Meanwhile, one of the key characteristics of intelligence is adaptability – being able to respond flexibly to different conditions. Intelligence is developed and sustained by feedback loops – detecting outcomes and adjusting behaviour to achieve goals. Intelligent automation therefore includes a feedback loop, typically involving some kind of machine learning.
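A feedback loop of this kind can be sketched in a few lines. To keep the example small, it uses a simple proportional correction rather than a machine learning model, and the process being controlled is a hypothetical stand-in in which the outcome is always twice the setting.

```python
def feedback_loop(goal, measure_outcome, setting, gain=0.5, steps=20):
    """Repeatedly detect the outcome and adjust the setting towards the goal."""
    history = []
    for _ in range(steps):
        outcome = measure_outcome(setting)
        error = goal - outcome
        setting += gain * error  # adjust behaviour in proportion to the gap
        history.append((setting, outcome))
    return setting, history


def simulated_process(setting):
    """Hypothetical stand-in for the real world: outcome is twice the setting."""
    return 2.0 * setting


final_setting, history = feedback_loop(
    goal=10.0, measure_outcome=simulated_process, setting=0.0, gain=0.25
)
print(round(final_setting, 3))
```

The loop never needs to know how the process works internally; it only needs to observe outcomes and adjust, which is what makes it adaptive to changing conditions.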

Complex systems and processes may require multiple feedback loops (Double-Loop or Triple-Loop Learning). 

Edge Computing

If we embed this automation into the Internet of Things, we can use sensors to perform the information gathering, and actuators to carry out the actions.

Closed-Loop Automation

Now what happens if we put all these elements together?

This fits into a more general framework of human-computer intelligence, in which intelligence is broken down into six interoperating capabilities.

I know that some people will disagree with me as to which parts of this framework are called "cognitive" and which parts "intelligent". Ultimately, this is just a matter of semantics. The real point is to understand how all the pieces of cognitive-intelligent automation work together.

The Limits of Machine Intelligence

There are clear limits to what machines can do – but this doesn’t stop us getting them to perform useful work, in collaboration with humans where necessary. (Collaborative robots are sometimes called cobots.) A well-designed collaboration between human and machine can achieve higher levels of productivity and quality than either human or machine alone. Our framework allows us to identify several areas where human abilities and artificial intelligence can usefully combine.

In the area of perception and cognition, there are big differences in the way that humans and machines view things, and therefore significant differences in the kinds of cognitive mistakes they are prone to. Machines may spot or interpret things that humans might miss, and vice versa. There is good evidence for this effect in medical diagnosis, where a collaboration between human medic and AI can often produce higher accuracy than either can achieve alone.

In the area of decision-making, robots can make simple decisions much faster, but may be unreliable with more complex or borderline decisions, so a hybrid “human-in-the-loop” solution may be appropriate. 
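One common way to implement such a hybrid is to let the machine decide only when it is sufficiently confident, and refer everything else to a person. The sketch below assumes a model that returns a decision together with a confidence value; the toy model, the cut-off of 50 and the threshold of 0.9 are hypothetical.

```python
def decide(case, model, confidence_threshold=0.9):
    """Let the machine decide clear cases and refer the rest to a person."""
    decision, confidence = model(case)
    if confidence >= confidence_threshold:
        return {"decision": decision, "decided_by": "machine"}
    return {"decision": None, "decided_by": "human_review"}


def toy_model(case):
    """Hypothetical scoring model: confident only far from the cut-off of 50."""
    score = case["score"]
    decision = "approve" if score >= 50 else "decline"
    confidence = min(1.0, 0.5 + abs(score - 50) / 50)
    return decision, confidence


for score in (90, 52, 10):
    print(score, decide({"score": score}, toy_model))
```

Where the threshold is set determines how much work reaches the human reviewers, so it is as much a resourcing decision as a technical one.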

Decisions that affect real people are subject to particular concern – GDPR specifically regulates any automated decision-making or profiling that is made without human intervention, because of the potential impact on people’s rights and freedoms. In such cases, the “human-in-the-loop” solution reduces the perceived privacy risk.

In the area of communication and collaboration, robots can help orchestrate complex interactions between multiple human experts, and allow human observations to be combined with automatic data gathering. Meanwhile, sophisticated chatbots are enabling more complex interactions between people and machines.

Finally there is the core capability of intelligence – learning. Machines learn by processing vast datasets of historical data – but that is also their limitation. So learning may involve fast corrective action by the robot (using machine learning), with a slower cycle of adjustment and recalibration by human operators (such as Data Scientists). This would be an example of Double-Loop learning.

Automation Roadmap

Some of the elements of this automation framework are already fairly well developed, with cost-effective components available from the technology vendors. So there are some modes of automation that are available for rapid deployment. Other elements are technologically immature, and may require a more cautious or experimental approach.

Your roadmap will need to align the growing maturity of your organization with the growing maturity of the technology, exploiting quick wins today while preparing the groundwork to be in a position to take advantage of emerging tools and techniques in the medium term.

Thomas Davenport and Rajeev Ronanki, Artificial Intelligence for the Real World (Harvard Business Review, January–February 2018)

Related posts: Automation Ethics (August 2019), RPA - Real Value or Painful Experimentation? (August 2019)

Saturday, August 03, 2019

Towards the Data-Driven Business

If we want to build a data-driven business, we need to appreciate the various roles that data and intelligence can play in the business - whether improving a single business service, capability or process, or improving the business as a whole. The examples in this post are mainly from retail, but a similar approach can easily be applied to other sectors.

Sense-Making and Decision Support

The traditional role of analytics and business intelligence is helping the business interpret and respond to what is going on.

Once upon a time, business intelligence always operated with some delay. Data had to be loaded from the operational systems into the data warehouse before they could be processed and analysed. I remember working with systems that generated management information based on yesterday's data, or even last month's data. Of course, such systems don't exist any more (!?), because people expect real-time insight, based on streamed data.

Management information systems are supposed to support individual and collective decision-making. People often talk about actionable intelligence, but of course it doesn't create any value for the business until it is actioned. Creating a fancy report or dashboard isn't the real goal, it's just a means to an end.

Analytics can also be used to calculate complicated chains of effects on a what-if basis. For example, if we change the price of this product by this much, what effect is this predicted to have on the demand for other products, what are the possible responses from our competitors, how does the overall change in customer spending affect supply chain logistics, do we need to rearrange the shelf displays, and so on. How sensitive is Y to changes in X, and what is the optimal level of Z?
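The question "how sensitive is Y to changes in X" can be sketched numerically. The example below covers only a single product, not the cross-product and competitor effects described above, and it assumes a hypothetical constant-elasticity demand model with made-up parameters.

```python
def demand(price, base_demand=1000.0, base_price=2.0, elasticity=-1.5):
    """Hypothetical constant-elasticity demand model."""
    return base_demand * (price / base_price) ** elasticity


def sensitivity(f, x, step=1e-4):
    """Estimate how much f changes per unit change in x (central difference)."""
    return (f(x + step) - f(x - step)) / (2 * step)


def what_if(old_price, new_price):
    """Compare revenue before and after a price change."""
    before = old_price * demand(old_price)
    after = new_price * demand(new_price)
    return after - before


print(round(sensitivity(demand, 2.0), 1))  # units of demand per unit of price
print(round(what_if(2.0, 2.2), 2))         # change in revenue
```

With an elasticity below -1, as assumed here, a price rise reduces revenue; with a different assumed elasticity the same what-if question gives the opposite answer, which is why the model behind the numbers matters.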

Analytics can also be used to support large-scale optimization - for example, solving complicated scheduling problems.

Automated Action

Increasingly, we are looking at the direct actioning of intelligence, possibly in real-time. The intelligence drives automated decisions within operational business processes, often without a human-in-the-loop, where human supervision and control may be remote or retrospective. A good example of this is dynamic retail pricing, where an algorithm adjusts the prices of goods and services according to some model of supply and demand. In some cases, optimized plans and schedules can be implemented without a human in the loop.
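A dynamic pricing rule of this kind might look something like the following sketch. The adjustment formula, the sensitivity parameter and the optional floor and ceiling (which act as guard rails set in advance by a human) are all assumptions for illustration, not a description of any retailer's actual algorithm.

```python
def adjust_price(price, demand, supply, sensitivity=0.1,
                 floor=None, ceiling=None):
    """Nudge the price up when demand exceeds supply, and down otherwise."""
    if supply <= 0:
        raise ValueError("supply must be positive")
    imbalance = (demand - supply) / supply
    new_price = price * (1 + sensitivity * imbalance)
    # Guard rails: keep automated decisions within limits agreed in advance.
    if floor is not None:
        new_price = max(floor, new_price)
    if ceiling is not None:
        new_price = min(ceiling, new_price)
    return round(new_price, 2)


print(adjust_price(10.0, demand=150, supply=100))
print(adjust_price(10.0, demand=50, supply=100))
print(adjust_price(10.0, demand=300, supply=100, ceiling=11.0))
```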

So the data doesn't just flow from the operational systems into the data warehouse, but there is a control flow back into the operational systems. We can call this closed loop intelligence.
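As a rough illustration of such a closed loop, the sketch below nudges a price up or down according to the observed gap between demand and supply. The proportional rule, the gain, and the floor and ceiling are all assumptions made for this example, not a description of any real pricing engine. The floor and ceiling stand in for the remote human supervision mentioned above: people set the limits, and the algorithm acts within them.

```python
def adjust_price(price, demand, supply, gain=0.1, floor=1.0, ceiling=5.0):
    """Nudge the price up when demand exceeds supply, and down otherwise.
    The result is clamped to limits that a human supervisor would set."""
    if supply <= 0:
        return ceiling  # nothing left to sell: hold the price at the cap
    imbalance = (demand - supply) / supply
    proposed = price * (1 + gain * imbalance)
    return max(floor, min(ceiling, proposed))


def run_loop(price, observations):
    """Feed a stream of (demand, supply) observations back into the price."""
    history = []
    for demand, supply in observations:
        price = adjust_price(price, demand, supply)
        history.append(round(price, 2))
    return history


if __name__ == "__main__":
    # Hypothetical observations from the operational systems.
    print(run_loop(2.00, [(120, 100), (150, 100), (80, 100)]))
```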

(If it takes too much time to process the data and generate the action, the action may no longer be appropriate. A few years ago, one of my clients wanted to use transaction data from the data warehouse to generate emails to customers - but with their existing architecture there would have been a 48 hour delay from the transaction to the email, so we needed to find a way to bypass this.)

Managing Complexity

If you have millions of customers buying hundreds of thousands of products, you need ways of aggregating the data in order to manage the business effectively. Customers can be grouped into segments, products can be grouped into categories, and many organizations use these groupings as a basis for dividing responsibilities between individuals and teams. However, these groupings are typically inflexible and sometimes seem perverse.

For example, in a large supermarket, after failing to find maple syrup next to the honey as I expected, I was told I should find it next to the custard. There may well be a logical reason for this grouping, but this logic was not apparent to me as a customer.

But the fact that maple syrup is in the same product category as custard doesn't just affect the shelf layout, it may also mean that it is automatically included in decisions affecting the custard category and excluded from decisions affecting the honey category. For example, pricing and promotion decisions.

A data-driven business is able to group things dynamically, based on affinity or association, and then allows simple and powerful decisions to be made for this dynamic group, at the right level of aggregation.

Automation can then be used to cascade the action to all affected products, making the necessary price, logistical and other adjustments for each product. This means that a broad plan can be quickly and consistently implemented across thousands of products.
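The two steps above (group dynamically by affinity, then cascade one decision to the whole group) can be sketched as follows. This is a deliberately naive illustration: affinity is measured by simple co-occurrence in shopping baskets, and the baskets, threshold and prices are invented for the example.

```python
from collections import Counter
from itertools import combinations


def affinity_counts(baskets):
    """Count how often each pair of products appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
    return counts


def dynamic_group(seed, baskets, min_count=2):
    """Products bought together with the seed at least min_count times."""
    group = {seed}
    for (a, b), n in affinity_counts(baskets).items():
        if n >= min_count and seed in (a, b):
            group.add(b if a == seed else a)
    return group


def cascade_discount(prices, group, pct):
    """Apply one pricing decision to every product in the dynamic group."""
    return {
        product: (round(price * (1 - pct / 100), 2) if product in group else price)
        for product, price in prices.items()
    }


if __name__ == "__main__":
    baskets = [
        ["maple syrup", "pancake mix"],
        ["maple syrup", "pancake mix", "honey"],
        ["custard", "jelly"],
        ["honey", "tea"],
    ]
    group = dynamic_group("maple syrup", baskets)
    prices = {"maple syrup": 4.00, "pancake mix": 2.00, "custard": 1.50}
    print(group, cascade_discount(prices, group, 10))
```

In this toy data the maple syrup ends up grouped with the pancake mix rather than the custard, because that is what customers actually buy it with.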

Experimentation and Learning

In a data-driven business, every activity is designed for learning as well as doing. Feedback is used in the cybernetic sense - collecting and interpreting data to control and refine business rules and algorithms.

In a dynamic world, it is necessary to experiment constantly. A supermarket or online business is a permanent laboratory for testing the behaviour of its customers. One example is A/B testing, where alternatives are presented to different customers on different occasions to test which one gets the best response. As I mentioned in an earlier post, Netflix declares itself "addicted" to the methodology of A/B testing.
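For the simplest case, deciding whether variant B really got a better response than variant A can be sketched with a two-proportion z-test. This is a textbook technique chosen for illustration, and the visitor and conversion counts below are made up. It assumes independent visitors and reasonably large samples.

```python
from math import erf, sqrt


def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: returns (z, two-sided p-value) for the
    difference in conversion rate between variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (p_b - p_a) / se
    # Standard normal CDF expressed via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


if __name__ == "__main__":
    # Hypothetical results: 1000 visitors per variant.
    z, p = ab_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```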

In a simple controlled experiment, you change one variable and leave everything else the same. But in a complex business world, everything is changing. So you need advanced statistics and machine learning, not only to interpret the data, but also to design experiments that will produce useful data.

Managing Organization

A traditional command-and-control organization likes to keep the intelligence and insight in the head office, close to top management. An intelligent organization on the other hand likes to mobilize the intelligence and insight of all its people, and encourage (some) local flexibility (while maintaining global consistency). With advanced data and intelligence tools, power can be driven to the edge of the organization, allowing for different models of delegation and collaboration. For example, retail management may feel able to give greater autonomy to store managers, but only if the systems provide faster feedback and more effective support. 


Transparency

Related to the previous point, data and intelligence can provide clarity and governance to the business, and to a range of other stakeholders. This has ethical as well as regulatory implications.

Among other things, transparent data and intelligence reveal their provenance and derivation. (This isn't the same thing as explanation, but it probably helps.)

Obviously most organizations already have many of the pieces of this, but there are typically major challenges with legacy systems and data - especially master data management. Moving onto the cloud, and adopting advanced integration and robotic automation tools may help with some of these challenges, but it is clearly not the whole story.

Some organizations may be lopsided or disconnected in their use of data and intelligence. They may have very sophisticated analytic systems in some areas, while other areas are comparatively neglected. There can be a tendency to over-value the data and insight you've already got, instead of thinking about the data and insight that you ought to have.

Making an organization more data-driven doesn't always entail a large transformation programme, but it does require a clarity of vision and pragmatic joined-up thinking.

Related posts: Rhyme or Reason: The Logic of Netflix (June 2017), Setting off towards the Data-Driven Business (August 2019)

Updated 13 September 2019

Sunday, July 14, 2019

Trial by Ordeal

Some people think that ethical principles only apply to implemented systems, and that experimental projects (trials, proofs of concept, and so on) don't need the same level of transparency and accountability.

Last year, Google employees (as well as US senators from both parties) expressed concern about Google's Dragonfly project, which appeared to collude with the Chinese government in censorship and suppression of human rights. A secondary concern was that Dragonfly was conducted in secrecy, without involving Google's privacy team.  

Google's official position (led by CEO Sundar Pichai) was that Dragonfly was "just an experiment". Jack Poulson, who left Google last year over this issue and has now started a nonprofit organization called Tech Inquiry, has also seen this pattern in other technology projects.
"I spoke to coworkers and they said 'don’t worry, by the time the thing launches, we'll have had a thorough privacy review'. When you do R and D, there's this idea that you can cut corners and have the privacy team fix it later." (via Alex Hern)
A few years ago, Microsoft Research ran an experiment on "emotional eating", which involved four female employees wearing smart bras. "Showing an almost shocking lack of sensitivity for gender stereotyping", wrote Sebastian Anthony. While I assume that the four subjects willingly volunteered to participate in this experiment, and I hope the privacy of their emotional data was properly protected, it does seem to reflect the same pattern - that you can get away with things in the R and D stage that would be highly problematic in a live product.

Poulson's position is that the engineers working on these projects bear some responsibility for the outcomes, and that they need to see that the ethical principles are respected. He therefore demands transparency to avoid workers being misled. He also notes that if the ethical considerations are deferred to a late stage of a project, with the bulk of the development costs already incurred and many stakeholders now personally invested in the success of the project, the pressure to proceed quickly to launch may be too strong to resist.

Sebastian Anthony, Microsoft’s new smart bra stops you from emotionally overeating (Extreme Tech, 9 December 2013)

Erin Carroll et al, Food and Mood: Just-in-Time Support for Emotional Eating (Humaine Association Conference on Affective Computing and Intelligent Interaction, 2013)

Ryan Gallagher, Google’s Secret China Project “Effectively Ended” After Internal Confrontation (The Intercept, 17 December 2018)

Alex Hern, Google whistleblower launches project to keep tech ethical (Guardian, 13 July 2019)

Casey Michel, Google’s secret ‘Dragonfly’ project is a major threat to human rights (Think Progress, 11 Dec 2018)

Iain Thomson, Microsoft researchers build 'smart bra' to stop women's stress eating (The Register, 6 Dec 2013)

Saturday, June 15, 2019

The Road Less Travelled

Are algorithms trustworthy, asks @NizanGP.
"Many of us routinely - and even blindly - rely on the advice of algorithms in all aspects of our lives, from choosing the fastest route to the airport to deciding how to invest our retirement savings. But should we trust them as much as we do?"

Dr Packin's main point is about the fallibility of algorithms, and the excessive confidence people place in them. @AnnCavoukian reinforces this point.

But there is another reason to be wary of the advice of the algorithm, summed up by the question: Whom does the algorithm serve?

Because the algorithm is not working for you alone. There are many people trying to get to the airport, and if they all use the same route they may all miss their flights. If the algorithm is any good, it will be advising different people to use different routes. (Most well-planned cities have more than one route to the airport, to avoid a single point of failure.) So how can you trust the algorithm to give you the fastest route? However much you may be paying for the navigation service (either directly, or bundled into the cost of the car/device), someone else may be paying a lot more. For the road less travelled.
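The point that a competent algorithm must send different people different ways can be shown with a toy model. In this sketch, each route has a free-flow time plus a delay per car already on it, and each driver in turn is sent to whichever route is currently quickest. The route names, timings and greedy rule are all assumptions made for illustration, not a description of how any real navigation service works.

```python
def assign_routes(n_drivers, routes):
    """routes maps a name to (free_flow_minutes, extra_minutes_per_car).
    Each driver in turn is sent down the route that is quickest right now."""
    load = {name: 0 for name in routes}

    def current_time(name):
        free_flow, per_car = routes[name]
        return free_flow + per_car * load[name]

    for _ in range(n_drivers):
        best = min(routes, key=current_time)
        load[best] += 1
    return load


if __name__ == "__main__":
    # Hypothetical routes to the airport.
    routes = {"motorway": (20, 0.5), "back road": (30, 1.0)}
    print(assign_routes(30, routes))
```

Even this neutral rule ends up sending some drivers down the nominally slower road. A commercially motivated rule could choose which drivers those are.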

The algorithm-makers may also try to monetize the destinations. If a particular road is used for getting to a sports venue as well as the airport, then the two destinations can be invited to bid to get the "best" routes for their customers - or perhaps for themselves. ("Best" may not mean fastest - it could mean the most predictable. And the venue may be ambivalent about this - the more unpredictable the journey, the more people will arrive early to be on the safe side, spreading the load on the services as well as spending more on parking and refreshments.)

In general, the algorithm is juggling the interests of many different stakeholders, and we may assume that this is designed to optimize the commercial returns to the algorithm-makers.

The same is obviously true of investment advice. The best time to buy a stock is just before everyone else buys, and the best time to sell a stock is just after everyone else buys. Which means that there are massive opportunities for unethical behaviour when advising people where / when to invest their retirement savings, and it would be optimistic to assume that the people programming the algorithms are immune from this temptation, or that regulators are able to protect investors properly.

And that's before we start worrying about the algorithms being manipulated by hostile agents ...

So remember the Weasley Doctrine: "Never trust anything that can think for itself if you can't see where it keeps its brain."

Nizan Geslevich Packin, Why Investors Should Be Wary of Automated Advice (Wall Street Journal, 14 June 2019)

Dozens of drivers get stuck in mud after Google Maps reroutes them into empty field (ABC7 New York, 26 June 2019) HT @jonerp

Related posts: Towards Chatbot Ethics (May 2019), Whom does the technology serve? (May 2019), Robust Against Manipulation (July 2019)

Updated 27 July 2019

Thursday, May 30, 2019

Responsibility by Design - Activity View

In my ongoing work on #TechnologyEthics, I have identified Five Elements of Responsibility by Design. One of these elements is what I'm calling the Activity View - defining effective and appropriate action at different points in the lifecycle of a technological innovation or product - who does what when. (Others may wish to call it the Process View.)

So in this post, I shall sketch some of the things that may need to be done at each of the following points: planning and requirements; risk assessment; design; verification, validation and test; deployment and operation; incident management; decommissioning. For the time being, I shall assume that these points can be interpreted within any likely development or devops lifecycle, be it sequential ("waterfall"), parallel, iterative, spiral, agile, double diamond or whatever.

Please note that this is an incomplete sketch, and I shall continue to flesh this out.

Planning and Requirements

This means working out what you are going to do, how you are going to do it, who is going to do it, who is going to pay for it, and who is going to benefit from it. What is the problem or opportunity you are addressing, and what kind of solution / output are you expecting to produce? It also means looking at the wider context - for example, exploring potential synergies with other initiatives.

The most obvious ethical question here is to do with the desirability of the solution. What is the likely impact of the solution on different stakeholders, and can this be justified? This is often seen in terms of an ethical veto - should we do this at all - but it is perhaps equally valid to think of it in more positive terms - could we do more?

But who gets to decide on desirability - in other words, whose notion of desirability counts - is itself an ethical question. So ethical planning includes working out who shall have a voice in this initiative, and how shall this voice be heard, making sure the stakeholders are properly identified and given a genuine stake. This was always a key element of participative design methodologies such as Enid Mumford's ETHICS method.

Planning also involves questions of scope and interoperability - how is the problem space divided up between multiple separate initiatives, to what extent do these separate initiatives need to be coordinated, and are there any deficiencies in coverage or resource allocation. See my post on the Ethics of Interoperability.

For example, an ethical review might question why medical devices were being developed for certain conditions and not others, or why technologies developed for the police were concentrated on certain categories of crime, and what the social implications of this might be. Perhaps the ethical judgement could be that a solution proposed for condition/crime X can be developed provided that there is a commitment to develop similar solutions for Y and Z. In other words, a full ethical review should look at what is omitted from the plan as well as what is included.

There may be ethical implications of organization and method, especially on large complicated developments involving different teams in different jurisdictions. Chinese Walls, Separation of Concerns, etc.

In an ethical plan, responsibility will be clear and not diffused. Just saying "we are all responsible" is naive and unhelpful. We all know what this looks like: an individual engineer raises an issue to get it off her conscience, a busy project manager marks the issue as "non-critical", the product owner regards the issue as a minor technicality, and so on. I can't even be bothered to explain what's wrong with this, I'll let you look it up in Wikipedia, because it's Somebody Else's Problem.

Regardless of the development methodology, most projects start with a high-level plan, filling in the details as they go along, and renegotiating with the sponsors and other stakeholders for significant changes in scope, budget or timescale. However, some projects are saddled with commercial, contractual or political constraints that make the plans inflexible, and this inflexibility typically generates unethical behaviours (such as denial or passing the buck).

In short, ethical planning is about making sure you are doing the right things, and doing them right. 

Risk Assessment

The risk assessment and impact analysis may often be done at the same time as the planning, but I'm going to regard it as a logically distinct activity. Like planning, it may be appropriate to revisit the risk assessment from time to time: our knowledge and understanding of risks may evolve, new risks may become apparent, while other risks can be discounted.

There are some standards for risk assessment in particular domains. For example, Data Protection Impact Assessment (DPIA) is mandated by GDPR, Information Security Risk Assessment is included in ISO 27001, and risk/hazard assessment for robotics is covered by BS 8611.

The first ethical question here is How Much. It is clearly important that the risk assessment is done with sufficient care and attention, and the results taken seriously. But there is no ethical argument under the sun that says that one should never take any risks at all, or that risk assessment should be taken to such extremes that it becomes paralysing. In some situations (think Climate Change), risk-averse procrastination may be the position that is hardest to justify ethically.

We also need to think about Scope and Perspective. Which categories of harm/hazard/risk are relevant, whose risk is it (in other words, who would be harmed), and from whose point of view? The voice of the stakeholder needs to be heard here as well.


Design

Responsible design takes care of all the requirements, risks and other stakeholder concerns already identified, as well as giving stakeholders full opportunity to identify additional concerns as the design takes shape.

Among other things, the design will need to incorporate any mechanisms and controls that have been agreed as appropriate for the assessed risks. For example, security controls, safety controls, privacy locks. Also designing in mechanisms to support responsible operations - for example, monitoring and transparency.

There is an important balance between Separation of Concerns and Somebody Else's Problem. So while you shouldn't expect every designer on the team to worry about every detail of the design, you do need to ensure that the pieces fit together and that whole system properties (safety, robustness, etc.) are designed in. So you may have a Solution Architecture role (one person or a whole team, depending on scale and complexity) responsible for overall design integrity.

And when I say whole system, I mean whole system. In general, an IoT device isn't a whole system, it's a component of a larger system. A responsible designer doesn't just design a sensor that collects a load of data and sends it into the ether, she thinks about the destination and possible uses and abuses of the data. Likewise, a responsible designer doesn't just design a robot to whizz around in a warehouse, she thinks about the humans who have to work with the robot - the whole sociotechnical system.

(How far does this argument extend? That's an ethical question as well: as J.P. Eberhard wrote in a classic paper, we ought to know the difference.)

Verification, Validation and Testing

This is where we check that the solution actually works reliably and safely, is accessible by and acceptable to all the possible users in a broad range of use contexts, and that the mechanisms and controls are effective in eliminating unnecessary risks and hazards.

See separate post on Responsible Beta Testing.

These checks don't only apply to the technical system, but also to the organizational and institutional arrangements, including any necessary contractual agreements, certificates, licences, etc. Is the correct user documentation available, and have the privacy notices been updated? Of course, some of these checks may need to take place even before beta testing can start.

Deployment and Operation

As the solution is rolled out, and during its operation, monitoring is required to ensure that the solution is working properly, and that all the controls are effective.

Regulated industries typically have some form of market surveillance or vigilance, whereby the regulator keeps an eye on what is going on. This may include regular inspections and audits. But of course this doesn't diminish the responsibility of the producer or distributor to be aware of how the technology is being used, and its effects. (Including unplanned or "off-label" uses.)

(And if the actual usage of the technology differs significantly from its designed purpose, it may be necessary to loop back through the risk assessment and the design. See my post On Repurposing AI).

There should also be some mechanism for detecting unusual and unforeseen events. For example, the MHRA, the UK regulator for medicines and medical devices, operates a Yellow Card scheme, which allows any interested party (not just healthcare professionals) to report any unusual event. This is significantly more inclusive than the vigilance maintained by regulators in other industries, because it can pick up previously unknown hazards (such as previously undetected adverse reactions) as well as collecting statistics on known side-effects.

Incident Management

In some domains, there are established procedures for investigating incidents such as vehicle accidents, security breaches, and so on. There may also be specialist agencies and accident investigators.

One of the challenges here is that there is typically a fundamental asymmetry of information. Someone who believes they may have suffered harm may be unable to invoke these procedures until they can conclusively demonstrate the harm, and so the burden of proof lies unfairly on the victim.


Decommissioning

Finally, we need to think about taking the solution out of service or replacing it with something better. Some technologies (such as blockchain) are designed on the assumption of eternity and immutability, and we are stuck for good or ill with our original design choices, as @moniquebachner pointed out at a FinTech event I attended last year. With robots, people always worry whether we can ever switch the things off.

Other technologies may be just as sticky. Consider the QWERTY keyboard, which according to a popular (though disputed) account was designed to slow the typist down to prevent the letters on a manual typewriter from jamming. The laptop computer on which I am writing this paragraph has a QWERTY keyboard.

Just as the responsible design of physical products needs to consider the end of use, and the recycling or disposal of the materials, so technological solutions need graceful termination.

Note that decommissioning doesn't necessarily remove the need for continued monitoring and investigation. If a drug is withdrawn following safety concerns, the people who took the drug will still need to be monitored; similar considerations may apply for other technological innovations as well.

Final Remarks

As already indicated, this is just an outline (plan). The detailed design may include checklists and simple tools, standards and guidelines, illustrations and instructions, as well as customized versions for different development methodologies and different classes of product. And I am hoping to find some opportunities to pilot the approach.

There are already some standards existing or under development to address specific areas here. For example, I have seen some specific proposals circulating for accident investigation, with suggested mechanisms to provide transparency to accident investigators. Hopefully the activity framework outlined here will provide a useful context for these standards.

Comments and suggestions for improving this framework always welcome.

Notes and References

For my use of the term Activity Viewpoint, see my blogpost Six Views of Business Architecture, and my eBook Business Architecture Viewpoints.

John P. Eberhard, "We Ought to Know the Difference," Emerging Methods in Environmental Design and Planning, Gary T. Moore, ed. (MIT Press, 1970) pp 364-365. See my blogpost We Ought To Know The Difference (April 2013)

Amany Elbanna and Mike Newman, The rise and decline of the ETHICS methodology of systems implementation: lessons for IS research (Journal of Information Technology 28, 2013) pp 124–136

Regulator Links: What is a DPIA? (ICO), Yellow Card Scheme (MHRA)

Wikipedia: Diffusion of Responsibility, Separation of Concerns, Somebody Else's Problem