Tuesday, August 04, 2020

Data by Design

If your #datastrategy involves collecting and harvesting more data, then it makes sense to check this requirement at an early stage of a new project or other initiative, rather than adding data collection as an afterthought.

For requirements such as security and privacy, the not-as-afterthought heuristic is well established in the practices of security-by-design and privacy-by-design. I have also spent some time thinking and writing about technology ethics, under the heading of responsibility-by-design. In my October 2018 post on Responsibility by Design, I suggested that all of these could be regarded as instances of a general pattern of X-by-design, outlining What, Why, When, For Whom, Who, How and How Much for a given concern X.

In this post, I want to look at three instances of the X-by-design pattern that could support your data strategy:

  • data collection by design
  • data quality by design
  • data governance by design


Data Collection by Design

Here's a common scenario. Some engineers in your organization have set up a new product or service or system or resource. This is now fully operational, and appears to be working properly. However, the system is not properly instrumented.
Thought should always be given to the self instrumentation of the prime equipment, i.e. design for test from the outset. Kev Judge
In the past, it was common for a system to be instrumented during the test phase, but once the tests were completed, data collection was switched off for performance reasons.
If there is concern that the self instrumentation can add unacceptable processing overheads then why not introduce a system of removing the self instrumentation before delivery? Kev Judge
Data collection matters not just for operational testing and monitoring, but also for business intelligence. For IBM, this is an essential component of digital advantage:
Digitally reinvented electronics organizations pursue new approaches to products, processes and ecosystem participation. They design products with attention toward the types of information they need to collect to design the right customer experiences. IBM
The point here is that a new system or service needs to have data collection designed in from the start, rather than tacked on later.
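As a minimal sketch of what designed-in data collection might look like (the event names and fields here are invented for illustration), a service can emit structured events from day one, rather than relying on whatever happens to end up in ad hoc log files:

```python
import json
import time
import uuid

def emit_event(event_type, **attributes):
    """Write a structured event record; in a real system this would go
    to an event stream or telemetry pipeline rather than stdout."""
    record = {
        "event_id": str(uuid.uuid4()),
        "event_type": event_type,
        "timestamp": time.time(),
        **attributes,
    }
    print(json.dumps(record))

def place_order(customer_id, product_id, quantity):
    # The business logic and its instrumentation are designed together:
    # every order placement produces an analysable event as a matter of course.
    emit_event("order_placed", customer_id=customer_id,
               product_id=product_id, quantity=quantity)
    # ... actual order processing goes here ...

place_order("C123", "P456", 2)
```

The particular mechanism doesn't matter; what matters is that the events needed for monitoring and business intelligence are specified as part of the design, not retrofitted.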


Data Quality by Design

The next pitfall arises when a new system or service is developed, the data migration / integration is done in a big rush towards the end of the project, and then - surprise, surprise - the data quality isn't good enough.

This is particularly relevant when data is being repurposed. During the pandemic, there was a suggestion of using Bluetooth connection strength as a proxy for the distance between two phones, and therefore as an indicator of the distance between the owners of the phones. Although this data might have been adequate for statistical analysis, it was not good enough to justify putting a person into quarantine.
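One way to make this concrete is to state, up front, the quality thresholds that each intended use of the data requires, and to test candidate data sources against them before committing to the design. The thresholds and field names below are purely illustrative:

```python
# Illustrative fitness-for-purpose check: the same data source may be
# good enough for one use and not for another.
REQUIREMENTS = {
    "aggregate_statistics": {"max_distance_error_m": 5.0},
    "individual_quarantine": {"max_distance_error_m": 0.5},
}

def fit_for_purpose(estimated_error_m, purpose):
    return estimated_error_m <= REQUIREMENTS[purpose]["max_distance_error_m"]

bluetooth_error_m = 2.0   # assumed error of a signal-strength distance proxy
print(fit_for_purpose(bluetooth_error_m, "aggregate_statistics"))   # True
print(fit_for_purpose(bluetooth_error_m, "individual_quarantine"))  # False
```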


Data Governance by Design

Finally, there is the question of the sociotechnical organization and processes needed to manage and support the data - not only data quality but all other aspects of data governance.

The pitfall here is to believe you can sort out the IT plumbing first, leaving the necessary governance and controls to be added in later. 
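A hedged sketch of what building governance in (rather than bolting it on) might look like: no dataset is admitted to the platform unless its ownership, retention and sensitivity have been declared. The required fields here are an assumption for illustration, not a standard.

```python
REQUIRED_GOVERNANCE_FIELDS = {"owner", "steward", "retention_days", "sensitivity"}

catalog = {}

def register_dataset(name, **governance):
    """Refuse to onboard a dataset until its governance metadata is complete."""
    missing = REQUIRED_GOVERNANCE_FIELDS - governance.keys()
    if missing:
        raise ValueError(f"Cannot onboard {name}: missing governance fields {missing}")
    catalog[name] = governance

register_dataset("customer_orders", owner="Sales Ops", steward="J. Smith",
                 retention_days=2555, sensitivity="personal")
```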




Scott Burnett, Reza Firouzbakht, Cristene Gonzalez-Wertz and Anthony Marshall, Using Data by Design (IBM Institute for Business Value, 2018)

Kev Judge, Self Instrumentation and S.I. (undated, circa 2007)

Monday, August 03, 2020

A Cybernetics View of Data-Driven

Cybernetics helps us understand dynamic systems that are driven by a particular type of data. Here are some examples:

  • Many economists see markets as essentially driven by price data.
  • On the Internet (especially social media) we can see systems that are essentially driven by click data.
  • Stan culture is essentially driven by score data, with hardcore fans ganging up on critics who fail to give the latest album a perfect score.

In a recent interview with Alice Pearson of CRASSH, Professor Will Davies explains the process as follows:

For Hayek, the advantage of the market was that it was a space in which stimulus and response could be in a constant state of interactivity: that prices send out information to people, which they respond to either in the form of consumer decisions or investment decisions or new entrepreneurial strategies.

Davies argued that this is now managed on screens, with traders on Wall Street and elsewhere constantly interacting with (as he says) flashing numbers that are rising and falling.

The way in which the market is visualized to people, the way it presents itself to people, the extent to which it is visible on a single control panel, is absolutely crucial to someone's ability to play the market effectively.

Davies attributes to cybernetics a particular vision of human agency: to think of human beings as black boxes which respond to stimuli in particular ways that can be potentially predicted and controlled. (In market trading, this thought leads naturally to replacing human beings with algorithmic trading.)

Davies then sees this cybernetic vision encapsulated in the British government approach to the COVID-19 pandemic.

What you see now with this idea of Stay Alert ... is a vision of an agent or human being who is constantly responsive and constantly adaptable to their environment, and will alter their behaviour depending on what types of cues are coming in from one moment to the next. ... The ideological vision being presented is of a society in which the rules of everyday conduct are going to be constantly tweaked in response to different types of data, different things that are appearing on the control panels at the Joint Biosecurity Centre.

The word alert originally comes from an Italian military term all'erta - to the watch. So the slogan Stay Alert implies a visual idea of agency. But as Alice Pearson pointed out, that which is supposed to be the focus of our alertness is invisible. And it is not just the virus itself that is invisible, but (given the frequency of asymptomatic carriers) which people are infectious and should be avoided.

So what visual or other signals is the Government expecting us to be alert to? If we can't watch out for symptoms, perhaps we are expected instead to watch out for significant shifts in the data - ambiguous clues about the effectiveness of masks or the necessity of quarantine. Or perhaps significant shifts in the rules.

Most of us only see a small fraction of the available data - Stafford Beer's term for this is attenuation, and Alice Pearson referred to hyper-attenuation. So we seem to be faced with a choice between on the one hand a shifting set of rules based on the official interpretation of the data - assuming that the powers-that-be have a richer set of data than we do, and a more sophisticated set of tools for managing the data - and on the other hand an increasingly strident set of activists encouraging people to rebel against the official rules, essentially setting up a rival set of norms in which for example mask-wearing is seen as a sign of capitulation to a socialist regime run by Bill Gates, or whatever.
 
Later in the interview, and also in his New Statesman article, Davies talks about a shifting notion of rules, from a binding contract to mere behavioural nudges.

Rules morph into algorithms, ever-more complex sets of instructions, built around an if/then logic. By collecting more and more data, and running more and more behavioural tests, it should in principle be possible to steer behaviour in the desired direction. ... The government has stumbled into a sort of clumsy algorithmic mentality. ... There is a logic driving all this, but it is one only comprehensible to the data analyst and modeller, while seeming deeply weird to the rest of us. ... To the algorithmic mind, there is no such thing as rule-breaking, only unpredicted behaviour.

One of the things that differentiates the British government from more accomplished practitioners of data-driven biopower (such as Facebook and WeChat) is the apparent lack of fast and effective feedback loops. If what the British government is practising counts as cybernetics at all, it seems to be a very primitive and broken version of first-order cybernetics.

When Norbert Wiener introduced the term cybernetics over seventy years ago, describing thinking as a kind of information processing and people as information processing organisms, this was a long way from simple behaviourism. Instead, he emphasized learning and creativity, and insisted on the liberty of each human being to develop in his freedom the full measure of the human possibilities embodied in him.
 
In a talk on the entanglements of bodies and technologies, Lucy Suchman draws on an article by Geoff Bowker to describe the universal aspirations of cybernetics.
 
Cyberneticians declared a new age in which Darwin's placement of man as one among the animals would now be followed by cybernetics' placement of man as one among the machines.
 
However, as Suchman reminds us
 
Norbert Wiener himself paid very careful attention to questions of labour, and actually cautioned against the too-broad application of models that were designed in relation to physical or computational systems to the social world.

Even if sometimes seeming outnumbered, there have always been some within the cybernetics community who are concerned about epistemology and ethics. Hence second-order (or even third-order) cybernetics.



Ben Beaumont-Thomas, Hardcore pop fans are abusing critics – and putting acclaim before art (The Guardian, 3 August 2020)

Geoffrey Bowker, How to be universal: some cybernetic strategies, 1943-1970 (Social Studies of Science 23, 1993) pp 107-127
 
Philip Boxer & Vincent Kenny, The economy of discourses - a third-order cybernetics (Human Systems Management, 9/4 January 1990) pp 205-224
 

Will Davies, Coronavirus and the Rise of Rule-Breakers (New Statesman, 8 July 2020)

Lucy Suchman, Restoring Information’s Body: Remediations at the Human-Machine Interface (Medea, 20 October 2011) Recording via YouTube
 
Norbert Wiener, The Human Use of Human Beings (1950, 1954)

Stanford Encyclopedia of Philosophy: A cybernetic view of human nature

Wednesday, July 29, 2020

Information Advantage (not necessarily) in Air and Space

Some good down-to-earth points from #ASPC20, @airpowerassn's Air and Space Power Conference earlier this month. Although the material was aimed at a defence audience, much of the discussion is equally relevant to civilian and commercial organizations interested in information superiority (US) or information advantage (UK).

Professor Dame Angela McLean, who is the Chief Scientific Advisor to the MOD, defined information advantage thus:

The credible advantage gained through the continuous, decisive and resilient employment of information and information systems. It involves exploiting information of all kinds to improve every aspect of operations: understanding, decision-making, execution, assessment and resilience.

She noted the temptation for the strategy to jump straight to technology (technology push); the correct approach is to set out ambitious, enduring capability outcomes (capability pull), although this may be harder to communicate. Nevertheless, technology push may make sense in those areas where technologies could contribute to multiple outcomes.

She also insisted that it was not enough just to have good information, it was also necessary to use this information effectively, and she called for cultural change to drive improved evidence-based decision-making. (This chimes with what I've been arguing myself, including the need for intelligence to be actioned, not just actionable.)

In his discussion of multi-domain integration, General Sir Patrick Sanders reinforced some of the same points.
  • Superiority in information (is) critical to success
  • We are not able to capitalise on the vast amounts of data our platforms can deliver us, as they are not able to share, swap or integrate data at a speed that generates tempo and advantage
  • (we need) Faster and better decision making, rooted in deeper understanding from all sources and aided by data analytics and supporting technologies

See my previous post on Developing Data Strategy (December 2019) 


Professor Dame Angela McLean, Orienting Defence Research to anticipate and react to the challenges of a future information-dominated operational environment (Video)

General Sir Patrick Sanders, Cohering Joint Forces to deliver Multi Domain Integration (Air and Space Power Conference, 15 July 2020) (Video, Official Transcript)

For the full programme, see https://www.airpower.org.uk/air-space-power-conference-2020/programme/

Wednesday, July 22, 2020

Encouraging Data Innovation

@BCSDMSG and @DAMAUK ran an online conference last month, entitled Delivering Value Through Data. Videos are now available on YouTube.

The conference opened with a very interesting presentation by Peter Thomas (Prudential Regulation Authority, part of the Bank of England). Some key takeaways:

The Bank of England is a fairly old-fashioned institution. The data programme was as much a cultural shift as a technology shift, and this was reflected by a change in the language – from data management to data innovation.

Challenges: improve the cadence of situation awareness, sense-making and decision-making.

One of Peter's challenges was to wean the business off Excel. The idea was to get data straight into Tableau, bypassing Excel. Peter referred to this as straight-through processing, and said this was the biggest bang for the buck.

Given the nature of his organization, the link between data governance and decision governance is particularly important. Peter described making governance more effective/efficient by reducing the number of separate governance bodies, and outlined a stepwise approach for persuading people in the business to accept data ownership:
  1. You are responsible for your decisions
  2. You are responsible for your interpretation of the data used in your decisions
  3. You are responsible for your requests and requirements for data.
Some decisions need to be taken very quickly, in crisis management mode. (This is a particular characteristic of a regulatory organization, but also relevant to anyone dealing with COVID-19.) If people can cut through the procrastination in such situations, this should create a precedent for doing things more quickly in business-as-usual mode.

Finally, Peter reported some tension between two camps – those who want data and decision management to be managed according to strict rules, and those who want the freedom to experiment. Enterprise-wide innovation needs to find a way to reconcile these camps.

Plenty more insights in the video, including the Q&A at the end - well worth watching.

Peter Thomas, Encouraging Data Innovation (BCS via YouTube, 15 June 2020)

Friday, July 10, 2020

Three Types of Data - Bronze, Gold and Mercury

In this post, I'm going to look at three types of data, and the implications for data management. For the purposes of this story, I'm going to associate these types with three contrasting metals: bronze, gold and mercury. (Update: fourth type added - scroll down for details.)


Bronze

The first type of data represents something that happened at a particular time. For example, transaction data: this customer made this purchase of this product on this date. This delivery was received, this contract was signed, this machine was installed, this notification was sent.

Once this kind of data is correctly recorded, it should never change. Even if an error is detected in a transaction record, the usual procedure is to add two more transaction records - one to reverse out the incorrect values, and one to reenter the correct values. 
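As a rough sketch of this append-only convention (the table and field names are invented for illustration), an error is never edited in place; instead a reversal and a re-entry are appended:

```python
from datetime import date

ledger = []  # append-only: records are never updated or deleted

def post(transaction_id, customer, product, amount, note=""):
    ledger.append({"id": transaction_id, "date": date.today().isoformat(),
                   "customer": customer, "product": product,
                   "amount": amount, "note": note})

post("T1", "C123", "P456", 100.00)                       # original (incorrect) entry
post("T2", "C123", "P456", -100.00, "reversal of T1")    # reverse out the error
post("T3", "C123", "P456", 110.00, "correction of T1")   # re-enter the correct values

print(sum(t["amount"] for t in ledger))  # 110.0 - the net effect is correct
```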

For many organizations, this represents by far the largest portion of data by volume. The main data management challenges tend to be focused on the implications of this - how much to collect, where to store it, how to move it around, how soon can it be deleted or archived.


Gold

The second type of data represents current reality. This kind of data must be promptly and efficiently updated to reflect real-world changes. For example, the customer changes address, an employee moves to a different department. Although the changes themselves may be registered as Bronze Data, what we really want to know is where does the customer now reside, where does Sam now work.

Some of these updates can be regarded as simple facts, based on observations or reports (a customer tells us her address). Some updates are derived from other data, using calculation or inference rules. And some updates are based on decisions - for example, the price of this product shall be X. 

And not all of these updates can be trusted. If you receive an email from a supplier requesting payment to a different bank account, you probably want to check that this email is genuine before updating the supplier record.

Gold data typically involves much smaller volumes than Bronze, but it is much more critical to the business if you get it wrong.


Mercury

Finally, we have data with a degree of uncertainty, including estimates and forecasts. This data is fluid: it can move around for no apparent reason. It can be subjective, or based on unreliable or partial sources. Nevertheless, it can be a rich source of insight and intelligence.

This category also includes projected and speculative data. For example, we might be interested in developing a fictional "what if" scenario - what if we opened x more stores, what if we changed the price of this product to y?

For some reason, an estimate that is generated by an algorithm or mathematical model is sometimes taken more seriously than an estimate pulled out of the air by a subject matter expert. However, as Cathy O'Neil reminds us, algorithms are themselves merely opinions embedded in code.

If you aren't sure whether to trust an estimate, you can scrutinize the estimation process. For example, you might suspect that the subject matter expert provides more optimistic estimates after lunch. Or you could just get a second opinion. Two independent but similar opinions might give you more confidence than one extremely precise but potentially flawed opinion.
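Here is a small illustration of the second-opinion idea (the tolerance value is an arbitrary assumption): two independent estimates that roughly agree are accepted, with their spread reported as a crude uncertainty, while estimates that diverge are flagged for scrutiny rather than silently averaged.

```python
def combine_estimates(a, b, tolerance=0.2):
    """Combine two independent estimates of the same quantity.
    If they roughly agree, return the midpoint plus the spread as a
    crude uncertainty; otherwise flag the disagreement for review."""
    spread = abs(a - b)
    midpoint = (a + b) / 2
    if spread <= tolerance * midpoint:
        return {"estimate": midpoint, "uncertainty": spread / 2, "status": "agreed"}
    return {"estimate": None, "uncertainty": None, "status": "review - estimates diverge"}

print(combine_estimates(95, 105))   # agreed: estimate 100 +/- 5
print(combine_estimates(95, 180))   # flagged for review
```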

As well as estimates and forecasts, Mercury data may include assessments of various kinds. For example, we may want to know a customer's level of satisfaction with our products and services. Opinion surveys provide some relevant data points, but what about the customers who don't complete these surveys? And what if we pick up different opinions from different individuals within a large customer organization? In any case, these opinions change over time, and we may be able to correlate these shifts in opinion with specific good or bad events.

Thus Mercury data tend to be more complex than Bronze or Gold data, and can often be interpreted in different ways.


Update: Glass

@tonyjoyce suggests a fourth type.


This is a great insight. If you are not careful, you will end up with pieces of broken glass in your data. While this kind of data may be necessary, it is fragile and has to be treated with due care, and can't just be chucked around like bronze or gold.

Single Version of Truth (SVOT)

Bronze and Gold data usually need to be reliable and consistent. If two data stores have different addresses for the same customer, this could indicate any of the following errors.
  • The data in one of the data stores is incorrect or out-of-date. 
  • It’s not the same customer after all. 
  • It’s not the same address. For example, one is the billing address and the other is the delivery address.
For the purposes of data integrity and interoperability, we need to eliminate such errors. We then have a single version of the truth (SVOT), possibly taken from a single source of truth (SSOT).
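A minimal sketch of that reconciliation logic (the matching rules are deliberately simplified): before declaring a data quality error, check whether the two records really concern the same customer and the same kind of address.

```python
def reconcile(record_a, record_b):
    """Classify why two customer-address records differ.
    Deliberately simplified: real matching would use fuzzier rules."""
    if record_a["customer_id"] != record_b["customer_id"]:
        return "different customers - no conflict"
    if record_a["address_type"] != record_b["address_type"]:
        return "different address types (e.g. billing vs delivery) - no conflict"
    if record_a["address"] != record_b["address"]:
        return "genuine conflict - one record is incorrect or out of date"
    return "consistent"

a = {"customer_id": "C123", "address_type": "billing",  "address": "1 High St"}
b = {"customer_id": "C123", "address_type": "delivery", "address": "9 Low Rd"}
print(reconcile(a, b))  # different address types - no conflict
```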

Facts and derivations may be accurate or inaccurate. In the case of a simple fact, inaccuracy may be attributed to various causes, including translation errors, carelessness or dishonesty. Calculations may be inaccurate either because the input data are inaccurate or incomplete, or because there is an error in the derivation rule itself. (However, the derived data can sometimes be more accurate or useful, especially if random errors and variations are smoothed out.)

For decisions however, it doesn’t make sense to talk about accuracy / inaccuracy, except in very limited cases. Obviously if someone decides the price of an item shall be x pounds, but this is incorrectly entered into the system as x pence, this is going to cause problems. But even if x pence is the wrong price, arguably it is what the price is until someone fixes it.


Plural Version of Truth (PVOT) 

But as I've pointed out in several previous posts, the Single Version of Truth (SVOT) or Single Source of Truth (SSOT) isn't appropriate for all types of data. Particularly not Mercury Data. When making sense of complex situations, having alternative views provides diversity and richness of interpretation.

Analytical systems may be able to compare alternative data values from different sources. For example, two forecasting models might produce different estimates of the expected revenue from a given product. Intelligent use of these estimates doesn’t entail choosing one and ignoring the other. It means understanding why they are different, and taking appropriate action.

Or what about conflicting assessments? If we are picking up a very high satisfaction score from some parts of the customer organization, and a low satisfaction score from other parts of the same organization, we shouldn't simply average them out. The difference between these two scores could be telling us something important, and might be revealing an opportunity to engage differently with the two parts of the customer organization.

And for some kinds of Mercury Data, it doesn't even make sense to ask whether they are accurate or inaccurate. Someone may postulate x more stores, but this doesn’t imply that x is true, or even likely, merely speculative. And this speculative status is inherited by any forecasts or other calculations based on x. (Just look at the discourse around COVID data for topical examples.)


Master Data Management (MDM)

The purpose of Master Data Management is not just to provide a single source of data for Gold Data - sometimes called the Golden Record - but to provide a single location for updates. A properly functioning MDM solution will execute these updates consistently and efficiently, and ensure all consumers of the data (whether human or software) are using the updated version.

There is an important connection to draw out between master data management and trust.

In order to trust Bronze Data, we simply need some assurance that it is correctly recorded and can never be changed. (“The moving finger writes …”) In some contexts, a central authority may be able to provide this assurance. In systems with no central authority, Blockchain can guarantee that a data item has not been changed, although Blockchain alone cannot guarantee that it was correctly recorded in the first place.
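The tamper-evidence idea can be illustrated with a simple hash chain (a toy illustration of the principle, not a blockchain implementation): each record carries a hash of its predecessor, so any later alteration breaks the chain, although nothing here proves the record was correct in the first place.

```python
import hashlib
import json

def add_record(chain, payload):
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"payload": payload, "prev_hash": prev_hash}
    body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append(body)

def verify(chain):
    """Return True if no record has been altered since it was appended."""
    prev_hash = "0" * 64
    for record in chain:
        body = {"payload": record["payload"], "prev_hash": record["prev_hash"]}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if record["prev_hash"] != prev_hash or record["hash"] != expected:
            return False
        prev_hash = record["hash"]
    return True

chain = []
add_record(chain, {"id": "T1", "amount": 100.00})
add_record(chain, {"id": "T2", "amount": -100.00})
print(verify(chain))                     # True
chain[0]["payload"]["amount"] = 999.99   # tamper with an old record
print(verify(chain))                     # False - tampering is detectable
```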

For Gold Data, trustworthiness is more complicated, as there will need to be an ongoing series of automatic and manual updates. Master data management will provide the necessary sociotechnical superstructure to manage and control these updates. For example, what are the controls on updating a supplier's bank account details?
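A sketch of the kind of control the master data layer might enforce (the rule itself - requiring independent confirmation for bank detail changes - is an example, not a prescription):

```python
class SupplierMaster:
    def __init__(self):
        self.records = {}

    def update_bank_account(self, supplier_id, new_account, confirmed_out_of_band):
        """High-risk attribute: refuse the change unless it has been
        independently confirmed (e.g. by phoning a known contact)."""
        if not confirmed_out_of_band:
            raise PermissionError("Bank detail change requires independent confirmation")
        self.records.setdefault(supplier_id, {})["bank_account"] = new_account
        # In a full MDM solution, the change would also be published to all
        # downstream consumers so that everyone sees the updated golden record.

master = SupplierMaster()
master.update_bank_account("S789", "12-34-56 00012345", confirmed_out_of_band=True)
```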

There will always be requirements for data integrity between Bronze Data and Gold Data. Firstly, there will typically be references from Bronze Data to Gold Data. For example, a transaction record may reference a specific customer purchasing a specific product. And secondly, there may be attributes of the Gold Data that are updated as a result of each transaction. For example, the stock levels of a product will be affected by sales of that product.
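For example (schema invented for illustration), posting a sales transaction can be made to check the product reference against the master data and adjust the stock level in the same step:

```python
products = {"P456": {"description": "Widget", "stock_level": 50}}  # Gold
sales = []                                                         # Bronze

def record_sale(product_id, quantity):
    if product_id not in products:                     # referential integrity check
        raise KeyError(f"Unknown product {product_id}")
    sales.append({"product": product_id, "quantity": quantity})
    products[product_id]["stock_level"] -= quantity    # derived Gold attribute

record_sale("P456", 3)
print(products["P456"]["stock_level"])  # 47
```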

However, as we've seen, the data management challenges of Bronze Data are not the same as the challenges for Gold Data. And the challenges of Mercury Data are different again. So it is better to focus your MDM efforts exclusively on Gold Data. (And avoid splinters of Glass.)



Post prompted by a discussion on LinkedIn with Robert Daniels-Dwyer, Steve Fisher and Steve Lenny. https://www.linkedin.com/posts/danielsdwyer_dataarchitecture-datamanagement-enterprisearchitecture-activity-6673873980866228224-Mu1O

Updated 15 July 2020 following suggestion by Tony Joyce.

Friday, March 27, 2020

Data Strategy - More on Agility

Continuing my exploration of the four dimensions of Data Strategy. In this post, I bring together some earlier themes, including Pace Layering and Trimodal.

The first point to emphasize is that there are many elements to your overall data strategy, and these don't all work at the same tempo. Data-driven design methodologies such as Information Engineering (especially the James Martin version) were based on the premise that the data model was more permanent than the process model, but it turns out that this is only true for certain categories of data.

So one of the critical requirements for your data strategy is to manage both the slow-moving stable elements and the fast-moving agile elements. This calls for a layered approach, where each layer has a different rate of change, known as pace-layering.

The concept of pace-layering was introduced by Stewart Brand. In 1994, he wrote a brilliant and controversial book about architecture, How Buildings Learn, which among other things contained a theory about evolutionary change in complex systems based on earlier work by the architect Frank Duffy. Although Brand originally referred to the theory as Shearing Layers, by the time of his 1999 book he had switched to calling it Pace Layering. If there is a difference between the two, Shearing Layers is primarily a descriptive theory about how change happens in complex systems, while Pace Layering is primarily an architectural principle for the design of resilient systems-of-systems.

In 2006, I was working as a software industry analyst, specializing in Service-Oriented Architecture (SOA). Microsoft invited me to Las Vegas to participate in a workshop with other industry analysts, where (among other things) I drew the following layered picture.

SPARK Workshop Day 2

Here's how I now draw the same picture for data strategy. It also includes a rough mapping to the Trimodal approach.
[Diagram: pace layers for data strategy, with a rough mapping to the Trimodal approach]

Giles Slinger and Rupert Morrison, Will Organization Design Be Affected By Big Data? (J Org Design Vol 3 No 3, 2014)

Wikipedia: Information Engineering, Shearing Layers 

Related Posts: Layering Principles (March 2005), SPARK 2 - Innovation or Trust (March 2006), Beyond Bimodal (May 2016), Data Strategy - Agility (December 2019)

Wednesday, March 04, 2020

Economic Value of Data

How far can general principles of asset management be applied to data? In this post, I'm going to look at some of the challenges of putting monetary or non-monetary value on your data assets.

Why might we want to do this? There are several reasons why people might be interested in the value of data.
  • Establish internal or external benchmarks
  • Set measurable targets and track progress
  • Identify underutilized assets
  • Prioritize and allocate resources
  • Model threats and assess risks (especially in relation to confidentiality, privacy and security)
Non-monetary benchmarks may be good enough if all we want to do is compare values - for example, this parcel of data is worth a lot more than that parcel, this process/practice is more efficient/effective than that one, this initiative/transformation has added significant value, and so on.

But for some purposes, it is better to express the value in financial terms. Especially for the following:
  • Cost-benefit analysis – e.g. calculate return on investment
  • Asset valuation – estimate the (intangible) value of the data inventory – e.g. relevant for flotation or acquisition
  • Exchange value – calculate pricing and profitability for traded data items

There are (at least) five entirely different ways to put a monetary value on any asset.
  • Historical Cost - The total cost of the labour and other resources required to produce and maintain an item.
  • Replacement Cost - The total cost of the labour and other resources that would be required to replace an item.
  • Liability Cost - The potential damages or penalties if the item is lost or misused. (This may include regulatory action, reputational damage, or commercial advantage to your competitors, and may bear no relation to any other measure of value.)
  • Utility Value - The economic benefits that may be received by an actor from using or consuming the item.
  • Market Value - The exchange price of an item at a given point in time. The amount that must be paid to purchase the item, or the amount that could be obtained by selling the item.

But there are some real difficulties in doing any of this for data. None of these difficulties are unique to data, but I can't think of any other asset class that has all of these difficulties multiplied together to the same extent.

  • Data is an intangible asset. There are established ways of valuing intangible assets, but these are always somewhat more complicated than valuing tangible assets.
  • Data is often produced as a side-effect of some other activity. So the cost of its production may already be accounted for elsewhere, or is a very small fraction of a much larger cost.
  • Data is a reusable asset. You may be able to get repeated (although possibly diminishing) benefit from the same data.
  • Data is an infinitely reproducible asset. You can sell or share the same data many times, while continuing to use it yourself. 
  • Some data loses its value very quickly. If I’m walking past a restaurant, this information has value to the restaurant. Ten minutes later I'm five blocks away, and the information is useless. And even before this point, suppose there are three restaurants and they all have access to the information that I am hungry and nearby. As soon as one of these restaurants manages to convert this information, its value to the remaining restaurants becomes zero or even negative. 
  • Data combines in a non-linear fashion. Value (X+Y) is not always equal to Value (X) + Value (Y). Even within more tangible asset classes, we can find the concepts of Assemblage and Plottage. For data, one version of this non-linearity is the phenomenon of information energy described by Michael Saylor of MicroStrategy. And for statisticians, there is also Simpson’s Paradox.


The production costs of data can be estimated in various ways. One approach is to divide up the total ICT expenditure, estimating roughly what proportion of the whole to allocate to this or that parcel of data. This generally only works for fairly large parcels - for example, this percentage to customer transactions, that percentage to transport and logistics, and so on. Another approach is to work out the marginal or incremental cost: this is commonly preferred when considering new data systems, or decommissioning old ones. We can also compare the effort consumed in different data domains, or count the number of transformation steps from raw data to actionable intelligence.
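A rough sketch of the proportional-allocation approach (the figures and proportions are made up): divide the total ICT spend across large parcels of data according to estimated shares.

```python
total_ict_spend = 10_000_000  # annual spend, illustrative figure

# Estimated share of the ICT estate attributable to each large parcel of data
allocation = {
    "customer_transactions": 0.40,
    "transport_and_logistics": 0.25,
    "hr_and_payroll": 0.10,
    "other": 0.25,
}

for parcel, share in allocation.items():
    print(f"{parcel}: {share * total_ict_spend:,.0f}")
```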

As for the value of the data, there are again many different approaches. Ideally, we should look at the use-value or performance value of the data - what contribution does it make to a specific decision or process, or what aggregate contribution does it make to a given set of decisions and processes. 
  • This can be based on subjective assessments of relevance and usefulness, perhaps weighted by the importance of the decisions or processes where the data are used. See Bill Schmarzo's blogpost for a worked example.
  • Or it may be based on objective comparisons of results with and without the data in question - making a measurable difference to some key performance indicator (KPI). In some cases, the KPI may be directly translated into a financial value. 
However, comparing performance fairly and objectively may only be possible for organizations that are already at a reasonable level of data management maturity.
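As a rough sketch of the weighted-scoring idea described above (in the spirit of Schmarzo's worked example, but with invented decisions, weights and scores): rate each data source's relevance to each decision, weight by the importance of the decision, and sum.

```python
# Importance weights for each decision (illustrative)
decision_weight = {"credit_approval": 0.5, "campaign_targeting": 0.3,
                   "stock_replenishment": 0.2}

# Subjective relevance of each data source to each decision, on a 0-10 scale
relevance = {
    "customer_transactions": {"credit_approval": 8, "campaign_targeting": 9,
                              "stock_replenishment": 6},
    "web_clickstream":       {"credit_approval": 2, "campaign_targeting": 8,
                              "stock_replenishment": 1},
}

def use_value_score(source):
    return sum(decision_weight[d] * r for d, r in relevance[source].items())

for source in relevance:
    print(source, round(use_value_score(source), 2))  # 7.9 vs 3.6
```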

In the absence of this kind of metric, we can look instead at the intrinsic value of the data, independently of its potential or actual use. This could be based on a weighted formula involving such quality characteristics as accuracy, alignment, completeness, enrichment, reliability, shelf-life, timeliness, uniqueness, usability. (Gartner has published a formula that uses a subset of these factors.)

Arguably there should be a depreciation element to this calculation. Last year's data is not worth as much as this year's data, and the accuracy of last year's data may not be so critical, but the data is still worth something.
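A possible shape for such a formula (the weights, characteristics and depreciation rate are all assumptions for illustration, not Gartner's published formula): a weighted sum of quality scores, discounted by the age of the data.

```python
def intrinsic_value(quality_scores, weights, age_years, annual_depreciation=0.3):
    """Weighted quality score (each characteristic scored 0-1),
    discounted for age so that older data is worth less but not nothing."""
    base = sum(weights[c] * quality_scores[c] for c in weights)
    return base * (1 - annual_depreciation) ** age_years

weights = {"accuracy": 0.3, "completeness": 0.2, "timeliness": 0.2,
           "uniqueness": 0.15, "usability": 0.15}
scores = {"accuracy": 0.9, "completeness": 0.8, "timeliness": 0.7,
          "uniqueness": 0.6, "usability": 0.8}

print(round(intrinsic_value(scores, weights, age_years=0), 3))  # this year's data: 0.78
print(round(intrinsic_value(scores, weights, age_years=1), 3))  # last year's data: 0.546
```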

An intrinsic measure of this kind could be used to evaluate parcels of data at different points in the data-to-information process. For example, showing the increase of enrichment and usability from 1. to 2. and from 2. to 3., and therefore giving a measure of the added-value produced by the data engineering team that does this for us.
    1. Source systems
    2. Data Lake – cleansed, consolidated, enriched and accessible to people with SQL skills
    3. Data Visualization Tool – accessible to people without SQL skills

If any of my readers know of any useful formulas or methods for valuing data that I haven't mentioned here, please drop a link in the comments.



Heather Pemberton Levy, Why and How to Value Your Information as an Asset (Gartner, 3 September 2015)

Bill Schmarzo, Determining the Economic Value of Data (Dell, 14 June 2016)

Wikipedia: Simpson's Paradox, Value of Information

Related posts: Information Algebra (March 2008), Does Big Data Release Information Energy? (April 2014), Assemblage and Plottage (January 2020)