Friday, March 19, 2010

What's Wrong with the Single Version of Truth

As @tonyrcollins reports, a confidential report currently in preparation on the NHS Summary Care Records (SCR) database will reveal serious flaws in a massively expensive database (Computer Weekly, March 2010). Well knock me down with a superbug, whoever would have guessed this might happen?

"The final report may conclude that the success of SCRs will depend on whether the NHS, Connecting for Health and the Department of Health can bridge the deep cultural and institutional divides that have so far characterised the NPfIT. It may also ask whether the government founded the SCR on an unrealistic assumption: that the centralised database could ever be a single source of truth."

There are several reasons to be ambivalent about the twin principles Single Version of Truth (SVOT) and Single Source of Truth (SSOT), and this kind of massive failure must worry even the most fervent advocates of these principles.

Don't get me wrong, I have served my time in countless projects trying to reduce the proliferation and fragmentation of data and information in large organizations, and I am well aware of the technical costs and business risks associated with data duplication. However, I have some serious concerns about the dogmatic way these principles are often interpreted and implemented, especially when this dogmatism results (as seems to be the case here) in a costly and embarrassing failure.

The first problem is that Single-Truth only works if you have absolute confidence in the quality of the data. In the SCR example, there is evidence that doctors simply don't trust the new system - and with good reason. There are errors and omissions in the summary records, and doctors prefer to double-check details of medications and allergies, rather than take the risk of relying on a single source.

The technical answer to this data quality problem is to implement rigorous data validation and cleansing routines, to make sure that the records are complete and accurate. But this would create more work for the GP practices uploading the data. Officials at the Department of Health fear that setting the standards of data quality too high would kill the scheme altogether. (And even the most rigorous quality standards would only reduce the number of errors; they could never eliminate them altogether.)

There is a fundamental conflict of interest here between the providers of data and the consumers - even though these may be the same people - and between quality and quantity. If you measure the success of the scheme in terms of the number of records uploaded, then you are obviously going to get quantity at the expense of quality.

So the pusillanimous way out is to build a database with imperfect data, and defer the quality problem until later. That's what people have always done, and will continue to do, and the poor quality data will never ever get fixed.

The second problem is that even if perfectly complete and accurate data are possible, the validation and data cleansing step generally introduces some latency into the process, especially if you are operating a post-before-processing system (particularly relevant to environments such as military and healthcare where, for some strange reason, matters of life-and-death seem to take precedence over getting the paperwork right). So there is a design trade-off between two dimensions of quality - timeliness and accuracy. See my post on Joined-Up Healthcare.

The third problem is complexity. Data cleansing generally works by comparing each record with a fixed schema, which defines the expected structure and rules (metadata) to which each record must conform, so that any information that doesn't fit into this fixed schema will be barred or adjusted. Thus the richness of information will be attenuated, and useful and meaningful information may be filtered out. (See Jon Udell's piece on Object Data and the Procrustean Bed from March 2000. See also my presentation on SOA for Data Management.)
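To make the attenuation concrete, here is a minimal sketch of a cleansing routine that checks each record against a fixed schema. The schema, the field names and the sample record are all hypothetical, invented purely for illustration; nothing here is taken from the actual SCR design.

```python
# Hypothetical fixed schema: field name -> expected Python type.
# These fields are illustrative assumptions, not the real SCR schema.
SCHEMA = {"patient_id": str, "medications": list, "allergies": list}


def cleanse(record):
    """Split a record into the fields that fit the schema and those that do not."""
    clean, discarded = {}, {}
    for field, value in record.items():
        expected = SCHEMA.get(field)
        if expected is not None and isinstance(value, expected):
            clean[field] = value
        else:
            # Unknown field, or a value of the wrong shape: barred from the database.
            discarded[field] = value
    return clean, discarded


# Hypothetical uploaded record containing free-text detail the schema cannot hold.
record = {
    "patient_id": "P-001",
    "medications": ["warfarin"],
    "allergies": "penicillin (reported by patient, unconfirmed)",
    "gp_note": "patient stopped taking warfarin last month",
}

clean, discarded = cleanse(record)
print("kept:     ", clean)
print("discarded:", discarded)
```

In this sketch the two fields that fail the schema are arguably the ones a doctor would most want to see: the hedged allergy report and the note that the medication list is out of date.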

The final problem is that a single source of information represents a single source of failure. If something is really important, it is better to have two independent sources of information or intelligence, as I pointed out in my piece on Information Algebra. This follows Bateson's slogan that "two descriptions are better than one". Doctors using the SCR database appear to understand this aspect of real-world information better than the database designers.
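As a rough illustration of what "two descriptions" buys you, the sketch below compares two independent records for the same patient and reports disagreements instead of silently trusting either one. The function name, the field names and both sample records are hypothetical assumptions made for the example.

```python
def cross_check(source_a, source_b):
    """Compare two independent descriptions field by field.

    Returns the fields on which both sources agree, and the fields on which
    they differ (with both values preserved for a human to review).
    """
    agreed, disputed = {}, {}
    for field in sorted(set(source_a) | set(source_b)):
        a, b = source_a.get(field), source_b.get(field)
        if a == b:
            agreed[field] = a
        else:
            disputed[field] = (a, b)
    return agreed, disputed


# Hypothetical data: a central summary record and the GP's own record.
summary_record = {"allergies": [], "medications": ["warfarin"]}
gp_record = {"allergies": ["penicillin"], "medications": ["warfarin"]}

agreed, disputed = cross_check(summary_record, gp_record)
print("agreed:  ", agreed)
print("disputed:", disputed)
```

With only one source, the empty allergy list would simply be taken as fact; with two, the discrepancy becomes visible and someone can ask the patient.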

It may be a very good idea to build an information service that provides improved access to patient information, for those who need this information. But if this information service is designed and implemented according to some simplistic dogma, then it isn't going to work properly.


Update. The Health Secretary has announced that NHS regulation will be based on a single version of the truth.

"in the future the chief inspector will ensure that there is a single version of the truth about how their hospitals are performing, not just on finance and targets, but on a single assessment that fully reflects what matters to patients"

Roger Taylor, Jeremy Hunt's dangerous belief in a single 'truth' about hospitals (Guardian 26 March 2013)



Updated 28 March 2013

2 comments:

  1. Agreed, this is becoming quite a fundamental problem in the UK public sector.

    As well as the NHS, we have the forthcoming Police National Database (PND), which is aiming for a similar SVOT. I spent much of the past couple of years arguing with people that, especially in the policing domain, SVOT did not exist, not least because of the presence of deliberate misinformation (people, especially criminals, often lie!) and fragmentary and missing information (e.g. from witness statements, intelligence reports). Whilst the glint in many people's eyes suggested they knew I was right (or at least on to something), there was little appetite for any action.

    Similarly with the ID cards system, which is attempting to define another SVOT. Presently, personal identity (and let's sidestep exactly what that means for the moment!) is often verified using multiple, independent forms of credential. For example, to open a bank account you might need a passport, birth certificate and a couple of utility bills, all bearing information that can be correlated as belonging to the same person. This works and delivers results we can trust precisely *because* these forms of credential are independent. If or when everything is based on (ultimately derived from or linked to) a single ID card database, this independence will be lost and the result will likely be a *lowering* of confidence in our ability to verify a person's identity.

    There are some very important issues here that deserve wider discussion and appreciation.

  2. Presumably, doctors' enlightened attitude stems at least in part from the idea of a "second opinion" on diagnoses... As you say, embed the notion of a Single Source of Truth deeply enough, and a second opinion becomes impossible.

    The analogy is useful in other ways, too. With a second opinion, two doctors are not disputing the identity of the patient; they are sceptically reviewing the diagnosis, symptoms, inferences and conclusions.

    Similarly, with patient data, there are two functions we're looking at:

    (i) ensuring that you're talking about the correct patient, and

    (ii) evaluating the attributes which are associated (correctly or not) with that patient.

    Sir James Crosby pushed for the scope of a National Identity Register to be reduced to the minimum: to produce strong evidence in support of a claim of uniqueness within the population. That corresponds to (i). Ideally there should be little or no 'interpretation'/inference involved.

    That principle shouldn't also apply to the attribute/diagnostic data. To take the argument to the extreme: imagine a doctor prescribing treatment for a complex condition on the basis of the SCR alone - without ever seeing the patient.

    The idea of a Summary Record is at odds with the diagnostic process... in which it is risky to draw conclusions on the basis of only partial information.
