Tuesday, August 10, 2021

Data as pictures?

Many people believe that data should provide a faithful representation or picture of the real world. While this is often a helpful simplification, it can sometimes mislead.

Firstly, the picture theory isn't very good at handling probability and uncertainty. When faced with alternative pictures (facts), people may try to pick the most likely or most attractive one, and then act as if it were the truth.

As I see it, the problem of knowledge and uncertainty fundamentally disrupts our conventional assumptions about representation, in much the same way that quantum physics disrupts our assumptions about reality. See previous posts on Uncertainty.

Secondly, the picture theory misrepresents judgements (whether human or algorithmic) as descriptions. When a person is classified as a poor credit risk, or as a potential criminal or terrorist, this is a speculative judgement about the future, which is often sadly self-fulfilling. For example, when a person is labelled and treated as a potential criminal, this may make it more difficult for them to live as a law-abiding citizen, and they are therefore steered towards a life of crime. Data of this kind may therefore be performative, in the sense that it creates the reality that it claims to describe.

Thirdly, the picture theory assumes that any two facts must be consistent, and that simple facts can easily be combined to produce more complex facts. Failures of consistency or composition can then only be explained (and fixed) in terms of data quality and governance. See my post on Three Responses to Inconsistency (December 2003).

Furthermore, a good picture is one that can be verified. Nothing wrong with verification, of course, but the picture theory can sometimes lead to a narrow-minded approach to validation and verification. There may also be an assumption of completeness, treating a dataset as if it provided a complete picture of some clearly delineated domain. (The world is determined by the facts, and by their being all the facts.)

However, although the picture theory has some serious limitations, it may sometimes be an acceptable simplification, or even an enabling prejudice. One of the dimensions of data strategy is reach: developing a broad data culture across the organization and its ecosystem by making more data and tools available to a wider community of people. And if some form of the picture theory helps people get started on the ladder towards data mastery, that may not be a bad thing after all. (Hopefully they can throw away the ladder after they have climbed up it.)

See also Architecture and Reality (November 2012), From Sedimented Principles to Enabling Prejudices (March 2013), Data Strategy - Reach (December 2019), On the performativity of data (August 2021)

3 comments:

  1. A bit of a side comment - but in the spirit of 'my electronic image in the machine may be more real than I am' (https://twitter.com/paulpangaro/status/1395385594402967554?s=20) and Shoshana Zuboff, of course one way to go is to ensure that those *are* self-fulfilling prophecies, give up trying to map to the world and map the world to your data. Surprisingly feasible if you have the power of (say) Facebook, and just what a game-theory AI would do!

  2. But what you're talking about here, I think, is a metarational approach to data - which is a critical step forward.

  3. The question whether we are mapping the data to the world or the world to the data is sometimes called Direction of Fit.

    https://plato.stanford.edu/entries/speech-acts/#DirFit