Saturday, September 04, 2021

Metadata as a Process

Data sharing and collaboration between different specialist areas requires agreement and transparency about the structure and meaning of the data. This is one of the functions of metadata.

I've been reading a paper (by Professor Paul Edwards and others) about the challenges this poses in interdisciplinary scientific research. They identify four characteristic features of scientific metadata, noting that these features can be found within a single specialist discipline as well as across disciplines.

  • Fragmented - many people contributing, no overall control
  • Divergent - multiple conflicting versions (often in Excel spreadsheets)
  • Iterative - rarely right first time, lots of effort to repair misunderstandings and mistakes
  • Localized - each participant is primarily focused on their own requirements rather than the global picture

They make two important distinctions, which will be relevant to enterprise data management as well.

Firstly, between product and process. Instead of trying to create a static, definitive set of data definitions and properties that would completely eliminate the need for any human interaction between data creator and data consumer, assume that an ongoing channel of communication will be required to resolve emerging issues dynamically. (Some of the more advanced data management tools can support this.)

Secondly, between precision and lubrication. Tight coupling between two systems requires exact metadata, but interoperability might also be achievable with inexact metadata plus something else to reduce any friction. (Metadata as the new oil, perhaps?)
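To make the lubrication idea concrete in software terms, here is a minimal sketch (my own illustration, not from the paper, and the field names are invented): a tolerant reader that copes with inexact metadata by ignoring unknown fields and assuming conventions for missing ones, rather than rejecting the record outright.

    # A tolerant reader: trades metadata precision for lubrication by accepting
    # records that don't exactly match its expectations. (Illustrative only.)

    def read_observation(record: dict) -> dict:
        """Extract the fields we need, tolerating extra or missing ones."""
        return {
            "station": record.get("station", "unknown"),   # default if absent
            "temperature": float(record["temperature"]),   # the one field we insist on
            "units": record.get("units", "celsius"),       # assume a convention if unstated
        }

    # Records from two groups whose metadata doesn't quite agree.
    records = [
        {"station": "A1", "temperature": "21.5", "units": "celsius", "operator": "J.S."},
        {"temperature": 70.2, "units": "fahrenheit"},
    ]

    for r in records:
        print(read_observation(r))

The friction hasn't disappeared, of course; it has merely moved from schema enforcement into the reader's assumptions, which still need that ongoing channel of communication to stay honest.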

Finally, they observe that metadata typically falls into the category of almost standards.

Everyone agrees they are a good idea, most have some such standards, yet few deploy them completely or effectively.

Does that sound familiar? 



Jo Bates, The politics of data friction (Journal of Documentation, 2017)

Paul Edwards, A Vast Machine (MIT Press 2010). I haven't read this book yet, but I found a review by Danny Yee (2011)

Paul Edwards, Matthew Mayernik, Archer Batcheller, Geoffrey Bowker and Christine Borgman, Science Friction: Data, Metadata and Collaboration (Social Studies of Science 41/5, October 2011) pp 667-690

Martin Thomas Horsch, Silvia Chiacchiera, Welchy Leite Cavalcanti and Björn Schembera, Research Data Infrastructures and Engineering Metadata. In Data Technology in Materials Modelling (Springer 2021) pp 13-30

Jillian Wallis, Data Producers Courting Data Reusers: Two Cases from Modeling Communities (International Journal of Digital Curation 9/1, 2014) pp 98-109

Thursday, June 10, 2004

Autonomous Services

Comments on a ServerSide article by Stuart Charlton 

"XML Web Services are actually a specific case of the larger trend of leveraging extensible metadata as a means to make software more flexible and re-usable." 

This is only a partial characterization. Web services exemplify several interesting trends apart from this one, and have a range of other beneficial outcomes besides software flexibility and reuse. 

But Charlton's focus on leveraging extensible metadata is useful, and raises some important questions about process and order. The value of extensibility depends on who does the extending, and how. Uncontrolled and ungoverned extensibility leads to XML chaos, and does not contribute to the reuse agenda. 

 "[Do not assume] that the developer of one service has any authority to change the implementation another service. [Assume] that in any distributed system, each participating service is autonomous, allowing each to evolve independently of the other." 

This injunction is valid, but doesn't go far enough. With installed software components, the developer may have no right to change the code, but usually has the right to stick with the code she's got – for example, to decline or defer version upgrades. With services, she may have no authority to prevent changes in implementation, and may not even have the right to be notified of such changes, since the service providers may assume (not always correctly) that internal changes will not affect the external characteristics of the service. 
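One practical consequence: since the consumer can neither prevent nor reliably anticipate such changes, about the best she can do is detect drift at the point of use. A rough sketch (mine, not Charlton's, with hypothetical field names) of a consumer that checks each response against the shape it was built to expect:

    # The consumer cannot stop the provider changing, so it checks each response
    # against the shape it was built against and flags any drift. (Illustrative only.)

    import warnings

    # The fields (and types) this consumer depended on when it was written.
    EXPECTED_SHAPE = {"order_id": str, "status": str, "total": float}

    def check_response_shape(response: dict) -> None:
        """Warn if the provider's response no longer matches our expectations."""
        for field, expected_type in EXPECTED_SHAPE.items():
            if field not in response:
                warnings.warn(f"provider dropped field {field!r}")
            elif not isinstance(response[field], expected_type):
                warnings.warn(f"field {field!r} is now {type(response[field]).__name__}")

    # Simulated response after the provider silently changed 'total' to a string.
    check_response_shape({"order_id": "A-17", "status": "shipped", "total": "42.00"})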

"XML represents a new data model that differs significantly from the traditional programming models of relations and objects."

XML is a notation that allows relational, object and other styles of data model to be expressed. However, XML's preferred style of data model is a tree structure: the traditional hierarchical data model, on which DL/1 and IMS/DB were based. Older developers may like to think of XML as DL/1 on acid. 
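To make the contrast concrete, here is a small illustration (my own, not Charlton's): the same order navigated hierarchically as an XML tree, then flattened into relational-style rows.

    # XML's native shape is a tree: you navigate parent-to-child, much as in DL/1.
    # The same data can be flattened into relational-style rows. (Illustrative only.)

    import xml.etree.ElementTree as ET

    doc = ET.fromstring("""
    <order id="42">
      <customer>Acme Ltd</customer>
      <line sku="X100" qty="3"/>
      <line sku="Y200" qty="1"/>
    </order>
    """)

    # Hierarchical access: navigate from the root down to a child.
    customer = doc.findtext("customer")

    # Relational flattening: each child segment becomes a row with the parent key repeated.
    rows = [(doc.get("id"), line.get("sku"), int(line.get("qty")))
            for line in doc.findall("line")]

    print(customer)   # Acme Ltd
    print(rows)       # [('42', 'X100', 3), ('42', 'Y200', 1)]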

Charlton ends his article by recommending different styles of data model for different categories of data. This is useful practical advice for developers, but leaves open the architectural question of joined-up data: how to manage business adaptability and data integrity across these different styles.