Predictably Poor MetaData Quality

Big huge disclaimer — this post’s title is a play on Jim Harris’s excellent post titled “Predictably Poor Data Quality.”

In fact, it’s Jim’s post that started me thinking about whether data and metadata quality issues stem from the same root source.

I work with a product that, among other things, infers table relationships and proposes ER diagrams based on statistical patterns found in the data.  We analyze the actual data values, rather than documentation and naming conventions, because the truth is often in the data.  (Of course, data quality issues add noise to that truth, but that’s a topic for another day.)
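For the curious, that style of inference can be sketched in a few lines. This is a minimal, hypothetical illustration only — the table names, the 95% threshold, and the simple value-containment metric are my own invention, not the product’s actual algorithm:

```python
# Hypothetical sketch: propose foreign-key candidates by checking how many
# distinct values in one column also appear in another (inclusion dependency).

def containment(child_values, parent_values):
    """Fraction of distinct child values that also appear in the parent column."""
    child, parent = set(child_values), set(parent_values)
    if not child:
        return 0.0
    return len(child & parent) / len(child)

def candidate_foreign_keys(tables, threshold=0.95):
    """Propose (child.column -> parent.column) pairs whose value overlap
    exceeds the threshold. These are hypotheses to validate, not facts."""
    candidates = []
    for child_name, child_cols in tables.items():
        for parent_name, parent_cols in tables.items():
            if child_name == parent_name:
                continue
            for c_col, c_vals in child_cols.items():
                for p_col, p_vals in parent_cols.items():
                    score = containment(c_vals, p_vals)
                    if score >= threshold:
                        candidates.append(
                            (child_name, c_col, parent_name, p_col, score))
    return candidates

# Invented example data: orders.customer_id looks like a FK to customers.id.
tables = {
    "orders":    {"id": [1, 2, 3], "customer_id": [10, 11, 10]},
    "customers": {"id": [10, 11, 12]},
}
print(candidate_foreign_keys(tables))
```

Real tools weigh in much more statistical evidence (data types, cardinality, null rates), but the core idea is the same: the values themselves suggest the relationships.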

Why do we analyze the data?  Because database documentation, ER diagrams, naming conventions and subject matter experts’ memories are usually incomplete, if not errant or flat-out missing. 

In other words, this solution exists because organizations have predictably poor metadata.

As with data quality, solving metadata quality means addressing the root problems.  Technology solutions can help, but will never resolve the challenge in and of themselves. 

I’d argue that similar human motivations come into play for metadata quality as with data quality.  For example:

  • short-term thinking
  • believing documentation is someone else’s role
  • empire-building
  • lack of understanding of the impact
  • task overload

Jim lists some great behavior data quality readings.  I think they’ll be worth a re-read from a metadata perspective.

What behaviors do you see as root causes for poor metadata quality?

About Beth

I'm interested in all things data. Big data, data quality, database administration, data architecture. Did I mention I love working with data?


10 Responses to “Predictably Poor MetaData Quality”

  1. Henrik Liliendahl Says:

    Nice one Beth.

I’m working in Europe, where something as simple as language is a problem. Is the label in English, French, German, Danish, or a misspelled combination?


  2. Beth Breidenbach Says:

Excellent point! I don’t trust column names even when they’re designed in a single language, let alone more than one.


  3. John Owens Says:

    Nice post, Beth.

    I know that relationships can be inferred from looking at data, but these are conjectural and may well be incorrect.

The only sure way of knowing whether a relationship is valid is to look at what functions create and update the data, because it is at this point that relationships are created.

    Relationships are also determined by worldview. However, once you know all functions included in the worldview, you will know what relationships are required and valid.

    Regards
    John


  4. Beth Breidenbach Says:

Nice comment, John. We are agreed that human judgement is required to determine the final ‘truth’ of the inferred relationships. Data analysis presents inferences — hypotheses, if you will — that would typically take a human 5-10x longer to identify manually.

    Using existing code analysis to infer relationships does indeed work for current application logic. If that is the goal and if you have fast and available access to the codebase it’s a viable approach.

    I often work with customers whose applications have evolved over time — and the meanings of columns and their relationships evolved as well. Data analysis picks up those variants and gives me a more complete picture.

    In all cases, regardless of how the relationships are inferred, a focused conversation with the subject matter expert is a necessity to validate those inferences.

    Regards,
    Beth


  5. Jim Harris Says:

    Great post Beth,

    Thanks for referencing my post and extending the discussion to include metadata.

Earlier in the year, I wrote a post called “What’s the Meta with your Data?” that asked how you define metadata, how you use metadata, and what the relationship is between metadata and data quality:

    http://www.dataflux.com/dfblog/?p=1624

    The comments on the post provided quite a few different (and all great) perspectives.

    Best Regards,

    Jim


  6. Greg Sharrow Says:

    Beth,

Great thoughts. I also see business requirements that change slowly over time impacting metadata quality. Ten minor changes over a long period of time end up being one big change that alters the nature of the application.


  7. Wo Ai Ni Says:

    Great post on data quality..
    cheers :)


  8. press Says:

    This comment has been removed by a blog administrator.


  9. Beth Breidenbach Says:

I’ve decided to remove an “event invite” comment received today. It was quite long, and the Blogger auto-filter originally classified it as spam. It wasn’t spam (in my opinion), but it was definitely more of an advertisement than I’m comfortable with.

    My compromise is to post an announcement of the event, but not the whole 600-word original doc:

    There’s an Informatica-hosted seminar regarding Single View of the Customer and Customer MDM that may be of interest to my readers. It appears to be scheduled in Atlanta and Washington D.C. More information may be found here:
    http://vip.informatica.com/?elqPURLPage=8645

    Full disclaimers: I am an IBM employee. Informatica is one of our competitors. My decisions regarding what I do and do not post on this blog are totally my own, are never influenced by my employer, and do not represent the views or positions of my employer.


  10. Palavitsinis Nikos Says:

    Hello to all! Nice post! I am glad that I spent some extra time googling to retrieve this post, even by chance! I am working on a number of EU projects related to digital repositories, focused variously on learning, culture, or strictly science. My view so far, and this is something I am pursuing at the PhD level for the time being, is that poor quality “happens”:
    - Because of poor training of the people providing the metadata, and I am talking about domain experts (pedagogues, scientists, etc.), not librarians, as more and more domain experts are involved in metadata annotation nowadays.
    - Because of poor design of the whole process: most of the efforts in these areas focus on the quantity of content rather than its quality per se, or the quality of the metadata that describe the content. When retrieval services then fail to work efficiently, we try to compensate for something that should have been among our primary concerns. Not pointing fingers or anything, but I feel this is one aspect of the whole topic that has something to do with poor metadata quality.

    Reply
