Assessing Data Quality – Improve vs. Maintain

April 16, 2010
Filed under: English 

For German readers: A German version of this blog entry is available.

Last week, I was discussing how to measure Data Quality with a customer. For a while it seemed we couldn’t agree on anything, until we realized that we were talking about different types of DQ projects:

  1. A project aimed at improving data quality in a specific area
  2. An ongoing effort to make sure data quality stays within accepted levels


Once we talked about these different types, agreement came very easily.

Improving Data Quality

In this type of project, there is an important business reason that requires improving the data quality. Typically, you start with a large number of errors and have to reach a much improved level. In some cases, this level has to be zero, but typically a low number (say, 10) of error cases is acceptable. Examples of this type of DQ project include meeting regulatory requirements or migrating data to another system.

This is a project in the strict sense: You have to reach your goal by a fixed date. As is often the case these days, the goal has to be clarified after the project has started. For a data quality project this includes identifying the important data areas to be improved, defining rules that the data has to conform to, and establishing a way of identifying non-complying data. When this step is completed, you end up with a number of DQ Measurands (see my previous post on Describing DQ Measurands) and an automated way of measuring the data quality for each specific measurand. Typical projects I’ve worked on had a list of 20 to about 100 measurands that changed a bit over time but was relatively stable after the initial definition phase.
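To make the idea of an automated measurement per measurand more concrete, here is a minimal sketch in Python. It is not the tooling used in any of the projects mentioned; the `Measurand` class, the two rules and the sample records are all hypothetical and invented for illustration. The assumption is simply that a measurand pairs a name with a rule, and that measuring means counting the records that break the rule.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict  # hypothetical: one row of business data


@dataclass
class Measurand:
    """A named rule plus a way to flag records that break it (illustrative only)."""
    name: str
    is_compliant: Callable[[Record], bool]

    def measure(self, records: list[Record]) -> int:
        """Return the number of records that violate the rule."""
        return sum(1 for record in records if not self.is_compliant(record))


# Hypothetical example rules, not taken from a real project.
measurands = [
    Measurand("customer_has_country", lambda r: bool(r.get("country"))),
    Measurand("birth_year_plausible", lambda r: 1900 <= r.get("birth_year", 0) <= 2010),
]

# Made-up sample data: the second record lacks a country,
# the third has an implausible birth year.
records = [
    {"country": "DE", "birth_year": 1975},
    {"country": "", "birth_year": 1980},
    {"country": "FR", "birth_year": 1800},
]

# One measurement per measurand: the count of non-complying records.
error_counts = {m.name: m.measure(records) for m in measurands}
print(error_counts)
```

Keeping the rule separate from the measurement run means the list of measurands can change a bit over time without touching the code that produces the counts.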

The main questions that have to be answered in this type of DQ project are:

  • Which issues have been raised and which have been resolved? Which do we still have to work on?
  • Are we on track to reach an acceptable level of Data Quality by the end date?
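One possible way to approach the second question is to extrapolate the error counts measured so far to the end date. The sketch below assumes a simple straight-line trend between the first and the latest measurement, which is a deliberately naive choice; real projects rarely progress linearly. The function names, dates and counts are hypothetical, and the acceptable level of 10 is the example figure used earlier in this post.

```python
from datetime import date


def projected_errors(history: list[tuple[date, int]], end_date: date) -> float:
    """Extrapolate the error count to end_date along the straight line
    through the first and the latest measurement (a naive trend)."""
    (first_day, first_count), (last_day, last_count) = history[0], history[-1]
    elapsed_days = (last_day - first_day).days
    if elapsed_days <= 0:
        raise ValueError("need two measurements on different days")
    slope = (last_count - first_count) / elapsed_days  # change in errors per day
    remaining_days = (end_date - last_day).days
    # An error count cannot drop below zero.
    return max(0.0, last_count + slope * remaining_days)


def on_track(history: list[tuple[date, int]], end_date: date, acceptable_errors: int) -> bool:
    """True if the projected error count at end_date is within the acceptable level."""
    return projected_errors(history, end_date) <= acceptable_errors


# Made-up measurement history for a single measurand.
history = [
    (date(2010, 1, 4), 5000),
    (date(2010, 2, 1), 3600),
    (date(2010, 3, 1), 2200),
]

print(on_track(history, date(2010, 4, 30), acceptable_errors=10))
print(on_track(history, date(2010, 4, 1), acceptable_errors=10))
```

With these made-up numbers, the later end date is reachable at the current pace, while the earlier one is not.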

Maintaining Data Quality

In contrast to the “Improvement” type of project, a “Maintain” type does not necessarily have an end date but is an ongoing effort. (It may start towards the end of an improvement project when most issues are resolved and should stay that way until the project ends.)

Most of the definition work has already been done by improvement projects, and the maintain project “inherits” these results. Again, the number of DQ measurands may be quite high – even higher than in an improvement project, as over time the rules of multiple improvement projects move into maintenance. The data quality is usually at an acceptable level, so the questions are different:

  • Have there been changes that require action?
  • How well do the rules cover all the data in the organization?
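Both questions can be turned into simple checks. The following sketch is only one possible reading: it assumes that "requires action" means the error count has left the accepted range or has risen by more than a tolerated amount since the previous measurement, and that "coverage" means the share of known data fields checked by at least one rule. The function names, parameters and numbers are hypothetical.

```python
def needs_action(previous: int, current: int, accepted_max: int, tolerated_rise: int = 0) -> bool:
    """Flag a measurand whose error count exceeds the accepted level
    or has risen by more than tolerated_rise since the last measurement."""
    return current > accepted_max or (current - previous) > tolerated_rise


def rule_coverage(all_fields: set[str], fields_checked_by_rules: set[str]) -> float:
    """Share of the known data fields that at least one rule checks."""
    if not all_fields:
        return 0.0
    return len(all_fields & fields_checked_by_rules) / len(all_fields)


# Made-up example: errors jumped from 3 to 12 while 10 is the accepted maximum.
print(needs_action(previous=3, current=12, accepted_max=10))

# Made-up example: rules check two of the four known fields.
print(rule_coverage({"name", "country", "birth_year", "email"}, {"country", "birth_year"}))
```

Counting fields is a crude measure of coverage; weighting fields by business importance would be an equally valid choice.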

Assessing DQ Measurements in different types of DQ projects

A DQ measurand can be defined without having to take into account what type of project it is used for. Interpreting the measurements, however, depends on the project context: the same measurement leads to different conclusions depending on which of the questions above it has to answer. My customer and I are still working on the specifics, but identifying the different types of projects helped us gain a shared understanding.