Assessing Data Quality – Improve vs. Maintain

April 16, 2010
Filed under: English 

For German readers: A German version of this blog entry is available.

Last week, I was discussing measuring Data Quality with a customer. For a while it seemed we couldn’t agree on anything, until we realized that we were talking about different types of DQ projects:

  1. A project geared at improving data quality in a specific area
  2. An ongoing effort to make sure data quality stays within accepted levels


Once we talked about these different types, agreement came very easily.

Improving Data Quality

In this type of project, there is an important business reason that requires improving the data quality. Typically, you start with a large number of errors and have to reach a much improved level. In some cases, this level has to be zero, but typically a low number (say, 10) of error cases is acceptable. Examples of this type of DQ project include meeting regulatory requirements or migrating data to another system.

This is a project in the strict sense: you have to reach your goal by a fixed date. As is often the case these days, the goal has to be clarified after the project has started. For a data quality project this includes identifying important data areas to be improved, defining rules that the data has to conform to, and finding a way of identifying non-complying data. When this step is completed, you end up with a number of DQ Measurands (see my previous post on Describing DQ Measurands) and an automated way of measuring the data quality for each specific measurand. Typical projects I’ve worked on had a list of 20 to about 100 measurands that changed a bit over time but was relatively stable after the initial definition phase.

The main questions that have to be answered in this type of DQ project are:

  • Which issues have been raised and which have been resolved? Which do we still have to work on?
  • Are we on track to reach an acceptable level of Data Quality by the end date?
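The second question can be approximated with a very simple projection. The following is only a sketch under stated assumptions: the function names are hypothetical, the error count is assumed to fall at a constant daily rate between the first and the latest measurement, and the acceptable level of 10 is the example value mentioned above. Real projects rarely improve linearly, so treat the result as a rough indicator.

```python
from datetime import date

def projected_errors(history, end_date):
    """Extrapolate the error count linearly to end_date.

    history: list of (date, error_count), sorted by date, at least two entries.
    """
    (first_day, first_count), (last_day, last_count) = history[0], history[-1]
    # Change in errors per day; negative means the data is improving.
    rate = (last_count - first_count) / (last_day - first_day).days
    remaining_days = (end_date - last_day).days
    return max(0.0, last_count + rate * remaining_days)

def on_track(history, end_date, acceptable=10):
    """True if the linear projection reaches the acceptable level in time."""
    return projected_errors(history, end_date) <= acceptable

# Illustrative numbers only, not taken from a real project.
history = [(date(2010, 1, 1), 1000), (date(2010, 1, 31), 700)]
print(on_track(history, date(2010, 4, 1)))
print(on_track(history, date(2010, 4, 11)))
```

With these sample numbers the count drops by 10 errors per day, so the April 1 deadline is missed and the April 11 deadline is met.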

Maintaining Data Quality

In contrast to the “Improvement” type of project, a “Maintain” type does not necessarily have an end date but is an ongoing effort. (It may start towards the end of an improvement project when most issues are resolved and should stay that way until the project ends.)

Most of the definition work has already been done by improvement projects, and the maintain project “inherits” these results. Again, the number of DQ measurands may be quite high – even higher than in an improvement project, as over time the rules of multiple improvement projects move into maintenance. The data quality is usually at an acceptable level, so the questions are of a different type:

  • Have there been changes that require action?
  • How well do the rules cover all the data in the organization?
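For the first question, a minimal sketch of an automated check could compare the latest count of each measurand with its accepted level. The function name, the dictionary layout and the sample numbers are assumptions made for illustration only.

```python
def measurands_needing_action(current, accepted):
    """Return (name, count, limit) for every measurand above its accepted level."""
    flagged = []
    for name, count in sorted(current.items()):
        # Assumption: a measurand without an agreed level tolerates no errors.
        limit = accepted.get(name, 0)
        if count > limit:
            flagged.append((name, count, limit))
    return flagged

# Illustrative numbers only.
current = {"Partners without a postal address": 14,
           "Potential partner duplicates": 3}
accepted = {"Partners without a postal address": 10,
            "Potential partner duplicates": 5}
for name, count, limit in measurands_needing_action(current, accepted):
    print(f"{name}: {count} errors (accepted: {limit})")
```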

Assessing DQ Measurements in different types of DQ projects

A DQ measurand can be defined without taking into account what type of project it is used for. Interpreting the measurements, however, has to take the project context into account, because each type of project asks different questions of the same numbers. My customer and I are still working on the specifics, but identifying the different types of projects helped us gain a shared understanding.

Specifying DQ Measurands: An Example

February 10, 2010
Filed under: English 

For German readers: A German version of this article is available.

In the last blog post, the elements of a DQ Measurand specification were discussed. These elements will be illustrated with an example. 

1. Name

The following DQ Measurand will be detailed:

  • Measurand 1: Partners without a postal address

Some other, simple DQ Measurands could be the following:

  • Measurand 2: Potential partner duplicates
  • Measurand 3: Active home finance loans, where there is no property value for the real estate object used as collateral

2. DQ Rule

Business partners (records in the table BUT000) should be connected to an address (table ADRC) via the table BUT021_FS (with address kind BUT021_FS-ADRVERW = "Postal"). This measurand consists of partners that have no link in BUT021_FS or no valid record there (a record is valid when the current date lies between DATE_FROM and DATE_TO). The rule applies only to partners that have taken out a loan. (For these partners, there is a record in the table VDGPO with SNUMOBJ = "VD" and ROLETYP = "TR0100".)
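To show how such a rule translates into an automated check, here is a minimal sketch using an in-memory SQLite database. It is not the SAP implementation: the key column PARTNER in all three tables, the storage of dates as ISO strings and the demo rows are assumptions for illustration; only the table names and the columns ADRVERW, DATE_FROM, DATE_TO, SNUMOBJ and ROLETYP come from the rule above.

```python
import sqlite3

def build_demo_db():
    """Create simplified stand-ins for the tables named in the rule (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE BUT000 (PARTNER TEXT PRIMARY KEY);
        CREATE TABLE BUT021_FS (PARTNER TEXT, ADRVERW TEXT,
                                DATE_FROM TEXT, DATE_TO TEXT);
        CREATE TABLE VDGPO (PARTNER TEXT, SNUMOBJ TEXT, ROLETYP TEXT);
        INSERT INTO BUT000 VALUES ('P1'), ('P2'), ('P3'), ('P4');
        INSERT INTO VDGPO VALUES ('P1', 'VD', 'TR0100'),
                                 ('P2', 'VD', 'TR0100'),
                                 ('P3', 'VD', 'TR0100');
        INSERT INTO BUT021_FS VALUES
            ('P1', 'Postal', '2009-01-01', '9999-12-31'),
            ('P3', 'Postal', '2009-01-01', '2010-01-31');
    """)
    return conn

def partners_without_postal_address(conn, today):
    """Return loan partners that have no postal address link valid on `today`."""
    query = """
        SELECT p.PARTNER
        FROM BUT000 AS p
        WHERE EXISTS (          -- the rule only applies to loan partners
                SELECT 1 FROM VDGPO AS v
                WHERE v.PARTNER = p.PARTNER
                  AND v.SNUMOBJ = 'VD' AND v.ROLETYP = 'TR0100')
          AND NOT EXISTS (      -- no postal link that is valid on the given day
                SELECT 1 FROM BUT021_FS AS a
                WHERE a.PARTNER = p.PARTNER
                  AND a.ADRVERW = 'Postal'
                  AND ? BETWEEN a.DATE_FROM AND a.DATE_TO)
        ORDER BY p.PARTNER
    """
    return [row[0] for row in conn.execute(query, (today,))]

conn = build_demo_db()
print(partners_without_postal_address(conn, "2010-02-10"))
```

In the demo data, P2 has no link at all and P3 has an expired link, so both violate the rule. P4 has no address either but is ignored because it has no loan.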

3. Impact of Rule Violations

  • Partners without a valid address cannot be contacted via normal mail channels (e.g. for sales information, event invitations, etc.) – broken process.
  • When a partner without an address is to be dunned, the address must be determined manually – broken process, resulting in additional effort/cost.
  • Location is an important part of rating a customer. The risk analysis of a partner without an address is therefore inaccurate – skewed performance indicator.
  • In Germany (as probably in most countries), money transfers out of the country are highly regulated and must be reported to the authorities. Without a valid address this cannot be properly determined – violation of regulation.

4. Root Causes

The following causes have been identified:

  1. Addresses for a partner are entered in the SAP Business Partner (transactions BUP1 and BUP2). One address for a partner has to be assigned as a “postal address” on the "Addresses" tab. Sometimes, when entering a new partner, users forget to assign a “postal address”.
  2. In addition, the use of an address as a “postal address” is time dependent, i.e. you can enter a start and end date. When it is known that a partner will move in the near future (which is normal when a customer takes out a real estate loan), the “postal address” is assigned with an end date, but without entering the new address which is valid after the end date. When the end date is reached and no new postal address is assigned, the partner will not have a valid postal address.
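The second cause can be detected before it turns into a rule violation. The sketch below lists partners whose current postal address assignment ends within a given number of days without a follow-up assignment. The function name, the tuple layout and the 30-day horizon are assumptions for illustration, not part of the SAP system.

```python
from datetime import date, timedelta

def expiring_without_successor(assignments, today, horizon_days=30):
    """Find partners whose postal address runs out soon with nothing to follow.

    assignments: iterable of (partner, date_from, date_to) postal address links.
    """
    cutoff = today + timedelta(days=horizon_days)
    by_partner = {}
    for partner, date_from, date_to in assignments:
        by_partner.setdefault(partner, []).append((date_from, date_to))

    at_risk = []
    for partner, periods in sorted(by_partner.items()):
        current = [p for p in periods if p[0] <= today <= p[1]]
        if not current:
            continue  # already without a valid address; the measurand reports it
        last_end = max(end for _, end in current)
        if last_end > cutoff:
            continue  # still valid beyond the horizon
        next_day = last_end + timedelta(days=1)
        # A successor starts no later than the day after the current one ends.
        has_successor = any(start <= next_day and end > last_end
                            for start, end in periods)
        if not has_successor:
            at_risk.append(partner)
    return at_risk

# Illustrative data only.
assignments = [
    ("A", date(2009, 1, 1), date(9999, 12, 31)),
    ("B", date(2009, 1, 1), date(2010, 2, 28)),
    ("C", date(2009, 1, 1), date(2010, 2, 28)),
    ("C", date(2010, 3, 1), date(9999, 12, 31)),
]
print(expiring_without_successor(assignments, date(2010, 2, 10)))
```

Only partner B is reported here: A has an open-ended assignment and C already has a new assignment starting the day after the old one ends.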

5. Correction

Depending on the situation and causes as noted above, the following correction procedures shall be applied:

  1. When there is no postal address at all, then a user in Division shall use the SAP transaction BUP2 to assign the default address on the tab "Address list" of a partner as a postal address without a specific end date.
  2. If a postal address was entered but lost its validity, the user shall check whether a new default address has been entered. If that is the case this new default address shall be assigned as a postal address as well.
    If no new default address was entered, the customer shall be contacted to determine the new address, which shall then be entered as described above. If necessary, the default address shall be changed as well.

Describing Data Quality Measurands

February 5, 2010
Filed under: English 

For German readers: A German version of this article is available.

There is more to specifying a Data Quality Measurand than a good name. Five important elements are discussed in this blog post. In a subsequent entry the elements will be illustrated with an example.

1. Name of DQ Measurand

A meaningful, descriptive term simplifies the communication between all the parties involved. It should describe the measurand as briefly but as accurately as possible.

2. Description of the DQ Rule

The description of the data quality rule must be detailed enough that the exact tests are clear and the DQ rule can be implemented with a SQL statement or a program. General terms used (e.g. ‘customer’) must be defined, for example, in a general section of the documentation or in a business glossary.

If possible, the rule is developed in collaboration between the IT and business departments, usually in an iterative fashion. Exceptions to the rule (i.e. situations where a violation of the rule is not a data quality issue) should also be documented.

3. Impact of Rule Violations

In order to prioritize between several DQ problems, it is important to understand the consequences of rule violations. Some general questions to ask:

  • Are there direct costs incurred by noncompliance?
  • Are regulations being violated?
  • Will the DQ problems result in broken processes?
  • Does the DQ problem lead to incorrect or skewed performance indicators?

Qualitative and quantitative results should be described as accurately as possible. For this purpose, it is important to have a good overview of the processes and data of the company. It also requires discussion with the various departments entering and using the data.

In order to obtain management support for the elimination of DQ problems, it is essential to identify the consequences of non-action.

4. Root Causes

If you want to permanently fix the DQ problems, it is necessary to identify the root cause of the DQ problem. The system where the wrong information originates (or necessary entries have been omitted) has to be identified. This sounds simple, but typically you have to backtrack through a number of systems to get to the source of the problem. “Five whys” is a good strategy for this task.

The exact description of the error causes is a valuable aid when defining preventive measures and in case of a recurrence of the DQ problem.

5. Correction

Finally, there is a description of how defective records can be corrected:

  • If data is missing or even wrong, how can you determine the correct values? Can you retrieve these values from the Internet (e.g. wrong ZIP codes) or do you have to check the paper records? Or do you have to contact the customer and ask for the information?
  • Will there be a manual correction (e.g. a user entering data into an ERP system)? Which programs, menus, transactions etc. should be used?
  • Is there a program that “automagically” fixes the problems for large amounts of data?

This information enables a quick response when new records for a known DQ problem show up.
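The five elements can also be kept in a small structured record so that incomplete specifications are easy to spot. This is just one possible sketch: the class name, the field names and the completeness check are hypothetical and not part of any DQ tool.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DQMeasurand:
    """One DQ measurand specification with the five elements described above."""
    name: str                                             # 1. Name
    rule: str = ""                                        # 2. Description of the DQ rule
    impacts: List[str] = field(default_factory=list)      # 3. Impact of rule violations
    root_causes: List[str] = field(default_factory=list)  # 4. Root causes
    corrections: List[str] = field(default_factory=list)  # 5. Correction

    def missing_elements(self):
        """Return the names of the elements that have not been filled in yet."""
        elements = [("name", self.name), ("rule", self.rule),
                    ("impacts", self.impacts),
                    ("root_causes", self.root_causes),
                    ("corrections", self.corrections)]
        return [label for label, value in elements if not value]

draft = DQMeasurand(name="Partners without a postal address")
print(draft.missing_elements())
```

A draft that only has a name reports the other four elements as missing, which gives a simple checklist during the definition phase.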