For German readers: There is a German version of this article.
There is more to specifying a Data Quality Measurand than a good name. Five important elements are discussed in this blog post. In a subsequent entry the elements will be illustrated with an example.
1. Name of DQ Measurand
A meaningful, descriptive term simplifies communication between all the parties involved. It should describe the measurand as briefly but as accurately as possible.
2. Description of the DQ Rule
The description of the data quality rule must be detailed enough that the exact tests are clear and the DQ rule can be implemented with a SQL statement or a program. General terms used (e.g. ‘customer’) must be defined, for example, in a general section of the documentation or in a business glossary.
If possible, the rule is developed in collaboration between the IT and business departments, usually in an iterative fashion. Exceptions to the rule (i.e. situations where a violation of the rule is not a data quality issue) should also be documented.
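To make "detailed enough to implement" more concrete, here is a minimal sketch of how such a rule might look once it is written down as a SQL statement and wrapped in a small Python program. Everything in it is an assumption made for illustration: the `customer` table, its columns, the rule itself ("German customers must have a five-digit ZIP code") and the documented exception (customers outside Germany are not checked).

```python
import sqlite3

# Hypothetical DQ rule: every customer in Germany must have a 5-digit ZIP code.
# Hypothetical documented exception: customers in other countries are not checked.
RULE_SQL = """
SELECT id
FROM customer
WHERE country = 'DE'
  AND (zip IS NULL OR length(zip) <> 5 OR zip GLOB '*[^0-9]*')
ORDER BY id
"""


def find_violations(conn):
    """Return the ids of all customer rows that violate the rule."""
    return [row[0] for row in conn.execute(RULE_SQL)]


def build_demo_db():
    """Create an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, country TEXT, zip TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer (id, country, zip) VALUES (?, ?, ?)",
        [
            (1, "DE", "80331"),  # valid
            (2, "DE", None),     # missing ZIP: violation
            (3, "DE", "8033A"),  # non-numeric ZIP: violation
            (4, "FR", None),     # covered by the exception, not a violation
        ],
    )
    return conn


if __name__ == "__main__":
    print(find_violations(build_demo_db()))
```

Note that the exception is part of the `WHERE` clause. If it were only mentioned in a meeting and never documented, row 4 would show up as a false alarm.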
3. Impact of Rule Violations
In order to prioritize between several DQ problems, it is important to understand the consequences of rule violations. Some general questions to ask:
- Are there direct costs incurred by noncompliance?
- Are regulations being violated?
- Will the DQ problems result in broken processes?
- Does the DQ problem lead to incorrect or skewed performance indicators?
Qualitative and quantitative consequences should be described as accurately as possible. For this purpose, it is important to have a good overview of the processes and data of the company. This also requires discussion with the various departments entering and using the data.
In order to obtain management support for the elimination of DQ problems, it is essential to identify the consequences of non-action.
4. Root Causes
To fix DQ problems permanently, you have to find their root cause: the system where the wrong information originates (or where necessary entries have been omitted). This sounds simple, but typically you have to backtrack through a number of systems to get to the source of the problem. “Five whys” is a good strategy for this task.
An exact description of the root causes is a valuable aid when defining preventive measures and when the DQ problem recurs.
5. Correction
Finally, there is a description of how defective records can be corrected:
- If data is missing or even wrong, how can you determine the correct values? Can you retrieve these values from the Internet (e.g. wrong ZIP codes) or do you have to check the paper records? Or do you have to contact the customer and ask for the information?
- Will there be a manual correction (e.g. a user entering data into an ERP system)? Which programs, menus, transactions etc. should be used?
- Is there a program that “automagically” fixes the problems for large amounts of data?
This information enables a quick response when new records for a known DQ problem show up.
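As a rough illustration of the automated case, the following sketch corrects the records it can and hands the rest over for manual correction. It assumes that the defective records have already been identified by the DQ rule, and it uses a made-up reference table that maps a city to a single ZIP code (a deliberate simplification, since real cities usually have many). The names `ZIP_BY_CITY` and `correct_zip_codes` are hypothetical.

```python
# Hypothetical reference data: city name -> correct ZIP code (simplified).
ZIP_BY_CITY = {"Munich": "80331", "Hamburg": "20095"}


def correct_zip_codes(defective_records, reference=ZIP_BY_CITY):
    """Split defective records into auto-corrected and manual-review lists.

    The input records are copied, never modified in place.
    """
    corrected, manual = [], []
    for record in defective_records:
        expected = reference.get(record["city"])
        if expected is None:
            # No trusted source for the value: someone has to check
            # the paper records or contact the customer.
            manual.append(dict(record))
        else:
            corrected.append({**record, "zip": expected})
    return corrected, manual


if __name__ == "__main__":
    defective = [
        {"id": 2, "city": "Munich", "zip": None},
        {"id": 3, "city": "Atlantis", "zip": "8033A"},
    ]
    fixed, needs_review = correct_zip_codes(defective)
    print(fixed)
    print(needs_review)
```

Keeping the manual-review list separate matters: an automatic fix should only touch records for which a trusted source exists.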