Duplicate Contacts – Where do they come from? (Or: McFly, you’re a slacker)

When dealing with data quality issues such as duplicates, people usually focus on the question “How do I get rid of the duplicate records?” While this is important, removing the duplicates does not remove the cause of the problem, and it usually leads to ongoing or recurring clean-up efforts. If you really want to resolve a data quality issue, you have to ask: “Where do the duplicates come from?”

As with almost all data quality issues, there are easy answers to this question:

“McFly, you’re a slacker”
(image from http://images4.wikia.nocookie.net)

I call this the “Strickland Explanation”, after the character who needles Marty.

Or, a bit less harsh:

“I don’t have time!”
(image from http://images.forum-auto.com/)

While these explanations may be true in some cases, they are not very helpful. They insult the people who enter the data, making them much less willing to help you resolve the problem. They also show that you haven’t thought about the problem or discussed it with the users, as there are always other explanations that take a bit of effort to unearth.

Here’s a list of issues that I have found lead to duplicates in address books:

  1. Technical Limitations
    Older address book programs were limited in the number of fields available. A typical issue is that you could store only one email address or phone number per record. If you wanted to store multiple phone numbers for a person (e.g. the home number, the work number and the cell phone), you had to create multiple records with the same name but different details. This is especially true for the address books of older phones.
  2. Synching Gone Wrong
    Synching is a surprisingly difficult problem, and almost everyone has their own horror stories of synchs gone wrong. Typical issues are an extra copy of each record, information showing up in unexpected places (for example, a zip code stored in a phone number field) or information being lost during a synch (one of my pet peeves is the birthdate).
  3. Information Hoarders
    Some email programs had the option of storing the email address of everyone who sent you an email in your address book, resulting in a large number of “sparse” records (records that consist of only an email address or just a phone number, without a proper name). The sheer number of these records also makes it hard to figure out whether multiple records belong to the same contact or whether you already have a record for a person.

I’m sure that there are even more causes of duplicate contacts. Note that these causes call for completely different approaches than the “Strickland Explanation” suggests. I will take a closer look at them in a future post.

Where do the duplicates in your address book come from?

If you want to find duplicate contacts, please give my iPhone app “SmarterContacts” a try. You can find it on the App Store. Please let me know if you identify other causes of duplicates so I can update this post and add functionality to my app.

