Sometimes repeating yourself is redundant.

Sometimes repeating yourself is redundant. The same is true for your data.

After working in several industries, I have found that any company is only as good as its records. According to CIO Magazine, a duplicate rate of two percent in your database is within the realm of acceptable. Once a database reaches five percent, however, it is in the danger zone. Once your database is in the “danger zone,” reports become misleading and data updates get lost.
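To see where a database falls relative to those figures, a rough duplicate rate can be computed as the share of records that repeat an earlier record. The sketch below is only an illustration: the field names, the `duplicate_rate` and `classify` helpers, the matching key (normalized name plus email), and the "watch" label for the range between two and five percent are all assumptions made for this example, not something CIO Magazine prescribes.

```python
from collections import Counter

def match_key(record):
    """Hypothetical matching key: normalized first name, last name, and email."""
    return tuple(record[f].strip().lower()
                 for f in ("first_name", "last_name", "email"))

def duplicate_rate(records):
    """Return the fraction of records that repeat another record's key."""
    if not records:
        return 0.0
    counts = Counter(match_key(r) for r in records)
    # Every copy beyond the first one for a given key counts as a duplicate.
    extras = sum(n - 1 for n in counts.values())
    return extras / len(records)

def classify(rate):
    """Label a rate using the two and five percent figures quoted above."""
    if rate <= 0.02:
        return "acceptable"
    if rate < 0.05:
        return "watch"  # assumed label for the in-between range
    return "danger zone"
```

A stricter or looser matching key will change the number, so the rate is only as meaningful as the rule behind it.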

One might ask, how do duplicates happen? To name a few reasons: people registering under several different emails, people with formal full names using a nickname (e.g. Steve vs. Steven), spelling errors, and various other causes that have the same effect. You can read our blog Deduplicating the Duplicated Duplicates for more on this topic.

CIO Magazine recommends using de-duping software and zooming in on exactly what is causing you to have too much of that good data.

There are some companies that still rely on humans to do their de-duping. For example, a company I worked for that handled medical records relied solely on humans to fill out paperwork every time a duplicate was found. This paperwork was then put into a pipeline of bureaucracy in the hope that the duplicate got fixed and the process did not need to be repeated.

I believe there is a happy medium between these two solutions. Not every company can afford “cutting edge software,” and clearly leaving a paper trail to get rid of a duplicate in a computer database is impractical. Stay under five percent, my friends.


Deduplicating the Duplicated Duplicates

One of the challenges that all CRM administrators face is how to manage duplicates. In today's world data comes from many sources. Multiple people are entering data, and that data can include emails, physical addresses, and phone numbers that may be temporary. Information on individuals and companies is often heavily duplicated, or similar without being a true duplicate, and more, making deduplication a virtual nightmare.

Similar Person from CiviCRM

Above is a common example. Bill Gates is already in our CRM. Bill Gates gives his card to another member of the staff, and so someone tries to enter Bill Gates again, but this time as William Gates. In the example above, CiviCRM, a CRM for nonprofits, identified this as a possible duplicate and recommended that a new record not be created.
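A warning like this can be approximated by comparing names after mapping nicknames to a formal form. The following is a minimal sketch and not CiviCRM's actual matching logic: the tiny `NICKNAMES` table, the field names, and the `possible_duplicates` helper are hypothetical and exist only for illustration.

```python
# Hypothetical nickname table; a real one would be far larger.
NICKNAMES = {"bill": "william", "steve": "steven", "bob": "robert"}

def canonical_first(name):
    """Lower-case a first name and map a known nickname to its formal form."""
    cleaned = name.strip().lower()
    return NICKNAMES.get(cleaned, cleaned)

def possible_duplicates(new_record, existing_records):
    """Return existing records a person should review before saving a new one."""
    new_first = canonical_first(new_record["first_name"])
    new_last = new_record["last_name"].strip().lower()
    return [
        r for r in existing_records
        if canonical_first(r["first_name"]) == new_first
        and r["last_name"].strip().lower() == new_last
    ]
```

Because this compares names only, it is suited to prompting a staff member, not to merging records automatically.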

What gets even harder is when users enter their own data.  Let's look at this example another way.  Bill Gates is in our CRM.  Mr.  Gates wants to come to the Gala for our organization so he comes to the web page to register. Given the formality of the event, he registers as William Gates and uses his personal email rather than his business email that we have on file.  

If we were to merge Bill Gates and William Gates based on just the name, with a different email on each record, then there is a high likelihood that we would merge the wrong people. Bill Gates of Microsoft fame is the most likely answer, but the system is not dealing with celebrity names; it is just matching characters. What about Sarah Smith or Bob Jones?

In general, I like requiring first name, last name, and email address to all match before a record can be automatically merged without human intervention. We can also use the same formula with birth date, phone number, or city, state, and zip in place of the email address.

The reason you cannot just use the email address is that married couples often use the same email address. Sam and Jessica Barton can both use the same Gmail, but we often want to have a record for each. Also, you might have two contacts with the same email address, and we do not want to merge these records based just on the email field.
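The rule described above can be sketched as a small function: the name must match, and so must at least one additional piece of identifying information. This is an illustration under stated assumptions. The field names, the `SECOND_FACTORS` list, and the helpers are hypothetical, and the reading that birth date, phone, or city, state, and zip each stand in for the email address is an interpretation of the text.

```python
def norm(value):
    """Lower-case and trim a field; missing values become an empty string."""
    return (value or "").strip().lower()

# Each entry lists the fields that must all match, in addition to the name.
SECOND_FACTORS = [
    ("email",),
    ("birth_date",),
    ("phone",),
    ("city", "state", "zip"),
]

def fields_match(a, b, fields):
    """True when every listed field is non-empty and equal in both records."""
    return all(norm(a.get(f)) and norm(a.get(f)) == norm(b.get(f))
               for f in fields)

def can_auto_merge(a, b):
    """Require a full name match plus at least one matching second factor."""
    if not fields_match(a, b, ("first_name", "last_name")):
        return False
    return any(fields_match(a, b, rule) for rule in SECOND_FACTORS)
```

With this rule, Sam and Jessica Barton sharing one Gmail address are not merged, because the first names differ even though the email matches.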

What Do You Do When You Find a Merge

Duplicates are found in three ways.  

  1. CRM users entering data 
  2. Visitors entering their information 
  3. Automated deduplication  

In the first case, we can trust that the person being told that there is, or might be, a duplicate can make an intelligent decision about how to handle it. Here we can be the most aggressive about alerting the user to potential duplicates and the most passive about automating deduplication.

When visitors enter their own information or we are doing automated deduplication we have to do everything by training the computer.  It is a completely automated computer task. One of the reasons I like CiviCRM is because it makes creating duplicate management rules very easy.  I am also a fan of Demand Tools and Dupeblocker for Salesforce.  

With this software and other software on the market, we tell the software what the rules are, like the ones above, and it then tests new records as they are created to see if they match existing records. If they do, they are merged.
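The check-on-create flow might look roughly like the sketch below. It is not how CiviCRM, Demand Tools, or Dupeblocker work internally. The `is_match` rule (same first name, last name, and email), the fill-in-the-blanks merge policy, and all function and field names are assumptions chosen to keep the example small.

```python
def norm(value):
    """Lower-case and trim a field; missing values become an empty string."""
    return (value or "").strip().lower()

def is_match(a, b):
    """Hypothetical rule: same non-empty first name, last name, and email."""
    return all(norm(a.get(f)) and norm(a.get(f)) == norm(b.get(f))
               for f in ("first_name", "last_name", "email"))

def merge_into(existing, incoming):
    """Fill blanks in the existing record; never overwrite stored values."""
    for field, value in incoming.items():
        if value and not existing.get(field):
            existing[field] = value
    return existing

def save_contact(contacts, incoming):
    """Merge into the first matching record, or append a new one."""
    for existing in contacts:
        if is_match(existing, incoming):
            return merge_into(existing, incoming), "merged"
    contacts.append(dict(incoming))
    return contacts[-1], "created"
```

Keeping the stored values and only filling gaps is one conservative merge policy; a real system would also need a rule for conflicting values.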

With automated deduplication, you apply the same rules to existing records that may not have been deduplicated when they were entered or imported.
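For records already in the database, one simple approach is to group them by the fields a rule requires and look at any group with more than one member. The sketch below assumes a rule of first name, last name, and email. The field names and the `find_duplicate_groups` helper are hypothetical, and records missing any of the three fields are skipped instead of being matched on blanks.

```python
from collections import defaultdict

RULE_FIELDS = ("first_name", "last_name", "email")

def rule_key(record):
    """Build a normalized key from the fields the rule requires."""
    return tuple((record.get(f) or "").strip().lower() for f in RULE_FIELDS)

def find_duplicate_groups(records):
    """Return lists of record positions that share the same rule key."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        key = rule_key(record)
        if all(key):  # skip records missing any part of the key
            groups[key].append(index)
    return [ids for ids in groups.values() if len(ids) > 1]
```

Each returned group can then be merged automatically or queued for review, depending on how much you trust the rule.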

Denver DataMan has participated in deduplicating what must by now be millions of records and we would love to help you clean up your database. 
