Deduplicating the Duplicated Duplicates

One of the challenges every CRM administrator faces is how to manage duplicates. Today, data comes from many sources. Multiple people enter data, and the details they enter, such as emails, physical addresses, and phone numbers, can be temporary. Information on individuals and companies is often heavily duplicated, or similar without being a true duplicate, which makes deduplication a virtual nightmare.

Similar Person from CiviCRM

Above is a common example. Bill Gates is already in our CRM. He gives his card to another member of the staff, and that person tries to enter him again, this time as William Gates. In the example above, CiviCRM, a CRM for nonprofits, identified this as a possible duplicate and recommended that a new record not be created.

What gets even harder is when users enter their own data.  Let's look at this example another way.  Bill Gates is in our CRM.  Mr.  Gates wants to come to the Gala for our organization so he comes to the web page to register. Given the formality of the event, he registers as William Gates and uses his personal email rather than his business email that we have on file.  

If we were to merge Bill Gates and William Gates on the name alone, despite the different email, then there is a high likelihood that we would merge the wrong people. Bill Gates of Microsoft fame is the most likely answer, but the system is not dealing with celebrity names; it is just matching characters. What about Sarah Smith or Bob Jones?

In general, I like to treat a match on first name, last name, and email address as a record that can be merged automatically, without human intervention. We can use the same formula with birth date, phone number, or city, state, and zip in place of the email address.

The reason you cannot use the email address alone is that married couples often share one. Sam and Jessica Barton can both use the same Gmail address, but we usually want a record for each. You might also have two unrelated contacts with the same shared address, and we do not want to merge those records based on the email field alone.
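The rule described above can be sketched in a few lines of Python. This is only an illustration, not how CiviCRM or any other product implements its rules: the `Contact` class, its field names, and the `safe_to_auto_merge` function are all hypothetical, and city, state, and zip are left out for brevity but would be added the same way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contact:
    # Hypothetical record layout, for illustration only.
    first_name: str
    last_name: str
    email: Optional[str] = None
    phone: Optional[str] = None
    birth_date: Optional[str] = None  # e.g. "1955-10-28"


def _norm(value: Optional[str]) -> str:
    """Lowercase and trim so 'Sam ' and 'sam' compare equal."""
    return value.strip().lower() if value else ""


def _digits(value: Optional[str]) -> str:
    """Keep only digits so '(303) 555-0100' equals '303-555-0100'."""
    return "".join(ch for ch in value if ch.isdigit()) if value else ""


def safe_to_auto_merge(a: Contact, b: Contact) -> bool:
    """True only when first AND last name match AND one more field matches."""
    first, last = _norm(a.first_name), _norm(a.last_name)
    if not first or not last:
        return False
    if first != _norm(b.first_name) or last != _norm(b.last_name):
        return False
    # Blank values never count as a match.
    same_email = bool(_norm(a.email)) and _norm(a.email) == _norm(b.email)
    same_phone = bool(_digits(a.phone)) and _digits(a.phone) == _digits(b.phone)
    same_birth = bool(_norm(a.birth_date)) and _norm(a.birth_date) == _norm(b.birth_date)
    return same_email or same_phone or same_birth
```

Under this sketch, Sam and Jessica Barton sharing one Gmail address are not merged because their first names differ, and Bill Gates and William Gates with different emails are not merged either. Both cases are left for a person to review.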

What Do You Do When You Find a Merge

Duplicates are found in three ways.  

  1. CRM users entering data 
  2. Visitors entering their information 
  3. Automated deduplication  

In the first case, we can trust that the person who is told there is, or might be, a duplicate can make an intelligent decision about how to handle it. Here we can be the most aggressive about alerting the user to potential duplicates and the most passive about automating deduplication.
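As a rough illustration of what an "aggressive" alert for staff might look like, the sketch below flags names that are merely similar, so a person can decide. It uses Python's standard `difflib` module; the function name and the 0.75 threshold are assumptions chosen for this example, not settings taken from CiviCRM or any other product.

```python
from difflib import SequenceMatcher


def possible_duplicate(name_a: str, name_b: str, threshold: float = 0.75) -> bool:
    """Flag two full names as worth a human look when they are similar enough.

    The threshold is a hypothetical value; a lower number warns more often.
    """
    a = name_a.strip().lower()
    b = name_b.strip().lower()
    return SequenceMatcher(None, a, b).ratio() >= threshold


# A warning only prompts the person entering data; nothing is merged here.
if possible_duplicate("Bill Gates", "William Gates"):
    print("Possible duplicate found. Review before creating a new record.")
```

Because a person makes the final call, a loose threshold costs little here. The same looseness would be risky in a rule that merges records on its own.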

When visitors enter their own information or we are doing automated deduplication we have to do everything by training the computer.  It is a completely automated computer task. One of the reasons I like CiviCRM is because it makes creating duplicate management rules very easy.  I am also a fan of Demand Tools and Dupeblocker for Salesforce.  

With this software and others on the market, we tell the software what the rule is, like the ones above, and it then tests new records as they are created to see whether they match existing records. If they do, they are merged.

With automated deduplication, you apply the same rules to existing records that may not have been deduplicated when they were entered or imported.
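A batch pass over existing records might look something like the sketch below, which groups records by first name, last name, and email and reports every group with more than one member. The dictionary keys (`id`, `first_name`, `last_name`, `email`) and the function name are assumptions made for the example; a real CRM export will have its own field names.

```python
from collections import defaultdict


def find_duplicate_groups(records):
    """Return lists of record ids that share first name, last name, and email."""
    groups = defaultdict(list)
    for rec in records:
        first = (rec.get("first_name") or "").strip().lower()
        last = (rec.get("last_name") or "").strip().lower()
        email = (rec.get("email") or "").strip().lower()
        # Skip records missing any part of the key; blanks must never match.
        if not (first and last and email):
            continue
        groups[(first, last, email)].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


existing = [
    {"id": 1, "first_name": "Sam", "last_name": "Barton", "email": "bartons@example.com"},
    {"id": 2, "first_name": "Jessica", "last_name": "Barton", "email": "bartons@example.com"},
    {"id": 3, "first_name": "sam ", "last_name": "Barton", "email": "Bartons@example.com"},
]
print(find_duplicate_groups(existing))  # [[1, 3]]
```

Grouping on a normalized key avoids comparing every record against every other record, which matters once the database grows large.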

Denver DataMan has participated in deduplicating what must by now be millions of records and we would love to help you clean up your database. 
