Screening for Name Variations

One of the most common challenges faced when striking a balance between the effectiveness and efficiency of sanctions screening systems is how to deal with common name variations. Whilst simple distance-based measures are typically used to identify matches on family names, they are usually not enough for first names.

Filters use a variety of strategies to deal with this problem:

  • Add aliases with alternative spellings and/or nicknames/variants
  • Reduce significance of first name match score
  • Employ specialised name substitutions within the filter’s matching algorithm

Whilst it’s common practice to suppress hits against names where the first name is absent, specialised initial-matching functionality can often produce questionable results. For example, consider these two submissions for the name “Hala Mohammed Al Nasser”:

  • Hala Muamad Al Nasser
  • H.M.A. Nasser

One would very likely want the first of these to hit, but not the second. However, some naive initial-matching algorithms will classify the second as a good match, even though initialising “Al” is not a realistic scenario. On the other hand, a simple distance measure for “Muamad” vs “Mohammed” gives an edit distance of four, which may push the first submission outside the fuzzy match threshold.
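Both failure modes can be reproduced in a few lines. The following is a minimal sketch, not any vendor's algorithm. `initials_match` is a hypothetical token-by-token comparison in which a single letter may stand for a whole token. The `PARTICLES` set is an assumed, illustrative list of particles such as “Al” that should not be initialised. `levenshtein` is the standard unit-cost edit distance.

```python
# Hypothetical list of name particles that are unlikely to be initialised.
# A real list would need to reflect the naming conventions in your data.
PARTICLES = {"al", "el", "bin", "ibn", "abu"}


def levenshtein(a: str, b: str) -> int:
    """Case-insensitive Levenshtein distance (unit-cost insert, delete, substitute)."""
    a, b = a.lower(), b.lower()
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def tokens(name: str) -> list[str]:
    """Lower-case a name and split it on spaces and full stops."""
    return name.replace(".", " ").lower().split()


def initials_match(list_name: str, submitted: str, allow_particles: bool) -> bool:
    """Hypothetical initial matcher: a one-letter token matches any list token
    starting with that letter; other tokens must match exactly. When
    allow_particles is False, a particle such as "al" cannot be initialised."""
    lt, st = tokens(list_name), tokens(submitted)
    if len(lt) != len(st):
        return False
    for full, sub in zip(lt, st):
        if len(sub) == 1:
            if not full.startswith(sub):
                return False
            if not allow_particles and full in PARTICLES:
                return False
        elif sub != full:
            return False
    return True


target = "Hala Mohammed Al Nasser"
print(initials_match(target, "H.M.A. Nasser", allow_particles=True))   # True: naive hit
print(initials_match(target, "H.M.A. Nasser", allow_particles=False))  # False
print(levenshtein("Muamad", "Mohammed"))                                # 4
```

Refusing to initialise particles removes the questionable hit. It does nothing, however, for “Muamad”, which an exact token comparison rejects and a plain edit distance scores at four.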

One possible solution to this issue is to “tokenise” these names as part of the filter’s normalisation process. In essence, this means converting every known spelling variation of a first name to a single canonical name, in both the target list and the names presented for screening.
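As a rough illustration of tokenisation, the sketch below maps every known variant to one canonical token during normalisation. The same function is applied to list entries and to screened names. `NAME_SYNONYMS` is a hypothetical, deliberately tiny table; a real one would be built from the lists and your risk appetite. Applying it to every token, not only the first, is also an assumption here, since names such as Mohammed can appear in any position.

```python
# Hypothetical synonym table: each known spelling maps to one canonical token.
NAME_SYNONYMS = {
    "mohammad": "mohammad",
    "mohammed": "mohammad",
    "muhammad": "mohammad",
    "mohamed": "mohammad",
    "muamad": "mohammad",
}


def normalise(name: str) -> list[str]:
    """Lower-case and split a name, then replace any known variant with its
    canonical token (applied to every token, not just the first)."""
    return [NAME_SYNONYMS.get(t, t) for t in name.replace(".", " ").lower().split()]


list_entry = normalise("Hala Mohammed Al Nasser")
submitted = normalise("Hala Muamad Al Nasser")
print(list_entry)               # ['hala', 'mohammad', 'al', 'nasser']
print(list_entry == submitted)  # True
```

After normalisation, the two spellings compare as identical tokens, so the remaining fuzzy matching only has to absorb variation that the table does not cover.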

As an example, consider the 20 spellings of Mohammad on the OFAC/HMT/UN/EU lists. Measuring the similarity (using Levenshtein distance) between each pair of names gives the following table:

The counts of the distances between the 190 possible pairs are as follows:

This shows that roughly half of all the variation pairs have an edit distance of three or more. Relying on your filter’s fuzzy matching capabilities alone may therefore present some risk, depending on the significance of the other components in the name. However, if your filter has synonym logic, the problem may be more readily addressed.
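A pairwise analysis of this kind can be sketched as follows. The five spellings are a small illustrative sample chosen for this example, not the 20 list spellings behind the tables above. The counts it prints will therefore differ from the figures quoted.

```python
from collections import Counter
from itertools import combinations
from math import comb

# Illustrative sample only: NOT the 20 spellings taken from the OFAC/HMT/UN/EU lists.
SPELLINGS = ["Mohammad", "Mohammed", "Muhammad", "Mohamed", "Muhammed"]


def levenshtein(a: str, b: str) -> int:
    """Case-insensitive Levenshtein distance (unit-cost insert, delete, substitute)."""
    a, b = a.lower(), b.lower()
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def distance_counts(names: list[str]) -> Counter:
    """Count how many unordered pairs of names fall at each Levenshtein distance."""
    return Counter(levenshtein(a, b) for a, b in combinations(names, 2))


counts = distance_counts(SPELLINGS)
# n names give n * (n - 1) / 2 unordered pairs, e.g. 20 names give 190 pairs.
assert sum(counts.values()) == comb(len(SPELLINGS), 2)

for distance in sorted(counts):
    print(f"distance {distance}: {counts[distance]} pair(s)")

share = sum(n for d, n in counts.items() if d >= 3) / sum(counts.values())
print(f"share of pairs at distance 3 or more: {share:.0%}")
```

Running the same function over the full set of list spellings is one way to repeat the analysis for other common first names.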

What is the best way to understand the risk in this situation? We suggest taking the following steps to assess and mitigate the potential risk:

  1. Understand your filter’s capabilities in this area
  2. Review your filter’s configuration and verify its operation
  3. Review or create a set of synonyms based on risk appetite
  4. Confirm filter effectiveness and efficiency by comprehensively testing this issue
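Step 4 could start with a harness along the lines sketched below. `toy_filter` is a hypothetical stand-in for your real filter: it hits only if every token is within a fixed edit distance, and the threshold of 2 is an arbitrary assumption. `missed_variants` substitutes each known spelling into a list entry and reports the variants that fail to hit.

```python
from typing import Callable


def levenshtein(a: str, b: str) -> int:
    """Case-insensitive Levenshtein distance (unit-cost insert, delete, substitute)."""
    a, b = a.lower(), b.lower()
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def toy_filter(list_name: str, submitted: str, max_token_distance: int = 2) -> bool:
    """Hypothetical filter: hit if the token counts agree and every token pair
    is within max_token_distance edits."""
    lt, st = list_name.lower().split(), submitted.lower().split()
    return len(lt) == len(st) and all(
        levenshtein(a, b) <= max_token_distance for a, b in zip(lt, st)
    )


def missed_variants(list_name: str, canonical: str, variants: list[str],
                    screen: Callable[[str, str], bool]) -> list[str]:
    """Substitute each variant for the canonical spelling and return those that miss."""
    misses = []
    for variant in variants:
        test_name = list_name.replace(canonical, variant)
        if not screen(list_name, test_name):
            misses.append(variant)
    return misses


print(missed_variants("Hala Mohammed Al Nasser", "Mohammed",
                      ["Mohammad", "Muhammad", "Muamad"], toy_filter))
# ['Muamad']
```

In practice, `screen` would call the production filter. The variant lists would come from the synonym set reviewed in step 3, so that both missed hits (effectiveness) and extra alerts (efficiency) can be measured.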

If you would like support in executing the above programme, contact us at contact@deep-lake.com.

Deep Lake specialises in using advanced analytical techniques and expert business knowledge to provide deeper insight into screening environments. Contact us to find out more about our products and services.