
The concept is probably unfamiliar, but it underpins much of the world’s security—in telecommunications, banking and national security. It’s entity resolution (ER), the technology for consolidating disparate data about someone or something, which isn’t nearly as easy as it sounds.
Modern security and intelligence organisations have critical and sensitive data that they need to review, but it may be siloed, hard to access or manually intensive to bring together. ER is increasingly essential for gathering information across datasets in human, cyber and data-led intelligence operations. With the ever-increasing availability of data, organisations are becoming reliant on systems, including ER, that unlock insights from their existing datasets.
At its core, ER is simple. The system brings together all the information in your data that relates to a single customer, criminal, asset or threat so that you can make informed decisions. ER helps analysts find the needle in the haystack and rapidly builds a complete picture of all relevant information. The market for ER systems is predicted to become a multi-billion-dollar industry in the next decade, with mostly US companies leading the way.
ER is a fundamental enabler of AI and data-driven organisations. Organisations with a lot of data need to know what they know. Equally, those in national security need to know what they don’t know. This enables them to acquire the missing glue data—information that identifies new links between entities and can help join the entities together.
ER is also used by police for mapping organised criminal networks, immigration officers for reviewing your passport and other relevant data, casinos for detecting fraudsters and banks for conducting know-your-customer (KYC) checks to prevent sanctions evasion. Global KYC regulation requires banks to collate information on individuals so that they can conduct due diligence investigations and avoid facilitating criminal activities.
Outside the security domain, ER can enable marketing departments to better understand their customers and help finance teams to identify payments to suppliers. ER works with internal data and can be augmented by external data providers, bringing together relevant information in one view.
Entity-based resolution can also be used to identify links in obscured or deceptive behaviours known as ‘channel separation’, such as the use of a burner phone. This helps track down criminals, fraudsters, terrorists and spies who separate identifiers across different data channels or collection methods.
Basic ER systems rely on record matching, such as linking passport number 1234567 to another record with the same number. Doing this manually is time-consuming and doesn’t scale easily. With millions or billions of records, human investigators can’t do the job, not even with AI or advanced search functions.
In contrast, computers can compare vast numbers of records quickly, but they struggle to intuitively understand which records should be joined. ER tools have rapidly improved thanks to decades of research in data science, computer science, mathematics and behavioural science. The latest, most advanced ER systems have better intuitive understanding. For example, they can decide to merge two passport records only if they share additional attributes, such as the holder name and date of birth or the expiry date.
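The kind of cautious merge rule described above can be sketched in a few lines of Python. This is only an illustration: the `PassportRecord` fields, the `should_merge` function and the `min_supporting` threshold are hypothetical names chosen here, and a real ER system would use far richer evidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassportRecord:
    passport_number: str
    holder_name: str
    date_of_birth: str  # assumed already normalised to YYYY-MM-DD
    expiry_date: str    # assumed already normalised to YYYY-MM-DD


def should_merge(a: PassportRecord, b: PassportRecord, min_supporting: int = 2) -> bool:
    """Merge only when the passport numbers agree AND enough other attributes agree.

    min_supporting is a hypothetical threshold, not a value from any real product.
    """
    if a.passport_number != b.passport_number:
        return False
    # Count how many supporting attributes are identical (True counts as 1).
    supporting = sum([
        a.holder_name.casefold() == b.holder_name.casefold(),
        a.date_of_birth == b.date_of_birth,
        a.expiry_date == b.expiry_date,
    ])
    return supporting >= min_supporting
```

With the default threshold, two records that share a passport number but nothing else would be left unmerged for an analyst to review, rather than joined automatically.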
Consider this example. Should an analyst merge two similar accounts linked to one email address? The first is registered under Bob Smith (date of birth 07/09/1997) and the second to Robert E Smith (date of birth 09/07/1977). It could be the same person or perhaps father and son, given the ambiguity of date format (DD/MM/YYYY or MM/DD/YYYY) and the possibility of input error. By layering more information such as residential address, IP address or device type, ER can refine the resolution process and lessen the analyst’s workload.
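The date ambiguity in this example can be made concrete with a short Python sketch. It assumes only the two formats mentioned above, and the function names are illustrative rather than taken from any ER product.

```python
from datetime import datetime


def candidate_dates(text: str) -> set:
    """Return every calendar date the text could mean under DD/MM/YYYY or MM/DD/YYYY."""
    candidates = set()
    for fmt in ("%d/%m/%Y", "%m/%d/%Y"):
        try:
            candidates.add(datetime.strptime(text, fmt).date())
        except ValueError:
            pass  # not a valid date under this format, e.g. month 13
    return candidates


def dates_could_match(a: str, b: str) -> bool:
    """True if some reading of a and some reading of b are the same date."""
    return bool(candidate_dates(a) & candidate_dates(b))


def day_month_compatible(a: str, b: str) -> bool:
    """True if a and b could share day and month, ignoring the year."""
    days_a = {(d.month, d.day) for d in candidate_dates(a)}
    days_b = {(d.month, d.day) for d in candidate_dates(b)}
    return bool(days_a & days_b)
```

For the two accounts above, `day_month_compatible("07/09/1997", "09/07/1977")` is true while `dates_could_match` is false: the day and month are consistent with one person, but the years differ, which is exactly why extra attributes are needed to choose between a typing error and two relatives.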
The image below shows a simplified processing chain within an ER system, broken down into components of an end-to-end process, with each step refined over the years.

ER processes are based on decades-old mathematical models that use a process called ‘blocking’ to handle large datasets. Blocking selects a subset of similar records (for example, records with similar names or addresses) so that shared characteristics are compared only within a smaller group. More laborious operations can then be used on this subset of data, such as fuzzy name matching that treats ‘Robert’, ‘Bob’ and ‘Rob’ as similar.
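A minimal Python sketch of blocking followed by a more expensive comparison might look like the following. The choice of surname as the blocking key, the tiny `NICKNAMES` table and the record layout (`id`, `given`, `surname`) are all assumptions made for illustration; production systems use much larger name libraries and several blocking keys.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, deliberately tiny nickname table.
NICKNAMES = {"bob": "robert", "rob": "robert"}


def canonical_given_name(name: str) -> str:
    """Map a given name to a canonical form, falling back to the lower-cased name."""
    key = name.strip().casefold()
    return NICKNAMES.get(key, key)


def block_by_surname(records):
    """Group records into blocks keyed by lower-cased surname."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[rec["surname"].strip().casefold()].append(rec)
    return blocks


def candidate_pairs(records):
    """Compare given names only between records that fall in the same block."""
    pairs = []
    for block in block_by_surname(records).values():
        for a, b in combinations(block, 2):
            if canonical_given_name(a["given"]) == canonical_given_name(b["given"]):
                pairs.append((a["id"], b["id"]))
    return pairs
```

The point of the design is that the nickname comparison never runs between a Smith and a Jones, so the number of comparisons grows with the size of each block rather than with the square of the whole dataset.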
However, comparing every record with every other record on every attribute is still computationally intensive, so different ER systems take different shortcuts. Some match entities and blocks using probabilistic and statistical approaches, whereas others rely on preprogrammed rules. For example, blocks may be weighted differently depending on how distinctive the data field is. A highly distinctive data link, such as a credit card number, which is typically linked to just one person, would be given more weight than a less distinctive one, such as an email address, phone number, birth year or gender, which could be common across many individuals.
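One simple way to express the idea of weighting by uniqueness is an inverse-frequency score, sketched below in Python. This is an assumption chosen for illustration, not the weighting scheme of any particular ER system: a value shared by `c` of `n` records gets weight log2(n / c), so a value present in every record contributes nothing.

```python
import math
from collections import Counter


def rarity_weights(records, fields):
    """For each field, weight each value by log2(n / count): rarer values weigh more."""
    n = len(records)
    weights = {}
    for field in fields:
        counts = Counter(rec[field] for rec in records)
        weights[field] = {value: math.log2(n / c) for value, c in counts.items()}
    return weights


def match_score(a, b, weights):
    """Sum the rarity weights of every field on which the two records agree."""
    score = 0.0
    for field, by_value in weights.items():
        if a[field] == b[field]:
            score += by_value[a[field]]
    return score
```

Under this scheme, two records that agree only on gender and birth year score much lower than two that also share a card number, which matches the intuition in the paragraph above.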
The most advanced ER systems balance computation requirements, assess the rarity of each data field and apply an even more specialised technique, such as sequence neutrality. This technique produces the same output regardless of the order in which records were loaded into the system, unlike predetermined models. This results in more accurate matches, less noise and ultimately deeper analytical insights.
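The property of sequence neutrality can be illustrated with a toy Python example. The sketch below is not how commercial ER engines achieve it; it assumes a very simple rule (two records belong together if they share any identifier) and groups records with a union-find structure, which yields the same clusters whatever the load order. The record layout and function names are hypothetical.

```python
from itertools import combinations, permutations


def shares_identifier(a, b) -> bool:
    """Toy match rule: the records have at least one identifier in common."""
    return bool(a["identifiers"] & b["identifiers"])


def resolve(records):
    """Cluster records into entities; returns a set of frozensets of record ids."""
    parent = {rec["id"]: rec["id"] for rec in records}

    def find(x):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if shares_identifier(a, b):
            parent[find(a["id"])] = find(b["id"])

    clusters = {}
    for rec in records:
        clusters.setdefault(find(rec["id"]), set()).add(rec["id"])
    return {frozenset(members) for members in clusters.values()}


def is_sequence_neutral(records) -> bool:
    """Check that every load order of the records produces the same clusters."""
    expected = resolve(records)
    return all(resolve(list(order)) == expected for order in permutations(records))
```

Because the result is a set of clusters rather than a list built up record by record, a record that links two earlier entities joins them together no matter when it arrives. The exhaustive check in `is_sequence_neutral` is only practical for a handful of records.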
Despite advances in ER tools, practical implementation can still be challenging. Success depends on data structure, use cases and technology architecture. Name matching alone is hugely complex—even in English, let alone other languages. For instance, a computer doesn’t inherently know that ‘Tony’ is short for ‘Anthony’ or that in Russian ‘Sacha’ can be both ‘Alexandra’ (feminine) and ‘Aleksandr’ (masculine).
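The ambiguity in these examples can be shown with a small Python sketch in which a short name maps to a set of possible formal names rather than a single one. The `NAME_VARIANTS` table is a hypothetical two-entry stand-in for the large, culture-aware name libraries that real systems depend on.

```python
# Hypothetical lookup: each short form maps to every formal name it may stand for.
NAME_VARIANTS = {
    "tony": {"anthony"},
    "sacha": {"alexandra", "aleksandr"},
}


def formal_names(given: str) -> set:
    """Return the possible formal names for a given name (itself if unknown)."""
    key = given.strip().casefold()
    return NAME_VARIANTS.get(key, {key})


def names_compatible(a: str, b: str) -> bool:
    """True if the two given names could refer to the same formal name."""
    return bool(formal_names(a) & formal_names(b))
```

Here ‘Sacha’ is compatible with both ‘Alexandra’ and ‘Aleksandr’, yet those two formal names are not compatible with each other, so a name match of this kind can only ever be one piece of evidence among several.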
Recent developments have improved fuzzy matching across languages and cultures and in handling unstructured data. For organisations with global datasets and international remits, such as national security agencies, extracting the most knowledge and wisdom from currently available data can be the difference between life and death.
Global organisations are increasingly looking to AI to solve their resourcing challenges and help process overwhelming amounts of data. ER and AI can work hand in hand. AI can extract entities from unstructured data, while ER tools can produce meaningful insights from the data.
Organisations are experimenting with data-science approaches to make the most of the opportunity sitting in their data. But those with ambitious AI goals should consider building on practical use cases. Having a complete understanding of their data would provide a solid foundation for an AI that is powerful and context aware. ER alone can be transformational across sectors, but especially for security analysts, security operations centre managers and investigators.
This article has been corrected to accurately define blocking and to identify sequence neutrality as a specialised technique.