Check out The Power of Patterns and read on to discover the second type:
Patterns of similarity when people make decisions about duplicate records
The second type of pattern about data (let’s call these type 2 patterns) emerges when humans look at multiple records (an illustration follows) and try to decide, for instance, whether they are duplicates. Without realizing it, they sense and evaluate patterns of similarity in the data for each attribute, and for sets of attributes, while automatically allowing for misfielded values and other differences in the data. What they don’t calculate, or even know about, is the mathematical similarity score that could be computed for each set of attributes.
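To make the idea of an attribute similarity score concrete, here is a minimal sketch in Python. It is not the scoring method of any particular product: the attribute names and sample records are made up for illustration, and the standard library's `difflib.SequenceMatcher` stands in for whatever similarity measure a real system would use.

```python
from difflib import SequenceMatcher

# Hypothetical attribute names, chosen only for this illustration.
ATTRIBUTES = ["first_name", "last_name", "city"]


def attribute_similarity(a, b):
    """Return a similarity score between 0.0 and 1.0 for two attribute values."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        # Assumption: a missing value gives no evidence of similarity.
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def similarity_vector(rec_a, rec_b, attributes=ATTRIBUTES):
    """Score a pair of records attribute by attribute."""
    return [
        attribute_similarity(rec_a.get(attr, ""), rec_b.get(attr, ""))
        for attr in attributes
    ]


# Made-up sample records.
record_a = {"first_name": "Jon", "last_name": "Smith", "city": "Boston"}
record_b = {"first_name": "John", "last_name": "Smyth", "city": "Boston"}

scores = similarity_vector(record_a, record_b)
print(dict(zip(ATTRIBUTES, scores)))
```

Each record pair becomes a short list of numbers, one per attribute. That list is the vector of similarity scores a person weighs intuitively when deciding whether two records match.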
Now imagine we ask a domain expert to label representative sets of record pairs as duplicate or not duplicate (yes or no, true or false, and so on). We could then build a model of the sets (vectors) of attribute similarity scores that lead the domain expert to each conclusion. With enough training, the model converges: it reaches the same conclusions the domain expert would reach, and it can tell when it has done so. At that point, the model is ready to take over and reach highly accurate conclusions based on its prior training by domain experts. More accurate results mean that higher levels of automation can be achieved in that particular business process.
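As a rough sketch of that training step, the Python below fits a small logistic regression to expert-labeled similarity vectors. Everything here is an assumption made for illustration: the choice of logistic regression, the three-score vectors, the labels, and the learning settings are invented, and a production system may use a very different model.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(model, vector):
    """Return the estimated probability that a record pair is a duplicate."""
    weights, bias = model
    return sigmoid(bias + sum(w * x for w, x in zip(weights, vector)))


def train(vectors, labels, epochs=2000, learning_rate=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            error = predict((weights, bias), x) - y
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical expert-labeled pairs: [name, address, phone] similarity scores,
# with 1 meaning "duplicate" and 0 meaning "not duplicate".
VECTORS = [
    [0.95, 0.90, 1.00],
    [0.88, 0.75, 1.00],
    [0.92, 0.85, 0.60],
    [0.30, 0.20, 0.10],
    [0.45, 0.10, 0.00],
    [0.20, 0.60, 0.15],
]
LABELS = [1, 1, 1, 0, 0, 0]

model = train(VECTORS, LABELS)

# Agreement with the expert's labels on the training pairs is one simple
# signal that the model has converged on this (tiny) sample.
agreement = sum(
    (predict(model, x) >= 0.5) == bool(y) for x, y in zip(VECTORS, LABELS)
) / len(LABELS)
print(f"agreement with expert labels: {agreement:.0%}")
```

In practice the agreement check would be run on labeled pairs the model has not seen during training, since matching the training labels alone says little about new records.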
Looking at the data and the attribute similarity scores, you’ll notice a few key points: misfieldings are taken into account, differences in spelling are recognized and scored appropriately, and the use of nicknames (or any other semantic equivalents, for that matter) is supported.
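Those three behaviors can be sketched in a few lines of Python. This is one simple way to approximate them, not a description of how any particular matching engine works: the nickname table is a tiny hypothetical lookup, misfielding is handled by comparing a value against every field of the other record, and `difflib.SequenceMatcher` is again a stand-in similarity measure.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; a real system would need a far larger one.
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}


def canonical(value):
    """Lowercase a value and map a known nickname to its formal name."""
    token = value.strip().lower()
    return NICKNAMES.get(token, token)


def similarity(a, b):
    """Score two values after nickname normalization (0.0 to 1.0)."""
    return SequenceMatcher(None, canonical(a), canonical(b)).ratio()


def best_field_match(value, other_record):
    """Compare a value against every field of the other record.

    Returns the (field, score) pair with the highest score, so a value
    entered in the wrong field can still find its counterpart.
    """
    return max(
        ((field, similarity(value, other)) for field, other in other_record.items()),
        key=lambda pair: pair[1],
    )


# Made-up record with first and last name swapped (misfielded).
misfielded = {"first_name": "Smith", "last_name": "Bob"}

print(best_field_match("Robert", misfielded))  # nickname found in the wrong field
print(similarity("Jon", "John"))               # spelling difference, partial score
```

Taking the best score across fields is a deliberately blunt way to tolerate misfielding; it trades some precision for robustness, which is the kind of trade-off a trained model can then weigh using the expert's labels.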
Learn more here!