When an environment has multiple source systems creating persons through various data entry points, it is paramount to have a system in place that prevents duplicate account creation.
The most difficult parts of this process are building the business process and analyzing the data. If the data isn’t properly analyzed, cleansed, and hardened, your business process won’t help. If the business process isn’t implemented and reviewed often, data integrity may degrade as systems are replaced or new systems are implemented. So how do you deduplicate accounts? It comes down to the initial data source entry, data matching, and business policies.
Below is a list of items to consider when building out your processes, organizing your data sources, and hardening your data:
- Identify where all data entry originates and where the data resides. Build flow diagrams and treat them as a living document, updating them as new systems are analyzed, existing systems are upgraded, or systems are replaced.
- Analyze where data inconsistencies exist, where duplication is currently happening in the organization, and what the common workarounds or fixes are.
- Identify which data makes the account unique (unique identifiers). This may be the person’s driver’s license number, Social Security number, birth date, address, school attended, etc.
- Identify which unique identifiers are generated internally, which system contains which identifiers, and which identifiers are shared with other systems. For example, HR may store a user’s SSN but generate its own unique identifier for the account, so the user can be identified in other systems without sharing the SSN value.
- Identify which data items uniquely identify a given account. Barring a data entry error, a given person would always be identified by these values. Ideally there are at least two unique identifiers, such as SSN and birth date.
- Identify which data values rarely or never change on the account, such as birth date, first name, middle name, last name, birthplace, cell number, home number, spouse name, etc.
- Identify what must be present in all systems for a system account to be created in the vault.
- Identify what matching criteria must be used to guarantee a match to an existing user: if all of these values are present and match, the record belongs to that user. This usually takes the form of several matching statements.
For example, each of the following statements would run in order. If a statement returns exactly one match, the process ends and that record is considered a valid match.
- Match on first name, last name, birth date, home ZIP code, SSN, and HR unique identifier.
- Match on first name, last name, birth date, home ZIP code, and SSN.
- Match on first name, birth date, home ZIP code, and SSN.
- Match on last name, birth date, home ZIP code, and SSN.
- Match on birth date, home ZIP code, and SSN.
- Match on first name, last name, birth date, and home ZIP code (not an HR user).
- If no match is found, the same process is repeated using suspense conditions.
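The ordered matching statements above can be sketched in Python roughly as follows. This is a minimal illustration under several assumptions: records are held in memory as a hypothetical `Person` dataclass, `normalize` is a simple illustrative cleanse (lowercase, keep only letters and digits), the `is_hr` flag stands in for "HR user", and a statement that returns several candidates is treated as inconclusive so the next statement is tried. A real identity system would run these statements as queries against its own data store.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

@dataclass(frozen=True)
class Person:
    """Hypothetical person record; field names are illustrative."""
    record_id: str
    first_name: str
    last_name: str
    birth_date: str               # ISO format, e.g. "1990-04-12"
    home_zip: str
    ssn: Optional[str] = None
    hr_id: Optional[str] = None   # identifier generated by HR
    is_hr: bool = False           # assumption: True when the record is an HR user

# Matching statements, tried in order: (fields that must all match, non-HR candidates only?)
RULES: List[Tuple[Tuple[str, ...], bool]] = [
    (("first_name", "last_name", "birth_date", "home_zip", "ssn", "hr_id"), False),
    (("first_name", "last_name", "birth_date", "home_zip", "ssn"), False),
    (("first_name", "birth_date", "home_zip", "ssn"), False),
    (("last_name", "birth_date", "home_zip", "ssn"), False),
    (("birth_date", "home_zip", "ssn"), False),
    (("first_name", "last_name", "birth_date", "home_zip"), True),
]

def normalize(value: Optional[str]) -> Optional[str]:
    """Lowercase and keep only letters/digits, so ' Lee ' and 'LEE' compare equal."""
    if value is None:
        return None
    cleaned = "".join(ch for ch in value.lower() if ch.isalnum())
    return cleaned or None

def fields_match(incoming: Person, candidate: Person, fields: Sequence[str]) -> bool:
    """Every listed field must be present on both records and equal after normalizing."""
    for name in fields:
        a = normalize(getattr(incoming, name))
        b = normalize(getattr(candidate, name))
        if a is None or b is None or a != b:
            return False
    return True

def find_exact_match(incoming: Person, existing: Sequence[Person]) -> Optional[Person]:
    """Return the single hit of the first statement that yields exactly one, else None."""
    for fields, non_hr_only in RULES:
        hits = [
            c for c in existing
            if not (non_hr_only and c.is_hr) and fields_match(incoming, c, fields)
        ]
        if len(hits) == 1:
            return hits[0]
    return None
```

For example, an incoming record `Person("crm-3", "ann ", "LEE", "1990-04-12", "12345", ssn="123456789")` has no HR identifier, so the first statement fails, but the second statement matches an HR record for Ann Lee with SSN `123-45-6789` after normalization. Routing ambiguous multi-hit statements straight to review instead of falling through would be an equally defensible choice.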
If none of the matching statements uniquely identifies one individual, a list of suspense statements is run to look for possible matches that need human intervention.
An example is a data entry issue in which a data entry employee mistakenly entered a birth date with the month and day transposed (MM/dd recorded as dd/MM). If a suspense match is found, a workflow or notification is sent to the team that reviews the account and its possible matches.
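A suspense condition for the transposed birth date described above could be checked as in this sketch. The `suspense_candidates` rule, which pairs the date check with SSN and last name, is a hypothetical example of one suspense statement, and records are plain dictionaries with illustrative keys. Anything it returns is meant to be routed to a reviewer, not linked automatically.

```python
from datetime import date
from typing import Dict, List, Optional

def swap_month_day(d: date) -> Optional[date]:
    """Return d with month and day exchanged, or None if that is not a real date."""
    try:
        return date(d.year, d.day, d.month)   # date(year, month, day)
    except ValueError:                        # e.g. day 25 cannot become a month
        return None

def is_suspected_date_swap(entered: date, on_file: date) -> bool:
    """True when two different dates differ only by an MM/dd <-> dd/MM mix-up."""
    return entered != on_file and swap_month_day(entered) == on_file

def suspense_candidates(incoming: Dict, existing: List[Dict]) -> List[Dict]:
    """Hypothetical suspense statement: same SSN and last name, birth date looks transposed."""
    return [
        c for c in existing
        if c["ssn"] == incoming["ssn"]
        and c["last_name"].lower() == incoming["last_name"].lower()
        and is_suspected_date_swap(incoming["birth_date"], c["birth_date"])
    ]
```

With this rule, an entry for April 12, 1990 is flagged against a stored December 4, 1990, while dates whose day is above 12 can never be a transposition and are skipped.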
- If neither a match nor a suspense match is found, a new account is created.
- Include the needed data in the workflow or notification where possible, but make sure it is sent securely.
- Make every effort to keep PII only in systems where it is mandatory, such as HR.
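The overall decision, plus a reviewer notification that avoids carrying full PII, might look like the following sketch. It assumes the exact-match and suspense steps have already produced record IDs; the `Resolution` type, the action names, and showing only the last four SSN digits are illustrative assumptions, and whether even a partial SSN may be sent is a policy decision for your organization.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Resolution:
    """Hypothetical outcome of the matching process."""
    action: str                                   # "link", "review", or "create"
    target_id: Optional[str] = None               # set when action == "link"
    candidate_ids: List[str] = field(default_factory=list)  # set when action == "review"

def resolve(exact_match_id: Optional[str], suspense_ids: List[str]) -> Resolution:
    """Exact match wins; otherwise suspense matches go to review; otherwise create."""
    if exact_match_id is not None:
        return Resolution("link", target_id=exact_match_id)
    if suspense_ids:
        return Resolution("review", candidate_ids=list(suspense_ids))
    return Resolution("create")

def mask_ssn(ssn: str) -> str:
    """Keep only the last four digits, e.g. '123-45-6789' -> '***-**-6789'."""
    digits = "".join(ch for ch in ssn if ch.isdigit())
    return "***-**-" + digits[-4:]

def review_notification(source_record_id: str, ssn: str, resolution: Resolution) -> Dict[str, object]:
    """Payload for the review team: record IDs and a masked SSN, never the full value."""
    return {
        "source_record": source_record_id,
        "candidates": resolution.candidate_ids,
        "ssn_hint": mask_ssn(ssn),
    }
```

Sending record IDs rather than record contents lets reviewers open the details in the systems that already hold them, so the notification channel itself never carries the raw value.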
Best practice is to use unique identifiers from other systems rather than an SSN value, and to use different values where needed. If a downstream system does need this data, build in policies and audits to keep it secure and to confirm who has accessed it and who has permission to access it. Keep the data in internal subnets that are protected from general-purpose employee access.
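Handing downstream systems an internally generated identifier instead of the SSN, with audited access to the real value, can be sketched as a small HR-side registry. Everything here is a hypothetical illustration: the class name, the `E` prefix, random IDs from Python's `secrets` module, and the in-memory allow-list and access log stand in for whatever HR system, permission model, and audit store an organization actually uses.

```python
import secrets
from datetime import datetime, timezone
from typing import Dict, List, Set, Tuple

class IdentifierRegistry:
    """Hypothetical HR-side store: SSNs stay here; other systems only see employee IDs."""

    def __init__(self, allowed_requesters: Set[str]) -> None:
        self._id_by_ssn: Dict[str, str] = {}
        self._ssn_by_id: Dict[str, str] = {}
        self._allowed = set(allowed_requesters)
        self.access_log: List[Tuple[str, str, str, bool]] = []  # (UTC time, requester, id, granted)

    def issue_id(self, ssn: str) -> str:
        """Return the person's opaque ID, creating a random one the first time."""
        key = "".join(ch for ch in ssn if ch.isdigit())
        if key not in self._id_by_ssn:
            new_id = "E" + secrets.token_hex(8)    # random, so it reveals nothing about the SSN
            while new_id in self._ssn_by_id:       # regenerate on the (unlikely) collision
                new_id = "E" + secrets.token_hex(8)
            self._id_by_ssn[key] = new_id
            self._ssn_by_id[new_id] = key
        return self._id_by_ssn[key]

    def reveal_ssn(self, employee_id: str, requester: str) -> str:
        """Return the SSN only to allowed requesters; every attempt is logged."""
        granted = requester in self._allowed
        stamp = datetime.now(timezone.utc).isoformat()
        self.access_log.append((stamp, requester, employee_id, granted))
        if not granted:
            raise PermissionError(f"{requester} may not read SSNs")
        return self._ssn_by_id[employee_id]
```

Because the IDs are random rather than derived from the SSN, a leaked downstream table exposes nothing that can be reversed, and the access log answers the "who has accessed the data" question for the one system that still holds it.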