SailPoint IdentityIQ Consumer Scaling Metrics
We recently went live with SailPoint IdentityIQ 6.3p3 at a client. The client wanted to replace their legacy Sun Identity Manager Service Provider Edition (SPE) implementation with IdentityIQ. The environment handled the creation of all B2C and B2B users for the company.
The objective of the project was to create a service that both consumer and B2B applications could integrate with for provisioning, deprovisioning, access management, and password self-service. All applications would communicate with SailPoint via REST/JSON API calls. When provisioning identities, IdentityIQ would create an Identity Cube, an Oracle Directory Server Enterprise Edition (ODSEE) LDAP account, and a Tivoli Access Manager (TAM) account. Creating those accounts was relatively straightforward. The complexity came from three areas: a large and growing user population, bulk data loading, and high transaction volume.
Here’s an overview of the architecture setup:
• Eight Red Hat Linux virtual servers across two datacenters hosted SailPoint IdentityIQ: four active instances in the primary datacenter and four passive instances in a secondary datacenter for disaster recovery.
• All servers had quad-core processors.
• All servers had 20 GB of RAM (14 GB of which was allocated to IdentityIQ).
Large User Population
The B2B population was roughly 55,000 users across 7 applications.
The B2C population at Go Live was north of 8.9 million users and climbing by roughly 120,000 users a month. This population was used by 9 B2C applications.
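To put the growth figure in perspective, here is a minimal projection sketch. It assumes the roughly 120,000 users a month stays constant (linear growth) and treats 8.9 million as the starting point, even though the real figure was somewhat higher. The function names are illustrative only.

```python
GO_LIVE_USERS = 8_900_000   # lower bound from the text ("north of 8.9 million")
MONTHLY_GROWTH = 120_000    # approximate monthly growth from the text


def projected_users(months_after_go_live: int) -> int:
    """Population after N months, assuming constant linear growth."""
    return GO_LIVE_USERS + MONTHLY_GROWTH * months_after_go_live


def months_until(target_users: int) -> int:
    """Whole months until the population first reaches target_users."""
    remaining = target_users - GO_LIVE_USERS
    if remaining <= 0:
        return 0
    return -(-remaining // MONTHLY_GROWTH)  # ceiling division


if __name__ == "__main__":
    print(projected_users(12))
    print(months_until(10_000_000))
```

Under these assumptions the B2C population passes 10 million in the tenth month after Go Live and reaches about 10.34 million after a year.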
Bulk Data Loading
As with any consumer-facing application, we were tasked with minimizing downtime as much as possible during the migration from Sun Identity Manager to SailPoint IdentityIQ. The bulk loading requirements included loading 8.9 million ODSEE accounts and 8.9 million TAM accounts, and bringing over all of the users' challenge/response questions and answers from Sun IDM into IdentityIQ.
Since this was Sun IDM SPE, all user information was stored in ODSEE and not in a database. This meant ODSEE was the authoritative source for identities, and aggregating the ODSEE accounts would bring in all the attributes needed for users. The only exception was the challenge/response data, which was stored in an encrypted attribute. A separate Java class running on the Sun IDM servers was written to decrypt the challenge/response data by creating a LighthouseContext and then using the EncryptedData class to decrypt the answers. To aggregate the ODSEE accounts into IdentityIQ we first had to set up partitioning. We calculated that without partitioning it would have taken 40 days to load all of the accounts into IdentityIQ.
The first step was to increase the number of aggregation threads. By default, the maxThreads value in the Aggregation Partition ServiceDefinition object is set to 1. Since all instances had quad-core processors, we safely increased that value to 8. Spread across 4 servers, that meant we could run 32 aggregation threads concurrently. The next step was to update the ODSEE application for partitioning. We added the searchDNs attribute to the application and broke it up into 38 partitions based on a custom login name attribute that was stored in ODSEE. By using partitioning we were able to get the aggregation time down from 40 days to 16 hours. While 16 hours still isn’t acceptable for an outage window, we loaded all users into IdentityIQ prior to Go Live and then aggregated only the delta changes, which allowed us to migrate all applications from the legacy Sun IDM environment into IdentityIQ within a 4-hour maintenance window (most of that time was allocated to validation and sign-off).
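The figures above can be turned into throughput numbers with a little arithmetic. The sketch below uses only values from the text (8.9 million accounts, 4 servers, maxThreads of 8, 38 partitions, 40 days versus 16 hours). The average partition size assumes the accounts were split evenly, which the text does not state, and the function name is illustrative.

```python
ACCOUNTS = 8_900_000          # ODSEE accounts to aggregate
SERVERS = 4                   # active IdentityIQ instances
MAX_THREADS_PER_SERVER = 8    # maxThreads after tuning
PARTITIONS = 38               # searchDNs partitions
UNPARTITIONED_DAYS = 40       # estimated load time without partitioning
PARTITIONED_HOURS = 16        # observed load time with partitioning


def aggregation_summary() -> dict:
    """Derive thread count, speedup, and throughput from the stated figures."""
    unpartitioned_hours = UNPARTITIONED_DAYS * 24
    return {
        "concurrent_threads": SERVERS * MAX_THREADS_PER_SERVER,
        "speedup": unpartitioned_hours / PARTITIONED_HOURS,
        "accounts_per_sec_before": ACCOUNTS / (unpartitioned_hours * 3600),
        "accounts_per_sec_after": ACCOUNTS / (PARTITIONED_HOURS * 3600),
        # Assumption: partitions are equal in size (not stated in the text).
        "avg_accounts_per_partition": ACCOUNTS / PARTITIONS,
    }


if __name__ == "__main__":
    for name, value in aggregation_summary().items():
        print(f"{name}: {value:,.2f}")
```

This works out to a 60x reduction in load time: roughly 2.6 accounts per second in the unpartitioned estimate versus roughly 154.5 accounts per second across the 32 threads, with an average of about 234,000 accounts per partition if the split were even.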
Loading TAM Accounts
One of the obstacles we had to overcome was how to load 8.9 million TAM accounts into IdentityIQ without impacting a legacy, end-of-life TAM hardware infrastructure. When we first ran the out-of-the-box TAM aggregation, we ran into issues because settings in the TAM Policy server limited the number of users returned in a single partition. Since the TAM accounts were basically a flattened structure of the ODSEE accounts, we were able to come up with a workaround: a custom rule that used IdentityIQ’s Aggregator class to create the TAM Link for all users.
High Transaction Volume
The other factor that added to the complexity of this project was the high transaction volume. We tracked the volume of the primary web service calls over a 24-hour period post Go Live:
Get User: 858,621
Create User: 6,904
Password Reset: 7,920
Change Password: 6,947
The peak load during a 1-hour block of time was:
Get User: 65,856
Create User: 618
Password Reset: 600
Change Password: 486
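The daily and peak-hour counts above are easier to compare as per-second rates. The following sketch converts them using only the numbers listed; the dictionary and function names are illustrative.

```python
# Call counts from the text: 24-hour totals and the busiest 1-hour block.
DAILY = {
    "Get User": 858_621,
    "Create User": 6_904,
    "Password Reset": 7_920,
    "Change Password": 6_947,
}
PEAK_HOUR = {
    "Get User": 65_856,
    "Create User": 618,
    "Password Reset": 600,
    "Change Password": 486,
}

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_HOUR = 60 * 60


def call_rates() -> dict:
    """Average and peak-hour calls per second, plus their ratio, per method."""
    rates = {}
    for method, daily_count in DAILY.items():
        avg = daily_count / SECONDS_PER_DAY
        peak = PEAK_HOUR[method] / SECONDS_PER_HOUR
        rates[method] = {
            "avg_per_sec": avg,
            "peak_per_sec": peak,
            "peak_to_avg": peak / avg,
        }
    return rates


def share_of_daily_calls(method: str) -> float:
    """Fraction of all tracked daily calls that belong to one method."""
    return DAILY[method] / sum(DAILY.values())


if __name__ == "__main__":
    for method, r in call_rates().items():
        print(f"{method}: avg {r['avg_per_sec']:.2f}/s, "
              f"peak {r['peak_per_sec']:.2f}/s, ratio {r['peak_to_avg']:.2f}")
    print(f"Get User share: {share_of_daily_calls('Get User'):.1%}")
```

Get User averages about 9.9 calls per second over the day and about 18.3 per second during the peak hour, and it accounts for roughly 97.5% of the tracked calls.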
The SLAs for each of those methods, all of which we’ve been able to meet, are:
Get User: 3 seconds
Create User: 15 seconds
Password Reset: 5 seconds
Change Password: 5 seconds
We’ve been running in production for almost 2 months at the time of this writing without any major issues or outages.