Recently at a client we were seeing intermittent authentication errors, with the following in the logs:
Caused by: javax.security.auth.login.FailedLoginException: [Security:090304]Authentication Failed: User xelsysadm javax.security.auth.login.FailedLoginException: [Security:090302]Authentication Failed: User xelsysadm denied
    at weblogic.security.providers.authentication.LDAPAtnLoginModuleImpl.login(LDAPAtnLoginModuleImpl.java:261)
We were not seeing the issues regularly, only during periods of high activity.
Because we could not replicate the issue reliably, we had trouble understanding what was going on. The user existed in the Identity Store (LDAP), appeared in the WebLogic Users section, and was viewable in OIM. Sometimes the login would work and sometimes it would not, and we could not predict when it would fail.
This problem was the pure definition of an intermittent issue and demonstrates why engineers and admins hate dealing with them.
Many components are involved in the login process (custom web services, OSB, etc.), so it took a while to work out what was going on. Through trial and error, we finally realized that WebLogic was causing the issue through one of its Authenticators.
Note that nothing in WebLogic itself indicated that it might be the culprit. Instead, the Authentication Failed message showed up in the OIM logs, and we had to figure out where the failure actually originated.
The relevant setting is in the WebLogic Console under Security Realms > (your realm, e.g. myrealm) > Providers > (whatever Authentication Provider is being used) > Provider Specific tab. In the General section there is a setting called Results Time Limit. It tells WebLogic how long to search the Provider’s Identity Store before giving up, and it is defined in milliseconds.
We reproduced the intermittent issue by changing this setting to 1 ms and testing again. We now saw the exact same error EVERY time.
As a general explanation, this setting comes into play in two scenarios (that I can think of). The first is a flat Identity Store where a large number of users sit in one container. In that case, WebLogic starts with the first entry returned by the store and goes through the entries one by one; if the user is not found before the Results Time Limit is reached, WebLogic responds with Authentication Failed: User (username) denied.
The second scenario is a large amount of traffic hitting the Identity Store. WebLogic does the same thing, but delays in the Identity Store's responses mean the timer runs out before WebLogic has gone through all of the results. The two scenarios are effectively the same issue with different causes: WebLogic does not have enough time to complete the search.
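To make the two scenarios concrete, here is a toy simulation in Python. It is only a sketch of the explanation above, not WebLogic's actual implementation: the function name, the per-entry cost, and the user names (other than xelsysadm) are all made up for illustration, and the "clock" is simulated rather than measured.

```python
def search_with_time_limit(entries, target, per_entry_us, limit_ms):
    """Scan entries one by one against a simulated clock.

    per_entry_us: hypothetical cost (microseconds) of examining one entry;
                  it grows when the Identity Store is slow to respond.
    limit_ms:     search budget in milliseconds; 0 means no limit,
                  mirroring the Results Time Limit semantics described above.
    Returns True if the target is found within the budget, else False.
    """
    limit_us = limit_ms * 1000
    elapsed_us = 0
    for entry in entries:
        elapsed_us += per_entry_us
        if limit_ms > 0 and elapsed_us > limit_us:
            # Budget exhausted: the caller sees a denial even though
            # the user may exist further along in the results.
            return False
        if entry == target:
            return True
    return False


# Hypothetical directories: the target user happens to be returned last.
small_store = ["user%05d" % i for i in range(10000)] + ["xelsysadm"]
large_flat_store = ["user%05d" % i for i in range(50000)] + ["xelsysadm"]

# Quiet period, modest store: the search finishes inside the 1000 ms budget.
quiet_ok = search_with_time_limit(small_store, "xelsysadm", 50, 1000)

# Scenario 1: large flat container, same speed, same budget.
flat_denied = search_with_time_limit(large_flat_store, "xelsysadm", 50, 1000)

# Scenario 2: same modest store, but the Identity Store is four times slower.
busy_denied = search_with_time_limit(small_store, "xelsysadm", 200, 1000)

# A limit of 0 removes the budget, so the slow search still succeeds.
busy_unlimited = search_with_time_limit(small_store, "xelsysadm", 200, 0)

print(quiet_ok, flat_denied, busy_denied, busy_unlimited)
```

In this model the user is present in every run; only the time budget decides whether the lookup succeeds, which matches the "sometimes it works, sometimes it doesn't" behavior we saw.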
There may well be a reason for this limit, but in today’s enterprise environments I can’t see the benefit. If you set it to 0 instead, WebLogic won’t stop searching the Identity Store until every record has been checked; 0 means unlimited for this setting. (FYI, changing this setting does require a full domain restart.)
After we made that modification, the problem was solved and we no longer saw the intermittent issues related to authentication.