All about SQL drivers and access tokens

This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs - .

We often get support escalations related to failures around expired access tokens when using Microsoft Entra ID (formerly Azure Active Directory) authentication. There is a lot of nuance in the various drivers about how and when access tokens are used and when they get renewed after expiring. I’m going to do a deep-dive and try to explain some of the details here.

This post is mainly targeted at people who have to troubleshoot issues in this area or those who are building large services and need to understand the impact of their authentication choices.

The Problem

Errors related to expired access tokens usually involve an error like “Login failed for user 'xxxxxx'.” (Pretty generic and unhelpful. Security minded folks don't want to give potential attackers clues when authentication fails.) The error could happen either on connection open or statement execution.

Background

Connections to SQL use the MS-TDS protocol and it’s important to keep in mind that TDS is a stateful connection. Meaning the client authenticates at the beginning of the connection and the connection is maintained until the client closes it. There is no mechanism to re-authenticate (send a new access token) in TDS at any point other than at the beginning of a connection. Applications can execute many queries over long periods of time while the connection is open. Opening a connection and authenticating is also a relatively expensive operation. So, connection pools are often used to maximize performance, keeping connections open for a relatively long period.

Connection pools return a “logical” connection to applications that open a new connection. Each logical connection is serviced by a “physical” connection from the connection pool. When a logical connection is “closed” by the application, the physical connection is returned to the pool but remains open behind the scenes.

Obviously, the database validates access tokens when a connection is first opened. Another common point when tokens are validated is on a RESETCONNECTION event. A RESETCONNECTION event happens when a driver sets the RESETCONNECTION status in the next packet header. This is meant to be used in the context of connection pooling. A driver can set the flag in the header of the next statement execution on a logical connection coming out of a pool. When the server sees that status, it resets the environment prior to executing the statement. This helps prevent “dirty” physical connections from impacting different logical connections. It also helps prevent connection pools from persisting connections for security contexts that no longer should have access to the database. For example, you wouldn’t want a security context whose access has been revoked to continue to be able to create new logical connections. Relatedly, if an access token is near expiration or expired, a driver can close a physical connection from the pool instead of reusing it since an expired token would result in an error. It can then open a new physical connection with a new token to service the new logical connection request.

Speaking of connection pooling, not all drivers do connection pooling the same, so clarifying how connection pooling works is relevant to access tokens.

ODBC: Connection pooling in ODBC is handled by the ODBC driver manager, which is what is responsible for loading the correct ODBC driver requested by the application. The ODBC driver manager signals an ODBC driver when a connection is going into and out of the connection pool, so the ODBC driver for SQL Server knows when to set the RESETCONNECTION status. At the same time, it also knows if it needs to obtain a new access token and open a new connection on this event.

JDBC: In JDBC, connection pooling is handled by 3^rd party libraries. There are connection pool specific JDBC classes that need to be used by the connection pool library. Those classes include APIs to let a driver know when a connection is going into or out of a connection pool. This signal is required for the driver to be able to set the RESETCONNECTION status. Unfortunately, not all connection pool libraries utilize those classes. HikariCP is the most notable one here that doesn’t use them. The implication is that, if you are using HikariCP, connections will live as long as their max lifetime as defined in HikariCP. You’ll need to manage that lifetime relative to the access token lifetime. Additionally, if an application encounters the “Login failed for user ’xxxxxx’.” error, the application or the connection pool will need to ensure the underlying connection is closed so that it isn’t reused by a subsequent connection request. This issue can be alleviated by adding a validation query to the connection pool settings that is run before the pool hands the connection out. This does come with a small performance penalty, though.

(Sidebar: HikariCP and others offer a lot of knobs to tune your connection pool. Don’t overload your authentication server with those settings. For example, say you have a microservice architecture with 200 nodes. Each node has its own connection pool with a minimum of 30 connections and a 20 minute connection lifetime. You’ll be hitting your authentication endpoint with an average of 200n * 30c / 20m = 300 requests per minute. Not necessarily spaced evenly over each minute because each set of 30 connections per node will be renewed simultaneously. You will most likely see intermittent authentication failures, if you do something like this. If you use the built-in authentication methods, the driver will try to serialize the authentication attempts and cache the access token for reuse. But this is not guaranteed across all versions.)

.NET: SqlClient (specifically, System.Data.SqlClient, or SDS, and Microsoft.Data.SqlClient, or MDS) implements connection pooling internally. It also uses the RESETCONNECTION status to indicate a connection coming out of a pool. If the pool uses token-based authentication, SqlClient will check the expiration date of the token before handing the connection to the application. If the token is expired or close to it (10 minutes), all connections in the pool are recycled and a new connection is created with a new token. All connections in the pool reuse the same cached token.

This all works pretty well when using the built-in token authentication methods (Authentication=ActiveDirectoryXXX). But let’s talk about access tokens passed from the application.

The SQL drivers also have a connection property that allows an application to pass an access token to them one way or another. What is the implication when applications do this? Well, that access token doesn’t have expiration information with it (at least not the part that is handed to the driver). This means the driver doesn’t know if a token is expired. You need to carefully manage connection lifetime in this scenario. If a connection is taken from a pool with an expired access token and the server sees RESETCONNECTION in the header, the server will validate the access token before executing the query that’s being sent (remember, we said above that RESETCONNECTION is sent in the header of the next statement execution). This will result in an execution error that the application will need to handle. This won't invalidate the connection so it's important that applications recognize this and stop using the expired token or the error will simply keep repeating as long as pooled connections persist. To correct the situation at runtime, the application will need to obtain a new token and create a new connection with the token. This creates a new pool because when the application passes the access token, the token is part of the pool key (it’s not part of the pool key when using the built-in methods). If you are using a driver that allows you to specify a minimum connection pool size, be sure to clear any pools using expired tokens in this scenario, or you might find them lingering forever (essentially a memory leak).

In the Microsoft JDBC Driver 12.4 for SQL Server, we’ve added an AccessTokenCallback on the connection to address this drawback of the AccessToken property. The MDS library added the same thing in 5.2. The callback allows the application to provide an access token and its expiration whenever the driver needs it.

All that said, I’ve heard of instances where customers say long running sessions (with lots of statement executions) result in token expired errors (Login failed for user 'xxxxxx'. A severe error occurred on the current command. The results, if any, should be discarded.) on statement execution. However, after explaining the above, they usually go silent. I can’t tell for sure if the server might be validating access tokens on a query execution without RESETCONNECTION, or if they realize they are in one of the above scenarios. (The error and stack trace will be the same.) But the more you know, the easier it will be to figure out where the problem lies.

David Engel

The Problem

Background

Leave a Reply Cancel reply