Troubleshoot PostgreSQL: ‘An existing connection was forcibly closed by the remote host’

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.

Application logs may show intermittent connection errors such as “An I/O error occurred while sending to the backend”, or other messages that indicate a timeout, a write failure, or a broken pipe. At the same time, the PostgreSQL logs show an error like “could not receive data from client: An existing connection was forcibly closed by the remote host”.

 

These errors occur on the socket connection between the client and the PostgreSQL server, and usually mean that the client closed the connection while the database was still writing data to the socket. The two log entries are interrelated and are frequently associated with client-side connection handling issues.

 

These errors can be tricky to troubleshoot because several components are involved. In this blog post, I will discuss what can lead to this error and best practices for avoiding it.

 

 

Stale connections

Let’s say a client application tries to execute a query using a previously opened connection object retrieved from a connection pool. When the application attempts to use it, the connection has gone “stale”, and the client throws an exception while trying to send query data over the supposedly open connection. The client-side logic catches the exception and closes the TCP connection. The PostgreSQL backend detects that the client closed the connection and reports it as Windows error WSAECONNRESET (10054): “Connection reset by peer: An existing connection was forcibly closed by the remote host”. The ‘remote host’ here is the client, which is separate from the Postgres server.
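One common client-side mitigation is to validate a pooled connection before handing it to the application ("test on borrow"), so stale connections are discarded instead of failing mid-query. The sketch below is illustrative only: the `Connection` and `Pool` classes are stand-ins, not a real driver or pooler, and a real liveness check would ping the server (for example, with `SELECT 1`).

```python
# Minimal sketch of "test on borrow" connection validation.
# Connection and Pool are illustrative stand-ins (assumptions), not a real driver API.
from collections import deque

class Connection:
    def __init__(self):
        self.alive = True  # a real check would round-trip to the server, e.g. SELECT 1

class Pool:
    def __init__(self, size=3):
        self._idle = deque(Connection() for _ in range(size))

    def borrow(self):
        while self._idle:
            conn = self._idle.popleft()
            if conn.alive:        # validate before handing the connection out
                return conn
            # stale connection: drop it silently and try the next one
        return Connection()       # pool exhausted: open a fresh connection

pool = Pool()
pool._idle[0].alive = False       # simulate a connection gone stale
conn = pool.borrow()              # stale one is skipped; a live one is returned
```

Many real poolers and drivers expose this behavior under names like `test_on_borrow` or a pre-ping option; check your pool's documentation for the equivalent setting.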

 

Curious to know more?
The only way to understand exactly what is causing the issue is to capture a network trace on the client side at the time the issue occurs. Network traces can be captured with tools such as NetMon, Wireshark, or Fiddler.

If you are using a Linux client and the tools mentioned above don't work, you can capture a network dump manually with one of the following commands:

tshark -i any -n -b filesize:204800 -w `date +%y%m%d-%H:%M:%S`.pcap  -b files:1000

or

tcpdump -i any -w `date +%y%m%d-%H:%M:%S`.pcap -G 300  -W 1000

Note: If you enable network capture, monitor your disk space and purge as needed to ensure you don’t run out of space.

 

 

Solution

  • Use PgBouncer:
    PgBouncer is a connection pooler that sits between your application and the database. When it needs a new connection to the database, it grabs one and then continues to reuse it; after a certain period of time, it releases that connection. This means that when your application grabs a connection and doesn't use it, the connection isn't actually passed on to the database and consumed as an idle connection. From: Not all Postgres connection pooling is equal.
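For reference, a PgBouncer deployment is driven by a small INI file. The fragment below is a hedged example: the host, database name, and numeric values are placeholders to illustrate the relevant settings, not recommendations for your workload.

```ini
; Illustrative pgbouncer.ini fragment (values are examples, not recommendations)
[databases]
mydb = host=10.0.0.5 port=5432 dbname=mydb

[pgbouncer]
listen_addr = *
listen_port = 6432
pool_mode = transaction        ; return server connections to the pool per transaction
default_pool_size = 20         ; server connections allowed per user/database pair
server_idle_timeout = 600      ; close server connections idle longer than this (seconds)
```

With `pool_mode = transaction`, a small number of server connections can serve many application connections, which directly reduces idle connections on the database side.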

  • Implement retry logic to handle transient errors
    Implement retry logic as a best practice when designing and developing applications in the cloud, because transient errors can occur. In this case, a retried client query can find an active connection to use. Read more here: Handling transient connectivity errors
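As a sketch of the retry pattern, the helper below retries a callable with exponential backoff. The `ConnectionError` catch and the `flaky_query` function are illustrative assumptions; in a real application you would catch your driver's transient-error class (for example, `psycopg2.OperationalError`).

```python
import time

def run_with_retry(operation, retries=3, base_delay=0.1):
    """Retry a callable on transient errors with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except ConnectionError:   # substitute your driver's transient error class
            if attempt == retries:
                raise             # give up after the final attempt
            time.sleep(base_delay * (2 ** attempt))

# Hypothetical flaky operation: fails twice, then succeeds.
calls = {"n": 0}
def flaky_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("connection was forcibly closed")
    return "ok"
```

In production you would typically also cap the total wait time and add jitter so many clients don't retry in lockstep.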

 

  • Minimize idle connections’ impact on the database
    Managing connections is a topic that comes up often in conversations with PostgreSQL users. Connections in Postgres aren’t free: each connection, whether idle or active, consumes a certain amount of memory (about 10 MB per connection). An idle connection is one your application has opened and is holding without using it; application connection poolers also often hold one or more idle connections. For more information, see: Connection handling best practices with PostgreSQL.

    Configure statement_timeout and idle_in_transaction_session_timeout appropriately; see
    Tracking and Managing Your Postgres Connections
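Both settings can be applied per database (or per role, or per session). The statements below are an illustrative example; `mydb` and the timeout values are placeholders you should adapt to your workload.

```sql
-- Illustrative settings; choose values that fit your workload.
-- Abort any statement that runs longer than 60 seconds:
ALTER DATABASE mydb SET statement_timeout = '60s';
-- Terminate sessions that sit idle inside an open transaction for 5 minutes:
ALTER DATABASE mydb SET idle_in_transaction_session_timeout = '5min';
```

Be careful with a global `statement_timeout`: it also aborts legitimately long-running jobs such as maintenance queries, so consider scoping it to specific roles instead.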

  • Send keep-alive signals from your application to avoid idle sessions
    Keep-alives help ensure that pooled connections are not silently dropped by intermediaries when they carry no traffic for a certain time.
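At the TCP level, keep-alive can be enabled on the client socket so the OS periodically probes the connection. The sketch below shows this with Python's standard `socket` module; the idle/interval/count values are illustrative, and the per-socket `TCP_KEEPIDLE`/`TCP_KEEPINTVL`/`TCP_KEEPCNT` options are Linux-specific (hence the `hasattr` guards). If you connect through libpq-based drivers such as psycopg2, the equivalent connection parameters are `keepalives`, `keepalives_idle`, `keepalives_interval`, and `keepalives_count`.

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keep-alive so the OS probes idle connections
    instead of letting them be silently dropped."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning knobs below are Linux-specific; other
    # platforms use different names or system-wide settings.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(s)
s.close()
```

Keep the probe interval well below any firewall or load-balancer idle timeout between the client and the database.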

  • Check application resource utilization
    High resource pressure on the client side (high CPU, high IOPS, heavy context switching) can cause slowness. This slowness can keep connections open too long and eventually cause them to be dropped.

 

 

 

Richard Bartel
Senior Software Engineer - Azure Database for PostgreSQL

Bashar Hussein
Embedded Escalation Engineer - Azure OSS DB
