Auto Failover with PostgreSQL 12

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.

The breaking change for pg_auto_failover was around the recovery.conf file: in Postgres 12, the presence of this file prevents your Postgres instance from starting at all. The file used to signal to PostgreSQL that the server was expected to remain in recovery mode, either as a PITR instance or as a standby instance. In Postgres 12 you signal that with either the standby.signal or the recovery.signal file, and the recovery and replication parameters now live in the usual PostgreSQL configuration, like any other GUC.
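To make the file conventions concrete, here is a minimal Python sketch that classifies a data directory the way Postgres 12 would interpret it. The function name and the returned labels are hypothetical, chosen for illustration; this is not code from pg_auto_failover, and it only looks at file presence, not at the recovery parameters themselves.

```python
from pathlib import Path


def recovery_mode(pgdata):
    """Classify a Postgres 12+ data directory by its signal files.

    The returned labels are illustrative only (an assumption of this
    sketch), not values used by Postgres or pg_auto_failover.
    """
    pgdata = Path(pgdata)

    # Postgres 12 refuses to start when a recovery.conf file is present.
    if (pgdata / "recovery.conf").exists():
        return "refuses-to-start"

    # standby.signal is checked first: when both signal files exist,
    # standby mode takes precedence.
    if (pgdata / "standby.signal").exists():
        return "standby"

    # recovery.signal alone requests targeted recovery (PITR).
    if (pgdata / "recovery.signal").exists():
        return "recovery"

    return "normal"
```

A check like this is mostly useful as a pre-upgrade sanity test: a leftover recovery.conf in a data directory is worth finding before Postgres 12 finds it for you.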

What is pg_auto_failover?

The pg_auto_failover project aims to provide fully automated HA for Postgres, in a simple and correct way. Simple to use. Correct implementation. That means the solution is robust, well tested, and easy to set up and get started with. Also, anything that can be done automatically will be done automatically, and when the situation does not allow for an automated decision, pg_auto_failover refuses to take any action.

A telling example of our approach to HA in pg_auto_failover is the way we handle a primary server where Postgres is found not to be running, although we expect it to be running and it was known to be running before. In that case, the first thing that pg_auto_failover does is restart Postgres. That’s the simplest way to fix your production, and in many cases it will just work. If that fails, pg_auto_failover continues trying. After all, your supervision script that frees some disk space on the WAL volume might need more time to be effective. Only after 3 consecutive failures or 20s spent trying to restart Postgres will pg_auto_failover bail out and fail over to the secondary. And that only happens when the secondary is known to be available and fully caught up.

That’s just an example of course. I think it’s an important one in that it shows the spirit with which the pg_auto_failover solution has been implemented, and continues to be improved. Simple and correct.
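The restart-before-failover policy described above can be sketched in a few lines of Python. This is only an illustration of the decision logic under stated assumptions, not the actual pg_auto_failover implementation: the function name, the callbacks, and the one-second pause between attempts are all hypothetical. Only the two thresholds (3 consecutive failures, 20 seconds) come from the text.

```python
import time

MAX_RESTART_FAILURES = 3     # from the text: 3 consecutive failures
MAX_RESTART_SECONDS = 20.0   # from the text: 20s spent trying to restart


def supervise_primary(try_restart, secondary_is_caught_up,
                      clock=time.monotonic, sleep=time.sleep):
    """Keep restarting Postgres; fail over only when it is safe to do so.

    try_restart and secondary_is_caught_up are hypothetical callbacks
    that return True or False. The 1.0s retry pause is an assumption.
    """
    started = clock()
    failures = 0
    while True:
        # Simplest fix first: try to bring Postgres back on the primary.
        if try_restart():
            return "restarted"
        failures += 1

        gave_up = (failures >= MAX_RESTART_FAILURES
                   or clock() - started >= MAX_RESTART_SECONDS)

        # Fail over only when the secondary is available and caught up;
        # otherwise keep trying to restart the primary.
        if gave_up and secondary_is_caught_up():
            return "failover"
        sleep(1.0)
```

Note the design choice in the last branch: when the thresholds are exceeded but the secondary is not ready, the sketch keeps retrying rather than promoting a node that could lose data.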

Can I use pg_auto_failover in production now?

Yes! Some people actually use pg_auto_failover in production already, and happily so.

The pg_auto_failover project is already delivering a solid solution. We still have lots of ideas and ambitions in the area of automating Postgres HA, so there’s more to come! We have not implemented everything yet, and we focus on keeping what we do have very solid. So to decide whether you want to use pg_auto_failover, you should first understand what we have done so far and see if that matches your expectations in terms of features.

So, what is it that we have already done?

We support a single production architecture, with a hard-coded availability trade-off that gives priority to the service over the data in some situations. This allows us to remain very simple and robust.
Registering existing primary servers is possible, without service interruption. Just register your already running PostgreSQL instance as a primary server to the pg_auto_failover monitor and get started from there.
HBA editing is automated in pg_auto_failover, and you don't have to take care of it yourself. It might be that you have specific security rules to implement though, in which case you can of course edit the HBA yourself and discard the pg_auto_failover changes there.

What’s next for pg_auto_failover?

We have a long list of improvements in the works for pg_auto_failover, and many more ideas for the future. We are still building some of the fundamentals in the area of Postgres HA, and working hard to implement a fully automated HA solution for Postgres that is both simple and correct.

Stay tuned for more updates, including user-facing improvements in terms of HA architectures and a set of features aimed at making Docker and Kubernetes integration (even) easier. The main items on our roadmap for the next releases are:

support for multiple secondary servers
support for standbys that are not candidates for failover
fully automated disaster recovery of the monitor and the Postgres nodes
native integration with Docker, including a controller HTTP API
configuration management and syncing between primary and standby

Remember that pg_auto_failover is fully Open Source. All the development happens in the open on GitHub. You are more than welcome to contribute, either by opening issues where you can let us know about shortcomings, bugs, or feature ideas, or by opening a Pull Request where you improve the product!
