This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.
One of the unique things about Postgres is that it is highly programmable via PL/pgSQL and extensions. Postgres is so programmable that I often think of Postgres as a computing platform rather than just a database (or a distributed computing platform—with Citus). As a computing platform, I always felt that Postgres should be able to take actions in an automated way. That is why I created the open source pg_cron extension back in 2016 to run periodic jobs in Postgres—and why I continue to maintain pg_cron now that I work on the Postgres team at Microsoft.
Since its initial release, pg_cron has become a standard tool in the arsenal for many of you who use PostgreSQL. You can find the pg_cron extension on GitHub—it’s open source, and it’s easy to install and set up. And pg_cron is also included in managed Postgres database services such as the Hyperscale (Citus) and Flexible Server options in Azure Database for PostgreSQL.
The popularity of pg_cron has also created demand for more advanced features, such as an audit log, the ability to update cron job schedules, and for other ways to run pg_cron in cloud database services.
Recently, the RDS team at Amazon reached out to us to see if would accept contributions to pg_cron to enable pg_cron in RDS. And our answer was: Of course! Hence the is the result of a collaboration between Microsoft (job names, reviews, bug fixes) and Amazon (audit log, background workers), with a little help from Zalando (PostgreSQL 13 support).
Audit log lets you see your running cron jobs, as well as past job runs
Pg_cron logs the outcome of jobs in the PostgreSQL log, but the log is not always easy to access for database users. Thanks to an awesome contribution by Bertrand Drouvot and Nathan Bossart from the Amazon RDS team, you can now also see a log of your pg_cron jobs in the database in a table called
cron.job_run_details table shows:
- when a command started and finished,
- whether the pg_cron command succeeded, and
- the number of rows returned—or the error message to quickly detect when something went wrong
The audit log is not automatically cleaned up, but … we have pg_cron! That means you can easily decide on your own retention policy and use pg_cron to schedule a job that cleans up the audit log:
In implementing the audit log in pg_cron, we used the row-level security feature in Postgres to make sure that database users can only see and delete their own audit log entries.
Using background workers avoids connection configuration
Pg_cron normally executes commands by either connecting to localhost or the unix domain socket (configurable using the `cron.host` setting). Both connection options are secure but do require some changes to pg_hba.conf to enable, and which change you need to make depends on which Linux distribution you are using.
Thanks to another contribution by Bertrand and Nathan, you can choose to use dynamic background workers instead of connections by setting
cron.use_background_workers = on in postgresql.conf. That way, you do not need any pg_hba.conf changes.
A slight downside of background workers is that the number of concurrent jobs is limited to
max_worker_processes (8 by default). The standard way of connecting to localhost is only limited to max_connections (which is 100 by default, usually higher). We recommend increasing
max_worker_processes when using background workers.
Job names simplify pg_cron job management
If you are already familiar with pg_cron, you may have noticed that the examples above now includes a job name as the first argument to the
cron.schedule(..) function. The ability to give your pg_cron jobs a name is also a new feature in pg_cron 1.3, by yours truly.
When you use a job name,
cron.schedule does an upsert rather than an insert, which means you idempotently set your schedule and your command.
For instance, if I name my pg_cron job
nightly vacuum then I can easily change the schedule:
Using a job name also makes it easy to ensure a specific set of jobs is running, without having to check whether the job already exists.
PostgreSQL 13 support is available for pg_cron
Finally, PostgreSQL 13 is out! Of course, the new pg_cron 1.3 release supports (and all PostgreSQL versions since 9.5). Big thanks to Alexander Kukushkin from Zalando for updating pg_cron to support Postgres 13.
Better together: Improving PostgreSQL across competitive boundaries
One of the exciting parts of being on the Postgres team at Microsoft is that Microsoft has embraced open source. What that means in practice is that we are making significant contributions to PostgreSQL via our team of PostgreSQL committers and contributors (including engineers Andres Freund, David Rowley, Dimitri Fontaine, Jeff Davis, and Thomas Munro). In addition, we have created and maintain some pretty useful open source extensions to Postgres such as Citus, pg_auto_failover, and of course pg_cron. When we open source our software, competitors might benefit from our work, but the more important point is all PostgreSQL users (including our customers) benefit!
I find pg_cron to be super useful. And some have gone so far to say that it’s “really cool” to have the scheduler within the database, and tied to the data itself. We’re happy to see that pg_cron has gotten significant contributions from Amazon and look forward to seeing one great PostgreSQL job scheduler across Azure Database for PostgreSQL, RDS, and other Postgres cloud database services.