Evolving pg_cron together: Postgres 13, audit log, background workers, & job names

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.

One of the unique things about Postgres is that it is highly programmable via PL/pgSQL and extensions. Postgres is so programmable that I often think of Postgres as a computing platform rather than just a database (or a distributed computing platform—with Citus). As a computing platform, I always felt that Postgres should be able to take actions in an automated way. That is why I created the open source pg_cron extension back in 2016 to run periodic jobs in Postgres—and why I continue to maintain pg_cron now that I work on the Postgres team at Microsoft.

Using pg_cron, you can schedule Postgres queries to run periodically, according to the familiar cron syntax. Some typical examples:

 

-- vacuum my table every night at 3am (GMT) SELECT cron.schedule('0 3 * * *', 'VACUUM my_table'); -- call a procedure every minute SELECT cron.schedule('process-new-events', '* * * * *', 'CALL process_new_events()'); -- upgrade an extension when PostgreSQL restarts SELECT cron.schedule('upgrade-pgcron', '@reboot', 'ALTER EXTENSION pg_cron UPDATE'); -- Delete old data on Saturday at 1:30am (GMT) SELECT cron.schedule('delete-old-events','30 1 * * 6', $$DELETE FROM events WHERE event_time < now() - interval '1 week'$$);

 

 

Since its initial release, pg_cron has become a standard tool in the arsenal for many of you who use PostgreSQL. You can find the pg_cron extension on GitHub—it’s open source, and it’s easy to install and set up. And pg_cron is also included in managed Postgres database services such as the Hyperscale (Citus) and Flexible Server options in Azure Database for PostgreSQL.

 

The popularity of pg_cron has also created demand for more advanced features, such as an audit log, the ability to update cron job schedules, and for other ways to run pg_cron in cloud database services.

 

Recently, the RDS team at Amazon reached out to us to see if would accept contributions to pg_cron to enable pg_cron in RDS. And our answer was: Of course! Hence the is the result of a collaboration between Microsoft (job names, reviews, bug fixes) and Amazon (audit log, background workers), with a little help from Zalando (PostgreSQL 13 support).

 

Audit log lets you see your running cron jobs, as well as past job runs

Pg_cron logs the outcome of jobs in the PostgreSQL log, but the log is not always easy to access for database users. Thanks to an awesome contribution by Bertrand Drouvot and Nathan Bossart from the Amazon RDS team, you can now also see a log of your pg_cron jobs in the database in a table called  cron.job_run_details.

 

The cron.job_run_details table shows:

  • when a command started and finished,
  • whether the pg_cron command succeeded, and
  • the number of rows returned—or the error message to quickly detect when something went wrong

 

-- update my rollup tables every minute SELECT cron.schedule('update rollups', '* * * * *', 'SELECT update_rollup_tables()'); -- after a while SELECT * FROM cron.job_run_details; ┌───────┬───────┬─────────┬──────────┬──────────┬───────────────────────────────┬───────────┬────────────────────────────────────────────┬───────────────────────────────┬───────────────────────────────┐ │ jobid │ runid │ job_pid │ database │ username │ command │ status │ return_message │ start_time │ end_time │ ├───────┼───────┼─────────┼──────────┼──────────┼───────────────────────────────┼───────────┼────────────────────────────────────────────┼───────────────────────────────┼───────────────────────────────┤ │ 2 │ 13 │ 1022 │ postgres │ marco │ SELECT update_rollup_tables() │ succeeded │ 1 row │ 2020-10-28 12:18:00.059566+02 │ 2020-10-09 12:18:01.734262+02 | │ 2 │ 1 │ 986 │ postgres │ marco │ SELECT update_rollup_tables() │ succeeded │ 1 row │ 2020-10-28 12:19:00.029769+02 │ 2020-10-09 12:09:01.345637+02 | │ 1 │ 14 │ 1023 │ postgres │ marco │ VACUUM my_table │ failed │ ERROR: relation "my_table" does not exist │ 2020-10-28 12:19:00.079291+02 │ 2020-10-28 12:19:00.080133+02 | └───────┴───────┴─────────┴──────────┴──────────┴───────────────────────────────┴───────────┴────────────────────────────────────────────┴───────────────────────────────┴───────────────────────────────┘ (3 rows)

 

 

The audit log is not automatically cleaned up, but … we have pg_cron! That means you can easily decide on your own retention policy and use pg_cron to schedule a job that cleans up the audit log:

 

-- at midnight, delete all audit log entries older than 14 days SELECT cron.schedule('clean audit log', '0 0 * * *', $$DELETE FROM cron.job_run_details WHERE end_time < now() – interval ’14 days’$$);

 

 

In implementing the audit log in pg_cron, we used the row-level security feature in Postgres to make sure that database users can only see and delete their own audit log entries.

 

Using background workers avoids connection configuration

Pg_cron normally executes commands by either connecting to localhost or the unix domain socket (configurable using the `cron.host` setting). Both connection options are secure but do require some changes to pg_hba.conf to enable, and which change you need to make depends on which Linux distribution you are using.

 

Thanks to another contribution by Bertrand and Nathan, you can choose to use dynamic background workers instead of connections by setting cron.use_background_workers = on in postgresql.conf. That way, you do not need any pg_hba.conf changes.

 

A slight downside of background workers is that the number of concurrent jobs is limited to max_worker_processes (8 by default). The standard way of connecting to localhost is only limited to max_connections (which is 100 by default, usually higher). We recommend increasing max_worker_processes when using background workers.

 

Job names simplify pg_cron job management

If you are already familiar with pg_cron, you may have noticed that the examples above now includes a job name as the first argument to the cron.schedule(..) function. The ability to give your pg_cron jobs a name is also a new feature in pg_cron 1.3, by yours truly.

 

When you use a job name, cron.schedule does an upsert rather than an insert, which means you idempotently set your schedule and your command.

 

For instance, if I name my pg_cron job nightly vacuum then I can easily change the schedule:

 

-- set up a nightly vacuum at 1am with a job name select cron.schedule('nightly vacuum', '0 1 * * *', 'vacuum my_table'); ┌──────────┐ │ schedule │ ├──────────┤ │ 4 │ └──────────┘ (1 row) -- I can now see the nightly vacuum job in the cron.job.table table cron.job; ┌───────┬───────────┬─────────────────┬───────────┬──────────┬──────────┬──────────┬────────┬────────────────┐ │ jobid │ schedule │ command │ nodename │ nodeport │ database │ username │ active │ jobname │ ├───────┼───────────┼─────────────────┼───────────┼──────────┼──────────┼──────────┼────────┼────────────────┤ │ 4 │ 0 1 * * * │ vacuum my_table │ localhost │ 5432 │ postgres │ marco │ t │ nightly vacuum │ └───────┴───────────┴─────────────────┴───────────┴──────────┴──────────┴──────────┴────────┴────────────────┘ (1 row) -- change the schedule to 3am by using the same job name select cron.schedule('nightly vacuum', '0 3 * * *', 'vacuum my_table'); ┌──────────┐ │ schedule │ ├──────────┤ │ 4 │ └──────────┘ (1 row) -- I can see the new schedule in the cron.job table table cron.job; ┌───────┬───────────┬─────────────────┬───────────┬──────────┬──────────┬──────────┬────────┬────────────────┐ │ jobid │ schedule │ command │ nodename │ nodeport │ database │ username │ active │ jobname │ ├───────┼───────────┼─────────────────┼───────────┼──────────┼──────────┼──────────┼────────┼────────────────┤ │ 4 │ 0 3 * * * │ vacuum my_table │ localhost │ 5432 │ postgres │ marco │ t │ nightly vacuum │ └───────┴───────────┴─────────────────┴───────────┴──────────┴──────────┴──────────┴────────┴────────────────┘ (1 row)

 

 

Using a job name also makes it easy to ensure a specific set of jobs is running, without having to check whether the job already exists.

 

PostgreSQL 13 support is available for pg_cron

Finally, PostgreSQL 13 is out! Of course, the new pg_cron 1.3 release supports (and all PostgreSQL versions since 9.5). Big thanks to Alexander Kukushkin from Zalando for updating pg_cron to support Postgres 13.

 

Packages for pg_cron 1.3 are available via PGDG for Red Hat/CentOS and Debian/Ubuntu.

 

Better together: Improving PostgreSQL across competitive boundaries

One of the exciting parts of being on the Postgres team at Microsoft is that Microsoft has embraced open source. What that means in practice is that we are making significant contributions to PostgreSQL via our team of PostgreSQL committers and contributors (including engineers Andres Freund, David Rowley, Dimitri Fontaine, Jeff Davis, and Thomas Munro). In addition, we have created and maintain some pretty useful open source extensions to Postgres such as Citus, pg_auto_failover, and of course pg_cron. When we open source our software, competitors might benefit from our work, but the more important point is all PostgreSQL users (including our customers) benefit!

 

I find pg_cron to be super useful. And some have gone so far to say that it’s “really cool” to have the scheduler within the database, and tied to the data itself. We’re happy to see that pg_cron has gotten significant contributions from Amazon and look forward to seeing one great PostgreSQL job scheduler across Azure Database for PostgreSQL, RDS, and other Postgres cloud database services.

REMEMBER: these articles are REPUBLISHED. Your best bet to get a reply is to follow the link at the top of the post to the ORIGINAL post! BUT you're more than welcome to start discussions here:

This site uses Akismet to reduce spam. Learn how your comment data is processed.