Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Pg_cron (github.com/citusdata)
115 points by awinter-py on Jan 4, 2022 | hide | past | favorite | 24 comments


You can do some very neat things with this, especially since it’s implemented with Background Workers

> Care is taken that these extra processes do not interfere with other postmaster tasks: only one such process is started on each ServerLoop iteration. This means a large number of them could be waiting to be started up and postmaster is still able to quickly service external connection requests.

I have had the occasional need to (ab)use it for webhooks https://supabase.com/blog/2021/03/05/postgres-as-a-cron-serv...


cron running on only one server is unreliable and "high availability" cron is complex. So, it's unbelievable how no such solution existed before. It might not be the most elegant way, but having a centralized cron (just like the database is centralized) helps a lot.


> So, it's unbelievable how no such solution existed before

I think a lot of folks solved this problem at the application level using background workers or job processing services.

For example, Celery with Python or Sidekiq with Ruby has had recurring job support for a really long time. Celery supported this since 2010. You could run distributed jobs (recurring or not) this way without using cron directly or requiring a custom DB extension.

IMO it feels much more natural to solve this problem at the app level with a background worker since you can run any task you want, not just DB queries. This also has the added benefit of letting you use your DB library of choice (ORM, etc.) since you can run any code you want within the job.


In the Java world the venerable (pre-2006) Quartz library has been around a long time, and is certainly not the only solution.

https://www.quartz-scheduler.org/overview/


Although Celery itself is distributed, the cron manager (celery-beat) is not and will not coordinate with other nodes. If you have two nodes running celery-beat, you'll get two jobs dispatched every time a cron triggers. imo defeats the whole purpose, at that point I might as well use crontab.


Perhaps you’re unaware, but kubernetes has cron jobs. Highly available, and a cron only runs on one node at a time (in the happy case)


> (in the happy case)

That caveat makes a big difference. If you're making an API call like sending an email, you really don't want it to run twice.


Idempotent resource cleanup tasks are the ideal use case for k8s crons.


Not sure it's the best example, having 0.01% of emails being sent twice is probably acceptable for most businesses. If it's a payment processing, it's not acceptable


> If it's a payment processing, it's not acceptable

In the payment processing company I'm working at we deal with that independently. We set some data on Redis with a TTL to signal some request has already been made in the last X minutes, and if the same request arrives again during the same period the cached response is served


hopefully your email/money movement service accepts an idempotency key which solves that


I’ve been using pg_cron to refresh materialized views, which has kept things simple and self contained for our use.


That is really cool! No more crontab with the PostgreSQL CLI's execute and username (and potentially password) to do e.g. clean up tasks in the database.

Does anyone know which time config this will use? Is it the system time, UTC, or something else?


> Does anyone know which time config this will use?

From the readme:

> Be aware that pg_cron always uses GMT!

(=UTC)


Too bad, that will make it problematic to use for batch jobs that have to run at particular business hour.


What if your 'business hour' was 0130[^] (business logic for some reason, rather than a working hour, presumably; hence quotes) - what do you want to happen when you go in to Summer time, or come out of it? Does everybody want the same thing? Is it even obvious what it probably does such that nobody's going to have a bad night?

[^] in the UK at least, clocks go forward skipping the hour 0100-0200, and repeat it when they go back. If that happens at a different time where you are (this hypothetical feature gets even more complicated! and) then I mean whatever appropriate time for my point to make sense.


Not entirely sure why you're being downvoted, but it's true.

Sometimes you need something to run at 9am daily, in your local time zone. If your local time zone shifts for daylight savings, you can't define it in UTC without changing it twice a year.


Why? Can’t you just convert that business hour to UTC?


ho boy, before I break out "falsehoods programmers believe about time,"[0] nobody wants to manually have to rewrite the UTC time something's supposed to run at when the business hours keep changing relative to UTC for (at the most basic) Daylight Savings reasons.

[0] https://infiniteundo.com/post/25326999628/falsehoods-program...

(if this is your first time seeing this article, have fun! also check out the one about names.)


You don't have to do that "manually". Postgres supports the conversion using the `at time zone` operator.


This extension uses cron syntax and only supports GMT.


That's pretty good. I'm finishing a pure SQL merge replication for PSQL. This will definitely wrap up this little project nicely.


Is this like SQL Agent for MS SQL?


I would've needed this a couple of years ago!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: