Ways to capture changes in Postgres (sequin.io)
297 points by chuckhend on Sept 22, 2023 | hide | past | favorite | 81 comments


Using triggers + history tables (aka audit tables) is the right answer 98% of the time. Just do it. If you're not already doing it, start today. It is a proven technique, in use for _over 30 years_.

Here's a quick rundown of how to do it generically https://gist.github.com/slotrans/353952c4f383596e6fe8777db5d... (trades off space efficiency for "being easy").

It's great if you can store immutable data. Really, really great. But you _probably_ have a ton of mutable data in your database and you are _probably_ forgetting a ton of it every day. Stop forgetting things! Use history tables.
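For concreteness, the generic trigger + history table pattern looks something like the following sketch (table and column names are hypothetical; the linked gist has a more complete, generic version):

```sql
-- Hypothetical source table "widget"; the history table stores full row
-- images as jsonb, trading space efficiency for simplicity.
CREATE TABLE widget_history (
    history_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    changed_at timestamptz NOT NULL DEFAULT clock_timestamp(),
    operation  text NOT NULL,  -- 'INSERT' | 'UPDATE' | 'DELETE'
    row_data   jsonb NOT NULL
);

CREATE OR REPLACE FUNCTION widget_history_fn() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO widget_history (operation, row_data)
        VALUES (TG_OP, to_jsonb(OLD));
    ELSE
        INSERT INTO widget_history (operation, row_data)
        VALUES (TG_OP, to_jsonb(NEW));
    END IF;
    RETURN NULL;  -- AFTER trigger; the return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER widget_history_trg
AFTER INSERT OR UPDATE OR DELETE ON widget
FOR EACH ROW EXECUTE FUNCTION widget_history_fn();
```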

cf. https://github.com/matthiasn/talk-transcripts/blob/master/Hi...

Do not use Papertrail or similar application-space history tracking libraries/techniques. They are slow, error-prone, and incapable of capturing any DB changes that bypass your app stack (which you probably have, and should). Worth remembering that _any_ attempt to capture an "updated" timestamp from your app is fundamentally incorrect, because each of your webheads has its own clock. Use the database clock! It's the only one that's correct!


> each of your webheads has its own clock. Use the database clock!

Yes, for consistency you should use the database clock by embedding the calls to `now()` or similar in the query instead of generating it on the client.

But it's not sufficient to use these timestamps for synchronization. The problem is that these timestamps are generated at the start of the transaction, not the end of the transaction when it commits. So if you poll a table and filter for recent timestamps, you'll miss some from transactions that are committing out of order. You can add a fudge factor like querying back an extra few minutes and removing the duplicates, but some transactions will take longer than a few minutes. There's no upper bound to how long a transaction can take in postgresql, and there's a lot of waste in querying too far back. This approach doesn't work if you care about correctness or efficiency.


There is a way in PostgreSQL to get the actual wall-clock time of the database server, regardless of transaction start time: `clock_timestamp()`.

There's also `statement_timestamp()` per the docs: "returns the start time of the current statement (more specifically, the time of receipt of the latest command message from the client)."
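A quick way to see how the three differ, run inside a single transaction:

```sql
BEGIN;
SELECT now(), statement_timestamp(), clock_timestamp();
SELECT pg_sleep(2);
-- now() is frozen at transaction start; statement_timestamp() and
-- clock_timestamp() have both moved forward by ~2 seconds.
SELECT now(), statement_timestamp(), clock_timestamp();
COMMIT;
```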

https://www.postgresql.org/docs/current/functions-datetime.h...

This isn't to say any of these methods are the best (or even good) in all cases. Time is tricky, especially if you're trying to do any sort of sequencing.


Estuary (https://estuary.dev ; I'm CTO) gives you a real-time, data-lake'd change log of all the changes happening in your database, in your cloud storage -- complete with log sequence number, database time, and even before/after states if you use REPLICA IDENTITY FULL -- with no extra setup in your production DB.

By default, if you then go on to materialize your collections somewhere else (like Snowflake), you get synchronized tables that follow your source DB as they update.

But! You can also transform or materialize the complete history of your tables for auditing purposes from that same underlying data-lake, without going back to your source DB for another capture / WAL reader.


I'm evaluating Flow for CDC. Do you support logical decoding messages from `pg_logical_emit_message`? This would allow us to add audit metadata[^1].

[^1]: https://www.infoq.com/articles/wonders-of-postgres-logical-d...


No. But this is neat, and at a glance it looks straightforward to add. Happy to discuss further!


That sure sounds cool but I can't tell from your website that it does any of that. Even giving up on the marketing copy and going straight to the docs... I can't follow them.


Hi fellow Dark Tower friend.

Yes, marketing and docs are not our strongest suits and we need to do better. To be fair, though, we're also trying not to scare off less technical users who see a bullet list like the above and think "well, this is clearly not for me". It's a hard balance.


Are you using debezium to capture changes?


No. We implemented our own [1] for a few reasons:

* Scaling well to multi-TB DBs without pinning the write-ahead log (potentially filling your DB's disk) while the backfill is happening. Instead, our connector constantly reads the WAL and works well in setups like Supabase that have very restrictive WAL sizes (1GB iirc).

* Incremental fault-tolerant backfills that can be stopped and resumed at will.

* Flowing TOAST columns through seamlessly to your materialized destination, without requiring that you resort to REPLICA IDENTITY FULL.

* Being able to offer "precise" captures which are logically consistent in terms of the sequence of create/update/delete events.

The last one becomes really interesting when paired with REPLICA IDENTITY FULL because you can feed the resulting before/after states into an incremental computation (perhaps differential dataflow) for streaming updates of a query.

Our work is based off of the Netflix DBLog paper, which we took and ran with.

[1] https://github.com/estuary/connectors/tree/main/source-postg...


I found that referencing session variables from triggers lets me add additional information (eg. a comment on why the change is being made) to the history. I've only done it in a small personal project, but it's worked well there so far.
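For anyone curious, a sketch of that pattern (`set_config`/`current_setting` are real built-ins; the table and setting names are illustrative):

```sql
-- Client side, scoped to the current transaction (third arg = is_local):
SELECT set_config('app.change_reason', 'backfill script #42', true);

-- Inside the audit trigger function; the second argument makes
-- current_setting() return NULL instead of erroring when unset:
INSERT INTO widget_history (operation, row_data, change_reason)
VALUES (TG_OP, to_jsonb(NEW), current_setting('app.change_reason', true));
```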


Thanks for that example. I got Code Interpreter to port it to SQLite and demonstrate it working here: https://chat.openai.com/share/b5113cb1-10df-4a38-adde-5ec0e7...

I have my own SQLite implementation of a similar pattern (but using columns rather than JSON) which I describe here: https://simonwillison.net/2023/Apr/15/sqlite-history/


I agree and this is a good approach - and how we power the activity feed in our app. But it doesn’t solve the issue of “pushing the changes” out. Of course, you can always listen to the audit table WAL changes - best of both worlds.


If you have a GDPR request to delete everything for a user, do you go through the audit table and delete everything related to that user?


Unfortunately yes.


I enjoyed this blog. I think it provides a great succinct overview of various approaches native to Postgres.

For the "capture changes in an audit table" section, I've had good experiences at a previous company with the Temporal Tables pattern. Unlike other major RDBMS vendors, it's not built into Postgres itself, but there's a simple pattern [1] you can leverage with a SQL function.

This allows you to see a table's state as of a specific point in time. Some sample use cases:

- "What was this user's configuration on Aug 12?"

- "How many records were unprocessed at 11:55pm last night?"

- "Show me the diff on feature flags between now and a week ago"

[1]: https://github.com/nearform/temporal_tables


I once consulted at a company with a very large monolithic SQL Server. It actually wasn't Postgres but let's pretend it was.

It had been around for decades and over time it had ended up being used for all sorts of things within the company. In fact, it was more or less true that every application and business process within the whole company stored its data within this database.

A key issue we had was that because this database had many different applications that queried it, and there were a huge number of processes and procedures that inserted or updated data within it, sometimes queries would break due to upstream insert/update processes being amended or new ones added that broke application-level invariants -- or when a normal process operated differently when there was bad data.

It was very difficult to work out what had happened, because often everything you looked at had been written a decade earlier, and the employees responsible had long since left the company.

Would it be possible to capture changes from a Postgres database in some kind of DAG, so that you could find out things like:

- What processes are inserting, updating or deleting data and historically how are they behaving? For example, do they operate differently ever?

- How are different applications querying this data? Are there any statistics about their queries which are generally true? Historically, how are these statistics changing?

I don't know if there is prior art here, or what kind of approach might allow a tool like this to be made?

(I've thought of making something like this before but I think this is an area in which you'd want to be a core Postgres engineer to make good choices.)


Logical replication in Postgres contains all of the information about each change statement (insert/update/delete) needed to logically recreate the same state in another database.

You won't get client-level provenance data with each change...

However you could hack around that. The logical replication stream can also include informational messages from the "pg_logical_emit_message" function to insert your own metadata from clients. It might be possible to configure your clients to emit their identifier at the beginning of each transaction.
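Something along these lines (the prefix, payload, and table are illustrative):

```sql
BEGIN;
-- transactional = true: the message is decoded in order with the
-- transaction's other changes, and is discarded on rollback.
SELECT pg_logical_emit_message(
    true,
    'audit',  -- prefix that downstream decoders filter on
    '{"client":"billing-service","user":"alice"}');
UPDATE orders SET status = 'paid' WHERE id = 42;
COMMIT;
```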


I’m not sure how to handle queries, but for inserts/updates I have a column that tracks the event source (last updated by). Maybe this is an anti-pattern - I’d love a more robust solution.


Technically, logical replication captures everything done by every client, and if you are careful with triggers you can track everything that way as well, using a DDL/DML capture table (DCL too if you're worried!).

These approaches work on basically every type of SQL solution that uses WAL/triggers.

For your specific question, I have used a trigger approach many times in SQL Server, but it has a tendency to slow things down if you are logging every query, so designing an insertion mechanism that doesn't bog down production is tricky, and you might want to perform some sampling.


You get a lot just by making sure each application has its own user access.


This seems like the right approach to me.

An approach I've taken is temporal tables w/ Application and UpdatedBy fields. That gives you a permanent record of every change, what application did it, and what user performed the action, at what time, and then lets you query the database as if you were querying it at that point in time. You can add triggers to fail CRUD if those fields are not updated if you want to get really paranoid.

There's a lot of overhead to this in terms of storage, so it's not suitable for high-throughput or cost-constrained transactional systems, but it's something for the toolbox.


Even if it doesn't you can start by adding the application name to the connection string and you could probably do something gnarly with triggers to write that in a table and get it pushed in the logical replication.
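A sketch of the trigger-side half (audit_log is a hypothetical table; `application_name` is the standard setting populated from the connection string):

```sql
-- Inside a row-level trigger function: record which application wrote.
INSERT INTO audit_log (table_name, operation, app)
VALUES (TG_TABLE_NAME, TG_OP, current_setting('application_name'));
```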


yeah, for SQL Server the connection string has an ApplicationName property for this purpose, it's pretty useful :)


This is an interesting idea


I had the following idea recently: go through all scripts/programs that send queries to the DB and append a comment to the query containing a unique ID that links it back to the script/program. The query log hopefully shows the comment with the ID, so you can trace the origin.
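For example (the tag format is made up; anything greppable works):

```sql
/* origin: reports/daily_revenue.py */
SELECT sum(total) FROM orders WHERE created_at >= current_date;

-- The tag then shows up in the server log and in pg_stat_activity:
SELECT pid, query FROM pg_stat_activity WHERE query LIKE '%origin:%';
```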


For the "I just need to emit and visualize a DAG" problem, at one point I wrote a python script that would filter for the relevant data, emit line-by-line mermaidJS-flavored markdown, and then shove that into an HTML file that imported the MermaidJS library.

The MermaidJS solves for the DAG and visualizes it, and your browser has enough context to let you CTRL-F for any interesting phrases you put in the label.


I think it's users + pgAudit.


If you go the "audit table" path, just use "pgaudit" , it's a battle-tested extension, that is even available on RDS if you use AWS

https://github.com/pgaudit/pgaudit/blob/master/README.md

https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appen...


There is also the temporal_tables extension [0], which was already discussed in HN. [1]

[0] https://github.com/arkhipov/temporal_tables [1] https://news.ycombinator.com/item?id=26748096


Relax, don't do it. When you want this, you're turning the relations in postgres into a contract. No service gets to persist internal state. If you're _really_ committed to domain driven design it could work... but you'd be better off with a light (but real) event driven system.


The relations in the database _are_ a contract whether you like it or not.

Event-driven anything is 1000x more complex.


Polling an updated_at column is not robust in its most simple form, as transactions are not guaranteed to commit in that order.


Author here. Good point. For those that are curious, parent is referring to the following situation:

1. Transaction A starts, its before trigger fires, Row 1 has its updated_at timestamp set to 2023-09-22 12:00:01.

2. Transaction B starts a moment later, its before trigger fires, Row 2 has its updated_at timestamp set to 2023-09-22 12:00:02.

3. Transaction B commits successfully.

4. Polling query runs, sees Row 2 as the latest change, and updates its cursor to 2023-09-22 12:00:02.

5. Transaction A then commits successfully.

A simple way to avoid this issue is to not poll close to real-time, as the order is eventually consistent.

Perhaps a more robust suggestion would be to use a sequence? Imagine a new column, `updated_at_idx`, that increments every time a row is changed.


Sequences kind of have the same issue, because you don't know if a gap is because of a rollback or an uncommitted transaction. Though with some logic you can do a pretty good job at this with sequences. And then you're not in the realm of "simple" anymore, at all.


Any ideas for a simple polling implementation that's more robust?


It's not exactly simple as it involves some postgres specific knowledge, but you can make it reliable when working with transaction ids (see https://event-driven.io/en/ordering_in_postgres_outbox/).


I've had pretty much the exact same problem and what I went for in my low-volume case was to simply add advisory locks such that I can guarantee the transaction start times provide correct ordering.


Set the trigger to add the primary key + change time to a separate table, then scan/truncate that table to poll changes.
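A rough sketch of that (names are hypothetical); draining with `DELETE ... RETURNING` instead of a separate scan-then-TRUNCATE avoids losing rows written between the two statements:

```sql
CREATE TABLE widget_changes (
    widget_id  bigint      NOT NULL,
    changed_at timestamptz NOT NULL DEFAULT clock_timestamp()
);

CREATE OR REPLACE FUNCTION widget_changes_fn() RETURNS trigger AS $$
BEGIN
    INSERT INTO widget_changes (widget_id) VALUES (NEW.id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER widget_changes_trg
AFTER INSERT OR UPDATE ON widget
FOR EACH ROW EXECUTE FUNCTION widget_changes_fn();

-- Poller atomically drains everything accumulated so far:
DELETE FROM widget_changes RETURNING widget_id, changed_at;
```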


Woah, that's news to me. Is that true even if triggers are used to update a column?

  CREATE OR REPLACE FUNCTION
      update_updated_at_function()
  RETURNS TRIGGER AS $$
  BEGIN
      NEW.updated_at = now();
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;


  CREATE TRIGGER
      update_updated_at_trigger
  BEFORE INSERT OR UPDATE ON
      "my_schema"."my_table"
  FOR EACH ROW EXECUTE PROCEDURE
      update_updated_at_function();

Is it possible for two rows to have `updated_at` timestamps that are different from the transaction commit order even if the above function and trigger are used? It's alright if `updated_at` and the commit timestamp are not the same, but the `updated_at` must represent commit order accurate to the millisecond/microsecond.


To confirm your fear, you can't use the updated_at timestamp as a proxy for commit order. The commits happen in a different order, and can be arbitrarily far apart, like hours or days depending on how long your transactions can last.


now() is the timestamp the transaction began at. There is no function to return the commit timestamp because you have to write the value before you commit.


For polling, instead of updated_at, I use a _txid column that gets set by a trigger to the current transaction ID. Then, when polling, use txid_current() to see which transactions have committed and which haven’t. It’s a little dicey and super easy to hit fencepost errors, but it’s been running smoothly in production for a few years.
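Roughly like this (column and cursor names are illustrative). The trick is that any transaction ID below the current snapshot's xmin is guaranteed to no longer be in flight, so rows in that range can be read without missing late commits:

```sql
-- Trigger body sets the column on every write:
--   NEW._txid := txid_current();

-- Poller: read only rows from transactions that can no longer be
-- in flight, then advance the cursor to the upper bound used here.
SELECT *
FROM widget
WHERE _txid > :last_seen_txid
  AND _txid < txid_snapshot_xmin(txid_current_snapshot());
```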


Outstanding writeup.

If you're an Elixir & Postgres user, I have a little library for listening to WAL changes using a similar approach:

https://github.com/cpursley/walex


this looks great. my startup is elixir based and I've been looking for something like this.


All of those are kind of bad, polling being the most practical, imho.

Would be great if Postgres innovated in this area.


There’s been attempts to revise the SQL standard to accommodate various types of temporality as a “first party” feature.

I think that we won’t see traction at the RDBMS “kernel space” until it’s in the SQL standard. There are many valid and complex options to choose from, and there are successful solutions in user space that aren’t overly burdened, performance-wise, from being in user space.

FWIW, the “audit table” approach is the approach that people who study this field gravitate towards. Mainly because it maintains consistent ACIDity in the database, and maintains Postgres as the single point of failure (a trade off vs introducing a proxy/polling job).


Is one second polling interval practical?


I think there is a big gap in this space in the data world - having query results incrementally pushed to me rather than me asking the data store for them.

I do a lot of work in real-time and streaming analytics. We can do stream processing and maybe some work in the datastore with a materialised view. However, once data has hit your database or data lake you are then effectively back to polling for changes further downstream.

If I want to respond to some situation occuring in my data or update things on a screen without a page refresh then there isn't really a clean solution. Even the solutions in this article feel hacky rather than first class citizens.

Say I want to have a report updating in real time without a page refresh. The go-to approach seems to be to load your data from the database, then stream changes to your GUI through Kafka and a websocket, but then you end up running the weird Lambda architecture with some analytics going through code and some through your database.

There is innovation in this space. KSQL and Kafka Streams can emit changes. Materialize has subscriptions. Clickhouse has live views, etc. A lot of these features are new or in preview though, and not quite right. Having tried them all, I think they leave far too much work on the part of the developer. I would like a library with the option of `select * from orders with subscribe` and just get a change feed.

I think this is a really important space which hasn't had enough attention.



MongoDB has had ChangeStreams for a long time now.


Is this a stream of changes to a query?

I know the NoSQL world had some progress in this space. RethinkDB was going in this direction around a decade ago if I recall. I would really like this from a modern relational database or data warehouse though. Polling sucks.


You can read more about it here https://www.mongodb.com/docs/manual/changeStreams/

My feeling is that database needs to be built with replication in mind from the get-go to have something like this work well.


There’s a huge footgun with replication that the article didn’t mention, and it’s the reason why I don’t use it.

Postgres is VERY committed to making sure a replication slot’s consumer doesn’t miss any data. This means that if a consumer stops consuming data from the slot, Postgres will helpfully store all that missed data… right up until the disk fills up and the database falls over. Had this happen during prototyping with two different SaaS DBs, and the only way to get it back up was to file a support ticket. (I can’t remember if metrics warned that the disk was about to fill up or not). Basically, if your replication slot consumer stops reading, that should trigger some kind of alert.
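A monitoring query along these lines works for that alert (`wal_status` needs PostgreSQL 13+):

```sql
-- Bytes of WAL each slot is forcing the server to retain; alert when
-- this grows unbounded or wal_status leaves 'reserved'.
SELECT slot_name,
       active,
       wal_status,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
```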

The other reason I don’t use it: the code path to get the initial snapshot of a table is totally different from the code path to read changes. Initializing the read from the replication slot so you never miss any changes is nontrivial.

It’s too bad, because replication is obviously the least hacky solution for change capture.

I use polling, but storing the txid instead of updated_at.


> Postgres is VERY committed to making sure a replication slot’s consumer doesn’t miss any data. This means that if a consumer stops consuming data from the slot, Postgres will helpfully store all that missed data… right up until the disk fills up and the database falls over.

You can configure a size limit after which the slot gets marked as invalid, instead of continuing to retain space. See https://www.postgresql.org/docs/current/runtime-config-repli...
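For reference, the setting is `max_slot_wal_keep_size` (PostgreSQL 13+):

```sql
-- A slot that falls further behind than this is invalidated rather
-- than filling the disk; reloadable without a restart.
ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
SELECT pg_reload_conf();
```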

What other behaviour would you like?

> The other reason I don’t use it: the code path to get the initial snapshot of a table is totally different from the code path to read changes.

Hm. For anything dealing with larger data volumes you IME want to handle those things differently (so you can initialize in parallel, initialize from physical backups, and similar things). But I can see why it could be useful to optionally stream out existing data after slot creation.

> Initializing the read from the replication slot so you never miss any changes is nontrivial.

That part shouldn't be hard - what gave you difficulty?


One trick to deal with the first problem is sending logical decoding messages to yourself. That keeps the retained WAL low. Another potentially useful thing is temporary replication slots which clean themselves up on connection loss. I use those when I don’t need all of the changes. There is also configuration for setting the maximum WAL that’s retained so you don’t murder the server.


I've run into this footgun before! It's very subtle – you de-provision a consumer, and that feels like it should have no impact on the primary. But it creates a ticking time bomb.

Mind expanding on how you use txid instead of updated_at?



What I would love is a turnkey service I could set up with logical replication and a straightforward configuration that can listen for any changes on certain tables (ideally with optional filters), take those changes, and put them in a queue / ESB / whatever. If you're designing a new application from scratch and there is only one place where certain logical changes happen, you can do this from the start. But in any system not designed that way from the start, the database is the place where all things go. Using it as your event source allows everything to keep working like it always has, while letting you notify new processes when something changes.


If you're using Elixir, check out https://github.com/cpursley/walex

I've actually been thinking about turning this idea into a product where you can just point it at your postgres database and select the tables you want to listen to (with filters, like you describe). And have that forwarded to a webhook (with possibility of other protocols like websockets).

I'd love to hear folks thoughts on that (and if it would be something people would pay for). And if anyone might want to partner up on this.


Yo :D This is what Supabase Realtime does!

https://github.com/supabase/realtime

Spin up a Supabase database and then subscribe to changes with WebSockets.

You can play with it here once you have a db: https://realtime.supabase.com/inspector/new


Yeah, Supabase is awesome. In fact, WalEx originally started out as stolen code from realtime so I could do stuff right in Elixir (biz logic).


What advantages does this have compared to Debezium?


You wouldn't have to host it yourself (not everyone knows how or wants to run a heavy Java app).

Add your database string, a couple extra migrations, your webhook endpoint - and you're off to the races.

Target customer would be low-code space (think admin dashboards on postgres, webhook integration tools).


I used to work at Yelp, which had something that I think is similar to what you are describing, called Data Pipeline (https://engineeringblog.yelp.com/2019/12/cassandra-source-co...).

I remember it being pretty simple (like, run one or two bash commands) to get a source table streamed into a kafka topic, or get a kafka topic streamed into a sink datastore (S3, mysql, cassandra, redshift, etc). Kafka topics can also be filtered/transformed pretty easily.

E.g. in https://engineeringblog.yelp.com/2021/04/powering-messaging-... they run `datapipe datalake add-connection --namespace main --source message_enabledness`, which results in the `message_enabledness` table being streamed into a (daily?) parquet snapshot in S3, registered in AWS Glue.

It is open source but it's more of the "look at how we did this" open source VS the "it would be easy to stick this into your infra and use it" kind of open source :(


Something like Debezium? https://debezium.io


- https://hevodata.com/

- https://airbyte.com/

- https://www.stitchdata.com/

Unfortunately none of them are perfect as your data scales, each have up and downsides...


The main thing everyone always wants to know is whom to blame. Who added this store? Who changed the name of this store? Who deleted this store? The last one is particularly tricky - keeping audits of deleted things. Do you just soft delete everything?


Soft delete is really hard to do well IMO, especially if it’s user facing. There have been some long threads on HN before about that.


Unrelated to the article, but I gotta say I just learned about Sequin from this link, and it seems very cool! But, I can't think of a use-case for it. Can you please explain the value-add?


Hey there, author here. We help our customers build their integrations faster/easier by syncing API data with Postgres tables.

Because your API data is in Postgres, you can query it exactly how you like. You're not limited by the API's supported query params, batch sizes, or rate limits. You can use SQL or your ORM.

This also means you don't need to learn all the API's quirks. And every API we support has the same interface. (Which will become more valuable as we add more APIs!)

So, the value is less code and simpler code.


Interesting, thanks. Do you manage the DB yourself or do you sync it to customers' existing DBs? Does this play nicely with pg_graphq?


We offer a demo Postgres instance for getting up and running. You can move the sync to your db when you're ready for production. Therefore, should be compatible with extensions like `pg_graphql` (can't think of customers using that extension, but we have quite a few using Hasura).

Any other questions, feel free to email: anthony@[domain]


If you need to get data out as a feed and you need something robust, you almost always evolve until you land on the WAL option. The only problem with WAL is that many of us are (rightly) moving towards managed cloud SQL instances such as AWS Aurora, and then your options narrow. It's a fair trade-off, but for all the appeal of Supabase, Neon, Cockroach et al, a PG WAL to AWS-native bus/stream solution with filtering is sorely needed. Apologies if this already exists and I've missed it :)


It exists [0] but does not seem to be that interesting to users

[0] https://github.com/subzerocloud/pg-event-proxy-example


Unless it has been updated, another caveat of LISTEN / NOTIFY is that it becomes slower and slower the more NOTIFY you use within a transaction.

If I remember correctly, the reason is that postgres attempts to de-duplicate messages using a naive quadratic algorithm, causing horrible slowdowns for large query with a FOR EACH ROW trigger.

The way we mitigated that is by creating a temporary table and write the notification payloads. An AFTER * trigger reads distinct rows, groups the data and sends the messages.
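A related shape uses a statement-level trigger with a transition table instead of a temp table (PostgreSQL 10+; names are illustrative, and note that a NOTIFY payload is still capped at ~8000 bytes):

```sql
CREATE OR REPLACE FUNCTION widget_notify_fn() RETURNS trigger AS $$
BEGIN
    -- One NOTIFY per statement, not per row; new_rows is the
    -- transition table holding every row the statement touched.
    PERFORM pg_notify(
        'widget_changed',
        coalesce(
            (SELECT json_agg(DISTINCT n.id)::text FROM new_rows n),
            '[]'));
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER widget_notify_trg
AFTER UPDATE ON widget
REFERENCING NEW TABLE AS new_rows
FOR EACH STATEMENT EXECUTE FUNCTION widget_notify_fn();
```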


There’s also an 8000 character limit which is why we went to the WAL approach.


This episode of the Postgres.FM podcast may also be of interest (it's about implementing queues in Postgres).

https://postgres.fm/episodes/queues-in-postgres


I didn't see what imho is a significant LISTEN/NOTIFY caveat, which is that it is sticky to a session. As such, it requires a dedicated connection - eg, it isn't compatible with pgBouncer in transaction mode.


Listen/Notify is an underrated tool.


Not sure I agree, it’s got some really annoying footguns and limitations. I’ve found much more success with logical replication.



