Using triggers + history tables (aka audit tables) is the right answer 98% of the time. Just do it. If you're not already doing it, start today. It is a proven technique, in use for _over 30 years_.
It's great if you can store immutable data. Really, really great. But you _probably_ have a ton of mutable data in your database and you are _probably_ forgetting a ton of it every day. Stop forgetting things! Use history tables.
Do not use Papertrail or similar application-space history tracking libraries/techniques. They are slow, error-prone, and incapable of capturing any DB changes that bypass your app stack (which you probably have, and should). Worth remembering that _any_ attempt to capture an "updated" timestamp from your app is fundamentally incorrect, because each of your webheads has its own clock. Use the database clock! It's the only one that's correct!
> each of your webheads has its own clock. Use the database clock!
Yes, for consistency you should use the database clock by embedding the calls to `now()` or similar in the query instead of generating it on the client.
But it's not sufficient to use these timestamps for synchronization. The problem is that these timestamps are generated at the start of the transaction, not at the end when it commits. So if you poll a table and filter for recent timestamps, you'll miss rows from transactions that committed out of order. You can add a fudge factor, like querying back an extra few minutes and removing the duplicates, but some transactions will take longer than a few minutes. There's no upper bound on how long a transaction can take in PostgreSQL, and there's a lot of waste in querying too far back. This approach doesn't work if you care about correctness or efficiency.
There is a way in PostgreSQL to get the actual wall-clock time of the database server, regardless of transaction start time: `clock_timestamp()`.
There's also `statement_timestamp()` per the docs: "returns the start time of the current statement (more specifically, the time of receipt of the latest command message from the client)."
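A quick illustration of how the three timestamp functions diverge inside a single transaction (a sketch, run in psql):

```sql
-- now() is frozen at transaction start; statement_timestamp() advances
-- per statement; clock_timestamp() advances continuously.
BEGIN;
SELECT now(), statement_timestamp(), clock_timestamp();  -- all roughly equal
SELECT pg_sleep(2);
SELECT now(),                  -- unchanged: still transaction start time
       statement_timestamp(),  -- start of this statement (~2s later)
       clock_timestamp();      -- wall clock right now
COMMIT;
```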
This isn't to say any of these methods are the best (or even good) in all cases. Time is tricky, especially if you're trying to do any sort of sequencing.
Estuary (https://estuary.dev ; I'm CTO) gives you a real-time, data-lake-style change log of all the changes happening in your database, stored in your cloud storage -- complete with log sequence number, database time, and even before/after states if you use REPLICA IDENTITY FULL -- with no extra setup in your production DB.
By default, if you then go on to materialize your collections somewhere else (like Snowflake), you get synchronized tables that follow your source DB as they update.
But! You can also transform or materialize the complete history of your tables for auditing purposes from that same underlying data-lake, without going back to your source DB for another capture / WAL reader.
That sure sounds cool but I can't tell from your website that it does any of that. Even giving up on the marketing copy and going straight to the docs... I can't follow them.
Yes, marketing and docs are not our strongest suits and we need to do better.
To be fair, though, we're also trying not to scare off less technical users who see a bullet list like the one above and think "well, this is clearly not for me". It's a hard balance.
* Scaling well to multi-TB DBs without pinning the write-ahead log (potentially filling your DB's disk) while the backfill is happening. Instead, our connector constantly reads the WAL and works well in setups like Supabase that have very restrictive WAL sizes (1GB iirc).
* Incremental fault-tolerant backfills that can be stopped and resumed at will.
* Flowing TOAST columns through seamlessly to your materialized destination, without requiring that you resort to REPLICA IDENTITY FULL.
* Being able to offer "precise" captures which are logically consistent in terms of the sequence of create/update/delete events.
The last one becomes really interesting when paired with REPLICA IDENTITY FULL because you can feed the resulting before/after states into an incremental computation (perhaps differential dataflow) for streaming updates of a query.
Our work is based off of the Netflix DBLog paper, which we took and ran with.
I found that referencing session variables from triggers lets me add additional information (eg. a comment on why the change is being made) to the history. I've only done it in a small personal project, but it's worked well there so far.
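The session-variable trick described here might look roughly like this (a sketch; the variable name `app.change_reason` and the table names are illustrative):

```sql
-- In the application, before the mutation, set a transaction-local variable:
BEGIN;
SELECT set_config('app.change_reason', 'fix billing address typo', true);  -- true = resets at txn end
UPDATE customers SET address = '123 Main St' WHERE id = 42;
COMMIT;

-- In the audit trigger function, read it back when writing the history row:
--   reason := current_setting('app.change_reason', true);  -- true = NULL if unset, no error
```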
I agree and this is a good approach - and how we power the activity feed in our app. But it doesn’t solve the issue of “pushing the changes” out. Of course, you can always listen to the audit table WAL changes - best of both worlds.
I enjoyed this blog. I think it provides a great succinct overview of various approaches native to Postgres.
For the "capture changes in an audit table" section, I've had good experiences at a previous company with the Temporal Tables pattern. Unlike other major RDBMS vendors, it's not built into Postgres itself, but there's a simple pattern [1] you can leverage with a SQL function.
This allows you to see a table's state as of a specific point in time. Some sample use cases:
- "What was this user's configuration on Aug 12?"
- "How many records were unprocessed at 11:55pm last night?"
- "Show me the diff on feature flags between now and a week ago"
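Under the temporal-tables convention (each row carries a `sys_period` tstzrange, and superseded versions are moved to a `_history` table by a trigger), a point-in-time query is a sketch like:

```sql
-- "What did the users table look like on Aug 12?" Current rows whose
-- validity range covers that instant, plus historical versions that do.
SELECT * FROM users
WHERE sys_period @> TIMESTAMPTZ '2023-08-12 00:00:00+00'
UNION ALL
SELECT * FROM users_history
WHERE sys_period @> TIMESTAMPTZ '2023-08-12 00:00:00+00';
```

Table and column names here follow the temporal_tables convention and are illustrative.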
I once consulted at a company with a very large monolithic SQL Server. It actually wasn't Postgres but let's pretend it was.
It had been around for decades and over time it had ended up being used for all sorts of things within the company. In fact, it was more or less true that every application and business process within the whole company stored its data within this database.
A key issue we had was that because this database had many different applications that queried it, and there were a huge number of processes and procedures that inserted or updated data within it, sometimes queries would break due to upstream insert/update processes being amended or new ones added that broke application-level invariants -- or when a normal process operated differently when there was bad data.
It was very difficult to work out what had happened because often everything that you looked at was written a decade before you and the employees had long since left the company.
Would it be possible to capture changes from a Postgres database in some kind of DAG in order that you could find out things like:
- What processes are inserting, updating or deleting data and historically how are they behaving? For example, do they operate differently ever?
- How are different applications querying this data? Are there statistics about their queries which are generally true? Historically, how are these statistics changing?
I don't know if there is prior art here, or what kind of approach might allow a tool like this to be made?
(I've thought of making something like this before but I think this is an area in which you'd want to be a core Postgres engineer to make good choices.)
Logical replication, in Postgres, contains all of the information about the change statement (insert/update/delete) in order to logically recreate the same state in another database.
You won't get client-level provenance data with each change...
However, you could hack around that. The logical replication stream can also include informational messages from the `pg_logical_emit_message` function, letting clients insert their own metadata. It might be possible to configure your clients to emit their identifier at the beginning of each transaction.
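A sketch of that idea (the `client_id` prefix and identifier format are made up):

```sql
-- Emit the client's identity into the logical replication stream at the
-- start of each transaction; consumers see it alongside the row changes.
BEGIN;
SELECT pg_logical_emit_message(
  true,         -- transactional: only delivered if this transaction commits
  'client_id',  -- prefix consumers can filter on
  'billing-service@host-7'
);
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
```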
I’m not sure how to handle queries, but for inserts/updates I have a column that tracks the event source (last updated by). Maybe this is an anti-pattern - I’d love a more robust solution.
Technically, logical replication has everything done by everything, and if you are careful with triggers you can also track everything, using a DDL/DML capture table (DCL too if you're worried!).
These approaches work on basically every type of SQL solution that uses WAL/triggers.
For your specific question, I have used a trigger approach many times in SQL Server, but it has a tendency to slow things down if you are logging every query. Designing an insertion mechanism that doesn't bog down production is hard, and you might want to perform some sampling.
An approach I've taken is temporal tables w/ Application and UpdatedBy fields. That gives you a permanent record of every change, what application did it, and what user performed the action, at what time, and then lets you query the database as if you were querying it at that point in time. You can add triggers to fail CRUD if those fields are not updated if you want to get really paranoid.
There's a lot of overhead to this in terms of storage, so it's not suitable for high-throughput or cost-constrained transactional systems, but it's something for the toolbox.
Even if it doesn't you can start by adding the application name to the connection string and you could probably do something gnarly with triggers to write that in a table and get it pushed in the logical replication.
I had the following idea recently: go through all scripts/programs that send queries to the DB and append a comment to the query containing a unique id that links it to the script/program. The query log hopefully shows the comment with the id, so you can trace the origin.
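A sketch of what such a tagged query might look like (the id scheme here is made up; I believe tools like sqlcommenter automate this pattern):

```sql
-- The comment survives into pg_stat_activity and the server log,
-- so slow-query reports can be traced back to their source script.
SELECT o.id, o.total
FROM orders o
WHERE o.created_at > now() - interval '1 day'
/* origin: nightly_report.py query_id=NR-0042 */;
```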
For the "I just need to emit and visualize a DAG" problem, at one point I wrote a python script that would filter for the relevant data, emit line-by-line mermaidJS-flavored markdown, and then shove that into an HTML file that imported the MermaidJS library.
MermaidJS lays out the DAG and visualizes it, and your browser has enough context to let you Ctrl-F for any interesting phrases you put in the labels.
Relax, don't do it. When you want this, you're turning the relations in postgres into a contract. No service gets to persist internal state. If you're _really_ committed to domain driven design it could work... but you'd be better off with a light (but real) event driven system.
Sequences kind of have the same issue, because you don't know if a gap is because of a rollback or an uncommitted transaction. Though with some logic you can do a pretty good job at this with sequences. And then you're not in the realm of "simple" anymore, at all.
I've had pretty much the exact same problem and what I went for in my low-volume case was to simply add advisory locks such that I can guarantee the transaction start times provide correct ordering.
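A sketch of that trick (the lock key 42 is arbitrary): take a session-level advisory lock before BEGIN, so writers serialize and their transaction start timestamps follow commit order.

```sql
SELECT pg_advisory_lock(42);  -- other writers block here until we unlock
BEGIN;                        -- so start timestamps follow commit order
INSERT INTO events (payload, created_at) VALUES ('...', now());
COMMIT;
SELECT pg_advisory_unlock(42);
```

This funnels every writer through one lock, which is why it only makes sense at low volume.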
Woah, that's news to me. Is that true even if triggers are used to update a column?
CREATE OR REPLACE FUNCTION update_updated_at_function()
RETURNS TRIGGER AS $$
BEGIN
  NEW.updated_at = now();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER update_updated_at_trigger
BEFORE INSERT OR UPDATE ON "my_schema"."my_table"
FOR EACH ROW EXECUTE PROCEDURE update_updated_at_function();
Is it possible for two rows to have `updated_at` timestamps that are different from the transaction commit order even if the above function and trigger are used? It's alright if `updated_at` and the commit timestamp are not the same, but the `updated_at` must represent commit order accurate to the millisecond/microsecond.
To confirm your fear, you can't use the updated_at timestamp as a proxy for commit order. The commits happen in a different order, and can be arbitrarily far apart, like hours or days depending on how long your transactions can last.
now() is the timestamp the transaction began at. There is no function that returns the commit timestamp from inside the transaction, because the value has to be written before you commit.
For polling, instead of updated_at, I use a _txid column that gets set by a trigger to the current transaction ID. Then, when polling, use txid_current() to see which transactions have committed and which haven’t. It’s a little dicey and super easy to hit fencepost errors, but it’s been running smoothly in production for a few years.
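I believe that pattern looks roughly like this (a sketch; names are illustrative):

```sql
-- Trigger stamps each row with the writing transaction's id.
CREATE OR REPLACE FUNCTION stamp_txid() RETURNS trigger AS $$
BEGIN
  NEW._txid := txid_current();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Poller: only read rows whose transactions are guaranteed committed,
-- i.e. with _txid below the oldest transaction still in flight.
SELECT * FROM my_table
WHERE _txid >= :last_seen_xmin
  AND _txid < txid_snapshot_xmin(txid_current_snapshot());
```

The fencepost hazards mentioned above live in the boundary conditions of that xmin cutoff.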
There’s been attempts to revise the SQL standard to accommodate various types of temporality as a “first party” feature.
I think that we won’t see traction at the RDBMS “kernel space” until it’s in the SQL standard. There are many valid and complex options to choose from, and there are successful solutions in user space that aren’t overly burdened, performance-wise, from being in user space.
FWIW, the “audit table” approach is the approach that people who study this field gravitate towards. Mainly because it maintains consistent ACIDity in the database, and maintains Postgres as the single point of failure (a trade off vs introducing a proxy/polling job).
I think there is a big gap in this space in the data world - having query results incrementally pushed to me rather than me asking the data store for them.
I do a lot of work in real-time and streaming analytics. We can do stream processing and maybe some work in the datastore with a materialised view. However, once data has hit your database or data lake you are then effectively back to polling for changes further downstream.
If I want to respond to some situation occurring in my data, or update things on a screen without a page refresh, there isn't really a clean solution. Even the solutions in this article feel hacky rather than like first-class citizens.
Say I want to have a report updating in real time without a page refresh. The go-to approach seems to be to load your data from the database, then stream changes to your GUI through Kafka and a websocket, but then you end up running the weird Lambda architecture with some analytics going through code and some through your database.
There is innovation in this space. KSQL and Kafka Streams can emit changes. Materialize has subscriptions. Clickhouse has live views, etc. A lot of these features are new or in preview, though, and not quite right. Having tried them all, I think they leave far too much work on the part of the developer. I would like a library with the option of [select * from orders with subscribe] and just get a change feed.
I think this is a really important space which hasn't had enough attention.
I know the NoSQL world had some progress in this space. RethinkDB was going in this direction around a decade ago if I recall. I would really like this from a modern relational database or data warehouse though. Polling sucks.
There’s a huge footgun with replication that the article didn’t mention, and it’s the reason why I don’t use it.
Postgres is VERY committed to making sure a replication slot’s consumer doesn’t miss any data. This means that if a consumer stops consuming data from the slot, Postgres will helpfully store all that missed data… right up until the disk fills up and the database falls over. Had this happen during prototyping with two different SaaS DBs, and the only way to get it back up was to file a support ticket. (I can’t remember if metrics warned that the disk was about to fill up or not). Basically, if your replication slot consumer stops reading, that should trigger some kind of alert.
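A monitoring sketch for exactly this: alert when a slot stops being consumed or its retained WAL grows.

```sql
-- How much WAL each replication slot is pinning on disk.
SELECT slot_name,
       active,  -- false means no consumer is currently connected
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots;
```

On Postgres 13+, `max_slot_wal_keep_size` caps the retained WAL, at the cost of invalidating a slot that falls too far behind.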
The other reason I don’t use it: the code path to get the initial snapshot of a table is totally different from the code path to read changes. Initializing the read from the replication slot so you never miss any changes is nontrivial.
It’s too bad, because replication is obviously the least hacky solution for change capture.
I use polling, but storing the txid instead of updated_at.
> Postgres is VERY committed to making sure a replication slot’s consumer doesn’t miss any data. This means that if a consumer stops consuming data from the slot, Postgres will helpfully store all that missed data… right up until the disk fills up and the database falls over.
> The other reason I don’t use it: the code path to get the initial snapshot of a table is totally different from the code path to read changes.
Hm. For anything dealing with larger data volumes, you IME want to handle those things differently (so you can initialize in parallel, initialize from physical backups, and similar things). But I can see why it could be useful to optionally stream out existing data after slot creation.
> Initializing the read from the replication slot so you never miss any changes is nontrivial.
That part shouldn't be hard - what gave you difficulty?
One trick to deal with the first problem is sending logical decoding messages to yourself. That keeps the retained WAL low. Another potentially useful thing is temporary replication slots which clean themselves up on connection loss. I use those when I don’t need all of the changes. There is also configuration for setting the maximum WAL that’s retained so you don’t murder the server.
I've run into this footgun before! It's very subtle – you de-provision a consumer, and that feels like it should have no impact on the primary. But it creates a ticking time bomb.
Mind expanding on how you use txid instead of updated_at?
What I would love is a turnkey service I could set up with logical replication and a straightforward configuration that can listen for any changes on certain tables (ideally with optional filters) and put those changes in a queue / ESB / whatever. If you're designing a new application from scratch and you have only one place where certain logical changes happen, you can do this from the start. But in any system not designed that way from the start, the database is the place where all things go. Using it as your event source allows everything to keep on working like it always has, while letting you notify new processes when something changes.
I've actually been thinking about turning this idea into a product where you can just point it at your postgres database and select the tables you want to listen to (with filters, like you describe). And have that forwarded to a webhook (with possibility of other protocols like websockets).
I'd love to hear folks thoughts on that (and if it would be something people would pay for). And if anyone might want to partner up on this.
I remember it being pretty simple (like, run one or two bash commands) to get a source table streamed into a kafka topic, or get a kafka topic streamed into a sink datastore (S3, mysql, cassandra, redshift, etc). Kafka topics can also be filtered/transformed pretty easily.
E.g. in https://engineeringblog.yelp.com/2021/04/powering-messaging-... they run `datapipe datalake add-connection --namespace main --source message_enabledness`, which results in the `message_enabledness` table being streamed into a (daily?) parquet snapshot in S3, registered in AWS Glue.
It is open source but it's more of the "look at how we did this" open source VS the "it would be easy to stick this into your infra and use it" kind of open source :(
The main thing everyone always wants to know is whom to blame. Who added this store? Who changed the name of this store? Who deleted this store? The last one is particularly tricky - keeping audits of deleted things. Do you just soft delete everything?
Unrelated to the article, but I gotta say I just learned about Sequin from this link, and it seems very cool! But, I can't think of a use-case for it. Can you please explain the value-add?
Hey there, author here. We help our customers build their integrations faster/easier by syncing API data with Postgres tables.
Because your API data is in Postgres, you can query it exactly how you like. You're not limited by the API's supported query params, batch sizes, or rate limits. You can use SQL or your ORM.
This also means you don't need to learn all the API's quirks. And every API we support has the same interface. (Which will become more valuable as we add more APIs!)
We offer a demo Postgres instance for getting up and running. You can move the sync to your own db when you're ready for production, so it should be compatible with extensions like `pg_graphql` (can't think of customers using that extension, but we have quite a few using Hasura).
Any other questions, feel free to email: anthony@[domain]
If you need to get data out as a feed and you need something robust, you almost always evolve until you land on the WAL option. The only problem with WAL is that many of us are (rightly) moving towards managed cloud SQL instances such as AWS Aurora, and then your options narrow. It's a fair trade-off, but for all the appeal of Supabase, Neon, Cockroach et al., a PG WAL to AWS-native bus/stream solution with filtering is sorely needed. Apologies if this already exists and I've missed it :)
Unless it has been updated, another caveat of LISTEN / NOTIFY is that it becomes slower and slower the more NOTIFY you use within a transaction.
If I remember correctly, the reason is that Postgres attempts to de-duplicate messages using a naive quadratic algorithm, causing horrible slowdowns for large queries with a FOR EACH ROW trigger.
The way we mitigated that was by creating a temporary table and writing the notification payloads there. An AFTER trigger reads distinct rows, groups the data, and sends the messages.
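On Postgres 10+, a transition table gets you a similar effect without an explicit temp table. A sketch (table and channel names are illustrative; note NOTIFY payloads are capped at about 8000 bytes, so real payloads may need chunking):

```sql
-- One statement-level trigger sends a single aggregated NOTIFY,
-- instead of one NOTIFY per affected row.
CREATE OR REPLACE FUNCTION notify_batch() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify(
    'my_table_changes',
    (SELECT json_agg(json_build_object('id', n.id))::text FROM new_rows n)
  );
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER notify_batch_trigger
AFTER INSERT ON my_table
REFERENCING NEW TABLE AS new_rows
FOR EACH STATEMENT EXECUTE PROCEDURE notify_batch();
```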
I didn't see what imho is a significant LISTEN/NOTIFY caveat, which is that it is sticky to a session. As such, it requires a dedicated connection - eg, it isn't compatible with pgBouncer in transaction mode.
Here's a quick rundown of how to do it generically https://gist.github.com/slotrans/353952c4f383596e6fe8777db5d... (trades off space efficiency for "being easy").
cf. https://github.com/matthiasn/talk-transcripts/blob/master/Hi...