Hacker News

Another reason why you provide data only over an API: don't reach into my tables and lock me into an implementation.


An approach I like better than "only access my data via API" is this:

The team that maintains the service is also responsible for how that service is represented in the data warehouse.

The data warehouse tables - effectively denormalized copies of the data the service stores - are treated as another API contract: clearly documented and tested as such.

If the team refactors, they also update the scripts that populate the data warehouse.

If that results in specific columns etc. becoming invalid, they document that in their release notes and ideally notify other affected teams.
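One way to make that contract testable is a schema check the owning team runs in CI against the warehouse tables they publish. A minimal sketch, assuming a hypothetical `orders` table (all table and column names here are invented for illustration):

```python
# Hypothetical contract for a warehouse table owned by the orders team.
# The expected schema is the documented contract; the check reports any
# missing or retyped columns so a refactor can't silently break consumers.

EXPECTED_SCHEMA = {
    "order_id": "BIGINT",
    "customer_id": "BIGINT",
    "total_cents": "BIGINT",
    "placed_at": "TIMESTAMP",
}

def check_contract(actual_schema: dict) -> list:
    """Return a list of contract violations (missing or retyped columns)."""
    violations = []
    for column, expected_type in EXPECTED_SCHEMA.items():
        if column not in actual_schema:
            violations.append(f"missing column: {column}")
        elif actual_schema[column] != expected_type:
            violations.append(
                f"type changed: {column} is "
                f"{actual_schema[column]}, expected {expected_type}"
            )
    return violations
```

In practice `actual_schema` would come from the warehouse's information schema; the point is that the check lives with the service team, so a refactor that invalidates a column fails their build, not a downstream team's dashboard.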


This same thing can be applied to contracts when firing events, etc. I point people to https://engineering.linkedin.com/distributed-systems/log-wha... and use the same approach to ownership.


Yeah, having a documented stream of published events in Kafka is a similar API contract the team can be responsible for - it might even double as the channel through which the data warehouse is populated.
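The event-as-contract idea can be sketched as a versioned envelope that both service consumers and the warehouse loader validate against. This is only an illustration; the event name, fields, and version scheme are assumptions, not anything from the thread:

```python
import json
from datetime import datetime, timezone

# Hypothetical published event: the schema name and version are part of
# the contract, so consumers (including the warehouse loader) can detect
# breaking changes the owning team announces in release notes.

def make_order_placed_event(order_id: int, total_cents: int) -> str:
    event = {
        "schema": "order_placed",
        "schema_version": 2,  # bumped on breaking changes
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "payload": {
            "order_id": order_id,
            "total_cents": total_cents,
        },
    }
    return json.dumps(event)
```

A real setup would typically enforce this with a schema registry rather than hand-rolled envelopes, but the ownership model is the same: the team that emits the event owns its documented shape.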



