As a long-time user and developer of databases, I would suggest isolation failures are not actually the source of most data-related bugs. Most bugs I deal with come from other failure modes:
* We didn't think about how we would retry this operation when something fails or times out (idempotency)
* We didn't put the appropriate checksums in the right place (corruption)
* We couldn't handle the load, often because we tried to provide stronger guarantees than the application needed, and went down, losing operations (performance bottlenecks)
* We deployed bad software to the app or database, causing irreparable corruption that can't be fixed because we already purged the relevant commit/redo logs + snapshots.
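On the corruption point, the idea is just "store a checksum next to the bytes and verify it on read". A minimal sketch using Python's stdlib `zlib.crc32` (real systems typically checksum at the page/block level, not per record):

```python
import zlib

def store(record: bytes) -> tuple[bytes, int]:
    """Persist a record alongside its CRC32 checksum."""
    return record, zlib.crc32(record)

def load(record: bytes, checksum: int) -> bytes:
    """Verify the checksum before trusting the bytes."""
    if zlib.crc32(record) != checksum:
        raise ValueError("record is corrupt")
    return record

data, crc = store(b"balance=100")
assert load(data, crc) == b"balance=100"

# A corrupted copy is caught on read instead of silently propagating:
corrupted = b"balance=900"
try:
    load(corrupted, crc)
except ValueError:
    print("corruption detected")
```

The "right place" part of the complaint is the hard bit: the checksum has to travel with the data through every hop (network, cache, disk), or corruption introduced at an unchecked hop goes unnoticed.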
I legitimately don't understand the calls for "SERIALIZABLE is the only valid isolation level" - I have not typically (ever that I can recall) seen at-scale production systems pay that cost for writes _and_ reads. Almost all applications I've seen (including banking/payment software) are fine with eventually consistent reads, as long as the staleness period is understood and reasonably bounded in time. Once you move past a single geographic datacenter, serializable writes become extremely expensive unless you can automatically home users to the appropriate leader datacenter, which most engineering teams can't guarantee.
The key is typically not isolation; it's modeling your application in an idempotent fashion that doesn't require isolation to be correct, and keeping snapshots and those idempotent operation logs for a good few weeks at minimum. Maybe the Java analogy would be "if you can design it to not need locks, do that".
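A minimal sketch of that idempotent style (sqlite3 purely for illustration; the `op_log` table and client-generated op ids are my invention, not a standard): each operation carries an id, and applying it twice has the same effect as applying it once, so a retry after a timeout is always safe.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
db.execute("CREATE TABLE op_log (op_id TEXT PRIMARY KEY)")
db.execute("INSERT INTO accounts VALUES ('alice', 100)")

def credit(op_id: str, account: str, amount: int) -> None:
    """Apply a credit at most once per op_id; replays are no-ops."""
    with db:  # one transaction: log entry and effect commit together
        seen = db.execute(
            "SELECT 1 FROM op_log WHERE op_id = ?", (op_id,)
        ).fetchone()
        if seen:
            return  # already applied; the retry changes nothing
        db.execute("INSERT INTO op_log VALUES (?)", (op_id,))
        db.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, account),
        )

credit("op-1", "alice", 50)
credit("op-1", "alice", 50)  # retry after a timeout: no double-credit
balance = db.execute(
    "SELECT balance FROM accounts WHERE id = 'alice'"
).fetchone()[0]
print(balance)  # → 150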
Serializable is easy to reason about, and it also moves the problems of distributed systems into the database, where they can be handled more appropriately imo.
It is by no means a silver bullet and depending on your application it may not be the right choice.
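For what it's worth, that "move the problem into the database" behavior is observable even in SQLite, which executes every transaction serializably: a second writer that would conflict is refused by the engine itself, with no coordination logic in the application. A small sketch (the file path and timeout settings are incidental):

```python
import sqlite3, tempfile, os

# Two connections to the same on-disk database (":memory:" is per-connection).
# isolation_level=None means autocommit; we issue BEGIN ourselves.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
a = sqlite3.connect(path, timeout=0, isolation_level=None)
b = sqlite3.connect(path, timeout=0, isolation_level=None)
a.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, n INTEGER)")
a.execute("INSERT INTO counters VALUES ('hits', 0)")

# Writer A opens a write transaction and holds it open
a.execute("BEGIN IMMEDIATE")
a.execute("UPDATE counters SET n = n + 1 WHERE name = 'hits'")

# Writer B's conflicting write transaction is rejected by the database
conflict = False
try:
    b.execute("BEGIN IMMEDIATE")
except sqlite3.OperationalError:  # "database is locked"
    conflict = True

a.execute("COMMIT")
print(conflict)  # → True: the DB, not the app, arbitrated the conflict
```

The cost question from upthread still stands, of course: a single-file lock is cheap, while serializable writes across datacenters are not.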
The whole point of the RDBMS revolution in the 70s and 80s was to try to bring about a world where developers did not have to care about how their data was stored, and could rely on consistency (and data representation independence).
The way this should all have gone down is that the caching story should have been something that DB vendors resolved, rather than something pushed into the application tier. But the push towards three tier architectures, and OOP and ORMs, meant this wasn't feasible.
What would be ideal is a single consistent data retrieval model, which extends from the physical retrieval of relations, all the way up to the presentation layer, all one transaction, and handles caching for you. There is already caching happening within the DBMS, for example...
I'm saying the line between the two is largely of our own making. The push towards OO and component models meant a strong separation between the two layers -- this was and is accepted as the "right" way to model things. But it comes with the cost of leaky abstractions, potentially broken isolation models, and high non-essential complexity by nature of the constant transition between components.
If it wasn't for this, we could be looking at DB architectures in which application logic co-habits with the DB. This doesn't imply application logic in the DB, but means that the DB's view of the data moves its way up into the application. Where the logic gets executed isn't as much the concern as what that logic operates on, and that the data isolation model is consistent.
I am also of the opinion that the relational model, with its predicate-logic view of the world, is a richer way to model information than objects. So that's my bias.
A lot of this is straight out of the "Out of the Tarpit" paper, FWIW.
Something like Hibernate in Java will fetch data from the database once, populate objects (potentially making cycles and complex relationships between Java objects), and then let your business logic deal with those long-running, persistent Java objects (as opposed to objects that you deallocate as soon as you're done with them after querying the database).
This means that if you ever happen to reuse such an object in another context without making a new query, you risk dealing with stale data. And this happens all the time, because querying the db is seen as "expensive" and reusing model objects is "cheap".
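The Hibernate specifics aside, the failure mode is easy to reproduce with any cached model object. A language-agnostic sketch (plain Python and sqlite3 standing in for the ORM, so the mechanics are visible):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
db.execute("INSERT INTO users VALUES (1, 'old@example.com')")

class User:
    """A model object populated once from a query, then kept around."""
    def __init__(self, row):
        self.id, self.email = row

# First context: query once, build the object
user = User(db.execute("SELECT id, email FROM users WHERE id = 1").fetchone())

# Meanwhile, something else updates the row
db.execute("UPDATE users SET email = 'new@example.com' WHERE id = 1")

# Second context: reusing the "cheap" cached object yields stale data
print(user.email)  # → old@example.com

# Only a fresh query (in Hibernate terms, a Session.refresh) sees the update
fresh = User(db.execute("SELECT id, email FROM users WHERE id = 1").fetchone())
print(fresh.email)  # → new@example.com
```

Nothing warned us that `user` was stale; the object and the row simply diverged the moment another writer touched the database.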