isoap's comments

isoap · on Nov 25, 2016

In such a situation though, shouldn't it raise an error? At least that way, the user could have a chance to recover.

Why should it make an arbitrary decision that unknowingly corrupts data for some users?

hamax · on Nov 25, 2016

The only way for C* to know that it should raise an error is if it would implicitly protect all writes with LWT and this is not what most users want.

Following the parent's example, if you don't protect memory access with a lock, you can't know that somebody else locked it.

prodigal_erik · on Nov 25, 2016

"Must be consistent" is a property of the data, not of a particular transaction. Data on which inconsistent transactions are ever allowed should be declared, preferably with big hazard signs.

pkolaczk · on Nov 26, 2016

The data are not either conistent or inconsistent. The data are either correct or incorrect, and the definition of "correct" is determined by business requirements. Most of the time weak eventual consistency model provided by Cassandra is sufficient to keep data correct if you are using it right. E.g. you don't need ACID and serializable consistency to do financial stuff correctly - banks and accountants figured out how to do this a long time before computers were invented. That's why Cassandra has serializable consistency as an opt-in only (LWTs), but this comes at a price of latency and availability. By using strong consistency all the time, you'd lose most of the benefits of Cassandra, and you could probably just replace it with a single RDBMS node (and suffer scalability and availability problems then).

brianwawok · on Nov 25, 2016

Right does Java throw an exception if you forget a lock around a shared variable?

isoap · on Nov 25, 2016

> the second query (the UPDATE) is being partially applied before the first query (the INSERT) - and that's OK

"Partially applied" is ok with a database?

The description of Cassandra on it's site is "Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data."

If it's mission-critical data, I wouldn't do arbitrary things with it for conflict resolution that can corrupt data.

pkolaczk · on Nov 25, 2016

Cassandra is perfectly fine. If you want your writes to be consistent, learn to use LWTs properly. The same way, if you want your data to be fully consistent in RDBMS, learn to use SERIALIZABLE isolation level (which is not default in most RDBMSes for performance reasons). If, in an RDBMS, you use SERIALIZABLE for half of the updates and READ UNCOMMITTED for another half, guess what consistency guarantees do you really get?

brianwawok · on Nov 25, 2016

This is not arbitrary. It may be not intuitive coming from a rdbms background, but it isn't arbitrary.

As Johnathon pointed out, it's like properly using synchronized or volatile half the time. I don't call it sometimes not working as arbitrary. I call it expected for not following the rules of the system.

tremon · on Nov 25, 2016

If I follow your argument correctly, it is basically the same argument as "your C compiler is correct, what you've written is invalid and the standard allows undefined behaviour here".

Which may be a technically valid argument against the compiler/database system, but it's not a valid argument for defending the system as a whole: if a standard allows arbitrary execution instead of bailing out on non-standard (ambiguous) input, it is unreliable.

hamax · on Nov 25, 2016

Is variable assignment in c/java/... unreliable? It behaves very similar to what C* does. Concurrent access and modification will produce undefined behaviour if you don't explicitly protect it.

brianwawok · on Nov 25, 2016

Exactly.

Getting access to things like concurrent locks is HARD to get right. That is why there are so many simple languages that don't let you touch concurrency.

Doesn't mean there is no need for it in the world, and no one should be able to use it.

pkolaczk · on Nov 25, 2016

RDBMSes can cause similar inconsistencies if you don't know what you're doing. It is like setting read uncommitted and then complaining about dirty reads.

isoap · on Nov 25, 2016

Let's pretend I'm leading a blind child by telling them which direction they should go, and if they don't step carefully, they could be hurt.

If I can't see the child, should I continue to give them direction, or tell them to stop?

In this use case, the database makes changes to data without knowing what is correct and what is harmful. That is not the user's fault. It's a code choice.

hamax · on Nov 25, 2016

No, it's a users choice to use a DB that has this tradeoff.

Like it was said a couple of time already, all of your complaints about C* would work just as well for locking mechanisms in most popular languages.

isoap · on Nov 25, 2016

> Don't take shortcuts and you won't get burned.

I'm sorry that the user did something unexpected and then posted about it in a way that made your application look bad. I know that must be frustrating. However...

Calling the user out as doing something wrong when your application is failing because of a use case you can't handle properly just looks bad. You serve your users, not the other way around. Don't forget that.

If it were me and there were a case that my application couldn't handle properly, if I couldn't fix it, I'd raise an error, and then document clearly that they should not do this, such that when they search for that error, they'd find the answer. Then, I'd work to see if there were a way I could avoid the error altogether by not allowing that use case.

nkellenicki · on Nov 25, 2016

There is no "one size" fits all database or distributed system in existence. To say that there is is to say that MySQL is equally as well suited to all use cases as Cassandra or Redis. They all store and serve data, right?

Cassandra makes no claims to be such a holy grail. Read their documentation, and you can see the use cases it is good for and those it is not.

The author of this blog post chose one it is not good for.

Put another way, "I'm sorry that the Lamborghini you bought broke when you attempted to go off-roading with it. Perhaps you should have bought a Jeep instead?"

imagist · on Nov 25, 2016

It's not possible to detect whether the user wants CP. You can assume they want CP, but the entire point of Cassandra is that it doesn't make that assumption.

Cassandra is AP with opt-in CP. This is an explicit tradeoff. You're giving up the assumption (which enables error checking) that everything is CP in order to get AP performance. This tradeoff is one of the main use cases for which Cassandra exists.

The vast majority of the time, error checking is way more valuable than AP performance, so your approach to handling the error makes sense, but if that's your situation you shouldn't be using Cassandra. There are a wide variety of ACID-compliant relational databases that do what you want.

TL;DR: Using Cassandra and expecting CP error checking is like using a hammer and expecting screwdriver behavior.

throwanem · on Nov 25, 2016

The high-handedness doesn't improve your argument. Consider forgoing it next time.