Also the HTTP API URL isn't deterministic because a) the operator sets it for an...

otoolep · on June 10, 2021

>At what cluster size and concurrency does asking every node break down?

At no size, since a follower only needs to ask the leader. So regardless of the size of the cluster, in 6.0 querying a follower only introduces a single hop to the leader before responding to the client. While this hop was not required in earlier versions, earlier versions had to maintain state -- and stateful systems are generally more prone to bugs.
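
To make the "single hop" point concrete, here is a toy in-memory model rather than rqlite's actual code. Every name in it (`Node`, `leader_api_addr`, `build_cluster`, the `example` hostnames and port) is a hypothetical stand-in, and it assumes each follower already knows who the leader is. It only illustrates that a follower looks up the leader's HTTP API address on demand, with one request and nothing cached, whatever the cluster size.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Toy cluster node; all names are hypothetical, not rqlite's API."""
    node_id: str
    api_addr: str                      # operator-chosen HTTP API URL
    leader: Optional["Node"] = None    # None means this node is the leader
    hops: int = 0                      # node-to-node requests made so far

    def get_api_addr(self) -> str:
        # Served by the leader when a follower asks for its address.
        return self.api_addr

    def leader_api_addr(self) -> str:
        if self.leader is None:
            return self.api_addr       # already the leader, no hop needed
        self.hops += 1                 # one request, to the leader only
        return self.leader.get_api_addr()


def build_cluster(size: int) -> List[Node]:
    """Build one leader plus (size - 1) followers that point at it."""
    leader = Node("n0", "http://n0.example:4001")
    followers = [
        Node(f"n{i}", f"http://n{i}.example:4001", leader=leader)
        for i in range(1, size)
    ]
    return [leader] + followers


if __name__ == "__main__":
    cluster = build_cluster(50)
    follower = cluster[-1]
    print(follower.leader_api_addr(), "hops:", follower.hops)
```

The follower stores nothing between calls, so there is no address table to drift out of sync. The cost is the extra request each time a follower is queried.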

vlowther · on June 10, 2021

I am curious about where things broke down with the 301-based solution y'all used earlier.

otoolep · on June 10, 2021

I included details in the blog post, but in short the 3.x to 5.x design had the following issues:

- stateful system, with extra data stored in Raft. Always a chance for bugs with stateful systems.

- some corner cases whereby the state rqlite was storing got out of sync with some other cluster configuration. Finding the root cause of these bugs could have been very time-consuming.

- certain failure cases happened during automatic cluster operations, meaning an operator mightn't notice and be able to deal with them. Now those failure cases -- while still very rare -- happen at query time. The operators know immediately something is up, and can deal with the problem there and then, usually by just re-issuing the query.