Lack of backpressure is perhaps the most common newbie mistake in distributed systems. I've just seen so many failures because the servers were overwhelmed and the clients were greedy/impatient (didn't react well to anything less than an immediate response even when they were doing everything they could to overwhelm). It should be a primary design consideration for any such system being built anew.
> Backpressure Strategies: control, buffer, drop
One of my favorite hacks for Gluster (at Facebook) was in this area. We were using bog-standard NFS as the connection protocol, so neither explicit control nor dropping were options. (People who have always had that luxury with their own thrift/grpc/whatever protocols and libraries don't realize how much of a pain it is not to have it.) That left buffering, which led to memory pressure and latency bubbles. What I ended up implementing was implicit control. We'd just stop reading on a particular client's socket for increasing numbers of milliseconds if overall latency was too high and its queue was too long. This leveraged information we already had, and "borrowed" TCP's flow/buffer control as our own. Clients couldn't ignore it; they'd block in send() or somewhere equivalent. It worked really well to smooth out many of the worst latency spikes. If you're ever stuck in a similar situation, I highly recommend this approach.
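A minimal sketch of the idea (not the actual Gluster code; the class name, thresholds, and doubling policy are all illustrative): track a per-client read-suspension penalty that escalates while overall latency is high and that client's queue is long, and keep its socket out of the read set for that long. TCP's own flow control does the rest once the socket buffers fill.

```python
import time

class ReadThrottle:
    """Implicit backpressure: decide how long to stop reading a client's
    socket, letting TCP flow control push back on the sender."""

    def __init__(self, latency_limit_ms=50, queue_limit=100, max_penalty_ms=500):
        self.latency_limit_ms = latency_limit_ms
        self.queue_limit = queue_limit
        self.max_penalty_ms = max_penalty_ms
        self.penalty_ms = {}   # client id -> current suspension, in ms
        self.resume_at = {}    # client id -> monotonic time to resume reads

    def observe(self, client, overall_latency_ms, queue_len):
        """Call periodically with current stats; escalates or clears the penalty."""
        if overall_latency_ms > self.latency_limit_ms and queue_len > self.queue_limit:
            # Escalate: double the suspension each round, up to a cap.
            p = min(self.max_penalty_ms, max(1, self.penalty_ms.get(client, 0)) * 2)
        else:
            p = 0  # system healthy again; resume reading immediately
        self.penalty_ms[client] = p
        self.resume_at[client] = time.monotonic() + p / 1000.0

    def should_read(self, client):
        """The event loop asks this before registering the socket for reads."""
        return time.monotonic() >= self.resume_at.get(client, 0.0)
```

In a real server the event loop would skip read registration (e.g. `EPOLLIN`) for throttled clients; the client then blocks in `send()` once the kernel buffers fill, with no protocol change required.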
This is indeed the right way to leverage TCP flow control, and works well at the layer that knows exactly what is coming over the socket. The pitfall here is if semantically independent data is being multiplexed over the same TCP connection then suspending reads may cause collateral damage and in the worst case livelock situations. Examples of multiplexing include separate streams and control frames in HTTP/2, in-band health checks and keepalives in various protocols, etc.
Backpressure was a recurring topic at Basho. I thought I might find something from one of our engineers online, but instead I found these: a look at one client’s approach for using Riak, including how they implemented their own backpressure in front of it, and Fred Hebert’s discussion on queues and backpressure.
Fred’s blog is great. I keep finding myself needing to go back to an article I didn’t think was applicable to me at the time. The complete opposite of medium.com, for me, which I nearly always regret clicking on.
With pipes between threads, the OS is able to suspend the reading thread when the buffer is empty, or suspend the writing thread when the buffer is full, because the OS "owns" the buffer and the threads. With distributed systems connected via Kafka, for example, there is no "process" overseeing both ends that can suspend or resume holistically, so it's difficult for producers to know about and respond to back pressure.
TCP will only slow down the writer if the network or the Kafka broker can't keep up. If the Kafka consumer can't keep up, its backlog will grow unhindered until Kafka runs out of disk space.
I wish there were an easier built-in way to do it, but a producer could read the backlog length of its consumers and try to throttle itself.
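One way to sketch that (the `consumer_lag` callable is hypothetical; in Kafka, lag is the per-partition end offset minus the group's committed offset, readable via the admin tooling or a lag exporter): the producer polls the lag and pauses proportionally once it exceeds a budget.

```python
import time

def throttle_delay(lag, soft_limit, hard_limit, max_sleep=5.0):
    """Map consumer lag (messages behind) to a producer-side pause.
    Below soft_limit: no delay. Between the limits: ramp up linearly.
    At/above hard_limit: pause for the full max_sleep."""
    if lag <= soft_limit:
        return 0.0
    if lag >= hard_limit:
        return max_sleep
    return max_sleep * (lag - soft_limit) / (hard_limit - soft_limit)

def produce_with_throttle(send, consumer_lag, records):
    """Self-throttling producer: checks lag before each send and sleeps
    accordingly, so the backlog levels off instead of growing unbounded."""
    for rec in records:
        delay = throttle_delay(consumer_lag(), soft_limit=10_000, hard_limit=100_000)
        if delay:
            time.sleep(delay)
        send(rec)
```

The limits are entirely workload-dependent; the point is only that the producer, not the broker's disk, absorbs the pressure.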
Interesting. For two reasons. In the many years that I developed software this was known as the producer/consumer problem [1]. A name that is somewhat (to me) more expressive than back-pressure. But that's just my opinion!
What's different now, as the article points out, is the growth of micro-service architectures. Effectively creating a lot of tiny producer/consumer problems across the solution space.
Sometimes, I'm glad I don't write code anymore, but only sometimes :-)
Yeah, when I was in school it was called the "producer/consumer problem" too; back pressure was considered one of the solutions to the problem (what he calls control). I guess someone somewhere confused the two, and it kinda stuck.
Yeah right. This seems like fancy talk for the same scaling problems we've had forever. And it was 100X worse before cloud computing. System-to-system scale problems are generally a lot easier to deal with because you always have the option of resetting expectations with users. Serving web requests has to be real-time, but if a data pipeline has to be overnight due to cost constraints, you can just make the argument to management and do it.
Without backpressure, queues only delay the inevitable. At best. Queuing delays will increase latency. Extra memory pressure can even make things worse by decreasing throughput. If latency gets high enough, clients will start to error out or misbehave in all sorts of unpredictable ways.
> In the situations where it doesn’t, add some more capacity.
That obviously has its limits too. Some people actually care about cost. Also, added capacity might be totally wasted if there's any sort of affinity between servers and data. Maybe it works if you're in a stateless middle tier, but not e.g. in a storage system. You can add capacity to those, sure, but rebalancing the data to take advantage of the new capacity can be a lengthy process and compete with user traffic (making user-perceived latency even worse) if you're not careful. I've also seen too many cases where rebalancing undid the careful work done during initial data placement to maximize data safety. Oops.
> Back pressure is something I’ve never had to deal with
Letting data back up in unbounded queues means (a) you have giant ass queues sitting around hogging resources, (b) upstream components are now going to see service times go through the roof as shit sits around doing nothing in these queues, and (c) upstream clients have no way to predict or avoid these service time spikes, because they can't tell queueing delay apart from operations that just take a while, which prevents them from making intelligent decisions about their own work that could avoid those spikes.
I've worked in the embedded space now for about a decade (network, storage, databases). Everywhere I've been where the product has no backpressure, the product inevitably suffers from the above issues, which translate directly into customer dissatisfaction. And it's never easy to add backpressure to a system which doesn't have it.
It's also not always an option to just "add more capacity" -- there are many situations where that's not possible, or requires time to enact. Meanwhile, your queues are growing without bound.
1) Queues should be of finite size (exactly what size is entirely application dependent, but the point is that it's almost never ok to let a queue grow indefinitely).
2) When a queue is full, producers should often be blocked (backpressure).
Obviously it depends a lot on the particular use case, but IME it's usually risky to think that a 'sufficiently large' queue is going to be an acceptable solution.
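Both points fall out of the Python standard library directly (a sketch; the sizes are illustrative): `queue.Queue(maxsize=...)` gives you the finite bound, and a plain `put()` gives you the blocking producer.

```python
import queue
import threading

work = queue.Queue(maxsize=8)  # (1) finite: pick a bound, never unbounded

def producer(items):
    for item in items:
        # (2) put() blocks when the queue is full -- that's the backpressure.
        # Use put(item, timeout=...) to bound the wait and surface overload
        # to the caller instead of stalling forever.
        work.put(item)

def consumer(handle):
    while True:
        item = work.get()
        if item is None:       # sentinel: shut down
            break
        handle(item)
        work.task_done()
```

With `maxsize=8`, a producer pushing 20 items simply runs at the consumer's pace; nothing accumulates beyond the bound.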
I've worked with systems where petabytes are flowing through it every day. If your data starts backing up, it costs lots of real money in terms of hard disk space. Backpressure crosses organizational boundaries--the team running the system which consumes data is not the same team running the system which produces data. We exposed backpressure through our APIs... if our system was overloaded, we returned a specific error code. Clients were expected to retry, sensibly.
Sometimes "add more capacity" just costs too much.
Failure codes themselves aren't back pressure though; it's the client-side "retry later" behavior which creates that effect. But if you have that behavior, it's better just to signal it directly, because a simple fail, back off, retry system actually increases overall load due to duplicate requests retrying. And if it gets bad enough, it potentially results in livelock-like behavior. Simply slowing down ACKs, or explicit "item queued, please wait before sending another" return codes, are far more effective. See buffer crediting and flow control.
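Buffer crediting can be sketched in a few lines (illustrative names, not any particular protocol's API): the receiver grants the sender a budget of in-flight items, the sender spends one credit per send and stops when the budget hits zero, and the receiver hands credits back as it finishes work. There are no duplicate requests, because "no credit" tells the sender to wait rather than to retry.

```python
import threading

class CreditWindow:
    """Credit-based flow control between one sender and one receiver."""

    def __init__(self, credits):
        self._sem = threading.Semaphore(credits)

    def try_send(self, send, item):
        # Non-blocking acquire: "no credit" is the explicit backpressure signal.
        if not self._sem.acquire(blocking=False):
            return False      # caller should wait, not buffer or re-send
        send(item)
        return True

    def item_done(self):
        self._sem.release()   # receiver hands a credit back
```

A blocking variant (`self._sem.acquire()`) gives you the "slow down the ACKs" behavior instead of an explicit refusal.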
OTOH, I'm sort of amazed by the people who reach for queues to solve loading issues without first ensuring that the system works properly at full load without any queuing. Queues only serve to smooth out bursty behavior and come with their own problems (buffer-bloat-related latency).
> ... a simple fail, back off, retry system actually increases overall load due to duplicate requests retrying.
Well, two things.
First, our clients didn't have a "simple" back off, retry system. They tended to have more sophisticated back-off and retry systems that didn't increase pressure on services that were overloaded. There are a number of different techniques you use to accomplish this and you can wrap up several of these techniques in a library, use the library in your client code, and have that client code (a fat client) be the official API for your service.
Second, this also assumes that front-end load is significant to begin with. In our system, it definitely wasn't. Requests were super cheap. You could hammer the front-end extremely hard and it would just respond with "try again later" until there was some capacity available, and then it would let in some work. The size of an actual request was quite small, it was the size of the work represented by that request that was large.
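The standard building block for a retry scheme that doesn't increase pressure on an overloaded service is capped exponential backoff with full jitter (a sketch, not their actual client library): the random spread keeps a crowd of retrying clients from re-synchronizing into waves.

```python
import random
import time

def backoff_delays(base=0.1, cap=30.0, attempts=8):
    """Capped exponential backoff with full jitter: each delay is uniform
    in [0, min(cap, base * 2**attempt)], so the retrying herd spreads out
    instead of stampeding in lockstep."""
    for attempt in range(attempts):
        yield random.uniform(0, min(cap, base * (2 ** attempt)))

def call_with_retry(do_request, is_overloaded):
    """Retry a cheap request while the service says 'try again later'."""
    for delay in backoff_delays():
        result = do_request()
        if not is_overloaded(result):
            return result
        time.sleep(delay)
    raise RuntimeError("gave up: service still overloaded")
```

This pairs naturally with the cheap front-end described above: the request itself costs almost nothing, so the retries are harmless as long as they're spread out.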
> OTOH, I'm sort of amazed by the people who reach for queues to solve loading issues without first ensuring that the system works properly at full load without any queuing. Queues only serve to smooth out bursty behavior and come with their own problems (buffer-bloat-related latency).
Queues in our system were a necessary part of the design. The system would not operate efficiently without them, because work items needed to be batched in order to be processed efficiently. I don't think there's a way that you can indict a system for using queues unless you know something about the design requirements.
There are absolutely other reasons to use queues other than smoothing out bursty behavior.
Obviously you've got to pay for something. You can either increase your compute capacity to process requests more quickly, pay for storage on a queue to regulate flow, or do some kind of internal code optimization to decrease resource demands. All of which cost money one way or the other. Alternately, you go up the food chain and determine the ROI on all the data you're receiving and maybe stop collecting some things that aren't valuable. If they are valuable, then it's a no-brainer to pay what you have to.
AFAIK the only time a modern software developer needs to worry about backpressure is if they're using asynchronous or otherwise non-flow-controlled I/O. Neither of these cases, nor even a discussion of synchronous vs. asynchronous I/O, was mentioned.
I agree it’s less of an issue in more traditional, threaded req/reply servers. But queues can appear in a bunch of places. Like your load balancer may queue up connections from clients and dole them out to app servers as they free up capacity. Normally the queue is emptyish, but app servers will do backpressure via blocking as needed, so the load balancer connection queue fills. In this case you probably don’t even want to add capacity (to an extent), because what’s the point if clients have timeouts?
Also eventually you might need a worker pool pulling from a queue. Just guessing, because they seem to appear eventually in/near web servers. You can spin up a bajillion lambdas simultaneously like a madman if you want, but odds are there’s a resource constraint on the source and/or destination data store side at least. So again you’ve got async execution, a growing queue, and bottlenecks to think about still.
But maybe with threaded code it’s like a “pit of success” for traffic shaping. You only have so many threads you can use, so you configure a pool size and think about what happens when you’re at capacity, and define some nice predictable failure behavior and test it. (As opposed to the node.js server that will handle more connections but surprise you with a weird bottleneck that you don’t handle gracefully). So then the trad server developer can go longer (or forever) without being bitten by the queues that remain elsewhere in their system.
Synchronous systems can benefit from backpressure. Clients can and do open multiple concurrent connections to such systems (common to work around the inevitable throughput limitations of synchronous systems)... it is still helpful to tell any given connections, no, I'm at my concurrency limit, any work you give me is just going to sit in a queue and increase your service time, so please don't.
The key is "flow controlled". Synchronicity doesn't really enter into it, but I can see how people can get confused and think it's about that, because asynchronous designs tend to introduce a greater degree of decoupling.
I think the key insight here is that just autoscaling your service to keep queue length short is not always gonna work, or not always going to be cost effective, and upstream sources should be prepared to reduce the amount of work they send (unless there is a business reason to do otherwise). Making the queue longer or just dropping data that's already accepted into the queue will sometimes break people's assumptions about how your system works. You can adjust queue size from time to time if you need it, but if your service sees increased use over time, the amount of variance in queue size can get a bit crazy.
I worked on a system a while back which processed petabytes of data... autoscaling was out of the question (we had a fixed amount of hardware to process the data, and the lead time for getting new hardware was not especially short), and buffering the data would require eye-watering amounts of disk space. We just buffered as much as we could and then returned a "try again later" error code. We made sure that data accepted by the system was processed within a short enough window. We made sure that the cost of submitting a work item was so small that you could really hammer the front-end as much as you wanted, and it would just say "try again later" or accept work items.
I think one of the lessons here is that you need to think long and hard about what error conditions you want to expose upstream. The farther you propagate an error condition, the more different ways you can solve it, but the more complicated your clients get. For example, disks sometimes fail. You can decide to not propagate disk failure beyond an individual machine, load up RAID 1 (mirror) in all your file servers, and back everything up. Or you can push disk failures farther up the stack, and recover from failed disks at a higher level, with lower hardware cost but higher implementation complexity. And if you build a bunch of systems assuming that each part is always working, you run very serious risks of cascading failures once one part does inevitably fail.
Obviously, small enough systems running on cloud hardware can usually be autoscaled just fine.
Or can we add autoscale? I encountered this exact problem today as I was looking for an autoscaler solution that's simple to set up and maintain. The programs feed in the backpressure metrics to monitor, and the autoscaler would tell me how many pods/consumers there need to be, based on naive estimation or prediction from yesterday's traffic.
K8s seems to be overkill, and comes with stateful-pod problems. KEDA looks promising, but I wish there were much simpler options.
I'm quite befuddled by this article, because the perspective is pretty exactly 180° of what I'm used to.
I don't (try to) control or manage backpressure. Backpressure controls - or rather, steers - me/my code. Getting backpressure right is a massively important and frequently under-considered design aspect.
The most significant thing to understand IMHO is that you need to look at backpressure in the context of the larger system. More concretely, you need to ask yourself "is this telling me that another task should run right now"? Both for synchronously driven sources (e.g. blocking write() syscall) as well as asynchronously driven sources (event loop with "can send" flag), you need to make sure you (a) don't block other progress (held locks on DB) and (b) actually pull up other tasks (particularly important for partially-asynchronous code).
And this leads straight to the single most impactful design/scaling decision for these systems: is the source maintaining its outbound scheduling at a high enough level? This is best clarified with an example: say you have a key/value store that sends out change notifications per key. Does your application need every single step, or just the most recent value? And if it's the latter: you shouldn't even have a "true" queue! Instead, the code needs to track which keys are pending for which receiver, and whenever the receiver can accept data, it needs to send the most recent data for pending keys. This is the difference between an unbounded queue and one bounded by the number of keys. The former can fail disastrously; the latter will just start lagging, but once it reaches the worst case, it stays there.
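That "bounded by the number of keys" structure can be sketched in a few lines (illustrative names): instead of appending every change to a queue, remember only the latest value per pending key, and drain whatever is pending when the receiver is ready.

```python
class CoalescingNotifier:
    """Per-key change coalescing: at most one pending entry per key, so the
    backlog is bounded by the number of keys, not the number of updates.
    Under load it lags, but it never grows without bound."""

    def __init__(self):
        self._pending = {}  # key -> most recent undelivered value

    def publish(self, key, value):
        # Overwrites any not-yet-delivered value for this key.
        self._pending[key] = value

    def drain(self):
        """Called when the receiver can accept data; returns latest values."""
        batch, self._pending = self._pending, {}
        return list(batch.items())
```

Ten thousand updates to one hot key cost the receiver exactly one notification, which is the whole point.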
> For example, if someone says: “I made this new library with built-in backpressure…” they probably mean it somehow helps you manage backpressure, not that it actually intrinsically is the cause of it. Backpressure is not “desirable” except when it’s inescapable and you want to protect something else from receiving it.
… and this really encapsulates my disagreement. Backpressure is absolutely desirable. I want my libraries to provide me with backpressure when there is reason to do so, and do it in a way I can actually "consume" the backpressure!
And lastly:
> Backpressure Strategies: control, buffer, drop
"buffer" and "drop" are in most cases just bugs. And very hard to debug & track down, due to their load dependency.
This needs to be upvoted more... younger developers don't know about the problems that have had solutions for 40 years. RTS/CTS flow control is still in use today...