Why does Kafka allow a consumer to process the same messages as another consumer by default anyway? With RMQ or others, this "consumer group" behaviour is just how it works by default.
A Kafka topic is divided into partitions. A consumer group can have many consumers, and the consumers inside the same group are each assigned a disjoint subset of the topic's partitions.
Say you have a topic with 10 partitions. Consumer group A has 10 consumers and Consumer group B has 1 consumer.
Each consumer in group A will be assigned a single partition, thus processing different messages from one another.
The single consumer in group B will be assigned all 10 partitions, so it processes every message that group A's consumers process between them.
Kafka is a distributed log, so this pattern is useful for replicating data across downstream applications.
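The assignment described above can be sketched without a broker. This is a minimal, illustrative model of round-robin-style partition assignment (the group and consumer names are made up, and real Kafka assignors are more sophisticated), just to show why group A's consumers split the work while group B's lone consumer re-reads everything:

```python
def assign_partitions(partitions, consumers):
    """Spread one topic's partitions across the consumers of ONE group, round-robin."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = list(range(10))  # a topic with 10 partitions

group_a = assign_partitions(partitions, [f"a-{n}" for n in range(10)])  # 10 consumers
group_b = assign_partitions(partitions, ["b-0"])                        # 1 consumer

# Each consumer in group A owns exactly one partition...
print(group_a["a-0"])  # [0]
# ...while group B's single consumer owns all ten, so it sees every message.
print(group_b["b-0"])  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Assignment happens per group, so the two groups never interfere with each other's progress.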
Different defaults. RMQ and the others also store messages differently, so the sensible defaults for RabbitMQ differ from the sensible defaults for Kafka. Head to head, Kafka does less processing on the broker and leaves that work to the consumers, while Rabbit does more work on the broker and less on the consumers. Depending on the requirements, one of the two is better suited than the other, but both can do almost the same kind of work.
Because Kafka is not a message queue, though it can be used as one. Multiple consumers on a topic is the default behavior because it's very useful for the typical Kafka use cases. I like to think of Kafka as bringing "microservices" to data processing. If I want order data to be both shown in real time on a dashboard and aggregated for reporting purposes, I'd build two different consumers for the same topic: one that sends the results off to the order dashboard via websocket, and one that performs a rolling aggregation, most likely using a streaming query.

Maybe you now need to alert the sales team when an order comes in over a certain amount so they can give the customer the white-glove treatment. Just add another consumer: a KSQL query for orders over $$$, which then get dropped into a high-value-orders topic. There you can have a consumer fire off the email to the sales team. Want to do other things when a high-value order comes in? Add another consumer.
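The fan-out above hinges on one config detail: each pipeline gets its own `group.id`. Here's a hypothetical sketch of the consumer configs for that setup (topic and group names are invented for illustration; the keys `group.id` and `auto.offset.reset` are standard Kafka consumer settings):

```python
TOPIC = "orders"  # hypothetical topic name

# One config per downstream pipeline. Because every group.id is distinct,
# each consumer independently receives the FULL stream of order records.
consumer_configs = {
    "dashboard":  {"group.id": "orders-dashboard",  "auto.offset.reset": "latest"},
    "reporting":  {"group.id": "orders-reporting",  "auto.offset.reset": "earliest"},
    "high-value": {"group.id": "orders-high-value", "auto.offset.reset": "earliest"},
}

group_ids = [c["group.id"] for c in consumer_configs.values()]
# No two pipelines share a group, so none of them steal work from another.
assert len(set(group_ids)) == len(group_ids)
```

Adding a fourth use case is just a fourth entry with a fourth `group.id`; nothing existing has to change.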
I believe Kafka's default is that if you do not specify a consumer group, it makes up a random one for you. It is an optional field.
Not sure of the "why" of it. If it were me designing that sort of behavior, I would think the idea was short-lived processes that do a small amount of work and then disappear. For that idea, a consumer group would not make sense, as you would just want to start at the front or end of the topic and blast through it anyway. Once you add the idea of two different things happening, or gangs of processes working from the same events with partitions, then consumer groups make a lot more sense to add in. But retrofitting it back in would be awful? So the default is a random group id? The rationale is probably in the KIPs (Kafka Improvement Proposals) somewhere.
Having different consumer groups lets you have multiple different consumers for different purposes that DO process the same records.