We had two communication patterns:
Event-Driven Architecture (we fire an event and forget about it, and a bunch of readers on the other side consume it) and RPC (we send a request and wait for a response from a single reader). Obviously, all of this has to work in cluster mode with at least 3 nodes.
EDA must guarantee: "if we publish an event and see that it has entered the queue, then eventually it can be read on the other side, in the correct order, by the required number of instances."
RPC must guarantee: "if the reader is unavailable, we must find out about it so we can retry; if it still doesn't come up, we return a timeout; and if the reader is available, we either get a response within an acceptable time or return a timeout."
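The RPC guarantee above can be sketched as a small retry loop. This is only an illustration under stated assumptions: `send`, `ReaderUnavailable`, `RpcTimeout`, `call_with_retry` and `make_flaky_reader` are hypothetical names, not part of any broker's client API, and the reader is faked in memory.

```python
import time


class ReaderUnavailable(Exception):
    """Raised by the (hypothetical) transport when no reader is listening."""


class RpcTimeout(Exception):
    """Raised to the caller when retries or the overall deadline run out."""


def call_with_retry(send, request, retries=2, backoff=0.01, deadline=1.0):
    """Try send(request, timeout) up to retries + 1 times within the deadline."""
    started = time.monotonic()
    for attempt in range(retries + 1):
        remaining = deadline - (time.monotonic() - started)
        if remaining <= 0:
            break  # the overall time budget is spent
        try:
            # Assumption: the transport raises RpcTimeout itself if the reader
            # is up but does not answer within `remaining` seconds.
            return send(request, remaining)
        except ReaderUnavailable:
            if attempt < retries:
                # Reader is down: wait a little longer before each retry.
                time.sleep(min(backoff * 2 ** attempt, remaining))
    raise RpcTimeout(f"no response to {request!r}")


def make_flaky_reader(failures):
    """Build a fake reader that is unavailable for the first `failures` calls."""
    state = {"left": failures}

    def send(request, timeout):
        if state["left"] > 0:
            state["left"] -= 1
            raise ReaderUnavailable()
        return f"echo:{request}"

    return send
```

With these defaults, `call_with_retry(make_flaky_reader(2), "ping")` succeeds on the third attempt, while a reader that stays down longer than the retry budget ends in `RpcTimeout`.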
RMQ, NATS, and Kafka walk into a bar.
NATS is a message broker: its job is to let you subscribe to topics and publish messages to them in fire-and-forget fashion, and it does not guarantee order.
It's great for RPC (request-reply is even built in), but since it guarantees neither ordering nor delivery to a reader that hasn't come up yet, we can't use it for EDA (I'll say right away: we're not considering JetStream).
Kafka is a distributed log: we write to the end and read from a desired point (offset).
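To make the "write to the end, read from an offset" idea concrete, here is a toy in-memory log. It is a sketch of the concept only, not of Kafka's API: the `Log` class and its methods are made up for illustration, and there is no persistence, partitioning or replication.

```python
class Log:
    """In-memory append-only log; a record's offset is its list index."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Write to the end and return the offset the record was stored at."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at offset, in write order."""
        return self._records[offset:offset + max_records]


log = Log()
for event in ["created", "paid", "shipped"]:
    log.append(event)

# Each reader tracks its own offset, so reading does not consume anything.
billing = log.read(0)    # a new reader replays from the start
shipping = log.read(2)   # another reader resumes from where it left off
```

Because reads never remove records, any number of readers can see the same events in the same order, which is exactly the property EDA needs.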
It solves the EDA problem perfectly (the only complexity is calculating partitions and implementing external locks on queues, but more on that below), and RPC can be implemented on top of it too (it will just be a bit "heavy").
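"Calculating partitions" mostly comes down to hashing a key so that related events land in the same partition and keep their relative order. A minimal sketch, assuming string keys: `partition_for` and `route` are hypothetical helpers, and `zlib.crc32` is a stand-in hash (Kafka's default partitioner for keyed records uses murmur2, so real partition numbers will differ).

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition; equal keys always land in the same one."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions):
    """Group (key, payload) events by partition, keeping arrival order."""
    partitions = {p: [] for p in range(num_partitions)}
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions


events = [("order-1", "created"), ("order-2", "created"), ("order-1", "paid")]
routed = route(events, num_partitions=3)
```

Order is only guaranteed inside one partition, so the choice of key decides which events stay ordered relative to each other. Changing the number of partitions later remaps keys, which is part of why the count has to be thought through up front.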
RMQ sits somewhere between the two worlds: fire-and-forget with guaranteed ordering, where distribution and uniqueness are achieved through queue parameters (exclusive, single-active-consumer, durability, etc.) and by juggling exchange and queue types.
In theory, it solves the RPC problem well (it even has built-in support for it) and, except for long-term message storage, it gives us all the characteristics needed for EDA, even with the ability to choose between speed (no replication) and reliability (Kafka-style replication, with the latency that comes with it).
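The speed-versus-reliability choice mentioned above is made per queue, at declaration time, through queue arguments. The sketch below only builds the parameters and does not talk to a broker. `queue_params` is a hypothetical helper; the `x-queue-type` and `x-single-active-consumer` arguments are RabbitMQ's own, and the returned dict mirrors the keyword arguments a client such as pika accepts in `queue_declare` (treat the exact client call as an assumption for your library).

```python
def queue_params(name, replicated, single_active_consumer=False):
    """Build declaration parameters for a fast (classic) or replicated (quorum) queue."""
    arguments = {"x-queue-type": "quorum" if replicated else "classic"}
    if single_active_consumer:
        # Only one consumer receives messages at a time; the others wait as standby.
        arguments["x-single-active-consumer"] = True
    # Quorum queues have to be durable and cannot be exclusive,
    # so the sketch uses the same safe values for both queue types.
    return {
        "queue": name,
        "durable": True,
        "exclusive": False,
        "arguments": arguments,
    }


fast = queue_params("rpc.requests", replicated=False)
safe = queue_params("events.orders", replicated=True, single_active_consumer=True)
```

Here `fast` describes a non-replicated classic queue and `safe` a replicated quorum queue with a single active consumer.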
In practice, however, this garbage only works normally with certain settings, which makes it a terrible alternative to both NATS and Kafka.
A multi-tool = zero quality.
So, RMQ really likes to break its cluster and then fail to repair it.
Firstly, it's absolutely normal for RMQ to show you that everything is fine, with all queues and exchanges in place, while in reality nothing happens when you try to publish or consume: NOT EVEN ERRORS. Why? Because it failed to synchronize its configuration correctly. How to fix it? By manually deleting queues, restarting a node, or replacing the entire cluster (I'm not kidding, we had to do all 3).
Secondly, if you don't use quorum (replicated) queues, or if you use any kind of TTL on queues, then any problem in the cluster guarantees that either those queues or the whole cluster will stop working.
And if you do use quorum queues, prepare for a significant drop in message transfer and processing speed.
This means that for RMQ to work stably, we'll have to turn it into a slower version of Kafka without a persistent log.
The only remaining advantage is its built-in reader management, BUT the funny thing is that if you want not just "one reader at a time per queue" but "one reader at a time per set of queues, with the queues balanced across readers" (and this is a common use case), you'll still have to implement queue locks yourself, just like with Kafka...
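Here is roughly what "one reader at a time per set of queues, with balancing" means in code. This is a single-process sketch under loud assumptions: `assign_queues` and `QueueLocks` are hypothetical, and the in-memory dict stands in for whatever external store would hold the locks in a real cluster. It shows the bookkeeping only, not failure detection or rebalancing.

```python
def assign_queues(queues, readers):
    """Spread queues over readers round-robin so each queue has one owner."""
    ordered = sorted(readers)
    assignment = {reader: [] for reader in ordered}
    for i, queue in enumerate(sorted(queues)):
        assignment[ordered[i % len(ordered)]].append(queue)
    return assignment


class QueueLocks:
    """Lock table: at most one reader may hold a given queue at a time."""

    def __init__(self):
        self._owners = {}

    def try_acquire(self, queue, reader):
        """Take the lock if it is free; report whether reader now holds it."""
        return self._owners.setdefault(queue, reader) == reader

    def release(self, queue, reader):
        """Free the lock, but only if this reader is the current owner."""
        if self._owners.get(queue) == reader:
            del self._owners[queue]


plan = assign_queues(["q1", "q2", "q3", "q4", "q5"], ["reader-a", "reader-b"])
locks = QueueLocks()
# Every reader locks the queues it was assigned before consuming from them.
held = {r: [q for q in qs if locks.try_acquire(q, r)] for r, qs in plan.items()}
```

The hard parts are the ones the sketch leaves out: expiring the locks of a dead reader and reassigning its queues without two readers ever holding the same queue.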
For me, this is another case of a "Swiss Army knife" that can do things, but poorly.
Can RMQ be used? If you use quorum queues and don't create them frequently, fire-and-forget is enough for you, and you don't want to deal with partitions, cursors, and distribution in Kafka, then yes, RMQ will do its job.
In other cases, when choosing between RMQ and Kafka, it's better to choose Kafka. It causes many more problems up front, but later you can at least sleep at night.