[ $davids.sh ] β€” david shekunts blog

πŸ–•πŸ½Refuse RabbitMQ, Kafka, and other queues πŸ–•πŸ½

# [ $davids.sh ] Β· message #186

πŸ–•πŸ½Refuse RabbitMQ, Kafka, and other queues πŸ–•πŸ½

(continuation in comments)

#rmq #kafka #mq #postgresql

  • @ [ $davids.sh ] Β· # 633

    This isn't about "messages" or "events"; it's about "jobs": tasks with a state of "pending," "in-progress," "success," or "failed," where reliability of execution matters more than speed, and where a job history is needed.

    And it's precisely for such "jobs" that I prefer to use a database, and here's what databases offer that queues don't:

    1. Absolute control: we retrieve as many tasks as we want, in any way we want, with full control over load and priorities.
    2. One sequential pile: if you need to process data sequentially (FIFO) per entity, a queue forces you to create a separate queue for each entity instance. With a database, you dump everything into one pile, and the workers themselves maintain the order by simply retrieving jobs one at a time.
    3. To restart a job, just change its status back to "pending."
    4. Job history comes for free.
    5. To find out anything about a job's status, just query the tables/collections.
    6. Metrics are easy to compute.
    7. A real "exactly once" guarantee (nothing else will give you that).
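The claim step at the heart of all this fits in a few lines. A minimal sketch, using SQLite for brevity; the `job` table and its columns are assumptions for illustration, not from the post:

```python
import sqlite3

# Minimal DB-backed job queue sketch (SQLite stand-in; schema is assumed).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE job (
        id INTEGER PRIMARY KEY,
        payload TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'pending'
    )
""")
conn.execute(
    "INSERT INTO job (payload) VALUES ('send-invoice'), ('charge-card')"
)
conn.commit()

def claim_next(conn):
    """Atomically move the oldest 'pending' job to 'in-progress' and return it."""
    with conn:  # one transaction per claim
        row = conn.execute(
            "SELECT id, payload FROM job "
            "WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # nothing pending
        # Conditional UPDATE: rowcount == 1 only for the worker that wins the claim.
        won = conn.execute(
            "UPDATE job SET status = 'in-progress' "
            "WHERE id = ? AND status = 'pending'",
            (row[0],),
        ).rowcount == 1
        return row if won else None
```

Restarting is then exactly the "change the status back to pending" item: a single UPDATE, with the full history still sitting in the table.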

    Disadvantages:

    1. You'll have to either create triggers so that work gets picked up, or poll the database yourself from time to time.
    2. This usually puts a significant load on the database.
    3. It's only suitable when reliability of execution matters more to you than speed.
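The polling drawback usually takes the shape of a worker loop with a small sleep. A sketch of that shape, with `claim` and `process` as placeholder callables (this bounded variant stops when the queue stays empty, purely so it terminates; a real worker would loop forever):

```python
import time

def drain(claim, process, poll_delay=0.01, max_empty_polls=3):
    """Poll for jobs; stop after the queue stays empty for several polls.

    The periodic `claim` calls are exactly the database load the
    drawback above refers to.
    """
    empty, done = 0, 0
    while empty < max_empty_polls:
        job = claim()
        if job is None:
            empty += 1
            time.sleep(poll_delay)  # back off while idle
        else:
            empty = 0
            process(job)
            done += 1
    return done
```

On PostgreSQL, LISTEN/NOTIFY is one common way to realize the "triggers" option and avoid most of the polling.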

    So far, I've built such database-based queues myself, but of course, it's better to try something like PgQ or similar plugins.

  • @ Arsen IT-K Arakelyan Β· # 634

    An interesting thought, but why are these systems so popular, then?

    Perhaps there are still some features that make them more preferable than databases?

  • @ [ $davids.sh ] Β· # 635

    Queues are beneficial:

    – If you need "send it and forget about it until it arrives and is processed"
    – If you need "send it and have it arrive as soon as possible"
    – If an "at least once" delivery guarantee is sufficient for you
    – If you need high throughput
    – If you need broadcast ("send it and have it arrive at many")
    – If you need Events ("send it and have it arrive reliably at many")
    – If you need Request/Response, but not synchronous (like HTTP, gRPC)

    And with jobs, it often happens like this: people save them in the DB, then push the ID into a queue, then pull the ID from the queue and the job from the DB, and execute it. And if retries are needed, they go through the whole cycle again.

    Why? "Because we have a queue, and we need to process sequentially, there's the root word 'queue', so we'll use a queue."

    But in this situation, the queue doesn't add any benefit and, on the contrary, only complicates the system.

  • @ [ $davids.sh ] Β· # 636

    And there's also a Message Broker – if a queue is TCP, then a Message Broker is UDP.

    This means a Message Broker most often has low delivery guarantees and doesn't store anything on disk, BUT that's precisely why they are much faster, lighter, and more distributed.

    And here too, many people implement, for example, RPC or Events via MQ, when a Message Broker would have been a much better fit.

    Why? Because people are used to "to send a message there's HTTP, and there are queues" – and they don't look beyond this black box.

    The coolest Message Broker I've used is Nats.io

  • @ [ $davids.sh ] Β· # 637

    And there's also a golden mean between MQ and MB – that's the MQTT protocol.

    It's a protocol used in IoT, so not many people know about it, but in reality, Google Pub/Sub implements exactly that under the hood.

    It's MB but with high delivery guarantees and speed.

    The coolest MQTT broker is VerneMQ.

  • @ Artur G Β· # 638

    I did the same thing. Good option.

    But here, "only once" is just a status toggle, and the rest still needs to be handled too.

    It doesn't seem difficult if you know how, so it's not a drawback but a feature. 😁
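The status toggle being discussed reduces to a conditional UPDATE plus a rowcount check. A small sketch (SQLite, assumed schema) of both the "only once" claim and the restart:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO job (id, status) VALUES (1, 'pending')")
conn.commit()

def try_claim(conn, job_id):
    """Compare-and-set: exactly one caller gets rowcount == 1 per transition."""
    with conn:
        cur = conn.execute(
            "UPDATE job SET status = 'in-progress' "
            "WHERE id = ? AND status = 'pending'",
            (job_id,),
        )
        return cur.rowcount == 1

def restart(conn, job_id):
    """The 'restart' from the post: just flip the status back to 'pending'."""
    with conn:
        conn.execute("UPDATE job SET status = 'pending' WHERE id = ?", (job_id,))
```

Two workers racing on the same job both run the UPDATE, but only one sees a nonzero rowcount; the other simply moves on.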

  • @ Ivan ITK 🚫 Β· # 656

    Alas, you are mistaken in your claims.

    Firstly, the Kafka protocol out of the box supports FIFO per partition (nothing prevents you from creating one if the architecture requires it).

    Exactly-once through transactions.

    Message history out of the box; Redpanda also has a separation between hot and cold storage.

    There are no problems with metrics or even tracing out of the box.

    Job state is also not a problem if you use a state pattern, for example:

    task.pending
    task.in-progress
    task.success
    task.failed

    Each stream has tasks with the desired state. You can find a task by ID or time using the mask task.*.
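The masking idea can be shown without a broker. A toy sketch using `fnmatch` for the `task.*` mask (stream names from the comment; the in-memory "streams" dict is an assumption standing in for real topics):

```python
from fnmatch import fnmatch

# Toy stand-in for per-state streams/topics.
streams = {
    "task.pending": [{"id": 1}],
    "task.in-progress": [{"id": 2}],
    "task.success": [{"id": 3}],
}

def find(streams, mask, job_id):
    """Scan every stream whose name matches the mask for a task with this id."""
    for name, tasks in streams.items():
        if fnmatch(name, mask):
            for task in tasks:
                if task["id"] == job_id:
                    return name, task
    return None
```

`find(streams, "task.*", 2)` locates the task regardless of its state; a narrower mask like `task.failed` checks one state only.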

    Secondly, yes, you can build all this on CDC; PostgreSQL also has various queues through pub/sub. However, if we are talking about high load and scaling, all these database-based solutions lose out sharply.

    I have an unpublished article where we worked out an algorithm for recurring (as well as ordinary) queues with the maximum possible guarantees and speed on top of the Kafka protocol. It would be interesting to have you read it and hear your opinion)

  • @ Ivan ITK 🚫 Β· # 658

    From the downsides: Kafka doesn't offer a simple way to build recurring queues. On RabbitMQ, the scheduler works within a single instance and doesn't support distribution.

    From the upsides: recurring queues on a database are an elementary task, as is writing any custom logic on top of a queue.

  • @ [ $davids.sh ] Β· # 660

    Ooh, I was waiting for you 😍

    We had 2 tasks where I was disappointed with using queues:

    • Jobs that run once a month and must (1) calculate how many resources companies have used, (2) calculate how much to charge from the balance, taking discounts and the like into account, (3) charge the system balance and then go to a third-party system for the remainder, (4) issue invoices and acts, (5) sign them and send them by email

    • There are controllers (around 10-50k) each sending messages that must be processed sequentially within a single controller, but in parallel across multiple controllers (because they send at different frequencies and processing some can take seconds due to waiting for third-party providers)

    I'll say right away: in both cases, speed is in fourth place; the most important things are (1) reliability of execution, (2) ease of adding new features (this functionality is actively expanding), (3) debugging and the ability to quickly restart in case of problems (and there are usually a lot of problems with them because we are tied to a bunch of third-party systems)

    In the first case, the database provides advantages because (1) we have exposed all job statuses to technical support, and they can see what and how for each job, as well as what and how was issued, (2) they can also directly in the UI choose "from which stage to repeat," all created entity IDs are stored in the job, and we delete all created entities up to the desired stage and continue processing, (3) advantage from the point below

    In the second case, the biggest problem is "sequentially within 1 controller and in parallel across all," where there are >10,000 controllers. Creating partitions for each individual controller is too much, and if we create topics, we'll have to twist the system to link N topics to M services (and all this distributed stuff).

    And we solved all the scaling with one query:

    UPDATE job SET status = 'in-progress'
    WHERE id = (
        SELECT id FROM job
        WHERE status = 'pending'
          AND controller_id NOT IN (
            SELECT controller_id FROM job WHERE status = 'in-progress')
        ORDER BY created_at ASC
        LIMIT 1)
    RETURNING *

    – now each instance in a single pool concurrently picks up a task, preserving the order within one controller and parallelism across many.
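That query can be exercised end to end. A SQLite stand-in (schema assumed, and the claim split into SELECT + UPDATE since older SQLite lacks RETURNING) showing per-controller order with cross-controller parallelism:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE job (
        id INTEGER PRIMARY KEY,
        controller_id INTEGER NOT NULL,
        status TEXT NOT NULL DEFAULT 'pending',
        created_at INTEGER NOT NULL
    )
""")
# Two jobs for controller 1, one for controller 2, in arrival order.
conn.executemany(
    "INSERT INTO job (controller_id, created_at) VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 12)],
)
conn.commit()

def claim(conn):
    """Oldest pending job whose controller has nothing in progress."""
    with conn:
        row = conn.execute("""
            SELECT id, controller_id FROM job
            WHERE status = 'pending'
              AND controller_id NOT IN (
                SELECT controller_id FROM job WHERE status = 'in-progress')
            ORDER BY created_at ASC LIMIT 1
        """).fetchone()
        if row is None:
            return None
        conn.execute("UPDATE job SET status = 'in-progress' WHERE id = ?", (row[0],))
        return row

def finish(conn, job_id):
    with conn:
        conn.execute("UPDATE job SET status = 'success' WHERE id = ?", (job_id,))
```

Worker 1 gets controller 1's first job, worker 2 skips ahead to controller 2, and controller 1's second job only becomes claimable once the first one finishes.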

    Yes, undoubtedly, there's not a whiff of speed here (although we have no speed problems either).

    And, accordingly, all other advantages from the post text.

    I'm saying this because: yes, undoubtedly, all of this can be done better in every respect with queues. BUT with just a couple of INSERT, SELECT, and UPDATE statements, against the database already used in the project (from which I only need an atomic UPDATE and nothing more), I get all the metrics a medium-sized project needs, plus convenient development, plus distributed processing, and that is very pleasing.

    P.S.

    I will read the article with great pleasure)

  • @ Ivan ITK 🚫 Β· # 661

    πŸ€— Yes, in your case Kafka alone simply cannot solve this; you need to build materialized views for working with the data, plus all the solutions around that stack. As for recurring tasks, the article provides a complete solution covering many aspects; implementing even just recurring tasks reliably turned out to be non-trivial. I've set myself a reminder and will send you the link in a private message. I'm currently on a business trip with internet issues; Telegram is barely loading.

    Databases are much cheaper and faster for implementing all this logic; I wrote about that in the pros.