[ $davids.sh ] — david shekunts blog

🧠 Bugnetka v0.1 🧠

# [ $davids.sh ] · message #215

I want to play "danetka" with you about a bug we found

UPD: solved by @gennadiixd

#bugnetka #pain

  • @ [ $davids.sh ] · # 1136

    The rules are the same as in the "Yes/No" game: I give you the premise, and you have to guess what happened. Along the way, you can ask me questions, which I will answer with "yes", "no", "irrelevant", or "golang".

    Let's go:

    There is a service running in 3 replicas that serves the main HTTP API. It is part of a modular monolith (meaning other applications live alongside it under a common name).

    After yet another release, it starts returning a 404 on the main endpoint (we have only one, because it's RPC): in 90% of cases on the most loaded geo-contour, in 70% of cases on a less loaded one, and in 30% of cases on development.

    There are no errors, the metrics are normal, and the health checks of this service and of the other services all pass.

    What happened?

  • @ Vova hardvair smartvend 🛍️💻 · # 1137

    Is the 404 being forwarded from another request?

  • @ [ $davids.sh ] · # 1138

    I don't understand

  • @ [ $davids.sh ] · # 1139

    You were in the thread where we discussed this, so you have an advantage. Don't give it away)))

  • @ Vova hardvair smartvend 🛍️💻 · # 1140

    I mean an error from an internal request (API, Redis, etc.) being passed through to the external response

  • @ [ $davids.sh ] · # 1141

    No

  • @ Vova hardvair smartvend 🛍️💻 · # 1142

    Well, I haven't read it, so no advantage here. And besides, I dunno)

  • @ [ $davids.sh ] · # 1143

    You were the one who first reported the bug)))

  • @ Vova hardvair smartvend 🛍️💻 · # 1144

    So, does it work in the end? Or are we waiting again?

  • @ [ $davids.sh ] · # 1145

    Works

  • @ Vova hardvair smartvend 🛍️💻 · # 1146

    I filed a report and went to sleep

  • @ [ $davids.sh ] · # 1147

    It's like: "cool guys don't look at explosions"

  • @ Vova hardvair smartvend 🛍️💻 · # 1148

    I was poking around for 4 hours on and off, and the report was the last thing I did.

  • @ Gennadii IT-K Khotovytskyi · # 1149

    Well, if I understand correctly, you have an API gateway with a load balancer behind it that distributes requests across the replicas. In that case, the first thing I would check is the load balancer. It may be mistakenly routing some requests to a replica of a neighboring service, which then returns a 404. And the higher the load on the correct replicas, the more often the balancer lands on the wrong one when it looks for the least loaded.

  • @ [ $davids.sh ] · # 1150

    And that's the right answer!) I can take you with me for debugging))
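
A toy sketch of why the 404 rate would track load, following the explanation in # 1149. Everything here is an assumption for illustration, not the real setup: the balancer is modeled as least-connections, the stray replica of the neighboring service answers 404 instantly (so it always looks idle), and the names `simulate`, `arrivals_per_tick`, and `service_ticks` are hypothetical.

```python
def simulate(arrivals_per_tick, ticks=1000, good_replicas=3, service_ticks=2):
    """Return the share of requests that end up on the stray backend (404)."""
    # in_flight[i] holds the finish ticks of requests running on correct replica i.
    in_flight = [[] for _ in range(good_replicas)]
    total = not_found = 0
    for now in range(ticks):
        # Forget requests that have finished by this tick.
        in_flight = [[t for t in queue if t > now] for queue in in_flight]
        for _ in range(arrivals_per_tick):
            total += 1
            # Least-loaded correct replica; ties go to the lowest index.
            best = min(range(good_replicas), key=lambda i: len(in_flight[i]))
            if not in_flight[best]:
                # An idle correct replica exists, so it wins the tie
                # against the stray backend (assumed tie-break).
                in_flight[best].append(now + service_ticks)
            else:
                # Every correct replica is busy. The stray backend has zero
                # requests in flight, so the balancer sees it as least loaded.
                not_found += 1
    return not_found / total


if __name__ == "__main__":
    for load in (1, 2, 5):
        print(f"{load} req/tick -> 404 share {simulate(load):.2f}")
```

In this model, a lightly loaded environment never hits the stray backend, while a heavily loaded one hits it most of the time. The health checks stay green either way, because every backend in the pool is alive; one of them just does not know the endpoint.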