[ $davids.sh ] — david shekunts blog

🪲 Errors should be returned, not thrown

# [ $davids.sh ] · message #165


In short: I used to approach this topic cautiously, with phrasing like "it's useful", "it would be cool if you did it this way", "well, if you decide to", and so on. Now I'm completely convinced this is the right approach, and I don't want to suggest anything else.

This approach has a lot of advantages: (1) you can clearly see what types of errors a function returns (especially if you create custom errors), (2) there are no implicit error catchers, (3) try/catch is a cumbersome, ugly construct that can add several levels of nesting. And there are plenty of other benefits.

And there has always been a conceptual question, the subject of many conference talks and articles: why should I throw an error that is a normal part of business logic (for example, NotFound for an entity that may legitimately not exist), when it can kill my program if nobody catches it?

What convinced me so strongly?

  1. Historically, this is how it worked in C.
  2. Then Go adopted this approach.
  3. In Rust, it's the same story, BUT there they return a Result<T, E> type (a monad).
  4. And then I found out that Zig (the language I'm currently betting on) has a built-in error union type, the ! operator (as in !T), for returning errors.

And I won't even mention functional languages, where this is just the standard, also through monads.

If you use TS, the most sensible option is type Result<T, E extends Error = Error> = T | E (I can explain why later).
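A minimal sketch of how that type might be used in practice. NotFoundError, User, findUser and greet are hypothetical names for illustration, and the in-memory Map stands in for a real data source. The point is that the failure shows up in the function signature, and the caller narrows it with instanceof instead of wrapping the call in try/catch.

```typescript
// The Result type from the post: either a success value T or an error E.
type Result<T, E extends Error = Error> = T | E;

// Hypothetical custom error for an entity that may legitimately be missing.
class NotFoundError extends Error {
  readonly entity: string;
  readonly id: string;

  constructor(entity: string, id: string) {
    super(`${entity} ${id} not found`);
    this.name = "NotFoundError";
    this.entity = entity;
    this.id = id;
    // Keeps `instanceof` working even if the code is compiled down to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

interface User {
  id: string;
  name: string;
}

// Hypothetical in-memory store standing in for a database.
const users = new Map<string, User>([["1", { id: "1", name: "Ada" }]]);

// The signature says exactly which failure the caller has to expect.
function findUser(id: string): Result<User, NotFoundError> {
  return users.get(id) ?? new NotFoundError("User", id);
}

function greet(id: string): string {
  const result = findUser(id);
  if (result instanceof NotFoundError) {
    // The failure is handled explicitly, with no try/catch around the call.
    return `No such user: ${result.id}`;
  }
  // Here TypeScript has narrowed `result` to User.
  return `Hello, ${result.name}`;
}

console.log(greet("1")); // Hello, Ada
console.log(greet("2")); // No such user: 2
```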

Yes, there are some places (for example, validating individual properties) where this can make the code messy, but there's a solution for that too (I'll talk about it in upcoming posts).

Now I'm curious: how much do you like the idea of returning errors instead of throwing them?

  • @ Maxim ITK Manylov · # 339

    Should I return new Error(...)? Or should I return {status: 'error', message: ...}?

  • @ [ $davids.sh ] · # 340

    In 90% of cases, the first option is the more advantageous and convenient one.
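    One plausible reading of why the first option usually wins, sketched with hypothetical parseAge and parseAgeObject functions. An Error instance can be returned directly as the failure half of T | E, and it carries a class for instanceof checks (plus, in V8-based runtimes, a stack trace). A plain status object, by contrast, needs a wrapper on the success path as well.

```typescript
// Option 1: return an Error instance (a hypothetical subclass here).
class ValidationError extends Error {
  readonly field: string;

  constructor(field: string, message: string) {
    super(message);
    this.name = "ValidationError";
    this.field = field;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

function parseAge(input: string): number | ValidationError {
  if (!/^\d+$/.test(input)) return new ValidationError("age", `not a number: ${input}`);
  return Number(input); // the success value is returned as-is
}

// Option 2: return a plain status object; success needs a wrapper too.
type Outcome<T> =
  | { status: "ok"; value: T }
  | { status: "error"; message: string };

function parseAgeObject(input: string): Outcome<number> {
  if (!/^\d+$/.test(input)) return { status: "error", message: `not a number: ${input}` };
  return { status: "ok", value: Number(input) };
}

// Both are explicit, but option 1 keeps the happy path unwrapped.
const a = parseAge("42");
const b = parseAgeObject("42");
console.log(a instanceof ValidationError ? a.field : a + 1); // 43
console.log(b.status === "error" ? b.message : b.value + 1); // 43
```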

  • @ Ivan ITK 🚫 · # 341

    What about exception tracing? If we return the error instead, then technically no exception ever happens.

  • @ Ivan ITK 🚫 · # 342

    Nice to unexpectedly see you here👋

  • @ [ $davids.sh ] · # 343

    It all depends on the level of tracing you need.

    That is, if it's a critical error (for example, the database went down), it can be thrown up the stack to a general Error Handler, where tracing will occur.

    Failures handled via return (for example, a constraint wasn't met) can simply be logged where they're handled.
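    A minimal sketch of that split, with hypothetical names. reserveSeat throws a DatabaseUnavailableError (critical, left for the general Error Handler further up the stack) and returns a ConstraintError (a business failure that the caller logs and answers normally). The string array stands in for a real logger.

```typescript
// Hypothetical critical error: thrown and left for the general error handler.
class DatabaseUnavailableError extends Error {
  constructor() {
    super("database is unavailable");
    this.name = "DatabaseUnavailableError";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical business failure: returned and handled where it happens.
class ConstraintError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ConstraintError";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

const logLines: string[] = []; // stand-in for a real logger

function reserveSeat(seatsLeft: number, dbUp: boolean): number | ConstraintError {
  if (!dbUp) throw new DatabaseUnavailableError(); // critical: goes up the stack
  if (seatsLeft <= 0) return new ConstraintError("no seats left"); // expected failure
  return seatsLeft - 1;
}

function handleReservation(seatsLeft: number, dbUp: boolean): string {
  const result = reserveSeat(seatsLeft, dbUp);
  if (result instanceof ConstraintError) {
    logLines.push(`constraint failed: ${result.message}`); // logged right here
    return "sold out";
  }
  return `reserved, ${result} seats left`;
}

console.log(handleReservation(0, true)); // sold out
```

    A thrown DatabaseUnavailableError passes through handleReservation untouched, so only the trigger-level handler has to deal with it.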

    Or did I misunderstand the case?

  • @ Ivan ITK 🚫 · # 345

    What is the global handler in your system?
    I can think of two options:

    1. It's something near the API gateway that transforms responses for the client
    2. It's about a monolith, and we're not considering distributed tracing. The principle could work in a distributed system too, the way Moleculer's regenerator recreates error classes from serialized data (unless the serializer supports this out of the box, like cbor-x). Then errors can be passed as-is over the transport layer. But this adds an extra layer of logic, overhead, and abstraction. Is that really necessary from an architectural standpoint?

    By forwarding calls further, we add unnecessary steps to the stack, and the trace will be limited anyway, since storing excessively large traces is inefficient.
    Metrics and tracing in Node.js are evolving separately, at the V8 level, through channels, and will eventually be integrated with OpenTelemetry out of the box.

  • @ [ $davids.sh ] · # 346

    I'll try to answer, assuming I understood correctly:

    1. A Global Error Handler is a handler for errors that we catch at the trigger level (HTTP request, Cron, message from MQ, buffer from a socket, etc.). At this level, I can log/trace the error, serialize it, and send it in the response.
    2. Any node that receives an error in response (client application/another service) deserializes it and can understand what kind of error it is thanks to a shared error code dictionary.
    3. Then, this node decides what to do with the error: if it makes sense for it, continue execution; if not, throw it OR a modified version (e.g., when we don’t want to show this error to users).
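    A rough sketch of steps 1 and 2, assuming a hypothetical shared ERROR_CODES dictionary and a simplified synchronous handler (real triggers would usually be async). handleTrigger plays the global handler: it serializes returned failures and catches thrown ones. deserialize is what a receiving node might do with the JSON it gets back.

```typescript
// Hypothetical error-code dictionary shared by every node.
const ERROR_CODES = {
  TRANSACTION_NOT_FOUND: "transaction not found",
  INTERNAL: "internal error",
} as const;

type ErrorCode = keyof typeof ERROR_CODES;

class CodedError extends Error {
  readonly code: ErrorCode;

  constructor(code: ErrorCode, message: string = ERROR_CODES[code]) {
    super(message);
    this.name = "CodedError";
    this.code = code;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

interface WireError {
  code: string;
  message: string;
}

type TriggerResponse = { ok: true; body: unknown } | { ok: false; error: WireError };

// Global handler at the trigger level (HTTP request, cron job, MQ message, ...).
function handleTrigger(handler: () => unknown): TriggerResponse {
  try {
    const result = handler();
    if (result instanceof CodedError) {
      // A returned failure: serialize it into the response.
      return { ok: false, error: { code: result.code, message: result.message } };
    }
    return { ok: true, body: result };
  } catch (e) {
    // A thrown (critical) error: logging/tracing would happen here.
    const code: ErrorCode = e instanceof CodedError ? e.code : "INTERNAL";
    return { ok: false, error: { code, message: ERROR_CODES[code] } };
  }
}

function isErrorCode(value: string): value is ErrorCode {
  return Object.prototype.hasOwnProperty.call(ERROR_CODES, value);
}

// On the receiving node: rebuild a typed error via the shared dictionary.
function deserialize(wire: WireError): CodedError {
  return new CodedError(isErrorCode(wire.code) ? wire.code : "INTERNAL", wire.message);
}

const reply = handleTrigger(() => new CodedError("TRANSACTION_NOT_FOUND"));
console.log(JSON.stringify(reply));
// {"ok":false,"error":{"code":"TRANSACTION_NOT_FOUND","message":"transaction not found"}}
```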

    Thus, at the level of each individual node, you can see what errors you had and how they were transformed across services.

    (Note: I only cover IO and CPU-intensive code with traces.)

    Example: the 5th service down the chain throws a "transaction not found" error. The 1st service, which received the request from the client, gets that error, identifies it via the dictionary and, based on a second dictionary of errors that may be exposed to the client, decides whether to turn it into a 403 or a 404, depending on the security policy. The traces will show all IO requests (DB, MQ, HTTP, etc.) with the serialized errors in JSON (or, more precisely, with a link to the log entry where you can view the error).
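    A minimal sketch of the second dictionary from this example. The internal codes, the policy names, and the choice to answer 403 for both "not found" and "access denied" under a strict policy (so a client can't tell a missing transaction from someone else's) are all hypothetical. Anything not on the allow-list falls back to a generic error.

```typescript
// Hypothetical internal codes that may come back from downstream services.
type InternalCode = "TRANSACTION_NOT_FOUND" | "ACCESS_DENIED" | "DB_TIMEOUT";
type Policy = "default" | "strict";

interface ClientError {
  status: number;
  code: string;
}

// Second dictionary: which internal errors the client may see, per security policy.
const CLIENT_ERRORS: Record<Policy, Partial<Record<InternalCode, ClientError>>> = {
  default: {
    TRANSACTION_NOT_FOUND: { status: 404, code: "NOT_FOUND" },
    ACCESS_DENIED: { status: 403, code: "FORBIDDEN" },
  },
  // Strict: "missing" and "not yours" look the same to the client.
  strict: {
    TRANSACTION_NOT_FOUND: { status: 403, code: "FORBIDDEN" },
    ACCESS_DENIED: { status: 403, code: "FORBIDDEN" },
  },
};

// Everything not explicitly allowed becomes a generic error.
const GENERIC_ERROR: ClientError = { status: 500, code: "INTERNAL" };

function toClientError(code: InternalCode, policy: Policy): ClientError {
  return CLIENT_ERRORS[policy][code] ?? GENERIC_ERROR;
}

console.log(toClientError("TRANSACTION_NOT_FOUND", "default").status); // 404
console.log(toClientError("TRANSACTION_NOT_FOUND", "strict").status); // 403
console.log(toClientError("DB_TIMEOUT", "strict").code); // INTERNAL
```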

  • @ Ivan ITK 🚫 · # 347

    Got it regarding the handler, but I'm not entirely clear on how you plan to trace the original error in service 5 before it's transformed in service 1.

  • @ [ $davids.sh ] · # 350

    And by "tracing," do you mean traces like those in Jaeger, or tracing as the general concept of tracking?

  • @ Ivan ITK 🚫 · # 351

    So what's the difference? 🤔 Tracing is an approach, and right now we're discussing how traces, or more specifically stack traces, flow through your application under this approach. You can create multiple spans and link them by trace ID, taking the parent ID and depth level into account.

    Each span stores information. In your case, a span's information exists only within one service. But how do we capture an error in a span on the 5th service and trace it back through 4 spans to the 1st?

  • @ [ $davids.sh ] · # 352

    Damn, okay, it's hard to pin down exactly where our misalignment is.

    Usually, here's how I do it: when sending a message to any transport (HTTP, MQ, etc.), I assign it a unique ID plus a trace ID. So by the time the request reaches the 5th service, the chain has 5 unique message IDs but only 1 trace ID. When the 5th service returns an error over the same transport, the error gets serialized, and I can see in the spans that the socket entry contained an error; this repeats on every hop from the 5th service back to the 1st.

    This lets me:

    1. See a set of spans with all service interactions + I/O, starting from some trigger.
    2. If there’s an error somewhere, I usually check the Grafana dashboards (which pull from logs), grab the trace ID from the log, and then go to the tracing system to inspect the spans for that trace ID.
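    A simplified sketch of that ID scheme, with hypothetical names. Every hop gets a fresh messageId and a parentId, while traceId is set once at the trigger and copied along. The counter-based IDs only keep the example deterministic; a real system would use random IDs.

```typescript
// Hypothetical metadata attached to every outgoing message (HTTP, MQ, ...).
interface MessageMeta {
  traceId: string; // set once at the trigger, shared by the whole chain
  messageId: string; // unique per hop
  parentId?: string; // the message that caused this one
}

interface Span {
  meta: MessageMeta;
  error?: { code: string; message: string }; // serialized error seen on this hop
}

let nextId = 0;
function newId(prefix: string): string {
  nextId += 1;
  return `${prefix}-${nextId}`;
}

function childOf(parent: MessageMeta): MessageMeta {
  return { traceId: parent.traceId, messageId: newId("msg"), parentId: parent.messageId };
}

// Simulates a chain of `hops` services in which the deepest one returns an error.
function simulateChain(hops: number): Span[] {
  let meta: MessageMeta = { traceId: newId("trace"), messageId: newId("msg") };
  const spans: Span[] = [{ meta }];
  for (let i = 1; i < hops; i++) {
    meta = childOf(meta);
    spans.push({ meta });
  }
  const error = { code: "TRANSACTION_NOT_FOUND", message: "transaction not found" };
  // On the way back, every hop records the serialized error on its own span.
  for (const span of spans.slice().reverse()) span.error = { ...error };
  return spans;
}

const spans = simulateChain(5);
console.log(new Set(spans.map((s) => s.meta.messageId)).size); // 5
console.log(new Set(spans.map((s) => s.meta.traceId)).size); // 1
```

    Because every span on the way back records the same serialized error, this also reproduces the repeated errors mentioned later in the thread.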

    And just to reiterate, going back to the original topic: if it’s a business logic error (e.g., "this user can’t do X," "entity not found," "missing required data," validation fails, etc.), then we return an error and send a response (because it’s not really an error). But if it’s an error we consider severe enough to crash the app (e.g., DB is down), then we throw it.

  • @ Ivan ITK 🚫 · # 353

    Yes, we're talking about this specific user error. For debugging, we need the original error and its location in the trace. I see you write the error to the span where it's created. But then I wonder how this will look in Jaeger: a parent call and several child spans containing 2 errors in different places. That usually doesn't happen. Typically there's one error, and all subsequent spans propagate that same error, since in Jaeger you can use the error=true filter to view all errors or narrow them down to a specific service.

    Regarding Grafana and log searching, the OpenTelemetry protocol standard was specifically created to unify three entities: metrics, logs, and traces by linking them together with a single UUID to enrich data in the UI for each incident.

  • @ [ $davids.sh ] · # 354

    Right, now it's clear)

    You know, maybe it's like this: the real "errors" (the ones we allow to crash our app) should be thrown, while all other "errors" should actually be treated as "failures" and returned. That is, we follow the "unsuccessful" branch of the operation's completion.

    And when we talk about "failures," it seems to me there can be many of them within a single chain of actions (trace), because the 4th service might very well expect a failure from the 5th and then just proceed with a different logic path.

    That's why I disagree with the statement "This usually doesn't happen, there's always just one error". In my opinion, each service, upon receiving an error/failure from another, decides what to do with it: either create a new error/failure (a separate entity) or handle the situation normally.
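    A minimal sketch of that per-service decision, with hypothetical names: fetchRate plays the 5th service and priceInUsd the 4th. An expected NotFoundError sends the 4th service down a different path (a hypothetical default rate), while any other failure becomes a new, separate RateServiceError that keeps the original in an inner field.

```typescript
// Expected failure from the "5th service" (hypothetical).
class NotFoundError extends Error {
  readonly currency: string;

  constructor(currency: string) {
    super(`no rate for ${currency}`);
    this.name = "NotFoundError";
    this.currency = currency;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// A new error created by the "4th service"; `inner` keeps the one it received.
class RateServiceError extends Error {
  readonly inner: Error;

  constructor(message: string, inner: Error) {
    super(message);
    this.name = "RateServiceError";
    this.inner = inner;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

const RATES = new Map<string, number>([["EUR", 1.1]]);
const DEFAULT_RATE = 1; // hypothetical fallback

// "5th service": may fail in an expected way (not found) or an unexpected one.
function fetchRate(currency: string): number | NotFoundError | Error {
  if (!/^[A-Z]{3}$/.test(currency)) return new Error(`malformed currency: ${currency}`);
  return RATES.get(currency) ?? new NotFoundError(currency);
}

// "4th service": decides what each failure means for it.
function priceInUsd(amount: number, currency: string): number | RateServiceError {
  const rate = fetchRate(currency);
  if (rate instanceof NotFoundError) {
    return amount * DEFAULT_RATE; // expected failure: take another logic path
  }
  if (rate instanceof Error) {
    return new RateServiceError(`cannot price ${currency}`, rate); // new error, original kept
  }
  return amount * rate;
}

console.log(priceInUsd(10, "JPY")); // 10
```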

    And yes, because of this, my spans will have a bunch of repeated errors, since in 70-80% of cases, we just propagate the errors upward to the entry point.

    Then again, maybe I do this out of habit, without knowing how it could be done better and more conveniently.

  • @ Ivan ITK 🚫 · # 356

    Separating errors is indeed the right approach. Typically, this is done in places with custom gateways. There, we define the types of errors that are acceptable to return to the user. Everything else is converted into a generic error.

    Regarding different errors in different spans, this can happen, but only when we performed several actions asynchronously and each of them encountered its own error during execution. Then we log these errors in their respective spans. If we propagate both errors back according to your logic, we need another layer of logic above to understand these errors and decide what to send further. However, this creates tighter coupling between services, which contradicts the practice of low coupling and separation of responsibilities. Every service we interact with would need to be aware of these errors and implement additional mandatory logic, instead of simply propagating the exception up the stack.

  • @ [ $davids.sh ] · # 358

    Overall, I agree, but regarding service coupling: we're specifically discussing the "Request-Response" case where our service sends a request somewhere and waits for a response.

    Imagine this is a third-party API (like Stripe), and we're using its SDK. Shouldn't we know what errors it might return? I'll go further: the SDK itself should most likely have a separate file enumerating the errors we can catch and handle.

    For our service, it doesn't matter (in the context of this discussion) whether it's calling an external or a neighboring service: to it, everything is an API. An API has documentation, and that documentation should specify the possible errors.

    So, no matter what, if our service can return errors, then everyone interacting with it should know about them (whether through documentation, an SDK, a ready-made error dictionary in JSON, etc.).
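    A sketch of what an enumerated error set buys the caller, assuming a hypothetical payments SDK (these codes are illustrative, not Stripe's actual list). Because the codes form a closed union, a never-typed default branch makes the compiler flag any code the caller forgot to handle.

```typescript
// Hypothetical error codes that an SDK's errors file might enumerate.
type PaymentErrorCode = "card_declined" | "insufficient_funds" | "rate_limited";

class PaymentError extends Error {
  readonly code: PaymentErrorCode;

  constructor(code: PaymentErrorCode) {
    super(code);
    this.name = "PaymentError";
    this.code = code;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Compile-time exhaustiveness check: only `never` may reach this call.
function assertNever(value: never): never {
  throw new Error(`unhandled case: ${String(value)}`);
}

function userMessage(error: PaymentError): string {
  const { code } = error;
  switch (code) {
    case "card_declined":
      return "Your card was declined.";
    case "insufficient_funds":
      return "Insufficient funds on the card.";
    case "rate_limited":
      return "Too many attempts, please try again later.";
    default:
      // Adding a new code to PaymentErrorCode turns this line into a type error.
      return assertNever(code);
  }
}

console.log(userMessage(new PaymentError("rate_limited"))); // Too many attempts, please try again later.
```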

    Accordingly, I’d call this "natural" coupling because it logically follows from the Request-Response pattern, which itself assumes knowledge of the endpoint you’re calling (as opposed to, say, an Event pattern, which is the opposite of Request-Response).

  • @ [ $davids.sh ] · # 359

    Regarding further propagation: this is exactly how Go does it, for example. You write foo, err := bar() followed by if err != nil { return nil, err }, and this is precisely about not "throwing further up the stack" but "explicitly handling the error (even if it's just a nil check) and doing what you need (even if it's just a return)."

    In Zig, this is more elegant: you simply write try bar(), and if bar returns an error, the calling function automatically returns that error too.
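    A TypeScript rendering of the Go pattern, using the post's Result type and hypothetical parsePort/parseAddress helpers. TypeScript has no counterpart to Zig's try, so every propagation is an explicit instanceof check followed by a return, as in Go.

```typescript
type Result<T, E extends Error = Error> = T | E;

interface Address {
  host: string;
  port: number;
}

// Hypothetical helper playing the role of bar().
function parsePort(raw: string): Result<number> {
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    return new Error(`invalid port: ${raw}`);
  }
  return port;
}

function parseAddress(raw: string): Result<Address> {
  const [host, rawPort] = raw.split(":");
  if (!host || rawPort === undefined) return new Error(`expected host:port, got ${raw}`);

  // Go: port, err := parsePort(rawPort); if err != nil { return nil, err }
  const port = parsePort(rawPort);
  if (port instanceof Error) return port; // explicit check, explicit propagation

  return { host, port };
}

const parsed = parseAddress("db:5432");
console.log(parsed instanceof Error ? parsed.message : parsed.port); // 5432
```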

  • @ Ivan ITK 🚫 · # 360

    A service's interaction with a third party is an isolated responsibility. Only that service will know about it.
    But I understand your approach—everyone has their own patterns.

  • @ Ivan ITK 🚫 · # 361

    Callbacks were implemented exactly that way, and for async/await a convenient approach was devised on top of throw/catch/finally.

  • @ [ $davids.sh ] · # 363

    By the way, about callbacks: yeah, that's the funniest part. Before async/await, it was normal for us to pass the error as the first argument, which is closer to what I'm talking about in the post))
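    For reference, a minimal sketch of that error-first convention, with a hypothetical divideLater function. The callback receives (err, result), so the error is the first thing the caller sees, although nothing actually forces the caller to check err.

```typescript
// Error-first callback: err is null on success.
type Callback<T> = (err: Error | null, result?: T) => void;

// Hypothetical async operation in the pre-async/await style.
function divideLater(a: number, b: number, cb: Callback<number>): void {
  // Defer the callback so it runs asynchronously, after the caller returns.
  Promise.resolve().then(() => {
    if (b === 0) cb(new Error("division by zero"));
    else cb(null, a / b);
  });
}

divideLater(6, 3, (err, result) => {
  if (err) {
    console.error(`failed: ${err.message}`);
    return;
  }
  console.log(`result: ${result}`); // result: 2
});
```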

    You can skip writing throw/catch/finally (by forgetting or on purpose), which makes error handling implicit. That's exactly why I suggest returning errors: then you can't avoid handling them, and you're forced to deal with them directly.

    I hope we'll get something like tryasync that returns the error if something was thrown, and then everyone will be happy (though who knows how to handle finally elegantly there).
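    A minimal user-land sketch of such a helper, under assumptions: tryAsync is a hypothetical name, not a standard API, and it wraps non-Error throw values in an Error so the Result type holds. It leaves finally to the caller, who can still use an ordinary try/finally or Promise.prototype.finally for cleanup. One caveat of T | E: if a promise legitimately resolves to an Error, the two cases become indistinguishable.

```typescript
type Result<T, E extends Error = Error> = T | E;

// Hypothetical helper: awaits a promise and returns a thrown error instead of rethrowing it.
async function tryAsync<T>(promise: Promise<T>): Promise<Result<T>> {
  try {
    return await promise;
  } catch (e) {
    // Non-Error throw values (strings, plain objects) are wrapped so the type holds.
    return e instanceof Error ? e : new Error(String(e));
  }
}

// Hypothetical async operation that may throw.
async function loadConfig(path: string): Promise<string> {
  if (path === "") throw new Error("empty path");
  return `config from ${path}`;
}

async function main(): Promise<void> {
  const config = await tryAsync(loadConfig(""));
  if (config instanceof Error) {
    console.log(`could not load config: ${config.message}`); // could not load config: empty path
    return;
  }
  console.log(config);
}

void main();
```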