[ $davids.sh ] — david shekunts blog

🛣 No Way Back or Life Without Rollback

# [ $davids.sh ] · message #172

I recently started trying out drizzle, a new but very ambitious ORM.

And I stumbled upon a thread discussing the fact that drizzle generates only up migrations. People complained that without down migrations they can't roll back.

The author's response made me think a lot: he said that, in general, he has never needed a rollback, and that rolling back is often both more complicated and less correct, because all changes (including fixes) should be applied on top.

That is, even if you want to undo a change, the undo should be a separate up migration.
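
To make the "undo is just another up migration" idea concrete, here is a minimal in-memory sketch. It is not drizzle's API: the `Schema` and `Migration` types, the `migrate` function, and the migration ids are all hypothetical names made up for illustration.

```typescript
// Hypothetical in-memory model: a schema is a map of table name -> column names.
type Schema = Map<string, Set<string>>;

interface Migration {
  id: string;                   // sortable identifier, e.g. "0002_add_nickname"
  up: (schema: Schema) => void; // the only direction that exists
}

const migrations: Migration[] = [
  {
    id: "0001_create_users",
    up: (s) => { s.set("users", new Set(["id", "email"])); },
  },
  {
    id: "0002_add_nickname",
    up: (s) => { s.get("users")!.add("nickname"); },
  },
  {
    // The "rollback" of 0002 is not a down step, it is one more forward step.
    id: "0003_drop_nickname",
    up: (s) => { s.get("users")!.delete("nickname"); },
  },
];

// Applies every migration that is not yet recorded in `applied`, in id order,
// and returns the ids that were applied during this run.
function migrate(schema: Schema, applied: string[], all: Migration[]): string[] {
  const done = new Set(applied);
  const pending = all
    .filter((m) => !done.has(m.id))
    .sort((a, b) => a.id.localeCompare(b.id));
  for (const m of pending) {
    m.up(schema);
    applied.push(m.id);
  }
  return pending.map((m) => m.id);
}

// First release ships 0001 and 0002; the later "undo" release ships 0003.
const schema: Schema = new Map();
const applied: string[] = [];
const firstRun = migrate(schema, applied, migrations.slice(0, 2));
const secondRun = migrate(schema, applied, migrations);
```

The migration history only ever grows, so every environment reaches the same state by replaying the same list in the same order.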

And you know, I agree with him.

To really be able to roll back when an error occurs, we usually have to prepare a lot of things (from the codebase itself to a pile of features in the CI/CD pipelines), while in practice I can recall only isolated cases where a rollback helped without breaking the application or the database.

In 95% of cases, if the deployment was broken or the wrong version was released, we simply applied fixes on top, rather than rolling back.

It's easier than implementing a rollback, safer, and at the same time extremely effective.

So, my advice is to invest your efforts in creating conditions for quick fixes during a problematic rollout, rather than rolling back the deployment.

Level up hard, everyone 🦾

  • @ Ivan ITK 🚫 · # 505

    I would add some important nuances:

    1. Migrations should be tested, both up and down. This is good practice, just as it is for any code.
    2. A breaking-change migration should be split into several migrations, with an intermediate step where the old data is kept for rollbacks/fixes.
    3. On a high-load system, rolling back a migration takes less time than the cycle of writing and delivering a fix, and it gives more control over risks and resources. So the choice of approach depends directly on how critical the infrastructure is, and on whether it is worth spending time on carefully designing and developing rollbacks. The alternative is either a system that sits idle waiting for a fix or, even worse, one that is drowning in data.
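
The second point above (splitting a breaking change into several migrations and keeping the old data around) could look roughly like this. It is only a sketch over in-memory rows, assuming a hypothetical breaking change that splits a `name` column into `first_name` and `last_name`; every function and column name here is invented for the example.

```typescript
// Hypothetical in-memory stand-in for a table: each row is a bag of columns.
type Row = { [column: string]: string | undefined };

// Derives first_name/last_name from the legacy `name` column.
function backfillSplitName(rows: Row[]): void {
  for (const row of rows) {
    const parts = (row.name ?? "").split(" ");
    row.first_name = parts[0] ?? "";
    row.last_name = parts.slice(1).join(" ");
  }
}

// Migration 1 (expand): add and backfill the new columns, keep `name` untouched.
const m1ExpandSplitName = backfillSplitName;

// Migration 2 (fix on top, shipped only if needed): rebuild the new columns
// from the preserved `name` column.
const m2RepairSplitName = backfillSplitName;

// Migration 3 (contract): shipped later, once the release is verified.
// After this step the old data is gone, so it comes last.
function m3DropLegacyName(rows: Row[]): void {
  for (const row of rows) {
    delete row.name;
  }
}

const ada: Row = { id: "1", name: "Ada Lovelace" };
const users: Row[] = [ada];

m1ExpandSplitName(users);
ada.first_name = "???";   // simulated damage from a buggy release
m2RepairSplitName(users); // the fix is possible because `name` was kept
m3DropLegacyName(users);
```

The window between migration 1 and migration 3 is what buys the safety: while `name` still exists, both a rollback and a fix on top have something to restore from.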
  • @ [ $davids.sh ] · # 512

    Yes, if doing without rollback is not an option, then I agree with every point.

    How I see the "No Way Back" paradigm:

    1. If we don't focus on rollback, meaning the fix will be applied on top, then I have more free time and motivation to create the conditions under which we can ship that fix. The team invests more time in logs and tracing so the situation can be debugged quickly. More state machines are created, which on failure point to the error that occurred. In every piece of code we think, "What if it all goes to hell?", and write the code with that in mind. We dedicate more time and effort to e2e and integration tests.

    Undoubtedly, this is good practice even when rollback is available, but honestly, the mere thought "if anything happens, we'll roll back" makes developers relax far too much.

    2. If we understand that a release is extremely risky, then instead of investing time in rollback we can invest it in releasing to a % of the audience, so that if something goes wrong, not everyone suffers at once.

    3. In my opinion, it's much harder to come up with the right rollback strategy than with a fix strategy (for example, very few people remember the approach you described in your 2nd point).

    ++ Undoubtedly, the two can be mixed: by default we use the strategy WITHOUT rollback, but if the situation requires it, we prepare tools for rolling back that specific case (because, in my experience, a complex rollback is a rollback of something extremely specific, which our standard rollback flow wouldn't cover anyway and which would have to be built here and now).
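
The idea of releasing to a % of the audience can be sketched with deterministic bucketing: hash the user id, map it to a bucket from 0 to 99, and enable the release only for buckets below the chosen percentage. This is one common way to do it, not something the thread prescribes; `fnv1a`, `isInRollout`, and the feature name are hypothetical.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units: small, dependency-free,
// and deterministic across runs and machines.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Returns true if this user falls into the first `percent` buckets (0..100).
// Mixing the feature name into the key gives each feature its own split,
// so the same users are not always the first to get every risky release.
function isInRollout(userId: string, feature: string, percent: number): boolean {
  const bucket = fnv1a(`${feature}:${userId}`) % 100; // 0..99
  return bucket < percent;
}

// Start at 10%, then raise the number as confidence grows.
const enabledForUser = isInRollout("user-42", "new-billing", 10);
```

Because a user's bucket never changes, raising the percentage only adds users: everyone who already had the release keeps it, and a problem found at 10% never reaches the other 90%.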