# [ $davids.sh ] · message #154

☹️ We need ORM again because of modern databases...

I don't like ORM over SQL databases.

My main complaint about most of them is that they invent their own query syntax, which you not only have to learn but also have to mentally map back to the SQL it will ultimately generate.

An example of such an ORM is Prisma.

This unpredictability (and, more often than not, unoptimized output), plus the fact that I have to learn something else on top of SQL, is what I strongly dislike.
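To make the complaint concrete, here's a minimal sketch (the `User`/`Post` models are hypothetical): the Prisma call below reads nothing like the SQL it will eventually produce, so you end up learning both.

```ts
// Hypothetical Prisma models: User has many Posts via Post.authorId.
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

async function authorsWithPublishedPosts() {
  // The ORM-specific syntax you have to learn...
  return prisma.user.findMany({
    where: { posts: { some: { published: true } } },
    include: { posts: true },
  });
}

// ...versus the SQL you already know, whose plan you can EXPLAIN directly:
//
//   SELECT u.*, p.*
//   FROM "User" u
//   JOIN "Post" p ON p."authorId" = u."id"
//   WHERE p."published" = true;
```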

I thought I could swear off ORMs forever and get by with a query builder / typed SQL, but then I discovered one very interesting detail:

One of the important properties of modern databases is horizontal scalability and auto-sharding.

And for such databases, enforcing relational consistency (relational rules such as foreign-key CONSTRAINTs in PostgreSQL) is a very hard problem; this property is also called Referential Integrity.

Examples of such databases are PlanetScale and YDB.

Therefore, they shift this responsibility onto the user, which means we have to describe in our own code which entities point to each other and what should happen when a particular entity is created / changed / deleted.
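In code, that responsibility looks roughly like the following; a minimal TypeScript sketch over a deliberately generic (and entirely hypothetical) database client:

```ts
// Minimal hypothetical client interface, just for the sketch.
interface Tx {
  execute(sql: string, params?: unknown[]): Promise<void>;
}
interface Db {
  transaction<T>(fn: (tx: Tx) => Promise<T>): Promise<T>;
}

// Without FOREIGN KEY constraints the application itself must know
// that posts point at users and decide what "delete a user" means.
async function deleteUser(db: Db, userId: number): Promise<void> {
  await db.transaction(async (tx) => {
    // Manual "ON DELETE CASCADE": clean up the children first...
    await tx.execute("DELETE FROM posts WHERE author_id = ?", [userId]);
    // ...then remove the parent row itself.
    await tx.execute("DELETE FROM users WHERE id = ?", [userId]);
  });
}
```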

Damn, and that's exactly what most ORMs have, because they come with their own built-in schema...

So it's too early to give up on ORMs, BUT we should definitely look at ones that stay close to SQL in spirit, such as drizzle (I'll experiment with it and report back on what I find).

https://github.com/drizzle-team/drizzle-orm
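As a first taste, a drizzle schema keeps the wiring between entities in plain TypeScript while staying visibly close to the DDL it maps to (a sketch based on drizzle's pg-core API; the tables themselves are made up):

```ts
import { integer, pgTable, serial, text } from "drizzle-orm/pg-core";

// Tables are declared almost one-to-one with their SQL DDL...
export const users = pgTable("users", {
  id: serial("id").primaryKey(),
  name: text("name").notNull(),
});

// ...and the pointer between entities lives right in the schema code,
// which is exactly what a constraint-less database needs from us.
export const posts = pgTable("posts", {
  id: serial("id").primaryKey(),
  title: text("title").notNull(),
  authorId: integer("author_id").references(() => users.id),
});
```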

  • @ Bogdan Node.js RentalClub · # 285

    Everything here is very delicate. I would only allow working with such entities through services/functions that implement the interface for managing that entity, that is, through code where all these checks are thoroughly worked out and documented with comments. In cases where the DBMS does not enforce referential integrity, that's a must-have in my opinion; a DBMS that enforces it properly simply won't let you delete something that shouldn't be deleted. And I wouldn't delete anything from such tables at all, opting for soft delete instead (a rough sketch follows at the end of this comment).

    I find it even more amusing when people argue for ORMs on the grounds that they make it easy to switch to another DBMS someday.
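    A rough TypeScript sketch of that service-gated, soft-delete approach (`Order` and `OrderRepo` are hypothetical names):

    ```ts
    interface Order {
      id: number;
      deletedAt: Date | null;
    }

    // Hypothetical storage interface; the real one would wrap your DB driver.
    interface OrderRepo {
      update(id: number, patch: Partial<Order>): Promise<void>;
      findWhere(filter: Partial<Order>): Promise<Order[]>;
    }

    // All access goes through one service, so the integrity rules
    // live (and are documented) in exactly one place.
    class OrderService {
      constructor(private readonly repo: OrderRepo) {}

      // Soft delete: mark the row instead of removing it,
      // so no reference to it can ever dangle.
      remove(id: number): Promise<void> {
        return this.repo.update(id, { deletedAt: new Date() });
      }

      // Reads filter out soft-deleted rows in one place.
      findActive(): Promise<Order[]> {
        return this.repo.findWhere({ deletedAt: null });
      }
    }
    ```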

  • @ Bogdan Node.js RentalClub · # 286

    Although in recent years I've been immersed in MongoDB and have basically forgotten about migrations, schemas, etc. (I only add indexes when scaling). For analytics, I create tables in ClickHouse and dump all reasonably important system events there. In 90% of cases this covers every reporting request from analysts/marketers. Sometimes I just give them direct ClickHouse access and stop worrying: they can pull whatever they need themselves.

  • @ Bogdan Node.js RentalClub · # 287

    Yes, a dump, but a legal one 😎

  • @ [ $davids.sh ] · # 289

    I'll respond with bullet points:

    – If I'm working on a large, complex project, then yes, I undoubtedly want to control Referential Integrity myself, in the format you described, though I would probably still end up writing custom helpers for on delete / update / insert. But if I want to prototype something quickly and hire less senior developers, I'd take an ORM that handles this automatically, with the option to turn the feature off (we can rewrite it once revenue starts coming in).

    – I was about to write that I don’t use soft delete very often because it often gets in the way, but I realized that when it comes to scalable data without relational support, soft delete is actually a pretty good solution...

    – I’ve been actively working with MongoDB for the past year, and I dislike it with every fiber of my being. You still have a schema, it just lives in code; migrations are still necessary (whenever you want to recalculate or move certain values), as are seeds; aggregations in MongoDB are a nightmare compared to SQL (see the sketch at the end of this comment); and so on. But again, it’s a matter of personal taste/habit.

    – Moving data to ClickHouse is a cool idea right up until that data goes stale 😢

    That’s why I usually just go with PostgreSQL and create a separate read replica for analysts; if something more advanced is needed, without the staleness problem, you can ship data from PostgreSQL to ClickHouse and even clean it up in PostgreSQL afterwards.

    My only complaint with PostgreSQL is the lack of automatic sharding (I love inserting a ton of junk and not worrying about how it will scale), so I’m starting to look at CockroachDB, PlanetScale, YDB, and similar options.
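    To illustrate the aggregation point above: counting posts per author, pipeline versus SQL (a sketch; the collection and column names are invented):

    ```ts
    // MongoDB aggregation pipeline: posts per author, busiest first.
    // Usage would be along the lines of db.collection("posts").aggregate(pipeline).
    const pipeline = [
      { $match: { deletedAt: null } },
      { $group: { _id: "$authorId", posts: { $sum: 1 } } },
      { $sort: { posts: -1 } },
    ];

    // The SQL equivalent reads top to bottom at a glance:
    //
    //   SELECT author_id, COUNT(*) AS posts
    //   FROM posts
    //   WHERE deleted_at IS NULL
    //   GROUP BY author_id
    //   ORDER BY posts DESC;
    ```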

  • @ Bogdan Node.js RentalClub · # 290

    100% agree on the first point)

    Regarding the second point, it’s interesting to hear about the cases where soft delete has caused inconvenience for you. For me, it’s partly a relic of the past, back when we were building a large social network and SSDs weren’t a thing. Disk fragmentation from frequent deletions was a real pain and significantly impacted performance.

    On the third point, it’s definitely a matter of taste. Interestingly, we managed to avoid large migrations even on a big project with 20k RPC and collections of >30 million records. It ran on several servers with a replica set and held up fine with properly chosen indexes) But we spent a lot of time meticulously planning the collection architecture and the rules for working with them. We don’t use aggregations; they’re a nightmare no matter how you look at them)

    As for cache invalidation (if I understood you correctly, you mean stale / outdated data), we don’t have that issue because we log historical events. That is, if something changes, there’s a record of that change. This also lets us fetch the latest up-to-date data or review the full history of changes) With this setup we don’t have report tables; instead we have queries/views that generate the reports.
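    A tiny sketch of that "log changes, derive current state" idea on the MongoDB driver (the event shape and names are invented):

    ```ts
    import { Collection } from "mongodb";

    // Every change is appended as an immutable event...
    interface FieldChanged {
      entityId: string;
      field: string;
      value: unknown;
      at: Date;
    }

    // ...so "the current value" is simply the most recent event,
    // while the full history stays queryable for reports.
    async function latestValue(
      events: Collection<FieldChanged>,
      entityId: string,
      field: string,
    ): Promise<FieldChanged | null> {
      return events
        .find({ entityId, field })
        .sort({ at: -1 }) // newest first
        .limit(1)
        .next();
    }
    ```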

  • @ Bogdan Node.js RentalClub · # 291

    Automatic sharding in Postgres is, by the way, a very old pain point, although apparently it's now framed as a feature: you get to fine-tune everything yourself.

  • @ [ $davids.sh ] · # 292

    Regarding soft delete, you're actually saying very sensible things. I should sit down and think harder about why I've been avoiding it, because I've realized my old arguments no longer hold (I should read up on the topic).

    As for MongoDB, it seems you really know how to handle it, and that's awesome! We also run around 15k RPC on 500GB, with collections of 20-60 million records, and it holds up just fine.

    The issue is more that most teams I've seen using MongoDB just dumped data into it without any backward-compatibility rules, normalized everything excessively, never cleaned anything up, and yet took pride in being able to do all this (the fact that everything lagged terribly and features were impossible to implement because the schema was never complete didn't bother anyone).

    And about events – that’s just beautiful 🎩 Do you have something like Event Sourcing?