🔤 Auto-Scaling DB - Relations + XXX = Awesomeness 🔤

"Auto-Scaling DB" here means an auto-sharded database with multi-master writes (CockroachDB, YDB, ScyllaDB, ClickHouse, PlanetScale, etc.), which lets you write large volumes of data quickly and with virtually unlimited parallelism.

"- Relations" because about 90% of such databases have no foreign key constraints that would let the database clean up or update related data when a row is modified or deleted. In a distributed database, maintaining relations is usually a task for which the database authors cannot provide the guarantees and execution times that each individual project needs.

I think most developers who have worked with relations for a year or two grow very fond of them, which is why it's hard to give up luxuries like: "if I delete a User, I want all related data deleted as well. How do I arrange that without relations?"

First, a fly in the ointment: I've heard many times, and came to believe only over time, that relations (especially ON DELETE CASCADE) are a double-edged sword. Cascades are hard to control, the full cascade is hard to see in the database, and an accidental deletion can have irreversible consequences. There are articles 1, 2 that discuss this. Sometimes it's better to leave the data in place "until better times": often a user's requests simply won't return data that is no longer linked to them, or the data records "past facts" that shouldn't be deleted at all.

Okay, but what if you do need to delete something, clear a foreign key, or set a deletedAt?

1. A simple option is a cron job that cleans up "orphaned" records.

2. A reliable but laborious option is to do it by hand, working out what to delete or change, and how, in each individual place (on complex projects you sometimes have to resort to this even when relations are available).

3. A working but imperfect option is an ORM or a custom plugin that applies cascade actions at the code level when an entity is changed or deleted. This is a risky and often unpredictable approach (although I've seen it work successfully).

4. A more advanced option is to publish an entity-deletion event to a queue, read it in a separate service, and make the necessary decisions there. You'll need to run and configure an MQ, and remember to emit every one of these events.

5. And the coolest option is Change Data Capture (CDC): a database mechanism that lets you subscribe to data change events, which the database sends to you on the fly.
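To make the simple cron option above concrete, here is a minimal sketch of an orphan sweep. It uses Python's built-in sqlite3 only so that it runs anywhere; the function name `delete_orphans`, its parameters, and the table layout are all assumptions for illustration, and on a real distributed database the SQL dialect and batching limits will differ.

```python
import sqlite3


def delete_orphans(conn, child_table, fk_column, parent_table,
                   parent_key="id", batch_size=1000):
    """Delete child rows whose parent row no longer exists.

    Works in small batches so that a single run never holds one huge
    transaction. Rows with a NULL foreign key are not treated as orphans.
    Table and column names are interpolated into the SQL, so they must
    come from trusted code, never from user input.
    Returns the total number of rows deleted.
    """
    sql = (
        f"DELETE FROM {child_table} WHERE rowid IN ("
        f"  SELECT c.rowid FROM {child_table} AS c"
        f"  WHERE c.{fk_column} IS NOT NULL"
        f"    AND NOT EXISTS ("
        f"      SELECT 1 FROM {parent_table} AS p"
        f"      WHERE p.{parent_key} = c.{fk_column})"
        f"  LIMIT ?)"
    )
    total = 0
    while True:
        cur = conn.execute(sql, (batch_size,))
        conn.commit()  # one short transaction per batch
        if cur.rowcount == 0:
            break  # nothing left to clean up
        total += cur.rowcount
    return total
```

A scheduler would call something like `delete_orphans(conn, "posts", "user_id", "users")` on a timer. The catch is the one the list already hints at: orphans stay in the database until the next run.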

And it's precisely built-in CDC that is the XXX ingredient: it adds convenience and reliability to solving the relations problem in these databases.
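Here is a rough sketch of what the consuming side of CDC could look like. Everything in it is an assumption for illustration: the event shape (`table`, `op`, `key`, `ts`), the `CASCADE_RULES` map, and the table names are hypothetical, and sqlite3 stands in for the real database. Actual change feeds have their own formats and delivery APIs, so treat this as the idea only: explicit, visible cascade rules applied by a handler that is safe to run twice on the same event.

```python
import sqlite3

# Hypothetical cascade rules, kept in code where they are easy to review:
# parent table -> list of (child table, foreign key column, action).
CASCADE_RULES = {
    "users": [
        ("posts", "user_id", "delete"),
        ("comments", "author_id", "soft_delete"),
    ],
}


def handle_change(conn, event):
    """Apply the cascade rules for one change event.

    `event` is an assumed shape, e.g.
    {"table": "users", "op": "delete", "key": {"id": 7}, "ts": "..."}.
    Returns the number of child rows touched. Replaying the same event
    touches nothing, so redelivered events are harmless.
    """
    if event["op"] != "delete":
        return 0  # this sketch only reacts to deletions
    parent_id = event["key"]["id"]
    touched = 0
    for child, fk, action in CASCADE_RULES.get(event["table"], []):
        if action == "delete":
            cur = conn.execute(
                f"DELETE FROM {child} WHERE {fk} = ?", (parent_id,)
            )
        elif action == "soft_delete":
            # Only stamp rows that are not already soft-deleted.
            cur = conn.execute(
                f"UPDATE {child} SET deleted_at = ? "
                f"WHERE {fk} = ? AND deleted_at IS NULL",
                (event["ts"], parent_id),
            )
        else:
            raise ValueError(f"unknown cascade action: {action}")
        touched += cur.rowcount
    conn.commit()
    return touched
```

Unlike a cascade declared in the schema, the rules here sit in one reviewable place, and "leave it alone" is simply the absence of a rule.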

So, are you looking for a new top-notch database? Make sure it comes with relations or CDC, or better yet, both.

Happy leveling up 😎

P.S.

I only touched on data deletion. Yes, there's also checking for existence on write and similar things, but those are solved more by architecture than by CDC, and the post is already long, so maybe next time.

#postgresql #db #highload #ydb