In the previous post (here) I described the common transactional problems people deal with when communicating via messages. In short: we need to dispatch or consume a message (or both) and store data in the DB, and we would like all of those operations to happen in one transaction.
In order to understand where event sourcing comes from, we need to look at how most modern databases work. I'll use MS SQL as an example, but with some variation it applies to almost any relational DB.
If you look at the DB files, you'll find two types: the DB file itself (.mdf in MS SQL) and a transaction log file (.ldf).
Why do we need two? Because MS SQL uses the Write-Ahead Transaction Log pattern. Simplified, the process looks like this:
- add transactions to the commit log
- tail the commit log and build the DB (a projection) based on that data
All writes go via the commit log; all reads hit our projection. (Sounds like CQRS? Yep, that's it.)
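To make those two steps concrete, here is a minimal in-memory sketch (the names are mine, not SQL Server's actual internals): writes only ever append to a log, and a separate step folds the log into a readable projection.

```python
# A toy write-ahead log plus projection. Illustrative only: writes append
# to the log, reads hit the projection built from it.

commit_log = []   # append-only sequence of committed changes
projection = {}   # key -> current value, rebuilt from the log

def write(key, value):
    commit_log.append((key, value))   # step 1: add the change to the log

def apply_log():
    for key, value in commit_log:     # step 2: tail the log and build
        projection[key] = value       # the current state from it

write("user:1", {"name": "Ann"})
write("user:1", {"name": "Ann", "email": "ann@example.com"})
apply_log()
print(projection["user:1"])  # reads never touch the log directly
```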
Why does it matter?
Because years ago, some people were trying to solve the same transactional problems for high-load producers and consumers: one system needs to commit data to its DB and send domain events in a single transaction, while other systems have to consume and process those events atomically.
And at some point they made a decision that changed everything: they removed the projection part from the database and made the commit log public. In essence, a write-ahead log exposed to everyone.
Or, from another angle: instead of adding a broker between two systems and finding a way to deal with it transactionally, they shared the one part that any participant can use directly: the commit log.
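The difference is easiest to see side by side; a minimal in-memory sketch (all names hypothetical):

```python
# With a DB plus a separate broker there are two writes that can fail
# independently; with a shared commit log there is exactly one append.

class Db:
    def __init__(self):
        self.rows = []
    def insert(self, row):
        self.rows.append(row)

class Broker:
    def publish(self, event):
        raise ConnectionError("broker is down")  # simulate a failure

db, broker, commit_log = Db(), Broker(), []
order = {"id": 1, "status": "created"}

# Classic setup: the row commits, the event is lost.
db.insert(order)
try:
    broker.publish(order)
except ConnectionError:
    pass  # DB and broker are now out of sync

# Shared commit log: the append IS both the write and the publication.
commit_log.append(order)  # every participant tails this same log
```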

And by partitioning the log, they made writes and reads highly scalable.
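A sketch of the partitioning idea (my own simplification; Kafka's default partitioner hashes keys in a similar spirit): records with the same key always land in the same partition, so partitions can be written and read independently.

```python
# Hypothetical key-based partitioning: hash the key, take it modulo the
# number of partitions. Same key -> same partition, so per-key ordering
# is preserved while different partitions scale independently.

NUM_PARTITIONS = 4
partitions = [[] for _ in range(NUM_PARTITIONS)]

def append(key, event):
    p = hash(key) % NUM_PARTITIONS
    partitions[p].append(event)

for i in range(8):
    append(f"order:{i % 2}", {"order": i % 2, "seq": i})

# Writers and readers of different partitions never contend with each other.
```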
Those guys worked at LinkedIn, and as you might have guessed, what they created is known as Kafka.
Kafka is not the only system that can be used for event streaming, but the concept stays the same: dispatch events and let each system eventually build its own projection of them.
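As a sketch of that flow, assuming a local Kafka broker, the kafka-python client, and a made-up "orders" topic and event shape:

```python
# Dispatch an event, then let a consumer fold the stream into its own
# projection. Assumes a broker on localhost:9092 and a topic "orders".
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("orders", {"type": "OrderCreated", "id": 1, "total": 42})
producer.flush()

# Each system runs its own consumer group and builds whatever projection
# it needs; nobody waits for anybody else.
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="billing-projection",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
totals = {}
for msg in consumer:
    event = msg.value
    totals[event["id"]] = event["total"]  # this system's private projection
```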
Sounds good. What's the catch?
The first obvious one is that it changes the way we need to do things. No more writes to the DB: your writes go to the event stream. Is your app 90% CRUD around some entities, or is most of the code already written and 'all that's left' is some synchronization? Then forget about event sourcing.
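To illustrate the shift with a hypothetical sketch: a CRUD app mutates state in place, while an event-sourced one only appends facts and derives state from them.

```python
# CRUD style: the update destroys the previous state.
account = {"id": 1, "balance": 100}
account["balance"] -= 30  # UPDATE accounts SET balance = 70 WHERE id = 1

# Event-sourced style: only facts are appended; state is derived.
events = [
    {"type": "AccountOpened", "id": 1, "balance": 100},
    {"type": "MoneyWithdrawn", "id": 1, "amount": 30},
]

def balance(stream):
    total = 0
    for e in stream:
        if e["type"] == "AccountOpened":
            total = e["balance"]
        elif e["type"] == "MoneyWithdrawn":
            total -= e["amount"]
    return total

print(balance(events))  # 70, rebuilt from the event stream
```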
The second catch is in the content of the messages. Normally we want some level of encapsulation: store everything in the DB but expose only part of it, or keep separate internal and external (domain) events. A shared commit log is the opposite of that: what you don't store is lost, so your events have to include every single detail. Any breaking change in your service can now affect the content of its events and force cascading updates in other services. There are still ways to encapsulate the data by providing different schemas to external and internal readers, but somehow that never worked well even for SQL databases, which had a much richer set of tools for it.
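For example (event shapes invented for illustration): with a private DB you store the full record and publish a trimmed event, while with a shared log the event is the record, so anything you trim is lost for every future consumer.

```python
# With a private DB: store everything, expose a slim external event.
stored_row = {"id": 1, "name": "Ann", "credit_limit": 5000,
              "risk_score": 0.93}
external_event = {"type": "CustomerCreated", "id": 1, "name": "Ann"}

# With a shared commit log the event IS the storage: whatever you omit
# is gone, so every detail goes into the payload, and any breaking change
# to it ripples into every consumer.
log_event = {"type": "CustomerCreated", "id": 1, "name": "Ann",
             "credit_limit": 5000, "risk_score": 0.93}
```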
What are the other options? More about that in the next chapter.