
Event sourcing instead of multiple resource transactions

In the previous post (here) I tried to describe the common transactional problems people face when communicating via messages. In short: we need to dispatch or consume a message (or both) and store data in the DB, and we would like all of those operations to happen in one transaction.

To understand where event sourcing comes from, we need to look at how most modern databases work. I'll use MS SQL as an example, but with some variations it applies to almost any relational DB.

If you look at the database files, you'll find two types: the DB file itself (.mdf in MS SQL) and the transaction log file (.ldf).

Why do we need two? Because MS SQL uses the Write-Ahead Transaction Log pattern. Simplified, the process looks like this:

  1. append transactions to the commit log 
  2. tail the commit log and build the DB (a projection) from that data 

All writes go through the commit log; all reads hit our projection. (Sounds like CQRS? Yep, that's it.)
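
To make those two steps concrete, here's a minimal sketch (nothing like MS SQL's real implementation; the types and names are made up): every write is appended to a commit log, and the readable state is just a projection built by tailing that log.

using System.Collections.Generic;

public record LogEntry(long Sequence, string Key, string Value);

public class CommitLog
{
    private readonly List<LogEntry> _entries = new();

    // Step 1: every write is appended to the log first.
    public LogEntry Append(string key, string value)
    {
        var entry = new LogEntry(_entries.Count + 1, key, value);
        _entries.Add(entry);
        return entry;
    }

    // Step 2: readers tail the log starting from the position they last saw.
    public IEnumerable<LogEntry> ReadFrom(long sequence) =>
        _entries.FindAll(e => e.Sequence >= sequence);
}

public class Projection
{
    private readonly Dictionary<string, string> _state = new();
    private long _position = 1;

    // Replays everything appended since the last call and updates the view
    // that reads are served from.
    public void CatchUp(CommitLog log)
    {
        foreach (var entry in log.ReadFrom(_position))
        {
            _state[entry.Key] = entry.Value;
            _position = entry.Sequence + 1;
        }
    }

    public string Get(string key) => _state.TryGetValue(key, out var v) ? v : null;
}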



Why does it matter? 

Because many years ago, some people tried to solve the same transactional problems for high-load producers and consumers: one system needs to commit data to the DB and send domain events in one transaction, while other systems have to atomically consume and process them:


And at some point they made a decision that changed everything: they removed the projection part from the database and made the commit log public. In essence, a write-ahead log:


Or, looking at it from another angle: instead of adding a broker between two systems and finding a way to deal with it transactionally, they shared the part that any participant can use: the commit log.

And by partitioning the log they made writes and reads highly scalable:
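
To give a feel for why that helps, here's a conceptual sketch (this is not Kafka's actual default partitioner; names and numbers are illustrative): a stable hash of the message key picks the partition, so all events for the same key stay in one partition and keep their order, while different keys spread across partitions and can be written and read in parallel.

public static class Partitioner
{
    public static int PickPartition(string key, int partitionCount)
    {
        // Mask off the sign bit so the modulo result is never negative.
        var hash = key.GetHashCode() & int.MaxValue;
        return hash % partitionCount;
    }
}

// "order-42" always maps to the same partition, so its events stay ordered;
// other keys land on other partitions and scale out independently.
// var partition = Partitioner.PickPartition("order-42", partitionCount: 6);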


Those guys worked at LinkedIn, and as you might have guessed, what they created is known as Kafka:

Kafka is not the only system that can be used for event streaming, but the concept stays the same: dispatch events and let each system eventually build its own projection of them.
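
As a rough illustration of that pattern with the Confluent.Kafka .NET client (the topic name, consumer group, event payload and the ApplyToLocalReadModel helper are all made up for this sketch): the writer only appends events to the log, and every reader tails the same topic and updates its own read model.

using Confluent.Kafka;

// Writer side: the only "write" is an append to the shared log.
var producerConfig = new ProducerConfig { BootstrapServers = "localhost:9092" };
using var producer = new ProducerBuilder<string, string>(producerConfig).Build();

// The key pins all events of this order to one partition, preserving their order.
await producer.ProduceAsync("orders", new Message<string, string>
{
    Key = "order-42",
    Value = "{ \"type\": \"OrderPlaced\", \"orderId\": 42, \"total\": 99.9 }"
});

// Reader side: each interested system consumes the same topic under its own
// consumer group and builds its own projection.
var consumerConfig = new ConsumerConfig
{
    BootstrapServers = "localhost:9092",
    GroupId = "billing-projection",
    AutoOffsetReset = AutoOffsetReset.Earliest
};
using var consumer = new ConsumerBuilder<string, string>(consumerConfig).Build();
consumer.Subscribe("orders");

// Hypothetical helper standing in for whatever read-model update this service does.
static void ApplyToLocalReadModel(string eventJson)
{
    // e.g. upsert a row in this service's own reporting table
}

while (true)
{
    var result = consumer.Consume();             // tail the commit log
    ApplyToLocalReadModel(result.Message.Value); // update the local projection
}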

Sounds good. What's the catch? 
The first obvious one is that it changes the way we need to do things. No more writes to the DB; your writes go to the event stream. Is your app 90% CRUD around some entities, or is most of the code already written and 'all that's left' is some synchronization? Then forget about event sourcing.

The second catch is the content of the message. Normally we want some level of encapsulation: store everything in the DB but expose only part of it outside, or keep internal and external (domain) events separate. A shared commit log is the opposite of that: whatever you don't store will be lost, so your events have to include every single detail. Any breaking change in your service can now affect the content of the events and require cascading updates in other services. There are still ways to encapsulate the data by providing different schemas to external and internal readers, but somehow that never worked well even for SQL databases, which have a much wider set of tools for it.
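
To make that trade-off concrete, compare a 'thin' event with the kind of self-contained event a shared log pushes you towards (all fields below are made up for illustration):

using System;
using System.Collections.Generic;

// A "thin" event keeps internals private, but readers of the shared log
// can't build a projection from it without calling back into the producer.
public record OrderPlacedThin(int OrderId);

// A self-contained event carries the full state, so every field effectively
// becomes part of the public contract that breaking changes have to respect.
public record OrderPlacedFat(
    int OrderId,
    string CustomerId,
    decimal Total,
    string Currency,
    IReadOnlyList<string> ProductCodes,
    DateTimeOffset PlacedAt);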

What are the other options? More on that in the next chapter.
