Designing and Planning an Event Store System
CQRS and Event Sourcing design patterns
I am designing and building a Data Product that will receive a large volume of immutable notifications from different social network platforms. I will need to perform write operations on these notifications fast, and sometimes I will have to perform read operations with complex aggregations against this database. A little later I will write another article with details about the whole technology, but for now I have a problem, or rather a doubt, and this issue is called: DATA STORAGE. Although I am very clear about the concepts around the CAP Theorem, my main concern is selecting one among the hundreds of types of databases currently in existence, and that concern dragged me to write this article and concentrate on building my own Event-Based Storage.
This article is not about simply presenting technical concepts; it is about explaining each of my decisions quickly. It is important that you read the concepts I have highlighted in bold in order to have some context.
What is an Event Store System?
An event store system is a kind of storage designed to permanently persist and manage a log of immutable (cannot be modified or deleted) events or actions that have occurred in a system or application. This log is called the event stream, and it is typically stored in an append-only fashion, meaning that new events can only be added to the end of the stream. This concept is very different from OLTP applications, where records are mutable, most operations are CRUD, and data consistency requirements are more demanding.
One of the main advantages of using an event store system is that it allows applications to track, capture, and store the whole history of events that have occurred in the system, providing a detailed record of the state changes and actions that have taken place. This can be useful for various purposes, such as data analysis, debugging, auditing, and more.
In the architecture shown above it is easy to recognize three architectural patterns (Event Sourcing, CQRS, Database per Service), which coincidentally are the most common ones used to design this type of system. Each event in the event stream is represented as a record in the database, and it typically includes a unique identifier, a timestamp, and the data associated with the event. The event store system provides APIs for applications to WRITE new events to the stream and to READ events from the stream for processing or analysis.
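As a minimal sketch, an event record with those three parts (unique identifier, timestamp, and payload) could look like the following. The `Event` class, its field names, and the example payload are my own illustration, not part of the final design:

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: an event is immutable once created
class Event:
    stream: str   # e.g. which social platform the post came from
    payload: dict  # the data associated with the event
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


post = Event(stream="twitter", payload={"post_id": "123", "text": "hello"})
```

Marking the dataclass as `frozen` enforces at the application level the same immutability guarantee the event store gives us at the storage level.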
Event Sourcing pattern
As I said before, I will be receiving many notifications from social media platforms, each notification is a post and has a context, these posts will be immutable facts, events or actions.
Having a common place to store events or facts has the following advantages:
- Temporal queries (reconstructing the state at any point in time)
- Last status of a specific post
- Audit a specific post in real time
- Execute queries in parallel against the event log.
- Multiple systems reading from the same event log.
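A toy in-memory version makes a couple of these advantages concrete: because the log is append-only, the last status of a post is simply the most recent event in its replayed history. The `EventLog` class below is only an illustration; the real store will of course not live in a Python list:

```python
class EventLog:
    """Minimal in-memory append-only event log (illustration only)."""

    def __init__(self):
        self._events = []  # append-only: entries are never updated or deleted

    def append(self, event: dict) -> None:
        self._events.append(event)

    def replay(self, post_id: str) -> list:
        """Return the full history of a single post, oldest first."""
        return [e for e in self._events if e["post_id"] == post_id]

    def last_status(self, post_id: str):
        """The current state of a post is just its most recent event."""
        history = self.replay(post_id)
        return history[-1] if history else None


log = EventLog()
log.append({"post_id": "42", "status": "published"})
log.append({"post_id": "42", "status": "edited"})
log.last_status("42")  # -> {"post_id": "42", "status": "edited"}
```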
It is common to see this architectural pattern and CQRS working together.
Command and Query Responsibility Segregation (CQRS) pattern
In practice, this pattern means that the place where data is written is different from the place where data is read. Sometimes you will have more reads than writes, or more writes than reads; by segregating both behaviors, for example by creating a read-optimized copy of the main database, both services can scale independently and adapt to sudden changes in workload.
Not all services can read the latest event from the event log at the same time; there will be a delay between reads, and some services might be out of sync when a lot of data is being written. Full consistency is only reached when no writes are occurring, and we call this eventual consistency.
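The lag between the two sides can be sketched with a write model and a read-side projection that only catches up when it syncs. The class names and the "posts per platform" projection are invented for the example:

```python
class WriteModel:
    """The write side: only appends events, never serves queries."""

    def __init__(self):
        self.events = []

    def handle_command(self, event: dict) -> None:
        self.events.append(event)


class ReadModel:
    """The read side: a projection that lags behind until it syncs."""

    def __init__(self):
        self.counts = {}   # projection: number of posts per platform
        self._cursor = 0   # how far into the event log we have read

    def sync(self, write_model: WriteModel) -> None:
        # replicate only the events appended since the last sync
        for e in write_model.events[self._cursor:]:
            self.counts[e["platform"]] = self.counts.get(e["platform"], 0) + 1
        self._cursor = len(write_model.events)


writes, reads = WriteModel(), ReadModel()
writes.handle_command({"platform": "twitter"})
reads.counts          # {} -- the read side is stale (eventual consistency)
reads.sync(writes)
reads.counts          # {"twitter": 1} -- consistent once no writes are in flight
```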
Database per Service pattern
In the architecture shown above it is harder to recognize this pattern, but it is there. In this case, more granular control for scaling horizontally is required, and decoupling the databases also improves the resiliency of the whole application by ensuring that a single database cannot become a single point of failure. Last but not least, microservices have different security requirements, so there is no need to share the same database between different kinds of services.
The only challenge would be to perform complex queries that aggregate results from multiple microservices or databases.
Proposed Architecture
Now that we understand why these three patterns are present in the candidate architecture, I will explain the whole process, but first I will give a brief explanation of the meaning of some symbols in the proposed architecture.
Transitions between Events and Queries
Once a request to write an event to the Master Event Store is received, the event is first inserted into a shared memory common to all services and marked as pending with the color gray. At the same time, a service called “Producer” is in charge of looking for all the messages that are pending to be written (gray); once they have been inserted into the Master Event Store, these events are marked in the shared memory with a status indicating that they already exist in the Master Event Store (blue). Another service called “Consumer”, responsible for keeping the databases in sync, takes all the events in memory with the blue status; once they are replicated to the Slave Event Store, they are marked green. Events in shared memory with a green status exist in both the Master and Slave Event Stores.
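The gray → blue → green transitions can be sketched as a small state machine over the shared memory. Here the shared memory is just a dict and the two stores are lists; the function names `producer_pass` and `consumer_pass` are my own shorthand for one scan of each service's work loop:

```python
from enum import Enum


class Status(Enum):
    PENDING = "gray"      # accepted, not yet in the Master Event Store
    IN_MASTER = "blue"    # written to the Master Event Store
    REPLICATED = "green"  # replicated to the Slave Event Store as well


shared_memory = {}  # event_id -> (status, event)
master_store, slave_store = [], []


def write_request(event_id: str, event: dict) -> None:
    """A write request lands in shared memory first, marked gray."""
    shared_memory[event_id] = (Status.PENDING, event)


def producer_pass() -> None:
    """Producer: move pending (gray) events into the Master store, mark blue."""
    for eid, (status, event) in shared_memory.items():
        if status is Status.PENDING:
            master_store.append(event)
            shared_memory[eid] = (Status.IN_MASTER, event)


def consumer_pass() -> None:
    """Consumer: replicate blue events into the Slave store, mark green."""
    for eid, (status, event) in shared_memory.items():
        if status is Status.IN_MASTER:
            slave_store.append(event)
            shared_memory[eid] = (Status.REPLICATED, event)
```

In the real system both passes would run continuously and concurrently; a single sequential scan is enough to show the status lifecycle.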
Once a request to read data from the Slave Event Store is received, it is first inserted into the same shared memory and marked as a pending query with the color yellow. At the same time, a service called “Query Handler” is in charge of looking for all the queries that are pending attention (yellow). The shared memory also holds events that are already synced in both Event Stores (green), and the “read services” can take advantage of this. Remember that we follow the CQRS pattern: when a query requests data for a specific event, it can ask the shared memory first, which serves as a cache layer to improve the overall response latency. If the event has already expired from memory, or does not exist there due to eventual consistency or deletion (red), the query will search for the records directly in the Slave Event Store. All attended queries are marked with the color orange and cached for a suitable period of time.
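The cache-then-fallback behavior of the read path can be sketched as a single function. The status strings and the list-backed Slave store are stand-ins for the real components:

```python
def handle_query(event_id: str, shared_memory: dict, slave_store: list):
    """Serve a read: check the shared memory first, fall back to the Slave store.

    shared_memory maps event_id -> (status, event); a "green" status means
    the event exists in both stores and is safe to serve from memory.
    """
    entry = shared_memory.get(event_id)
    if entry is not None and entry[0] == "green":
        return entry[1]  # cache hit: no trip to the Slave Event Store
    # expired, deleted ("red"), or lagging due to eventual consistency:
    # search the Slave Event Store directly
    return next((e for e in slave_store if e["event_id"] == event_id), None)
```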
REST and gRPC Services
I will use gRPC for internal communication between my microservices, and REST as the endpoint to the outside world. I am proposing two REST services, one for writes and one for reads. Both services can be built with FastAPI, and they should be able to scale horizontally, since many external platforms will be consuming these services in parallel.
The blue box with the word gRPC stands for the gRPC client, and the red box with the word gRPC stands for the gRPC server. As you can notice, there is a gRPC client interface in both REST services, which lets the gRPC server know when a new request should be attended.
There will be three gRPC servers (Producer, Consumer, Query Handler), each one running on its own serverless layer to allow horizontal scaling. So far I am considering two different communication strategies:
- Attend requests in batches: when requests are received via the REST interface, they can be placed directly in the shared memory, and the “Producer”, “Consumer” and “Query Handler” services can read unattended requests from this shared memory in batches.
- Bidirectional request and response: whenever a request is received, it is placed in shared memory and the REST service invokes the “Producer” gRPC server with the request ID; the gRPC server takes the request matching that ID from the shared memory and serves it. At the same time, this gRPC server could call the “Consumer” gRPC server to notify it that there is a record in the shared memory that needs to be synchronized.
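The first strategy amounts to each service scanning the shared memory for requests in a given state and taking them in fixed-size batches. A minimal sketch, with the shared memory again modeled as a dict and `take_batch` as an invented helper name:

```python
def take_batch(shared_memory: dict, status: str, batch_size: int = 100) -> list:
    """Strategy 1: collect up to batch_size unattended requests in one pass.

    shared_memory maps request_id -> (status, payload); the caller picks
    the status it cares about (e.g. "gray" for the Producer, "yellow" for
    the Query Handler).
    """
    batch = []
    for request_id, (state, payload) in shared_memory.items():
        if state == status:
            batch.append((request_id, payload))
            if len(batch) == batch_size:
                break
    return batch
```

Batching keeps the number of round trips to the event stores low, at the cost of a little extra latency per individual request; the bidirectional strategy trades the other way.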
Columnar and Row-Based Data Storage
The storage strategy will use a hybrid approach: row-based storage to speed up writes and make data integrity checks more efficient, since an entire row can be checked to ensure that all its values are valid and consistent; and columnar storage to speed up reads, since only the specific columns needed for a particular query are scanned, and columnar data compresses more efficiently, which reduces the space required to store the data, lowers storage costs, and improves overall performance.
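The difference between the two layouts is easy to show with plain Python structures. The sample fields (`post_id`, `platform`, `likes`) are invented for the example:

```python
# Row-based layout: one dict per event.
# Cheap to append, and a whole row can be validated in one place.
rows = [
    {"post_id": "1", "platform": "twitter", "likes": 10},
    {"post_id": "2", "platform": "facebook", "likes": 3},
    {"post_id": "3", "platform": "twitter", "likes": 7},
]

# Columnar layout: one list per field.
# A query that only needs "likes" never touches the other columns,
# and a homogeneous column compresses better than mixed rows.
columns = {key: [row[key] for row in rows] for key in rows[0]}

total_likes = sum(columns["likes"])  # scans a single column -> 20
```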
The Query Handler service will act as a compute node in serverless mode. The design of this kind of product does not require adding an extra layer to distribute the computation in parallel in an MPP fashion; instead, the service will execute one query at a time on its own machine and distribute the computation across all its processors. Query execution will be multiprocessor only for now; this is a simple MVP.
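A single-machine, multiprocessor aggregation can be sketched with the standard library's `multiprocessing.Pool`: split the events into chunks, aggregate each chunk in a worker process, and combine the partial results. The function names and the `likes` field are assumptions for the example, not the real query engine:

```python
from multiprocessing import Pool


def partial_sum(chunk: list) -> int:
    """Aggregate one chunk of events in a worker process."""
    return sum(e["likes"] for e in chunk)


def aggregate_likes(events: list, workers: int = 4) -> int:
    """Run one query on one machine, spread across local processors.

    No MPP cluster here: chunks are mapped to a process pool and the
    partial sums are combined on the coordinating process.
    """
    size = max(1, len(events) // workers)
    chunks = [events[i:i + size] for i in range(0, len(events), size)]
    with Pool(workers) as pool:
        return sum(pool.map(partial_sum, chunks))
```

This map-then-combine shape is the same one an MPP engine would use across machines, which leaves the door open to distributing it later without changing the query logic.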
This is the first part of this article; I will be adding more details and pieces of code from each section in the next ones. For now I plan to use AWS S3 to store the data, because S3 offers high availability, but this will add complexity when designing an efficient read and write approach, since objects in S3 are immutable binary blobs.
