Cassandra Read and Write Path – Architecture Explained

Apache Cassandra is a highly scalable, distributed NoSQL database designed for high availability and fault tolerance. It is widely used where speed, reliability, and scalability are essential, for example at Netflix, Instagram, and in Apple’s iCloud services.

To understand Cassandra’s performance and fault-tolerance capabilities, it’s crucial to explore how data flows inside the system through its read and write paths.

This article dives deep into Cassandra’s architecture, explaining how data is written, stored, and read efficiently across nodes.

Cassandra Write Path Architecture

Cassandra follows a write-optimized architecture, ensuring fast and durable writes. Here’s how the data flows during a write operation:

1. Client Request

When a client sends a write request, it goes to a coordinator node — any node in the cluster can act as a coordinator. The coordinator node manages the request and forwards it to other replicas based on the replication strategy.
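
To make the forwarding step concrete, here is a minimal sketch of how a coordinator might pick replicas on a token ring. It assumes one token per node (no virtual nodes), a simple "walk clockwise" placement, and an MD5-based stand-in hash; the node names and the Ring class are hypothetical, and Cassandra's real partitioner and replication strategies are more involved.

```python
import bisect
import hashlib

def token(value):
    # Stand-in hash for illustration only (assumption); not Cassandra's partitioner.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

class Ring:
    """Hypothetical token ring with one token per node."""

    def __init__(self, nodes):
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key, rf):
        # Find the first node whose token is >= the key's token, then
        # walk clockwise (wrapping around) until rf nodes are collected.
        start = bisect.bisect_left(self.tokens, token(partition_key))
        count = min(rf, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1]
                for i in range(count)]

ring = Ring(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("user:42", rf=3)
print(owners)  # three distinct node names; the order depends on the hash
```

Any node can run this lookup, which is why any node can act as coordinator.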

2. Commit Log

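On each replica, the write is first appended to a commit log on disk.
This provides durability: if the node crashes, writes that had not yet been flushed to SSTables are replayed from the commit log on restart. How promptly the log is synced to disk depends on the commitlog_sync setting.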

The commit log is append-only.
It stores write operations sequentially, which makes it very fast.
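
The idea of an append-only log that can be replayed after a crash can be sketched in a few lines. This is an illustration under simplifying assumptions, not Cassandra's on-disk format: the CommitLog class, the JSON-lines encoding, and the fsync on every write are all choices made for this example.

```python
import json
import os
import tempfile

class CommitLog:
    """Hypothetical append-only log; one JSON record per line."""

    def __init__(self, path):
        self.path = path
        self.file = open(path, "a", encoding="utf-8")

    def append(self, key, value, timestamp):
        record = {"key": key, "value": value, "ts": timestamp}
        self.file.write(json.dumps(record) + "\n")
        self.file.flush()
        # Syncing on every write is an assumption made for clarity.
        os.fsync(self.file.fileno())

    def close(self):
        self.file.close()

    @staticmethod
    def replay(path):
        # Read records back in the order they were written.
        with open(path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

log_path = os.path.join(tempfile.mkdtemp(), "commitlog.jsonl")
log = CommitLog(log_path)
log.append("user:42", "alice", 1)
log.append("user:42", "alice-updated", 2)
log.close()

recovered = CommitLog.replay(log_path)
print([r["ts"] for r in recovered])  # [1, 2]
```

Because records are only ever added at the end of the file, the disk sees sequential writes, which is the source of the speed mentioned above.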

3. Memtable

After writing to the commit log, Cassandra stores the data in memory, inside a data structure called the Memtable.
A Memtable is an in-memory write buffer rather than a general-purpose cache: it holds recent writes, kept in sorted order, until they are flushed to disk.

Each table (column family) has its own Memtable.
When a Memtable is full, it is flushed to disk as an SSTable.
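
The buffer-then-flush behaviour can be sketched as follows. This is a toy model under stated assumptions: the Memtable class is hypothetical, "full" is measured in number of entries rather than memory used, and rows are sorted by key instead of by token.

```python
class Memtable:
    """Hypothetical in-memory write buffer for one table."""

    def __init__(self, max_entries):
        self.max_entries = max_entries  # assumption: size limit counted in rows
        self.rows = {}                  # key -> (value, timestamp)

    def put(self, key, value, ts):
        current = self.rows.get(key)
        # Keep only the newest write for each key.
        if current is None or ts >= current[1]:
            self.rows[key] = (value, ts)

    def is_full(self):
        return len(self.rows) >= self.max_entries

    def flush(self):
        # Emit the contents as a sorted, immutable tuple (the "SSTable")
        # and start over with an empty buffer.
        sstable = tuple(sorted((k, v, ts) for k, (v, ts) in self.rows.items()))
        self.rows = {}
        return sstable

memtable = Memtable(max_entries=3)
sstables = []
for key, value, ts in [("c", "3", 1), ("a", "1", 2), ("b", "2", 3), ("d", "4", 4)]:
    memtable.put(key, value, ts)
    if memtable.is_full():
        sstables.append(memtable.flush())

print(sstables)       # [(('a', '1', 2), ('b', '2', 3), ('c', '3', 1))]
print(memtable.rows)  # {'d': ('4', 4)}
```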

4. SSTable (Sorted String Table)

When Memtables reach a certain size, Cassandra writes them to immutable SSTables on disk.
Each SSTable is a sorted file containing key-value pairs and indexes for quick lookup.

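Because SSTables are never modified in place, updates and deletes are written as new entries (deletes as tombstone markers), so a write never has to read or rewrite existing files.
Over time, multiple SSTables are compacted into fewer files, which discards overwritten and deleted data, reclaims disk space, and improves read performance.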
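
Compaction can be pictured as merging several sorted files and keeping only the newest version of each key. The sketch below is a simplified model: the compact function is hypothetical, SSTables are plain tuples of (key, value, timestamp), and tombstones and compaction strategies are left out.

```python
def compact(sstables):
    """Merge SSTables into one, keeping the newest entry per key (toy model)."""
    latest = {}
    for table in sstables:
        for key, value, ts in table:
            if key not in latest or ts > latest[key][1]:
                latest[key] = (value, ts)
    # The output is again sorted and immutable, like its inputs.
    return tuple(sorted((k, v, ts) for k, (v, ts) in latest.items()))

older = (("a", "1", 10), ("b", "2", 11))
newer = (("b", "2-updated", 20), ("c", "3", 21))
merged = compact([older, newer])
print(merged)
# (('a', '1', 10), ('b', '2-updated', 20), ('c', '3', 21))
```

The inputs are never modified; compaction writes a new file and the old ones are dropped afterwards.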

5. Hinted Handoff (for Fault Tolerance)

If a replica node is down during a write, the coordinator stores a hint — a small record that will later be delivered to the unavailable node once it comes back online.
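This lets the write succeed as long as enough live replicas acknowledge it, and helps the recovered node catch up afterwards. Hints are stored only for a limited window (three hours by default), so a node that stays down longer needs a repair to get back in sync.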
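
A rough sketch of the hint mechanism is shown below. Everything here is hypothetical and simplified: the Coordinator class keeps hints in memory, replicas are plain dictionaries, failure detection is a manual flag, and hint expiry is not modelled.

```python
from collections import defaultdict

class Coordinator:
    """Hypothetical coordinator that stores hints for unavailable replicas."""

    def __init__(self, replicas):
        self.replicas = replicas                    # name -> dict used as storage
        self.up = {name: True for name in replicas}
        self.hints = defaultdict(list)              # name -> missed writes

    def write(self, key, value, ts):
        acks = 0
        for name, store in self.replicas.items():
            if self.up[name]:
                store[key] = (value, ts)
                acks += 1
            else:
                # Replica is down: remember the write for later delivery.
                self.hints[name].append((key, value, ts))
        return acks

    def node_recovered(self, name):
        self.up[name] = True
        # Deliver stored hints, never overwriting newer data.
        for key, value, ts in self.hints.pop(name, []):
            current = self.replicas[name].get(key)
            if current is None or ts > current[1]:
                self.replicas[name][key] = (value, ts)

coord = Coordinator({"n1": {}, "n2": {}, "n3": {}})
coord.up["n3"] = False
acks = coord.write("user:42", "alice", 1)
coord.node_recovered("n3")
print(acks, coord.replicas["n3"])  # 2 {'user:42': ('alice', 1)}
```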

Cassandra Read Path Architecture

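While the write path is simple and fast, Cassandra’s read path is more complex, because the current version of a row may be spread across the Memtable and several SSTables on several replicas. The read path merges these sources and returns the most recent version visible at the requested consistency level.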

1. Client Request and Coordinator Node

When a read request arrives, the node that receives it acts as the coordinator, just as in writes. It determines which replicas own the requested partition and contacts as many of them as the client's consistency level requires.

2. Replica Nodes and Consistency Level

Cassandra uses tunable consistency — you can control how many replicas must respond before a result is returned.

Common consistency levels:

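ONE: Returns the result from a single replica, normally the closest one.
QUORUM: Waits for a majority of replicas, that is floor(RF / 2) + 1, where RF is the replication factor.
ALL: Waits for responses from all replicas; the read fails if any replica is unavailable.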
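
The arithmetic behind these levels is small enough to write down. The helper names below are hypothetical; the rule they encode is the usual one that a read is guaranteed to see the latest acknowledged write when the read and write replica counts together exceed the replication factor (R + W > RF).

```python
def required_responses(level, rf):
    """Number of replicas that must answer for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError("level not covered in this sketch: " + level)

def read_sees_latest_write(read_level, write_level, rf):
    # Read and write replica sets must overlap in at least one node.
    r = required_responses(read_level, rf)
    w = required_responses(write_level, rf)
    return r + w > rf

print(required_responses("QUORUM", 3))                 # 2
print(read_sees_latest_write("QUORUM", "QUORUM", 3))   # True
print(read_sees_latest_write("ONE", "ONE", 3))         # False
```

This is why QUORUM reads paired with QUORUM writes are a common choice: they overlap while still tolerating one unavailable replica at RF = 3.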

3. Bloom Filter and Partition Index Lookup

Each replica node checks:

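Bloom Filter: Quickly determines whether the requested partition key might exist in an SSTable. It can return false positives but never false negatives, so a negative answer lets Cassandra skip that SSTable entirely.
Partition Index: Maps the partition key to its position in the SSTable's data file, so the read can seek directly to it.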

These mechanisms minimize disk I/O, improving query performance.
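
A Bloom filter is easy to sketch. The version below is a generic textbook construction, not Cassandra's implementation: the BloomFilter class, the SHA-256 based hashing, and the sizes are all assumptions chosen for readability.

```python
import hashlib

class BloomFilter:
    """Generic Bloom filter sketch; one byte per bit for readability."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("user:1")
bf.add("user:2")
print(bf.might_contain("user:1"))  # True
```

A lookup for a key that was never added usually returns False, and in that case the SSTable does not have to be touched at all.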

4. Read from Memtable + SSTables

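On each replica, Cassandra looks for the data in both:

Memtable (in-memory)
SSTables (on disk)

Because a row can exist in the Memtable and in multiple SSTables at once, Cassandra merges the fragments using timestamp-based reconciliation: for each column, the value with the latest write timestamp wins.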
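
Timestamp-based reconciliation boils down to "collect every version, keep the newest". The sketch below is a simplified model: read_latest is a hypothetical function, the Memtable and SSTables are plain dictionaries mapping a key to (value, timestamp), and whole values are compared rather than individual columns.

```python
def read_latest(key, memtable, sstables):
    """Return the newest value for key across the Memtable and SSTables."""
    candidates = []
    if key in memtable:
        candidates.append(memtable[key])
    for table in sstables:
        if key in table:
            candidates.append(table[key])
    if not candidates:
        return None
    # Last write wins: pick the entry with the highest timestamp.
    value, _ts = max(candidates, key=lambda cell: cell[1])
    return value

memtable = {"user:42": ("carol", 30)}
sstables = [
    {"user:42": ("alice", 10)},
    {"user:42": ("bob", 20), "user:7": ("dave", 5)},
]
print(read_latest("user:42", memtable, sstables))  # carol
print(read_latest("user:7", memtable, sstables))   # dave
print(read_latest("missing", memtable, sstables))  # None
```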

5. Read Repair

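To maintain consistency across replicas, Cassandra performs read repair.
During a read, the coordinator compares the responses from the replicas it contacted; if some of them are out of date, it sends them the newest version. Repairs among the replicas needed to satisfy the consistency level happen before the result is returned. Older versions could also repair the remaining replicas in the background, an option that was removed in Cassandra 4.0.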
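
The repair step can be sketched as a comparison followed by a write-back. This is a toy model under stated assumptions: read_with_repair is a hypothetical function, replicas are plain dictionaries, and full values are compared directly instead of using digests.

```python
def read_with_repair(key, replicas):
    """Read key from all given replicas and update the stale ones (toy model)."""
    responses = {name: store.get(key) for name, store in replicas.items()}
    present = [cell for cell in responses.values() if cell is not None]
    if not present:
        return None, []
    # The newest version is the one with the highest timestamp.
    newest = max(present, key=lambda cell: cell[1])
    repaired = []
    for name, cell in responses.items():
        if cell != newest:
            replicas[name][key] = newest  # push the newest version back
            repaired.append(name)
    return newest[0], repaired

replicas = {
    "n1": {"k": ("new", 2)},
    "n2": {"k": ("old", 1)},
    "n3": {},
}
value, repaired = read_with_repair("k", replicas)
print(value, repaired)  # new ['n2', 'n3']
```

After the call, all three replicas hold the same version, so a later read at any consistency level returns the same result.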

Cassandra’s Architecture Components

Component	Function
Commit Log	Ensures durability of writes
Memtable	In-memory buffer for recent writes
SSTable	Persistent, immutable disk storage
Coordinator Node	Manages client read/write requests
Bloom Filter	Reduces unnecessary disk lookups
Read Repair	Fixes inconsistent data across replicas
Hinted Handoff	Stores temporary write hints for down nodes

Advantages of Cassandra’s Read-Write Design

High Availability: Even if nodes fail, data remains accessible.
Durability: The commit log lets a node replay unflushed writes after a crash.
Scalability: Write and read operations scale horizontally.
Performance: In-memory Memtables and Bloom filters optimize speed.
Tunable Consistency: Each query can trade consistency against availability and latency.

Conclusion

Cassandra’s read and write path architecture reflects its design philosophy: “write fast, read smart.”
By optimizing each path separately (sequential appends and in-memory buffering for writes; Bloom filters, indexes, and timestamp reconciliation for reads), Cassandra delivers high write throughput and horizontal scalability, which makes it a strong choice for large distributed systems.