Introduction to Compaction Strategies in Cassandra
[Figure: Cassandra compaction strategy diagram comparing STCS, LCS, and TWCS]
In Apache Cassandra, data is written to disk in immutable files called SSTables. Over time, multiple SSTables accumulate for the same table, leading to redundancy and increased read latency. To manage this efficiently, Cassandra uses a process called compaction.
Compaction merges multiple SSTables into fewer, larger ones, removing outdated or deleted data and improving read performance. The policy that determines which SSTables to merge, and when, is called the compaction strategy.
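Compaction is the process of merging SSTables to optimize disk usage and read efficiency. It helps by:
- Removing tombstones (markers for deleted data) once their grace period has passed
- Merging overlapping SSTables so that only the newest version of each row is kept
- Reducing disk fragmentation
- Improving read performance, because a read has fewer SSTables to consult
Each compaction replaces its input SSTables with fewer, consolidated ones.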
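To make the merge step concrete, here is a minimal Python sketch. It is not Cassandra's implementation: it models each SSTable as a dictionary from key to (timestamp, value), uses `None` as a stand-in for a tombstone, and treats `gc_grace` as a simplified version of the table's tombstone grace period. The names `compact`, `now`, and `gc_grace` are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Each cell is (write_timestamp, value); a value of None marks a tombstone.
Cell = Tuple[int, Optional[str]]
SSTable = Dict[str, Cell]


def compact(sstables: List[SSTable], now: int, gc_grace: int) -> SSTable:
    """Merge SSTables, keeping the newest cell per key (last write wins)."""
    merged: SSTable = {}
    for table in sstables:
        for key, cell in table.items():
            if key not in merged or cell[0] > merged[key][0]:
                merged[key] = cell
    # Keep live cells; keep tombstones only while they are within the grace period.
    return {
        key: (ts, value)
        for key, (ts, value) in sorted(merged.items())
        if value is not None or now - ts < gc_grace
    }


old = {"alice": (100, "v1"), "bob": (100, "v1")}
new = {"alice": (200, "v2"), "bob": (150, None)}  # bob was deleted at t=150

print(compact([old, new], now=1000, gc_grace=500))  # {'alice': (200, 'v2')}
```

With `now=300` the tombstone for `bob` is still inside the grace period, so the sketch keeps it in the output instead of purging it.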
1. Data is first appended to the commit log and written to a memtable (an in-memory structure).
2. When the memtable is full, it is flushed to disk as an SSTable.
3. Over time, multiple SSTables accumulate.
4. Cassandra triggers a compaction process to merge and clean them.
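The steps above can be sketched as a toy write path in Python. This is only an illustration under simplifying assumptions: the class name `Table` and the `memtable_limit` threshold (counted in keys rather than bytes) are hypothetical, and the commit log is left out.

```python
class Table:
    """Toy write path: buffer writes in a memtable, flush to immutable SSTables."""

    def __init__(self, memtable_limit: int) -> None:
        self.memtable_limit = memtable_limit  # hypothetical threshold, in keys
        self.memtable = {}
        self.sstables = []  # each SSTable is a sorted tuple of (key, value) pairs

    def write(self, key: str, value: str) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # An SSTable is written once and never modified afterwards.
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key: str):
        # Check the memtable first, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            for k, v in sstable:
                if k == key:
                    return v
        return None


table = Table(memtable_limit=2)
for i in range(5):
    table.write(f"k{i}", f"v{i}")
table.write("k0", "updated")

print(len(table.sstables))  # 3
print(table.read("k0"))     # updated
```

The key `k0` now exists in two SSTables, and every read has to pick the newer copy. That duplication is what compaction later removes.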
Cassandra provides several compaction strategies, each suited to different workloads and data patterns.
SizeTieredCompactionStrategy (STCS) is the default strategy and is best suited to write-intensive workloads. It merges SSTables of similar size into larger SSTables.
Advantages:
- Simple and effective for write-heavy systems.
- Efficiently merges small SSTables.
Disadvantages:
- Can lead to high disk I/O during major compactions.
- Read latency increases as SSTables accumulate, because a read may have to consult several of them.
```cql
ALTER TABLE users WITH compaction = {
  'class': 'SizeTieredCompactionStrategy'
};
```
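As a rough illustration of "merging SSTables of similar sizes", the following Python sketch groups SSTable sizes into buckets and reports the buckets large enough to compact. It is a simplification, not Cassandra's code. The parameter names `bucket_low`, `bucket_high`, and `min_threshold` are borrowed from STCS option names, and the default values used here are assumptions to verify against the documentation for your version.

```python
def size_tiered_buckets(sizes_mb, bucket_low=0.5, bucket_high=1.5, min_threshold=4):
    """Group SSTable sizes into buckets of similar size.

    A size joins a bucket when it lies within
    [average * bucket_low, average * bucket_high] of that bucket's average.
    Only buckets with at least min_threshold members are returned.
    """
    buckets = []
    for size in sorted(sizes_mb):
        for bucket in buckets:
            average = sum(bucket) / len(bucket)
            if average * bucket_low <= size <= average * bucket_high:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough in size: start a new one.
            buckets.append([size])
    return [bucket for bucket in buckets if len(bucket) >= min_threshold]


sizes = [10, 11, 9, 12, 100, 110, 400]
candidates = size_tiered_buckets(sizes)
print(candidates)  # [[9, 10, 11, 12]]
```

Here the four small SSTables form a bucket that is ready to compact, while the two around 100 MB and the single 400 MB SSTable wait until enough similarly sized neighbors appear.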
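LeveledCompactionStrategy (LCS) is designed for read-heavy workloads. It organizes SSTables into levels, where each level has SSTables of a fixed size and, by default, holds roughly ten times as much data as the level below it.
Within each level above L0, SSTables cover non-overlapping token ranges, so a read consults at most one SSTable per level.
Advantages:
- Fewer SSTables per read, which improves read performance.
- Predictable disk usage.
Disadvantages:
- Higher write amplification, because data is rewritten as it is promoted through the levels.
- More compaction I/O overall, which makes it a poor fit for write-heavy tables.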
```cql
ALTER TABLE orders WITH compaction = {
  'class': 'LeveledCompactionStrategy'
};
```
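The two properties of leveled compaction, level sizing and non-overlapping ranges, can be sketched in a few lines of Python. This is an illustrative model only: the function names are hypothetical, and the 160 MB SSTable size and fanout of 10 are assumed defaults that you should check for your Cassandra version.

```python
from bisect import bisect_right


def level_capacity_mb(level, sstable_size_mb=160, fanout=10):
    """Approximate target size of a level above L0: fanout**level SSTables."""
    return sstable_size_mb * fanout ** level


def sstable_for_token(level_ranges, token):
    """Return the single (start, end) range in a level that may hold the token.

    level_ranges must be sorted and non-overlapping, as in levels above L0,
    which is why a binary search finds the only possible match.
    """
    starts = [start for start, _ in level_ranges]
    index = bisect_right(starts, token) - 1
    if index >= 0 and level_ranges[index][0] <= token <= level_ranges[index][1]:
        return level_ranges[index]
    return None


level_1 = [(0, 99), (100, 199), (200, 299)]

print(level_capacity_mb(1))             # 1600
print(level_capacity_mb(2))             # 16000
print(sstable_for_token(level_1, 150))  # (100, 199)
```

Because at most one SSTable per level can contain a given token, the number of SSTables touched by a read is bounded by the number of levels rather than by the total number of SSTables.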
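TimeWindowCompactionStrategy (TWCS) is best for time-series data (e.g., logs, IoT data, metrics). It groups SSTables into time windows (e.g., hourly, daily) and compacts within each window.
Advantages:
- Avoids rewriting old data unnecessarily.
- Ideal for data with a TTL (time to live), since an SSTable whose contents have all expired can be dropped whole.
- Reduces I/O by compacting only recent data.
Disadvantages:
- Not ideal for random-access data patterns, such as updates or deletes that touch rows in old windows.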
```cql
ALTER TABLE metrics WITH compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_unit': 'DAYS',
  'compaction_window_size': 1
};
```
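To show what "grouping SSTables into time windows" means, here is a small Python sketch that assigns each SSTable to a one-day window based on its newest timestamp. The function names and the `(name, max_timestamp)` representation are assumptions made for illustration, not Cassandra internals.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts, window=timedelta(days=1)):
    """Round a UTC timestamp down to the start of its compaction window."""
    return EPOCH + ((ts - EPOCH) // window) * window


def group_by_window(sstables, window=timedelta(days=1)):
    """Group (name, max_timestamp) pairs by the window of their newest data."""
    groups = defaultdict(list)
    for name, max_ts in sstables:
        groups[window_start(max_ts, window)].append(name)
    return dict(groups)


utc = timezone.utc
sstables = [
    ("a", datetime(2024, 1, 1, 3, 0, tzinfo=utc)),
    ("b", datetime(2024, 1, 1, 22, 0, tzinfo=utc)),
    ("c", datetime(2024, 1, 2, 1, 0, tzinfo=utc)),
]

for start, names in sorted(group_by_window(sstables).items()):
    print(start.date(), names)
```

In this model, `a` and `b` land in the window for 2024-01-01 and would be compacted together, while `c` belongs to the next day's window and is left alone.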
| Strategy | Best For | Read Performance | Write Performance | Disk Usage | Notes |
|---|---|---|---|---|---|
| STCS | Write-heavy workloads | Moderate | High | Moderate | Default strategy |
| LCS | Read-heavy workloads | High | Moderate | High I/O, low space overhead | SSTables within a level do not overlap |
| TWCS | Time-series data | High | Moderate | Efficient | Best for TTL data |

The following table maps common use cases to a recommended strategy:

| Use Case | Recommended Strategy |
|---|---|
| High write volume, minimal reads | SizeTieredCompactionStrategy |
| Read-heavy, low-latency queries | LeveledCompactionStrategy |
| Time-series or TTL-based data | TimeWindowCompactionStrategy |
To check a table's current compaction strategy:
```cql
DESCRIBE TABLE user_activity;
```
Output (excerpt):
```
compaction = {
  'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
}
```
To change it:
```cql
ALTER TABLE user_activity WITH compaction = {
  'class': 'LeveledCompactionStrategy'
};
```
✅ Choose a strategy based on data pattern (writes, reads, or time-series).
✅ Avoid frequent major compactions—they’re I/O intensive.
✅ Monitor compaction metrics using nodetool compactionstats.
✅ Use TWCS for expiring data with TTLs.
✅ Always test compaction changes on a staging cluster first.
Compaction is a vital process in Cassandra that keeps your data compact and fast to read. Understanding and choosing the right compaction strategy helps you balance read latency, write throughput, and storage use.
- Use STCS for write-heavy tables.
- Use LCS for read-intensive workloads.
- Use TWCS for time-series or expiring data.
By fine-tuning compaction strategies, you can greatly improve cluster health, query speed, and storage utilization.