Materialized Views in Cassandra: A Complete Guide
Materialized View architecture in Cassandra showing synchronization between base table and view
In Apache Cassandra, Materialized Views (MVs) provide a powerful mechanism for automatically maintaining alternate query patterns without duplicating data manually. They simplify querying by creating a new table that automatically stays in sync with the base table.
Introduced in Cassandra 3.x, Materialized Views help reduce complexity in application logic by letting you query the same data using different primary key configurations.
A Materialized View in Cassandra is a predefined query result that is automatically updated as the base table changes. Unlike a traditional view in relational databases, Cassandra’s MVs physically store data, improving query speed at the cost of additional storage and write overhead.
Suppose you have a users table that stores user details:
CREATE TABLE users (
user_id UUID PRIMARY KEY,
name text,
email text,
city text
);
By default, you can only query by user_id. If you also want to query users by email, you can create a Materialized View instead of creating a separate denormalized table.
When you create a Materialized View, Cassandra:
Creates a new table based on a different primary key.
Automatically propagates updates, inserts, and deletes from the base table.
Ensures that the view stays consistent with the base table.
Each node responsible for the base table data also manages the corresponding view data to ensure consistency.
CREATE MATERIALIZED VIEW keyspace_name.view_name AS
SELECT <columns>
FROM keyspace_name.base_table
WHERE <column> IS NOT NULL
PRIMARY KEY (<new_primary_key>);
CREATE MATERIALIZED VIEW users_by_email AS
SELECT user_id, name, email, city
FROM users
WHERE email IS NOT NULL
PRIMARY KEY (email, user_id);
The WHERE clause ensures the indexed column (email) is not null.
The PRIMARY KEY defines how the data is partitioned and clustered in the view.
Now you can query the users_by_email view directly:
SELECT * FROM users_by_email WHERE email = 'john@example.com';
All data changes in the base table are automatically reflected in the Materialized View.
UPDATE users SET city = 'Mumbai' WHERE user_id = 1a2b3c4d;
This update will also modify the users_by_email view automatically.
Similarly, if you delete a row:
DELETE FROM users WHERE user_id = 1a2b3c4d;
The corresponding entry in the view is also removed.
| Benefit | Description |
|---|---|
| Simplified query logic | No need to manually maintain multiple denormalized tables. |
| Automatic synchronization | Base table and view stay consistent automatically. |
| Improved query performance | Enables fast lookups using alternate keys. |
| Reduced development complexity | Eliminates the need for maintaining synchronization logic at the application level. |
Despite their convenience, Materialized Views come with certain limitations.
| Limitation | Explanation |
|---|---|
| Eventual consistency | Updates may take time to propagate; strong consistency isn’t guaranteed. |
| Write amplification | Each write to the base table triggers writes to associated views. |
| Schema restrictions | Cannot create a view with arbitrary WHERE clauses or aggregation. |
| Performance overhead | Can increase disk I/O and storage usage. |
| Limited support for complex queries | Views don’t support filtering beyond their defined key. |
DESCRIBE MATERIALIZED VIEWS;
DROP MATERIALIZED VIEW users_by_email;
DESCRIBE MATERIALIZED VIEW users_by_email;
✅ Recommended:
Use Materialized Views for simple lookups (e.g., querying by a secondary key).
Always include non-null constraints in the WHERE clause.
Ensure that the base table’s partition key is part of the view’s primary key.
Monitor performance using nodetool cfstats.
🚫 Avoid:
Using Materialized Views for high write throughput workloads.
Creating multiple views per base table.
Using views for range queries or aggregations.
CREATE TABLE orders (
order_id UUID PRIMARY KEY,
customer_id UUID,
order_date timestamp,
total_amount double
);
CREATE MATERIALIZED VIEW orders_by_customer AS
SELECT order_id, customer_id, order_date, total_amount
FROM orders
WHERE customer_id IS NOT NULL AND order_id IS NOT NULL
PRIMARY KEY (customer_id, order_date, order_id)
WITH CLUSTERING ORDER BY (order_date DESC);
SELECT * FROM orders_by_customer WHERE customer_id = 12345 ORDER BY order_date DESC;
This query retrieves all orders placed by a specific customer in descending order of date.
Limit Materialized Views to low-to-medium write workloads.
Use consistent hashing for even data distribution.
Rebuild views if inconsistencies occur using:
nodetool rebuild_materialized_view <keyspace> <view_name>
Periodically check view lag or stale data issues in high-throughput systems.
Materialized Views in Cassandra simplify data modeling by enabling alternate query access paths without complex denormalization. However, they should be used with caution — particularly in high-write or latency-sensitive environments.
By understanding their benefits, limitations, and proper design patterns, you can leverage Materialized Views effectively to optimize your Cassandra-based applications.