Secondary Indexes in Cassandra: A Complete Guide

10/12/2025
All Articles

Cassandra Secondary Index architecture diagram showing index table and base table relationship

Secondary Indexes in Cassandra: A Complete Guide

Secondary Indexes in Cassandra: A Complete Guide

Introduction

In Apache Cassandra, queries are usually performed using the primary key (partition key + clustering key). However, in many real-world scenarios, you may need to query data using non-primary key columns. This is where Secondary Indexes come into play.

A Secondary Index allows you to efficiently query columns that are not part of the primary key, expanding the flexibility of data retrieval in Cassandra.


1. What is a Secondary Index?

A Secondary Index in Cassandra is a data structure that provides an alternate lookup path for querying a table using non-primary key columns.

Example:

Let’s say you have the following table:

CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    name text,
    email text,
    city text
);

If you frequently need to query users by city, you can create an index on the city column:

CREATE INDEX ON users (city);

Now, you can run queries like:

SELECT * FROM users WHERE city = 'Delhi';

Without this index, Cassandra would not allow such a query because city is not part of the primary key.


2. How Secondary Indexes Work in Cassandra

Internally, Cassandra creates a hidden table to store mappings of indexed column values to the primary key of the original table.

When you query an indexed column:

  1. Cassandra looks up the value in the index table.

  2. It retrieves the corresponding primary key(s).

  3. Then, it fetches the full data from the main table.

This lookup mechanism makes Secondary Indexes convenient but not always optimal for large datasets.


3. When to Use a Secondary Index

Secondary Indexes can be helpful in specific cases, but they should be used with caution.

Recommended Use Cases:

  • When the cardinality (number of unique values) of the indexed column is low or moderate.

  • When queries are not frequent and data volume is relatively small.

  • When the indexed column is evenly distributed across partitions.

Avoid Using When:

  • You have a high number of unique values (e.g., indexing a timestamp column).

  • The table has a very large number of rows.

  • Queries involve multiple conditions or filtering across different indexed columns.


4. Example: Creating and Using Secondary Index

Step 1: Create a Table

CREATE TABLE products (
    product_id UUID PRIMARY KEY,
    name text,
    category text,
    price double
);

Step 2: Create an Index on Category

CREATE INDEX category_index ON products (category);

Step 3: Query by Category

SELECT * FROM products WHERE category = 'Electronics';

This will return all products belonging to the 'Electronics' category.


5. Types of Indexes in Cassandra

Cassandra supports multiple indexing mechanisms depending on the version and use case.

1. Regular Secondary Index

  • Default index type created using CREATE INDEX.

  • Best for small, evenly distributed datasets.

2. Custom Index (SASI - SSTable Attached Secondary Index)

  • Introduced in Cassandra 3.x.

  • Supports text search and range queries.

Example:

CREATE CUSTOM INDEX ON products (name) USING 'org.apache.cassandra.index.sasi.SASIIndex';

3. Storage-Attached Index (SAI) (Cassandra 4.x and later)

  • Highly optimized indexing engine.

  • Supports multiple columns, numeric, and range queries.

Example:

CREATE CUSTOM INDEX ON products (price) USING 'StorageAttachedIndex';

6. Managing Secondary Indexes

Drop an Index:

DROP INDEX category_index;

List All Indexes in a Table:

DESCRIBE TABLE products;

Rebuild an Index:

REBUILD INDEX ON products (category_index);

7. Performance Considerations

Factor Recommendation
Cardinality Use for low to medium cardinality columns only.
Query frequency Avoid for frequent queries on large datasets.
Write-heavy workloads Indexes can slow down writes because they must update both the base and index tables.
Replication factor Choose replication carefully to avoid overloading specific nodes.
Monitoring Use nodetool cfstats and nodetool tablestats to analyze index performance.

8. Alternatives to Secondary Indexes

If Secondary Indexes are not suitable, consider these alternatives:

1. Materialized Views

Automatically maintain a denormalized version of data for specific queries.

CREATE MATERIALIZED VIEW users_by_city AS
SELECT * FROM users WHERE city IS NOT NULL PRIMARY KEY (city, user_id);

2. Denormalization

Store the same data in multiple tables based on access patterns.

3. Search Integration (e.g., Elasticsearch)

For complex queries, integrate Cassandra with a search engine like Elasticsearch or Apache Solr.


9. Best Practices for Using Secondary Indexes

  • Avoid indexing columns with high cardinality.

  • Do not create multiple indexes on a single table.

  • Use SASI or SAI for range and full-text queries.

  • Regularly monitor index performance and rebuild if necessary.

  • Consider query volume and write throughput before adding an index.


Conclusion

Secondary Indexes in Cassandra provide flexibility for querying non-primary key columns, but they must be used judiciously. For large-scale or high-throughput systems, they can become a performance bottleneck if not designed properly.

Always analyze your query patterns, data distribution, and workload before implementing an index. In most cases, denormalization or materialized views offer more scalable alternatives.

Article