Creating and Managing Tables in Cassandra: A Complete Guide
Creating and managing tables in Cassandra using CQL
Introduction
In Apache Cassandra, data is organized into keyspaces and tables, similar to databases and tables in relational systems. However, unlike traditional RDBMS, Cassandra’s table design is focused on scalability and fast read/write performance across distributed nodes.
In this guide, you’ll learn how to create, alter, and manage tables in Cassandra using CQL (Cassandra Query Language), along with practical examples and performance tips.
In Cassandra, a table stores rows of data organized by a primary key. The primary key determines how data is distributed across the cluster and how efficiently it can be retrieved.
Partition Key: Determines how data is distributed across nodes.
Clustering Columns: Define the order of data within a partition.
Primary Key: Combination of partition and clustering keys that uniquely identify a row.
Example:
CREATE TABLE users (
user_id UUID PRIMARY KEY,
name text,
email text,
age int
);
This example creates a table named users with user_id as the partition key.
Tables are created inside a keyspace. To start, create a keyspace first:
CREATE KEYSPACE company WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE company;
Now, create a table:
CREATE TABLE employees (
emp_id UUID PRIMARY KEY,
name text,
department text,
salary decimal
);
For advanced querying and clustering:
CREATE TABLE orders (
order_id UUID,
customer_id UUID,
order_date date,
amount double,
PRIMARY KEY (customer_id, order_date)
) WITH CLUSTERING ORDER BY (order_date DESC);
✅ Explanation:
customer_id = Partition key
order_date = Clustering key (sorted descending)
You can insert data using the INSERT statement:
INSERT INTO employees (emp_id, name, department, salary)
VALUES (uuid(), 'John Doe', 'IT', 60000);
To insert multiple rows:
INSERT INTO employees (emp_id, name, department, salary) VALUES (uuid(), 'Alice', 'HR', 55000);
INSERT INTO employees (emp_id, name, department, salary) VALUES (uuid(), 'Bob', 'Finance', 70000);
Cassandra supports powerful querying through CQL:
SELECT * FROM employees;
SELECT name, salary FROM employees WHERE emp_id = 1a2b3c4d-5e6f-7g8h-9i0j;
⚠️ Note: Queries must include the partition key — Cassandra does not allow arbitrary WHERE conditions like SQL databases.
UPDATE employees SET salary = 75000 WHERE emp_id = 1a2b3c4d-5e6f-7g8h-9i0j;
DELETE FROM employees WHERE emp_id = 1a2b3c4d-5e6f-7g8h-9i0j;
TRUNCATE employees;
You can modify an existing table structure using ALTER TABLE.
ALTER TABLE employees ADD experience int;
ALTER TABLE employees RENAME department TO dept;
ALTER TABLE employees DROP age;
To delete a table completely:
DROP TABLE employees;
To delete an entire keyspace (including all its tables):
DROP KEYSPACE company;
| Tip | Description |
|---|---|
| Use meaningful partition keys | Choose keys that evenly distribute data across nodes. |
| Avoid large partitions | Keep partition sizes below 100MB for optimal performance. |
| Design for queries | Model tables based on access patterns, not relationships. |
| Use clustering columns wisely | Define ordering for fast query performance. |
| Avoid schema changes in production | Plan schema early to prevent costly migrations. |
CREATE TABLE ecommerce.orders (
user_id UUID,
order_id UUID,
order_date timestamp,
total_amount double,
items list<text>,
PRIMARY KEY (user_id, order_date)
) WITH CLUSTERING ORDER BY (order_date DESC);
Query Example:
SELECT * FROM ecommerce.orders WHERE user_id = 1a2b3c4d ORDER BY order_date DESC;
This query returns all orders for a specific user, sorted by the most recent first.
Cassandra table creation and management revolve around understanding how data is partitioned and replicated across nodes. By following this guide, you can confidently create, query, and maintain Cassandra tables optimized for scalability and speed.
Designing your schema around query patterns and partitioning logic ensures high performance and reliability in distributed environments.