Configuring Cassandra Cluster: A Complete Step-by-Step Guide

10/12/2025

[Figure: Cassandra cluster configuration architecture diagram]


Introduction

Apache Cassandra is a distributed NoSQL database designed for scalability, fault tolerance, and high availability. To fully utilize its power, you need to configure it as a cluster — a group of interconnected nodes that share data and handle workloads efficiently.

In this guide, we’ll explain how to configure a Cassandra cluster from scratch, including network setup, configuration files, replication, and verification.


1. Understanding Cassandra Cluster Architecture

A Cassandra cluster is a collection of multiple nodes (servers) working together. Each node stores a part of the data and communicates with others using the gossip protocol.

Key Components:

  • Node: The basic unit in Cassandra that stores data.

  • Cluster: A collection of nodes working together.

  • Keyspace: The top-level namespace defining data replication strategy.

  • Data Center: A logical grouping of nodes for replication and load balancing.

Example Setup:

  • Node1 → 192.168.1.101

  • Node2 → 192.168.1.102

  • Node3 → 192.168.1.103
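
Once the cluster is running (see section 6), you can inspect this topology from any node. The system.peers system table records each peer's data center and rack:

cqlsh 192.168.1.101 -e "SELECT peer, data_center, rack FROM system.peers;"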


2. Prerequisites for Cluster Configuration

Before configuring Cassandra nodes:

Install Cassandra on all nodes

Follow the Cassandra installation guide for Linux or Windows.

Ensure the following:

  • All nodes use the same Cassandra version.

  • All nodes have unique IP addresses.

  • Firewall allows communication on these ports:

    • 7000 – intra-node communication

    • 7001 – encrypted intra-node communication

    • 7199 – JMX monitoring

    • 9042 – CQL clients

    • 9160 – Thrift clients (legacy; removed in Cassandra 4.0)
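
For example, on a host running firewalld you could open these ports as follows (a sketch; adjust for ufw or iptables, and skip 9160 on Cassandra 4.0+):

# Open Cassandra's cluster and client ports (firewalld example)
sudo firewall-cmd --permanent --add-port=7000/tcp
sudo firewall-cmd --permanent --add-port=7001/tcp
sudo firewall-cmd --permanent --add-port=7199/tcp
sudo firewall-cmd --permanent --add-port=9042/tcp
sudo firewall-cmd --reload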


3. Editing cassandra.yaml Configuration File

For package installations, the main Cassandra configuration file is located at:

/etc/cassandra/cassandra.yaml

(Tarball installs keep it in the conf/ directory under the installation root.)

You must modify this file on each node.

Important Parameters to Configure:

  • cluster_name – the name of the cluster; it must be identical on every node. Example: cluster_name: 'MyCassandraCluster'

  • listen_address – the node's own IP address for intra-node communication. Example: listen_address: 192.168.1.101

  • seeds – the list of seed nodes used for gossip discovery. Example: - seeds: "192.168.1.101,192.168.1.102"

  • rpc_address – the address on which the node listens for client connections. Example: rpc_address: 0.0.0.0

  • endpoint_snitch – how Cassandra determines the network topology. Example: endpoint_snitch: GossipingPropertyFileSnitch
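
Putting these together, a minimal cassandra.yaml excerpt for Node1 might look like this (illustrative values; when rpc_address is 0.0.0.0, Cassandra also requires broadcast_rpc_address so clients learn a routable address):

# Excerpt for Node1 – repeat on each node with its own IP
cluster_name: 'MyCassandraCluster'
listen_address: 192.168.1.101
rpc_address: 0.0.0.0
broadcast_rpc_address: 192.168.1.101
endpoint_snitch: GossipingPropertyFileSnitch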

Note:

  • Seed nodes help new nodes discover the cluster through gossip; use the same seed list on every node.

  • Use at least two seed nodes for fault tolerance.


4. Setting up Seed Nodes

Select two (or more) of your nodes as seed nodes.

In cassandra.yaml on all nodes, set:

seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.1.101,192.168.1.102"
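
After the nodes are started (section 6), you can confirm from any node that gossip has discovered every member:

# Lists each known endpoint with its gossip state
nodetool gossipinfo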

5. Configure Environment Variables

Edit cassandra-env.sh (Linux) or cassandra-env.ps1 (Windows) to set the IP for each node:

# Per-node JVM startup options – use each node's own IP for listen_address
JVM_OPTS="$JVM_OPTS -Dcassandra.listen_address=192.168.1.101"
JVM_OPTS="$JVM_OPTS -Dcassandra.rpc_address=0.0.0.0"
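
Because listen_address differs on every node, you may prefer to stamp the value in with a small script rather than edit each file by hand. A minimal sketch, assuming a package-install path and that the keys already exist uncommented in cassandra.yaml; NODE_IP is a placeholder you set per machine:

NODE_IP=192.168.1.101   # this node's own address
sudo sed -i "s/^listen_address:.*/listen_address: ${NODE_IP}/" /etc/cassandra/cassandra.yaml
sudo sed -i "s/^broadcast_rpc_address:.*/broadcast_rpc_address: ${NODE_IP}/" /etc/cassandra/cassandra.yaml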

6. Starting the Cluster

Start Cassandra on the seed nodes first, then on the remaining nodes one at a time:

sudo systemctl start cassandra
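
It is also worth enabling the service at boot and following the log while each node joins; package installs write to /var/log/cassandra/system.log:

sudo systemctl enable cassandra
sudo tail -f /var/log/cassandra/system.log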

Verify node status:

nodetool status

Example output:

Datacenter: dc1
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.1.101    200 MB     256     33.3%             e91f1c9f-87b6-44c5-b98d-77a6512e53d2  rack1
UN  192.168.1.102    210 MB     256     33.3%             a02e1b1b-6a0d-4eab-b3db-8d2a68db6572  rack1
UN  192.168.1.103    220 MB     256     33.3%             b21e3c9d-77a1-4f67-9e9a-223a121e623f  rack1

If all nodes show UN (Up and Normal) — your cluster is configured successfully!
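
As an extra check, nodetool describecluster should report the same cluster name and a single schema version across all nodes:

nodetool describecluster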


7. Configure Replication and Keyspace

Use CQLSH to define how data is replicated across nodes. The data center name (dc1 here) must match the name your snitch reports, as shown by nodetool status:

CREATE KEYSPACE company
WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};

Then:

USE company;
CREATE TABLE employees (id UUID PRIMARY KEY, name text, department text);

Replication ensures that your data remains available even if one node fails.
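
With a replication factor of 3, a QUORUM read or write needs only 2 of the 3 replicas to respond, so the keyspace remains fully usable with one node down. In cqlsh you can set the consistency level for the session:

-- Require a majority of replicas (2 of 3) for subsequent reads and writes
CONSISTENCY QUORUM;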


8. Testing Cluster Connectivity

Run from one node:

cqlsh 192.168.1.101

Insert test data:

INSERT INTO company.employees (id, name, department) VALUES (uuid(), 'John', 'IT');

Then connect to a different node and run the same query:

cqlsh 192.168.1.102
SELECT * FROM company.employees;

If data appears — replication works!
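
To see this fault tolerance in action, stop Cassandra on one node and repeat the query from another; at cqlsh's default consistency level of ONE, a single live replica is enough to answer (a rough sketch, using Node3 as the node to stop):

# On Node3: take one replica offline
sudo systemctl stop cassandra

# On Node1: the remaining replicas still serve the read
cqlsh 192.168.1.101 -e "SELECT * FROM company.employees;"

# Bring Node3 back when done
sudo systemctl start cassandra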


9. Monitoring and Maintenance

Use these commands regularly:

  • nodetool status – shows cluster health and node state

  • nodetool repair – synchronizes data between replicas

  • nodetool cleanup – removes data that no longer belongs to the node (for example, after new nodes join)

  • nodetool ring – displays the token distribution
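
Repair in particular should run at least once every gc_grace_seconds (10 days by default) so that deleted data cannot reappear. A sketch of a weekly schedule via cron; the timing and the -pr (primary-range) flag are illustrative choices:

# /etc/cron.d/cassandra-repair (illustrative): weekly primary-range repair
0 2 * * 0 cassandra nodetool repair -pr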

You can also integrate Prometheus + Grafana for advanced monitoring.


Conclusion

You’ve successfully learned how to configure a Cassandra cluster across multiple nodes. By setting up proper seed nodes, replication strategies, and snitches, you ensure that your cluster is scalable, fault-tolerant, and high-performing.

This foundation will help you manage large-scale distributed applications effectively.
