How to upgrade Apache Cassandra

An Apache Cassandra upgrade is a rolling maintenance job, not just a package replacement. Each node must leave the write path cleanly, start on the target release, rejoin the ring as UN (Up/Normal), and pass a client-facing check before the next node is touched.

The safest flow keeps one node under maintenance at a time. Use nodetool status to confirm the ring before the change, take a named snapshot, drain the local node, stop the service, replace the Cassandra package or binary, start the service, and verify the node through both nodetool (JMX) and cqlsh (CQL).

Major-version upgrades need a rollback boundary before newer on-disk formats are introduced. For a 4.x to 5.0 upgrade, keep storage_compatibility_mode at CASSANDRA_4 during the first binary roll, then plan separate rolling restarts for UPGRADING and NONE after the cluster is stable on the new release.

Steps to upgrade Apache Cassandra node by node:

Confirm that the target release, Java runtime, driver versions, and package repository are staged before the maintenance window.

Apache Cassandra 5.0 binary releases run on Java 11 or Java 17. Check application drivers and release notes before the first production node is drained.

Check the cluster from the node that will be upgraded.

$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.0.0.11  114.7 KiB   16      33.3%             8f4f6e2d-9f74-4f5a-a85f-43df5d4fcb21  rack1
UN  10.0.0.12  112.1 KiB   16      33.3%             14c69f94-47d1-4dc2-ae16-6b232a2cf8dd  rack1
UN  10.0.0.13  117.4 KiB   16      33.3%             d0b8b726-c680-4b42-bb57-3fa50db70e6a  rack1

Delay the upgrade if any expected serving node is down, joining, leaving, moving, or overloaded, or if a repair or heavy compaction is already in progress.
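The ring check above can be scripted as a gate. The following is a minimal sketch, assuming nodetool is on the PATH and that node rows begin with the two-letter status/state code shown in the sample output; ring_not_un is a hypothetical helper name, not part of Cassandra.

```shell
# Sketch: read `nodetool status` output on stdin and list every node that is not UN.
# ring_not_un is a hypothetical helper, not a Cassandra command.
ring_not_un() {
  # Node rows start with a status letter (U or D) followed by a state letter (N, L, J, or M).
  # Header and separator lines never match this pattern, so they are skipped.
  awk 'BEGIN { bad = 0 }
       $1 ~ /^[UD][NLJM]$/ && $1 != "UN" { print $1, $2; bad = 1 }
       END { exit bad }'
}

# Usage (assumes nodetool is on PATH):
#   nodetool status | ring_not_un || echo "ring is not healthy, postpone the upgrade"
```

The helper exits non-zero when at least one node is outside UN, so it can gate the rest of a maintenance script. It does not detect overload, repairs, or compaction pressure; those still need their own checks.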

Take a named snapshot on the node being upgraded.

$ nodetool snapshot --tag before-5-0-upgrade
Requested creating snapshot(s) for [all keyspaces] with snapshot name [before-5-0-upgrade] and options {skipFlush=false}
Snapshot directory: before-5-0-upgrade

Copy or catalog the snapshot according to the rollback plan before the live data files are compacted or rewritten.
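To catalog the snapshot, it helps to list the directories it created on disk. This is a sketch under stated assumptions: the data path /var/lib/cassandra/data is the common package default and may differ from the data_file_directories setting on the host, and list_snapshot_dirs is a hypothetical helper name.

```shell
# Sketch: print the on-disk directories that belong to one snapshot tag.
# Assumption: the data directory defaults to /var/lib/cassandra/data; pass another path as $2.
list_snapshot_dirs() {
  local tag="$1" data_dir="${2:-/var/lib/cassandra/data}"
  # Expected layout: <data_dir>/<keyspace>/<table>-<id>/snapshots/<tag>
  find "$data_dir" -mindepth 4 -maxdepth 4 -type d -path "*/snapshots/$tag" | sort
}

# Usage:
#   list_snapshot_dirs before-5-0-upgrade > /root/before-5-0-upgrade.dirs
```

The resulting list can feed whatever copy tool the rollback plan names. nodetool listsnapshots gives the same information from Cassandra's point of view while the node is running.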

Drain the local node immediately before stopping Cassandra.
```
$ nodetool drain
```
nodetool drain flushes memtables to disk and stops the node from accepting writes. Do not leave the drained node in service for normal traffic.

Stop the Cassandra service on the drained node.
```
$ sudo systemctl stop cassandra
```
Use the service manager that owns the node's Cassandra process. Package installs on systemd-based Linux commonly use the cassandra unit.

Upgrade the Cassandra package on the stopped node.
```
$ sudo apt-get install --only-upgrade cassandra
```
Use the package command for the repository already staged on that host, such as sudo dnf upgrade cassandra on RPM-based systems. Keep repository setup and Java changes out of the live drain window when possible.

Merge required settings into the upgraded configuration file.
```
$ sudoedit /etc/cassandra/cassandra.yaml
```
Keep the existing cluster_name, token settings, snitch, seed list, data paths, authentication, TLS, and topology settings unless the upgrade plan explicitly changes them. For a 4.x to 5.0 roll, keep storage_compatibility_mode: CASSANDRA_4 for the first binary upgrade.

Start Cassandra on the upgraded node.
```
$ sudo systemctl start cassandra
```
Check that the service manager sees Cassandra as active.
```
$ sudo systemctl is-active cassandra
active
```
An active service only proves that the process is running. The node still needs to rejoin the ring and accept CQL traffic before the next node is upgraded.
Related: How to check Apache Cassandra service status

Confirm that the upgraded node is up and normal in the ring.

$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.0.0.11  116.2 KiB   16      33.3%             8f4f6e2d-9f74-4f5a-a85f-43df5d4fcb21  rack1
UN  10.0.0.12  112.1 KiB   16      33.3%             14c69f94-47d1-4dc2-ae16-6b232a2cf8dd  rack1
UN  10.0.0.13  117.4 KiB   16      33.3%             d0b8b726-c680-4b42-bb57-3fa50db70e6a  rack1

Wait for the upgraded node to return to UN before moving on to the next node.
Related: How to check Apache Cassandra cluster status with nodetool
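A node can take some time to rejoin after the service starts, so a short polling loop is often more practical than rerunning nodetool status by hand. The sketch below assumes nodetool is on the PATH; node_state and wait_for_un are hypothetical helper names, and the retry count and delay are arbitrary defaults.

```shell
# Sketch: extract the status/state code for one address from `nodetool status` output on stdin.
node_state() {
  awk -v addr="$1" '$2 == addr { print $1; found = 1 }
                    END { if (!found) print "MISSING" }'
}

# Poll until the address reports UN, or give up after the given number of attempts.
# Arguments: address, attempts (default 30), delay in seconds (default 10).
wait_for_un() {
  local addr="$1" tries="${2:-30}" delay="${3:-10}" i=0
  while [ "$i" -lt "$tries" ]; do
    [ "$(nodetool status | node_state "$addr")" = "UN" ] && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage:
#   wait_for_un 10.0.0.11 || echo "node did not return to UN, check the logs"
```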

Check the Cassandra release through the running node.
```
$ cqlsh -e "SELECT release_version FROM system.local;"

 release_version
-----------------
           5.0.8

(1 rows)
```
If authentication or client TLS is enabled, run the same check with the operator cqlshrc, credentials file, and TLS options used for normal administration.
Related: How to connect to Apache Cassandra with cqlsh
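The version check can also be asserted in a script instead of read by eye. This is a minimal sketch that parses the tabular cqlsh output shown above; parse_release_version and check_release are hypothetical helper names, and the expected version is whatever the upgrade plan targets.

```shell
# Sketch: pull the release_version value out of cqlsh's tabular output on stdin.
parse_release_version() {
  # The value row is the first line that starts (after padding) with digits, a dot, and digits.
  awk '/^[ \t]*[0-9]+\.[0-9]+/ { print $1; exit }'
}

# Compare the parsed version with the expected target and fail loudly on a mismatch.
check_release() {
  local expected="$1" actual
  actual="$(parse_release_version)"
  [ "$actual" = "$expected" ] || {
    echo "expected $expected, got ${actual:-nothing}" >&2
    return 1
  }
}

# Usage:
#   cqlsh -e "SELECT release_version FROM system.local;" | check_release 5.0.8
```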

Run the application's read-only smoke test against the upgraded node or the load-balanced service.
```
$ cqlsh 10.0.0.11 9042 -e "SELECT event_id FROM app_data.events WHERE event_id = 1001;"

 event_id
----------
     1001

(1 rows)
```
Replace the keyspace, table, key column, and value with a narrow query that proves the upgraded node can serve the application's normal read path without scanning a large table.

Repeat the drain, package upgrade, restart, and verification sequence on the remaining nodes.

Do not enable newer storage compatibility modes, run cluster-wide SSTable rewrites, or remove the rollback path until every node is stable on the target Cassandra release.

Rewrite older SSTables after the upgraded cluster has passed the rollback window.
```
$ nodetool upgradesstables --jobs 2 app_data events
```
nodetool upgradesstables rewrites SSTables that are not on the current version. Run it node by node during a window with spare I/O capacity, and omit the keyspace or table only when the whole local data set is the intended scope.

Check compaction pressure after the SSTable rewrite command.

$ nodetool compactionstats -H
concurrent compactors            2
pending tasks                    0
compactions completed            1
data compacted                   841 bytes
compactions aborted              0
compactions reduced              0
sstables dropped from compaction 0
15 minute rate                   0.05/minute
mean rate                        15.00/hour
compaction throughput (MiB/s)    64.0

Pending tasks should return to the expected level before the maintenance window is closed. Investigate Cassandra logs if compactions stall or the node leaves UN.
Related: How to view Apache Cassandra logs
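The pending-task count can be extracted for a scripted check. The sketch below assumes the compactionstats layout shown above, where the line begins with "pending tasks" and ends with the count; pending_tasks and compactions_idle are hypothetical helper names.

```shell
# Sketch: print the pending compaction task count from `nodetool compactionstats` output on stdin.
pending_tasks() {
  # The count is the last field on the "pending tasks" line.
  awk '/^pending tasks/ { print $NF; exit }'
}

# Succeed only when the pending count is exactly zero.
compactions_idle() {
  [ "$(pending_tasks)" = "0" ]
}

# Usage:
#   nodetool compactionstats -H | compactions_idle || echo "compactions still pending"
```

Zero is used here as the idle threshold only for illustration. A busy cluster may have a non-zero baseline, so compare against the level that is normal for the node.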

Finalize 5.0 storage compatibility only after the binary upgrade is stable on every node.

Use separate rolling restarts for storage_compatibility_mode: UPGRADING and then storage_compatibility_mode: NONE when the release plan says rollback to 4.x is no longer required.
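The mode progression can be made explicit so that a restart never skips a stage. This is a sketch only: current_compat_mode and next_compat_mode are hypothetical helper names, and the configuration path in the usage line assumes the package layout used earlier in this guide.

```shell
# Sketch: read the active storage_compatibility_mode from a cassandra.yaml file.
# Commented-out lines are ignored because their first field is not the bare key.
current_compat_mode() {
  awk '$1 == "storage_compatibility_mode:" { print $2; exit }' "$1"
}

# Print the mode that follows the given one in the 4.x to 5.0 sequence.
next_compat_mode() {
  case "$1" in
    CASSANDRA_4) echo UPGRADING ;;
    UPGRADING)   echo NONE ;;
    NONE)        echo NONE ;;   # already final
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   next_compat_mode "$(current_compat_mode /etc/cassandra/cassandra.yaml)"
```

Each step in the sequence still needs its own rolling restart across every node before the next mode is set.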

Author: Mohd Shakir Zakaria
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.