An Apache Cassandra upgrade is a rolling maintenance job, not just a package replacement. Each node must leave the write path cleanly, start on the target release, rejoin the ring as UN (Up/Normal), and pass a client-facing check before the next node is touched.
The safest flow keeps one node under maintenance at a time. Use nodetool status to confirm the ring before the change, take a named snapshot, drain the local node, stop the service, replace the Cassandra package or binary, start the service, and verify the node from both JMX and CQL.
Major-version upgrades need a rollback boundary before newer on-disk formats are introduced. For a 4.x to 5.0 upgrade, keep storage_compatibility_mode at CASSANDRA_4 during the first binary roll, then plan separate rolling restarts for UPGRADING and NONE after the cluster is stable on the new release.
Steps to upgrade Apache Cassandra node by node:
- Confirm that the target release, Java runtime, driver versions, and package repository are staged before the maintenance window.
Apache Cassandra 5.0 binary releases run on Java 11 or Java 17. Check application drivers and release notes before the first production node is drained.
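The runtime requirement above can be checked before the window opens. The following is a minimal sketch, not a definitive check: java_major and java_ok_for_cassandra_5 are hypothetical helper names, and the accepted majors (11 and 17) are taken from the note above.

```shell
# Hypothetical helper: print the major Java version for a version string.
# Handles the legacy "1.8.0_392" scheme and the modern "17.0.9" scheme.
java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;          # legacy scheme: 1.8.0_392 -> 8.0_392
  esac
  printf '%s\n' "${v%%[!0-9]*}"  # keep only the leading digits
}

# Succeeds only for the runtimes named above for Cassandra 5.0 binaries.
java_ok_for_cassandra_5() {
  case "$(java_major "$1")" in
    11|17) return 0 ;;
    *)     return 1 ;;
  esac
}
```

On a host, the version string would typically come from the quoted field in the output of java -version, for example java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}'.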
- Check the cluster from the node that will be upgraded.
$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.0.0.11  114.7 KiB  16      33.3%             8f4f6e2d-9f74-4f5a-a85f-43df5d4fcb21  rack1
UN  10.0.0.12  112.1 KiB  16      33.3%             14c69f94-47d1-4dc2-ae16-6b232a2cf8dd  rack1
UN  10.0.0.13  117.4 KiB  16      33.3%             d0b8b726-c680-4b42-bb57-3fa50db70e6a  rack1
Delay the upgrade if any expected serving node is down, joining, leaving, moving, overloaded, or already under repair or heavy compaction pressure.
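The ring check can be turned into a pre-flight gate. This is a rough sketch that assumes the two-letter status/state code is the first column of each node line in nodetool status output; ring_not_un and ring_all_un are hypothetical names. It only covers the up/down and state conditions, so load, repair, and compaction pressure still need their own checks.

```shell
# Print "<code> <address>" for every node line in nodetool status output
# (read on stdin) whose status/state code is not UN.
ring_not_un() {
  awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { print $1, $2 }'
}

# Succeed only when at least one node line was seen and all of them are UN.
ring_all_un() {
  awk '$1 ~ /^[UD][NLJM]$/ { seen++; if ($1 != "UN") bad++ }
       END { exit !(seen > 0 && bad == 0) }'
}
```

A possible use is nodetool status | ring_all_un || echo "ring not healthy, delaying upgrade".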
- Take a named snapshot on the node being upgraded.
$ nodetool snapshot --tag before-5-0-upgrade
Requested creating snapshot(s) for [all keyspaces] with snapshot name [before-5-0-upgrade] and options {skipFlush=false}
Snapshot directory: before-5-0-upgrade
Copy or catalog the snapshot according to the rollback plan before the live data files are compacted or rewritten.
- Drain the local node immediately before stopping Cassandra.
$ nodetool drain
nodetool drain flushes memtables to disk and stops the node from accepting writes. Do not leave a drained node in service for normal traffic.
- Stop the Cassandra service on the drained node.
$ sudo systemctl stop cassandra
Use the service manager that owns the node's Cassandra process. Package installs on systemd-based Linux commonly use the cassandra unit.
- Upgrade the Cassandra package on the stopped node.
$ sudo apt-get install --only-upgrade cassandra
Use the package command for the repository already staged on that host, such as sudo dnf upgrade cassandra on RPM-based systems. Keep repository setup and Java changes out of the live drain window when possible.
- Merge required settings into the upgraded configuration file.
$ sudoedit /etc/cassandra/cassandra.yaml
Keep the existing cluster_name, token settings, snitch, seed list, data paths, authentication, TLS, and topology settings unless the upgrade plan explicitly changes them. For a 4.x to 5.0 roll, keep storage_compatibility_mode: CASSANDRA_4 for the first binary upgrade.
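After the merge, it can help to confirm which compatibility mode the file will actually apply. A minimal sketch, assuming the setting appears as an uncommented top-level key; storage_mode is a hypothetical helper name.

```shell
# Print the active (uncommented, top-level) storage_compatibility_mode value
# from a cassandra.yaml read on stdin. Prints nothing if the key is absent.
storage_mode() {
  sed -n 's/^storage_compatibility_mode:[[:space:]]*\([A-Za-z0-9_]*\).*/\1/p' |
    head -n 1
}
```

For the first binary roll from 4.x, storage_mode < /etc/cassandra/cassandra.yaml would be expected to print CASSANDRA_4. Empty output means the key is commented out or missing, in which case the release default applies and should be checked against the upgrade plan.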
- Start Cassandra on the upgraded node.
$ sudo systemctl start cassandra
- Check that the service manager sees Cassandra as active.
$ sudo systemctl is-active cassandra
active
An active service only proves that the process is running. The node still needs to rejoin the ring and accept CQL traffic before the next node is upgraded.
Related: How to check Apache Cassandra service status
- Confirm that the upgraded node is up and normal in the ring.
$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.0.0.11  116.2 KiB  16      33.3%             8f4f6e2d-9f74-4f5a-a85f-43df5d4fcb21  rack1
UN  10.0.0.12  112.1 KiB  16      33.3%             14c69f94-47d1-4dc2-ae16-6b232a2cf8dd  rack1
UN  10.0.0.13  117.4 KiB  16      33.3%             d0b8b726-c680-4b42-bb57-3fa50db70e6a  rack1
The upgraded node should return to UN before moving to another node.
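A restarted node usually needs some time before it reports UN, so a bounded polling loop is safer than a single check. The sketch below is one possible shape, not a definitive implementation: node_state and wait_for_un are hypothetical names, and the status command is passed in as arguments so the loop can be exercised without a live cluster.

```shell
# Print the status/state code (for example UN or DN) that nodetool status
# output, read on stdin, reports for the given address.
node_state() {
  awk -v addr="$1" '$1 ~ /^[UD][NLJM]$/ && $2 == addr { print $1; exit }'
}

# Poll a status command until the address reports UN.
# Arguments: address, max attempts, seconds between attempts, then the
# command that prints ring status (normally: nodetool status).
wait_for_un() {
  local addr="$1" attempts="$2" delay="$3"
  shift 3
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$("$@" 2>/dev/null | node_state "$addr")" = "UN" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, wait_for_un 10.0.0.11 30 10 nodetool status would poll for up to roughly five minutes and return non-zero if the node never reaches UN, which is the signal to stop the roll and read the logs.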
Related: How to check Apache Cassandra cluster status with nodetool
- Check the Cassandra release through the running node.
$ cqlsh -e "SELECT release_version FROM system.local;"

 release_version
-----------------
           5.0.8

(1 rows)
If authentication or client TLS is enabled, run the same check with the operator cqlshrc, credentials file, and TLS options used for normal administration.
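When the roll is scripted, the release check can be reduced to a pass/fail test. This is a small sketch that assumes the cqlsh output contains exactly one dotted version number, as in the example above; release_from_cqlsh and release_matches are hypothetical names, and grep -o is assumed to be available (GNU and BSD grep both provide it).

```shell
# Extract the first dotted version number from cqlsh output read on stdin.
release_from_cqlsh() {
  grep -Eo '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Succeed when the reported release equals the expected version or starts
# with it followed by a dot (for example expected 5.0, reported 5.0.8).
release_matches() {
  local expected="$1" actual
  actual="$(release_from_cqlsh)"
  case "$actual" in
    "$expected"|"$expected".*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A possible use is cqlsh -e "SELECT release_version FROM system.local;" | release_matches 5.0, which fails if the node is still serving from the old binaries.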
Related: How to connect to Apache Cassandra with cqlsh
- Run the application read-only smoke test against the upgraded node or load-balanced service.
$ cqlsh 10.0.0.11 9042 -e "SELECT event_id FROM app_data.events WHERE event_id = 1001;"

 event_id
----------
     1001

(1 rows)
Replace the keyspace, table, key column, and value with a narrow query that proves the upgraded node can serve the application's normal read path without scanning a large table.
- Repeat the drain, package upgrade, restart, and verification sequence on the remaining nodes.
Do not enable newer storage compatibility modes, run cluster-wide SSTable rewrites, or remove the rollback path until every node is stable on the target Cassandra release.
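The per-node sequence can be captured in a small wrapper so that every node gets the same commands in the same order. The sketch below is illustrative only: run and upgrade_node are hypothetical names, the commands are the ones shown in the earlier steps for an apt-based host, and the manual configuration merge and the ring and CQL verification are deliberately left outside the function. Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Per-node sequence from the earlier steps, in order.
# Stops at the first command that fails. Argument: snapshot tag.
upgrade_node() {
  local tag="$1"
  run nodetool snapshot --tag "$tag" &&
  run nodetool drain &&
  run sudo systemctl stop cassandra &&
  run sudo apt-get install --only-upgrade cassandra &&
  run sudo systemctl start cassandra &&
  run sudo systemctl is-active cassandra
}
```

Running DRY_RUN=1 upgrade_node before-5-0-upgrade first gives a reviewable command list for the change record. A real run would still pause after the package step if cassandra.yaml needs merging.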
- Rewrite older SSTables after the upgraded cluster has passed the rollback window.
$ nodetool upgradesstables --jobs 2 app_data events
nodetool upgradesstables rewrites SSTables that are not on the current version. Run it node by node during an I/O window, and omit the keyspace or table only when the whole local data set is the intended scope.
- Check compaction pressure after the SSTable rewrite command.
$ nodetool compactionstats -H
concurrent compactors              2
pending tasks                      0
compactions completed              1
data compacted                     841 bytes
compactions aborted                0
compactions reduced                0
sstables dropped from compaction   0
15 minute rate                     0.05/minute
mean rate                          15.00/hour
compaction throughput (MiB/s)      64.0
Pending tasks should return to the expected level before the maintenance window is closed. Investigate Cassandra logs if compactions stall or the node leaves UN.
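The pending-task check can also be scripted. A minimal sketch, assuming the summary line starts with "pending tasks" and ends with the count, with or without a colon after the label; pending_tasks and compactions_settled are hypothetical names.

```shell
# Print the pending task count from nodetool compactionstats output (stdin).
# Accepts both "pending tasks   0" and "pending tasks: 0".
pending_tasks() {
  awk '/^pending tasks/ { print $NF; exit }'
}

# Succeed when the pending count is at or below the given limit (default 0).
compactions_settled() {
  local limit="${1:-0}" pending
  pending="$(pending_tasks)"
  [ -n "$pending" ] && [ "$pending" -le "$limit" ]
}
```

For instance, nodetool compactionstats -H | compactions_settled 2 treats up to two pending tasks as the expected level. The right limit depends on the node's normal workload.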
Related: How to view Apache Cassandra logs
- Finalize 5.0 storage compatibility only after the binary upgrade is stable on every node.
Use separate rolling restarts for storage_compatibility_mode: UPGRADING and then storage_compatibility_mode: NONE when the release plan says rollback to 4.x is no longer required.
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.