Failover testing proves that a Pacemaker cluster can keep services available when a node is drained for maintenance or becomes unreachable. Controlled failovers expose missing dependencies, fragile constraints, and slow resource start times before a real outage forces an unplanned cutover.
The pcs CLI reads and modifies the cluster configuration and drives state changes such as putting a node into standby. When a node enters standby, Pacemaker recalculates resource placement, stops resources on that node, and starts them on other eligible nodes, while Corosync continues to maintain membership and quorum.
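Quorum and membership can also be inspected directly at the votequorum layer; as an optional check before and after the test (output varies with the cluster), something like the following can be run:
$ sudo pcs quorum status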
Node-level failovers can interrupt active sessions and can trigger fencing if quorum is lost or the cluster decides a node is unsafe. Run the test during a maintenance window, confirm remaining nodes can carry the workload, and prefer a single-resource move when only one service needs to be exercised.
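When only a single service needs to be exercised, a resource-level move can stand in for a full node drain; a minimal sketch using placeholder names (substitute your own resource and target node):
$ sudo pcs resource move <resource> <target-node>
$ sudo pcs resource clear <resource>
The clear step removes the temporary location constraint that move creates, so the resource is free to relocate again later.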
Related: How to move a Pacemaker resource
Related: How to check Pacemaker cluster status
Steps to run a Pacemaker failover test with PCS:
- Confirm the cluster has quorum with no failed actions.
$ sudo pcs status
Cluster name: clustername
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: node-01 (version 2.1.6-6fdc9deea29) - partition with quorum
  * 3 nodes configured
##### snipped #####
Look for partition with quorum and no failed actions before proceeding.
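Lingering failures on a specific resource can also be examined and cleared before the test; an optional check, using the same placeholder style as the rest of this article:
$ sudo pcs resource failcount show <resource>
$ sudo pcs resource cleanup <resource>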
- List resources to record the current placement of the target service.
$ sudo pcs status resources
  * Resource Group: web-stack:
    * cluster_ip (ocf:heartbeat:IPaddr2): Started node-01
    * web-service (systemd:nginx): Started node-01
- Put the hosting node into standby to drain its resources.
$ sudo pcs node standby node-01
Standby can restart services on other nodes and drop active sessions; loss of quorum can stop resources or trigger fencing depending on cluster policy.
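Because a badly timed standby can interact with fencing, it can help to confirm the fencing devices are reported healthy before draining the node; a minimal check (device names and output depend on the cluster):
$ sudo pcs stonith status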
- Verify resources relocated off the standby node.
$ sudo pcs status
Cluster name: clustername
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: node-01 (version 2.1.6-6fdc9deea29) - partition with quorum
##### snipped #####
Node List:
  * Node node-01: standby
  * Online: [ node-02 node-03 ]
Full List of Resources:
  * Resource Group: web-stack:
    * cluster_ip (ocf:heartbeat:IPaddr2): Started node-02
    * web-service (systemd:nginx): Started node-02
- Return the node to active service.
$ sudo pcs node unstandby node-01
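Relocation decisions after unstandby are not instantaneous; placement can be watched while the cluster settles, for example with the crm_mon monitor that ships with Pacemaker (it refreshes continuously; press Ctrl+C to exit):
$ sudo crm_mon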
- Confirm the cluster is healthy at the end of the failover test.
$ sudo pcs status
##### snipped #####
Node List:
  * Online: [ node-01 node-02 node-03 ]
Full List of Resources:
  * Resource Group: web-stack:
    * cluster_ip (ocf:heartbeat:IPaddr2): Started node-02
    * web-service (systemd:nginx): Started node-02
Resources may remain on the new node after unstandby due to stickiness and placement rules. If failures appear, clear them with pcs resource cleanup <resource> before re-testing.
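Stickiness explains why the group stays on node-02; the configured resource defaults can be inspected, and the group can be pushed back to node-01 with a temporary constraint that is then removed. A sketch, assuming the resource and node names from this example:
$ sudo pcs resource defaults
$ sudo pcs resource move web-stack node-01
$ sudo pcs resource clear web-stack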
