How to run a Pacemaker failover test with PCS

Failover testing proves that a Pacemaker cluster can keep services available when a node is drained for maintenance or becomes unreachable. Controlled failovers expose missing dependencies, fragile constraints, and slow resource start times before a real outage forces an unplanned cutover.

The pcs CLI edits the cluster configuration and drives state changes such as putting a node into standby. When a node enters standby, Pacemaker recalculates placement, stops resources on that node, and starts them on other eligible nodes, while Corosync continues to maintain membership and quorum.

Node-level failovers can interrupt active sessions and can trigger fencing if quorum is lost or the cluster decides a node is unsafe. Run the test during a maintenance window, confirm remaining nodes can carry the workload, and prefer a single-resource move when only one service needs to be exercised.
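For the single-resource case, a minimal sketch might look like the following. It assumes the web-stack group and a node-02 target as used in the steps below. The run, move_resource, and clear_move_constraint helpers and the DRY_RUN variable are hypothetical names for illustration, not part of pcs. Depending on the pcs version, pcs resource move may leave a location constraint behind, which pcs resource clear removes once the move has been verified.

```shell
# DRY_RUN=1 (the default in this sketch) only prints the commands.
# Set DRY_RUN=0 to execute them on a cluster node.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Ask Pacemaker to move one resource (or group) to a chosen node.
move_resource() {
    run sudo pcs resource move "$1" "$2"
}

# Remove any location constraint left behind by the move.
# Run this only after confirming the resource started on the target.
clear_move_constraint() {
    run sudo pcs resource clear "$1"
}

# Example (assumed names):
#   move_resource web-stack node-02
#   clear_move_constraint web-stack
```

Keeping the move and the clear as separate calls leaves room to check placement with pcs status in between.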

Steps to run a Pacemaker failover test with PCS:

Confirm the cluster has quorum with no failed actions.

```
$ sudo pcs status
Cluster name: clustername
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: node-01 (version 2.1.6-6fdc9deea29) - partition with quorum
  * 3 nodes configured
##### snipped #####
```

Look for "partition with quorum" on the Current DC line and confirm that no failed actions are listed before proceeding.
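The same check can be scripted before an automated test run. The sketch below is an assumption-laden helper, not part of pcs: preflight_check is a hypothetical function that reads pcs status text on standard input, and the "Failed ... Actions:" heading it searches for may be worded differently in other Pacemaker versions.

```shell
# Hypothetical helper: reads `pcs status` output on stdin and prints
# "ok", "no-quorum", or "failed-actions". Returns non-zero unless "ok".
preflight_check() {
    pf_status=$(cat)

    # The Current DC line ends with "partition with quorum" when quorum is held.
    if ! printf '%s\n' "$pf_status" | grep -q 'partition with quorum'; then
        echo "no-quorum"
        return 1
    fi

    # Assumption: failures are listed under a heading such as
    # "Failed Resource Actions:" or "Failed Actions:".
    if printf '%s\n' "$pf_status" | grep -Eq '^Failed ([A-Za-z]+ )?Actions:'; then
        echo "failed-actions"
        return 1
    fi

    echo "ok"
}

# Usage on a cluster node:
#   sudo pcs status | preflight_check
```

A non-zero return makes it easy to stop a test script before any node is placed in standby.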

List resources to record the current placement of the target service.

```
$ sudo pcs status resources
  * Resource Group: web-stack:
    * cluster_ip (ocf:heartbeat:IPaddr2): Started node-01
    * web-service (systemd:nginx): Started node-01
```
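To keep a record for later comparison, the placement can be reduced to one "resource node" pair per line. This is a rough sketch: placement is a hypothetical helper, and it only understands simple "Started <node>" lines like the ones shown above, not clone or promotable resource output.

```shell
# Hypothetical helper: reads `pcs status resources` output on stdin and
# prints "<resource> <node>" for each line that reports "Started <node>".
placement() {
    awk 'NF >= 4 && $1 == "*" && $(NF-1) == "Started" { print $2, $NF }'
}

# Usage on a cluster node (before.txt is an arbitrary file name):
#   sudo pcs status resources | placement > before.txt
```

Running the same command after the test and comparing the two files with diff shows which resources changed nodes.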

Put the hosting node into standby to drain its resources.
```
$ sudo pcs node standby node-01
```
Standby stops resources on the drained node and starts them on other nodes, which can drop active sessions. If quorum is lost during the test, resources can stop or fencing can be triggered, depending on cluster policy.

Verify that resources relocated off the standby node, and record the elapsed relocation time if the exercise is being compared with a recovery time objective (RTO).

```
$ sudo pcs status
Cluster name: clustername
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: node-01 (version 2.1.6-6fdc9deea29) - partition with quorum
##### snipped #####
Node List:
  * Node node-01: standby
  * Online: [ node-02 node-03 ]

Full List of Resources:
  * Resource Group: web-stack:
    * cluster_ip (ocf:heartbeat:IPaddr2): Started node-02
    * web-service (systemd:nginx): Started node-02
```
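One way to capture an approximate relocation time is to poll the resource list until the resource reports Started on a node other than the one placed in standby. The sketch below is illustrative only: wait_for_relocation and the STATUS_CMD override are hypothetical names, the default 120-second timeout is arbitrary, and the one-second polling interval limits the precision of the result.

```shell
# Command used to fetch resource status. STATUS_CMD is a hypothetical
# override so the loop can be exercised without a live cluster.
STATUS_CMD=${STATUS_CMD:-"sudo pcs status resources"}

# wait_for_relocation <resource> <standby-node> [timeout-seconds]
# Prints the elapsed seconds once <resource> is Started on another node.
wait_for_relocation() {
    wr_resource=$1
    wr_avoid=$2
    wr_timeout=${3:-120}
    wr_start=$(date +%s)

    while :; do
        # Extract the node from a line such as:
        #   * cluster_ip (ocf:heartbeat:IPaddr2): Started node-02
        wr_node=$($STATUS_CMD | awk -v r="$wr_resource" \
            'NF >= 4 && $1 == "*" && $2 == r && $(NF-1) == "Started" { print $NF }')
        wr_elapsed=$(( $(date +%s) - wr_start ))

        if [ -n "$wr_node" ] && [ "$wr_node" != "$wr_avoid" ]; then
            echo "$wr_resource started on $wr_node after ${wr_elapsed}s"
            return 0
        fi
        if [ "$wr_elapsed" -ge "$wr_timeout" ]; then
            echo "$wr_resource not relocated within ${wr_timeout}s" >&2
            return 1
        fi
        sleep 1
    done
}

# Usage on a cluster node, started right after the standby command:
#   wait_for_relocation cluster_ip node-01
```

The timer starts when the function is called, so the reported figure excludes any time spent before that point and should be treated as an estimate rather than a precise measurement.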

Return the node to active service.
```
$ sudo pcs node unstandby node-01
```

Confirm the cluster is healthy at the end of the failover test.

```
$ sudo pcs status
##### snipped #####
Node List:
  * Online: [ node-01 node-02 node-03 ]
Full List of Resources:
  * Resource Group: web-stack:
    * cluster_ip (ocf:heartbeat:IPaddr2): Started node-02
    * web-service (systemd:nginx): Started node-02
```

Resources may remain on the new node after unstandby due to stickiness and placement rules. If failures appear, clear them with pcs resource cleanup <resource> before re-testing.
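To spot resources that need attention at the end of the test, the resource list can be filtered for anything not reported as Started. This is a sketch under assumptions: not_started is a hypothetical helper, and it only recognizes simple resource lines of the form shown in this guide.

```shell
# Hypothetical helper: reads `pcs status resources` output on stdin and
# prints "<resource> <state>" for resource lines whose state is not Started.
not_started() {
    awk 'NF >= 4 && $1 == "*" && $3 ~ /^\(.*\):$/ && $4 != "Started" { print $2, $4 }'
}

# Usage on a cluster node:
#   sudo pcs status resources | not_started
# Then, for each resource listed and once the cause is understood:
#   sudo pcs resource cleanup <resource>
```

Empty output means every resource line that the filter recognized was reported as Started.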

Author: Mohd Shakir Zakaria
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.