Maintenance windows prevent automated recovery from fighting planned changes such as patching, reboots, and service restarts. In a Pacemaker cluster, that reduces surprise failovers and avoids recovery loops while work is in progress.
Clusters managed with pcs rely on Pacemaker to start, stop, monitor, and relocate resources based on node health and configured constraints. Setting the maintenance-mode cluster property pauses resource start/stop activity and stops recurring monitors for affected resources, while placing a node into standby removes it from scheduling and moves its resources to other eligible nodes.
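Before changing anything, it can help to confirm the current value of these controls. The listing subcommand varies slightly between pcs releases; recent versions use pcs property config, while older releases use pcs property list.
$ sudo pcs property config
$ sudo pcs status nodes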
Cluster-wide maintenance mode removes automated recovery until it is cleared, so an outage during the window can go unresolved by the cluster. Maintenance mode also overrides other administrative modes such as standby, so when resources need to migrate off a node first, finish draining that node before enabling maintenance-mode. Maintenance mode disables Pacemaker resource management but does not inherently disable the membership and fencing layers, so interconnect or shared-storage work can still trigger fencing in some environments.
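Because fencing can stay active, review the configured fence devices before touching the cluster interconnect or shared storage. On recent pcs releases the commands look like this; older releases use pcs stonith show instead of pcs stonith config.
$ sudo pcs stonith status
$ sudo pcs stonith config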
Steps to run a Pacemaker maintenance window with pcs:
- Confirm cluster status before the maintenance window.
$ sudo pcs status
Cluster name: clustername
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: node-03 (version 2.1.6-6fdc9deea29) - partition with quorum
  * Last updated: Wed Dec 31 11:02:24 2025 on node-01
  * Last change:  Wed Dec 31 09:50:24 2025 by root via cibadmin on node-01
  * 3 nodes configured
  * 7 resource instances configured

Node List:
  * Online: [ node-01 node-02 node-03 ]

Full List of Resources:
  * Resource Group: web-stack:
    * cluster_ip        (ocf:heartbeat:IPaddr2):        Started node-02
    * web-service       (systemd:nginx):                Started node-02
  * Clone Set: dummy-check-clone [dummy-check]:
    * Started: [ node-01 node-02 node-03 ]
##### snipped #####
- Place the target node into standby.
$ sudo pcs node standby node-01
- Verify cluster resources migrated off the standby node.
$ sudo pcs status
##### snipped #####
Node List:
  * Node node-01: standby
  * Online: [ node-02 node-03 ]

Full List of Resources:
  * Resource Group: web-stack:
    * cluster_ip        (ocf:heartbeat:IPaddr2):        Started node-02
    * web-service       (systemd:nginx):                Started node-02
  * Clone Set: dummy-check-clone [dummy-check]:
    * Started: [ node-02 node-03 ]
    * Stopped: [ node-01 ]
  * Clone Set: stateful-demo-clone [stateful-demo] (promotable):
    * Promoted: [ node-03 ]
    * Unpromoted: [ node-02 ]
##### snipped #####
A resource can stop instead of moving when no other node is eligible to run it.
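If a resource shows as Stopped instead of relocating, reviewing the constraints and the resource configuration usually explains why no other node is eligible; web-stack here is the example group from this cluster, and older pcs releases spell the second command pcs resource show --full.
$ sudo pcs constraint
$ sudo pcs resource config web-stack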
- Enable the maintenance-mode cluster property.
$ sudo pcs property set maintenance-mode=true
Maintenance mode pauses automated start/stop and monitoring, so failures are not automatically recovered until the property is cleared.
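If only one node's resources need to be left unmanaged, pcs also supports node-scoped maintenance as a narrower alternative to the cluster-wide property; resources on the other nodes stay fully managed.
$ sudo pcs node maintenance node-01
$ sudo pcs node unmaintenance node-01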
- Confirm the cluster is in maintenance mode.
$ sudo pcs status
##### snipped #####
              *** Resource management is DISABLED ***
  The cluster will not attempt to start, stop or recover services

Full List of Resources:
  * Resource Group: web-stack (maintenance):
    * cluster_ip        (ocf:heartbeat:IPaddr2):        Started node-02 (maintenance)
    * web-service       (systemd:nginx):                Started node-02 (maintenance)
  * Clone Set: dummy-check-clone [dummy-check] (maintenance):
    * dummy-check       (ocf:pacemaker:Dummy):          Started node-03 (maintenance)
    * dummy-check       (ocf:pacemaker:Dummy):          Started node-02 (maintenance)
##### snipped #####
- Apply the maintenance changes on the standby node.
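The work itself depends on the change being made; purely as an illustration, on a dnf-based distribution patching and rebooting the standby node might look like this.
$ sudo dnf update -y
$ sudo reboot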
- Disable the maintenance-mode cluster property after maintenance is complete.
$ sudo pcs property set maintenance-mode=false
Pacemaker resumes monitoring and recovery actions after maintenance mode is disabled.
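If any of the managed services failed or were restarted by hand while the cluster was hands-off, clearing the recorded operation history prompts Pacemaker to re-probe their current state; with no arguments the cleanup applies to all resources.
$ sudo pcs resource cleanup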
- Return the node to active service.
$ sudo pcs node unstandby node-01
Resources do not necessarily move back to the node after unstandby, depending on constraints and preferred locations.
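To move a resource back deliberately, add a temporary move constraint and clear it once the resource is running where intended; web-stack is the example group from this cluster, and recent pcs releases may remove the temporary constraint automatically after the move completes.
$ sudo pcs resource move web-stack node-01
$ sudo pcs resource clear web-stack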
- Verify the cluster returned to normal operation.
$ sudo pcs status
Cluster name: clustername
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: node-03 (version 2.1.6-6fdc9deea29) - partition with quorum
  * Last updated: Wed Dec 31 11:02:37 2025 on node-01
  * Last change:  Wed Dec 31 11:02:32 2025 by root via cibadmin on node-01
  * 3 nodes configured
  * 7 resource instances configured

Node List:
  * Online: [ node-01 node-02 node-03 ]

Full List of Resources:
  * Resource Group: web-stack:
    * cluster_ip        (ocf:heartbeat:IPaddr2):        Started node-02
    * web-service       (systemd:nginx):                Started node-02
  * Clone Set: dummy-check-clone [dummy-check]:
    * Started: [ node-01 node-02 node-03 ]
##### snipped #####
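As a final check, confirm that no resource accumulated failures during the window.
$ sudo pcs resource failcount show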
