Active-active high availability runs the same service on multiple nodes so capacity scales and a single node failure becomes reduced capacity instead of a full outage. It reduces failover impact and makes maintenance less disruptive because traffic can stay on surviving instances. The trade-off is that concurrency bugs and split-brain are extremely efficient at ruining a day.
In a Pacemaker cluster managed through pcs, active-active is usually modeled as a cloned resource so the scheduler can start and monitor parallel instances across nodes. For replicated state, a promotable resource adds roles (Promoted/Unpromoted) so the cluster can enforce writer/reader behavior while still handling failover. Constraints, resource meta attributes, and monitoring operations determine where instances run and how the cluster reacts to faults.
Traffic distribution, client affinity, and data consistency are not solved by Pacemaker alone. A single floating VIP is inherently active-passive, so active-active requires multiple front-end endpoints (node IPs behind a load balancer, anycast, or a DNS strategy), and any shared state (files, databases, caches) must be protected with shared storage, replication, or an application-native clustering model. Cluster safety features such as quorum and STONITH fencing are critical for preventing multiple writers after a partition.
Related: How to create a Pacemaker cluster
Active-active high availability checklist for Pacemaker:
- Confirm the cluster is healthy with quorum before changing resource behavior.
$ pcs status --full
Cluster name: clustername
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: node-01 (1) (version 2.1.6-6fdc9deea29) - partition with quorum
  * Last updated: Wed Dec 31 23:17:25 2025 on node-01
  * Last change:  Wed Dec 31 12:07:51 2025 by root via cibadmin on node-01
  * 3 nodes configured
  * 8 resource instances configured
##### snipped #####
Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
Active-active changes without quorum and fencing can create split-brain and corrupt shared state.
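As a quick extra check, quorum and fencing can be inspected directly before making changes; device and node names will differ per cluster.
$ pcs quorum status
$ pcs stonith status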
- List services that are safe to run on multiple nodes at the same time.
Stateless frontends (HTTP APIs, reverse proxies, queue consumers) are typical active-active candidates.
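One way to build this inventory is to dump what the cluster already manages and review each resource against the criteria above (syntax for recent pcs releases; older releases use pcs resource show --full).
$ pcs resource config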
- Map each candidate service to its state requirements (storage, sessions, locks, leader election) before enabling parallel starts.
Multiple instances writing to the same data without a concurrency model is a common cause of divergent state and corruption.
- Plan how client traffic will be distributed across nodes.
Health checks should represent real readiness, and affinity should be planned when session state cannot be shared.
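A minimal readiness sketch, assuming each node exposes a hypothetical /healthz endpoint over HTTP; substitute whatever URL the load balancer will actually poll.
$ for node in node-01 node-02 node-03; do curl -fsS -o /dev/null -w "$node: %{http_code}\n" "http://$node/healthz"; done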
- Choose between cloned resources and promotable resources for each service.
Cloned resources fit parallel stateless instances, while promotable resources fit replicated services that still require controlled writer roles.
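A sketch of both shapes, assuming an Apache front end and a DRBD resource named appdata; agents, parameters, and names are illustrative and should be adjusted to the actual service. The Promoted/Unpromoted role names assume Pacemaker 2.1 or later.
$ pcs resource create web-frontend ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf op monitor interval=30s clone
$ pcs resource create app-data ocf:linbit:drbd drbd_resource=appdata op monitor interval=29s role=Promoted monitor interval=31s role=Unpromoted promotable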
- Define clone limits and scheduler behavior for parallel services.
Common guardrails include clone-node-max=1 to prevent double-starts on one node and clone-max to cap total instances.
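For example, limits can be applied as meta attributes on an existing clone; web-frontend-clone is the default clone id pcs would give the sketch above and is an assumption here.
$ pcs resource meta web-frontend-clone clone-max=2 clone-node-max=1
An existing primitive can also be cloned with the same limits in one step using pcs resource clone.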
- Tune monitoring and recovery defaults so failures are handled consistently.
Monitor intervals and timeouts should match application startup time and load balancer failover timing to reduce flapping.
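A sketch of cluster-wide defaults plus a per-resource monitor override; the update keyword assumes a recent pcs release, and older releases accept the options directly after defaults. migration-threshold moves a repeatedly failing instance off a node instead of restarting it forever.
$ pcs resource op defaults update timeout=60s
$ pcs resource defaults update migration-threshold=3 failure-timeout=300s
$ pcs resource update web-frontend op monitor interval=10s timeout=30s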
- Put a node into standby to simulate a controlled removal during maintenance.
$ pcs node standby node-02
Return the node after testing with pcs node unstandby node-02.
- Confirm resources relocate away from the standby node.
$ pcs status --full
Cluster name: clustername
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: node-01 (1) (version 2.1.6-6fdc9deea29) - partition with quorum
##### snipped #####
Node List:
  * Node node-01 (1): online, feature set 3.17.4
  * Node node-02 (2): standby, feature set 3.17.4
  * Node node-03 (3): online, feature set 3.17.4
##### snipped #####
