Keeping a single Redis cache endpoint available during node failures prevents application outages and avoids reconfiguring clients after maintenance or unexpected reboots.

A Pacemaker and Corosync cluster managed with pcs models each managed service as a cluster resource that can be started, stopped, and moved between nodes. The cache endpoint is typically modeled as a floating IP address (VIP) plus a systemd resource for Redis, grouped so that the VIP comes up first and the service starts on the same node.

This setup is active-passive and does not provide automatic data replication by itself. Persistence (RDB or AOF) and any replication strategy must be configured separately, and the Redis listener (bind/protected-mode), firewall rules, and SELinux policy must allow client connections to the VIP on port 6379.
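On a firewalld-based distribution, those prerequisites might look like the following sketch; the config file path and the decision to relax protected-mode are assumptions to adapt locally:

```shell
# Open the Redis port for clients (firewalld example).
sudo firewall-cmd --permanent --add-port=6379/tcp
sudo firewall-cmd --reload

# In the Redis config (e.g. /etc/redis/redis.conf; the path varies by
# distribution), listen on all addresses so the same config works on
# whichever node currently holds the VIP:
#   bind 0.0.0.0
#   protected-mode no    # only acceptable with requirepass/ACLs in place
```

Binding to 0.0.0.0 avoids having to rewrite the bind address during failover, at the cost of exposing the listener on every interface, so authentication and firewalling matter.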

Steps to set up Redis high availability with PCS:

  1. Confirm the cluster is online with quorum.
    $ sudo pcs status
    Cluster name: clustername
    Cluster Summary:
      * Stack: corosync (Pacemaker is running)
      * Current DC: node-01 (version 2.1.6-6fdc9deea29) - partition with quorum
      * 3 nodes configured
      * 0 resource instances configured
    ##### snipped #####
  2. Identify the Redis service unit name on all nodes.
    $ systemctl list-unit-files --type=service | grep -E '^(redis|redis-server)\.service'
    redis-server.service                         enabled         enabled
    redis.service                                alias           -
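    The lookup above can be wrapped in a small helper so that later commands reuse one detected name instead of hard-coding it. This is a hypothetical sketch: the function only parses `systemctl list-unit-files` output supplied on stdin, shown here with the sample output from above.

```shell
# Hypothetical helper: read `systemctl list-unit-files --type=service`
# output on stdin and print the bare Redis unit name (without .service),
# skipping alias entries.
detect_redis_unit() {
  awk '$1 ~ /^(redis|redis-server)\.service$/ && $2 != "alias" {
         sub(/\.service$/, "", $1); print $1; exit
       }'
}

# With the sample output shown above:
printf '%s\n' \
  'redis-server.service enabled enabled' \
  'redis.service        alias   -' \
  | detect_redis_unit
# prints: redis-server
```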
  3. Disable the detected Redis service unit on all nodes so it does not start outside cluster control.
    $ sudo systemctl disable --now redis-server
    Synchronizing state of redis-server.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
    Executing: /usr/lib/systemd/systemd-sysv-install disable redis-server
    Removed "/etc/systemd/system/multi-user.target.wants/redis-server.service".
    Removed "/etc/systemd/system/redis.service".
    ##### snipped #####

    The --now flag stops the service immediately.

    Leaving the unit enabled starts an independent Redis instance on every node at boot, allowing writes that bypass the VIP and diverge from the instance the cluster manages.
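    Step 3 must run on every node; from one admin host it can be scripted as below. This is a sketch: the hostnames (node-03 in particular) and passwordless ssh with sudo rights are assumptions.

```shell
# Disable and stop the detected Redis unit on each cluster node.
for node in node-01 node-02 node-03; do
  ssh "$node" sudo systemctl disable --now redis-server
done
```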

  4. Create a floating IP resource for the cache endpoint.
    $ sudo pcs resource create redis_ip ocf:heartbeat:IPaddr2 ip=192.0.2.63 cidr_netmask=24 op monitor interval=30s

    Using an IP already assigned on the network causes address conflicts and intermittent outages.
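    One way to check for a conflict before creating the resource is sketched below; 192.0.2.63 matches the example address, and silence is only a strong hint that the address is free, not a guarantee.

```shell
# A reply from the candidate VIP means something already owns it.
if ping -c 2 -W 1 192.0.2.63 >/dev/null 2>&1; then
  echo "in use: choose another address"
else
  echo "no reply: address appears free"
fi
```

On the same subnet, `arping -D` (duplicate address detection) gives a more reliable layer-2 answer when the iputils arping tool is available.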

  5. Create the Redis service resource using the detected unit name.
    $ sudo pcs resource create redis_service systemd:redis-server op monitor interval=30s

    Use systemd:redis instead when step 2 detected the redis unit rather than redis-server.

  6. Group the VIP and service resources so the VIP starts before Redis.
    $ sudo pcs resource group add redis-stack redis_ip redis_service

    The resource group keeps the VIP and Redis on the same node during failover.
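    A group implies both colocation and ordering; the same intent can be written as explicit constraints instead, sketched here with the resource names created above (constraint syntax varies slightly across pcs releases):

```shell
# Keep the service on the node that holds the VIP...
sudo pcs constraint colocation add redis_service with redis_ip INFINITY
# ...and start the VIP before the service.
sudo pcs constraint order start redis_ip then start redis_service
```

The group form is simpler to manage as a unit (move, clear, status), which is why this guide uses it.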

  7. Verify the resource group placement.
    $ sudo pcs status resources
      * Resource Group: redis-stack:
        * redis_ip	(ocf:heartbeat:IPaddr2):	 Started node-01
        * redis_service	(systemd:redis-server):	 Started node-01
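    Colocation can also be checked mechanically. This hypothetical helper reads `pcs status resources` output on stdin and succeeds only when both group members report the same node:

```shell
# Succeed (exit 0) when the redis_ip and redis_service lines end with the
# same node name; the node is the last field of each "Started" line.
same_node() {
  count=$(awk '/redis_ip|redis_service/ { print $NF }' | sort -u | wc -l)
  [ "$count" -eq 1 ]
}

# With the output captured above:
printf '%s\n' \
  '* redis_ip (ocf:heartbeat:IPaddr2): Started node-01' \
  '* redis_service (systemd:redis-server): Started node-01' \
  | same_node && echo "colocated"
# prints: colocated
```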
  8. Confirm Redis is reachable through the VIP.
    $ redis-cli -h 192.0.2.63 ping
    PONG

    Run this from an application host, or from the active node if redis-cli is installed there.

  9. Move the resource group to another node to validate failover.
    $ sudo pcs resource move redis-stack node-02

    The move command creates a temporary location constraint until cleared.

  10. Verify the VIP and service are started on the target node after the move.
    $ sudo pcs status resources
      * Resource Group: redis-stack:
        * redis_ip	(ocf:heartbeat:IPaddr2):	 Started node-02
        * redis_service	(systemd:redis-server):	 Started node-02
  11. Clear the temporary move constraint to restore normal placement decisions.
    $ sudo pcs resource clear redis-stack

    Forgetting to clear the constraint can pin the resource group to a single node and interfere with automatic recovery.
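To confirm the temporary constraint is actually gone after clearing, list the location constraints (a sketch; the exact subcommand spelling varies between pcs releases):

```shell
# Look for leftover cli-prefer/cli-ban entries created by `pcs resource move`.
sudo pcs constraint location
```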