How to set up NFS server high availability with PCS

High availability for an NFS server keeps shared exports reachable during node reboots, patch cycles, and unexpected failures. Pairing the NFS workload with a floating IP (virtual IP) keeps the client mount target stable while the active node changes.

In a pcs-managed Pacemaker cluster, the NFS stack is modeled as a resource group: an OCF Filesystem resource mounts the shared export volume, a systemd resource starts the NFS daemon, and an OCF IPaddr2 resource assigns the floating IP. Grouping enforces start/stop ordering and colocation so the mount, daemon, and address move together as a unit.

NFS HA is active/passive and depends on shared storage with single-writer semantics for the export directory, such as a SAN LUN or a replicated block device presented to only one node at a time. The same /etc/exports entries must exist on every node, and the export filesystem must not be auto-mounted via /etc/fstab outside cluster control to avoid concurrent mounts and data corruption. Failover briefly interrupts client I/O and may require lock recovery, so short service interruptions should be expected during moves.

Steps to set up NFS server high availability with PCS:

  1. Confirm the cluster is online and has quorum.
    $ sudo pcs status
    Cluster name: clustername
    Cluster Summary:
      * Stack: corosync (Pacemaker is running)
      * Current DC: node-01 (version 2.1.6-6fdc9deea29) - partition with quorum
      * 3 nodes configured
      * 0 resource instances configured
    ##### snipped #####
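
    An optional additional check prints the corosync quorum state directly:
    $ sudo pcs quorum status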
  2. Identify the NFS server service unit name.
    $ systemctl list-unit-files --type=service | grep -E '^(nfs-server|nfs-kernel-server)\.service'
    nfs-kernel-server.service                    alias           -
    nfs-server.service                           disabled        enabled
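
    If the listing leaves it unclear which name is the canonical unit and which is the alias, systemctl can resolve a name to the canonical Id; the command below queries the name used in the remaining steps.
    $ systemctl show --property=Id nfs-kernel-server.service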
  3. Stop and disable the NFS server service on every cluster node.
    $ sudo systemctl disable --now nfs-kernel-server.service
    Synchronizing state of nfs-kernel-server.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
    Executing: /usr/lib/systemd/systemd-sysv-install disable nfs-kernel-server
    ##### snipped #####

    Leaving the NFS service enabled outside cluster control can result in exports running on multiple nodes, which risks filesystem corruption on single-writer storage.
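
    Assuming key-based SSH access, passwordless sudo on the other nodes, and example hostnames node-02 and node-03, the same command can be run remotely instead of logging in to each node (the mkdir in the next step can be distributed the same way):
    $ for node in node-02 node-03; do ssh "$node" sudo systemctl disable --now nfs-kernel-server.service; done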

  4. Create the export mount point directory on every cluster node.
    $ sudo mkdir -p /srv/nfs

    The mount point must exist on every node, even while the export filesystem is not mounted there, so the Filesystem resource can start on whichever node hosts the group.

  5. Create a filesystem resource for the shared export path.
    $ sudo pcs resource create nfs_fs ocf:heartbeat:Filesystem device=/dev/loop10 directory=/srv/nfs fstype=xfs op monitor interval=20s

    Replace /dev/loop10 and /srv/nfs with the shared block device and mount path used by the cluster.

    Do not mount /srv/nfs via /etc/fstab on boot when using a non-cluster filesystem such as xfs or ext4.
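
    After creation, the resource definition can be reviewed to confirm the device, directory, and fstype options were recorded as intended:
    $ sudo pcs resource config nfs_fs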

  6. Confirm the same /etc/exports entries exist on every node for the shared export path.
    $ sudo awk '!/^[[:space:]]*($|#)/ {print}' /etc/exports
    /srv/nfs 192.0.2.0/24(rw,sync,no_subtree_check,root_squash)

    The cluster moves the mount, service, and IP, but does not synchronize export definitions.
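
    Assuming key-based SSH and passwordless sudo on the other nodes (example hostnames node-02 and node-03), one way to distribute the file from the node where it was edited is:
    $ for node in node-02 node-03; do ssh "$node" 'sudo tee /etc/exports > /dev/null' < /etc/exports; done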

  7. Create a floating IP resource for the NFS endpoint.
    $ sudo pcs resource create nfs_ip ocf:heartbeat:IPaddr2 ip=192.0.2.71 cidr_netmask=24 op monitor interval=30s
  8. Create the NFS server service resource.
    $ sudo pcs resource create nfs_service systemd:nfs-kernel-server op monitor interval=30s

    Use systemd:nfs-server instead on distributions where step 2 identifies that as the unit name.
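
    The resource fails to start on any node where the unit file is missing; systemctl cat prints the unit file and exits non-zero when the unit does not exist, so it works as a quick per-node check:
    $ systemctl cat nfs-kernel-server.service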

  9. Group the filesystem, service, and IP resources.
    $ sudo pcs resource group add nfs-stack nfs_fs nfs_service nfs_ip
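
    Optionally, the group can be given a preferred node; the node name node-01 and the score of 50 below are illustrative values:
    $ sudo pcs constraint location nfs-stack prefers node-01=50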
  10. Verify the resource group placement.
    $ sudo pcs status resources
      * Resource Group: nfs-stack:
        * nfs_fs	(ocf:heartbeat:Filesystem):	 Started node-01
        * nfs_service	(systemd:nfs-kernel-server):	 Started node-01
        * nfs_ip	(ocf:heartbeat:IPaddr2):	 Started node-01
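
    As an additional spot check on the node shown as hosting the group (node-01 in this example), the floating address should appear on an interface:
    $ ip -brief address show to 192.0.2.71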
  11. Confirm the export filesystem is mounted on the node hosting the resource group.
    $ df -h /srv/nfs
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/loop10     336M   27M  310M   8% /srv/nfs
  12. Confirm the exports are reachable through the floating IP.
    $ showmount --exports 192.0.2.71
    Export list for 192.0.2.71:
    /srv/nfs 192.0.2.0/24

    When rpc.mountd is filtered by firewall rules, showmount may fail even though NFSv4 mounts succeed, so a full client mount test is the definitive end-to-end check.
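
    A minimal sketch of such a test, run from a client in the allowed 192.0.2.0/24 network with the NFS client utilities installed:
    $ sudo mount -t nfs 192.0.2.71:/srv/nfs /mnt
    $ df -h /mnt
    $ sudo umount /mnt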

  13. Run a failover test after the group is running.
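    One way to run the test is to move the group to another node and watch it start there; node-02 is an example hostname. Depending on the pcs version, the move may leave a location constraint behind, which pcs resource clear removes.
    $ sudo pcs resource move nfs-stack node-02
    $ sudo pcs status resources
    $ sudo pcs resource clear nfs-stack

    During the move, client I/O pauses briefly and resumes once the group is started on the new node, as noted in the introduction.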