High availability for an NFS server keeps shared exports reachable during node reboots, patch cycles, and unexpected failures. Pairing the NFS workload with a floating IP (virtual IP) keeps the client mount target stable while the active node changes.
In a pcs-managed Pacemaker cluster, the NFS stack is modeled as a resource group: an OCF Filesystem resource mounts the shared export volume, a systemd resource starts the NFS daemon, and an OCF IPaddr2 resource assigns the floating IP. Grouping enforces start/stop ordering and colocation so the mount, daemon, and address move together as a unit.
NFS HA is active/passive and depends on shared storage with single-writer semantics for the export directory, such as a SAN LUN or a replicated block device presented to only one node at a time. The same /etc/exports entries must exist on every node, and the export filesystem must not be auto-mounted via /etc/fstab outside cluster control, to avoid concurrent mounts and data corruption. Failover briefly interrupts client I/O and may require NFS lock recovery, so expect short pauses whenever the group moves between nodes.
Steps to set up NFS server high availability with PCS:
- Confirm the cluster is online and has quorum.
$ sudo pcs status
Cluster name: clustername
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: node-01 (version 2.1.6-6fdc9deea29) - partition with quorum
  * 3 nodes configured
  * 0 resource instances configured
##### snipped #####
- Identify the NFS server service unit name.
$ systemctl list-unit-files --type=service | grep -E '^(nfs-server|nfs-kernel-server)\.service'
nfs-kernel-server.service                  alias    -
nfs-server.service                         disabled enabled
- Stop and disable the NFS server service on every cluster node.
$ sudo systemctl disable --now nfs-kernel-server.service
Synchronizing state of nfs-kernel-server.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install disable nfs-kernel-server
##### snipped #####
Leaving the NFS service enabled outside cluster control can result in exports running on multiple nodes, which risks filesystem corruption on single-writer storage.
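To double-check a node, the following commands should not report enabled or active; repeat the check on every cluster node, substituting nfs-server.service where that is the real unit name:
$ systemctl is-enabled nfs-kernel-server.service
$ systemctl is-active nfs-kernel-server.service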
- Create the export mount point directory on every cluster node.
$ sudo mkdir -p /srv/nfs
The mount point must exist even when the export filesystem is not mounted.
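Before creating the Filesystem resource, confirm the shared block device is visible on every node and not mounted anywhere. A minimal check, assuming the /dev/loop10 device used in the next step (substitute the actual shared device):
$ lsblk /dev/loop10
The device should be listed on every node with an empty mount point column; if it appears mounted on any node, unmount it before handing control to the cluster.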
- Create a filesystem resource for the shared export path.
$ sudo pcs resource create nfs_fs ocf:heartbeat:Filesystem device=/dev/loop10 directory=/srv/nfs fstype=xfs op monitor interval=20s
Replace /dev/loop10 and /srv/nfs with the shared block device and export path used by the cluster.
Do not mount /srv/nfs via /etc/fstab on boot when using a non-cluster filesystem such as xfs or ext4.
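A quick check on each node that no such fstab entry exists (no output means no entry); /dev/loop10 is the example device from the resource above, so substitute the actual shared device:
$ grep -E '/srv/nfs|/dev/loop10' /etc/fstab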
- Confirm the same /etc/exports entries exist on every node for the shared export path.
$ sudo awk '!/^\s*($|#)/ {print}' /etc/exports
/srv/nfs 192.0.2.0/24(rw,sync,no_subtree_check,root_squash)
The cluster moves the mount, service, and IP, but does not synchronize export definitions.
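One way to keep the entries identical is to copy /etc/exports from the node where it was edited to the remaining nodes; node-02 and node-03 are example host names, and the copy assumes root SSH access between nodes (otherwise distribute the file by other means):
$ sudo scp /etc/exports node-02:/etc/exports
$ sudo scp /etc/exports node-03:/etc/exports
After editing /etc/exports on the node currently hosting the group, sudo exportfs -ra re-reads the file without restarting the NFS resource.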
- Create a floating IP resource for the NFS endpoint.
$ sudo pcs resource create nfs_ip ocf:heartbeat:IPaddr2 ip=192.0.2.71 cidr_netmask=24 op monitor interval=30s
- Create the NFS server service resource.
$ sudo pcs resource create nfs_service systemd:nfs-kernel-server op monitor interval=30s
Use systemd:nfs-kernel-server when that unit is present; on distributions where the unit is named nfs-server.service, use systemd:nfs-server instead.
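For example, on a system where the unit is nfs-server.service (such as RHEL-family distributions), the same resource would be created with only the unit name changed:
$ sudo pcs resource create nfs_service systemd:nfs-server op monitor interval=30s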
- Group the filesystem, service, and IP resources.
$ sudo pcs resource group add nfs-stack nfs_fs nfs_service nfs_ip
- Verify the resource group placement.
$ sudo pcs status resources
  * Resource Group: nfs-stack:
    * nfs_fs      (ocf:heartbeat:Filesystem):      Started node-01
    * nfs_service (systemd:nfs-kernel-server):     Started node-01
    * nfs_ip      (ocf:heartbeat:IPaddr2):         Started node-01
- Confirm the export filesystem is mounted on the node hosting the resource group.
$ df -h /srv/nfs
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop10     336M   27M  310M   8% /srv/nfs
- Confirm the exports are reachable through the floating IP.
$ showmount --exports 192.0.2.71
Export list for 192.0.2.71:
/srv/nfs 192.0.2.0/24
A full client mount test proves end-to-end access, especially when firewall rules filter rpc.mountd and showmount cannot query the server.
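A minimal mount test from an NFS client, assuming the client has NFS client utilities installed and /mnt/nfs-test is used as a temporary mount point:
$ sudo mkdir -p /mnt/nfs-test
$ sudo mount -t nfs 192.0.2.71:/srv/nfs /mnt/nfs-test
$ df -h /mnt/nfs-test
$ sudo umount /mnt/nfs-test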
- Run a failover test after the group is running.
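A minimal failover test, assuming node-01 currently hosts the group: put the active node into standby, confirm the group starts on another node, then return the node to service.
$ sudo pcs node standby node-01
$ sudo pcs status resources
$ sudo pcs node unstandby node-01
Clients using the floating IP should see a short pause in I/O while the group moves, then resume without remounting.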
