Reliable DNS keeps applications reachable and avoids hard-to-diagnose outages when a single name server node fails. A high availability layout pairs BIND with a floating service IP so clients keep using the same DNS endpoint during maintenance and failover.
A pcs-managed Pacemaker cluster typically represents the floating IP as an OCF resource (ocf:heartbeat:IPaddr2) and the DNS daemon as a systemd resource (systemd:named or systemd:bind9). Grouping the VIP with the BIND service keeps them colocated and enforces ordered startup so the VIP is available before DNS begins answering.
Zone files, TSIG keys, and any dynamic update journals must remain consistent across nodes, or failover can serve stale responses from the new active server. Native service startup should be disabled so Pacemaker is the only component starting BIND, and a planned failover test should confirm the VIP and DNS responses remain healthy after a move.
Steps to set up BIND DNS high availability with PCS:
- Confirm the cluster is online and has quorum.
$ sudo pcs status
Cluster name: clustername
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: node-01 (version 2.1.6-6fdc9deea29) - partition with quorum
  * Last updated: Thu Jan 1 04:28:39 2026 on node-01
  * Last change: Thu Jan 1 04:28:37 2026 by root via cibadmin on node-01
  * 3 nodes configured
  * 0 resource instances configured
Node List:
  * Online: [ node-01 node-02 node-03 ]
Full List of Resources:
  * No resources
Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
Quorum and fencing reduce the risk of two nodes serving the same VIP after a split-brain.
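The quorum check can be scripted so later steps only run against a healthy cluster. This is a minimal sketch: the function name has_quorum is hypothetical, and it assumes the status text contains the phrase "partition with quorum" as in the sample output above.

```shell
# Return 0 when `pcs status` text on stdin reports a partition with quorum.
# has_quorum is a hypothetical helper name, not part of pcs.
has_quorum() {
    # The match is case-sensitive, so "partition WITHOUT quorum" does not match.
    grep -q 'partition with quorum'
}

# Example use on a cluster node (commented out so the block has no side effects):
# sudo pcs status | has_quorum && echo "cluster has quorum"
```

Reading the status text from stdin keeps the helper independent of how the output is collected, for example over SSH from a management host.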
- Identify the BIND DNS service unit name.
$ systemctl list-unit-files --type=service | grep -E '^(named|bind9)\.service'
named.service disabled enabled
Use the detected unit name when creating the systemd:... resource.
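Unit detection can be captured in a variable and reused in later commands. The sketch below assumes the list-unit-files output puts the unit name in the first column, as in the sample above; detect_bind_unit is a hypothetical helper name, and preferring named when both units are listed is an arbitrary choice.

```shell
# Print "named" or "bind9" based on `systemctl list-unit-files` text on stdin.
# Exits non-zero when neither unit is listed.
detect_bind_unit() {
    awk '
        $1 == "named.service" { named = 1 }
        $1 == "bind9.service" { bind9 = 1 }
        END {
            if (named)      print "named"
            else if (bind9) print "bind9"
            else            exit 1
        }'
}

# Example use (commented out so the block has no side effects):
# unit=$(systemctl list-unit-files --type=service | detect_bind_unit) || echo "BIND unit not found"
```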
- Validate the BIND configuration with zone loading on each node before enabling cluster control.
$ sudo named-checkconf -z
zone example.net/IN: loaded serial 2026010101
zone localhost/IN: loaded serial 2
zone 127.in-addr.arpa/IN: loaded serial 1
zone 0.in-addr.arpa/IN: loaded serial 1
zone 255.in-addr.arpa/IN: loaded serial 1
Each zone should report loaded with its serial number. A syntax error or a zone that fails to load is reported as an error, and named-checkconf exits with a non-zero status.
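Because zone data must match across nodes, the serial lines printed by named-checkconf -z can be reduced to a sorted "zone serial" list and compared between nodes. This is a sketch under the assumption that each loaded zone is reported as "zone NAME/CLASS: loaded serial N", as in the sample above; zone_serials is a hypothetical helper name.

```shell
# Reduce `named-checkconf -z` text on stdin to sorted "zone serial" pairs.
zone_serials() {
    awk '$1 == "zone" && $3 == "loaded" && $4 == "serial" {
        sub(/:$/, "", $2)   # drop the trailing colon after the zone name
        print $2, $5
    }' | LC_ALL=C sort
}

# Example use (commented out so the block has no side effects): save one list
# per node, then compare the files with diff.
# sudo named-checkconf -z | zone_serials > "/tmp/zones.$(hostname)"
```

Matching serials do not prove the zone contents are identical, but a mismatch is a quick signal that a node would serve stale data after failover.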
- Disable the DNS service for native startup so Pacemaker controls it.
$ sudo systemctl disable --now named
Synchronizing state of named.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install disable named
Replace named with bind9 when that is the detected unit.
- Create a floating IP resource for the DNS endpoint.
$ sudo pcs resource create bind_ip ocf:heartbeat:IPaddr2 ip=192.0.2.65 cidr_netmask=24 op monitor interval=30s
The IP address must be unused and reachable from DNS clients on that subnet.
- Create the BIND DNS service resource.
$ sudo pcs resource create bind_service systemd:named op monitor interval=30s
Use systemd:bind9 when that unit is present.
- Group the IP and BIND DNS resources.
$ sudo pcs resource group add bind-stack bind_ip bind_service
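The three resource commands differ only in the VIP, the prefix length, and the unit name, so they can be generated from parameters and reviewed before being run. The sketch below only prints the commands used in the steps above. The helper name bind_ha_commands and its argument order are hypothetical.

```shell
# Print the pcs commands for the VIP, the BIND service, and their group.
# Usage: bind_ha_commands VIP PREFIX_LENGTH UNIT
bind_ha_commands() {
    vip=$1
    prefix=$2
    unit=$3
    printf 'pcs resource create bind_ip ocf:heartbeat:IPaddr2 ip=%s cidr_netmask=%s op monitor interval=30s\n' "$vip" "$prefix"
    printf 'pcs resource create bind_service systemd:%s op monitor interval=30s\n' "$unit"
    # Group order matters: bind_ip is listed first so it starts before bind_service.
    printf 'pcs resource group add bind-stack bind_ip bind_service\n'
}

# Example use (prints only, nothing is executed):
# bind_ha_commands 192.0.2.65 24 named
```

Printing rather than executing keeps the sketch safe to run anywhere. The output can be inspected and then run with root privileges on one cluster node.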
- Verify the resource group placement.
$ sudo pcs status resources
  * Resource Group: bind-stack:
    * bind_ip (ocf:heartbeat:IPaddr2): Started node-01
    * bind_service (systemd:named): Started node-01
- Query the floating IP with dig to confirm the DNS service answers requests.
$ dig @192.0.2.65 localhost A +time=2 +tries=1
; <<>> DiG 9.18.39-0ubuntu0.24.04.2-Ubuntu <<>> @192.0.2.65 localhost A +time=2 +tries=1
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52533
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 087807b18bbbdc2e010000006955f802bdac77ae18c71527 (good)
;; QUESTION SECTION:
;localhost. IN A
;; ANSWER SECTION:
localhost. 604800 IN A 127.0.0.1
;; Query time: 0 msec
;; SERVER: 192.0.2.65#53(192.0.2.65) (UDP)
;; WHEN: Thu Jan 01 04:28:50 UTC 2026
;; MSG SIZE rcvd: 82
A NOERROR status with an answer section, reported from SERVER: 192.0.2.65, confirms that BIND is answering DNS requests on the VIP.
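For scripted checks, the response code can be extracted from the dig header line instead of being read by eye. This sketch assumes the header format shown in the sample output (";; ->>HEADER<<- opcode: ..., status: ..., id: ..."); dig_rcode is a hypothetical helper name.

```shell
# Print the response code (for example NOERROR or SERVFAIL) from dig output on stdin.
# Prints nothing when no header line is present, such as after a timeout.
dig_rcode() {
    sed -n 's/.*->>HEADER<<-.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Example use (commented out so the block has no side effects):
# rcode=$(dig @192.0.2.65 localhost A +time=2 +tries=1 | dig_rcode)
# [ "$rcode" = "NOERROR" ] && echo "VIP answers DNS"
```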
- Move the resource group to the other node for a planned failover test.
$ sudo pcs resource move bind-stack node-02
- Confirm the resource group is started on the target node.
$ sudo pcs status resources
  * Resource Group: bind-stack:
    * bind_ip (ocf:heartbeat:IPaddr2): Started node-02
    * bind_service (systemd:named): Started node-02
- Re-run a DNS query against the floating IP after the move.
$ dig @192.0.2.65 localhost A +time=2 +tries=1
; <<>> DiG 9.18.39-0ubuntu0.24.04.2-Ubuntu <<>> @192.0.2.65 localhost A +time=2 +tries=1
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 59905
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 241a0002baedf694010000006955f811c1951e211a4b79a4 (good)
;; QUESTION SECTION:
;localhost. IN A
;; ANSWER SECTION:
localhost. 604800 IN A 127.0.0.1
;; Query time: 1 msec
;; SERVER: 192.0.2.65#53(192.0.2.65) (UDP)
;; WHEN: Thu Jan 01 04:29:05 UTC 2026
;; MSG SIZE rcvd: 82
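The placement check after a move can also be scripted by reading the node name from the resource status lines. This is a sketch that assumes each resource line ends with "Started NODE", as in the sample output above. The helper names resource_node and group_colocated are hypothetical, and the resource names bind_ip and bind_service are the ones created earlier.

```shell
# Print the node a resource is started on, from `pcs status resources` text on stdin.
# Usage: resource_node RESOURCE_NAME
resource_node() {
    awk -v name="$1" '$2 == name && $(NF - 1) == "Started" { print $NF; exit }'
}

# Return 0 when bind_ip and bind_service are both started on the same node.
group_colocated() {
    status_text=$(cat)
    ip_node=$(printf '%s\n' "$status_text" | resource_node bind_ip)
    svc_node=$(printf '%s\n' "$status_text" | resource_node bind_service)
    [ -n "$ip_node" ] && [ "$ip_node" = "$svc_node" ]
}

# Example use (commented out so the block has no side effects):
# sudo pcs status resources | group_colocated && echo "VIP and BIND share a node"
```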
- Clear the temporary move constraint after the test completes.
$ sudo pcs resource clear bind-stack
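During a move the VIP can be unreachable for a short time, so a scripted failover test may need to retry the query rather than fail on the first attempt. The helper below is a generic retry sketch. The name wait_for and its arguments are hypothetical, and the usage line relies on dig exiting with a non-zero status when no server replies.

```shell
# Run a command until it succeeds, up to ATTEMPTS times, sleeping DELAY seconds
# after each failure. Returns 0 on the first success, 1 if every attempt fails.
# Usage: wait_for ATTEMPTS DELAY COMMAND [ARGS...]
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example use after `pcs resource move` (commented out so the block has no side effects):
# wait_for 10 1 dig @192.0.2.65 localhost A +time=2 +tries=1 +short
```

The number of attempts needed before the first success gives a rough measure of how long the DNS endpoint was unavailable during the move.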
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.
