Healing a GlusterFS volume restores redundancy after a brick outage, network partition, or interrupted maintenance so divergent replicas converge on the same data.
On redundant volume types, the self-heal daemon (shd) reconciles entries that are out of sync between bricks by copying metadata and file contents as needed. The gluster volume heal command can force a heal scan, while gluster volume heal VOLNAME info commands expose the per-brick backlog for progress tracking.
Healing depends on healthy connectivity between peers and bricks, so offline bricks or unreachable nodes stall progress and keep entries pending. Split-brain situations are not automatically resolved by healing and must be handled separately before full convergence is possible. A forced heal scan can generate sustained disk and network activity, so scheduling during low-usage periods reduces client impact.
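Connectivity problems are a common reason entries stay pending, so it can help to confirm peer health before starting; as a quick check (the node names are the illustrative ones used in the steps below), every peer should report Peer in Cluster (Connected):
$ sudo gluster peer status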
Steps to heal a GlusterFS volume:
- Confirm the volume is Started before triggering a heal.
$ sudo gluster volume info volume1
Volume Name: volume1
Type: Replicate
Status: Started
Number of Bricks: 1 x 2 = 2
##### snipped #####
The gluster volume heal command is supported only on replicate or disperse volume types.
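If the volume reports a status other than Started, start it before attempting a heal (volume1 is the example volume name used throughout):
$ sudo gluster volume start volume1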
- Confirm all processes show Online for the volume.
$ sudo gluster volume status volume1
Status of volume: volume1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/srv/gluster/brick1             49152     0          Y       21435
Brick node2:/srv/gluster/brick1             49152     0          Y       21987
Self-heal Daemon on node1                   N/A       N/A        Y       21310
Self-heal Daemon on node2                   N/A       N/A        Y       21876
##### snipped #####
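If a brick or the Self-heal Daemon shows N in the Online column, restarting the volume processes with the force option usually brings them back; it generally restarts only the processes that are down, leaving running bricks untouched:
$ sudo gluster volume start volume1 force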
- Start a self-heal scan for the volume.
$ sudo gluster volume heal volume1
volume heal: volume1: success: Heal operation started
Use the full option to crawl the entire volume when a complete rescan is required: sudo gluster volume heal volume1 full.
Healing can increase disk and network load while replicas are synchronized.
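Because a full crawl can take a long time on large volumes, one option is to schedule it outside business hours; a minimal cron sketch (the 02:00 schedule, the hypothetical /etc/cron.d/gluster-full-heal file, and the /usr/sbin/gluster path are assumptions to adapt):
# /etc/cron.d/gluster-full-heal (hypothetical): run a full heal of volume1 daily at 02:00
0 2 * * * root /usr/sbin/gluster volume heal volume1 full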
- Review a summary of pending heal entries.
$ sudo gluster volume heal volume1 info summary
Brick node1:/srv/gluster/brick1
Status: Connected
Number of entries: 12

Brick node2:/srv/gluster/brick1
Status: Connected
Number of entries: 12
Use sudo gluster volume heal volume1 statistics heal-count for a faster count when info summary is slow on very large volumes.
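To watch the backlog drain over time, the heal count can be polled periodically; a simple sketch using watch (the 60-second interval is arbitrary):
$ sudo watch -n 60 gluster volume heal volume1 statistics heal-count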
- List individual paths pending heal when troubleshooting specific files.
$ sudo gluster volume heal volume1 info
Brick node1:/srv/gluster/brick1
/dir1/file-a.log
/dir2/file-b.db

Brick node2:/srv/gluster/brick1
/dir1/file-a.log
/dir2/file-b.db
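When the same paths keep reappearing, listing only split-brain entries helps separate ordinary pending heals from files that need manual resolution:
$ sudo gluster volume heal volume1 info split-brain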
- Confirm pending heal entries reach zero after synchronization completes.
$ sudo gluster volume heal volume1 info summary
Brick node1:/srv/gluster/brick1
Status: Connected
Number of entries: 0

Brick node2:/srv/gluster/brick1
Status: Connected
Number of entries: 0
Counts that never reach zero commonly indicate split-brain entries or a brick that is not Connected.
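Split-brain entries on replicate volumes can be resolved from the CLI by picking a winning copy per file, for example by choosing the replica with the newest modification time (the path below reuses the illustrative file from the earlier listing):
$ sudo gluster volume heal volume1 split-brain latest-mtime /dir1/file-a.log
Other selection policies include bigger-file and source-brick; choose whichever matches how the copies diverged.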
