Healing a GlusterFS volume restores redundancy after a brick outage, network partition, or interrupted maintenance so divergent replicas converge on the same data.
On redundant volume types, the self-heal daemon (shd) reconciles entries that are out of sync between bricks by copying metadata and file contents as needed. The gluster volume heal command can force a heal scan, while gluster volume heal VOLNAME info commands expose the per-brick backlog for progress tracking.
Healing depends on healthy connectivity between peers and bricks, so offline bricks or unreachable nodes stall progress and keep entries pending. Split-brain situations are not automatically resolved by healing and must be handled separately before full convergence is possible. A forced heal scan can generate sustained disk and network activity, so scheduling during low-usage periods reduces client impact.
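Connectivity problems are a common reason entries stay pending, so it can help to confirm peer health before starting; as a quick check (the node names are the illustrative ones used in the steps below), every peer should report Peer in Cluster (Connected):
$ sudo gluster peer status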
Steps to heal a GlusterFS volume:
- Confirm the volume is Started before triggering a heal.
$ sudo gluster volume info volume1
Volume Name: volume1
Type: Replicate
Status: Started
Number of Bricks: 1 x 2 = 2
##### snipped #####
The gluster volume heal command is supported only on replicate or disperse volume types.
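If the volume reports a status other than Started, start it before attempting a heal (volume1 is the example volume name used throughout):
$ sudo gluster volume start volume1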
- Confirm all processes show Online for the volume.
$ sudo gluster volume status volume1
Status of volume: volume1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/srv/gluster/brick1             49152     0          Y       21435
Brick node2:/srv/gluster/brick1             49152     0          Y       21987
Self-heal Daemon on node1                   N/A       N/A        Y       21310
Self-heal Daemon on node2                   N/A       N/A        Y       21876
##### snipped #####
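If a brick or the Self-heal Daemon shows N in the Online column, restarting the volume processes with the force option usually brings them back; it generally restarts only the processes that are down, leaving running bricks untouched:
$ sudo gluster volume start volume1 force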
- Start a self-heal scan for the volume.
$ sudo gluster volume heal volume1
volume heal: volume1: success: Heal operation started
Use the full option to crawl the entire volume when a complete rescan is required: sudo gluster volume heal volume1 full.
Healing can increase disk and network load while replicas are synchronized.
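Because a full crawl can take a long time on large volumes, one option is to schedule it outside business hours; a minimal cron sketch (the 02:00 schedule, the hypothetical /etc/cron.d/gluster-full-heal file, and the /usr/sbin/gluster path are assumptions to adapt):
# /etc/cron.d/gluster-full-heal (hypothetical): run a full heal of volume1 daily at 02:00
0 2 * * * root /usr/sbin/gluster volume heal volume1 full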
- Review a summary of pending heal entries.
$ sudo gluster volume heal volume1 info summary
Brick node1:/srv/gluster/brick1
Status: Connected
Number of entries: 12

Brick node2:/srv/gluster/brick1
Status: Connected
Number of entries: 12
Use sudo gluster volume heal volume1 statistics heal-count for a faster count when info summary is slow on very large volumes.
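To watch the backlog drain over time, the heal count can be polled periodically; a simple sketch using watch (the 60-second interval is arbitrary):
$ sudo watch -n 60 gluster volume heal volume1 statistics heal-count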
- List individual paths pending heal when troubleshooting specific files.
$ sudo gluster volume heal volume1 info
Brick node1:/srv/gluster/brick1
/dir1/file-a.log
/dir2/file-b.db

Brick node2:/srv/gluster/brick1
/dir1/file-a.log
/dir2/file-b.db
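When the same paths keep reappearing, listing only split-brain entries helps separate ordinary pending heals from files that need manual resolution:
$ sudo gluster volume heal volume1 info split-brain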
- Confirm pending heal entries reach zero after synchronization completes.
$ sudo gluster volume heal volume1 info summary
Brick node1:/srv/gluster/brick1
Status: Connected
Number of entries: 0

Brick node2:/srv/gluster/brick1
Status: Connected
Number of entries: 0
Counts that never reach zero commonly indicate split-brain entries or a brick that is not Connected.
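Split-brain entries on replicate volumes can be resolved from the CLI by picking a winning copy per file, for example by choosing the replica with the newest modification time (the path below reuses the illustrative file from the earlier listing):
$ sudo gluster volume heal volume1 split-brain latest-mtime /dir1/file-a.log
Other selection policies include bigger-file and source-brick; choose whichever matches how the copies diverged.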
