Healing a GlusterFS volume restores redundancy after a brick outage, network partition, or interrupted maintenance so divergent replicas converge on the same data.

On redundant volume types (replicate and disperse), the self-heal daemon (shd) reconciles entries that have diverged between bricks, copying metadata and file contents as needed. The gluster volume heal command triggers a heal scan on demand, while the gluster volume heal VOLNAME info commands expose the per-brick backlog for progress tracking.
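
For example, whether shd runs for a volume is governed by the cluster.self-heal-daemon option (on by default), which can be checked with gluster volume get; the output below is illustrative:

    $ sudo gluster volume get volume1 cluster.self-heal-daemon
    Option                                  Value
    ------                                  -----
    cluster.self-heal-daemon                on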

Healing depends on healthy connectivity between peers and bricks, so offline bricks or unreachable nodes stall progress and keep entries pending. Split-brain situations are not automatically resolved by healing and must be handled separately before full convergence is possible. A forced heal scan can generate sustained disk and network activity, so scheduling during low-usage periods reduces client impact.
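
A quick connectivity check before healing is gluster peer status, which should report every peer as Connected (hostname and UUID below are illustrative):

    $ sudo gluster peer status
    Number of Peers: 1

    Hostname: node2
    Uuid: 7c0a4f5e-3b2d-4a1e-9f6b-2d8c1e5a7b90
    State: Peer in Cluster (Connected)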

Steps to heal a GlusterFS volume:

  1. Confirm the volume is Started before triggering a heal.
    $ sudo gluster volume info volume1
    Volume Name: volume1
    Type: Replicate
    Status: Started
    Number of Bricks: 1 x 2 = 2
    ##### snipped #####

    The gluster volume heal command is supported only on replicate or disperse volume types.
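
    If the volume reports Stopped instead, start it before proceeding (example output shown):

    $ sudo gluster volume start volume1
    volume start: volume1: success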

  2. Confirm the brick and Self-heal Daemon processes all show Y in the Online column for the volume.
    $ sudo gluster volume status volume1
    Status of volume: volume1
    Gluster process                             TCP Port  RDMA Port  Online  Pid
    ------------------------------------------------------------------------------
    Brick node1:/srv/gluster/brick1              49152     0          Y       21435
    Brick node2:/srv/gluster/brick1              49152     0          Y       21987
    Self-heal Daemon on node1                    N/A       N/A        Y       21310
    Self-heal Daemon on node2                    N/A       N/A        Y       21876
    ##### snipped #####
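
    If a brick shows N in the Online column, gluster volume start with the force option restarts any brick processes that are down without disturbing the ones already running (example output shown):

    $ sudo gluster volume start volume1 force
    volume start: volume1: success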

  3. Start a self-heal scan for the volume.
    $ sudo gluster volume heal volume1
    volume heal: volume1: success: Heal operation started

    By default the heal command processes only entries already recorded in the heal index; use full to crawl the entire volume when a complete rescan is required: sudo gluster volume heal volume1 full.

    Healing can increase disk and network load while replicas are synchronized.
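
    To watch a running heal, gluster volume heal volume1 statistics reports per-brick crawl details; the output below is trimmed and illustrative:

    $ sudo gluster volume heal volume1 statistics
    Gathering crawl statistics on volume volume1 has been successful

    Crawl statistics for brick no 0
    Hostname of brick node1

    Type of crawl: INDEX
    No. of entries healed: 12
    No. of entries in split-brain: 0
    No. of heal failed entries: 0
    ##### snipped #####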

  4. Review a summary of pending heal entries.
    $ sudo gluster volume heal volume1 info summary
    Brick node1:/srv/gluster/brick1
    Status: Connected
    Number of entries: 12
    
    Brick node2:/srv/gluster/brick1
    Status: Connected
    Number of entries: 12

    Use sudo gluster volume heal volume1 statistics heal-count for a faster count when info summary is slow on very large volumes.
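
    Illustrative heal-count output for the same backlog:

    $ sudo gluster volume heal volume1 statistics heal-count
    Gathering count of entries to be healed on volume volume1 has been successful

    Brick node1:/srv/gluster/brick1
    Number of entries: 12

    Brick node2:/srv/gluster/brick1
    Number of entries: 12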

  5. List individual paths pending heal when troubleshooting specific files.
    $ sudo gluster volume heal volume1 info
    Brick node1:/srv/gluster/brick1
    /dir1/file-a.log
    /dir2/file-b.db
    
    Brick node2:/srv/gluster/brick1
    /dir1/file-a.log
    /dir2/file-b.db
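
    To check whether any of the listed paths are in split-brain rather than simply pending heal, use info split-brain (illustrative output):

    $ sudo gluster volume heal volume1 info split-brain
    Brick node1:/srv/gluster/brick1
    Status: Connected
    Number of entries in split-brain: 0

    Brick node2:/srv/gluster/brick1
    Status: Connected
    Number of entries in split-brain: 0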

  6. Confirm pending heal entries reach zero after synchronization completes.
    $ sudo gluster volume heal volume1 info summary
    Brick node1:/srv/gluster/brick1
    Status: Connected
    Number of entries: 0
    
    Brick node2:/srv/gluster/brick1
    Status: Connected
    Number of entries: 0

    Counts that never reach zero commonly indicate split-brain entries or a brick that is not Connected.
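
    Split-brain files can be resolved from the CLI by choosing a winning copy per file; for example, the latest-mtime policy keeps the replica with the newest modification time (the path is taken from the earlier listing and the output line is illustrative):

    $ sudo gluster volume heal volume1 split-brain latest-mtime /dir1/file-a.log
    Healed /dir1/file-a.log.

    Alternative selection policies include bigger-file and source-brick.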