A GlusterFS cluster can lose redundancy before clients see hard failures. Checking peer connectivity, brick process state, heal backlog, and replication status gives operators a fast view of whether a volume is still safe to serve writes.

Run health checks from a node that belongs to the trusted storage pool. The gluster CLI reports pool membership, volume process state, heal queues, rebalance progress, and geo-replication workers through the local management daemon (glusterd), so when that daemon is stopped or disconnected the CLI cannot show cluster state, even though brick data still exists on disk.

Health monitoring should inspect state without changing the cluster. Treat disconnected peers, bricks with Online set to N, nonzero split-brain entries, rebalance failures, faulty geo-replication workers, and recent warning or error logs as follow-up signals for the matching recovery guide or incident runbook.

Steps to monitor GlusterFS health:

  1. Confirm the local GlusterFS management service is running.
    $ systemctl is-active glusterd
    active

    Some distributions use glusterfs-server.service instead of glusterd.service. Use the installed unit name when checking service state.
    Related: How to manage the GlusterFS service with systemctl
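
    The unit-name difference can be handled in a script by trying each candidate in turn. This is a minimal sketch: the function name active_gluster_unit is hypothetical, and the two unit names are the ones mentioned in this step.

```shell
# Print the first GlusterFS management unit that systemd reports as active.
# Returns nonzero when neither candidate unit is active.
active_gluster_unit() {
    for unit in glusterd glusterfs-server; do
        if systemctl is-active --quiet "$unit" 2>/dev/null; then
            echo "$unit"
            return 0
        fi
    done
    return 1
}

# Example: active_gluster_unit || echo "GlusterFS management service is not active" >&2
```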

  2. Check peer connectivity across the trusted storage pool.
    $ sudo gluster peer status
    Number of Peers: 2
    
    Hostname: node2
    Uuid: 6770f88c-9ec5-4cf8-b9f5-658fa17b6bdc
    State: Peer in Cluster (Connected)
    
    Hostname: node3
    Uuid: 5a3c65f3-1b4d-4d6e-93d4-4c24f0b6b5bf
    State: Peer in Cluster (Connected)

    Peer in Cluster (Connected) means the node is reachable and participating in the trusted pool.

    A disconnected peer can reduce redundancy, block management operations, or leave a replica set unable to heal.
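
    For unattended checks, the State lines can be counted instead of read by eye. The sketch below assumes the output layout shown in this step; count_unhealthy_peers is a hypothetical helper name.

```shell
# Count peers whose State line is anything other than
# "Peer in Cluster (Connected)".
# Reads `gluster peer status` output on stdin and prints the count.
count_unhealthy_peers() {
    awk '/^[ \t]*State:/ && !/Peer in Cluster \(Connected\)/ { n++ }
         END { print n + 0 }'
}

# Example: sudo gluster peer status | count_unhealthy_peers
```

    A result of 0 means every listed peer is connected. The local node is not listed in peer status output, so it is not counted.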

  3. Check brick and daemon state for the volume.
    $ sudo gluster volume status volume1
    Status of volume: volume1
    Gluster process                             TCP Port  RDMA Port  Online  Pid
    ------------------------------------------------------------------------------
    Brick node1:/srv/gluster/brick1             49152     0          Y       2143
    Brick node2:/srv/gluster/brick1             49153     0          Y       2311
    Self-heal Daemon on node1                   N/A       N/A        Y       2202
    Self-heal Daemon on node2                   N/A       N/A        Y       2370

    Replace volume1 with the target volume name. Online should be Y for each expected brick and support daemon.
    Related: How to check GlusterFS volume status
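
    The Online column can be filtered so that only offline processes are shown. This sketch assumes Online is the second-to-last column, as in the sample output of this step; list_offline_processes is a hypothetical helper name.

```shell
# Print every row whose Online column is N.
# Reads `gluster volume status VOLNAME` output on stdin.
# Assumption: Online is the second-to-last whitespace-separated field.
list_offline_processes() {
    awk 'NF >= 4 && $(NF - 1) == "N" { print }'
}

# Example: sudo gluster volume status volume1 | list_offline_processes
```

    No output means no process reported Online as N. If a long brick path is wrapped onto its own line, the printed row may contain only the port and state columns.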

  4. Check free space on each brick filesystem.
    $ df -h /srv/gluster/brick1
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sdb1       1.8T  1.1T  640G  64% /srv/gluster/brick1

  5. Check inode usage on each brick filesystem.
    $ df -i /srv/gluster/brick1
    Filesystem       Inodes  IUsed     IFree IUse% Mounted on
    /dev/sdb1     122093568 512340 121581228    1% /srv/gluster/brick1

    A brick filesystem that reaches 100% space or inode use can block client writes and prevent heal or rebalance work from completing.
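
    Both df checks can be compared against an alert threshold before the filesystem is full. The sketch below reads the percentage from column 5, which holds Use% in df output and IUse% in df -i output. The helper name mounts_over_threshold and the default of 90 are hypothetical choices.

```shell
# Print mount points whose Use% or IUse% is at or above a threshold.
# Reads `df` or `df -i` output on stdin; the first line is treated as the header.
# Usage: mounts_over_threshold [PERCENT]   (defaults to 90)
mounts_over_threshold() {
    awk -v limit="${1:-90}" 'NR > 1 && NF >= 6 {
        pct = $5
        sub(/%$/, "", pct)                 # strip the trailing percent sign
        if (pct + 0 >= limit + 0) print $6, pct "%"
    }'
}

# Example: df -P /srv/gluster/brick1 | mounts_over_threshold 85
# Example: df -i /srv/gluster/brick1 | mounts_over_threshold 85
```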

  6. Check the self-heal backlog for the volume.
    $ sudo gluster volume heal volume1 info summary
    Brick node1:/srv/gluster/brick1
    Status: Connected
    Number of entries: 0
    
    Brick node2:/srv/gluster/brick1
    Status: Connected
    Number of entries: 0

    Replica and disperse volumes should trend back toward 0 entries after the cluster catches up. A count that stays nonzero needs a heal review.
    Related: How to heal a GlusterFS volume
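
    To watch the backlog trend over time, the per-brick counts can be reduced to a single number. This sketch sums lines of the form "Number of entries: N" as shown in this step and, as an assumption about other releases, also accepts a "Total Number of entries: N" line. The name total_heal_entries is hypothetical.

```shell
# Sum the per-brick entry counts from heal info output read on stdin.
# Non-numeric counts (for example from a brick that could not be queried)
# are skipped rather than treated as zero pending entries.
total_heal_entries() {
    awk -F': *' '
        /^[ \t]*(Total )?Number of entries:/ && $2 ~ /^[0-9]+[ \t]*$/ { sum += $2 }
        END { print sum + 0 }'
}

# Example: sudo gluster volume heal volume1 info summary | total_heal_entries
```

    Because unreadable counts are skipped, a total of 0 is only meaningful when every brick in the output also reports Status: Connected.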

  7. Check split-brain entries on replicated volumes.
    $ sudo gluster volume heal volume1 info split-brain
    Brick node1:/srv/gluster/brick1
    Number of entries: 0
    
    Brick node2:/srv/gluster/brick1
    Number of entries: 0

    Any split-brain entry means file versions diverged across bricks and should not be treated as a routine backlog.
    Related: How to check for split-brain in GlusterFS
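
    Since any nonzero count here needs attention, a script can report only the bricks that have one. The sketch below pairs each "Number of entries" line with the preceding Brick line; it assumes the count is the last field on that line. The name bricks_in_split_brain is hypothetical.

```shell
# Print "BRICK COUNT" for each brick that reports split-brain entries.
# Reads `gluster volume heal VOLNAME info split-brain` output on stdin.
bricks_in_split_brain() {
    awk '
        /^[ \t]*Brick /                                    { brick = $2 }
        /^[ \t]*Number of entries[^:]*:/ && $NF + 0 > 0    { print brick, $NF }'
}

# Example: sudo gluster volume heal volume1 info split-brain | bricks_in_split_brain
```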

  8. Check rebalance status after adding, removing, or replacing bricks.
    $ sudo gluster volume rebalance volume1 status
                                       Node  Rebalanced-files      size       scanned   failures   status
                                  ---------  ----------------  --------  ----------  ---------  ---------
                                       node1               182   12.3GB         182          0  completed
                                       node2               181   12.2GB         181          0  completed

    The failures column should stay at 0 and the status column should reach completed on every node before the brick-change work is considered finished.
    Related: How to rebalance a GlusterFS volume
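
    The two conditions for this step can be checked per node. This sketch locates the failures and status columns from the header row instead of hard-coding positions, on the assumption that data rows use the same column order as the header. The name rebalance_problems is hypothetical.

```shell
# Print rebalance rows that report failures or have not reached "completed".
# Reads `gluster volume rebalance VOLNAME status` output on stdin.
rebalance_problems() {
    awk '
        # Header row: remember where the failures and status columns sit.
        !seen && /failures/ && /status/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "failures") f = i
                if ($i == "status")   s = i
            }
            seen = 1
            next
        }
        /^[ \t]*-/ || NF == 0 { next }     # separator and blank lines
        seen && NF >= s && ($f + 0 > 0 || $s != "completed") { print }
    '
}

# Example: sudo gluster volume rebalance volume1 status | rebalance_problems
```

    No output means every node row shows zero failures and a completed status.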

  9. Check geo-replication workers when secondary replication is enabled.
    $ sudo gluster volume geo-replication status
    PRIMARY NODE    PRIMARY VOL    PRIMARY BRICK          SECONDARY USER    SECONDARY                         SECONDARY NODE    STATUS    CRAWL STATUS       LAST_SYNCED
    node1           volume1        /srv/gluster/brick1    geoaccount        node4.example.net::volume1-dr     node4             Active    Changelog Crawl    2026-06-16 08:41:22
    node2           volume1        /srv/gluster/brick1    geoaccount        node4.example.net::volume1-dr     node4             Active    Changelog Crawl    2026-06-16 08:41:20

    Active workers with recent LAST_SYNCED values indicate the secondary is receiving changes. A Passive worker is a normal standby for a replicated brick; a Faulty worker needs follow-up.
    Related: How to check GlusterFS geo-replication status
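
    Faulty workers can be picked out of the status table without relying on column positions, since several header names contain spaces. This is a minimal sketch that assumes the worker state appears as a standalone field with the value Faulty; faulty_georep_workers is a hypothetical helper name.

```shell
# Print geo-replication worker rows that contain a Faulty status field.
# Reads `gluster volume geo-replication status` output on stdin.
faulty_georep_workers() {
    awk '{ for (i = 1; i <= NF; i++) if ($i == "Faulty") { print; next } }'
}

# Example: sudo gluster volume geo-replication status | faulty_georep_workers
```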

  10. Review recent glusterd warning and error logs.
    $ sudo journalctl --unit=glusterd.service --priority=warning..alert --since "1 hour ago" --no-pager
    -- No entries --

    GlusterFS file logs are commonly under /var/log/glusterfs, including /var/log/glusterfs/glusterd.log, /var/log/glusterfs/glustershd.log, and brick logs under /var/log/glusterfs/bricks.
    Related: How to check GlusterFS logs
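
    The file logs can be filtered by severity in a similar way to the journal query. The sketch below assumes each log line starts with a bracketed timestamp followed by a one-letter severity (W, E, or C); verify that against the installed version before relying on it. The name gluster_log_problems is hypothetical.

```shell
# Print warning (W), error (E), and critical (C) lines from GlusterFS file logs.
# Assumed line shape: "[<timestamp>] <severity letter> ...".
# Reads stdin, or the files given as arguments.
gluster_log_problems() {
    grep -E '^\[[^]]+\] [WEC] ' "$@"
}

# Example: sudo cat /var/log/glusterfs/glusterd.log | gluster_log_problems
```

    grep exits with status 1 when nothing matches, which a calling script can treat as a clean result.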