Monitoring GlusterFS health keeps distributed volumes available and catches early warning signs such as disconnected peers, offline bricks, or a growing heal backlog before client workloads start timing out or returning I/O errors.
A GlusterFS cluster forms a trusted pool of peers and serves data from brick directories grouped into volumes. Health signals are exposed through the gluster CLI by checking peer connectivity, brick process state, and background activity such as self-heal and rebalance.
Checks differ by volume type and features in use: replica and disperse volumes depend heavily on heal and split-brain state, while distributed layouts focus on brick availability and capacity. Treat persistent non-zero heal entries, any split-brain listings, repeated errors in /var/log/glusterfs, and unhealthy geo-replication sessions as incidents rather than “noise”.
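Which checks matter most depends on the volume type, which can be confirmed with gluster volume info if it is not already known (volume1 is the example volume name used throughout this guide):

$ sudo gluster volume info volume1

The Type line in the output (for example Replicate, Disperse, or Distribute) indicates whether the heal- and split-brain-oriented checks below apply to the volume.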
Related: How to improve GlusterFS security
Related: How to improve GlusterFS performance
GlusterFS monitoring checklist:
- Check peer connectivity across the trusted pool.
$ sudo gluster peer status
Number of Peers: 2

Hostname: node2
Uuid: 6770f88c-9ec5-4cf8-b9f5-658fa17b6bdc
State: Peer in Cluster (Connected)

Hostname: node3
Uuid: 5a3c65f3-1b4d-4d6e-93d4-4c24f0b6b5bf
State: Peer in Cluster (Connected)
Peer in Cluster (Connected) indicates the peer is reachable and participating in the cluster.
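For a compact one-line-per-node summary that also includes the local node, gluster pool list can be used alongside peer status:

$ sudo gluster pool list

Any peer listed as Disconnected should be investigated before relying on the remaining checks.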
- Review volume status for brick health.
$ sudo gluster volume status volume1
Status of volume: volume1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/srv/gluster/brick1             49152     0          Y       2143
Brick node2:/srv/gluster/brick1             49152     0          Y       2311
Self-heal Daemon on node1                   N/A       N/A        Y       2202
Self-heal Daemon on node2                   N/A       N/A        Y       2370
Replace volume1 with the target volume name, and treat any Online value of N as a service-impacting fault.
Related: How to check GlusterFS volume status
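When more context is needed than the Online column alone, gluster volume status also accepts the detail keyword, which reports per-brick capacity and inode figures in the same view (exact fields vary between GlusterFS versions):

$ sudo gluster volume status volume1 detail

This provides a convenient cross-check against the filesystem-level figures gathered in the next step.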
- Check brick filesystems for free space and inode usage.
$ df -h /srv/gluster/brick1
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       1.8T  1.1T  640G  64% /srv/gluster/brick1
$ df -i /srv/gluster/brick1
Filesystem         Inodes  IUsed      IFree IUse% Mounted on
/dev/sdb1       122093568 512340  121581228    1% /srv/gluster/brick1
Bricks that reach 100% space or inodes can trigger client write failures and may block heal or rebalance progress.
- Inspect heal activity for split-brain indicators.
$ sudo gluster volume heal volume1 info summary
Brick node1:/srv/gluster/brick1
Status: Connected
Number of entries: 0

Brick node2:/srv/gluster/brick1
Status: Connected
Number of entries: 0

$ sudo gluster volume heal volume1 info split-brain
Brick node1:/srv/gluster/brick1
Number of entries: 0

Brick node2:/srv/gluster/brick1
Number of entries: 0
Heal checks are most relevant for replica and disperse volumes; a sustained non-zero count usually means the cluster is still converging or is repeatedly failing to heal.
Any split-brain entry indicates diverged file versions across bricks, and leaving it unresolved risks serving inconsistent data to clients.
Related: How to heal a GlusterFS volume
Related: How to check for split-brain in GlusterFS
- Track rebalance activity after brick changes.
$ sudo gluster volume rebalance volume1 status
     Node  Rebalanced-files       size    scanned   failures     status
---------  ----------------  ---------  ---------  ---------  ---------
    node1                 0         0B          0          0  completed
    node2                 0         0B          0          0  completed
Rebalance is common after adding or removing bricks on distributed layouts, and long runtimes usually correlate with the amount of data to migrate.
Related: How to rebalance a GlusterFS volume
- Review GlusterFS logs for errors and warnings.
$ sudo tail -n 20 /var/log/glusterfs/glusterd.log
[2025-05-13 10:31:08.912345 +0000] I [MSGID: 106487] [glusterd.c:1960:glusterd_init] 0-management: Glusterd started successfully
[2025-05-13 10:33:41.104882 +0000] W [MSGID: 100030] [rpc-clnt.c:735:rpc_clnt_handle_disconnect] 0-rpc: disconnecting from peer node2
##### snipped #####
Cluster-wide logs are commonly under /var/log/glusterfs, with brick-specific logs typically under /var/log/glusterfs/bricks.
Related: How to check GlusterFS logs
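Because each log line carries a severity letter (I, W, E, C) after the timestamp, error-level entries can be filtered with a plain grep against the default log path; adjust the path if logs are stored elsewhere:

$ sudo grep -E '\] E \[' /var/log/glusterfs/glusterd.log | tail -n 20

The same pattern works against the brick logs under /var/log/glusterfs/bricks.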
- Check geo-replication status when secondary replication is enabled.
Geo-replication is asynchronous, so a stopped or faulty session can silently leave the secondary behind even when the primary volume looks healthy.
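A session overview is available through the geo-replication status command; it can be run without arguments to list all sessions, or against a specific volume (volume1 again being the example name):

$ sudo gluster volume geo-replication status
$ sudo gluster volume geo-replication volume1 status

Sessions should normally report Active or Passive; a Faulty or Stopped status means changes are no longer being replicated to the secondary.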
