How to decommission a Hadoop DataNode

Removing a DataNode without decommissioning it first leaves every block it held one replica short, which can trigger under-replicated or missing-block alerts. Decommissioning marks the node as leaving service and gives the NameNode time to copy its blocks to other DataNodes before the host is shut down.

The include file (dfs.hosts) and exclude file (dfs.hosts.exclude) define which DataNodes may connect to the NameNode. The procedure adds the host to the exclude file, has the NameNode re-read both files, waits for the node to reach the Decommissioned state, and only then stops the DataNode daemon.

Decommissioning needs enough free capacity on the remaining DataNodes to absorb the departing host's blocks, and enough live nodes to satisfy the replication factor. Check cluster health before starting, and avoid decommissioning several large nodes at once unless the capacity plan already covers the data movement.
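The capacity requirement can be estimated from the text of `hdfs dfsadmin -report`. The following is a rough sketch, not an official check: `can_absorb` is a hypothetical helper, and it assumes the report prints the cluster-wide `DFS Remaining:` line before the per-node sections and that each node section starts with a `Name:` line containing the hostname. It compares the node's `DFS Used` bytes with the space remaining on all other nodes.

```shell
# Reads `hdfs dfsadmin -report` text on stdin and prints "ok" when the space
# remaining on the other nodes is at least the DFS Used value of the given node,
# otherwise "insufficient". Hypothetical helper; field layout is an assumption.
can_absorb() {
  awk -v host="$1" '
    /^Name:/                          { in_node = (index($0, host) > 0) }
    /^DFS Remaining:/ && !seen_total  { total_remaining = $3; seen_total = 1; next }
    in_node && /^DFS Used:/           { node_used = $3 }
    in_node && /^DFS Remaining:/      { node_remaining = $3 }
    END {
      # Free space on the other nodes = cluster total minus this node
      spare = total_remaining - node_remaining
      result = (node_used <= spare) ? "ok" : "insufficient"
      print result
    }'
}

# Example:
#   hdfs dfsadmin -report | can_absorb worker02.example.net
```

The result is only an estimate. It ignores reserved space, rack placement, and uneven free space across nodes, so treat "ok" as a necessary condition rather than a guarantee.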

Steps to decommission a Hadoop DataNode:

Check HDFS health before removing the node.

$ hdfs fsck / -blocks -locations
Status: HEALTHY
 Total size: 184320000 B
 Total blocks (validated): 42
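In a script, the health check can gate the rest of the procedure. This is a minimal sketch that assumes fsck prints a `Status: HEALTHY` line as in the output above; `fsck_is_healthy` is a hypothetical helper name, not a Hadoop command.

```shell
# Reads fsck output on stdin and succeeds only if it contains "Status: HEALTHY".
fsck_is_healthy() {
  grep -q 'Status: HEALTHY'
}

# Example: stop before touching the exclude file if HDFS is not healthy.
#   hdfs fsck / | fsck_is_healthy || { echo "HDFS is not healthy" >&2; exit 1; }
```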

Add the DataNode hostname to the exclude file, one hostname per line.
dfs.exclude
```
worker02.example.net
```

Confirm that this is the exclude file the NameNode actually reads by checking dfs.hosts.exclude in the Hadoop configuration.

$ hdfs getconf -confKey dfs.hosts.exclude
/etc/hadoop/dfs.exclude
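The two steps above can be combined so the hostname always lands in the file the configuration points to, and is never listed twice. This is a sketch under assumptions: `add_to_exclude` is a hypothetical helper, and it expects the existing file to end with a newline.

```shell
# Append a hostname to an exclude file unless it is already listed.
# Hypothetical helper, not part of Hadoop.
add_to_exclude() {
  # $1 = path to the exclude file, $2 = DataNode hostname
  # -x matches whole lines only, -F treats the hostname as a literal string
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example (run on the NameNode host with write access to the file):
#   add_to_exclude "$(hdfs getconf -confKey dfs.hosts.exclude)" worker02.example.net
```

Matching whole lines keeps a host such as worker02.example.net from being skipped just because a longer name containing it is already present.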

Tell the NameNode to re-read the include and exclude files.

$ hdfs dfsadmin -refreshNodes
Refresh nodes successful

Watch the decommission state from the NameNode report.

$ hdfs dfsadmin -report
Name: worker02.example.net:9866
Decommission Status : Decommission in progress
Configured Capacity: 107374182400 (100 GB)

Wait until the node reports Decommissioned. How long this takes depends on how much data the node holds and how busy the cluster is.

$ hdfs dfsadmin -report
Name: worker02.example.net:9866
Decommission Status : Decommissioned
Under replicated blocks: 0
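Instead of re-running the report by hand, the wait can be scripted. The sketch below assumes the report layout shown above, where a `Name:` line containing the hostname is followed by that node's `Decommission Status` line. `decommission_status` and `wait_for_decommission` are hypothetical helper names, and the 60-second default interval is arbitrary.

```shell
# Prints the Decommission Status value for one node from report text on stdin.
decommission_status() {
  awk -v host="$1" '
    /^Name:/ { in_node = (index($0, host) > 0) }
    in_node && /^Decommission Status/ {
      sub(/^[^:]*:[ \t]*/, "")   # drop the "Decommission Status : " label
      sub(/[ \t\r]+$/, "")       # drop trailing whitespace
      print
      exit
    }'
}

# Polls the NameNode report until the node reports Decommissioned.
# $1 = DataNode hostname, $2 = seconds between polls (default 60)
wait_for_decommission() {
  until [ "$(hdfs dfsadmin -report | decommission_status "$1")" = "Decommissioned" ]; do
    sleep "${2:-60}"
  done
}

# Example:
#   wait_for_decommission worker02.example.net 120
```

The loop has no timeout, so a node that can never finish (for example, when too few live nodes remain to hold all replicas) keeps it waiting. Add a deadline if this runs unattended.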

Stop the DataNode daemon on the decommissioned host.
```
$ hdfs --daemon stop datanode
Stopping datanode
```
Related: How to restart Hadoop services

Author: Mohd Shakir Zakaria
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.