How to format the Apache Hadoop NameNode

Formatting the NameNode creates a new HDFS namespace and cluster identity. Running it against an existing cluster removes the metadata that maps files to blocks, so the command belongs only in first-time setup or a deliberate rebuild.

The format command reads dfs.namenode.name.dir from the active Hadoop configuration and writes the new namespace files to that location. Confirm the path and stop the HDFS daemons before running it.

Use an explicit cluster ID for repeatable lab builds and documented recovery procedures. Keep the ID unique to the cluster whose storage directories are being initialized.

Steps to format the Apache Hadoop NameNode:

Confirm the NameNode storage directory from active configuration.

```
$ hdfs getconf -confKey dfs.namenode.name.dir
file:///data/hadoop/hdfs/name
```
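
The property value is a URI, and it may hold a comma-separated list when the NameNode keeps redundant copies of its metadata. A minimal sketch of turning that value into local paths for the later checks; the function name name_dirs_to_paths is hypothetical, and the sketch assumes every entry is either a plain path or a file: URI with no host part.

```shell
# Convert a dfs.namenode.name.dir value into local paths, one per line.
# Assumption: entries are plain paths or host-less file: URIs.
name_dirs_to_paths() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r entry; do
        # Trim whitespace around each comma-separated entry.
        entry=$(printf '%s' "$entry" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        [ -n "$entry" ] || continue
        case "$entry" in
            file://*) entry=${entry#file://} ;;  # file:///data/x -> /data/x
            file:*)   entry=${entry#file:} ;;    # file:/data/x   -> /data/x
        esac
        printf '%s\n' "$entry"
    done
}
```

It could be fed directly from the command above, for example name_dirs_to_paths "$(hdfs getconf -confKey dfs.namenode.name.dir)".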

Stop HDFS daemons before formatting.

```
$ stop-dfs.sh
Stopping namenodes on [master01.example.net]
Stopping datanodes
Stopping secondary namenodes [master01.example.net]
```
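
One way to double-check that the daemons are gone on the local host is to inspect the JVM list from jps. The sketch below is an illustration only: the function name hdfs_daemons_stopped is hypothetical, it reads jps output on standard input, and it assumes the default jps format of one "pid class-name" pair per line. jps reports only JVMs visible to the current user on the current host, so it does not cover remote DataNodes.

```shell
# Read `jps` output on stdin; succeed only if no HDFS daemon is listed.
# Assumption: each line looks like "<pid> <main class name>".
hdfs_daemons_stopped() {
    ! grep -Eq '^[0-9]+[[:space:]]+(NameNode|SecondaryNameNode|DataNode)$'
}
```

A typical call would be jps | hdfs_daemons_stopped || echo "HDFS daemons still running" >&2.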


For a new cluster, check that the NameNode directory is empty.

```
$ ls -la /data/hadoop/hdfs/name
total 0
drwx------  2 hadoop hadoop  64 Jun 17 03:00 .
drwxr-xr-x  4 hadoop hadoop 128 Jun 17 03:00 ..
```
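
The same emptiness check can be scripted so that a setup script stops before formatting a directory that already holds files. This is a minimal sketch; the function name name_dir_is_empty is hypothetical.

```shell
# Succeed only if the path is a directory with no entries other than . and ..
name_dir_is_empty() {
    [ -d "$1" ] || return 1
    # ls -A lists hidden entries too, but omits . and ..
    [ -z "$(ls -A -- "$1")" ]
}
```

For example, name_dir_is_empty /data/hadoop/hdfs/name || exit 1 placed ahead of the format command aborts the script when the directory is missing or not empty.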

Format the NameNode with a cluster ID.
```
$ hdfs namenode -format -clusterId hadoop-lab01
Formatting using clusterid: hadoop-lab01
Storage directory /data/hadoop/hdfs/name has been successfully formatted.
```
Do not run this command on a cluster that already contains data unless the namespace has been backed up and the rebuild is intentional.
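
For an intentional rebuild, a file-level copy of the storage directory taken while the NameNode is stopped is one simple form of backup. The sketch below archives the directory with tar; the function name backup_name_dir, the archive naming scheme, and the backup location in the usage line are assumptions for illustration, not part of Hadoop.

```shell
# Archive a NameNode storage directory into a timestamped tarball.
# Usage: backup_name_dir <name-dir> <backup-dir>; prints the archive path.
backup_name_dir() {
    src=$1
    dest=$2
    [ -d "$src" ] || { echo "not a directory: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    archive="$dest/namenode-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C keeps paths inside the archive relative to the storage directory.
    tar -czf "$archive" -C "$src" . || return 1
    printf '%s\n' "$archive"
}
```

A call such as backup_name_dir /data/hadoop/hdfs/name /backup/hdfs prints the archive path on success, and a script can refuse to continue to the format step when the call fails.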

Start HDFS after the format completes.

```
$ start-dfs.sh
Starting namenodes on [master01.example.net]
Starting datanodes
Starting secondary namenodes [master01.example.net]
```

Verify that the new namespace responds by listing the root directory.
```
$ hdfs dfs -ls /
Found 0 items
```
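
The format also records the cluster ID in a VERSION file under the current subdirectory of the storage directory, so the ID can be read back and compared with the one passed to -clusterId. A minimal sketch, assuming the usual key=value layout of that file; the function name read_cluster_id is hypothetical.

```shell
# Print the clusterID recorded in <name-dir>/current/VERSION.
read_cluster_id() {
    version_file="$1/current/VERSION"
    [ -r "$version_file" ] || { echo "cannot read $version_file" >&2; return 1; }
    # VERSION holds key=value lines; print only the clusterID value.
    sed -n 's/^clusterID=//p' "$version_file"
}
```

For the example in this guide, read_cluster_id /data/hadoop/hdfs/name would be expected to print hadoop-lab01.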