Pseudo-distributed mode runs Hadoop daemons on one host while still using HDFS and YARN service boundaries. It is the smallest setup that exercises NameNode, DataNode, ResourceManager, and NodeManager behavior together.
The configuration uses localhost service addresses, local storage directories, and the same XML files used by larger clusters. Format HDFS only after these files are written and the storage directories are ready, because formatting initializes the NameNode metadata directory named in dfs.namenode.name.dir.
Use pseudo-distributed mode for development and training. It does not model multi-host failures, network partitions, rack awareness, or production security boundaries.
Inside the <configuration> element of core-site.xml:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>

Inside the <configuration> element of hdfs-site.xml:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///var/lib/hadoop/hdfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///var/lib/hadoop/hdfs/data</value>
</property>

Inside the <configuration> element of mapred-site.xml:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

Inside the <configuration> element of yarn-site.xml:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
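One way to make sure the storage directories are ready before formatting is a small helper like the one below. This is a minimal sketch, not part of Hadoop: `prepare_hdfs_dirs` is a hypothetical function name, and the default base path is assumed to match the `dfs.namenode.name.dir` and `dfs.datanode.data.dir` values used in this guide.

```shell
# Sketch: create the NameNode and DataNode storage directories and
# confirm the current user can write to them.
# prepare_hdfs_dirs is a hypothetical helper, not a Hadoop command.
prepare_hdfs_dirs() {
    # Assumed default: the base path used in this guide's hdfs-site.xml values.
    base="${1:-/var/lib/hadoop/hdfs}"
    for sub in name data; do
        mkdir -p "$base/$sub" || return 1
        # The daemons write into these directories, so they must be writable.
        if [ ! -w "$base/$sub" ]; then
            echo "not writable: $base/$sub" >&2
            return 1
        fi
    done
    echo "ready: $base/name $base/data"
}
```

Run it as the user that will start the daemons, for example `prepare_hdfs_dirs /var/lib/hadoop/hdfs`. Creating directories under /var/lib usually requires elevated privileges, so ownership may need to be adjusted first.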
$ hdfs namenode -format -clusterId pseudo-local
Storage directory /var/lib/hadoop/hdfs/name has been successfully formatted.

$ start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [localhost]
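After the start script returns, it can help to confirm that the three HDFS daemons are actually running. The sketch below assumes the JDK's `jps` tool is on the PATH and that it prints one "<pid> <class name>" pair per line; `check_hdfs_daemons` is a hypothetical helper name, not a Hadoop command.

```shell
# Sketch: read `jps` output on stdin and report any expected HDFS daemon
# that is missing. check_hdfs_daemons is a hypothetical helper.
check_hdfs_daemons() {
    listing="$(cat)"
    missing=""
    for daemon in NameNode DataNode SecondaryNameNode; do
        # Compare the second field exactly, so "NameNode" does not
        # match the "SecondaryNameNode" line.
        if ! printf '%s\n' "$listing" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            missing="$missing $daemon"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "all HDFS daemons running"
}
```

Typical use is `jps | check_hdfs_daemons`. Reading from stdin keeps the check independent of how the process list is obtained.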
$ hdfs dfs -mkdir -p /user/hadoop
$ hdfs dfs -ls /
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2026-06-17 03:00 /tmp
drwxr-xr-x   - hadoop supergroup          0 2026-06-17 03:00 /user
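Listing the root confirms the NameNode is answering, but a write and read-back also exercises the DataNode. The following is a minimal sketch under stated assumptions: `hdfs_smoke_test` is a hypothetical helper name, and the default target path assumes the /user/hadoop directory created above.

```shell
# Sketch: write a small file to HDFS, read it back, and compare.
# hdfs_smoke_test is a hypothetical helper, not a Hadoop command.
hdfs_smoke_test() {
    # Assumed default target inside the home directory created earlier.
    target="${1:-/user/hadoop/smoke-test.txt}"
    local_file="$(mktemp)" || return 1
    echo "hello from pseudo-distributed mode" > "$local_file"
    # -f overwrites a leftover file from an earlier run.
    if ! hdfs dfs -put -f "$local_file" "$target"; then
        rm -f "$local_file"
        return 1
    fi
    readback="$(hdfs dfs -cat "$target")"
    # Clean up the test file in HDFS and the local temporary file.
    hdfs dfs -rm -skipTrash "$target" > /dev/null
    expected="$(cat "$local_file")"
    rm -f "$local_file"
    [ "$readback" = "$expected" ] && echo "round trip ok"
}
```

Calling `hdfs_smoke_test` with no argument uses the default path; a nonzero exit status means the upload failed or the content read back did not match.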