HDFS high availability keeps a standby NameNode ready to take over when the active NameNode fails. With the Quorum Journal Manager (QJM), the active NameNode writes edit logs to a quorum of JournalNode daemons and the standby NameNode reads them from there, so neither NameNode relies on shared storage.

The HA configuration must name the logical nameservice, the RPC address of each NameNode, the JournalNode quorum, and the client failover proxy provider. Format the first NameNode, initialize shared edits, bootstrap the standby, and verify the active and standby states before relying on failover.
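The JournalNode quorum is written as a single qjournal:// URI, which is easy to mistype by hand. The sketch below assembles it from a host list. The function name build_qjournal_uri is hypothetical, and it assumes every JournalNode listens on the same port (8485, as in the steps that follow).

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds a qjournal:// URI from a journal ID, a port,
# and a list of JournalNode hosts. Assumes all JournalNodes share one port.
build_qjournal_uri() {
  local journal_id="$1" port="$2" uri="" host
  shift 2
  for host in "$@"; do
    # Join host:port pairs with ';', without a leading separator.
    uri="${uri:+$uri;}${host}:${port}"
  done
  echo "qjournal://${uri}/${journal_id}"
}

# Prints the value used for dfs.namenode.shared.edits.dir in this guide.
build_qjournal_uri cluster1 8485 jn1.example.net jn2.example.net jn3.example.net
```

The printed string can be pasted into the dfs.namenode.shared.edits.dir property unchanged.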

Use an odd number of JournalNodes, at least three, on separate hosts. A quorum of N JournalNodes tolerates (N - 1) / 2 failures, rounded down, so a two-node quorum cannot tolerate a JournalNode loss and still protect the edit log.
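The quorum arithmetic can be checked with a few lines of shell. This is only an illustration of the majority rule; the function names jn_majority and jn_tolerated_failures are hypothetical and not part of Hadoop.

```shell
#!/usr/bin/env bash
# JournalNodes that must acknowledge each edit: floor(n / 2) + 1.
jn_majority() {
  local n="$1"
  echo $(( n / 2 + 1 ))
}

# JournalNode failures the quorum can absorb: floor((n - 1) / 2).
jn_tolerated_failures() {
  local n="$1"
  echo $(( (n - 1) / 2 ))
}

# Compare a few quorum sizes side by side.
for n in 2 3 4 5; do
  echo "journalnodes=$n majority=$(jn_majority "$n") tolerated_failures=$(jn_tolerated_failures "$n")"
done
```

Two JournalNodes tolerate no failures, and four tolerate only one, the same as three, which is why an even count adds hosts without adding resilience.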

Steps to configure HDFS high availability with QJM:

  1. Define the logical nameservice and NameNode IDs.
    hdfs-site.xml
    <property>
      <name>dfs.nameservices</name>
      <value>cluster1</value>
    </property>
    <property>
      <name>dfs.ha.namenodes.cluster1</name>
      <value>nn1,nn2</value>
    </property>
  2. Set the NameNode RPC addresses.
    hdfs-site.xml
    <property>
      <name>dfs.namenode.rpc-address.cluster1.nn1</name>
      <value>nn1.example.net:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.cluster1.nn2</name>
      <value>nn2.example.net:8020</value>
    </property>
  3. Set the JournalNode quorum and failover provider.
    hdfs-site.xml
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://jn1.example.net:8485;jn2.example.net:8485;jn3.example.net:8485/cluster1</value>
    </property>
    <property>
      <name>dfs.client.failover.proxy.provider.cluster1</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
  4. Start the JournalNode daemons.
    $ hdfs --daemon start journalnode

    Run this on each JournalNode host before formatting shared edits.

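  5. Format the first NameNode and initialize shared edits.
    $ hdfs namenode -format -clusterId hadoop-ha01
    Storage directory /data/hadoop/hdfs/name has been successfully formatted.

    Run this on nn1 only, with the JournalNodes already running, then start the nn1 NameNode so the standby can copy its metadata in the next step.
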
  6. Bootstrap the standby NameNode from the nn2 host.
    $ hdfs namenode -bootstrapStandby
    =====================================================
    About to bootstrap Standby ID nn2 from:
               Nameservice ID: cluster1
            Other Namenode ID: nn1
    =====================================================
    Storage directory /data/hadoop/hdfs/name has been successfully formatted.
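  7. Verify that nn1 is active.
    $ hdfs haadmin -getServiceState nn1
    active

    With manual failover, both NameNodes start in standby; if nn1 reports standby, promote it with hdfs haadmin -transitionToActive nn1.
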
  8. Verify that nn2 is standby.
    $ hdfs haadmin -getServiceState nn2
    standby
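
The two state checks can be combined into one pass that confirms exactly one NameNode is active. The sketch below is a minimal version built on hdfs haadmin -getServiceState; the function names and the HDFS_CMD override are hypothetical, added only so the logic can be exercised without a cluster.

```shell
#!/usr/bin/env bash
# HDFS_CMD is a hypothetical override for the hdfs binary (defaults to "hdfs").
HDFS_CMD="${HDFS_CMD:-hdfs}"

# Prints "<active-count> <standby-count>" for the given NameNode IDs.
count_ha_states() {
  local active=0 standby=0 state id
  for id in "$@"; do
    state="$("$HDFS_CMD" haadmin -getServiceState "$id" 2>/dev/null)"
    case "$state" in
      active)  active=$((active + 1)) ;;
      standby) standby=$((standby + 1)) ;;
    esac
  done
  echo "$active $standby"
}

# Succeeds only when exactly one NameNode is active and all others are standby.
check_ha_pair() {
  local counts
  counts="$(count_ha_states "$@")"
  [ "$counts" = "1 $(( $# - 1 ))" ]
}
```

On a cluster node this could be called as check_ha_pair nn1 nn2, with a non-zero exit status indicating that the pair is not in the expected one-active, one-standby state. An unreachable NameNode counts as neither state, so the check fails in that case as well.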