How to copy data with Hadoop DistCp

Large HDFS-to-HDFS or object-store copies are too slow to push through a single client. DistCp runs the copy as a MapReduce job whose map tasks each copy part of the file list, which makes it better suited than a local shell copy for multi-gigabyte directories and cross-cluster migrations.

The source and destination URIs determine which filesystems are read and written, while the job itself runs on the cluster where it is submitted. Use fully qualified paths, list the source before the job, and inspect the job counters afterward so that an accidental overwrite or a missing source is caught early.
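
The pre-copy check described above can be sketched as a small shell function. This is an illustration under assumptions, not part of DistCp: the name preflight and its return codes are hypothetical, and the only Hadoop call it relies on is hdfs dfs -test -e, which exits 0 when a path exists.

```shell
# preflight is a hypothetical helper, not a Hadoop command.
# It refuses to proceed when the source is missing or when the
# destination already exists (which would change the copied layout).
preflight() {
  src="$1"
  dst="$2"
  # hdfs dfs -test -e exits 0 if the path exists.
  if ! hdfs dfs -test -e "$src"; then
    echo "source missing: $src" >&2
    return 1
  fi
  if hdfs dfs -test -e "$dst"; then
    echo "destination already exists: $dst" >&2
    return 2
  fi
  echo "ok to copy: $src -> $dst"
}

# Usage on a cluster (not run here):
#   preflight hdfs://nn1.example.net/data/events hdfs://nn2.example.net/archive/events
```

Treating an existing destination as an error is a deliberately conservative choice for a first copy; an incremental sync into an existing directory would need a different check.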

DistCp can copy between HDFS and compatible object stores, but rename and consistency behavior on object stores can differ from HDFS. Start with the plain two-argument copy and add options such as -update, -delete, or -bandwidth only when the job requires them.
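
As a rough sketch of how those options combine, the function below only assembles and prints a command line for review; it does not run a copy. The helper name build_distcp_cmd is hypothetical. The flags are existing DistCp options: -update copies only files that are missing or differ at the destination, and -bandwidth caps each map task in MB per second. The -delete option is left out on purpose because it removes destination files that are absent from the source and is only valid together with -update or -overwrite.

```shell
# build_distcp_cmd is a hypothetical helper, not part of Hadoop.
# It prints an incremental-sync command so it can be reviewed first.
# Arguments: source URI, destination URI, optional per-map bandwidth (MB/s).
# Paths containing spaces are not handled by this sketch.
build_distcp_cmd() {
  src="$1"
  dst="$2"
  bandwidth_mb="${3:-}"
  cmd="hadoop distcp -update"
  # Add the bandwidth cap only when a limit was given.
  if [ -n "$bandwidth_mb" ]; then
    cmd="$cmd -bandwidth $bandwidth_mb"
  fi
  printf '%s\n' "$cmd $src $dst"
}

build_distcp_cmd hdfs://nn1.example.net/data/events hdfs://nn2.example.net/archive/events 50
```

Note that with -update the contents of the source directory are synchronized into the destination directory, so the destination path should name the directory that mirrors the source.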

Steps to copy data with Hadoop DistCp:

  1. List the source path before starting the copy.
    $ hdfs dfs -ls hdfs://nn1.example.net/data/events
    Found 2 items
    drwxr-xr-x   - analytics data          0 2026-06-17 02:11 hdfs://nn1.example.net/data/events/day=2026-06-16
    drwxr-xr-x   - analytics data          0 2026-06-17 02:12 hdfs://nn1.example.net/data/events/day=2026-06-17
  2. Create the destination parent directory when needed.
    $ hdfs dfs -mkdir -p hdfs://nn2.example.net/archive
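  3. Run DistCp with explicit source and destination URIs. The destination directory itself should not exist yet; if it does, DistCp places the source directory inside it as a subdirectory.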
    $ hadoop distcp hdfs://nn1.example.net/data/events hdfs://nn2.example.net/archive/events
    INFO tools.DistCp: DistCp job-id: job_1720000000000_0042
    INFO mapreduce.Job: map 100% reduce 0%
    INFO mapreduce.Job: Job job_1720000000000_0042 completed successfully
  4. Check the YARN application for a successful final state. The application ID is the job ID with the job_ prefix replaced by application_.
    $ yarn application -status application_1720000000000_0042
    Final-State : SUCCEEDED
    Tracking-URL : http://rm01.example.net:8088/proxy/application_1720000000000_0042/
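  5. Check the copied directory size and compare it with the same command run against the source. The first column is the data size; the second is the space consumed including replication.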
    $ hdfs dfs -du -s -h hdfs://nn2.example.net/archive/events
    90.5 G  181.0 G  hdfs://nn2.example.net/archive/events
  6. Inspect the destination listing.
    $ hdfs dfs -ls hdfs://nn2.example.net/archive/events
    Found 2 items
    drwxr-xr-x   - analytics data          0 2026-06-17 03:11 hdfs://nn2.example.net/archive/events/day=2026-06-16
    drwxr-xr-x   - analytics data          0 2026-06-17 03:12 hdfs://nn2.example.net/archive/events/day=2026-06-17
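
The size and listing checks in the last two steps can also be scripted. The sketch below assumes the standard hdfs dfs -count output format (DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME) and compares only the file count and content size; the helper name same_counts is hypothetical. Matching counts are a quick sanity check, not proof that file contents are identical.

```shell
# same_counts is a hypothetical helper, not a Hadoop command.
# It takes two lines of `hdfs dfs -count` output and succeeds only
# when both report the same file count and the same content size.
same_counts() {
  src_files=$(printf '%s\n' "$1" | awk '{print $2}')
  src_bytes=$(printf '%s\n' "$1" | awk '{print $3}')
  dst_files=$(printf '%s\n' "$2" | awk '{print $2}')
  dst_bytes=$(printf '%s\n' "$2" | awk '{print $3}')
  # Fail if the source line could not be parsed at all.
  [ -n "$src_files" ] && [ -n "$src_bytes" ] || return 1
  [ "$src_files" = "$dst_files" ] && [ "$src_bytes" = "$dst_bytes" ]
}

# Usage on a cluster (not run here):
#   src=$(hdfs dfs -count hdfs://nn1.example.net/data/events)
#   dst=$(hdfs dfs -count hdfs://nn2.example.net/archive/events)
#   if same_counts "$src" "$dst"; then echo "counts match"; else echo "counts differ"; fi
```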