Large HDFS-to-HDFS or object-store copies need a job that can split work across the cluster. DistCp runs a MapReduce job for distributed copying, which makes it better suited than a local shell copy for multi-gigabyte directories and cross-cluster migrations.
The source and destination URIs determine which cluster or store DistCp reads from and writes to. Use explicit, fully qualified paths, list the source before submitting the job, and inspect the job counters afterward so that a missing source or an accidental overwrite is caught early.
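The pre-flight checks described above can be sketched as a small shell function. This is only an illustration: it assumes the `hdfs` client is on `PATH`, the function name `preflight_copy` is hypothetical, and refusing to continue when the destination already exists is one possible policy rather than a DistCp requirement.

```shell
# Hypothetical helper: check a source/destination pair before launching DistCp.
# Returns 0 when the source exists and the destination does not,
# 1 when the source is missing, 2 when the destination already exists.
preflight_copy() {
    src=$1
    dst=$2

    # `hdfs dfs -test -e PATH` exits 0 when PATH exists.
    if ! hdfs dfs -test -e "$src"; then
        echo "source not found: $src" >&2
        return 1
    fi

    if hdfs dfs -test -e "$dst"; then
        echo "destination already exists: $dst" >&2
        return 2
    fi

    echo "ok: $src -> $dst"
}
```

A call such as `preflight_copy hdfs://nn1.example.net/data/events hdfs://nn2.example.net/archive/events` could then gate the `hadoop distcp` command with `&&`.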
DistCp can copy between HDFS and compatible object stores, but object-store renames and consistency behavior can differ from HDFS. Use the simplest copy options first and add update, delete, or bandwidth limits only when the job requires them.
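To illustrate adding options only when needed, the sketch below prints a DistCp command line instead of running it. `-update`, `-delete`, and `-bandwidth` are standard DistCp options; the function name `distcp_command` and the `SYNC` and `BANDWIDTH_MB` variables are hypothetical names chosen for this example.

```shell
# Hypothetical helper: print a DistCp command line, adding options only on request.
#   SYNC=1          add -update -delete (mirror the source, removing extra target files)
#   BANDWIDTH_MB=N  add -bandwidth N (bandwidth limit per map task, in MB/s)
distcp_command() {
    src=$1
    dst=$2
    opts=""

    # -delete is only valid together with -update or -overwrite.
    if [ "${SYNC:-0}" = "1" ]; then
        opts="$opts -update -delete"
    fi

    if [ -n "${BANDWIDTH_MB:-}" ]; then
        opts="$opts -bandwidth $BANDWIDTH_MB"
    fi

    # Print rather than execute, so the command can be reviewed first.
    echo "hadoop distcp${opts} $src $dst"
}
```

With neither variable set, the function prints the plain two-argument form used in the steps below. Note that `-update` changes path handling: DistCp copies the contents of the source directory into the target instead of creating the source directory under it.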
$ hdfs dfs -ls hdfs://nn1.example.net/data/events
Found 2 items
drwxr-xr-x - analytics data 0 2026-06-17 02:11 hdfs://nn1.example.net/data/events/day=2026-06-16
drwxr-xr-x - analytics data 0 2026-06-17 02:12 hdfs://nn1.example.net/data/events/day=2026-06-17
$ hdfs dfs -mkdir -p hdfs://nn2.example.net/archive
$ hadoop distcp hdfs://nn1.example.net/data/events hdfs://nn2.example.net/archive/events
INFO tools.DistCp: DistCp job-id: job_1720000000000_0042
INFO mapreduce.Job: map 100% reduce 0%
INFO mapreduce.Job: Job job_1720000000000_0042 completed successfully
$ yarn application -status application_1720000000000_0042
Final-State : SUCCEEDED
Tracking-URL : http://rm01.example.net:8088/proxy/application_1720000000000_0042/
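For scripted runs, the status check can be reduced to an exit code. The sketch below reads a `yarn application -status` report on standard input and assumes the `Final-State : VALUE` line format shown above; the function names `final_state` and `app_succeeded` are hypothetical.

```shell
# Hypothetical helper: read `yarn application -status` output on stdin
# and print the value of the Final-State field.
final_state() {
    awk -F' : ' '/Final-State/ { gsub(/[ \t\r]+/, "", $2); print $2; exit }'
}

# Exit 0 only when the report on stdin says the application succeeded.
app_succeeded() {
    [ "$(final_state)" = "SUCCEEDED" ]
}
```

A pipeline such as `yarn application -status application_1720000000000_0042 | app_succeeded` would then return a non-zero status for any final state other than `SUCCEEDED`, including a report with no `Final-State` line.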
$ hdfs dfs -du -s -h hdfs://nn2.example.net/archive/events
90.5 G 181.0 G hdfs://nn2.example.net/archive/events
$ hdfs dfs -ls hdfs://nn2.example.net/archive/events
Found 2 items
drwxr-xr-x - analytics data 0 2026-06-17 03:11 hdfs://nn2.example.net/archive/events/day=2026-06-16
drwxr-xr-x - analytics data 0 2026-06-17 03:12 hdfs://nn2.example.net/archive/events/day=2026-06-17
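The size and listing checks above can also be compared mechanically. The following sketch assumes the `hdfs` client is on `PATH` and that the first column of `hdfs dfs -du -s` (without `-h`) is the logical size in bytes; the function names `path_bytes` and `same_size` are hypothetical.

```shell
# Hypothetical helper: print the logical size in bytes of an HDFS path.
# Without -h, the first column of `hdfs dfs -du -s` is the size in bytes.
path_bytes() {
    hdfs dfs -du -s "$1" | awk 'NR == 1 { print $1 }'
}

# Compare logical sizes of source and destination; return 0 when they match.
same_size() {
    src_bytes=$(path_bytes "$1")
    dst_bytes=$(path_bytes "$2")
    echo "source=$src_bytes destination=$dst_bytes"
    [ -n "$src_bytes" ] && [ "$src_bytes" = "$dst_bytes" ]
}
```

Comparing the first column rather than the second keeps the check independent of each cluster's replication factor. Matching sizes are a necessary sign of a complete copy, not proof of identical content.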