Recursive retrieval is useful when a remote server exposes one bounded subtree of files that needs to be copied locally for offline access, review, or repeatable processing. A single wget run can walk that subtree and preserve the same relative layout on disk, which is usually cleaner than downloading each file by hand.

GNU wget handles this job with --recursive and narrows the crawl with options such as --level, --no-parent, --domains, --accept, --directory-prefix, --no-host-directories, and --cut-dirs. The exact mix determines how deep the traversal goes, which hosts and paths are allowed, which file types stay in scope, and how the mirrored tree is laid out locally.

Scope control matters more than speed on recursive jobs. A loose start URL, unlimited depth, or missing host boundary can pull unrelated content quickly, while repeated requests can place unnecessary load on the origin. Start with a spider pass, keep the target path explicit, and add pacing when the source is shared or rate limited.

Steps to download a directory recursively with wget:

  1. Create and enter a dedicated working directory before the crawl starts.
    $ mkdir -p ~/downloads/records-mirror
    $ cd ~/downloads/records-mirror
    $ pwd
    /home/user/downloads/records-mirror

    A fixed working directory keeps the mirrored tree, temporary files, and cleanup work together.

  2. Validate the subtree with spider mode before writing any files.
    $ wget --spider --recursive --no-parent --level=2 \
      --domains=archive.example.net \
      https://archive.example.net/exports/records/
    Spider mode enabled. Check if remote file exists.
    ##### snipped #####
    HTTP request sent, awaiting response... 200 OK
    Remote file exists and could contain links to other resources -- retrieving.
    
    Loading robots.txt; please ignore errors.
    ##### snipped #####
    Remote file exists but does not contain any link -- not retrieving.
    
    Found no broken links.

    A clean spider pass confirms the URL, depth, and parent boundary before a live run starts filling the local disk.
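
    The spider result is also visible in the exit status, which makes the check scriptable: wget exits 0 when the pass finds no problems and 8 when the server returned an error response, such as a 404 on a linked file.
    $ echo $?
    0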

    Keep the trailing slash on a directory URL so --no-parent treats the start path as the crawl boundary.
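
    As a sketch of the difference, the two invocations below differ only in the trailing slash; in the second, wget reads records as a filename whose parent directory is /, so --no-parent no longer holds the crawl inside the subtree:
    # Boundary stays at /exports/records/:
    $ wget --spider --recursive --no-parent \
      https://archive.example.net/exports/records/
    # 'records' parsed as a filename; its parent is /, so the whole host is in scope:
    $ wget --spider --recursive --no-parent \
      https://archive.example.net/exports/records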

  3. Download the directory tree with explicit depth and host limits.
    $ wget --recursive --no-parent --level=2 \
      --domains=archive.example.net \
      --directory-prefix=mirror \
      --convert-links --adjust-extension \
      https://archive.example.net/exports/records/
    ##### snipped #####
    HTTP request sent, awaiting response... 200 OK
    Saving to: 'mirror/archive.example.net/exports/records/index.html'
    Loading robots.txt; please ignore errors.
    Saving to: 'mirror/archive.example.net/robots.txt'
    ##### snipped #####
    Saving to: 'mirror/archive.example.net/exports/records/reports/daily-summary-2026-03-28.csv'
    Saving to: 'mirror/archive.example.net/exports/records/reports/monthly-summary-2026-03.csv'
    Saving to: 'mirror/archive.example.net/exports/records/assets/storage-trend.png'
    FINISHED --2026-03-29 09:24:01--
    Downloaded: 5 files, 34K in 0.02s (1.68 MB/s)

    --no-parent stops ascent above /exports/records/, while --domains prevents the crawl from drifting onto other hosts.
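
    For interactive use, the same run can be written with wget's short options; this command is equivalent to the one above:
    $ wget -r -np -l 2 -D archive.example.net \
      -P mirror -k -E \
      https://archive.example.net/exports/records/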

  4. Flatten the local tree or limit file types when the remote layout is broader than the job requires.
    $ wget --recursive --no-parent --level=2 \
      --domains=archive.example.net \
      --no-host-directories --cut-dirs=2 \
      --accept=csv,png \
      --directory-prefix=mirror \
      https://archive.example.net/exports/records/

    --no-host-directories drops the host directory, --cut-dirs=2 strips the leading /exports/records/ path segments, and --accept keeps only the requested extensions.
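
    Assuming the same remote files as in step 3, the flattened tree should come out roughly as below. The HTML pages are still fetched so their links can be followed, but wget removes them afterward because .html does not match the accept list:
    $ find mirror -type f | sort
    mirror/assets/storage-trend.png
    mirror/reports/daily-summary-2026-03-28.csv
    mirror/reports/monthly-summary-2026-03.csv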

  5. Add pacing before running the same pattern against a shared or rate-limited host.
    $ wget --recursive --no-parent --level=2 \
      --domains=archive.example.net \
      --wait=2 --random-wait --limit-rate=250k \
      https://archive.example.net/exports/records/

    Recursive jobs generate many requests in one run, so delay and rate controls matter more here than on single-file downloads.
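
    When the total size of the subtree is uncertain, a byte quota adds one more guard. As a sketch, --quota=100m (an illustrative limit) aborts the recursive run once the downloaded total passes 100 MB; the quota is checked between files, so it never cuts a transfer off mid-file:
    $ wget --recursive --no-parent --level=2 \
      --domains=archive.example.net \
      --wait=2 --random-wait --limit-rate=250k \
      --quota=100m \
      https://archive.example.net/exports/records/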

  6. Verify the final tree before another tool consumes the files.
    $ find mirror -type f | sort
    mirror/archive.example.net/exports/records/assets/storage-trend.png
    mirror/archive.example.net/exports/records/index.html
    mirror/archive.example.net/exports/records/reports/daily-summary-2026-03-28.csv
    mirror/archive.example.net/exports/records/reports/monthly-summary-2026-03.csv
    mirror/archive.example.net/robots.txt

    The listing shows the mirrored export subtree plus the fetched robots.txt policy file, confirming that recursion stayed inside the intended host and path boundary.
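
    When another tool consumes the files next, a checksum manifest makes the handoff verifiable later. This sketch assumes GNU coreutils sha256sum is available and writes the manifest beside, not inside, the mirror tree so a rerun of find does not pick it up:
    $ find mirror -type f -exec sha256sum {} + | sort -k 2 > records-mirror.sha256
    $ sha256sum --check records-mirror.sha256
    mirror/archive.example.net/exports/records/assets/storage-trend.png: OK
    ##### snipped #####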