Recursive downloads with wget allow remote directory trees or web site sections to be copied locally in a single operation, preserving structure for offline browsing, compliance archiving, and bulk analysis. Mirroring a directory rather than fetching individual files keeps relative links intact and reduces manual work when large hierarchies must be inspected.

During recursive operation, wget starts from a seed URL, parses links in HTML pages or directory listings, and follows them according to options such as --recursive, --no-parent, --level, and --accept. Paths from the remote server are translated into local directories so that filenames and relative URLs match the original layout as closely as possible.

Unconstrained recursion can easily cross into unrelated areas, hammer a remote service with thousands of requests, or consume large amounts of local storage. The commands below assume a POSIX-style shell with wget already installed and combine depth limits, hostname restrictions, file-type filters, and request pauses so that mirrors stay focused and comply with terms of use and /robots.txt rules.

Steps to download directories recursively using wget:

  1. Open a terminal in the destination directory for the mirrored content and confirm the current path.
    $ pwd
    /home/user/recursive/step1

    Using a dedicated directory keeps mirrored trees separate from other downloads and simplifies cleanup.
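    The setup above can be done in one step; the path below is only an illustrative example:

    ```shell
    # Create a fresh directory for this mirror (path is arbitrary) and enter it.
    mkdir -p ~/mirrors/example-data
    cd ~/mirrors/example-data
    pwd
    ```

    Keeping one directory per mirror also makes later cleanup a single `rm -r`.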

  2. Run wget with --recursive against the base directory URL to perform an initial mirror.
    $ wget --recursive https://www.example.com/data/
    --2026-01-10 06:07:37--  https://www.example.com/data/
    Resolving www.example.com (www.example.com)... 203.0.113.50
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 251 [text/html]
    Saving to: 'www.example.com/data/index.html'
    
         0K                                                       100% 16.4M=0s
    
    2026-01-10 06:07:37 (16.4 MB/s) - 'www.example.com/data/index.html' saved [251/251]
    
    Loading robots.txt; please ignore errors.
    --2026-01-10 06:07:37--  https://www.example.com/robots.txt
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 34 [text/plain]
    Saving to: 'www.example.com/robots.txt'
    
         0K                                                       100% 2.10M=0s
    
    2026-01-10 06:07:37 (2.10 MB/s) - 'www.example.com/robots.txt' saved [34/34]
    
    --2026-01-10 06:07:37--  https://www.example.com/data/image01.jpg
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 14 [image/jpeg]
    Saving to: 'www.example.com/data/image01.jpg'
    
         0K                                                       100%  607K=0s
    
    2026-01-10 06:07:37 (607 KB/s) - 'www.example.com/data/image01.jpg' saved [14/14]
    
    --2026-01-10 06:07:37--  https://www.example.com/data.tar.gz
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 1048576 (1.0M) [application/gzip]
    Saving to: 'www.example.com/data.tar.gz'
    
         0K .......... .......... .......... .......... ..........  4%  305M 0s
        50K .......... .......... .......... .......... ..........  9%  442M 0s
       100K .......... .......... .......... .......... .......... 14%  693M 0s
       150K .......... .......... .......... .......... .......... 19%  602M 0s
       200K .......... .......... .......... .......... .......... 24%  541M 0s
       250K .......... .......... .......... .......... .......... 29%  475M 0s
       300K .......... .......... .......... .......... .......... 34%  656M 0s
       350K .......... .......... .......... .......... .......... 39%  607M 0s
       400K .......... .......... .......... .......... .......... 43%  471M 0s
       450K .......... .......... .......... .......... .......... 48%  623M 0s
       500K .......... .......... .......... .......... .......... 53%  449M 0s
       550K .......... .......... .......... .......... .......... 58%  105M 0s
       600K .......... .......... .......... .......... .......... 63%  382M 0s
       650K .......... .......... .......... .......... .......... 68%  702M 0s
       700K .......... .......... .......... .......... .......... 73%  479M 0s
       750K .......... .......... .......... .......... .......... 78%  595M 0s
       800K .......... .......... .......... .......... .......... 83%  476M 0s
       850K .......... .......... .......... .......... .......... 87%  657M 0s
       900K .......... .......... .......... .......... .......... 92%  649M 0s
       950K .......... .......... .......... .......... .......... 97%  628M 0s
      1000K .......... .......... ....                            100%  666M=0.002s
    
    2026-01-10 06:07:37 (440 MB/s) - 'www.example.com/data.tar.gz' saved [1048576/1048576]
    
    FINISHED --2026-01-10 06:07:37--
    Total wall clock time: 0.04s
    Downloaded: 4 files, 1.0M in 0.002s (430 MB/s)

    The trailing slash marks the URL as a directory, so wget mirrors its contents rather than treating the URL as a single file. Note that the output above also fetched https://www.example.com/data.tar.gz, which sits above the /data/ directory; the next step prevents that.
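    By default the mirror lands under a directory named after the host (www.example.com/). If that layout is unwanted, wget's --no-host-directories (-nH) and --directory-prefix (-P) options change where files are written; a sketch, with the same example URL:

    ```shell
    # Save the tree under ./mirror/data/ instead of ./www.example.com/data/,
    # dropping the hostname directory with -nH and redirecting output with -P.
    wget --recursive --no-host-directories --directory-prefix=mirror \
        https://www.example.com/data/
    ```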

  3. Prevent recursion from following links into parent directories by enabling --no-parent and confining requests to a specific hostname with --domains.
    $ wget --recursive --no-parent --domains=www.example.com https://www.example.com/data/
    --2026-01-10 06:07:37--  https://www.example.com/data/
    Resolving www.example.com (www.example.com)... 203.0.113.50
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 251 [text/html]
    Saving to: 'www.example.com/data/index.html'
    
         0K                                                       100% 9.92M=0s
    
    2026-01-10 06:07:37 (9.92 MB/s) - 'www.example.com/data/index.html' saved [251/251]
    
    Loading robots.txt; please ignore errors.
    --2026-01-10 06:07:37--  https://www.example.com/robots.txt
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 34 [text/plain]
    Saving to: 'www.example.com/robots.txt'
    
         0K                                                       100% 2.21M=0s
    
    2026-01-10 06:07:37 (2.21 MB/s) - 'www.example.com/robots.txt' saved [34/34]
    
    --2026-01-10 06:07:37--  https://www.example.com/data/image01.jpg
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 14 [image/jpeg]
    Saving to: 'www.example.com/data/image01.jpg'
    
         0K                                                       100%  688K=0s
    
    2026-01-10 06:07:37 (688 KB/s) - 'www.example.com/data/image01.jpg' saved [14/14]
    
    FINISHED --2026-01-10 06:07:37--
    Total wall clock time: 0.03s
    Downloaded: 3 files, 299 in 0s (4.86 MB/s)

    --no-parent prevents recursion from climbing into higher-level paths such as https://www.example.com/, while --domains restricts which hostnames wget is allowed to follow links to.
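    When assets legitimately live on a second host, such as a CDN, --span-hosts (-H) combined with a multi-entry --domains list widens the allow-list to exactly those hosts; the cdn.example.com hostname below is hypothetical:

    ```shell
    # Permit recursion onto a second, explicitly listed host only.
    wget --recursive --no-parent --span-hosts \
        --domains=www.example.com,cdn.example.com \
        https://www.example.com/data/
    ```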

  4. Control how deep recursion descends by setting an explicit level so that only a limited number of link hops are followed.
    $ wget --recursive --no-parent --domains=www.example.com --level=2 https://www.example.com/data/

    --level=1 keeps downloads to files linked directly from the starting page, while larger values expand coverage but increase runtime and storage use.
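    When no --level is given, wget defaults to a depth of 5; --level=inf removes the limit entirely. A sketch of an unbounded but still path- and host-confined mirror:

    ```shell
    # Unlimited depth; --no-parent and --domains still keep the crawl focused.
    wget --recursive --no-parent --domains=www.example.com --level=inf \
        https://www.example.com/data/
    ```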

  5. Restrict saved content to specific file extensions using --accept to focus on relevant assets such as images or documents.
    $ wget --recursive --no-parent --domains=www.example.com --level=1 --accept=jpg,png,pdf https://www.example.com/data/
    --2026-01-10 06:07:37--  https://www.example.com/data/
    Resolving www.example.com (www.example.com)... 203.0.113.50
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 251 [text/html]
    Saving to: 'www.example.com/data/index.html.tmp'
    
         0K                                                       100% 16.4M=0s
    
    2026-01-10 06:07:37 (16.4 MB/s) - 'www.example.com/data/index.html.tmp' saved [251/251]
    
    Loading robots.txt; please ignore errors.
    --2026-01-10 06:07:37--  https://www.example.com/robots.txt
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 34 [text/plain]
    Saving to: 'www.example.com/robots.txt.tmp'
    
         0K                                                       100% 1.60M=0s
    
    2026-01-10 06:07:37 (1.60 MB/s) - 'www.example.com/robots.txt.tmp' saved [34/34]
    
    Removing www.example.com/robots.txt.tmp.
    Removing www.example.com/data/index.html.tmp since it should be rejected.
    
    --2026-01-10 06:07:37--  https://www.example.com/data/image01.jpg
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 14 [image/jpeg]
    Saving to: 'www.example.com/data/image01.jpg'
    
         0K                                                       100%  946K=0s
    
    2026-01-10 06:07:37 (946 KB/s) - 'www.example.com/data/image01.jpg' saved [14/14]
    
    FINISHED --2026-01-10 06:07:37--
    Total wall clock time: 0.03s
    Downloaded: 3 files, 299 in 0s (5.78 MB/s)

    --accept takes a comma-separated list of filename suffixes; only matching files are kept. HTML pages are still fetched temporarily so their links can be followed, then deleted when they do not match, which is what the .tmp removal lines above show.
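    The inverse filter is --reject, which downloads everything except the listed suffixes; the suffixes below are illustrative:

    ```shell
    # Mirror the directory but skip large archive formats.
    wget --recursive --no-parent --domains=www.example.com --level=1 \
        --reject=zip,tar.gz,iso https://www.example.com/data/
    ```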

  6. Throttle request rate with --wait to insert pauses between downloads and reduce impact on the remote service.
    $ wget --recursive --no-parent --domains=www.example.com --level=1 --wait=2 https://www.example.com/data/

    Very small wait intervals combined with large recursion depths can still overload small servers and may violate acceptable-use policies; --random-wait spreads requests out with jitter for additional protection.
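    Pacing and bandwidth limits can be combined in one invocation; the 200k cap below is an arbitrary example value:

    ```shell
    # Wait ~2s between requests (randomized by --random-wait) and
    # cap transfer speed at roughly 200 KB/s with --limit-rate.
    wget --recursive --no-parent --domains=www.example.com --level=1 \
        --wait=2 --random-wait --limit-rate=200k \
        https://www.example.com/data/
    ```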

  7. Verify that the mirrored directory structure and expected files exist locally by inspecting the generated tree.
    $ find www.example.com -maxdepth 3 -type f | head
    www.example.com/data/index.html
    www.example.com/data/image01.jpg
    www.example.com/robots.txt
    www.example.com/data.tar.gz

    Successful recursion is indicated by a populated local tree under the host name, absence of repeated 404 Not Found lines in output, and presence of the expected file types in the listing.
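    Beyond listing files, a quick count and size check gives a rough sanity test that the mirror matches expectations:

    ```shell
    # Count mirrored files and report total disk usage of the local tree.
    find www.example.com -type f | wc -l
    du -sh www.example.com
    ```

    Comparing the file count against the remote listing, where one is available, catches truncated mirrors early.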