Recursive downloads with wget allow remote directory trees or web site sections to be copied locally in a single operation, preserving structure for offline browsing, compliance archiving, and bulk analysis. Mirroring a directory rather than fetching individual files keeps relative links intact and reduces manual work when large hierarchies must be inspected.
During recursive operation, wget starts from a seed URL, parses links in HTML pages or directory listings, and follows them according to options such as --recursive, --no-parent, --level, and --accept. Paths from the remote server are translated into local directories so that filenames and relative URLs match the original layout as closely as possible.
Unconstrained recursion can easily cross into unrelated areas, hammer a remote service with thousands of requests, or consume large amounts of local storage. The commands below assume a POSIX-style shell with wget already installed and combine depth limits, hostname restrictions, file-type filters, and request pauses so that mirrors stay focused and comply with terms of use and /robots.txt rules.
Steps to download directories recursively using wget:
- Open a terminal in the destination directory for the mirrored content and confirm the current path.
$ pwd
/home/user/recursive/step1
Using a dedicated directory keeps mirrored trees separate from other downloads and simplifies cleanup.
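Creating and entering the destination directory can be done in one short sequence; the path below is illustrative, and any empty directory works:

```shell
# Create a dedicated directory for the mirror and switch into it
# (the path is hypothetical; substitute your own location).
mkdir -p "$HOME/mirrors/example-data"
cd "$HOME/mirrors/example-data"
pwd    # confirm the working directory before mirroring
```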
- Run wget with --recursive against the base directory URL to perform an initial mirror.
$ wget --recursive https://www.example.com/data/
--2026-01-10 06:07:37--  https://www.example.com/data/
Resolving www.example.com (www.example.com)... 203.0.113.50
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 251 [text/html]
Saving to: 'www.example.com/data/index.html'

     0K                                                       100% 16.4M=0s

2026-01-10 06:07:37 (16.4 MB/s) - 'www.example.com/data/index.html' saved [251/251]

Loading robots.txt; please ignore errors.
--2026-01-10 06:07:37--  https://www.example.com/robots.txt
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 34 [text/plain]
Saving to: 'www.example.com/robots.txt'

     0K                                                       100% 2.10M=0s

2026-01-10 06:07:37 (2.10 MB/s) - 'www.example.com/robots.txt' saved [34/34]

--2026-01-10 06:07:37--  https://www.example.com/data/image01.jpg
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14 [image/jpeg]
Saving to: 'www.example.com/data/image01.jpg'

     0K                                                       100%  607K=0s

2026-01-10 06:07:37 (607 KB/s) - 'www.example.com/data/image01.jpg' saved [14/14]

--2026-01-10 06:07:37--  https://www.example.com/data.tar.gz
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1048576 (1.0M) [application/gzip]
Saving to: 'www.example.com/data.tar.gz'

     0K .......... .......... .......... .......... ..........  4%  305M 0s
    50K .......... .......... .......... .......... ..........  9%  442M 0s
   100K .......... .......... .......... .......... .......... 14%  693M 0s
   150K .......... .......... .......... .......... .......... 19%  602M 0s
   200K .......... .......... .......... .......... .......... 24%  541M 0s
   250K .......... .......... .......... .......... .......... 29%  475M 0s
   300K .......... .......... .......... .......... .......... 34%  656M 0s
   350K .......... .......... .......... .......... .......... 39%  607M 0s
   400K .......... .......... .......... .......... .......... 43%  471M 0s
   450K .......... .......... .......... .......... .......... 48%  623M 0s
   500K .......... .......... .......... .......... .......... 53%  449M 0s
   550K .......... .......... .......... .......... .......... 58%  105M 0s
   600K .......... .......... .......... .......... .......... 63%  382M 0s
   650K .......... .......... .......... .......... .......... 68%  702M 0s
   700K .......... .......... .......... .......... .......... 73%  479M 0s
   750K .......... .......... .......... .......... .......... 78%  595M 0s
   800K .......... .......... .......... .......... .......... 83%  476M 0s
   850K .......... .......... .......... .......... .......... 87%  657M 0s
   900K .......... .......... .......... .......... .......... 92%  649M 0s
   950K .......... .......... .......... .......... .......... 97%  628M 0s
  1000K .......... .......... ....                            100%  666M=0.002s

2026-01-10 06:07:37 (440 MB/s) - 'www.example.com/data.tar.gz' saved [1048576/1048576]

FINISHED --2026-01-10 06:07:37--
Total wall clock time: 0.04s
Downloaded: 4 files, 1.0M in 0.002s (430 MB/s)

A trailing slash at the end of the URL signals a directory target so wget mirrors its contents instead of treating the URL as a single file.
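The path translation wget performs by default can be sketched in plain shell: the scheme is dropped, and the hostname plus the remote path become the local relative path (options such as --no-host-directories or --cut-dirs alter this layout). `url_to_local` is a hypothetical helper for illustration only:

```shell
# Rough sketch of wget's default URL-to-path mapping: strip the
# scheme and keep hostname plus remote path as the local path.
url_to_local() {
    printf '%s\n' "${1#*://}"
}

url_to_local "https://www.example.com/data/image01.jpg"
# -> www.example.com/data/image01.jpg
```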
- Prevent recursion from following links into parent directories by enabling --no-parent and confining requests to a specific hostname with --domains.
$ wget --recursive --no-parent --domains=www.example.com https://www.example.com/data/
--2026-01-10 06:07:37--  https://www.example.com/data/
Resolving www.example.com (www.example.com)... 203.0.113.50
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 251 [text/html]
Saving to: 'www.example.com/data/index.html'

     0K                                                       100% 9.92M=0s

2026-01-10 06:07:37 (9.92 MB/s) - 'www.example.com/data/index.html' saved [251/251]

Loading robots.txt; please ignore errors.
--2026-01-10 06:07:37--  https://www.example.com/robots.txt
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 34 [text/plain]
Saving to: 'www.example.com/robots.txt'

     0K                                                       100% 2.21M=0s

2026-01-10 06:07:37 (2.21 MB/s) - 'www.example.com/robots.txt' saved [34/34]

--2026-01-10 06:07:37--  https://www.example.com/data/image01.jpg
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14 [image/jpeg]
Saving to: 'www.example.com/data/image01.jpg'

     0K                                                       100%  688K=0s

2026-01-10 06:07:37 (688 KB/s) - 'www.example.com/data/image01.jpg' saved [14/14]

FINISHED --2026-01-10 06:07:37--
Total wall clock time: 0.03s
Downloaded: 3 files, 299 in 0s (4.86 MB/s)

--no-parent prevents recursion from climbing into higher-level paths such as https://www.example.com/, while --domains blocks links that point to third-party hosts.
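Deriving the --domains value from the seed URL avoids the two drifting apart when the command is reused; `host_of` is a hypothetical helper, not part of wget:

```shell
# Hypothetical helper: extract the hostname for --domains from a URL.
host_of() {
    h=${1#*://}                  # drop the scheme (e.g. https://)
    printf '%s\n' "${h%%/*}"     # keep everything before the first slash
}

host_of "https://www.example.com/data/"
# -> www.example.com
```

The result can then be passed straight to wget, e.g. --domains="$(host_of "$url")".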
- Control how deep recursion descends by setting an explicit level so that only a limited number of link hops are followed.
$ wget --recursive --no-parent --domains=www.example.com --level=2 https://www.example.com/data/
--level=1 restricts downloads to files linked directly from the starting page, while larger values expand coverage but increase runtime and storage use.
- Restrict saved content to specific file extensions using --accept to focus on relevant assets such as images or documents.
$ wget --recursive --no-parent --domains=www.example.com --level=1 --accept=jpg,png,pdf https://www.example.com/data/
--2026-01-10 06:07:37--  https://www.example.com/data/
Resolving www.example.com (www.example.com)... 203.0.113.50
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 251 [text/html]
Saving to: 'www.example.com/data/index.html.tmp'

     0K                                                       100% 16.4M=0s

2026-01-10 06:07:37 (16.4 MB/s) - 'www.example.com/data/index.html.tmp' saved [251/251]

Loading robots.txt; please ignore errors.
--2026-01-10 06:07:37--  https://www.example.com/robots.txt
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 34 [text/plain]
Saving to: 'www.example.com/robots.txt.tmp'

     0K                                                       100% 1.60M=0s

2026-01-10 06:07:37 (1.60 MB/s) - 'www.example.com/robots.txt.tmp' saved [34/34]

Removing www.example.com/robots.txt.tmp.
Removing www.example.com/data/index.html.tmp since it should be rejected.
--2026-01-10 06:07:37--  https://www.example.com/data/image01.jpg
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14 [image/jpeg]
Saving to: 'www.example.com/data/image01.jpg'

     0K                                                       100%  946K=0s

2026-01-10 06:07:37 (946 KB/s) - 'www.example.com/data/image01.jpg' saved [14/14]

FINISHED --2026-01-10 06:07:37--
Total wall clock time: 0.03s
Downloaded: 3 files, 299 in 0s (5.78 MB/s)

Extensions in --accept are comma-separated without dots; only matching files are stored even though recursion still traverses HTML pages that reference them.
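The filter behaves like simple suffix matching on filenames; a rough local equivalent of --accept=jpg,png,pdf can be sketched with a hypothetical helper (for illustration only, not how wget is implemented):

```shell
# Hypothetical sketch of the --accept=jpg,png,pdf suffix filter.
is_accepted() {
    case $1 in
        *.jpg|*.png|*.pdf) return 0 ;;   # kept on disk
        *)                 return 1 ;;   # removed after the recursion pass
    esac
}

is_accepted photo.jpg && echo "photo.jpg kept"
is_accepted index.html || echo "index.html rejected"
```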
- Throttle request rate with --wait to insert pauses between downloads and reduce impact on the remote service.
$ wget --recursive --no-parent --domains=www.example.com --level=1 --wait=2 https://www.example.com/data/
Very small wait intervals combined with large recursion depths can still overload small servers and may violate acceptable-use policies; --random-wait spreads requests out with jitter for additional protection.
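The safeguards from the previous steps can be bundled into one reusable wrapper. The sketch below is a hypothetical convention of this article, not a wget feature: `polite_mirror` and its DRY_RUN toggle (which prints the command instead of running it) are illustrative names.

```shell
# Hypothetical wrapper combining the polite-mirroring flags above.
# Usage: polite_mirror URL HOSTNAME   (set DRY_RUN=1 to print only)
polite_mirror() {
    set -- wget --recursive --no-parent --level=2 \
        --wait=2 --random-wait --domains="${2:?hostname required}" "$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"    # show the command without contacting the server
    else
        "$@"
    fi
}
```

Running DRY_RUN=1 first is a cheap way to review the exact flags before generating any traffic.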
- Verify that the mirrored directory structure and expected files exist locally by inspecting the generated tree.
$ find www.example.com -maxdepth 3 -type f | head
www.example.com/data/index.html
www.example.com/data/image01.jpg
www.example.com/robots.txt
www.example.com/data.tar.gz
Successful recursion is indicated by a populated local tree under the host name, absence of repeated 404 Not Found lines in output, and presence of the expected file types in the listing.
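The spot check can be scripted for repeated runs; `count_mirrored_files` is a hypothetical helper that reports how many files a mirror produced, which is useful for comparing successive mirrors of the same tree:

```shell
# Hypothetical helper: count regular files under a mirrored tree.
count_mirrored_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Example: count_mirrored_files www.example.com
```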
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.
