Recursive downloads with wget allow remote directory trees or web site sections to be copied locally in a single operation, preserving structure for offline browsing, compliance archiving, and bulk analysis. Mirroring a directory rather than fetching individual files keeps relative links intact and reduces manual work when large hierarchies must be inspected.
During recursive operation, wget starts from a seed URL, parses links in HTML pages or directory listings, and follows them according to options such as --recursive, --no-parent, --level, and --accept. Paths from the remote server are translated into local directories so that filenames and relative URLs match the original layout as closely as possible.
Unconstrained recursion can easily cross into unrelated areas, hammer a remote service with thousands of requests, or consume large amounts of local storage. The commands below assume a POSIX-style shell with wget already installed and combine depth limits, hostname restrictions, file-type filters, and request pauses so that mirrors stay focused and comply with terms of use and /robots.txt rules.
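As a rough preview of how these safeguards combine, the sketch below mirrors the example directory used throughout this guide to a depth of two, stays on its host, keeps only PDFs, and pauses between requests; the URL, depth, and extension filter are placeholders to adapt.
$ wget --recursive --no-parent --level=2 --domains=www.example.com --accept=pdf --wait=1 https://www.example.com/data/
The steps that follow introduce each of these options individually.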
Steps to download directories recursively using wget:
- Open a terminal in the destination directory for the mirrored content and confirm the current path.
$ pwd
/home/alex/mirrors
Using a dedicated directory keeps mirrored trees separate from other downloads and simplifies cleanup.
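If the directory does not exist yet, it can be created first; the path below is only an example and should match wherever mirrors are kept.
$ mkdir -p ~/mirrors
$ cd ~/mirrors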
- Run wget with --recursive against the base directory URL to perform an initial mirror.
$ wget --recursive https://www.example.com/data/
--2025-12-08 10:00:00-- https://www.example.com/data/
Resolving www.example.com (www.example.com)... 93.184.216.34
Connecting to www.example.com (www.example.com)|93.184.216.34|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘www.example.com/data/index.html’
##### snipped #####
A trailing slash at the end of the URL signals a directory target so wget mirrors its contents instead of treating the URL as a single file.
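If the extra host-named prefix in the local tree is unwanted, the layout can also be flattened: --no-host-directories drops the www.example.com directory and --cut-dirs removes leading path components, so the sketch below saves the contents of /data/ directly into the current directory.
$ wget --recursive --no-parent --no-host-directories --cut-dirs=1 https://www.example.com/data/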
- Prevent recursion from following links into parent directories by enabling --no-parent and confining requests to a specific hostname with --domains.
$ wget --recursive --no-parent --domains=www.example.com https://www.example.com/data/
--no-parent prevents recursion from climbing into higher-level paths such as https://www.example.com/, while --domains rejects links that point to hosts outside the listed domains.
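Recursive wget stays on the starting host unless told otherwise, so --domains matters most when cross-host recursion is enabled with --span-hosts; the sketch below allows a hypothetical cdn.example.com alongside the main site.
$ wget --recursive --no-parent --span-hosts --domains=www.example.com,cdn.example.com https://www.example.com/data/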
- Control how deep recursion descends by setting an explicit level so that only a limited number of link hops are followed.
$ wget --recursive --no-parent --domains=www.example.com --level=2 https://www.example.com/data/
--level=1 keeps downloads to files linked directly from the starting page, while larger values expand coverage but increase runtime and storage use.
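When no depth cap is wanted, the bundled --mirror option is a common alternative; it implies --recursive with an unlimited level and enables timestamping so repeat runs only fetch files that changed.
$ wget --mirror --no-parent https://www.example.com/data/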
- Restrict saved content to specific file extensions using --accept to focus on relevant assets such as images or documents.
$ wget --recursive --no-parent --domains=www.example.com --level=1 --accept=jpg,png,pdf https://www.example.com/data/
Extensions in --accept are comma-separated without dots; only matching files are stored even though recursion still traverses HTML pages that reference them.
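The opposite filter is --reject, which skips the listed extensions while everything else is kept; the zip and iso suffixes below are only placeholders.
$ wget --recursive --no-parent --domains=www.example.com --level=1 --reject=zip,iso https://www.example.com/data/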
- Throttle request rate with --wait to insert pauses between downloads and reduce impact on the remote service.
$ wget --recursive --no-parent --domains=www.example.com --level=1 --wait=2 https://www.example.com/data/
Very small wait intervals combined with large recursion depths can still overload small servers and may violate acceptable-use policies; --random-wait spreads requests out with jitter for additional protection.
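A gentler variant might combine a fixed pause, jitter, and a bandwidth cap; the two-second wait and 200 KB/s limit below are illustrative values rather than recommendations.
$ wget --recursive --no-parent --domains=www.example.com --level=1 --wait=2 --random-wait --limit-rate=200k https://www.example.com/data/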
- Verify that the mirrored directory structure and expected files exist locally by inspecting the generated tree.
$ find www.example.com -maxdepth 3 -type f | head
www.example.com/data/index.html
www.example.com/data/image01.jpg
www.example.com/data/subdir/report.pdf
##### snipped #####
Successful recursion is indicated by a populated local tree under the host name, absence of repeated 404 Not Found lines in output, and presence of the expected file types in the listing.
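To make that check easier on long runs, the transcript can be written to a log file with --output-file and searched for error lines afterwards; the exact message wording can vary between wget versions.
$ wget --recursive --no-parent --domains=www.example.com --level=1 --output-file=mirror.log https://www.example.com/data/
$ grep -c 'ERROR 404' mirror.log
A count of zero suggests that no requests came back as 404 Not Found during the mirror.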
