Recursive retrieval is useful when one published directory needs to be copied locally with its files still arranged in a usable tree. That makes it practical to review exports offline, pass the result to another tool, or preserve one bounded part of a site without mirroring everything.
GNU wget follows links from the starting directory page when -r is enabled. -np prevents it from climbing above the starting path, and -l caps how many link levels it may follow: -l 1 covers one directory, while -l 2 also reaches files inside the first level of nested subdirectories.
The starting URL should end with a trailing slash so -np treats it as a directory boundary, and recursive runs still fetch the HTML listing page and robots.txt while discovering links. Raise the level only as far as the remote tree needs, and add request delays before pointing the same pattern at a shared or rate-limited host.
$ wget -r -np -l 2 -P mirror https://archive.example.net/exports/records/
--2026-04-22 06:24:45--  https://archive.example.net/exports/records/
Resolving archive.example.net (archive.example.net)... 203.0.113.50
Connecting to archive.example.net (archive.example.net)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Saving to: 'mirror/archive.example.net/exports/records/index.html'
##### snipped #####
Saving to: 'mirror/archive.example.net/robots.txt'
Saving to: 'mirror/archive.example.net/exports/records/reports/daily-summary.csv'
Saving to: 'mirror/archive.example.net/exports/records/reports/monthly-summary.csv'
Saving to: 'mirror/archive.example.net/exports/records/assets/storage-trend.png'
FINISHED --2026-04-22 06:24:46--
Downloaded: 5 files, 34K in 0.02s (1.68 MB/s)
-l 1 is enough when the starting directory page links directly to every file that needs to be saved, while -l 2 is the practical next step when the listing links to immediate subdirectories first.
$ wget -r -np -l 2 -nH --cut-dirs=2 -P mirror https://archive.example.net/exports/records/
-nH drops archive.example.net from the local path, and --cut-dirs=2 strips /exports/records/ so the saved tree starts directly under mirror/.
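The path math can be sketched locally with plain parameter expansion; the URL below is one of the hypothetical files from the transcript above, and the trimming mimics what -nH and --cut-dirs=2 do internally:

```shell
#!/bin/sh
# Illustration only: reproduce the local path wget would choose
# under -nH --cut-dirs=2 -P mirror for one hypothetical file.

url="https://archive.example.net/exports/records/reports/daily-summary.csv"

path="${url#https://}"   # archive.example.net/exports/records/reports/daily-summary.csv
path="${path#*/}"        # -nH: drop the host component
path="${path#*/}"        # --cut-dirs=1: drop "exports"
path="${path#*/}"        # --cut-dirs=2: drop "records"

echo "mirror/$path"      # mirror/reports/daily-summary.csv
```

Each `${path#*/}` removes one leading path component, which is exactly the unit --cut-dirs counts in.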
$ wget -r -np -l 2 -nH --cut-dirs=2 -A csv,png -P mirror https://archive.example.net/exports/records/
-A restricts the saved files by suffix, but wget still downloads the listing page and robots.txt to discover links and crawl rules, then removes local files that do not match the accept list.
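The keep-or-delete decision behind -A can be sketched as a plain suffix match; csv,png below stands in for the accept list used in the command above, and the file names are the hypothetical ones from the transcript:

```shell
#!/bin/sh
# Sketch of the suffix test behind -A csv,png: listing pages are
# fetched for link discovery, then deleted because they fail this test.

keeps_file() {
    case "$1" in
        *.csv|*.png) return 0 ;;   # matches the accept list, kept
        *)           return 1 ;;   # discovery-only file, removed
    esac
}

for name in index.html daily-summary.csv storage-trend.png; do
    if keeps_file "$name"; then
        echo "keep   $name"
    else
        echo "delete $name"
    fi
done
```

index.html is deleted after its links are harvested, which is why it appears in the download log but not in the final tree.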
$ wget -r -np -l 2 -nH --cut-dirs=2 --wait=2 --random-wait -P mirror https://archive.example.net/exports/records/
$ find mirror -type f
mirror/assets/storage-trend.png
mirror/reports/daily-summary.csv
mirror/reports/monthly-summary.csv
Only the reports/ and assets/ subtrees appear; the absence of parent-path content is the signal that -np kept retrieval inside the intended directory boundary.
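That boundary check can be made mechanical; the sketch below builds a stand-in tree matching the find output above (the real mirror/ would come from the wget run) and asserts that no file landed outside the two expected subtrees:

```shell
#!/bin/sh
# Sketch of a boundary check over a stand-in tree; the paths mirror
# the hypothetical find output above.
tmp=$(mktemp -d) && cd "$tmp"

mkdir -p mirror/reports mirror/assets
touch mirror/reports/daily-summary.csv \
      mirror/reports/monthly-summary.csv \
      mirror/assets/storage-trend.png

# Any file outside the two expected subtrees would be a stray fetched
# from a parent path; an empty result means -np held the line.
strays=$(find mirror -type f ! -path 'mirror/reports/*' \
                             ! -path 'mirror/assets/*')
[ -z "$strays" ] && echo "boundary held"
```

An empty strays list is the scripted equivalent of eyeballing the find output for parent-path entries.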