Recursive wget downloads respect a site's published robot rules by default, which is the right behavior for ordinary mirrors, audits, and offline copies. When you have explicit permission to fetch paths that the site blocks from crawlers, you can turn that check off for one bounded run.
Current GNU wget controls this behavior with the robots setting. On the command line, --execute robots=off disables the robots.txt check for that single invocation, while robots = off in ~/.wgetrc makes the override persistent for that account. The same setting also disables document-level nofollow rules, such as the nofollow value in a robots meta tag, during recursive retrieval.
Use the override narrowly. Keep the start URL, recursion depth, and output directory explicit, and restore the default policy when the approved capture is finished so later recursive jobs do not keep bypassing crawl rules.
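For example, an approved one-off capture can pin all three in a single command. This is only a sketch; the URL, the depth, and the approved-capture directory name are placeholders for whatever the approval actually covers.

$ wget --recursive --level=1 --no-parent --directory-prefix=approved-capture --execute robots=off https://archive.example.net/exports/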
$ wget --quiet --output-document=- https://archive.example.net/robots.txt
User-agent: *
Disallow: /exports/internal/
This confirms which path the default recursive run will refuse to fetch.
$ wget --recursive --level=1 --no-parent https://archive.example.net/exports/
Saving to: 'archive.example.net/exports/index.html'
Loading robots.txt; please ignore errors.
Saving to: 'archive.example.net/robots.txt'
Saving to: 'archive.example.net/exports/public/status.html'
Downloaded: 3 files, 472 in 0s
The baseline run makes the later override easy to audit because the blocked path is still absent.
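If you want to record that baseline, a directory listing taken before the override should show the blocked path missing. The output below assumes the example host used throughout this article.

$ ls archive.example.net/exports/
index.html  public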
$ wget --recursive --level=1 --no-parent --execute robots=off https://archive.example.net/exports/
Saving to: 'archive.example.net/exports/index.html'
Saving to: 'archive.example.net/exports/public/status.html'
Saving to: 'archive.example.net/exports/internal/report.html'
Downloaded: 3 files, 588 in 0s
Use this only for approved captures. The short -e robots=off form is the same override.
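For reference, the same run written with the short form looks like this:

$ wget --recursive --level=1 --no-parent -e robots=off https://archive.example.net/exports/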
~/.wgetrc
robots = off
This changes every later recursive wget run for that user until you remove it or set it back to on.
Related: How to configure default options in ~/.wgetrc
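Before a later recursive job, a quick check shows whether the override is still saved; grep prints nothing and exits non-zero if no robots line is present.

$ grep '^robots' ~/.wgetrc
robots = off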
$ ls archive.example.net/exports/internal
report.html
A file in the blocked directory confirms that the override was active for that run.
~/.wgetrc
robots = on
Remove the override entirely if you do not want any saved robots setting in the user profile.
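With GNU sed, one way to delete the saved line in place is shown below; review ~/.wgetrc afterwards to confirm nothing else changed.

$ sed -i '/^robots[[:space:]]*=/d' ~/.wgetrc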