Ignoring the /robots.txt file in Wget enables full‑site crawling and archiving, including paths that polite crawlers normally skip. This behavior is often required for internal compliance audits, offline mirrors, and controlled testing of how sensitive content is exposed.
Under the Robots Exclusion Protocol, web servers publish a /robots.txt file at the site root, and clients that implement the protocol decide which URLs to fetch based on its per‑user‑agent rules. Wget honours these rules during recursive retrieval: it first requests /robots.txt, then skips any links the rules disallow, unless the robots variable has been set to off through a command‑line option or a configuration file.
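The filtering described above can be illustrated with a small shell sketch. It is not how Wget implements the protocol internally; it is a simplified approximation that reads only the rules addressed to every user agent (User-agent: *), ignores Allow lines and wildcard patterns, and treats each Disallow value as a plain path prefix. The function names disallowed_prefixes and is_disallowed are hypothetical, chosen only for this example.

```shell
#!/bin/sh
# Simplified sketch: list the Disallow prefixes that apply to all user agents.
# $1 = path to a local copy of robots.txt
disallowed_prefixes() {
    awk '
        # Strip comments and a trailing carriage return, skip empty lines.
        { sub(/#.*/, ""); sub(/\r$/, "") }
        /^[ \t]*$/ { next }
        {
            key = tolower($0)
            value = $0
            sub(/^[^:]*:[ \t]*/, "", value)
            sub(/[ \t]+$/, "", value)
        }
        # Consecutive User-agent lines form one group.
        key ~ /^[ \t]*user-agent[ \t]*:/ {
            if (!prev_was_ua) in_star = 0
            if (value == "*") in_star = 1
            prev_was_ua = 1
            next
        }
        { prev_was_ua = 0 }
        in_star && key ~ /^[ \t]*disallow[ \t]*:/ && value != "" { print value }
    ' "$1"
}

# Succeeds (exit status 0) when the URL path starts with a disallowed prefix.
# $1 = path to a local copy of robots.txt, $2 = URL path such as /internal/x
is_disallowed() {
    disallowed_prefixes "$1" | (
        while IFS= read -r prefix; do
            case "$2" in
                "$prefix"*) exit 0 ;;
            esac
        done
        exit 1
    )
}

# Example use, assuming robots.txt was saved to the current directory:
#   is_disallowed robots.txt /internal/file.tar.gz && echo "would be skipped"
```

A client that honours the protocol would skip every URL for which is_disallowed succeeds; setting robots to off removes that filtering step entirely.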
Disabling this safeguard is technically simple, but it bypasses the site owner’s published crawling preferences and can add significant load or violate acceptable‑use policies. The commands below assume access is authorised and focus on Wget running in a shell on Linux, showing both a one‑off override and a configuration change that permanently turns off robots handling for a single user.
Steps to ignore robots.txt in wget:
- Open a terminal on Linux with standard user privileges.
$ whoami
user
Running Wget as an unprivileged account reduces the blast radius if an unexpected path is fetched or a misconfiguration causes excessive downloads.
- Ignore /robots.txt for a single recursive crawl by passing the robots variable on the command line.
$ wget --execute=robots=off --recursive https://www.example.com/
--2026-01-10 06:07:58-- https://www.example.com/
Resolving www.example.com (www.example.com)... 203.0.113.50
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 277 [text/html]
Saving to: 'www.example.com/index.html'
0K 100% 15.6M=0s
2026-01-10 06:07:58 (15.6 MB/s) - 'www.example.com/index.html' saved [277/277]

--2026-01-10 06:07:58-- https://www.example.com/docs/
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 62 [text/html]
Saving to: 'www.example.com/docs/index.html'
0K 100% 3.73M=0s
2026-01-10 06:07:58 (3.73 MB/s) - 'www.example.com/docs/index.html' saved [62/62]

--2026-01-10 06:07:58-- https://www.example.com/data/
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 251 [text/html]
Saving to: 'www.example.com/data/index.html'
0K 100% 16.2M=0s
2026-01-10 06:07:58 (16.2 MB/s) - 'www.example.com/data/index.html' saved [251/251]

--2026-01-10 06:07:58-- https://www.example.com/repo/
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 174 [text/html]
Saving to: 'www.example.com/repo/index.html'
0K 100% 10.2M=0s
2026-01-10 06:07:58 (10.2 MB/s) - 'www.example.com/repo/index.html' saved [174/174]

--2026-01-10 06:07:58-- https://www.example.com/internal/
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 252 [text/html]
Saving to: 'www.example.com/internal/index.html'
0K 100% 14.8M=0s
2026-01-10 06:07:58 (14.8 MB/s) - 'www.example.com/internal/index.html' saved [252/252]

--2026-01-10 06:07:58-- https://www.example.com/docs/guide.html
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 46 [text/html]
Saving to: 'www.example.com/docs/guide.html'
0K 100% 4.05M=0s
2026-01-10 06:07:58 (4.05 MB/s) - 'www.example.com/docs/guide.html' saved [46/46]

--2026-01-10 06:07:58-- https://www.example.com/data/image01.jpg
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14 [image/jpeg]
Saving to: 'www.example.com/data/image01.jpg'
0K 100% 1.13M=0s
2026-01-10 06:07:58 (1.13 MB/s) - 'www.example.com/data/image01.jpg' saved [14/14]

--2026-01-10 06:07:58-- https://www.example.com/data.tar.gz
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1048576 (1.0M) [application/gzip]
Saving to: 'www.example.com/data.tar.gz'
0K .......... .......... .......... .......... .......... 4% 505M 0s
50K .......... .......... .......... .......... .......... 9% 221M 0s
100K .......... .......... .......... .......... .......... 14% 532M 0s
150K .......... .......... .......... .......... .......... 19% 386M 0s
200K .......... .......... .......... .......... .......... 24% 590M 0s
250K .......... .......... .......... .......... .......... 29% 529M 0s
300K .......... .......... .......... .......... .......... 34% 741M 0s
350K .......... .......... .......... .......... .......... 39% 579M 0s
400K .......... .......... .......... .......... .......... 43% 360M 0s
450K .......... .......... .......... .......... .......... 48% 616M 0s
500K .......... .......... .......... .......... .......... 53% 341M 0s
550K .......... .......... .......... .......... .......... 58% 692M 0s
600K .......... .......... .......... .......... .......... 63% 326M 0s
650K .......... .......... .......... .......... .......... 68% 455M 0s
700K .......... .......... .......... .......... .......... 73% 463M 0s
750K .......... .......... .......... .......... .......... 78% 645M 0s
800K .......... .......... .......... .......... .......... 83% 388M 0s
850K .......... .......... .......... .......... .......... 87% 298M 0s
900K .......... .......... .......... .......... .......... 92% 401M 0s
950K .......... .......... .......... .......... .......... 97% 279M 0s
1000K .......... .......... .... 100% 661M=0.002s
2026-01-10 06:07:58 (426 MB/s) - 'www.example.com/data.tar.gz' saved [1048576/1048576]

--2026-01-10 06:07:58-- https://www.example.com/internal/file.tar.gz
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16 [application/gzip]
Saving to: 'www.example.com/internal/file.tar.gz'
0K 100% 995K=0s
2026-01-10 06:07:58 (995 KB/s) - 'www.example.com/internal/file.tar.gz' saved [16/16]

FINISHED --2026-01-10 06:07:58--
Total wall clock time: 0.05s
Downloaded: 9 files, 1.0M in 0.002s (406 MB/s)
The --execute=robots=off option sets the internal robots variable for this invocation only while leaving global configuration unchanged.
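Because ignoring /robots.txt removes the site's own throttle on crawlers, a one‑off crawl can be combined with Wget's standard rate and scope options. The wrapper below is only a sketch: the function name polite_crawl, the WGET override variable, and the specific values (depth 2, one‑second wait, 200k rate limit) are assumptions for illustration, not recommendations from the Wget documentation.

```shell
#!/bin/sh
# Sketch of a one-off crawl that ignores robots.txt but limits its own impact.
# $1 = start URL
# WGET is a hypothetical override used here so the command can be previewed,
# for example with WGET=echo, without contacting the server.
polite_crawl() {
    "${WGET:-wget}" \
        --execute=robots=off \
        --recursive --level=2 --no-parent \
        --wait=1 --random-wait \
        --limit-rate=200k \
        "$1"
}

# Preview the command line without downloading anything:
#   WGET=echo polite_crawl https://www.example.com/
# Run the crawl for real:
#   polite_crawl https://www.example.com/
```

Here --level and --no-parent bound how far the recursion spreads, while --wait, --random-wait, and --limit-rate space out and slow down requests.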
- Enable persistent ignoring of /robots.txt by adding a robots setting to the per‑user configuration file.
$ printf 'robots = off\n' >> ~/.wgetrc
Permanently disabling robots handling for a user can breach site policies, increase load on fragile servers, and may trigger IP‑level blocking or legal complaints from administrators.
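Running the append command more than once leaves duplicate entries in the file. One possible guard, sketched below, appends the setting only when no robots line exists yet; the function name ensure_robots_off is hypothetical, and the optional argument for an alternative file path exists only so the sketch can be tried on a scratch file first.

```shell
#!/bin/sh
# Sketch: add "robots = off" to a wgetrc file only if no robots entry exists.
# $1 = configuration file (defaults to the per-user ~/.wgetrc)
ensure_robots_off() {
    rc=${1:-$HOME/.wgetrc}
    if grep -iq '^[[:space:]]*robots[[:space:]]*=' "$rc" 2>/dev/null; then
        # An entry is already present; show it instead of appending another.
        echo "existing robots entry in $rc (review it manually):"
        grep -i '^[[:space:]]*robots[[:space:]]*=' "$rc"
    else
        printf 'robots = off\n' >> "$rc"
        echo "added 'robots = off' to $rc"
    fi
}

# Try it on a scratch file before touching the real configuration:
#   ensure_robots_off /tmp/wgetrc.test
```

The sketch deliberately does not rewrite an existing entry, so a line such as robots = on still has to be changed by hand.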
- Confirm that the robots setting is present in the per‑user configuration.
$ grep -i '^robots' ~/.wgetrc
robots = off
If multiple robots entries exist in ~/.wgetrc, the final line is the effective value that Wget applies during downloads.
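When a file has accumulated several robots lines, the last‑entry‑wins behaviour described above can be checked with a short helper. This is a sketch that only inspects the file's text; it does not ask Wget itself, and it does not account for a system‑wide configuration file or an --execute option on the command line. The function name effective_robots is hypothetical.

```shell
#!/bin/sh
# Sketch: print the value of the last uncommented robots entry in a wgetrc.
# $1 = configuration file (defaults to the per-user ~/.wgetrc)
effective_robots() {
    rc=${1:-$HOME/.wgetrc}
    awk -F= '
        # Drop comments so that "# robots = off" is not counted.
        { sub(/#.*/, "") }
        tolower($1) ~ /^[ \t]*robots[ \t]*$/ {
            value = $2
            gsub(/[ \t]/, "", value)
            last = value
        }
        END { if (last != "") print last }
    ' "$rc"
}

# Example:
#   effective_robots ~/.wgetrc
```

If the helper prints on, a later line in the file is overriding an earlier robots = off entry.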
- Verify that recursive downloads now ignore /robots.txt without specifying the execute option explicitly.
$ wget --recursive https://www.example.com/
--2026-01-10 06:07:58-- https://www.example.com/
Resolving www.example.com (www.example.com)... 203.0.113.50
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 277 [text/html]
Saving to: 'www.example.com/index.html'
0K 100% 16.6M=0s
2026-01-10 06:07:58 (16.6 MB/s) - 'www.example.com/index.html' saved [277/277]

--2026-01-10 06:07:58-- https://www.example.com/docs/
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 62 [text/html]
Saving to: 'www.example.com/docs/index.html'
0K 100% 3.96M=0s
2026-01-10 06:07:58 (3.96 MB/s) - 'www.example.com/docs/index.html' saved [62/62]

--2026-01-10 06:07:58-- https://www.example.com/data/
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 251 [text/html]
Saving to: 'www.example.com/data/index.html'
0K 100% 14.7M=0s
2026-01-10 06:07:58 (14.7 MB/s) - 'www.example.com/data/index.html' saved [251/251]

--2026-01-10 06:07:58-- https://www.example.com/repo/
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 174 [text/html]
Saving to: 'www.example.com/repo/index.html'
0K 100% 11.7M=0s
2026-01-10 06:07:58 (11.7 MB/s) - 'www.example.com/repo/index.html' saved [174/174]

--2026-01-10 06:07:58-- https://www.example.com/internal/
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 252 [text/html]
Saving to: 'www.example.com/internal/index.html'
0K 100% 19.4M=0s
2026-01-10 06:07:58 (19.4 MB/s) - 'www.example.com/internal/index.html' saved [252/252]

--2026-01-10 06:07:58-- https://www.example.com/docs/guide.html
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 46 [text/html]
Saving to: 'www.example.com/docs/guide.html'
0K 100% 3.17M=0s
2026-01-10 06:07:58 (3.17 MB/s) - 'www.example.com/docs/guide.html' saved [46/46]

--2026-01-10 06:07:58-- https://www.example.com/data/image01.jpg
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14 [image/jpeg]
Saving to: 'www.example.com/data/image01.jpg'
0K 100% 859K=0s
2026-01-10 06:07:58 (859 KB/s) - 'www.example.com/data/image01.jpg' saved [14/14]

--2026-01-10 06:07:58-- https://www.example.com/data.tar.gz
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1048576 (1.0M) [application/gzip]
Saving to: 'www.example.com/data.tar.gz'
0K .......... .......... .......... .......... .......... 4% 488M 0s
50K .......... .......... .......... .......... .......... 9% 361M 0s
100K .......... .......... .......... .......... .......... 14% 398M 0s
150K .......... .......... .......... .......... .......... 19% 319M 0s
200K .......... .......... .......... .......... .......... 24% 602M 0s
250K .......... .......... .......... .......... .......... 29% 521M 0s
300K .......... .......... .......... .......... .......... 34% 509M 0s
350K .......... .......... .......... .......... .......... 39% 589M 0s
400K .......... .......... .......... .......... .......... 43% 312M 0s
450K .......... .......... .......... .......... .......... 48% 512M 0s
500K .......... .......... .......... .......... .......... 53% 289M 0s
550K .......... .......... .......... .......... .......... 58% 671M 0s
600K .......... .......... .......... .......... .......... 63% 581M 0s
650K .......... .......... .......... .......... .......... 68% 699M 0s
700K .......... .......... .......... .......... .......... 73% 691M 0s
750K .......... .......... .......... .......... .......... 78% 715M 0s
800K .......... .......... .......... .......... .......... 83% 468M 0s
850K .......... .......... .......... .......... .......... 87% 667M 0s
900K .......... .......... .......... .......... .......... 92% 649M 0s
950K .......... .......... .......... .......... .......... 97% 708M 0s
1000K .......... .......... .... 100% 740M=0.002s
2026-01-10 06:07:58 (499 MB/s) - 'www.example.com/data.tar.gz' saved [1048576/1048576]

--2026-01-10 06:07:58-- https://www.example.com/internal/file.tar.gz
Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16 [application/gzip]
Saving to: 'www.example.com/internal/file.tar.gz'
0K 100% 1.11M=0s
2026-01-10 06:07:58 (1.11 MB/s) - 'www.example.com/internal/file.tar.gz' saved [16/16]

FINISHED --2026-01-10 06:07:58--
Total wall clock time: 0.05s
Downloaded: 9 files, 1.0M in 0.002s (472 MB/s)
Successful access to URLs that are disallowed in the site’s /robots.txt, subject to any additional authentication or IP restrictions, indicates that robot exclusion is no longer being honoured for this configuration.
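The check described above can be partly automated by comparing the Disallow entries in a saved copy of the site's robots.txt with the directory tree that Wget produced. The sketch below makes several simplifying assumptions: robots.txt has been saved separately to a local file, every Disallow line is considered regardless of its user‑agent group, and each value is treated as a literal path under the mirror directory rather than as a prefix or wildcard pattern. The function name report_disallowed_fetched is hypothetical.

```shell
#!/bin/sh
# Sketch: report which Disallow paths from robots.txt exist in a local mirror.
# $1 = local copy of robots.txt
# $2 = mirror root created by wget, for example ./www.example.com
report_disallowed_fetched() {
    robots_file=$1
    mirror_dir=$2
    grep -i '^[[:space:]]*disallow[[:space:]]*:' "$robots_file" |
    sed -e 's/#.*//' -e 's/^[^:]*:[[:space:]]*//' -e 's/[[:space:]]*$//' |
    while IFS= read -r path; do
        # An empty Disallow value means nothing is disallowed; skip it.
        [ -n "$path" ] || continue
        if [ -e "$mirror_dir$path" ]; then
            echo "fetched: $path"
        else
            echo "missing: $path"
        fi
    done
}

# Example, assuming robots.txt was saved next to the mirror directory:
#   report_disallowed_fetched robots.txt www.example.com
```

A fetched line for a disallowed path is consistent with robots handling being off; a missing line may simply mean the path was never linked from a crawled page or is protected by other means.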
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.
