Ignoring the /robots.txt file in Wget enables full‑site crawling and archiving, including paths that polite crawlers normally skip. This behavior is often required for internal compliance audits, offline mirrors, and controlled testing of how sensitive content is exposed.

Under the Robots Exclusion Protocol, a web server publishes a /robots.txt file at the site root, and clients that implement the protocol restrict which URLs they fetch according to its per-user-agent rules. When recursive retrieval is enabled, Wget honours these rules: it first requests /robots.txt and then filters the links it follows, unless the internal robots variable is explicitly disabled on the command line or in a configuration file.
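For illustration, a compliant client's first move can be sketched in the shell: obtain a robots.txt (written locally here) and extract the Disallow prefixes it would honour. The rules and paths below are hypothetical.

```shell
# Write a sample robots.txt (hypothetical rules, for illustration only)
cat > /tmp/robots.txt <<'EOF'
User-agent: *
Disallow: /internal/
Disallow: /data/
EOF

# A compliant client skips every URL whose path begins with a Disallow
# prefix; printing the prefixes shows what recursive retrieval would skip
grep '^Disallow:' /tmp/robots.txt | awk '{print $2}'
```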

Disabling this safeguard bypasses the site owner’s published crawling preferences and can add significant load or violate acceptable‑use policies, even if technically possible. The commands below assume access is authorised and focus on Wget running in a shell on Linux, showing both a one‑off override and a configuration change that permanently turns off robots handling for a single user.

Steps to ignore /robots.txt in Wget:

  1. Open a terminal on Linux with standard user privileges.
    $ whoami
    user

    Running Wget as an unprivileged account reduces the blast radius if an unexpected path is fetched or a misconfiguration causes excessive downloads.

  2. Ignore /robots.txt for a single recursive crawl by passing the robots variable on the command line.
    $ wget --execute=robots=off --recursive https://www.example.com/
    --2026-01-10 06:07:58--  https://www.example.com/
    Resolving www.example.com (www.example.com)... 203.0.113.50
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 277 [text/html]
    Saving to: 'www.example.com/index.html'
     
         0K                                                       100% 15.6M=0s
     
    2026-01-10 06:07:58 (15.6 MB/s) - 'www.example.com/index.html' saved [277/277]
     
    --2026-01-10 06:07:58--  https://www.example.com/docs/
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 62 [text/html]
    Saving to: 'www.example.com/docs/index.html'
     
         0K                                                       100% 3.73M=0s
     
    2026-01-10 06:07:58 (3.73 MB/s) - 'www.example.com/docs/index.html' saved [62/62]
     
    --2026-01-10 06:07:58--  https://www.example.com/data/
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 251 [text/html]
    Saving to: 'www.example.com/data/index.html'
     
         0K                                                       100% 16.2M=0s
     
    2026-01-10 06:07:58 (16.2 MB/s) - 'www.example.com/data/index.html' saved [251/251]
     
    --2026-01-10 06:07:58--  https://www.example.com/repo/
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 174 [text/html]
    Saving to: 'www.example.com/repo/index.html'
     
         0K                                                       100% 10.2M=0s
     
    2026-01-10 06:07:58 (10.2 MB/s) - 'www.example.com/repo/index.html' saved [174/174]
     
    --2026-01-10 06:07:58--  https://www.example.com/internal/
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 252 [text/html]
    Saving to: 'www.example.com/internal/index.html'
     
         0K                                                       100% 14.8M=0s
     
    2026-01-10 06:07:58 (14.8 MB/s) - 'www.example.com/internal/index.html' saved [252/252]
     
    --2026-01-10 06:07:58--  https://www.example.com/docs/guide.html
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 46 [text/html]
    Saving to: 'www.example.com/docs/guide.html'
     
         0K                                                       100% 4.05M=0s
     
    2026-01-10 06:07:58 (4.05 MB/s) - 'www.example.com/docs/guide.html' saved [46/46]
     
    --2026-01-10 06:07:58--  https://www.example.com/data/image01.jpg
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 14 [image/jpeg]
    Saving to: 'www.example.com/data/image01.jpg'
     
         0K                                                       100% 1.13M=0s
     
    2026-01-10 06:07:58 (1.13 MB/s) - 'www.example.com/data/image01.jpg' saved [14/14]
     
    --2026-01-10 06:07:58--  https://www.example.com/data.tar.gz
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 1048576 (1.0M) [application/gzip]
    Saving to: 'www.example.com/data.tar.gz'
     
         0K .......... .......... .......... .......... ..........  4%  505M 0s
        50K .......... .......... .......... .......... ..........  9%  221M 0s
       100K .......... .......... .......... .......... .......... 14%  532M 0s
       150K .......... .......... .......... .......... .......... 19%  386M 0s
       200K .......... .......... .......... .......... .......... 24%  590M 0s
       250K .......... .......... .......... .......... .......... 29%  529M 0s
       300K .......... .......... .......... .......... .......... 34%  741M 0s
       350K .......... .......... .......... .......... .......... 39%  579M 0s
       400K .......... .......... .......... .......... .......... 43%  360M 0s
       450K .......... .......... .......... .......... .......... 48%  616M 0s
       500K .......... .......... .......... .......... .......... 53%  341M 0s
       550K .......... .......... .......... .......... .......... 58%  692M 0s
       600K .......... .......... .......... .......... .......... 63%  326M 0s
       650K .......... .......... .......... .......... .......... 68%  455M 0s
       700K .......... .......... .......... .......... .......... 73%  463M 0s
       750K .......... .......... .......... .......... .......... 78%  645M 0s
       800K .......... .......... .......... .......... .......... 83%  388M 0s
       850K .......... .......... .......... .......... .......... 87%  298M 0s
       900K .......... .......... .......... .......... .......... 92%  401M 0s
       950K .......... .......... .......... .......... .......... 97%  279M 0s
      1000K .......... .......... ....                            100%  661M=0.002s
     
    2026-01-10 06:07:58 (426 MB/s) - 'www.example.com/data.tar.gz' saved [1048576/1048576]
     
    --2026-01-10 06:07:58--  https://www.example.com/internal/file.tar.gz
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 16 [application/gzip]
    Saving to: 'www.example.com/internal/file.tar.gz'
     
         0K                                                       100%  995K=0s
     
    2026-01-10 06:07:58 (995 KB/s) - 'www.example.com/internal/file.tar.gz' saved [16/16]
     
    FINISHED --2026-01-10 06:07:58--
    Total wall clock time: 0.05s
    Downloaded: 9 files, 1.0M in 0.002s (406 MB/s)

    The --execute=robots=off option (short form: -e robots=off) sets the internal robots variable for this invocation only, leaving the global configuration unchanged.
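    A short-form sketch of the same invocation, combined with throttling options to limit load on the target; the wait and rate values are illustrative, not recommendations.

```shell
# -e is the short form of --execute; --wait pauses between requests and
# --limit-rate caps bandwidth, reducing load during the crawl
wget -e robots=off -r --wait=1 --limit-rate=200k https://www.example.com/
```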

  3. Make Wget ignore /robots.txt persistently by adding a robots setting to the per‑user configuration file.
    $ printf 'robots = off\n' >> ~/.wgetrc

    Permanently disabling robots handling for a user can breach site policies, increase load on fragile servers, and may trigger IP‑level blocking or legal complaints from administrators.
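    If the persistent override later needs to be rolled back, deleting the line restores default robots handling; a minimal sketch, assuming GNU sed's in-place editing:

```shell
# Remove any robots = ... lines from the per-user Wget configuration
# (GNU sed in-place edit; BSD/macOS sed would need `sed -i '' ...`)
if [ -f ~/.wgetrc ]; then
  sed -i '/^[[:space:]]*robots[[:space:]]*=/d' ~/.wgetrc
fi
```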

  4. Confirm that the robots setting is present in the per‑user configuration.
    $ grep -i '^robots' ~/.wgetrc
    robots = off

    If multiple robots entries exist in ~/.wgetrc, the last one wins: Wget applies the final line's value. A command-line --execute setting is processed after ~/.wgetrc, so it overrides the file for that invocation.

  5. Verify that recursive downloads now ignore /robots.txt without specifying the execute option explicitly.
    $ wget --recursive https://www.example.com/
    --2026-01-10 06:07:58--  https://www.example.com/
    Resolving www.example.com (www.example.com)... 203.0.113.50
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 277 [text/html]
    Saving to: 'www.example.com/index.html'
     
         0K                                                       100% 16.6M=0s
     
    2026-01-10 06:07:58 (16.6 MB/s) - 'www.example.com/index.html' saved [277/277]
     
    --2026-01-10 06:07:58--  https://www.example.com/docs/
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 62 [text/html]
    Saving to: 'www.example.com/docs/index.html'
     
         0K                                                       100% 3.96M=0s
     
    2026-01-10 06:07:58 (3.96 MB/s) - 'www.example.com/docs/index.html' saved [62/62]
     
    --2026-01-10 06:07:58--  https://www.example.com/data/
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 251 [text/html]
    Saving to: 'www.example.com/data/index.html'
     
         0K                                                       100% 14.7M=0s
     
    2026-01-10 06:07:58 (14.7 MB/s) - 'www.example.com/data/index.html' saved [251/251]
     
    --2026-01-10 06:07:58--  https://www.example.com/repo/
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 174 [text/html]
    Saving to: 'www.example.com/repo/index.html'
     
         0K                                                       100% 11.7M=0s
     
    2026-01-10 06:07:58 (11.7 MB/s) - 'www.example.com/repo/index.html' saved [174/174]
     
    --2026-01-10 06:07:58--  https://www.example.com/internal/
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 252 [text/html]
    Saving to: 'www.example.com/internal/index.html'
     
         0K                                                       100% 19.4M=0s
     
    2026-01-10 06:07:58 (19.4 MB/s) - 'www.example.com/internal/index.html' saved [252/252]
     
    --2026-01-10 06:07:58--  https://www.example.com/docs/guide.html
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 46 [text/html]
    Saving to: 'www.example.com/docs/guide.html'
     
         0K                                                       100% 3.17M=0s
     
    2026-01-10 06:07:58 (3.17 MB/s) - 'www.example.com/docs/guide.html' saved [46/46]
     
    --2026-01-10 06:07:58--  https://www.example.com/data/image01.jpg
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 14 [image/jpeg]
    Saving to: 'www.example.com/data/image01.jpg'
     
         0K                                                       100%  859K=0s
     
    2026-01-10 06:07:58 (859 KB/s) - 'www.example.com/data/image01.jpg' saved [14/14]
     
    --2026-01-10 06:07:58--  https://www.example.com/data.tar.gz
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 1048576 (1.0M) [application/gzip]
    Saving to: 'www.example.com/data.tar.gz'
     
         0K .......... .......... .......... .......... ..........  4%  488M 0s
        50K .......... .......... .......... .......... ..........  9%  361M 0s
       100K .......... .......... .......... .......... .......... 14%  398M 0s
       150K .......... .......... .......... .......... .......... 19%  319M 0s
       200K .......... .......... .......... .......... .......... 24%  602M 0s
       250K .......... .......... .......... .......... .......... 29%  521M 0s
       300K .......... .......... .......... .......... .......... 34%  509M 0s
       350K .......... .......... .......... .......... .......... 39%  589M 0s
       400K .......... .......... .......... .......... .......... 43%  312M 0s
       450K .......... .......... .......... .......... .......... 48%  512M 0s
       500K .......... .......... .......... .......... .......... 53%  289M 0s
       550K .......... .......... .......... .......... .......... 58%  671M 0s
       600K .......... .......... .......... .......... .......... 63%  581M 0s
       650K .......... .......... .......... .......... .......... 68%  699M 0s
       700K .......... .......... .......... .......... .......... 73%  691M 0s
       750K .......... .......... .......... .......... .......... 78%  715M 0s
       800K .......... .......... .......... .......... .......... 83%  468M 0s
       850K .......... .......... .......... .......... .......... 87%  667M 0s
       900K .......... .......... .......... .......... .......... 92%  649M 0s
       950K .......... .......... .......... .......... .......... 97%  708M 0s
      1000K .......... .......... ....                            100%  740M=0.002s
     
    2026-01-10 06:07:58 (499 MB/s) - 'www.example.com/data.tar.gz' saved [1048576/1048576]
     
    --2026-01-10 06:07:58--  https://www.example.com/internal/file.tar.gz
    Connecting to www.example.com (www.example.com)|203.0.113.50|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 16 [application/gzip]
    Saving to: 'www.example.com/internal/file.tar.gz'
     
         0K                                                       100% 1.11M=0s
     
    2026-01-10 06:07:58 (1.11 MB/s) - 'www.example.com/internal/file.tar.gz' saved [16/16]
     
    FINISHED --2026-01-10 06:07:58--
    Total wall clock time: 0.05s
    Downloaded: 9 files, 1.0M in 0.002s (472 MB/s)

    Successful retrieval of URLs that the site’s /robots.txt disallows (subject to any separate authentication or IP restrictions) confirms that robot exclusion is no longer honoured under this configuration.
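    That check can be automated with a small post-crawl audit: compare the mirrored tree against the site's Disallow rules and list files a compliant crawl would have skipped. The mkdir/printf lines below mock up a mirror and a robots.txt purely for illustration; in practice you would use the real downloaded tree and the server's actual /robots.txt.

```shell
# Mock a downloaded mirror and the corresponding robots.txt (illustration)
mkdir -p www.example.com/internal
printf 'x' > www.example.com/internal/file.tar.gz
printf 'User-agent: *\nDisallow: /internal/\n' > robots.txt

# List downloaded files that fall under a Disallow prefix; any output
# means robot exclusion was not honoured during the crawl
grep '^Disallow:' robots.txt | awk '{print $2}' | while read -r prefix; do
  find www.example.com -type f -path "www.example.com${prefix}*"
done
```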