Downloading entire websites can be essential for various reasons, such as backing up content, offline browsing, or mirroring sites for hosting elsewhere. wget, a powerful command-line tool available in many UNIX-like operating systems, offers a convenient way to download websites in their entirety.

With its wide range of options, wget lets you customize the download. For example, you can retrieve only specific file types, follow or ignore certain links, or limit the depth of the crawl. Combined with --page-requisites, wget fetches each page along with the images, stylesheets, and scripts it needs, so the offline copy renders much like the live site.
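For instance, to grab only PDF files and stop after two levels of links, you could combine a few of these options (www.example.com is just a placeholder, and the values are only illustrative):

$ wget --recursive --level=2 --accept=pdf http://www.example.com/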

However, it's crucial to use wget responsibly. Mass downloading can put unnecessary stress on servers and potentially violate website terms of service. Before using wget to download an entire site, ensure you have permission to do so.

Steps to mirror a website using wget:

  1. Open the terminal.
  2. Install wget if not already available.
    $ sudo apt update && sudo apt install wget #Ubuntu and other Debian derivatives
  3. Begin downloading the desired website.
    $ wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://www.example.com/
    Option                    Description
    --mirror                  Turns on recursion and timestamping with unlimited depth (shorthand for -r -N -l inf --no-remove-listing)
    --convert-links           Converts links in downloaded files so they point to the local copies for offline viewing
    --adjust-extension        Saves files with matching extensions (e.g. .html) so they open correctly locally
    --page-requisites         Downloads the images, stylesheets, and scripts needed to display each page
    --no-parent               Prevents wget from ascending to the parent directory, keeping the download within the starting path
    --no-check-certificate    Skips TLS certificate validation (use with caution)
    --limit-rate=200k         Caps the download speed at 200 KB/s to reduce load on the server
    A slower, rate-limited variant of this command is shown after these steps.
  4. Monitor the download progress in the terminal.
  5. Once the download completes, navigate to the directory where you initiated the command; wget saves the site in a folder named after the host (for example, www.example.com).
  6. Explore the downloaded site offline using any web browser by opening the main HTML file.
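If you are worried about putting strain on the server, a gentler variant of the mirror command from step 3 adds a pause between requests and caps the transfer speed; the wait time and rate below are only example values:

$ wget --mirror --convert-links --adjust-extension --page-requisites --no-parent --wait=2 --limit-rate=200k http://www.example.com/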

Always respect website robots.txt files, which provide rules on what can be crawled and downloaded. Using wget to download sites without permission may lead to IP bans or legal consequences.
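To see what a site allows before you start, you can fetch its robots.txt with wget and print it to the terminal (replace www.example.com with the actual host):

$ wget -qO- http://www.example.com/robots.txt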

wget honors robots.txt rules by default; it is the --execute robots=off option that tells it to ignore them. To stay compliant, simply leave that option out, as in the following command:

$ wget --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains yourdomain.com --no-parent www.yourdomain.com

Without --execute robots=off, wget adheres to the website's robots.txt rules during the download process.

Enjoy your offline content and always remember to use tools like wget responsibly!
