Mirroring an entire website creates a local copy that can be browsed offline, kept as a backup, or used as reference material for research. wget supports recursive retrieval, so it can save HTML pages, images, and other resources while preserving the site's directory structure.
With options that follow internal links and rewrite them for offline use, wget produces a self-contained local copy of the site. This is useful for studying a site's structure, reading documentation offline, or preserving content that might become unavailable.
Before downloading a site, review its robots.txt file and any usage policies, and make sure you comply with them. Mirroring a large site can put significant load on the server, so limit the request rate and the scope of the download.
Steps to download a website using wget:
- Install wget if it is not already available (the command below is for Debian and Ubuntu).
$ sudo apt update && sudo apt install wget
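Before installing, a quick check shows whether wget is already on the PATH. This is a minimal sketch that assumes bash. `check_wget` is a hypothetical helper name, and it only prints a hint rather than running the package manager itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether wget is already installed.
# It only prints an install hint; it never runs apt itself.
check_wget() {
  if command -v wget >/dev/null 2>&1; then
    # Show where wget lives and the first line of its version banner.
    echo "wget found at $(command -v wget)"
    wget --version | head -n 1
  else
    echo "wget not found; install it with: sudo apt install wget" >&2
    return 1
  fi
}
```

Running `check_wget` exits with status 0 when wget is present and 1 when it is missing, so it can guard the install command in a script.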
- Check the website’s robots.txt file to confirm allowed paths.
$ curl https://www.example.com/robots.txt
Respect the site’s terms to avoid restrictions or bans.
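wget already honors robots.txt during recursive retrieval unless told otherwise (for example with `-e robots=off`). Reading the rules first still shows which parts of the site the mirror will skip. The sketch below is simplified and assumes bash and awk. `disallowed_paths` is a hypothetical helper that lists only the Disallow rules in the `User-agent: *` group. It ignores Allow lines, wildcards, and groups aimed at specific crawlers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Disallow paths that apply to all crawlers
# (User-agent: *) in a robots.txt read from stdin.
# Simplified: ignores Allow lines, wildcards, and other agents' groups.
disallowed_paths() {
  awk '
    # Strip carriage returns and comments, then trim surrounding whitespace.
    { sub(/\r$/, ""); sub(/#.*/, ""); gsub(/^[ \t]+|[ \t]+$/, "") }
    tolower($0) ~ /^user-agent:/ {
      agent = $0
      sub(/^[^:]*:[ \t]*/, "", agent)
      # A User-agent line after any rule line starts a new group.
      if (!in_agents) star = 0
      in_agents = 1
      if (agent == "*") star = 1
      next
    }
    tolower($0) ~ /^disallow:/ {
      in_agents = 0
      path = $0
      sub(/^[^:]*:[ \t]*/, "", path)
      # An empty Disallow value allows everything, so skip it.
      if (star && path != "") print path
      next
    }
    # Any other non-empty line (Allow, Sitemap, ...) ends the agent list.
    $0 != "" { in_agents = 0 }
  '
}
```

For example, `curl -s https://www.example.com/robots.txt | disallowed_paths` prints one blocked path per line.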
- Use the --mirror option to create a local copy of the entire website.
$ wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://www.example.com/
These options download each page along with the files it needs to display, and they rewrite links so the copy works offline.
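To reduce the server load mentioned earlier, the same command can be combined with wget's throttling options. This is a minimal sketch that assumes bash. `polite_mirror` and the `DRY_RUN` variable are hypothetical, and the one-second wait and 200k rate limit are illustrative values, not recommendations.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: mirror a site with the options from the step above,
# plus throttling flags so the crawl puts less load on the server.
# The wait and rate values are illustrative assumptions.
polite_mirror() {
  local url="$1"
  local -a args=(
    --mirror             # recursion, unlimited depth, timestamping
    --convert-links      # rewrite links to point at local copies
    --adjust-extension   # save HTML/CSS with matching extensions
    --page-requisites    # fetch images, CSS, and other page assets
    --no-parent          # never ascend above the starting directory
    --wait=1             # pause 1 second between requests
    --random-wait        # vary the pause between 0.5x and 1.5x of --wait
    --limit-rate=200k    # cap download bandwidth at about 200 KB/s
  )
  if [ -n "${DRY_RUN:-}" ]; then
    # Preview mode: print the command instead of running it.
    echo wget "${args[@]}" "$url"
  else
    wget "${args[@]}" "$url"
  fi
}
```

`DRY_RUN=1 polite_mirror https://www.example.com/` prints the full command. Calling `polite_mirror https://www.example.com/` without `DRY_RUN` runs it.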
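Option               Description
--mirror             Turns on recursion and timestamping with unlimited depth (equivalent to -r -N -l inf --no-remove-listing)
--convert-links      Rewrites links in downloaded pages so they point to the local copies
--adjust-extension   Appends .html (or .css) to saved files of that type whose names lack the suffix
--page-requisites    Downloads the images, stylesheets, and other files needed to display each page
--no-parent          Never ascends above the directory of the starting URL
By default, recursive retrieval stays on the starting host. Following links to other domains requires --span-hosts.
- After the download finishes, change into the mirrored site's directory and open index.html in a browser.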
$ cd www.example.com
$ firefox index.html
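wget names the top-level directory after the host in the URL (unless -nH / --no-host-directories is given), which is why the example changes into www.example.com. This small sketch assumes bash and a URL without a port or credentials. `mirror_dir` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the directory wget creates for a mirror.
# Assumes the URL has no port or user:password part.
mirror_dir() {
  local url="$1"
  url="${url#*://}"   # drop the scheme (http://, https://)
  url="${url%%/*}"    # drop everything from the first slash (the path)
  printf '%s\n' "$url"
}
```

For example, `xdg-open "$(mirror_dir https://www.example.com/)/index.html"` opens the mirrored home page in the default browser on most Linux desktops.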
data:image/s3,"s3://crabby-images/7e416/7e4166085b1a79d78fd4339f3d906a362d6afd63" alt=""
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.