How to mirror an entire website with wget

An offline mirror is useful when a published site needs to stay readable without a network connection, be reviewed before a migration, or be preserved as a point-in-time copy. GNU Wget can walk the site tree, save the pages, and keep the result in a directory layout that is easy to browse locally.

In current GNU Wget, --mirror is shorthand for recursive retrieval with timestamping and unlimited depth (currently equivalent to -r -N -l inf --no-remove-listing). --convert-links rewrites links in the downloaded HTML so internal navigation points at the local copy instead of the remote site. Pairing --convert-links with --backup-converted keeps each original page as a .orig file before the converted version is written, and --adjust-extension appends a local .html suffix to HTML pages whose URLs lack one, which makes offline viewing cleaner.

Keep the mirror bounded to approved hostnames. Wget does not execute JavaScript, so expect incomplete results on pages that depend on browser-side rendering, authenticated sessions, or live APIs. Running the same mirror command again later rechecks timestamps and re-downloads only the files that changed on the origin.

Steps to mirror an entire website with wget:

Run the mirror command with local-browsing options against the site root.

```
$ wget --mirror --convert-links --backup-converted --adjust-extension --page-requisites https://docs.example.net/
--2026-04-22 09:14:33--  https://docs.example.net/
Resolving docs.example.net (docs.example.net)... 203.0.113.50
Connecting to docs.example.net (docs.example.net)|203.0.113.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Saving to: 'docs.example.net/index.html'

##### snipped #####

Saving to: 'docs.example.net/assets/site.css'
Saving to: 'docs.example.net/image/logo.svg'
Saving to: 'docs.example.net/docs/index.html'
Saving to: 'docs.example.net/docs/overview.html'

FINISHED --2026-04-22 09:14:34--
Downloaded: 6 files, 1.4K in 0s (4.23 MB/s)
Converted links in 3 files in 0.001 seconds.
```

--mirror turns on recursive retrieval with timestamping, --convert-links rewrites the saved pages so they stay browsable offline, and --backup-converted keeps the unconverted original of each rewritten page as a .orig file. --page-requisites also fetches the stylesheets, images, and other files each page needs to display. If approved assets live on another host such as cdn.docs.example.net, add --span-hosts --domains=docs.example.net,cdn.docs.example.net so the mirror stays bounded while still fetching the required files.
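
As a rough sketch of the multi-host case, the command could be wrapped in a small shell function. The function name mirror_bounded and the WGET override variable are assumptions made for illustration and are not part of Wget; setting WGET=echo gives a dry run that only prints the arguments.

```shell
# Hypothetical wrapper: mirror a start URL while staying inside an approved
# comma-separated host list. WGET is an assumed override for dry runs.
mirror_bounded() {
  start_url="$1"
  domains="$2"
  "${WGET:-wget}" --mirror --convert-links --backup-converted \
    --adjust-extension --page-requisites \
    --span-hosts --domains="$domains" "$start_url"
}

# Dry run: prints the arguments instead of contacting the network.
WGET=echo mirror_bounded https://docs.example.net/ docs.example.net,cdn.docs.example.net
```

Dropping WGET=echo from the last line runs the real mirror. Keeping the host list in one argument makes it harder to add --span-hosts without also bounding it with --domains.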

Check that the converted HTML backups were created for the rewritten pages.
```
$ find docs.example.net -name '*.orig'
docs.example.net/docs/index.html.orig
docs.example.net/docs/overview.html.orig
docs.example.net/index.html.orig
```
The .orig files are the pre-conversion HTML copies, which makes them the quickest proof that --convert-links rewrote the downloaded pages and --backup-converted preserved the originals.
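
To go one step further than listing the backups, a short loop can compare each .orig file with its converted page. This is a minimal sketch: check_converted is a hypothetical helper name, and it only reports whether the two files differ, not what changed.

```shell
# Hypothetical helper: for every *.orig backup under a mirror directory,
# report whether the converted page exists and differs from the backup.
check_converted() {
  find "$1" -type f -name '*.orig' | sort | while IFS= read -r orig; do
    page="${orig%.orig}"            # strip the .orig suffix to get the converted page
    if [ ! -f "$page" ]; then
      echo "missing: $page"
    elif cmp -s "$orig" "$page"; then
      echo "unchanged: $page"       # byte-identical to the backup
    else
      echo "rewritten: $page"       # differs from the backup
    fi
  done
}

# Example call (assumes the mirror directory from the steps above):
#   check_converted docs.example.net
```

Running diff -u on one .orig file and its converted page then shows exactly which links were rewritten.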

List the saved tree and confirm the mirror contains both pages and required assets.

```
$ find docs.example.net -type f
docs.example.net/assets/site.css
docs.example.net/docs/index.html
docs.example.net/docs/index.html.orig
docs.example.net/docs/overview.html
docs.example.net/docs/overview.html.orig
docs.example.net/image/logo.svg
docs.example.net/index.html
docs.example.net/index.html.orig
docs.example.net/robots.txt
```

A usable mirror needs the HTML entry points and the files those pages reference, not just the first document that started the crawl.
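
One way to check this beyond reading the listing is to pull the href and src values out of a saved page and test whether each local target exists. The sketch below assumes double-quoted attributes and uses a hypothetical helper name, check_local_refs; it is a rough grep-based check, not an HTML parser.

```shell
# Hypothetical helper: print local href/src targets of one saved page
# that do not exist on disk. Assumes double-quoted attribute values.
check_local_refs() {
  page="$1"
  dir=$(dirname "$page")
  grep -oE '(href|src)="[^"]+"' "$page" |
    sed -E 's/^(href|src)="//; s/"$//' |
    while IFS= read -r ref; do
      case "$ref" in
        *://*|//*|'#'*|mailto:*|data:*) continue ;;  # skip remote, anchor, and non-file links
      esac
      target="${ref%%[#?]*}"        # drop any fragment or query string
      [ -e "$dir/$target" ] || echo "missing: $dir/$target"
    done
}

# Example call (assumes the mirror directory from the steps above):
#   check_local_refs docs.example.net/index.html
```

No output means every local reference in that page resolves to a file or directory in the mirror.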

Check the saved entry page and confirm the internal links now point at local relative paths.

```
$ cat docs.example.net/index.html
<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Docs Example</title>
  <link rel="stylesheet" href="assets/site.css">
</head>
<body>
##### snipped #####
  <nav>
    <a href="docs/index.html">Documentation</a>
    <a href="docs/overview.html">Overview</a>
  </nav>
</body>
</html>
```

Relative links such as docs/index.html and assets/site.css are the signal that the mirror can be browsed from disk without falling back to the remote host.
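
The inverse check can be sketched too: search the converted pages for links that still name the origin host. Wget leaves links to files it did not download as absolute URLs, so any hits usually mean a page or asset is missing from the mirror. find_remote_links is a hypothetical helper, and the pattern assumes double-quoted href and src attributes.

```shell
# Hypothetical helper: list href/src attributes in converted *.html files
# that still point at the given origin host. The .orig backups are skipped
# because their names do not end in .html.
# Note: dots in the host name are treated as regex wildcards, which is
# loose but adequate for a quick check.
find_remote_links() {
  mirror_dir="$1"
  host="$2"
  grep -rnoE --include='*.html' \
    "(href|src)=\"https?://${host}[^\"]*\"" "$mirror_dir"
}

# Example call (assumes the mirror directory from the steps above):
#   find_remote_links docs.example.net docs.example.net
```

Each hit is printed as file:line:match, and the function returns a non-zero status when no remote links remain.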