Using wget to mirror entire website folder

GNU wget can create a usable offline mirror of an entire folder of a website with the following combination of options.

From the output directory, use this invocation:

URL=http://www.emsl.pnl.gov/docs/global/index.shtml
wget -r -l inf -k -N -np -p -E "$URL"

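Each option can be adjusted to taste:

-r
recursive: follow links and download the pages they point to
-l inf
follow links to unlimited depth (the default depth is 5)
-k
convert links in the downloaded files so they point to the local copies
-N
time-stamping: skip files that are not newer than the local copy
-np
no parent: never ascend above the starting directory
-p
retrieve page requisites such as images and stylesheets
-E
adjust file extensions according to MIME type (for example, append .html to HTML pages)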
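
The short flags are easy to mix up, so here is a sketch of the same invocation written with wget's long option names. The wrapper name mirror_folder is hypothetical and only used for illustration. Note that --adjust-extension is the spelling in newer wget releases; older ones call it --html-extension.

```shell
# Hypothetical wrapper (mirror_folder is not part of wget): the same
# invocation as above, spelled with wget's long option names.
mirror_folder() {
    wget --recursive \
         --level=inf \
         --convert-links \
         --timestamping \
         --no-parent \
         --page-requisites \
         --adjust-extension \
         "$1"
}

# Usage, run from the output directory:
# mirror_folder "http://www.emsl.pnl.gov/docs/global/index.shtml"
```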

The files will be stored under hostname/path/to/file. If you know up front that the origin of the files is restricted to a particular host or even subpath, you can additionally use:

-nH
do not create a top-level directory named after the host
--cut-dirs=N
strip the first N directory components from the saved path
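
To see how these two options change the storage location, the following sketch computes the local path for a URL. The helper local_path is hypothetical and deliberately simplified: it assumes a plain scheme://host/dir/.../file URL and ignores ports, query strings and the renaming done by -E.

```shell
# Hypothetical helper: model where wget stores a file.
#   $1 = URL, $2 = "yes" if -nH is given, $3 = N from --cut-dirs=N
local_path() {
    lp_rest=${1#*://}        # drop the scheme: host/dir/.../file
    lp_host=${lp_rest%%/*}   # everything before the first slash
    lp_path=${lp_rest#*/}    # everything after the first slash
    lp_i=0
    # Strip up to N leading directory components, never the file name.
    while [ "$lp_i" -lt "$3" ]; do
        case $lp_path in
            */*) lp_path=${lp_path#*/} ;;
            *)   break ;;
        esac
        lp_i=$((lp_i + 1))
    done
    if [ "$2" = yes ]; then
        printf '%s\n' "$lp_path"
    else
        printf '%s\n' "$lp_host/$lp_path"
    fi
}

URL=http://www.emsl.pnl.gov/docs/global/index.shtml
local_path "$URL" no 0    # www.emsl.pnl.gov/docs/global/index.shtml
local_path "$URL" yes 0   # docs/global/index.shtml
local_path "$URL" yes 2   # index.shtml
```

For this URL, adding -nH --cut-dirs=2 to the invocation would therefore place the mirrored files directly in the output directory.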

One thought on “Using wget to mirror entire website folder”

  1. Bruno

    Alternative when you want to download a flat Apache directory listing:

    wget -r -np -l1 -nd -N "$URL"

    The option -nd prevents the creation of the local directory structure.

    Afterwards you can probably remove index.html and robots.txt.
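
The cleanup step mentioned in the comment could look like the sketch below. The helper name cleanup_listing is hypothetical, and the sketch assumes the listing pages were saved as index.html, possibly with extra copies such as index.html?C=N;O=D that an Apache listing's sort links may produce.

```shell
# Hypothetical helper: remove the listing pages and robots.txt left
# behind after fetching a flat directory listing with -nd.
#   $1 = download directory (defaults to the current directory)
cleanup_listing() {
    cl_dir=${1:-.}
    # index.html* also matches sorted-listing copies of the index page.
    # rm -f stays quiet when a pattern matches nothing.
    rm -f -- "$cl_dir"/index.html* "$cl_dir"/robots.txt
}

# Usage, after the download has finished:
# cleanup_listing .
```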
