GNU wget can create a very usable local or offline mirror of an entire folder of a website using the following combination of options.
From the output directory, use this invocation:
URL=http://www.emsl.pnl.gov/docs/global/index.shtml
wget -r -l inf -k -N -np -p -E "$URL"
You can play around with the options:
- -r: recursive download
- -l inf: follow links to unlimited depth
- -k: convert links for local viewing after downloading
- -N: time-stamping (only re-download files newer than the local copy)
- -np: no parent, never ascend above the starting directory
- -p: retrieve all page prerequisites (images, CSS, scripts)
- -E: adjust filename extensions to match the MIME type (e.g. .shtml pages are saved as .html)
The files will be stored under hostname/path/to/file. If you know up front that the origin of the files is restricted to a particular host or even subpath, you can additionally use:
- -nH: do not create a directory named after the host
- --cut-dirs=N: strip the first N path components from the local directory structure
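Putting those together, a sketch of a combined invocation (using the example URL from above; the value of --cut-dirs depends on your site's layout, here 2 for the two components docs/global/):

```shell
# Mirror only the docs/global/ subtree, dropping both the hostname
# directory (-nH) and the two leading path components (--cut-dirs=2),
# so files land directly in the current directory.
wget -r -l inf -k -N -np -p -E -nH --cut-dirs=2 \
    http://www.emsl.pnl.gov/docs/global/index.shtml
```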
Alternatively, when you want to download a flat Apache directory listing:
wget -r -np -l1 -nd -N $URL
The option -nd prevents the creation of the local directory structure. Afterwards you can probably remove index.html and robots.txt.
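A cleanup along these lines could look as follows (a sketch on a throwaway tree named mirror-demo/; with a real mirror you would run the find commands from the download directory instead):

```shell
# Set up a small demo tree standing in for a wget download directory.
mkdir -p mirror-demo/sub
touch mirror-demo/sub/index.html mirror-demo/robots.txt mirror-demo/page.css

# Remove the directory listings and robots.txt that wget leaves behind.
# With -E a listing may have been renamed to index.html.html, so match both.
find mirror-demo -name 'index.html*' -delete
find mirror-demo -name 'robots.txt' -delete
```

The actual content files (here page.css) are left untouched.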