wget: download an HTTP or FTP directory recursively

To recursively download data that a website lists as directories to your PC, run:

wget -r -np -nc -nH --cut-dirs=4 --random-wait --wait 1 -e robots=off http://mysite.com/aaa/bbb/ccc/ddd/

This plops the files into whatever directory you ran the command in.

To use this on an FTP site, just change http:// to ftp:// and give the proper FTP site address.
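
For example (the server name here is just a placeholder for your actual FTP address):

wget -r -np -nc -nH --cut-dirs=4 --random-wait --wait 1 -e robots=off ftp://ftp.example.com/aaa/bbb/ccc/ddd/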

You can also add an overall quota option. For example, to stop downloading after a total of 1 GB has been retrieved, add the option:

-Q 1g
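
Appended to the earlier command, it looks like this:

wget -r -np -nc -nH --cut-dirs=4 --random-wait --wait 1 -e robots=off -Q 1g http://mysite.com/aaa/bbb/ccc/ddd/

The quota is only checked after each file finishes, so the file that pushes the total over the limit is still downloaded in full before wget stops.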

Explanation of the wget recursive download options used above:

-r download recursively (recreating the server’s directory structure on your PC)

-np never get parent directories (sometimes a site will link back up and you don’t want to follow that)

-nc no clobber – don’t re-download files you already have

-nH don’t create an obnoxious site-name directory (mysite.com/) on your PC

--cut-dirs=4 don’t recreate the obnoxious hierarchy of directories above the desired directory on your PC. Note you must set the number equal to the number of directories on the server (here aaa/bbb/ccc/ddd is four); see the example after this list

-e robots=off many sites use a robots.txt file to block robots from mindlessly consuming huge amounts of data. Here we tell wget to ignore robots.txt, since we’re (somewhat) human.

--random-wait to avoid a flood of download requests (which can get you auto-banned from downloading), we politely randomize the wait between files; better than trying to get yourself un-banned!

--wait 1 sets the base wait so that, combined with --random-wait, the pause before each new file averages about 1 second.
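
To see what -nH and --cut-dirs=4 buy you, suppose ddd/ contains a file called file1.dat (the file name is just for illustration). The same download would end up at these local paths:

without -nH or --cut-dirs: mysite.com/aaa/bbb/ccc/ddd/file1.dat
with -nH only: aaa/bbb/ccc/ddd/file1.dat
with -nH and --cut-dirs=4: file1.dat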
