Recursive HTTP / FTP download with wget

Recursively download data that a website lists as directories to your PC using wget:

wget -r -np -nc -nH --cut-dirs=4 --random-wait --wait 1 -e robots=off http://mysite.com/aaa/bbb/ccc/ddd/

This plops the files into whatever directory you ran the command from.
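
If you want the files somewhere else, add -P (--directory-prefix) to set the destination directory; the ~/data path below is just an example:

wget -r -np -nc -nH --cut-dirs=4 --random-wait --wait 1 -e robots=off -P ~/data http://mysite.com/aaa/bbb/ccc/ddd/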

To recursively download over FTP instead, change http:// to ftp:// and point wget at the FTP directory.
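
For example, the same command against the same example path over FTP (robots=off is dropped because robots.txt only applies to HTTP):

wget -r -np -nc -nH --cut-dirs=4 --random-wait --wait 1 ftp://mysite.com/aaa/bbb/ccc/ddd/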

wget recursive download options

-r
download recursively (recreating the server's directory structure on your PC)
-r -l1
recurse, but -l1 limits the recursion depth to one level below the specified directory
-Q 1g
overall download quota, for example to stop after 1 GB has been downloaded altogether
-np
Never get parent directories (sometimes a site will link back up and you don’t want that)
-nc
no clobber – don’t re-download files you already have
-nd
no directory structure on download (put all files in a single directory, the one given by -P)
-nH
don’t put obnoxious site name directories on your PC
-A
only accept files matching the given glob pattern (see the combined example after this list)
--cut-dirs=4
don't create an obnoxious hierarchy of directories above the desired directory on your PC. Note you must set the number equal to the number of directory components in the server path (here aaa/bbb/ccc/ddd is four)
-e robots=off
Many sites use robots.txt to block robots from mindlessly consuming huge amounts of data. Here we override that by telling wget to ignore robots.txt, on the grounds that we're (somewhat) human.
--random-wait
To avoid excessive download requests (which can get you auto-banned from downloading) we politely wait a random amount of time between files; better than trying to get yourself un-banned!
--wait 1
sets the base wait to 1 second, so with --random-wait the pause before starting the next file averages about 1 second.
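
Putting several of these together, here is a sketch that stays one level deep, accepts only files matching a pattern, skips the server's directory hierarchy, and stops after roughly 1 GB. The *.nc pattern, 1 GB quota, and ~/downloads path are example choices, not anything the site requires:

wget -r -l1 -np -nc -nd -A "*.nc" -Q 1g --random-wait --wait 1 -e robots=off -P ~/downloads http://mysite.com/aaa/bbb/ccc/ddd/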
