Download data listed as directories on a website recursively to your PC using wget:
wget -r -np -nc -nH --cut-dirs=4 --random-wait --wait 1 -e robots=off http://mysite.com/aaa/bbb/ccc/ddd/
This plops the files into whatever directory you ran the command in.
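The 4 passed to `--cut-dirs` in the command above matches the four directories (aaa/bbb/ccc/ddd) in the URL. As a rough sketch, a small shell function could count them for other URLs. `count_dirs` is a hypothetical helper name, not part of wget, and it assumes a plain URL with no query string containing slashes.

```shell
# Hypothetical helper: count the directory components in a URL path,
# i.e. the number you would pass to --cut-dirs.
# Runs in a subshell (note the parentheses) so its variables stay local.
count_dirs() (
    # Strip the scheme, e.g. "http://".
    path=${1#*://}
    # Strip the host name, keeping only what follows the first slash.
    case $path in
        */*) path=${path#*/} ;;
        *)   path='' ;;
    esac
    # Every remaining slash terminates one directory name.
    slashes=$(printf '%s' "$path" | tr -cd '/' | wc -c)
    # Arithmetic expansion trims any padding that wc adds.
    echo $((slashes))
)

url='http://mysite.com/aaa/bbb/ccc/ddd/'
echo "--cut-dirs=$(count_dirs "$url")"
```

The printed flag can be pasted into the wget command, or substituted in directly with `--cut-dirs="$(count_dirs "$url")"`.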
To download recursively over FTP instead, change http:// to ftp:// and point the URL at the FTP directory.
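A minimal sketch of the FTP variant follows. `to_ftp_url` and `fetch_ftp_dir` are hypothetical helper names, and the sketch assumes the server exposes the same path over FTP as over HTTP, which is often not the case. `-e robots=off` is left out on the assumption that robots.txt only matters for HTTP downloads.

```shell
# Hypothetical helper: swap the http:// scheme for ftp://.
# URLs that do not start with http:// get ftp:// prepended unchanged.
to_ftp_url() {
    printf 'ftp://%s\n' "${1#http://}"
}

# Hypothetical helper: run the same recursive download over FTP.
# $1: the HTTP URL of the directory (assumed to exist over FTP too)
fetch_ftp_dir() {
    wget -r -np -nc -nH --cut-dirs=4 --random-wait --wait 1 \
         "$(to_ftp_url "$1")"
}

# Example call (commented out so that nothing is downloaded by accident):
# fetch_ftp_dir 'http://mysite.com/aaa/bbb/ccc/ddd/'
```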
wget recursive download options
- `-r`: download recursively (and recreate the directory tree in folders on your PC)
- `-r -l1`: recurse, but only one level deep, so wget doesn’t go below the specified directory
- `-Q`: the total overall download `--quota` option, for example `--quota=1024m` to stop downloading after about 1 GB has been downloaded altogether
- `-np`: never get parent directories (sometimes a site will link back up and you don’t want that)
- `-nc`: no clobber; don’t re-download files you already have
- `-nd`: no directory structure on download (put all files in one directory, chosen with `-P`)
- `-nH`: don’t put obnoxious site name directories on your PC
- `-A`: only accept files matching a globbed pattern, for example `-A '*.iso'`
- `--cut-dirs=4`: don’t put an obnoxious hierarchy of directories above the desired directory on your PC. Note that the number must equal the number of directories in the server path (here aaa/bbb/ccc/ddd is four).
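- `-e robots=off`: many sites use robots.txt to block robots from mindlessly consuming huge amounts of data, and wget obeys it by default when recursing. This option tells wget to ignore robots.txt. It changes nothing about how the server sees us, so use it considerately.
- `--random-wait`: to avoid excessive download requests (that can get you auto-banned from downloading) we politely wait in between files; better than trying to get yourself un-banned!
- `--wait 1`: sets the base wait to 1 second. Combined with `--random-wait`, the pause varies between 0.5 and 1.5 seconds, so it averages about 1 second before starting to download the next file.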
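Several options in the list (`-l1`, `-nd`, `-P`, `-A`, `--quota`) do not appear in the command at the top. As a hedged sketch of how they might be combined, the function below fetches only files matching a pattern into a single flat directory. `fetch_flat` is a hypothetical helper name, and the pattern, destination and 1024m quota are illustrative values only.

```shell
# Hypothetical helper: download matching files into one flat directory.
# $1: URL of the directory listing
# $2: glob pattern of files to keep, e.g. '*.iso'
# $3: destination directory on your PC
fetch_flat() {
    # -r -l1        recurse, but only one level deep
    # -np -nc       stay out of parent directories, skip files already present
    # -nd -P        no directory tree; put everything in the directory $3
    # -A            accept only files matching the pattern $2
    # --quota       stop starting new downloads after roughly 1 GB in total
    wget -r -l1 -np -nc -nd \
         -A "$2" \
         -P "$3" \
         --quota=1024m \
         --random-wait --wait 1 \
         "$1"
}

# Example call (commented out so that nothing is downloaded by accident):
# fetch_flat 'http://mysite.com/aaa/bbb/ccc/ddd/' '*.iso' downloads
```

Because `-nd` flattens everything into one directory, `-nH` and `--cut-dirs` are not needed here.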