HTTrack is a free and open-source Web crawler and offline browser allowing users to download World Wide Web sites from the Internet to a local computer. (Wikipedia)
Yep, this is website-mirroring software!
Download a website, a little bit faster
Here is the command line you can use; explanations follow:
```shell
httrack \
  --connection-per-second=50 \
  --sockets=80 \
  --keep-alive \
  --display \
  --verbose \
  --advanced-progressinfo \
  --disable-security-limits \
  -n \
  -i \
  -s0 \
  -m \
  -F 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0' \
  -A100000000 \
  -#L500000000 \
  'YOURURL'
```
-%c / --connection-per-second: maximum number of connections per second
-c / --sockets: number of multiple connections. If this gives you errors, lower this value.
-k / --keep-alive: use keep-alive if possible, greatly reducing latency for small files and test requests
--display: display downloaded filenames on screen (in real time)
--verbose: log on screen
--advanced-progressinfo: display ugly progress information
--disable-security-limits: bypass built-in security limits aimed at avoiding bandwidth abuse (bandwidth, simultaneous connections)
-n / --near: get non-HTML files "near" an HTML file (e.g. an image located outside the site)
-i / --continue: continue an interrupted mirror using the cache
-s0 / --robots=0: whether to follow robots.txt and meta robots tags (0=never, 1=sometimes, 2=always, 3=always, even strict rules). Here, 0 disables them.
-F 'custom user agent here': user-agent field sent in HTTP headers
-A / --max-rate: maximum transfer rate in bytes/second. 100 MB/s here.
-#L / --advanced-maxlinks: raises the maximum number of links HTTrack will fetch to 500 million. Raise it further if needed.
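As a sketch, the whole invocation can be wrapped in a small POSIX shell script so the target URL and output directory become parameters. Note the assumptions: `MIRROR_URL` and `OUT_DIR` are placeholder names introduced here, and `-O` (httrack's output-path option) is not part of the original command. The script only prints the command so you can review it before running it for real.

```shell
#!/bin/sh
# Hedged sketch: wrap the httrack invocation above in a reusable script.
# MIRROR_URL and OUT_DIR are hypothetical placeholders, not httrack options;
# -O sets httrack's output directory.
MIRROR_URL="${1:-https://example.com}"
OUT_DIR="${2:-./mirror}"

# Build the argument list in the positional parameters, then print it
# instead of executing, so the sketch is safe to inspect first.
set -- httrack \
  --connection-per-second=50 \
  --sockets=80 \
  --keep-alive \
  --display --verbose --advanced-progressinfo \
  --disable-security-limits \
  -n -i -s0 -m \
  -F 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0' \
  -A100000000 \
  '-#L500000000' \
  -O "$OUT_DIR" \
  "$MIRROR_URL"

printf '%s ' "$@"
printf '\n'
```

To actually launch the mirror once the printed command looks right, replace the two `printf` lines with `exec "$@"`.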
More information? See the official documentation.
Image transformed from "HaringAi learns about vacuum cleaners" by Alan Stanton, licensed under CC BY-SA 2.0.