HTTrack is a free and open-source Web crawler and offline browser allowing users to download World Wide Web sites from the Internet to a local computer. (Wikipedia)
Yep, this is website-mirroring software!
Download a website, a little bit faster
Here is the command line you can use; explanations follow:
```shell
httrack \
  --connection-per-second=50 \
  --sockets=80 \
  --keep-alive \
  --display \
  --verbose \
  --advanced-progressinfo \
  --disable-security-limits \
  -n \
  -i \
  -s0 \
  -m \
  -F 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0' \
  -A100000000 \
  -#L500000000 \
  'YOURURL'
```
-%c / --connection-per-second: maximum number of connections per second
-c / --sockets: number of multiple connections. If this gives you errors, lower this value.
-k / --keep-alive: use keep-alive if possible, greatly reducing latency for small files and test requests
--display: display downloaded filenames on screen (in real time)
--verbose: log on screen
--advanced-progressinfo: display ugly progress information
--disable-security-limits: bypass built-in security limits aimed at avoiding bandwidth abuse (bandwidth, simultaneous connections)
-n / --near: get non-HTML files "near" an HTML file (e.g. an image located outside the site)
-i / --continue: continue an interrupted mirror using the cache
-s0 / --robots=0: whether to follow robots.txt and meta robots tags (0=never, 1=sometimes, 2=always, 3=always, even strict rules). Here, 0 disables them.
-F 'custom user agent here': user-agent field sent in HTTP headers
-A / --max-rate: maximum transfer rate in bytes/second. 100 MB/s here.
-#L / --advanced-maxlinks: raises the maximum number of links HTTrack will fetch to 500 million. Raise it further if needed.
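As a sketch, the whole invocation can be wrapped in a small POSIX shell script so the target URL and output directory become parameters. Note the assumptions: `MIRROR_URL` and `OUT_DIR` are placeholder names introduced here, and `-O` (httrack's output-path option) is not part of the original command. The script only prints the command so you can review it before running it for real.

```shell
#!/bin/sh
# Hedged sketch: wrap the httrack invocation above in a reusable script.
# MIRROR_URL and OUT_DIR are hypothetical placeholders, not httrack options;
# -O sets httrack's output directory.
MIRROR_URL="${1:-https://example.com}"
OUT_DIR="${2:-./mirror}"

# Build the argument list in the positional parameters, then print it
# instead of executing, so the sketch is safe to inspect first.
set -- httrack \
  --connection-per-second=50 \
  --sockets=80 \
  --keep-alive \
  --display --verbose --advanced-progressinfo \
  --disable-security-limits \
  -n -i -s0 -m \
  -F 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0' \
  -A100000000 \
  '-#L500000000' \
  -O "$OUT_DIR" \
  "$MIRROR_URL"

printf '%s ' "$@"
printf '\n'
```

To actually launch the mirror once the printed command looks right, replace the two `printf` lines with `exec "$@"`.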
More information? See the official documentation.
Image transformed from "HaringAi learns about vacuum cleaners" by Alan Stanton, licensed under CC BY-SA 2.0.