download all links from a site and save to a text file

June 11, 2016 - Reading time: ~1 minute

wget is not designed for link extraction, but you can parse its output to get what you want:

$ wget http://aligajani.com -O - 2>/dev/null | grep -oP 'href="\Khttps?:.+?"' | sed 's/"//' | grep -v facebook > file.txt
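The same grep pipeline can be sketched offline against a small inlined page, which makes it easy to test. The example.com URLs below are placeholders, and a `(?=")` lookahead is used instead of the trailing `sed`; both variants produce the same list:

```shell
# Write a tiny HTML page so the pipeline runs without network access.
cat > /tmp/page.html <<'EOF'
<a href="http://example.com/a">a</a>
<a href="https://example.com/b">b</a>
<a href="https://facebook.com/x">x</a>
EOF

# \K discards the href=" prefix; (?=") stops the match before the closing quote.
grep -oP 'href="\Khttps?://.+?(?=")' /tmp/page.html | grep -v facebook > /tmp/links.txt
cat /tmp/links.txt
```

Note that this only finds links written as absolute `http://` or `https://` URLs; relative links (`href="/about"`) are skipped.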

You could use lynx for this:

lynx -dump -listonly http://aligajani.com | grep -v facebook.com > file.txt
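lynx's `-listonly` output typically prefixes each link with a number (e.g. `   1. http://...`), so a small awk filter is often added to keep only the URLs. The input below simulates that listing format so the filter can be run offline; in practice you would pipe the real lynx output into the same awk:

```shell
# Simulated `lynx -dump -listonly` output (lynx numbers each link it finds).
printf '%s\n' 'References' \
  '   1. http://aligajani.com/about' \
  '   2. http://facebook.com/aligajani' |
  awk '$2 ~ /^https?:/ {print $2}' | grep -v facebook.com > /tmp/lynx_links.txt
cat /tmp/lynx_links.txt
```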

Both commands above only list the links of a single page. To crawl a site recursively, mirror it with wget instead:

wget -r -p -k http://website

or

wget -r -p -k --wait=#SECONDS http://website

The second form (replace #SECONDS with a delay in seconds) is for websites that may flag you, or suffer degraded service, if you download too quickly; prefer it in most circumstances to be courteous. Everything will be placed in a folder named after the website's hostname, created in whatever directory your terminal is in when you run the command.
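Once the mirror exists, the earlier grep pattern can be run recursively over the downloaded tree to collect every absolute link into one file. The directory and its contents below are simulated so the pipeline runs offline; with a real mirror you would point grep at the folder wget created:

```shell
# Simulate a small wget mirror (normally created by `wget -r -p -k http://website`).
mkdir -p /tmp/website
printf '<a href="https://example.com/c">c</a>\n' > /tmp/website/index.html
printf '<a href="https://example.com/d">d</a>\n' > /tmp/website/other.html

# -r: recurse into the directory, -h: omit filenames, -o: print only the match.
grep -rhoP 'href="\Khttps?://.+?(?=")' /tmp/website | sort -u > /tmp/all_links.txt
cat /tmp/all_links.txt
```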