how to download specific files from some url path with wget

July 11, 2018 - Reading time: ~1 minute

wget -r -l1 --no-parent -A ".deb" http://www.shinken-monitoring.org/pub/debian/

-r recurse
-l1 limit recursion to a maximum depth of 1
--no-parent ignore links to a higher directory
-A ".deb" accept only files with this suffix (a glob pattern like "*.deb" also works)


generate a list of a site's URLs using wget

June 12, 2018 - Reading time: ~1 minute

You can use wget to generate a list of the URLs on a website.

Spider example.com, writing URLs to urls.txt and filtering out common asset files (css, js, images, etc.):

wget --spider -r http://www.example.com 2>&1 | grep '^--' | awk '{ print $3 }' | grep -v '\.\(css\|js\|png\|gif\|jpg\|JPG\)$' > urls.txt

Note that the resulting list contains duplicate URLs.
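If the duplicates bother you, sort -u strips them after the fact. A minimal sketch, using a stand-in urls.txt rather than real spider output:

```shell
# Sample list with a duplicate entry (stand-in for a real urls.txt).
printf 'http://www.example.com/\nhttp://www.example.com/about\nhttp://www.example.com/\n' > urls.txt

# sort -u sorts the list and drops exact duplicate lines.
sort -u urls.txt > urls-unique.txt

cat urls-unique.txt
```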

If you mirror instead of spider, you seem to get a more comprehensive list without duplicates:

wget -m http://www.example.com 2>&1 | grep '^--' | awk '{ print $3 }' | grep -v '\.\(css\|js\|png\|gif\|jpg\|JPG\)$' > urls.txt

This will download all pages of the site into a directory with the same name as the domain.
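Once the mirror finishes you can inventory what was fetched with find. A sketch, using a faked-up www.example.com directory in place of a real mirror run:

```shell
# Stand-in for a mirrored site (wget -m creates a directory named
# after the domain, with the site's path structure underneath).
mkdir -p www.example.com/blog
touch www.example.com/index.html www.example.com/blog/post1.html

# List every HTML page that was downloaded.
find www.example.com -name '*.html' | sort
```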


download all links from a site and save to a text file

June 11, 2016 - Reading time: ~1 minute

wget is not designed for this. You can, however, parse its output to get what you want:

$ wget http://aligajani.com -O - 2>/dev/null | grep -oP 'href="\Khttp:.+?"' | sed 's/"//' | grep -v facebook > file.txt
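You can sanity-check the grep/sed stages of that pipeline locally by feeding them a snippet of HTML instead of live wget output. A sketch, assuming GNU grep (for -P):

```shell
# Sample HTML standing in for a downloaded page.
html='<a href="http://example.com/a">a</a> <a href="http://facebook.com/x">f</a> <a href="http://example.com/b">b</a>'

# Same stages as above: pull out the href="http..." values (one per
# line), strip the trailing quote, and drop facebook links.
links=$(printf '%s\n' "$html" | grep -oP 'href="\Khttp:.+?"' | sed 's/"//' | grep -v facebook)

printf '%s\n' "$links"
# → http://example.com/a
# → http://example.com/b
```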

You could use lynx for this:

lynx -dump -listonly http://aligajani.com | grep -v facebook.com > file.txt

This command dumps only the links of a single page. To download a site recursively with wget:

wget -r -p -k http://website

or

wget -r -p -k --wait=SECONDS http://website

The second form is for websites that may flag you for downloading too quickly; aggressive crawling can also degrade the site's service, so use the --wait variant in most circumstances to be courteous. Everything is saved in a folder named after the website, inside whatever directory your terminal was in when you ran the command.