wget -r -l1 --no-parent -A ".deb" http://www.shinken-monitoring.org/pub/debian/
-r           recurse
-l1          descend to a maximum depth of 1
--no-parent  ignore links to a higher directory
-A ".deb"    accept only files matching this pattern
You can use wget to generate a list of the URLs on a website.
Spider example.com, writing URLs to urls.txt and filtering out common media files (css, js, png, gif, jpg):
wget --spider -r http://www.example.com 2>&1 | grep '^--' | awk '{ print $3 }' | grep -v '\.\(css\|js\|png\|gif\|jpg\|JPG\)$' > urls.txt
Note that the resulting list contains duplicate URLs.
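The duplicates are easy to collapse by piping the list through sort -u. A minimal sketch, using a placeholder urls.txt standing in for the spider output above:

```shell
# Placeholder urls.txt; the real file would come from the wget
# spider pipeline above.
printf 'http://example.com/\nhttp://example.com/about\nhttp://example.com/\n' > urls.txt

# sort -u sorts the list and collapses duplicate URLs into one occurrence.
sort -u urls.txt > urls.unique.txt
cat urls.unique.txt
```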
If you mirror instead of spider, you seem to get a more comprehensive list without duplicates:
wget -m http://www.example.com 2>&1 | grep '^--' | awk '{ print $3 }' | grep -v '\.\(css\|js\|png\|gif\|jpg\|JPG\)$' > urls.txt
This will download all pages of the site into a directory with the same name as the domain.
wget is not designed for this. You can however parse its output to get what you want:
$ wget http://aligajani.com -O - 2>/dev/null |
grep -oP 'href="\Khttp:.+?"' | sed 's/"//' | grep -v facebook > file.txt
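To see what the grep and sed stages do without hitting the network, you can run the same pipeline on an inline HTML snippet (the URLs here are placeholders):

```shell
# Two links; the facebook one is filtered out by the same
# `grep -v facebook` stage used above.
printf '<a href="http://example.com/a">A</a> <a href="http://facebook.com/x">F</a>\n' |
  grep -oP 'href="\Khttp:.+?"' | sed 's/"//' | grep -v facebook
# prints: http://example.com/a
```

grep -oP prints each match on its own line, \K discards the href=" prefix, and sed strips the trailing quote that the pattern keeps.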
You could use lynx for this:
lynx -dump -listonly http://aligajani.com | grep -v facebook.com > file.txt
This command dumps the links of a single page. To do this recursively:
wget -r -p -k http://website
or
wget -r -p -k --wait=#SECONDS http://website
The second form is for websites that may flag you if you download too quickly; hammering a server can also cause a loss of service, so use the second form in most circumstances to be courteous. Everything will be placed in a folder named after the website, inside whatever directory your terminal was in when you ran the command.
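To check what landed on disk afterward, you can walk the mirror directory with find. A sketch using placeholder files standing in for a real `wget -r -p -k` run:

```shell
# Placeholder mirror tree standing in for wget's real output
# (a directory named after the site).
mkdir -p website/sub
touch website/index.html website/sub/page.html

# find enumerates every downloaded page for inspection.
find website -name '*.html' | sort
```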