Python: extract all hyperlinks from a webpage

I know there must be thousands of programs that do this, but I thought I would give it a try. It's pretty easy. I will keep editing this as I improve my regular expression.

import urllib as ul
import BeautifulSoup as bs
import re

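# Placeholder: the URL is missing from the original listing; put the page to scan here.
url = 'http://example.com/'
myFile = ul.urlopen(url)
soup = bs.BeautifulSoup(myFile)
#print soup.prettify()
for anchor in soup.findAll('a'):
    #print re.match('href', anchor)
    myString = str(anchor)
    #print myString
    # Assumption: the slice bounds a and b were lost, so a regular
    # expression pulls the href value out of the tag text instead.
    match = re.search(r'href="([^"]*)"', myString)
    if match:
        print match.group(1)
    else:
        print 'error'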
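
The listing above is Python 2 with the old BeautifulSoup module. For anyone on Python 3, here is a minimal sketch of the same idea using only the standard library. The names LinkCollector and extract_links are made up for this example, and the sample HTML and example.com base URL are placeholders, so treat it as one possible approach rather than a finished tool.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the base URL, if one was given.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url=""):
    """Return the list of hyperlinks found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Placeholder page content; a real page could be read with urllib.request.urlopen.
    sample = (
        '<p><a href="/about">About</a> '
        '<a href="http://example.org/">External</a> '
        '<a name="top">no href here</a></p>'
    )
    for link in extract_links(sample, "http://example.com/"):
        print(link)
```

Letting a parser find the tags avoids the string slicing entirely, and anchors without an href are skipped instead of raising an error.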

