Channel: Go4Expert

Link Extractor in Python

Link Extractor, as the name suggests, extracts all the URLs from a particular webpage. The code that follows handles relative as well as absolute URLs found in an HTML webpage and outputs them in a more readable and useful format.
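To make the relative versus absolute distinction concrete, here is a small sketch of how urljoin resolves different kinds of href values against the address of the page they were found on. It assumes Python 3, where urljoin lives in urllib.parse rather than in the Python 2 urlparse module; the base address and the href values are made-up examples, not taken from the original article.

```python
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

# Hypothetical page address and href values, for illustration only.
base = "http://example.com/articles/index.html"
hrefs = [
    "page2.html",                  # relative to the current directory
    "/about",                      # relative to the site root
    "../img/logo.png",             # relative to the parent directory
    "http://other.example.org/x",  # already absolute, returned unchanged
]

# Resolve every href against the page it came from.
resolved = [urljoin(base, href) for href in hrefs]

for href, url in zip(hrefs, resolved):
    print(href, "->", url)
```

Resolving every link this way means the extractor can treat all results uniformly as absolute URLs, regardless of how the page author wrote them.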

The Code



Code:
from BeautifulSoup import BeautifulSoup
import urllib2
from urlparse import urljoin # to support relative urls
import sys
import re

def checkUrl(url) :
    # django regex for url validation...
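
The listing above is cut off at this point in the feed. As a rough illustration of the overall idea only, and not the author's code, the following sketch collects the links of a page using just the Python 3 standard library. Everything in it is an assumption: html.parser stands in for BeautifulSoup, the names check_url, LinkCollector and extract_links are hypothetical, and check_url is a loose scheme-and-host test rather than the Django regex the original refers to.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


def check_url(url):
    """Loose check (an assumption, not Django's regex): http(s) scheme and a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute ones.
                absolute = urljoin(self.base_url, value)
                # Keep only valid-looking links, without duplicates.
                if check_url(absolute) and absolute not in self.links:
                    self.links.append(absolute)


def extract_links(html, base_url):
    """Return the absolute http(s) links found in an HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Made-up sample markup; a real run would download the page first.
sample = """
<a href="/about">About</a>
<a href="post.html">Post</a>
<a href="http://other.example.org/">Other</a>
<a href="mailto:someone@example.com">Mail</a>
"""

for link in extract_links(sample, "http://example.com/blog/index.html"):
    print(link)
```

The mailto link is dropped by check_url because its scheme is neither http nor https. To run this against a live page, the HTML could be fetched with urllib.request.urlopen and decoded before being passed to extract_links.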