Tuesday, August 4, 2015

Introduction to urllib

Before I move on to the little browser project, which will open a web page and show the price of my favorite share and can also be used to see a basic HTML view of a website, I need to introduce you to the urllib module of Python.

So, open IDLE, type import urllib, and press Enter. Then type help(urllib) and take a quick glance at the reading material there. Then let's start. Let's open my site:

import urllib
page=urllib.urlopen('http://pythonbeginner.in')
print page.read()
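
The snippet above is Python 2. On Python 3, urlopen lives in urllib.request and read() returns bytes instead of a string, so a rough equivalent might look like the sketch below. The fetch_html helper name and the sample data: URL are made up for illustration; the data: URL is there only so the example runs without a network connection, and an ordinary http:// address can be passed the same way.

```python
from urllib.parse import quote
from urllib.request import urlopen


def fetch_html(url):
    """Open url and return its body as text (hypothetical helper)."""
    with urlopen(url) as page:
        raw = page.read()  # bytes on Python 3
    # Assumption: the page is UTF-8; undecodable bytes are replaced.
    return raw.decode("utf-8", errors="replace")


# A data: URL carries its own content, so no network is needed here.
sample_url = "data:text/html," + quote("<h1>Hello from urllib</h1>")
print(fetch_html(sample_url))
```

The with block closes the connection automatically, which is why there is no explicit page.close() call.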

Just three steps (import, urlopen, read) are enough to open a web page and print its HTML in your IDLE console. Now look at another program:


import urllib2
import re

# Download the page and keep its HTML as a string
page = urllib2.urlopen('http://www.pythonbeginner.in').read()
# Collect every occurrence of the text 'python'
matches = re.findall('python', page)
if len(matches) == 0:
    print ('Not about python')
else:
    print ('My site is about python')


Look at the program above. I have used another module, urllib2, and also the re (regular expression) module. urlopen again opens the page and read() reads it. Then we search for the string 'python' in the whole HTML file; if we find it we print 'My site is about python', otherwise 'Not about python'. Don't worry about regular expressions, as I will cover them in a later section.
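
The search step can be tried on its own, without fetching anything. Here is a minimal sketch, assuming the HTML is already in a string; the is_about_python name and the sample text are made up for illustration.

```python
import re


def is_about_python(html):
    """Return True if the text 'python' occurs anywhere in html."""
    # re.findall returns a list of every non-overlapping match.
    matches = re.findall('python', html)
    return len(matches) > 0


sample = '<html><body><h1>Learn python here</h1></body></html>'
if is_about_python(sample):
    print('My site is about python')
else:
    print('Not about python')
```

The match is case-sensitive, so a page that only says 'Python' would not count. Passing flags=re.IGNORECASE to re.findall would relax that.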

The urllib module and its urlopen function are what we will need in the next example, but do practice with all the functions and classes listed by dir(urllib) or dir(urllib2). Sometimes HTML tags get in the way, and if we need to remove the tagging we have to do what the next program demonstrates:

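import urllib, htmllib, sys, formatter

# Download the page and read its HTML
page = urllib.urlopen("https://www.pythonbeginner.in")
data = page.read()
page.close()

# Writer that prints plain text to the console, wrapped in a formatter
new_format = formatter.AbstractFormatter(formatter.DumbWriter(sys.stdout))
# Parser that sends the text between the tags to the formatter
ptext = htmllib.HTMLParser(new_format)
ptext.feed(data)
ptext.close()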
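
The htmllib module used above exists only in Python 2. For readers on Python 3, a rough sketch of the same idea (keep the text, drop the tags) can be built on html.parser.HTMLParser from the standard library. The TextOnly class, the strip_tags function and the sample HTML are illustrative names, and the sketch does not try to reproduce the line wrapping that DumbWriter does.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect the text between tags and ignore the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for each run of plain text.
        text = data.strip()
        if text:
            self.chunks.append(text)


def strip_tags(html):
    """Return the text of html, one line per run of text (illustrative)."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return '\n'.join(parser.chunks)


print(strip_tags('<html><body><h1>Hello</h1><p>urllib is fun</p></body></html>'))
```

Note that this simple version also keeps the contents of script and style elements, since the parser reports them as text too.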


The htmllib program above is food for thought, as a lot of things in it are not defined yet; I will define and describe the procedure in due time. Please subscribe for the next blog entry.

 
