Saturday, November 7, 2015

Mingling with Regular Expressions in Python

Regular expressions are almost a small programming language of their own inside Python, so we need to enter this domain carefully. Today I will only touch on regular expressions; we will go deeper as your knowledge, and mine, reaches a more advanced level.

To work with regular expressions, you need to import the re module, e.g.

>>> import re

Today we will search for some text in a file by opening it, first with a plain string method and then with a regular expression.

fh=open('makelist.txt')
for line in fh:
    #print line
    pos=line.find('est')
    if pos==-1: continue
    print line


The above program opens a file, looks for the position of the string 'est' in each line, and prints the line if the string is found (find returns -1 when it is not). Let's do the same with the help of a regular expression:

import re
fh=open('makelist.txt')

for line in fh:
    if re.search('est',line):
        print line


In the above program I imported the 're' module, then opened the file as usual and iterated through each line. re.search does the same job as find in the previous example, but it removes the need for the explicit position check (pos==-1) used before.
This program does not show much of what regular expressions can do beyond one thing: it shortens the program, which is a big part of why we use regular expressions.
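
To check that the two approaches pick out the same lines, here is a small sketch that runs both on a few in-memory lines instead of makelist.txt. The sample lines and the helper names lines_with_find and lines_with_search are made up for illustration, and print is called with parentheses so the sketch should run on both Python 2 and Python 3.

```python
import re

# Hypothetical sample lines standing in for the contents of makelist.txt.
sample_lines = [
    "this is a test line",
    "nothing to see here",
    "the best and the rest",
]

def lines_with_find(lines, text):
    # Keep a line when str.find reports a position other than -1.
    return [line for line in lines if line.find(text) != -1]

def lines_with_search(lines, pattern):
    # Keep a line when re.search returns a match object (None means no match).
    return [line for line in lines if re.search(pattern, line)]

# Both helpers should select exactly the same lines.
print(lines_with_find(sample_lines, 'est') == lines_with_search(sample_lines, 'est'))
```

For a plain substring like 'est' the two are interchangeable; the regular expression version only starts to pay off once the pattern contains special characters such as '^'.
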
The next example replaces the string method 'startswith' with just a '^' symbol.

fh=open('makelist.txt')
for line in fh:
    # rstrip() returns a new string, so keep the result
    line=line.rstrip()
    if line.startswith('bogomips'):
        print line

This will print all the lines that start with the word passed to the line.startswith() method. Now let's do it with a regular expression. In a RE we use '^' to mark the start of the string being searched.

import re
fh=open('makelist.txt')
for line in fh:
    if re.search('^bogomips',line):
        print line
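
As a rough side-by-side of the two versions above, the sketch below applies both to a few made-up lines. The sample data and the function names starts_with_method and starts_with_regex are assumptions for illustration only; the third line contains 'bogomips' but not at the start, so neither version should select it.

```python
import re

# Made-up sample lines; only the first one starts with 'bogomips'.
sample_lines = [
    "bogomips : 4800.00",
    "model name : some cpu",
    "cpu bogomips listed later",
]

def starts_with_method(lines, prefix):
    # Plain string method, as in the first version.
    return [line for line in lines if line.startswith(prefix)]

def starts_with_regex(lines, prefix):
    # '^' anchors the match at the start of the string;
    # re.escape keeps any special characters in prefix literal.
    pattern = '^' + re.escape(prefix)
    return [line for line in lines if re.search(pattern, line)]

print(starts_with_method(sample_lines, 'bogomips'))
print(starts_with_regex(sample_lines, 'bogomips'))
```

Without the '^', re.search('bogomips', line) would also match the third line, because search looks for the pattern anywhere in the string.
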


Another important feature of regular expressions in Python is the re.findall() function. Unlike search, which only reports whether (and where) the pattern first matches, findall builds a list of every match found in the string and returns it. See the example below:

import urllib,re

url="http://"+str(raw_input("Enter site"))
page=urllib.urlopen(url).read()


needed_lines=re.findall('[a-z]+',page)
print needed_lines


The program opens a site and reads the whole page. Then we find every run of lowercase letters in the page with re.findall('[a-z]+', page) and print the result. The output is a list of all the lowercase letter runs in the page, so a capitalized word such as 'Hello' shows up as 'ello'. We can use any of the regular expressions given below with findall and find anything we want in the page.
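
To try findall without a network connection, the sketch below runs the same pattern on a short made-up string standing in for the downloaded page. The page text and the second pattern '[A-Za-z]+' are my own additions for comparison, not part of the program above.

```python
import re

# A made-up snippet standing in for the downloaded page.
page = '<p>Hello World, visit Example today</p>'

# Every run of lowercase letters; capital letters break a run.
lowercase_runs = re.findall('[a-z]+', page)
print(lowercase_runs)
# ['p', 'ello', 'orld', 'visit', 'xample', 'today', 'p']

# Adding A-Z to the set keeps capitalized words whole.
letter_runs = re.findall('[A-Za-z]+', page)
print(letter_runs)
# ['p', 'Hello', 'World', 'visit', 'Example', 'today', 'p']
```

Note that the tag name 'p' is picked up too, since findall works on the raw text and knows nothing about HTML.
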

In the next tutorial I will go further into the findall method of regular expressions. Below are some commonly used regular expressions for you; take a printout and keep it handy if you want to work with them.

^        Matches the beginning of a line
$        Matches the end of the line
.        Matches any character
\s       Matches whitespace
\S       Matches any non-whitespace character
*        Repeats a character zero or more times
*?       Repeats a character zero or more times
         (non-greedy)
+        Repeats a character one or more times
+?       Repeats a character one or more times
         (non-greedy)
[aeiou]  Matches a single character in the listed set
[^XYZ]   Matches a single character not in the listed set
[a-z0-9] The set of characters can include a range
(        Indicates where string extraction is to start
)        Indicates where string extraction is to end

(Thank you, Coursera, for handing out this info kit.)
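
A few entries from the table are easier to grasp when run. The sketch below tries greedy '+', non-greedy '+?', a negated set, and parentheses on one made-up line; the line and the variable names are assumptions for illustration.

```python
import re

# A made-up line with two ':' characters and two addresses.
line = 'From: alice@example.com to: bob@example.org'

# Greedy: '.+' grabs as much as it can, so the match runs to the last ':'.
greedy = re.findall('F.+:', line)
print(greedy)        # ['From: alice@example.com to:']

# Non-greedy: '.+?' stops as soon as the rest of the pattern can match.
non_greedy = re.findall('F.+?:', line)
print(non_greedy)    # ['From:']

# Parentheses mark the part to extract; '[^ ]+' means one or more non-space characters.
hosts = re.findall('@([^ ]+)', line)
print(hosts)         # ['example.com', 'example.org']
```

With parentheses in the pattern, findall returns only the captured part of each match, which is why the '@' itself is missing from the last list.
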

Please subscribe and stay tuned for the YouTube video.
 
 
