Python Regular Expressions
MODULE 3 – PART 4
REGULAR EXPRESSIONS
By,
Ravi Kumar B N
Assistant professor, Dept. of CSE
BMSIT & M
INTRODUCTION
➢ A regular expression is a sequence of characters that defines a search pattern.
➢ Patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation.
➢ The regular expression library “re” must be imported into a program before it can be used.
SEARCH() FUNCTION:
➢ search() function: searches a string for a pattern and returns only the first occurrence that matches it. This function is available in the “re” library.
➢ The caret character (^): matches the beginning of a line.
➢ The dollar character ($): matches the end of a line.
Example: program to match only lines where “From:” is at the beginning of the line
import re
hand = open('mbox1.txt')
for line in hand:
    line = line.rstrip()
    if re.search('^From:', line):
        print(line)
#Output
From:stephen Sat Jan 5 09:14:16 2008
From: louis@media.berkeley.edu Mon Jan 4 16:10:39 2008
From:zqian@umich.edu Fri Jan 4 16:10:39 2008
mbox1.txt
From:stephen Sat Jan 5 09:14:16 2008
Return-Path: <postmaster@collab.sakaiproject.org>
From: louis@media.berkeley.edu Mon Jan 4 16:10:39 2008
Subject: [sakai] svn commit:
From:zqian@umich.edu Fri Jan 4 16:10:39 2008
Return-Path: <postmaster@collab.sakaiproject.org>
✓ The test re.search('^From:', line) is equivalent to line.startswith('From:'), which uses the startswith() string method.
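The equivalence between the anchored search and startswith() can be checked on a few lines held in memory. This is a minimal sketch: the list sample_lines is a made-up stand-in for mbox1.txt, so the snippet runs without the file.

```python
import re

# Hypothetical stand-in for a few lines of mbox1.txt
sample_lines = [
    'From:stephen Sat Jan 5 09:14:16 2008',
    'Return-Path: <postmaster@collab.sakaiproject.org>',
    'Subject: [sakai] svn commit:',
]

# Lines selected by the regular expression anchored with ^
regex_hits = [l for l in sample_lines if re.search('^From:', l)]

# Lines selected by the plain string method
method_hits = [l for l in sample_lines if l.startswith('From:')]

# Both approaches select the same lines
print(regex_hits == method_hits)
print(regex_hits)
```

For a fixed prefix like this one, startswith() is usually the simpler choice; the regular expression becomes useful once the prefix itself has to be a pattern.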
CHARACTER MATCHING IN REGULAR EXPRESSIONS
➢ The dot character (.): the most commonly used special character is the period (“dot”) or full stop, which matches any single character except a newline.
The regular expression “F..m:” matches any of the following strings, since each period in the regular expression matches any character:
“From:”, “Fxxm:”, “F12m:”, or “F!@m:”
➢ The program from the previous slide, rewritten with the dot character, gives the same output:
import re
hand = open('mbox1.txt')
for line in hand:
    line = line.rstrip()
    if re.search('^F..m:', line):
        print(line)
#Output
From:stephen Sat Jan 5 09:14:16 2008
From: louis@media.berkeley.edu Mon Jan 4 16:10:39 2008
From:zqian@umich.edu Fri Jan 4 16:10:39 2008
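The behaviour of the dot can be tried on the four strings listed for “F..m:”, plus two strings that should fail. A small sketch; the list candidates is invented for illustration.

```python
import re

# The last two entries are counter-examples: each dot must consume
# exactly one character, so 'Fm:' (too short) and 'Froom:' (too long) fail.
candidates = ['From:', 'Fxxm:', 'F12m:', 'F!@m:', 'Fm:', 'Froom:']

matched = [c for c in candidates if re.search('^F..m:', c)]
print(matched)
```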
A character can be repeated any number of times using the “*” or “+” characters in a regular expression.
➢ The asterisk character (*): matches zero or more repetitions of the preceding character.
➢ The plus character (+): matches one or more repetitions of the preceding character.
Example: program to match lines that start with “From:”, followed by a mail id
import re
hand = open('mbox1.txt')
for line in hand:
    line = line.rstrip()
    if re.search('^From:.+@', line):
        print(line)
#Output
From: louis@media.berkeley.edu Mon Jan 4 16:10:39 2008
From:zqian@umich.edu Fri Jan 4 16:10:39 2008
✓ The search string “^From:.+@” matches lines that start with “From:”, followed by one or more characters (“.+”), followed by an at-sign. The “.+” wildcard matches all the characters between the colon and the at-sign. The first line of mbox1.txt (“From:stephen”) is not printed because it contains no at-sign.
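The difference between “*” and “+” shows up only when there is nothing to repeat. A minimal sketch on three made-up strings:

```python
import re

# Hypothetical test strings: nothing, something, and no at-sign at all
lines = ['From:@', 'From:a@b', 'From:']

# .* accepts zero characters between the colon and the at-sign
star = [bool(re.search('^From:.*@', l)) for l in lines]

# .+ demands at least one character between the colon and the at-sign
plus = [bool(re.search('^From:.+@', l)) for l in lines]

print(star)
print(plus)
```

Only the first string separates the two patterns: 'From:@' has nothing between the colon and the at-sign, so “.*” accepts it and “.+” rejects it.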
➢ Non-whitespace character (\S): matches one non-whitespace character.
➢ findall() function: searches for all occurrences that match a given pattern and returns them as a list. In contrast, the search() function returns only the first occurrence that matches the pattern.
Example1: program that returns a list of all the strings that look like email addresses in a given line.
import re
s = 'Hello from csev@umich.edu to cwen@iupui.edu about the meeting @2PM'
lst = re.findall(r'\S+@\S+', s)
print(lst)
#Output
['csev@umich.edu', 'cwen@iupui.edu']
✓ The regular expression '\S+@\S+' matches substrings that have at least one non-whitespace character, followed by an at-sign, followed by at least one more non-whitespace character. “@2PM” is not matched because no non-whitespace character precedes its at-sign.
# The same program using search() displays only the first mail id (the first matching string)
import re
s = 'Hello from csev@umich.edu to cwen@iupui.edu about the meeting @2PM'
lst = re.search(r'\S+@\S+', s)
print(lst)
#Output
<re.Match object; span=(11, 25), match='csev@umich.edu'>
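search() returns a Match object rather than a string, so the matched text has to be pulled out of it. The sketch below uses the same sentence as the slide; the variable names are chosen only for illustration.

```python
import re

s = 'Hello from csev@umich.edu to cwen@iupui.edu about the meeting @2PM'

all_matches = re.findall(r'\S+@\S+', s)   # list with every match
first = re.search(r'\S+@\S+', s)          # Match object, or None

if first:
    first_text = first.group()   # the matched substring
    first_span = first.span()    # (start, end) positions in s
    print(first_text, first_span)

# When nothing matches, search() returns None and findall() an empty list
missing = re.search(r'\S+@\S+', 'no address here')
print(missing)
```

Testing the result with `if first:` before calling group() avoids an AttributeError on lines that contain no match.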
Example2: program that returns a list of all the strings that look like email addresses in a given file.
import re
hand = open('mbox1.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall(r'\S+@\S+', line)
    if len(x) > 0:
        print(x)
#Output
['<postmaster@collab.sakaiproject.org>']
['louis@media.berkeley.edu']
['From:zqian@umich.edu']
['<postmaster@collab.sakaiproject.org>']
✓ The third match includes “From:” because that line of mbox1.txt has no space after the colon, so '\S+' runs from the start of the line up to the at-sign.
➢ Square brackets “[]”: indicate a set of acceptable characters, any one of which may match.
Example: [a-z] matches a single lowercase letter
[A-Z] matches a single uppercase letter
[a-zA-Z] matches a single lowercase or uppercase letter
[a-zA-Z0-9] matches a single lowercase letter, uppercase letter, or digit
Some of our email addresses have unwanted characters like “<”, “>”, or “;” at the beginning or end. We are only interested in the portion of the string that starts and ends with a letter or a number. To get the proper output, we use character sets.
[amk] matches 'a', 'm', or 'k'
[(+*)] matches any of the literal characters '(', '+', '*', or ')'
[0-5][0-9] matches all the two-digit numbers from 00 to 59
➢ Characters that are not within a range can be matched by complementing the set. If the first character of the set is '^', any character that is not in the set is matched.
For example, [^5] matches any character except '5'.
Ex: program that returns a list of all email addresses in proper format.
import re
hand = open('mbox1.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall(r'[a-zA-Z0-9]\S*@\S*[a-zA-Z]', line)
    if len(x) > 0:
        print(x)
#Output
['postmaster@collab.sakaiproject.org']
['louis@media.berkeley.edu']
['From:zqian@umich.edu']
['postmaster@collab.sakaiproject.org']
✓ [a-zA-Z0-9]\S*@\S*[a-zA-Z]: matches substrings that start with a single lowercase letter, uppercase letter, or number “[a-zA-Z0-9]”, followed by zero or more non-blank characters “\S*”, followed by an at-sign, followed by zero or more non-blank characters “\S*”, followed by an uppercase or lowercase letter “[a-zA-Z]”.
✓ The angle brackets are gone. The third match still carries “From:” because that line of mbox1.txt has no space after the colon.
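The effect of the character sets can be seen by running the loose and the strict pattern on the same text. A minimal sketch; the sample line (with a trailing semicolon added) and the short digit strings are invented for illustration.

```python
import re

# Hypothetical header line with unwanted characters around the address
line = 'Return-Path: <postmaster@collab.sakaiproject.org>;'

loose = re.findall(r'\S+@\S+', line)                       # keeps < > ;
strict = re.findall(r'[a-zA-Z0-9]\S*@\S*[a-zA-Z]', line)   # trims them
print(loose)
print(strict)

# A complemented set: every character that is not '5'
not_five = re.findall('[^5]', '1525')

# Two sets side by side: two-digit numbers from 00 to 59
minutes = re.findall('[0-5][0-9]', '07 59 60 99')
print(not_five, minutes)
```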
SEARCH AND EXTRACT
Example1: find numbers on lines that start with the string “X-”, such as: X-DSPAM-Confidence: 0.8475
mbox2.txt
From: stephen.marquard@uct.ac.za
Subject: [sakai] svn commit: r39772 - content/branches/sakai_2-5-x/conten
impl/impl/src/java/org
X-Content-Type-Outer-Envelope: text/plain; charset=UTF-8
X-Content-Type-Message-Body: text/plain; charset=UTF-8
Content-Type: text/plain; charset=UTF-8
X-DSPAM-Result: Innocent
X-DSPAM-Processed: Sat Jan 5 09:14:16 2008
X-DSPAM-Confidence: 0.8475
X-DSPAM-Probability: 0.9245
Search:
import re
hand = open('mbox2.txt')
for line in hand:
    line = line.rstrip()
    if re.search(r'^X\S*: [0-9.]+', line):
        print(line)
#Output
X-DSPAM-Confidence: 0.8475
X-DSPAM-Probability: 0.9245
The output above contains the entire line, but we only want to extract the numbers from lines that have this syntax.
➢ Parentheses “()” in a regular expression: used to extract a portion of the substring that matches the regular expression.
Extract:
import re
hand = open('mbox2.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall(r'^X\S*: ([0-9.]+)', line)
    if len(x) > 0:
        print(x)
#Output
['0.8475']
['0.9245']
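Once the parentheses isolate the number, the extracted text can be converted and used in a calculation. The sketch below is one possible follow-up, not part of the original slides; it keeps three lines of mbox2.txt in a list so that it runs without the file.

```python
import re

# Three lines taken from mbox2.txt, held in memory for the example
lines = [
    'X-DSPAM-Result: Innocent',
    'X-DSPAM-Confidence: 0.8475',
    'X-DSPAM-Probability: 0.9245',
]

values = []
for line in lines:
    # findall() returns only the parenthesised part of each match
    found = re.findall(r'^X\S*: ([0-9.]+)', line)
    if found:
        values.append(float(found[0]))   # text -> number

print(values)
print(max(values))
```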
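Example2: program to print the hour at which each mail was received
import re
hand = open('mbox1.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall('^From.* ([0-3][0-9]):', line)
    if len(x) > 0:
        print(x)
#Output
['09']
['16']
['16']
✓ The parentheses capture the two digits that follow a space and precede a colon, which is the hour field of the time stamp (09 in 09:14:16).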
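More than one pair of parentheses can be used in a single pattern; findall() then returns a tuple per match. A short sketch on the first line of mbox1.txt, where the second pattern (using \d for a digit) is an illustrative addition:

```python
import re

line = 'From:stephen Sat Jan 5 09:14:16 2008'

# One group: only the hour is returned
hour = re.findall('^From.* ([0-3][0-9]):', line)

# Three groups: each match becomes a tuple (hour, minute, second)
times = re.findall(r' (\d\d):(\d\d):(\d\d) ', line)

print(hour)
print(times)
```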
RANDOM EXECUTION
>>> import re
>>> s = " 0.9 .90 1.0 1. 138 pqr"
>>> re.findall('[0-9.]+', s)
['0.9', '.90', '1.0', '1.', '138']
>>> re.findall('[0-9]+[.][0-9]', s)
['0.9', '1.0']
>>> re.findall('[0-9]+[.][0-9]+', s)
['0.9', '1.0']
>>> re.findall('[0-9]*[.][0-9]+', s)
['0.9', '.90', '1.0']
>>> usn = "1bycs123, 1byec249, 1bycs009, 1byme209, 1byis112, 1byee190"
>>> re.findall('1bycs...', usn)
['1bycs123', '1bycs009']
>>> re.findall('[a-zA-Z0-9]+cs[0-9]+', usn)
['1bycs123', '1bycs009']
>>> usn = "1bycs123, 1byec249, 1bycs009, 1byme209, 1vecs112, 1svcs190"
>>> re.findall('[a-zA-Z0-9]+cs[0-9]+', usn)
['1bycs123', '1bycs009', '1vecs112', '1svcs190']
>>> re.findall('[0-9]+cs[0-9]+', usn)
[]
>>> re.findall('[a-zA-Z0-9]+cs([0-9]+)', usn)
['123', '009', '112', '190']
ESCAPE CHARACTER
➢ The escape character (backslash “\”) is a metacharacter in regular expressions. It allows special characters to be used without invoking their special meaning.
If you want to match 1+1=2, the correct regex is 1\+1=2. Otherwise, the plus sign has a special meaning.
For example, we can find money amounts with the following regular expression. The dollar sign is escaped so that it matches a literal “$” instead of the end of the line.
>>> import re
>>> x = 'We just received $10.00 for cookies.'
>>> y = re.findall(r'\$[0-9.]+', x)
>>> y
['$10.00']
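The 1+1=2 case can be run directly to see what the unescaped plus sign does. A minimal sketch; the sentence in text is made up, and re.escape() is the standard-library helper that adds the backslashes automatically.

```python
import re

text = 'Everyone knows that 1+1=2, but 11=2 is wrong.'

# Unescaped: '1+' means "one or more 1s", so the pattern finds 11=2
unescaped = re.findall('1+1=2', text)

# Escaped: '\+' is a literal plus sign
escaped = re.findall(r'1\+1=2', text)

# re.escape() builds the escaped pattern from the literal text
auto = re.findall(re.escape('1+1=2'), text)

print(unescaped, escaped, auto)
```

Writing patterns as raw strings (the r prefix) keeps Python itself from interpreting the backslash before the re module sees it.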
SUMMARY
Character    Meaning
^            Matches the beginning of the line
$            Matches the end of the line
.            Matches any character (a wildcard)
\s           Matches a whitespace character
\S           Matches a non-whitespace character (opposite of \s)
*            Applies to the immediately preceding character and indicates to match zero or more of the preceding character(s)
*?           Applies to the immediately preceding character and indicates to match zero or more of the preceding character(s) in “non-greedy mode”
+            Applies to the immediately preceding character and indicates to match one or more of the preceding character(s)
+?           Applies to the immediately preceding character and indicates to match one or more of the preceding character(s) in “non-greedy mode”
[aeiou]      Matches a single character as long as that character is in the specified set. In this example, it would match “a”, “e”, “i”, “o”, or “u”, but no other characters.
[a-z0-9]     You can specify ranges of characters using the minus sign. This example is a single character that must be a lowercase letter or a digit.
[^A-Za-z]    When the first character in the set notation is a caret, it inverts the logic. This example matches a single character that is anything other than an uppercase or lowercase letter.
( )          When parentheses are added to a regular expression, they are ignored for the purpose of matching, but allow you to extract a particular subset of the matched string rather than the whole string when using findall()
\b           Matches the empty string, but only at the start or end of a word
\B           Matches the empty string, but not at the start or end of a word
\d           Matches any decimal digit; equivalent to the set [0-9]
\D           Matches any non-digit character; equivalent to the set [^0-9]
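A few entries of the summary that the earlier slides do not demonstrate (non-greedy mode, \d, \b, and \s) can be tried as follows. This is an illustrative sketch; all the sample strings are invented.

```python
import re

s = 'From: csev@umich.edu on: Sat'

# Greedy: .+ grabs as much as possible, up to the LAST colon
greedy = re.findall(r'^F.+:', s)

# Non-greedy: .+? stops at the FIRST colon
lazy = re.findall(r'^F.+?:', s)

# \d is shorthand for [0-9]
digits = re.findall(r'\d+', 'Room 42, floor 7')

# \b restricts the match to whole words
words = re.findall(r'\bcat\b', 'cat concat cats cat.')

# \s matches spaces, tabs, and other whitespace
spaces = re.findall(r'\s', 'a b\tc')

print(greedy, lazy, digits, words, spaces)
```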
ASSIGNMENT
1) Write a Python program to check the validity of a password. The password is a combination of alphanumeric characters and special characters, and the program checks whether it is valid against the conditions below.
Primary conditions for password validation:
1. Minimum 8 characters.
2. At least one lowercase letter in [a-z].
3. At least one uppercase letter in [A-Z].
4. At least one digit in [0-9].
5. At least one character from [ _ or @ or $ ].
2) Write a pattern or program for each of the following:
➢ Extract lines starting with the word From (or from) and ending with edu.
➢ Extract lines ending with any digit.
➢ Extract lines that start with uppercase letters and end with digits.
➢ Search for the first white-space character in a string and display its position.
➢ Replace every white-space character with the number 9, using the sample text txt = "The rain in Spain".
THANK
YOU
