2016 bioinformatics i_python_part_3_io_and_strings_wim_vancriekinge
FBW
25-10-2016
Wim Van Criekinge
Bioinformatics.be
Recap
if condition:
    statements
[elif condition:
    statements] ...
else:
    statements
while condition:
    statements
for var in sequence:
    statements
break
continue
Strings
REGULAR EXPRESSIONS
Regular Expressions
http://en.wikipedia.org/wiki/Regular_expression
In computing, a regular expression, also
referred to as "regex" or "regexp", provides a
concise and flexible means for matching
strings of text, such as particular characters,
words, or patterns of characters. A regular
expression is written in a formal language that
can be interpreted by a regular expression
processor.
Really clever "wild card" expressions for
matching and parsing strings.
Understanding Regular Expressions
• Very powerful and quite cryptic
• Fun once you understand them
• Regular expressions are a language
unto themselves
• A language of "marker characters" -
programming with characters
• It is kind of an "old school"
language - compact
Regular Expression Quick Guide
^ Matches the beginning of a line
$ Matches the end of the line
. Matches any character
\s Matches whitespace
\S Matches any non-whitespace character
* Repeats a character zero or more times
*? Repeats a character zero or more times (non-greedy)
+ Repeats a character one or more times
+? Repeats a character one or more times (non-greedy)
[aeiou] Matches a single character in the listed set
[^XYZ] Matches a single character not in the listed set
[a-z0-9] The set of characters can include a range
( Indicates where string extraction is to start
) Indicates where string extraction is to end
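Several entries from the quick guide can be tried in a few lines. This is a minimal sketch; the sample line and the variable names are made up for illustration.

```python
import re

line = "From: alice@example.org  sent 3 files"  # made-up sample line

# ^ anchors the pattern at the start of the line, $ at the end
starts_with_from = re.search(r"^From:", line) is not None
ends_with_files = re.search(r"files$", line) is not None

# \S+ is a run of one or more non-whitespace characters
words = re.findall(r"\S+", line)

# [aeiou] matches one character from the listed set
vowels = re.findall(r"[aeiou]", "regex")

# Parentheses select the part of the match that findall() returns
digits = re.findall(r"sent ([0-9]+) files", line)

print(starts_with_from, ends_with_files)
print(words)
print(vowels, digits)
```

The patterns are written as raw strings (r"...") so that Python passes the backslash in \S through to the re module unchanged.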
The Regular Expression Module
• Before you can use regular expressions in
your program, you must import the library
using "import re"
• You can use re.search() to see if a string
matches a regular expression, similar to
using the find() method for strings
• You can use re.findall() to extract portions of
a string that match your regular expression,
similar to a combination of find() and
slicing: var[5:10]
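The parallel between the string methods and the re functions can be shown side by side. A small sketch, assuming a made-up header line:

```python
import re

line = "X-Sieve: CMU Sieve 2.3"  # made-up header line

# str.find() returns an index (-1 if absent);
# re.search() returns a match object, or None if nothing matches
found_with_find = line.find("Sieve") >= 0
found_with_search = re.search(r"Sieve", line) is not None

# Extracting with find() and slicing ...
start = line.find(":") + 2
by_slicing = line[start:start + 3]

# ... versus extracting with findall() and a pair of parentheses
by_findall = re.findall(r"^X-Sieve: (\S+)", line)

print(found_with_find, found_with_search)
print(by_slicing, by_findall)
```

The slicing version needs the positions to be known in advance; the findall() version describes what the wanted piece looks like instead.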
Wild-Card Characters
• The dot character matches any
character
• If you add the asterisk character,
the preceding character is matched
"any number of times"
^X.*:
Match the start of the line
Match any character
Many times
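Because .* accepts anything, ^X.*: also matches lines that merely start with X and contain a colon somewhere. The sketch below compares it with a tighter pattern, ^X-\S+:, which is an illustrative refinement and not part of the slide; the sample lines are made up.

```python
import re

lines = [
    "X-Sieve: CMU Sieve 2.3",
    "X-DSPAM-Result: Innocent",
    "X-Plane is behind schedule: two weeks",
    "Received: from mail",
]

# ^X.*:   -> starts with X, then any characters, then a colon
loose = [l for l in lines if re.search(r"^X.*:", l)]

# ^X-\S+: -> starts with X-, then non-whitespace only, then a colon
strict = [l for l in lines if re.search(r"^X-\S+:", l)]

print(len(loose), len(strict))
```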
Matching and Extracting Data
• re.search() returns a match object if the string
matches the regular expression and None if it does
not, so the result can be used as a True/False test
• If we actually want the matching strings
to be extracted, we use re.findall()
>>> import re
>>> x = 'My 2 favorite numbers are 19 and 42'
>>> y = re.findall('[0-9]+',x)
>>> print(y)
['2', '19', '42']
Warning: Greedy Matching
• The repeat characters (* and +) push outward in both directions
(greedy) to match the largest possible string
>>> import re
>>> x = 'From: Using the : character'
>>> y = re.findall('^F.+:', x)
>>> print(y)
['From: Using the :']
^F.+:
One or more
characters
First character in the
match is an F
Last character in the
match is a :
Non-Greedy Matching
• Not all regular expression repeat codes are
greedy! If you add a ? character - the + and *
chill out a bit...
>>> import re
>>> x = 'From: Using the : character'
>>> y = re.findall('^F.+?:', x)
>>> print(y)
['From:']
^F.+?:
One or more
characters but
not greedily
First character in the
match is an F
Last character in the
match is a :
Fine Tuning String Extraction
• Parentheses are not part of the match,
but they mark where the string to extract
starts and stops
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
>>> import re
>>> x = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
>>> y = re.findall(r'\S+@\S+',x)
>>> print(y)
['stephen.marquard@uct.ac.za']
>>> y = re.findall(r'^From (\S+@\S+)',x)
>>> print(y)
['stephen.marquard@uct.ac.za']
^From (\S+@\S+)
The Double Split Version
• Sometimes we split a line one way and then grab
one of the pieces of the line and split that piece
again
line = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
words = line.split()
email = words[1]           # 'stephen.marquard@uct.ac.za'
pieces = email.split('@')  # ['stephen.marquard', 'uct.ac.za']
print(pieces[1])           # uct.ac.za
The Regex Version
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
import re
lin = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
y = re.findall('@([^ ]*)',lin)
print(y)
['uct.ac.za']
'@([^ ]*)'
Look through the string until you find an at-sign
Match non-blank character
Match many of them
Escape Character
• If you want a special regular expression
character to just behave normally (most
of the time) you prefix it with '\'
>>> import re
>>> x = 'We just received $10.00 for cookies.'
>>> y = re.findall(r'\$[0-9.]+',x)
>>> print(y)
['$10.00']
\$[0-9.]+
A real dollar sign
A digit or period
At least one
or more
Real world problems
• Match IP Addresses, email addresses,
URLs
• Match balanced sets of parentheses
• Substitute words
• Tokenize
• Validate
• Count
• Delete duplicates
• Natural Language processing
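A few of the tasks listed above (substitute, tokenize, count, delete duplicates, validate) fit in one line each. This is a hedged sketch on a made-up sentence; the backreference \1 used for the duplicate removal is not covered in the quick guide and is introduced here only for illustration.

```python
import re

text = "the cat saw the the dog and the cat"  # made-up sentence

# Substitute words: \b marks a word boundary
substituted = re.sub(r"\bcat\b", "bird", text)

# Tokenize: every run of word characters becomes one token
tokens = re.findall(r"\w+", text)

# Count occurrences of a word
n_the = len(re.findall(r"\bthe\b", text))

# Delete immediately repeated words; \1 refers back to the first group
deduplicated = re.sub(r"\b(\w+)( \1\b)+", r"\1", text)

# Validate: fullmatch() requires the whole string to fit the pattern
is_dna = re.fullmatch(r"[ACGT]+", "ACGTTGCA") is not None

print(substituted)
print(len(tokens), n_the)
print(deduplicated, is_dna)
```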
RE in Python
• Unleash the power - built-in re module
• Functions
– to compile patterns
• compile
– to perform matches
• match, search, findall, finditer
– to perform operations on match object
• group, start, end, span
– to substitute
• sub, subn
• Metacharacters
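The functions listed above can be seen together in one short session. A minimal sketch, assuming a made-up DNA string and a made-up pattern (ATG followed by one codon):

```python
import re

# compile() builds a reusable pattern object
codon = re.compile(r"ATG([ACGT]{3})")

dna = "CCATGGCTTAGATGAAA"  # made-up sequence

# match() only tries at position 0; search() scans the whole string
at_start = codon.match(dna)   # None here: dna does not start with ATG
first = codon.search(dna)

# Operations on the match object
whole = first.group(0)        # the complete match
inner = first.group(1)        # the part inside the parentheses
span = first.span()           # (start, end) as a tuple

# finditer() yields one match object per non-overlapping match
starts = [m.start() for m in codon.finditer(dna)]

# sub() returns the new string; subn() also returns the replacement count
masked, count = re.subn(r"ATG", "atg", dna)

print(whole, inner, span, starts)
print(masked, count)
```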
Regex.py
import re

text = 'abbaaabbbbaaaaa'
pattern = 'ab'
# finditer() yields one match object per non-overlapping occurrence
for match in re.finditer(pattern, text):
    s = match.start()
    e = match.end()
    print('Found "%s" at %d:%d' % (text[s:e], s, e))
Exercise 1
1. Which of the following 4 sequences
(seq1/2/3/4)
a) contain a “Galactokinase signature”?
b) How many of them?
http://us.expasy.org/prosite/
>SEQ1
MGNLFENCTHRYSFEYIYENCTNTTNQCGLIRNVASSIDVFHWLDVYISTTIFVISGILNFYCLFIALYT
YYFLDNETRKHYVFVLSRFLSSILVIISLLVLESTLFSESLSPTFAYYAVAFSIYDFSMDTLFFSYIMIS
LITYFGVVHYNFYRRHVSLRSLYIILISMWTFSLAIAIPLGLYEAASNSQGPIKCDLSYCGKVVEWITCS
LQGCDSFYNANELLVQSIISSVETLVGSLVFLTDPLINIFFDKNISKMVKLQLTLGKWFIALYRFLFQMT
NIFENCSTHYSFEKNLQKCVNASNPCQLLQKMNTAHSLMIWMGFYIPSAMCFLAVLVDTYCLLVTISILK
SLKKQSRKQYIFGRANIIGEHNDYVVVRLSAAILIALCIIIIQSTYFIDIPFRDTFAFFAVLFIIYDFSILSLLGSFTGVA
M MTYFGVMRPLVYRDKFTLKTIYIIAFAIVLFSVCVAIPFGLFQAADEIDGPIKCDSESCELIVKWLLFCI
ACLILMGCTGTLLFVTVSLHWHSYKSKKMGNVSSSAFNHGKSRLTWTTTILVILCCVELIPTGLLAAFGK
SESISDDCYDFYNANSLIFPAIVSSLETFLGSITFLLDPIINFSFDKRISKVFSSQVSMFSIFFCGKR
>SEQ2
MLDDRARMEA AKKEKVEQIL AEFQLQEEDL KKVMRRMQKE MDRGLRLETH EEASVKMLPT YVRSTPEGSE
VGDFLSLDLG GTNFRVMLVK VGEGEEGQWS VKTKHQMYSI PEDAMTGTAE MLFDYISECI SDFLDKHQMK
HKKLPLGFTF SFPVRHEDID KGILLNWTKG FKASGAEGNN VVGLLRDAIK RRGDFEMDVV AMVNDTVATM
ISCYYEDHQC EVGMIVGTGC NACYMEEMQN VELVEGDEGR MCVNTEWGAF GDSGELDEFL LEYDRLVDES
SANPGQQLYE KLIGGKYMGE LVRLVLLRLV DENLLFHGEA SEQLRTRGAF ETRFVSQVES DTGDRKQIYN
ILSTLGLRPS TTDCDIVRRA CESVSTRAAH MCSAGLAGVI NRMRESRSED VMRITVGVDG SVYKLHPSFK
ERFHASVRRL TPSCEITFIE SEEGSGRGAA LVSAVACKKA CMLGQ
>SEQ3
MESDSFEDFLKGEDFSNYSYSSDLPPFLLDAAPCEPESLEINKYFVVIIYVLVFLLSLLGNSLVMLVILY
SRVGRSGRDNVIGDHVDYVTDVYLLNLALADLLFALTLPIWAASKVTGWIFGTFLCKVVSLLKEVNFYSGILLLA
CISVDRY
LAIVHATRTLTQKRYLVKFICLSIWGLSLLLALPVLIFRKTIYPPYVSPVCYEDMGNNTANWRMLLRILP
QSFGFIVPLLIMLFCYGFTLRTLFKAHMGQKHRAMRVIFAVVLIFLLCWLPYNLVLLADTLMRTWVIQET
CERRNDIDRALEATEILGILGRVNLIGEHWDYHSCLNPLIYAFIGQKFRHGLLKILAIHGLISKDSLPKDSRPSFVGS
SSGH TSTTL
>SEQ4
MEANFQQAVK KLVNDFEYPT ESLREAVKEF DELRQKGLQK NGEVLAMAPA FISTLPTGAE TGDFLALDFG
GTNLRVCWIQ LLGDGKYEMK HSKSVLPREC VRNESVKPII DFMSDHVELF IKEHFPSKFG CPEEEYLPMG
FTFSYPANQV SITESYLLRW TKGLNIPEAI NKDFAQFLTE GFKARNLPIR IEAVINDTVG TLVTRAYTSK
ESDTFMGIIF GTGTNGAYVE QMNQIPKLAG KCTGDHMLIN MEWGATDFSC LHSTRYDLLL DHDTPNAGRQ
IFEKRVGGMY LGELFRRALF HLIKVYNFNE GIFPPSITDA WSLETSVLSR MMVERSAENV RNVLSTFKFR
FRSDEEALYL WDAAHAIGRR AARMSAVPIA SLYLSTGRAG KKSDVGVDGS LVEHYPHFVD MLREALRELI
GDNEKLISIG IAKDGSGIGA ALCALQAVKE KKGLA MEANFQQAVK KLVNDFEYPT ESLREAVKEF
DELRQKGLQK NGEVLAMAPA FISTLPTGAE TGDFLALDFG GTNLRVCWIQ LLGDGKYEMK HSKSVLPREC
VRNESVKPII DFMSDHVELF IKEHFPSKFG CPEEEYLPMG FTFSYPANQV SITESYLLRW TKGLNIPEAI
NKDFAQFLTE GFKARNLPIR IEAVINDTVG TLVTRAYTSK ESDTFMGIIF GTGTNGAYVE QMNQIPKLAG
KCTGDHMLIN MEWGATDFSC LHSTRYDLLL DHDTPNAGRQ IFEKRVGGMY LGELFRRALF HLIKVYNFNE
GIFPPSITDA WSLETSVLSR MMVERSAENV RNVLSTFKFR FRSDEEALYL WDAAHAIGRR AARMSAVPIA
SLYLSTGRAG KKSDVGVDGS LVEHYPHFVD MLREALRELI GDNEKLISIG IAKDGSGIGA ALCALQAVKE
KKGLA
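One possible way to approach Exercise 1 is to read the FASTA records into a dictionary, strip the spaces between residue blocks, and count regex matches per sequence. The sketch below uses made-up toy sequences and a made-up toy pattern; read_fasta, count_hits and toy_pattern are hypothetical names, and the real signature still has to be looked up in PROSITE.

```python
import re

def read_fasta(text):
    """Return {name: sequence} from FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            # Drop the spaces some layouts put between blocks of residues
            records[name] += re.sub(r"\s+", "", line)
    return records

def count_hits(records, pattern):
    """Count non-overlapping matches of a regex in every sequence."""
    regex = re.compile(pattern)
    return {name: len(regex.findall(seq)) for name, seq in records.items()}

# Toy input, NOT the exercise data
toy = """>A
MKTAYIAKQR GDHADYLLK
GEHWDYAAA
>B
MMMMMMMM
"""
toy_pattern = r"G[DE]H.DY"  # illustrative pattern only

records = read_fasta(toy)
hits = count_hits(records, toy_pattern)
print(hits)
```

Joining the lines before searching matters: a signature that straddles a line break or a space would otherwise be missed.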
Exercise 1
http://www.pythonchallenge.com
Lists
• Flexible arrays, not Lisp-like linked
lists
• a = [99, "bottles of beer", ["on", "the", "wall"]]
• Same operators as for strings
• a+b, a*3, a[0], a[-1], a[1:], len(a)
• Item and slice assignment
• a[0] = 98
• a[1:2] = ["bottles", "of", "beer"]
-> [98, "bottles", "of", "beer", ["on", "the", "wall"]]
• del a[-1] # -> [98, "bottles", "of", "beer"]
Dictionaries
• Hash tables, "associative arrays"
• d = {"duck": "eend", "water": "water"}
• Lookup:
• d["duck"] -> "eend"
• d["back"] # raises KeyError exception
• Delete, insert, overwrite:
• del d["water"] # {"duck": "eend"}
• d["back"] = "rug" # {"duck": "eend", "back": "rug"}
• d["duck"] = "duik" # {"duck": "duik", "back": "rug"}
Find the answer in ultimate-sequence.txt ?
>ultimate-sequence
ACTCGTTATGATATTTTTTTTGAACGTGAAAATACT
TTTCGTGCTATGGAAGGACTCGTTATCGTGAAGT
TGAACGTTCTGAATGTATGCCTCTTGAAATGGA
AAATACTCATTGTTTATCTGAAATTTGAATGGGA
ATTTTATCTACAATGTTTTATTCTTACAGAACAT
TAAATTGTGTTATGTTTCATTTCACATTTTAGTA
GTTTTTTCAGTGAAAGCTTGAAAACCACCAAGA
AGAAAAGCTGGTATGCGTAGCTATGTATATATA
AAATTAGATTTTCCACAAAAAATGATCTGATAA
ACCTTCTCTGTTGGCTCCAAGTATAAGTACGAAA
AGAAATACGTTCCCAAGAATTAGCTTCATGAGT
AAGAAGAAAAGCTGGTATGCGTAGCTATGTATA
TATAAAATTAGATTTTCCACAAAAAATGATCTG
ATAA
Question 2
AA1 = {
'UUU':'F','UUC':'F','UUA':'L','UUG':'L','UCU':'S','UCC':'S','UCA':'S','UCG':'S',
'UAU':'Y','UAC':'Y','UAA':'*','UAG':'*','UGU':'C','UGC':'C','UGA':'*','UGG':'W',
'CUU':'L','CUC':'L','CUA':'L','CUG':'L','CCU':'P','CCC':'P','CCA':'P','CCG':'P',
'CAU':'H','CAC':'H','CAA':'Q','CAG':'Q','CGU':'R','CGC':'R','CGA':'R','CGG':'R',
'AUU':'I','AUC':'I','AUA':'I','AUG':'M','ACU':'T','ACC':'T','ACA':'T','ACG':'T',
'AAU':'N','AAC':'N','AAA':'K','AAG':'K','AGU':'S','AGC':'S','AGA':'R','AGG':'R',
'GUU':'V','GUC':'V','GUA':'V','GUG':'V','GCU':'A','GCC':'A','GCA':'A','GCG':'A',
'GAU':'D','GAC':'D','GAA':'E','GAG':'E','GGU':'G','GGC':'G','GGA':'G','GGG':'G',
}
Hint: Use Dictionaries
Hint 2: Translations
Python way:
tab = str.maketrans("ACGU","UGCA")
sequence = sequence.translate(tab)[::-1]
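The two hints can be combined into a small translation routine. The sketch below is one possible reading of the question (translate all six reading frames); the function names are hypothetical, and the codon table is rebuilt compactly in UCAG order so the block is self-contained. It should hold the same 64 entries as the AA1 dictionary.

```python
from itertools import product

# Standard genetic code, codons enumerated in UCAG order (UUU, UUC, UUA, ...)
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def reverse_complement(rna):
    """Complement every base, then reverse the string."""
    tab = str.maketrans("ACGU", "UGCA")
    return rna.translate(tab)[::-1]

def translate(rna, frame=0):
    """Translate complete codons starting at offset `frame` (0, 1 or 2)."""
    protein = []
    for i in range(frame, len(rna) - 2, 3):
        protein.append(CODON_TABLE[rna[i:i + 3]])
    return "".join(protein)

def six_frames(dna):
    """Translate a DNA string in three frames on each strand."""
    rna = dna.upper().replace("T", "U")
    strands = (rna, reverse_complement(rna))
    return [translate(s, f) for s in strands for f in range(3)]

for protein in six_frames("ATGGCCTAA"):  # made-up test sequence
    print(protein)
```

A dictionary lookup per codon keeps the translation loop to one line; stop codons appear as * in the output.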
Reading Files
name = open("filename")
– opens the given file for reading, and returns a file object
name.read() - file's entire contents as a string
name.readline() - next line from file as a string
name.readlines() - file's contents as a list of lines
– the lines from a file object can also be read using a for loop
>>> f = open("hours.txt")
>>> f.read()
'123 Susan 12.5 8.1 7.6 3.2\n
456 Brad 4.0 11.6 6.5 2.7 12\n
789 Jenn 8.0 8.0 8.0 8.0 7.5\n'
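The three reading methods behave differently on the same file. The sketch below first writes a small stand-in for hours.txt to a temporary directory so that it runs anywhere; it also uses the with statement, which is not on the slide, as a convenient way to make sure the file is closed.

```python
import os
import tempfile

# Write a small stand-in for hours.txt (assumption: two lines are enough)
path = os.path.join(tempfile.mkdtemp(), "hours.txt")
with open(path, "w") as f:
    f.write("123 Susan 12.5 8.1 7.6 3.2\n456 Brad 4.0 11.6 6.5 2.7 12\n")

with open(path) as f:        # the file is closed when the block ends
    first = f.readline()     # next line, including its trailing \n
    rest = f.readlines()     # all remaining lines as a list

with open(path) as f:
    whole = f.read()         # the entire contents as one string

print(repr(first))
print(rest)
print(repr(whole))
```

Note that readlines() only returns what has not been read yet, which is why the second call to open() is needed before read().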
File Input Template
• A template for reading files in Python:
name = open("filename")
for line in name:
    statements
>>> input = open("hours.txt")
>>> for line in input:
...     print(line.strip()) # strip() removes \n
123 Susan 12.5 8.1 7.6 3.2
456 Brad 4.0 11.6 6.5 2.7 12
789 Jenn 8.0 8.0 8.0 8.0 7.5
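The template becomes useful once each line is split into fields. A minimal sketch that totals the hours per person; io.StringIO stands in for open("hours.txt") so the example is self-contained, and the interpretation of the columns (id, name, hours per day) is an assumption.

```python
import io

# Stand-in for open("hours.txt"); iterating over it yields lines the same way
data = io.StringIO(
    "123 Susan 12.5 8.1 7.6 3.2\n"
    "456 Brad 4.0 11.6 6.5 2.7 12\n"
    "789 Jenn 8.0 8.0 8.0 8.0 7.5\n"
)

totals = {}
for line in data:
    fields = line.split()                    # split on whitespace, drops \n
    name = fields[1]
    hours = [float(h) for h in fields[2:]]   # remaining columns are numbers
    totals[name] = sum(hours)

for name, total in totals.items():
    print(name, round(total, 1))
```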
Writing Files
name = open("filename", "w")
name = open("filename", "a")
– opens file for write (deletes previous contents), or
– opens file for append (new data goes after previous data)
name.write(str) - writes the given string to the file
name.close() - saves file once writing is done
>>> out = open("output.txt", "w")
>>> out.write("Hello, world!\n")
>>> out.write("How are you?")
>>> out.close()
>>> open("output.txt").read()
'Hello, world!\nHow are you?'
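The difference between the "w" and "a" modes shows up when the same file is opened twice. A small sketch, writing to a temporary directory (an assumption, to avoid touching real files) and using the with statement so that close() is called automatically:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "output.txt")

with open(path, "w") as out:   # "w" deletes any previous contents
    out.write("Hello, world!\n")

with open(path, "a") as out:   # "a" keeps the contents and adds to the end
    out.write("How are you?\n")

with open(path) as f:
    contents = f.read()

print(repr(contents))
```

Opening the file with "w" a second time instead of "a" would leave only the second line.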
Question 3. Swiss-Knife.py
• Using a database as input! Parse
the entire Swiss-Prot collection
– How many entries are there?
– Average protein length (in aa and
MW)
– Relative frequency of amino acids
• Compare to the ones used to construct
the PAM scoring matrices from 1978 and
1991
Question 3: Getting the database
Uniprot_sprot.dat.gz – 528 MB
(on GitHub under Files)
Unzipped: 2.92 GB!
http://www.ebi.ac.uk/uniprot/download-center
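Because the unzipped file is large, one option is to stream the records instead of loading everything at once. The sketch below assumes the usual Swiss-Prot flat-file layout (an ID line, an SQ line followed by indented sequence lines, and // closing each record); read_swissprot is a hypothetical helper and the two sample records are made up.

```python
import gzip
import io

def read_swissprot(handle):
    """Yield (entry_name, sequence) for each record in a Swiss-Prot flat file."""
    name, seq_parts, in_sequence = None, [], False
    for line in handle:
        if line.startswith("ID   "):
            name = line.split()[1]
        elif line.startswith("SQ   "):
            in_sequence = True            # sequence lines follow
        elif line.startswith("//"):
            yield name, "".join(seq_parts)
            name, seq_parts, in_sequence = None, [], False
        elif in_sequence:
            seq_parts.append(line.replace(" ", "").strip())

# Made-up records in the assumed layout
sample = io.StringIO(
    "ID   TEST1_HUMAN   Reviewed;   12 AA.\n"
    "DE   RecName: Full=Made-up protein;\n"
    "SQ   SEQUENCE   12 AA;  0 MW;  0 CRC64;\n"
    "     MKTAYIAKQR GD\n"
    "//\n"
    "ID   TEST2_HUMAN   Reviewed;   3 AA.\n"
    "SQ   SEQUENCE   3 AA;  0 MW;  0 CRC64;\n"
    "     MAA\n"
    "//\n"
)
records = list(read_swissprot(sample))
lengths = [len(seq) for _, seq in records]
average_length = sum(lengths) / len(lengths)
print(len(records), average_length)

# For the real database the same generator can read the compressed file:
# with gzip.open("uniprot_sprot.dat.gz", "rt") as handle:
#     n_entries = sum(1 for _ in read_swissprot(handle))
```

Reading through gzip.open() in text mode avoids having to unzip the file to disk first.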
Amino acid frequencies
AA 1978 1991
L 0.085 0.091
A 0.087 0.077
G 0.089 0.074
S 0.070 0.069
V 0.065 0.066
E 0.050 0.062
T 0.058 0.059
K 0.081 0.059
I 0.037 0.053
D 0.047 0.052
R 0.041 0.051
P 0.051 0.051
N 0.040 0.043
Q 0.038 0.041
F 0.040 0.040
Y 0.030 0.032
M 0.015 0.024
H 0.034 0.023
C 0.033 0.020
W 0.010 0.014
Second step: Frequencies of Occurrence
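Relative frequencies can be accumulated with collections.Counter, one sequence at a time. A minimal sketch on two made-up toy sequences; relative_frequencies is a hypothetical helper, and the three reference values are copied from the 1991 column of the table.

```python
from collections import Counter

def relative_frequencies(sequences):
    """Return {amino_acid: fraction of all residues} over an iterable of sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)        # counts every character in the string
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

toy = ["MKLA", "LLGA"]            # toy data, not Swiss-Prot
freqs = relative_frequencies(toy)

# A few reference values from the 1991 column
reference_1991 = {"L": 0.091, "A": 0.077, "G": 0.074}
for aa, ref in reference_1991.items():
    print(aa, round(freqs.get(aa, 0.0), 3), ref)
```

Since the function accepts any iterable, it can consume sequences one by one while the database is being parsed, without keeping them all in memory.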
Extra Questions
• How many records have a sequence of length 260?
• What are the first 20 residues of 143X_MAIZE?
• What is the identifier for the record with the
shortest sequence? Is there more than one record
with that length?
• What is the identifier for the record with the
longest sequence? Is there more than one record
with that length?
• How many contain the subsequence "ARRA"?
• How many contain the substring "KCIP-1" in the
description?
Question 4
• Program your own prosite parser !
• Download prosite pattern database
(prosite.dat)
• Automatically generate >2000 search
patterns, and search in sequence set
from question 1
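The core of such a parser is the translation of a PROSITE pattern into a regular expression. The sketch below covers only the common syntax elements (x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repetition, < and > as anchors, a trailing period); prosite_to_regex is a hypothetical name, the example patterns are made up, and rarer constructs are not handled.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'C-x(2,4)-[DE]-{P}-H.' into a Python regex."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        # '<' pins the match to the N-terminus, '>' to the C-terminus
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Optional repetition count: (n) or (n,m)
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:
            element, repeat = m.group(1), "{" + m.group(2) + "}"
        if element == "x":
            core = "."                          # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"   # excluded residues
        else:
            core = element                      # single residue or [ABC] set
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

regex = prosite_to_regex("C-x(2,4)-[DE]-{P}-H.")   # made-up pattern
print(regex)
print(re.findall(prosite_to_regex("G-[DE]-H-x-D-Y."), "AAGDHADYGG"))
```

With this function in place, the remaining work for the question is reading the pattern lines out of prosite.dat and looping over the sequences.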

Unit 6_Introduction_Phishing_Password Cracking.pdfUnit 6_Introduction_Phishing_Password Cracking.pdf
Unit 6_Introduction_Phishing_Password Cracking.pdf
KanchanPatil34
 
SPRING FESTIVITIES - UK AND USA -
SPRING FESTIVITIES - UK AND USA            -SPRING FESTIVITIES - UK AND USA            -
SPRING FESTIVITIES - UK AND USA -
Colégio Santa Teresinha
 
Anti-Depressants pharmacology 1slide.pptx
Anti-Depressants pharmacology 1slide.pptxAnti-Depressants pharmacology 1slide.pptx
Anti-Depressants pharmacology 1slide.pptx
Mayuri Chavan
 
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptxSCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
Ronisha Das
 
Presentation on Tourism Product Development By Md Shaifullar Rabbi
Presentation on Tourism Product Development By Md Shaifullar RabbiPresentation on Tourism Product Development By Md Shaifullar Rabbi
Presentation on Tourism Product Development By Md Shaifullar Rabbi
Md Shaifullar Rabbi
 
Multi-currency in odoo accounting and Update exchange rates automatically in ...
Multi-currency in odoo accounting and Update exchange rates automatically in ...Multi-currency in odoo accounting and Update exchange rates automatically in ...
Multi-currency in odoo accounting and Update exchange rates automatically in ...
Celine George
 
How to Subscribe Newsletter From Odoo 18 Website
How to Subscribe Newsletter From Odoo 18 WebsiteHow to Subscribe Newsletter From Odoo 18 Website
How to Subscribe Newsletter From Odoo 18 Website
Celine George
 
To study Digestive system of insect.pptx
To study Digestive system of insect.pptxTo study Digestive system of insect.pptx
To study Digestive system of insect.pptx
Arshad Shaikh
 
New Microsoft PowerPoint Presentation.pptx
New Microsoft PowerPoint Presentation.pptxNew Microsoft PowerPoint Presentation.pptx
New Microsoft PowerPoint Presentation.pptx
milanasargsyan5
 
How to Customize Your Financial Reports & Tax Reports With Odoo 17 Accounting
How to Customize Your Financial Reports & Tax Reports With Odoo 17 AccountingHow to Customize Your Financial Reports & Tax Reports With Odoo 17 Accounting
How to Customize Your Financial Reports & Tax Reports With Odoo 17 Accounting
Celine George
 
Understanding P–N Junction Semiconductors: A Beginner’s Guide
Understanding P–N Junction Semiconductors: A Beginner’s GuideUnderstanding P–N Junction Semiconductors: A Beginner’s Guide
Understanding P–N Junction Semiconductors: A Beginner’s Guide
GS Virdi
 
Metamorphosis: Life's Transformative Journey
Metamorphosis: Life's Transformative JourneyMetamorphosis: Life's Transformative Journey
Metamorphosis: Life's Transformative Journey
Arshad Shaikh
 
Quality Contril Analysis of Containers.pdf
Quality Contril Analysis of Containers.pdfQuality Contril Analysis of Containers.pdf
Quality Contril Analysis of Containers.pdf
Dr. Bindiya Chauhan
 
Stein, Hunt, Green letter to Congress April 2025
Stein, Hunt, Green letter to Congress April 2025Stein, Hunt, Green letter to Congress April 2025
Stein, Hunt, Green letter to Congress April 2025
Mebane Rash
 
P-glycoprotein pamphlet: iteration 4 of 4 final
P-glycoprotein pamphlet: iteration 4 of 4 finalP-glycoprotein pamphlet: iteration 4 of 4 final
P-glycoprotein pamphlet: iteration 4 of 4 final
bs22n2s
 
How to Manage Opening & Closing Controls in Odoo 17 POS
How to Manage Opening & Closing Controls in Odoo 17 POSHow to Manage Opening & Closing Controls in Odoo 17 POS
How to Manage Opening & Closing Controls in Odoo 17 POS
Celine George
 
Michelle Rumley & Mairéad Mooney, Boole Library, University College Cork. Tra...
Michelle Rumley & Mairéad Mooney, Boole Library, University College Cork. Tra...Michelle Rumley & Mairéad Mooney, Boole Library, University College Cork. Tra...
Michelle Rumley & Mairéad Mooney, Boole Library, University College Cork. Tra...
Library Association of Ireland
 
Introduction to Vibe Coding and Vibe Engineering
Introduction to Vibe Coding and Vibe EngineeringIntroduction to Vibe Coding and Vibe Engineering
Introduction to Vibe Coding and Vibe Engineering
Damian T. Gordon
 
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - WorksheetCBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
Sritoma Majumder
 
Handling Multiple Choice Responses: Fortune Effiong.pptx
Handling Multiple Choice Responses: Fortune Effiong.pptxHandling Multiple Choice Responses: Fortune Effiong.pptx
Handling Multiple Choice Responses: Fortune Effiong.pptx
AuthorAIDNationalRes
 
Unit 6_Introduction_Phishing_Password Cracking.pdf
Unit 6_Introduction_Phishing_Password Cracking.pdfUnit 6_Introduction_Phishing_Password Cracking.pdf
Unit 6_Introduction_Phishing_Password Cracking.pdf
KanchanPatil34
 
Anti-Depressants pharmacology 1slide.pptx
Anti-Depressants pharmacology 1slide.pptxAnti-Depressants pharmacology 1slide.pptx
Anti-Depressants pharmacology 1slide.pptx
Mayuri Chavan
 

2016 bioinformatics i_python_part_3_io_and_strings_wim_vancriekinge

  • 4. Recap if condition: statements [elif condition: statements] ... else: statements while condition: statements for var in sequence: statements break continue Strings REGULAR EXPRESSIONS
  • 5. Regular Expressions http://en.wikipedia.org/wiki/Regular_expression In computing, a regular expression, also referred to as "regex" or "regexp", provides a concise and flexible means for matching strings of text, such as particular characters, words, or patterns of characters. A regular expression is written in a formal language that can be interpreted by a regular expression processor. Really clever "wild card" expressions for matching and parsing strings.
  • 6. Understanding Regular Expressions • Very powerful and quite cryptic • Fun once you understand them • Regular expressions are a language unto themselves • A language of "marker characters" - programming with characters • It is kind of an "old school" language - compact
  • 7. Regular Expression Quick Guide ^ Matches the beginning of a line $ Matches the end of the line . Matches any character \s Matches whitespace \S Matches any non-whitespace character * Repeats a character zero or more times *? Repeats a character zero or more times (non-greedy) + Repeats a character one or more times +? Repeats a character one or more times (non-greedy) [aeiou] Matches a single character in the listed set [^XYZ] Matches a single character not in the listed set [a-z0-9] The set of characters can include a range ( Indicates where string extraction is to start ) Indicates where string extraction is to end
  • 8. The Regular Expression Module • Before you can use regular expressions in your program, you must import the library using "import re" • You can use re.search() to see if a string matches a regular expression, similar to using the find() method for strings • You can use re.findall() to extract portions of a string that match your regular expression, similar to a combination of find() and slicing: var[5:10]
  • 9. Wild-Card Characters • The dot character matches any character • If you add the asterisk character, the preceding character is matched "any number of times" ^X.*: Match the start of the line (^), match any character (.), many times (*)
  • 10. Matching and Extracting Data • re.search() returns a match object if the string matches the regular expression and None otherwise, so its result can be used as a True/False test • If we actually want the matching strings to be extracted, we use re.findall() >>> import re >>> x = 'My 2 favorite numbers are 19 and 42' >>> y = re.findall('[0-9]+',x) >>> print(y) ['2', '19', '42']
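
The difference between the two calls on the slide above can be sketched as follows. This is a minimal illustration reusing the slide's string; the variable names (`first`, `has_number`, `all_numbers`, `no_match`) are introduced here and are not part of the original deck.

```python
import re

x = 'My 2 favorite numbers are 19 and 42'

# re.search() returns a match object for the first hit, or None if nothing matches.
first = re.search(r'[0-9]+', x)
has_number = first is not None   # the usual way to turn the result into True/False
first_number = first.group()     # text of the first match only

# re.findall() returns every non-overlapping match as a list of strings.
all_numbers = re.findall(r'[0-9]+', x)

# A string without digits gives None from search() and an empty list from findall().
no_match = re.search(r'[0-9]+', 'no digits here')
empty_list = re.findall(r'[0-9]+', 'no digits here')

print(has_number, first_number, all_numbers, no_match, empty_list)
```

Because `None` is falsy and a match object is truthy, `if re.search(...):` works directly as a test.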
  • 11. Warning: Greedy Matching • The repeat characters (* and +) push outward in both directions (greedy) to match the largest possible string >>> import re >>> x = 'From: Using the : character' >>> y = re.findall('^F.+:', x) >>> print(y) ['From: Using the :'] ^F.+: First character in the match is an F (^F), one or more characters (.+), last character in the match is a : (:)
  • 12. Non-Greedy Matching • Not all regular expression repeat codes are greedy! If you add a ? character, the + and * chill out a bit... >>> import re >>> x = 'From: Using the : character' >>> y = re.findall('^F.+?:', x) >>> print(y) ['From:'] ^F.+?: First character in the match is an F (^F), one or more characters but not greedily (.+?), last character in the match is a : (:)
  • 13. Fine Tuning String Extraction • Parentheses are not part of the match, but they mark where the string to extract starts and stops >>> x = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008' >>> y = re.findall(r'\S+@\S+',x) >>> print(y) ['stephen.marquard@uct.ac.za'] >>> y = re.findall(r'^From (\S+@\S+)',x) >>> print(y) ['stephen.marquard@uct.ac.za'] ^From (\S+@\S+)
  • 14. The Double Split Version • Sometimes we split a line one way and then grab one of the pieces of the line and split that piece again From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008 words = line.split() email = words[1] pieces = email.split('@') print(pieces[1]) email: stephen.marquard@uct.ac.za pieces: ['stephen.marquard', 'uct.ac.za'] pieces[1]: 'uct.ac.za'
  • 15. The Regex Version From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008 import re lin = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008' y = re.findall('@([^ ]*)',lin) print(y) ['uct.ac.za'] '@([^ ]*)' Look through the string until you find an at-sign (@), match a non-blank character ([^ ]), match many of them (*)
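
The double-split and regex versions can be compared side by side. The sketch below assumes the mail-log line used on the slides (the address is the one implied by the `['stephen.marquard', 'uct.ac.za']` output); the function names `domain_by_split` and `domain_by_regex` are hypothetical helpers added for illustration.

```python
import re

line = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'


def domain_by_split(text):
    """Double split: first on whitespace, then on the at-sign."""
    words = text.split()
    email = words[1]
    return email.split('@')[1]


def domain_by_regex(text):
    """Regex version: capture the run of non-blank characters after '@'."""
    found = re.findall(r'@([^ ]*)', text)
    return found[0] if found else None


# Two capture groups return both halves of the address in one pass.
user, domain = re.search(r'^From (\S+)@(\S+)', line).groups()

print(domain_by_split(line), domain_by_regex(line), user, domain)
```

The regex version degrades more gracefully: on a line without an at-sign `domain_by_regex` returns `None`, whereas the split version raises `IndexError`.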
  • 16. Escape Character • If you want a special regular expression character to just behave normally (most of the time) you prefix it with '\' >>> import re >>> x = 'We just received $10.00 for cookies.' >>> y = re.findall(r'\$[0-9.]+',x) >>> print(y) ['$10.00'] \$[0-9.]+ A real dollar sign (\$), a digit or period ([0-9.]), at least one or more (+)
  • 17. Real world problems • Match IP Addresses, email addresses, URLs • Match balanced sets of parentheses • Substitute words • Tokenize • Validate • Count • Delete duplicates • Natural Language processing
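
To see why the backslash matters, the following sketch runs the slide's example with and without the escape. It also uses `re.escape()`, a standard-library helper that is not mentioned in the slides, as one possible way to escape a literal automatically.

```python
import re

x = 'We just received $10.00 for cookies.'

# Unescaped, $ is the end-of-line anchor, so nothing can follow it: no match.
unescaped = re.findall(r'$[0-9.]+', x)

# Escaped, \$ matches a literal dollar sign.
escaped = re.findall(r'\$[0-9.]+', x)

# re.escape() adds the backslashes for you when the literal comes from a variable.
symbol = '$'
auto_escaped = re.findall(re.escape(symbol) + r'[0-9.]+', x)

print(unescaped, escaped, auto_escaped)
```

Writing patterns as raw strings (`r'...'`) keeps Python from interpreting the backslash itself before the regular expression engine sees it.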
  • 20. RE in Python • Unleash the power: built-in re module • Functions • to compile patterns: compile • to perform matches: match, search, findall, finditer • to perform operations on a match object: group, start, end, span • to substitute: sub, subn • Metacharacters
  • 21. Regex.py import re text = 'abbaaabbbbaaaaa' pattern = 'ab' for match in re.finditer(pattern, text): s = match.start() e = match.end() print('Found "%s" at %d:%d' % (text[s:e], s, e))
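
A short tour of the functions listed above, as a sketch only. The pattern `a+` and the variable names are chosen here for illustration; the test string is the one used in Regex.py.

```python
import re

text = 'abbaaabbbbaaaaa'

# compile: build a reusable pattern object.
pattern = re.compile(r'a+')

# search: first match anywhere; match: only at the start of the string.
first = pattern.search(text)
at_start = pattern.match('bba')          # None, because 'bba' does not start with 'a'

# group / start / end / span: operations on the match object.
first_info = (first.group(), first.start(), first.end(), first.span())

# finditer: iterate over all matches as match objects.
spans = [m.span() for m in pattern.finditer(text)]

# sub / subn: substitute; subn also reports how many replacements were made.
replaced = pattern.sub('-', text)
replaced_with_count = pattern.subn('-', text)

print(first_info, at_start, spans, replaced, replaced_with_count)
```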
  • 22. Exercise 1 1. Which of the following 4 sequences (seq1/2/3/4) a) contain a “Galactokinase signature”? b) How many of them? http://us.expasy.org/prosite/
  • 23. >SEQ1 MGNLFENCTHRYSFEYIYENCTNTTNQCGLIRNVASSIDVFHWLDVYISTTIFVISGILNFYCLFIALYT YYFLDNETRKHYVFVLSRFLSSILVIISLLVLESTLFSESLSPTFAYYAVAFSIYDFSMDTLFFSYIMIS LITYFGVVHYNFYRRHVSLRSLYIILISMWTFSLAIAIPLGLYEAASNSQGPIKCDLSYCGKVVEWITCS LQGCDSFYNANELLVQSIISSVETLVGSLVFLTDPLINIFFDKNISKMVKLQLTLGKWFIALYRFLFQMT NIFENCSTHYSFEKNLQKCVNASNPCQLLQKMNTAHSLMIWMGFYIPSAMCFLAVLVDTYCLLVTISILK SLKKQSRKQYIFGRANIIGEHNDYVVVRLSAAILIALCIIIIQSTYFIDIPFRDTFAFFAVLFIIYDFSILSLLGSFTGVA M MTYFGVMRPLVYRDKFTLKTIYIIAFAIVLFSVCVAIPFGLFQAADEIDGPIKCDSESCELIVKWLLFCI ACLILMGCTGTLLFVTVSLHWHSYKSKKMGNVSSSAFNHGKSRLTWTTTILVILCCVELIPTGLLAAFGK SESISDDCYDFYNANSLIFPAIVSSLETFLGSITFLLDPIINFSFDKRISKVFSSQVSMFSIFFCGKR >SEQ2 MLDDRARMEA AKKEKVEQIL AEFQLQEEDL KKVMRRMQKE MDRGLRLETH EEASVKMLPT YVRSTPEGSE VGDFLSLDLG GTNFRVMLVK VGEGEEGQWS VKTKHQMYSI PEDAMTGTAE MLFDYISECI SDFLDKHQMK HKKLPLGFTF SFPVRHEDID KGILLNWTKG FKASGAEGNN VVGLLRDAIK RRGDFEMDVV AMVNDTVATM ISCYYEDHQC EVGMIVGTGC NACYMEEMQN VELVEGDEGR MCVNTEWGAF GDSGELDEFL LEYDRLVDES SANPGQQLYE KLIGGKYMGE LVRLVLLRLV DENLLFHGEA SEQLRTRGAF ETRFVSQVES DTGDRKQIYN ILSTLGLRPS TTDCDIVRRA CESVSTRAAH MCSAGLAGVI NRMRESRSED VMRITVGVDG SVYKLHPSFK ERFHASVRRL TPSCEITFIE SEEGSGRGAA LVSAVACKKA CMLGQ >SEQ3 MESDSFEDFLKGEDFSNYSYSSDLPPFLLDAAPCEPESLEINKYFVVIIYVLVFLLSLLGNSLVMLVILY SRVGRSGRDNVIGDHVDYVTDVYLLNLALADLLFALTLPIWAASKVTGWIFGTFLCKVVSLLKEVNFYSGILLLA CISVDRY LAIVHATRTLTQKRYLVKFICLSIWGLSLLLALPVLIFRKTIYPPYVSPVCYEDMGNNTANWRMLLRILP QSFGFIVPLLIMLFCYGFTLRTLFKAHMGQKHRAMRVIFAVVLIFLLCWLPYNLVLLADTLMRTWVIQET CERRNDIDRALEATEILGILGRVNLIGEHWDYHSCLNPLIYAFIGQKFRHGLLKILAIHGLISKDSLPKDSRPSFVGS SSGH TSTTL >SEQ4 MEANFQQAVK KLVNDFEYPT ESLREAVKEF DELRQKGLQK NGEVLAMAPA FISTLPTGAE TGDFLALDFG GTNLRVCWIQ LLGDGKYEMK HSKSVLPREC VRNESVKPII DFMSDHVELF IKEHFPSKFG CPEEEYLPMG FTFSYPANQV SITESYLLRW TKGLNIPEAI NKDFAQFLTE GFKARNLPIR IEAVINDTVG TLVTRAYTSK ESDTFMGIIF GTGTNGAYVE QMNQIPKLAG KCTGDHMLIN MEWGATDFSC LHSTRYDLLL DHDTPNAGRQ IFEKRVGGMY LGELFRRALF HLIKVYNFNE GIFPPSITDA WSLETSVLSR MMVERSAENV RNVLSTFKFR FRSDEEALYL WDAAHAIGRR AARMSAVPIA 
SLYLSTGRAG KKSDVGVDGS LVEHYPHFVD MLREALRELI GDNEKLISIG IAKDGSGIGA ALCALQAVKE KKGLA Exercise 1
  • 25. Lists • Flexible arrays, not Lisp-like linked lists • a = [99, "bottles of beer", ["on", "the", "wall"]] • Same operators as for strings • a+b, a*3, a[0], a[-1], a[1:], len(a) • Item and slice assignment • a[0] = 98 • a[1:2] = ["bottles", "of", "beer"] -> [98, "bottles", "of", "beer", ["on", "the", "wall"]] • del a[-1] # -> [98, "bottles", "of", "beer"]
  • 26. Dictionaries • Hash tables, "associative arrays" • d = {"duck": "eend", "water": "water"} • Lookup: • d["duck"] -> "eend" • d["back"] # raises KeyError exception • Delete, insert, overwrite: • del d["water"] # {"duck": "eend"} • d["back"] = "rug" # {"duck": "eend", "back": "rug"} • d["duck"] = "duik" # {"duck": "duik", "back": "rug"}
  • 27. Find the answer in ultimate-sequence.txt ? >ultimate-sequence ACTCGTTATGATATTTTTTTTGAACGTGAAAATACT TTTCGTGCTATGGAAGGACTCGTTATCGTGAAGT TGAACGTTCTGAATGTATGCCTCTTGAAATGGA AAATACTCATTGTTTATCTGAAATTTGAATGGGA ATTTTATCTACAATGTTTTATTCTTACAGAACAT TAAATTGTGTTATGTTTCATTTCACATTTTAGTA GTTTTTTCAGTGAAAGCTTGAAAACCACCAAGA AGAAAAGCTGGTATGCGTAGCTATGTATATATA AAATTAGATTTTCCACAAAAAATGATCTGATAA ACCTTCTCTGTTGGCTCCAAGTATAAGTACGAAA AGAAATACGTTCCCAAGAATTAGCTTCATGAGT AAGAAGAAAAGCTGGTATGCGTAGCTATGTATA TATAAAATTAGATTTTCCACAAAAAATGATCTG ATAA Question 2
  • 29. Hint 2: Translations Python way: tab = str.maketrans("ACGU","UGCA") sequence = sequence.translate(tab)[::-1]
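
The hint can be wrapped in a small function. This is a sketch: the function name `reverse_complement` and its parameters are introduced here, and the DNA default alphabet (`ACGT`) is an assumption added alongside the RNA alphabet (`ACGU`) shown on the slide.

```python
def reverse_complement(sequence, alphabet="ACGT", complement="TGCA"):
    """Return the reverse complement of a nucleotide sequence.

    str.maketrans() builds a character-to-character table, translate()
    applies it, and the [::-1] slice reverses the result.
    """
    table = str.maketrans(alphabet, complement)
    return sequence.translate(table)[::-1]


dna_result = reverse_complement("ATGC")                  # DNA alphabet (assumed default)
rna_result = reverse_complement("AUGC", "ACGU", "UGCA")  # RNA alphabet, as on the slide

print(dna_result, rna_result)
```

Characters that are not in the table (for example `N`) pass through `translate()` unchanged.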
  • 30. Reading Files name = open("filename") – opens the given file for reading, and returns a file object name.read() - file's entire contents as a string name.readline() - next line from file as a string name.readlines() - file's contents as a list of lines – the lines from a file object can also be read using a for loop >>> f = open("hours.txt") >>> f.read() '123 Susan 12.5 8.1 7.6 3.2\n456 Brad 4.0 11.6 6.5 2.7 12\n789 Jenn 8.0 8.0 8.0 8.0 7.5\n'
  • 31. File Input Template • A template for reading files in Python: name = open("filename") for line in name: statements >>> infile = open("hours.txt") >>> for line in infile: ... print(line.strip()) # strip() removes \n 123 Susan 12.5 8.1 7.6 3.2 456 Brad 4.0 11.6 6.5 2.7 12 789 Jenn 8.0 8.0 8.0 8.0 7.5
  • 32. Writing Files name = open("filename", "w") name = open("filename", "a") – opens file for write (deletes previous contents), or – opens file for append (new data goes after previous data) name.write(str) - writes the given string to the file name.close() - saves file once writing is done >>> out = open("output.txt", "w") >>> out.write("Hello, world!\n") >>> out.write("How are you?") >>> out.close() >>> open("output.txt").read() 'Hello, world!\nHow are you?'
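
The template above can be combined with a `with` block, which closes the file automatically. The sketch below is an illustration under assumptions: it writes the slide's hours.txt contents to a temporary file so that it runs anywhere, and the function name `total_hours` is hypothetical.

```python
import os
import tempfile

# Sample data copied from the hours.txt example on the slide.
sample = (
    "123 Susan 12.5 8.1 7.6 3.2\n"
    "456 Brad 4.0 11.6 6.5 2.7 12\n"
    "789 Jenn 8.0 8.0 8.0 8.0 7.5\n"
)


def total_hours(path):
    """Read 'id name hours...' lines and return {name: total hours}."""
    totals = {}
    with open(path) as infile:            # closed automatically at the end of the block
        for line in infile:
            fields = line.split()         # split() also discards the trailing \n
            if not fields:
                continue                  # skip blank lines
            totals[fields[1]] = sum(float(h) for h in fields[2:])
    return totals


# Write the sample to a temporary file, read it back, then clean up.
tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False)
tmp.write(sample)
tmp.close()
totals = total_hours(tmp.name)
os.remove(tmp.name)

print(totals)
```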
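
The write and append modes can be sketched with `with` blocks, which make the explicit `close()` unnecessary. The temporary directory and the third line of text are assumptions added so the example is self-contained.

```python
import os
import tempfile

# Use a temporary directory so the example does not touch existing files.
path = os.path.join(tempfile.mkdtemp(), "output.txt")

# "w" creates the file (or deletes previous contents) before writing.
with open(path, "w") as out:
    out.write("Hello, world!\n")
    out.write("How are you?")

# "a" keeps the existing contents and adds new data after them.
with open(path, "a") as out:
    out.write("\nFine, thanks.")

with open(path) as infile:
    contents = infile.read()

print(repr(contents))
```

Note that `write()` does not add a newline by itself; each `\n` has to be written explicitly.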
  • 33. Question 3. Swiss-Knife.py • Using a database as input! Parse the entire Swiss-Prot collection – How many entries are there? – Average Protein Length (in aa and MW) – Relative frequency of amino acids • Compare to the ones used to construct the PAM scoring matrices from 1978 – 1991
  • 34. Question 3: Getting the database Uniprot_sprot.dat.gz – 528 MB (on GitHub under Files) Unzipped 2.92 GB! http://www.ebi.ac.uk/uniprot/download-center
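
One possible starting point for Question 3 is sketched below. It is not the course's Swiss-Knife.py; it is a minimal reader that assumes the usual Swiss-Prot flat-file layout (an `ID` line that starts each entry, an `SQ` line followed by indented sequence lines, and `//` as the terminator). The two sample entries, including their names and MW/CRC values, are invented for illustration, and the function name `read_entries` is hypothetical.

```python
import gzip
import io

# Invented two-entry sample in Swiss-Prot flat-file style (not real records).
sample = """\
ID   TEST1_HUMAN             Reviewed;          12 AA.
DE   RecName: Full=Hypothetical test protein 1;
SQ   SEQUENCE   12 AA;  1234 MW;  0000000000000000 CRC64;
     MKTAYIAKQR QI
//
ID   TEST2_HUMAN             Reviewed;           5 AA.
DE   RecName: Full=Hypothetical test protein 2;
SQ   SEQUENCE   5 AA;  567 MW;  0000000000000000 CRC64;
     ACDEF
//
"""


def read_entries(handle):
    """Yield (entry_name, sequence) for each record in a flat-file handle."""
    entry_name, in_sequence, chunks = None, False, []
    for line in handle:
        if line.startswith("ID "):
            entry_name = line.split()[1]
        elif line.startswith("SQ "):
            in_sequence = True                 # sequence lines follow until //
        elif line.startswith("//"):
            yield entry_name, "".join(chunks)
            entry_name, in_sequence, chunks = None, False, []
        elif in_sequence:
            chunks.append(line.replace(" ", "").strip())


entries = list(read_entries(io.StringIO(sample)))
average_length = sum(len(seq) for _, seq in entries) / len(entries)
print(len(entries), average_length)

# For the real database, the same generator can read the compressed file
# line by line without unzipping it first (file name as given on the slide):
# with gzip.open("uniprot_sprot.dat.gz", "rt") as handle:
#     n_entries = sum(1 for _ in read_entries(handle))
```

Reading through `gzip.open(..., "rt")` avoids holding the 2.92 GB unzipped file in memory or on disk, since the generator processes one line at a time.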
  • 35. Amino acid frequencies. Second step: Frequencies of Occurrence
      Residue  1978   1991
      L        0.085  0.091
      A        0.087  0.077
      G        0.089  0.074
      S        0.070  0.069
      V        0.065  0.066
      E        0.050  0.062
      T        0.058  0.059
      K        0.081  0.059
      I        0.037  0.053
      D        0.047  0.052
      R        0.041  0.051
      P        0.051  0.051
      N        0.040  0.043
      Q        0.038  0.041
      F        0.040  0.040
      Y        0.030  0.032
      M        0.015  0.024
      H        0.034  0.023
      C        0.033  0.020
      W        0.010  0.014
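
Relative frequencies like those in the table can be computed with a counter over all sequences. The sketch below uses `collections.Counter`; the function name `residue_frequencies` and the toy input sequences are assumptions for illustration, not data from Swiss-Prot.

```python
from collections import Counter


def residue_frequencies(sequences):
    """Return {residue: relative frequency} over an iterable of sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)                # counts each character of the string
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


# Toy input: four residues in total, two A and two L.
freqs = residue_frequencies(["AAL", "L"])

# Print residues from most to least frequent, in the style of the table.
for aa, freq in sorted(freqs.items(), key=lambda item: -item[1]):
    print(aa, round(freq, 3))
```

Because the function accepts any iterable, it can consume sequences one at a time from a file parser instead of requiring the whole database in a list.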
  • 36. Extra Questions • How many records have a sequence of length 260? • What are the first 20 residues of 143X_MAIZE? • What is the identifier for the record with the shortest sequence? Is there more than one record with that length? • What is the identifier for the record with the longest sequence? Is there more than one record with that length? • How many contain the subsequence "ARRA"? • How many contain the substring "KCIP-1" in the description?
  • 37. Question 4 • Program your own prosite parser ! • Download prosite pattern database (prosite.dat) • Automatically generate >2000 search patterns, and search in sequence set from question 1
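
A possible core for such a parser is the conversion of one PROSITE pattern into a Python regular expression. The sketch below assumes the common PROSITE pattern conventions (elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repetition, `<` and `>` as terminus anchors, and a trailing period). The function name `prosite_to_regex`, the example patterns, and the test sequence are illustrative and are not taken from prosite.dat.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string.

    Simplification: anchors inside brackets (e.g. '[G>]') are not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("<", "^").replace(">", "$")   # terminus anchors
        element = element.replace("x", ".")                     # any residue
        element = element.replace("{", "[^").replace("}", "]")  # forbidden residues
        element = element.replace("(", "{").replace(")", "}")   # repetition counts
        parts.append(element)
    return "".join(parts)


# Illustrative pattern in PROSITE syntax (hypothetical, for demonstration only).
regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
anchored = prosite_to_regex("<M-{P}-x(2)")

# Made-up sequence built to contain one occurrence of the pattern.
sequence = "GG" + "CAACAAALAAAAAAAAHAAAH" + "GG"
hits = [(m.start(), m.group()) for m in re.finditer(regex, sequence)]

print(regex, anchored, hits)
```

Amino acid letters are upper case, so replacing the lower-case `x` does not touch residue codes. Applying the function to every pattern line in prosite.dat and running `re.finditer` over the Question 1 sequences would be the remaining steps.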