This document provides an overview of bioinformatics and its computational goals. It discusses how bioinformatics aims to learn from and generalize patterns in genetic sequence, structure, and function data from well-studied examples to enable prediction of unknown examples. It also aims to organize and integrate molecular interaction data on a genomic scale to simulate processes like gene expression and protein folding. The ultimate goals are to engineer novel organisms or functions and target specific genes or proteins.
of Biochemistry & Medicine (by courtesy)
Stanford University School of Medicine
Genomics & Medicine
http://biochem118.stanford.edu/

What is Bioinformatics?
[Figure labels: RNA, Protein, DNA, Phenotype, Selection, Evolution, Individuals, Populations; caption: Biological Information]

Computational Goals of Bioinformatics
• Learn & Generalize: Discover conserved patterns (models) of sequences, structures, metabolism & chemistries from well-studied examples.
• Prediction: Infer function or structure of newly sequenced genes, genomes, proteomes or proteins from these generalizations.
• Organize & Integrate: Develop a systematic and genomic approach to molecular interactions, metabolism, cell signaling, gene expression…
• Simulate: Model gene expression, gene regulation, protein folding, protein-protein interaction, protein-ligand binding, catalytic function, metabolism…
• Engineer: Construct novel organisms or novel functions or novel regulation of genes and proteins.
• Target: Mutations, RNAi to specific genes and transcripts or drugs to specific protein targets.

Central Paradigm of Molecular Biology
[Figure labels: DNA, RNA, Protein, Phenotype]

Central Paradigm of Medicine
[Figure labels: Opinions, DNA, RNA, Protein, Symptoms]

Central Paradigm of Bioinformatics
[Figure labels: Genetic Information, Molecular Structure, Biochemical Function, Phenotype (Symptoms)]
MVHLTPEEKT AVNALWGKVN VDAVGGEALG RLLVVYPWTQ RFFESFGDLS SPDAVMGNPK
VKAHGKKVLG AFSDGLAHLD NLKGTFSQLS ELHCDKLHVD PENFRLLGNV LVCVLARNFG
KEFTPQMQAA YQKVVAGVAN ALAHKYH

Challenges Understanding Genetic Information
[Figure labels: Genetic Information, Molecular Structure, Biochemical Function, Phenotype]
• Genetic information is redundant
• Structural information is redundant
  [Figure: Soybean Leghemoglobin and Sperm Whale Myoglobin]
• Genes and proteins are one dimensional but their function depends on three-dimensional structure
• Genes and proteins are meta-stable

Discovering Function from Protein Sequence
Sequences of Common Structure or Function

Sequence Similarity
                  10        20        30        40        50
Query     VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF------DLSHGS
          || | | ||||| | |||| | || | | | |   (match bars as extracted; column spacing lost)
Database  HLTPEEKSAVTALWGKV--NVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGN
                  10        20        30        40        50
Dayhoff's PAM 250 Amino Acid Replacement Matrix (1978)

Consensus Sequences or Sequence Motifs
Zinc Finger (C2H2 type): C X(2,4) C X(12) H X(3,5) H

Protein Motifs from Multiple Sequence Alignments
EBI Course on Protein Motifs/Signatures
http://www.ebi.ac.uk/training/online/course/introduction-protein-classification-ebi
A Typical Motif:
Zinc Finger DNA Binding Motif
CXXCXXXXXXXXXXXXHXXXXH

PSSMs or Weight Matrices
Position   1   2   3   4   5   6   7   8   9  10  11  12
A          2   1   3  13  10  12  67   4  13   9   1   2
R          7   5   8   9   4   0   1  16   7   0   1   0
N          0   8   0   1   0   0   0   2   1   1  10   0
D          0   1   0   1  13   0   0  12   1   0   4   0
C          0   0   1   0   0   0   0   0   0   2   2   1
Q          1   1  21   8  10   0   0   7   6   0   0   2
E          2   0   0   9  21   0   0  15   7   3   3   0
G          9   7   1   4   0   0   8   0   0   0  46   0
H          4   3   1   1   2   0   0   2   2   0   5   0
I         10   0  11   1   2  10   0   4   9   3   0  16
L         16   1  17   0   1  31   0   3  11  24   0  14
K          3   4   5  10  11   1   1  13  10   0   5   2
M          7   1   1   0   0   0   0   0   5   7   1   8
F          4   0   3   0   0   4   0   0   0  10   0   0
P          0   6   0   1   0   0   0   0   0   0   0   0
S          1  17   0   8   3   1   3   0   2   2   2   0
T          5  22   3  11   1   5   0   2   2   2   0   5
W          2   0   0   0   0   0   0   0   0   1   0   1
Y          1   0   4   2   0   1   0   0   2   4   0   1
V          6   3   1   1   2  15   0   0   2  12   0  28
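The position-specific counts on the PSSM slide can be turned into a scoring matrix and slid along a query sequence. The following is a minimal sketch, not the method used in the lecture: it assumes a uniform background frequency of 1/20 and a pseudocount of 1 per residue (neither is stated on the slide), and the function names are hypothetical.

```python
import math

# Counts per residue (rows) at motif positions 1-12 (columns),
# transcribed from the PSSM slide. Every column sums to 80.
COUNTS = {
    "A": [2, 1, 3, 13, 10, 12, 67, 4, 13, 9, 1, 2],
    "R": [7, 5, 8, 9, 4, 0, 1, 16, 7, 0, 1, 0],
    "N": [0, 8, 0, 1, 0, 0, 0, 2, 1, 1, 10, 0],
    "D": [0, 1, 0, 1, 13, 0, 0, 12, 1, 0, 4, 0],
    "C": [0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 2, 1],
    "Q": [1, 1, 21, 8, 10, 0, 0, 7, 6, 0, 0, 2],
    "E": [2, 0, 0, 9, 21, 0, 0, 15, 7, 3, 3, 0],
    "G": [9, 7, 1, 4, 0, 0, 8, 0, 0, 0, 46, 0],
    "H": [4, 3, 1, 1, 2, 0, 0, 2, 2, 0, 5, 0],
    "I": [10, 0, 11, 1, 2, 10, 0, 4, 9, 3, 0, 16],
    "L": [16, 1, 17, 0, 1, 31, 0, 3, 11, 24, 0, 14],
    "K": [3, 4, 5, 10, 11, 1, 1, 13, 10, 0, 5, 2],
    "M": [7, 1, 1, 0, 0, 0, 0, 0, 5, 7, 1, 8],
    "F": [4, 0, 3, 0, 0, 4, 0, 0, 0, 10, 0, 0],
    "P": [0, 6, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "S": [1, 17, 0, 8, 3, 1, 3, 0, 2, 2, 2, 0],
    "T": [5, 22, 3, 11, 1, 5, 0, 2, 2, 2, 0, 5],
    "W": [2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1],
    "Y": [1, 0, 4, 2, 0, 1, 0, 0, 2, 4, 0, 1],
    "V": [6, 3, 1, 1, 2, 15, 0, 0, 2, 12, 0, 28],
}


def log_odds_pssm(counts, pseudocount=1.0, background=1.0 / 20):
    """Convert counts to log2-odds scores (assumed uniform background)."""
    n_positions = len(next(iter(counts.values())))
    pssm = []
    for j in range(n_positions):
        total = sum(counts[aa][j] for aa in counts) + pseudocount * len(counts)
        column = {
            aa: math.log2((counts[aa][j] + pseudocount) / total / background)
            for aa in counts
        }
        pssm.append(column)
    return pssm


def score_window(pssm, window):
    """Sum the per-position scores for a window as wide as the matrix."""
    if len(window) != len(pssm):
        raise ValueError("window length must equal the matrix width")
    return sum(column[residue] for column, residue in zip(pssm, window))


def best_hit(pssm, sequence):
    """Return (score, 0-based start) of the best-scoring window."""
    width = len(pssm)
    return max(
        (score_window(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )
```

For example, `best_hit(log_odds_pssm(COUNTS), query)` returns the highest-scoring 12-residue window of `query`. The pseudocount keeps residues with a count of 0 from producing a score of minus infinity.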
Protein Motifs from Multiple Sequence Alignments
EBI Course on Protein Motifs/Signatures
http://www.ebi.ac.uk/training/online/course/introduction-protein-classification-ebi

Position-Specific Scoring Matrix for Prokaryotic Helix-Turn-Helix Motifs
Sequence     Helix Turn Helix
RCRO_LAMBD   FGQTKTAKDLGVYQSAINKAIH
RCRO_BP434   MTQTELATKAGVKQQSIQLIEA
RCRO_BPP22   GTQRAVAKALGISDAAVSQWKE
RPC1_LAMBD   LSQESVADKMGMGQSGVGALFN
RPC1_BP434   LNQAELAQKVGTTQQSIEQLEN
RPC1_BPP22   IRQAALGKMVGVSNVAISQWER
RPC2_LAMBD   LGTEKTAEAVGVDKSQISRWKR
LACR_ECOLI   VTLYDVAEYAGVSYQTVSRVVN
CRP_ECOLI    ITQQEIGQIVGCSRETVGRILK
TRPR_ECOLI   MSQRELKNELGAGIATITRGSN
RPC1_CPP22   RGQRKVADALGINESQISRWKG
GALR_ECOLI   ATIKDVARLAGVSVATVSRVIN
Y77_BPT7     LSHRSLGELYGVSQSTITRILQ
TER3_ECOLI   LTTRKLAQKLGVEQPTLYWHVK
VIVB_BPT7    DYQAIFAQQLGGTQSAASQIDE
DEOR_ECOLI   LHLKDAAALLGVSEMTIRRDLN
RP32_BACSU   RTLEEVGKVFGVTRERIRQIEA
Y28_BPT7     ESNVSLARTYGVSQQTICDIRK
IMMRE_BPPH   STLEAVAGALGIQVSAIVGEET

Blocks or Finger Prints from Multiple Sequence Alignments
EBI Course on Protein Motifs/Signatures
http://www.ebi.ac.uk/training/online/course/introduction-protein-classification-ebi

Profiles, PSI-BLAST
Hidden Markov Models
[Figure: hidden Markov model; state labels] AA1 AA2 AA3 AA4
AA5 AA6  I1 I2 I3 I4 I5  D2 D3 D4 D5

Hidden Markov Models from Multiple Sequence Alignments
EBI Course on Protein Motifs/Signatures
http://www.ebi.ac.uk/training/online/course/introduction-protein-classification-ebi

Data Mining: The Search for Buried Treasure

PROSITE Patterns
http://expasy.org/prosite/
• Active site of trypsin-like serine proteases
  G D S G G
• Zinc Finger (C2H2 type)
  C-X(2,4)-C-X(12)-H-X(3,5)-H
• N-Glycosylation Site
  N-[^P]-[S T]-[^P]
• Homeobox Domain Signature
  [LIVMF]-X(5)-[LIVM]-X(4)-[IV]-[RKQ]-X-W-X(8)-[RK]

Swiss Institute of Bioinformatics  http://www.isb-sib.ch/
ExPASy Bioinformatics Resource Portal  http://expasy.org/
UniProt Knowledge Base  http://www.uniprot.org/
UniProt Opsin Entries  http://www.uniprot.org/
UniProt Human Opsin Advanced Search  http://www.uniprot.org/
UniProt Human Opsin Entries Reviewed  http://www.uniprot.org/
UniProt Human Opsin OPN1MW Entry  http://www.uniprot.org/uniprot/P04001
Blast UniProt Human Opsin OPN1MW Entry  http://www.uniprot.org/uniprot/P04001
Blast UniProt Human OPN1MW Results  http://www.uniprot.org/uniprot/P04001
NCBI BLAST Home Page  http://blast.ncbi.nlm.nih.gov/
NCBI BLAST Parameters  http://blast.ncbi.nlm.nih.gov/
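Returning to the PROSITE pattern notation listed earlier (for example C-X(2,4)-C-X(12)-H-X(3,5)-H): such a pattern maps almost directly onto a regular expression. The sketch below is an illustration under stated assumptions, not PROSITE's own scanning software. It handles only the elements that appear on the slide (single residues, X, bracketed sets, [^...] exclusions, and (n) or (m,n) repeat counts) plus the {..} exclusion form used in PROSITE's published syntax. The function names are hypothetical.

```python
import re

# One pattern element, optionally followed by a repeat count: X(2,4), X(12), [ST], C
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pieces = []
    for element in pattern.replace(" ", "").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("cannot parse element: %r" % element)
        core, low, high = match.groups()
        if core[0] == "[" and core[-1] == "]":
            atom = core                        # [LIVM] and [^P] are already regex sets
        elif core[0] == "{" and core[-1] == "}":
            atom = "[^" + core[1:-1] + "]"     # {P} means "anything but P"
        elif core.isalpha():
            # X is the wildcard; other letters are literal residues
            atom = "".join("." if ch in "Xx" else ch.upper() for ch in core)
            if low and len(atom) > 1:
                atom = "(?:" + atom + ")"
        else:
            raise ValueError("unsupported element: %r" % element)
        if low and high:
            atom += "{%s,%s}" % (low, high)
        elif low:
            atom += "{%s}" % low
        pieces.append(atom)
    return "".join(pieces)


def find_motif(pattern, sequence):
    """Return (1-based start, matched text) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]
```

A pattern match of this kind is all-or-nothing, which is consistent with the point made in the homework notes that patterns carry no expectation value.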
Entrez Gene search for Colorblindness
Entrez Gene search for Opsins
BLAST Similarity Search  http://www.ncbi.nlm.nih.gov/BLAST/
Choose Standard Protein-Protein BLAST  http://www.ncbi.nlm.nih.gov/BLAST/
Paste Sequence, Choose SwissProt Database and BLAST!
Optional Parameters
BLAST Conserved Domain Output
Sequence Aligned with Domain
Most Significant Similarity Hits
Bovine Blue Opsin
Similarity

Evaluation of Profiles
[Figure: Negative Proteins and Positive Proteins, divided into TN, TP, FN, FP]
Sensitivity = TP/(TP+FN)
Specificity = TN/(TN+FP)
Positive Predictive Value = TP/(TP+FP)
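The three measures on the Evaluation of Profiles slide can be computed directly from the four counts. This is a small sketch; the function name and the example counts are hypothetical and are not data from the lecture.

```python
def profile_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and positive predictive value from raw counts."""

    def ratio(numerator, denominator):
        if denominator == 0:
            raise ValueError("undefined: denominator is zero")
        return numerator / denominator

    return {
        "sensitivity": ratio(tp, tp + fn),   # TP/(TP+FN)
        "specificity": ratio(tn, tn + fp),   # TN/(TN+FP)
        "ppv": ratio(tp, tp + fp),           # TP/(TP+FP)
    }


# Hypothetical example: a profile finds 90 of 100 true family members
# and also reports 50 of 1000 unrelated proteins.
example = profile_metrics(tp=90, fp=50, tn=950, fn=10)
```

In this made-up case sensitivity is 0.90 and specificity is 0.95, yet the positive predictive value is only 90/140, about 0.64, because the negatives greatly outnumber the positives.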
MyHits Local Motifs Search  http://myhits.isb-sib.ch/
MyHits Local Motifs Query  http://myhits.isb-sib.ch/
MyHits Local Motifs Summary  http://myhits.isb-sib.ch/
MyHits Local Motif Hits  http://myhits.isb-sib.ch/
MyHits Local Motifs Hits (Cont.)  http://myhits.isb-sib.ch/
InterPro  http://www.ebi.ac.uk/interpro/
InterProScan  http://www.ebi.ac.uk/interpro/
InterPro Scan  http://www.ebi.ac.uk/Tools/pfa/iprscan/
InterPro Scan HourGlass  http://www.ebi.ac.uk/InterProScan/
InterPro Scan Results  http://www.ebi.ac.uk/InterProScan/
GO: Gene Ontology Database  http://www.geneontology.org/
GO: Gene Ontology for Opsin OPN1MW  http://www.geneontology.org/
GO: Sequence Information for OPN1MW  http://www.geneontology.org/
GO: Annotations for OPN1MW  http://www.geneontology.org/
GO: Gene Ontology Terms for OPN1MW  http://www.geneontology.org/
GO: Gene Ontology GPCR Term  http://www.geneontology.org/
Bioinformatics Homework  http://biochem118.stanford.edu/bioinformatics.html

Homework Assignment
1) Select a protein from OMIM or from Entrez Gene or from UniProt concerning the disease of interest to you. Copy and save the FASTA format of the protein file.
2) Search your protein for motifs with the MyHits Motif Scan Query. Be sure to include Prosite Patterns, Prosite Frequent Patterns, Prosite Profiles, Prefiles, Pfam HMMs (local models) in your search. Please send me the MyHits you think are biologically significant and at least 1 or 2 hits which you think are not statistically or biologically significant. Please note that only the Profiles have expectation values. The Patterns do not have a measure of statistical significance.
3) Search your protein for blocks using the InterPro database. Please send me a few of the InterPro domain hits you think are significant and at least 1 or 2 hits which you think are not statistically or biologically significant. Please note that the default graphic output of InterPro does not list expectation values. You must switch to the Tabular view to obtain the statistical significance.
4) Search your protein for homology using the BLAST method. Please report two
or three hits which are both statistically and biologically significant. Also report two or three hits which you think are neither statistically nor biologically significant. If your protein family is very large, you may have to ask BLAST to return more hits to find statistically insignificant hits.

Statistical vs. Biological Significance
Assignment
First, for each search (MyHits, InterPro and BLAST hit), I would like you to report some significant hits and describe why you think they are significant both statistically and biologically; also report some statistically insignificant hits (and why), and say whether any of your statistically insignificant hits are still significant biologically. To remind you what I said in class: a statistically significant find in the database search is always biologically significant, but a biologically significant result in the search is not necessarily always statistically significant.

Statistical significance and expectation values.
Statistical significance is determined by the expectation value, which gives you a measure of how likely this finding is based on pure chance. A finding with an E-value of 1 or greater is not significant because it could occur by pure chance. A finding with an E-value less than 10^-3 (one chance in a thousand) is generally considered statistically significant (unless of course you are doing 1,000 searches!). So the lower the expectation value, the more significant the finding. Findings between 10^-3 and 1 are in the so-called twilight zone and require some further analysis or experiments to determine their validity.

Statistical vs. Biological Significance (cont)
InterPro
Unlike most of the other methods, InterPro sets a very high level of significance for a finding before it will report it. This means that you will usually not find any statistically insignificant hits for this particular search.
Biological Significance
In order to determine biological significance you must read the biological properties (ontology terms are the most useful) of your protein
and the biological properties of your findings. The findings may be significant because the finding defines a very closely related protein family (opsins for example) or a very broad family (G protein-coupled receptors or 7-transmembrane proteins) or a common structure (protein fold) or a specific function (retinal binding site) or a very specific catalytic activity. You should describe in words the level of the biological significance.

Statistical vs. Biological Significance (cont)
MyHits
If you ask MyHits to return PATTERNs as well as motifs, you will notice that PATTERNs do not have E-values associated with them, so there is no easy way to judge statistical significance. With pattern findings you are left only with judging biological significance. Also, none of the Frequent patterns from MyHits are statistically significant.
BLAST
If you do not have any insignificant hits from the BLAST search, it means that your protein family is very large and you have to ask BLAST to return more results using the Advanced Options at the bottom of the form. Only when you see hits with E-values > 0.001 do you have insignificant findings.

Hidden Markov Models from Multiple Sequence Alignments
EBI Course on Protein Motifs/Signatures
http://www.ebi.ac.uk/training/online/course/introduction-protein-classification-ebi

Multiple Enhancer Sequences
Structure of 5' CAP  © Michael W. King
PolyAdenylation of mRNAs  © Michael W. King
Intron Splicing Mechanism  © Michael W. King

Splicing, Capping & polyAdenylation Yields Mature mRNA
[Figure labels: Transcript, mRNA, Gene, Intron, Exon, Promoter, Terminator, 5', 3', Splicing, poly-A, Cap, TSS, TTS]

GENSCAN Gene Model  http://genes.mit.edu/GENSCAN.html
Hidden Markov models of gene structure  © Christopher Burge

Alternative Splicing Generates Distinct Proteins
[Figure labels: Transcript, mRNA-1, mRNA-2, Gene, Intron, Exon, Promoter, Terminator, 5', 3', Alternate Splicing, Splicing, poly-A, Cap]

ESTs, Full Length cDNA, UniGene & RefSeq Databases
[Figure labels: Transcript, mRNA, Gene, Intron, Exon]
[Figure labels, continued: Exon, Promoter, Terminator, 5' ESTs, 3' ESTs, Full Length cDNA, Splicing, 5' UTR, 3' UTR, Protein, Cap, poly-A]

Alternative Splicing Detected in EST Libraries
GENSCAN Gene Model  http://genes.mit.edu/GENSCAN.html
Hidden Markov models of gene structure

Gene Loci  http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene
[Figure labels: Protein Sequences, Genscan, GrailEXP, FGENES, Gene Locus, mRNA Sequences, EST Sequences]

Genomics, Bioinformatics & Computational Biology
[Figure labels: Computational Biology, Computational Molecular Biology, Bioinformatics, Genomics, Proteomics, Structural Genomics, Systems Biology]
Databases, Machine Learning, Robotics, Statistics & Probability, Artificial Intelligence, Graph Theory, Information Theory, Algorithms

Redundancy in Genomic & Protein Sequences
• DNA is double-stranded
• Genetic code
• Acceptable amino-acid replacements
• Intron-exon variation
• Alternative splicing
• Strain variations (SNPs)
• Sequencing errors

Hidden Markov Models (after Haussler)
http://www.cse.ucsc.edu/compbio/sam.html
[Figure: hidden Markov model with states AA1 to AA6, I1 to I5, D2 to D5]

PFAM at Sanger Center (UK)  http://www.sanger.ac.uk/Software/Pfam/
PFAM at Sanger Center (UK)  http://pfam.sanger.ac.uk/