
Lecture 5 Protein Sequence Database

UniProt is a protein sequence database that consists of UniProtKB (a curated and annotated database), UniRef (non-redundant clusters of sequences), and UniParc (a comprehensive archive of all publicly available sequences). Pfam and Prosite are protein family and domain databases that group similar protein sequences and define common protein domains and families. The Protein Information Resource (PIR) was established in 1984 to provide a public resource for protein sequence identification and interpretation.

Uploaded by

Bhawna Rathi
Copyright © All Rights Reserved

Topic Name – Protein Sequence Databases

PROTEIN/PROTEOMICS DATABASES
• Protein Information Resource (PIR)
• UniProt - Protein Knowledge Database
• Pfam - Protein Family and Domain Database
• PROSITE - Protein Family and Domain Database


UNIPROT DATABASE
• The Swiss-Prot, TrEMBL, and PIR protein database activities have united to form the Universal Protein Resource (UniProt).
  – UniProt Knowledgebase (UniProtKB): sequence information with manually curated (Swiss-Prot) and automatic (TrEMBL) annotations, linked to other databases.
  – UniProt Reference Clusters (UniRef): removes sequence redundancy by merging sequences at 100%, 90% and 50% sequence identity; no annotations; linked to UniProtKB and UniParc records.
  – UniProt Archive (UniParc): history of sequences; no annotations; linked to source records.
UNIPROT SEQUENCE DATABASES

UniProt Archive (UniParc)
• Stable, comprehensive, non-redundant collection of all protein sequences ever published.
• Merged from PIR, Swiss-Prot, TrEMBL, DDBJ/EMBL/GenBank proteins and proteomes, PDB, International Protein Index, RefSeq translations and other organism proteomes not yet in DDBJ/EMBL/GenBank.

UniProt Reference Clusters (UniRef)
• Three non-redundant collections based on sequence similarity clusters.
• UniRef100 merges all identical sequences and identical overlapping subsequences into a single entry.
• UniRef90 merges UniRef100 entries whose sequences share at least 90% sequence identity into a single entry.
• UniRef50 merges UniRef90 entries whose sequences share at least 50% sequence identity into a single entry.
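The UniRef levels above come from clustering sequences at fixed identity thresholds. The toy sketch below only illustrates that idea. It assumes the sequences are already aligned and of equal length, measures identity as the percentage of matching positions, and clusters greedily in input order. UniProt's real pipeline uses alignment-based clustering tools, builds UniRef90 and UniRef50 from the lower level's entries, and applies overlap requirements. None of that is modelled here. The helper names `percent_identity` and `greedy_cluster` and the sequences are hypothetical.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions with identical residues (toy: equal-length, pre-aligned)."""
    if len(a) != len(b):
        raise ValueError("toy sketch expects pre-aligned, equal-length sequences")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def greedy_cluster(seqs: Dict[str, str], threshold: float) -> Dict[str, List[str]]:
    """Put each sequence in the first cluster whose representative it matches at >= threshold %.

    Returns {representative_id: [member ids, representative first]}.
    """
    clusters: Dict[str, List[str]] = {}
    for seq_id, seq in seqs.items():
        for rep_id in clusters:
            if percent_identity(seq, seqs[rep_id]) >= threshold:
                clusters[rep_id].append(seq_id)
                break
        else:
            clusters[seq_id] = [seq_id]  # no representative close enough: new cluster
    return clusters


# Hypothetical 10-residue sequences (not real UniProt entries).
toy = {
    "A": "MKTAYIAKQR",
    "B": "MKTAYIAKQR",  # identical to A
    "C": "MKTAYIAKQL",  # 90% identical to A
    "D": "MKSAFIGKQL",  # 60% identical to A
    "E": "GGGGGGGGGG",  # unrelated
}

for level in (100, 90, 50):
    print(f"UniRef{level}-style:", greedy_cluster(toy, level))
```

Lowering the threshold lets more distant sequences join an existing representative. That is why each UniRef level has fewer entries than the one before it.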
UniProt Sequence Databases (cont.)
• UniProt Knowledgebase (UniProtKB)
  – UniProtKB/Swiss-Prot
    • Manually curated, highly annotated sequences from Swiss-Prot and PIRSF, including descriptions, taxonomy, citations, GO terms, motifs, functional and structural classifications, and residue-specific annotations including variations.
    • Some automatic rule-based annotations, including InterPro domains and motifs and PROSITE, PRINTS, ProDom, SMART, Pfam, PIRSF, SUPERFAMILY and TIGRFAMs classifications.
  – UniProtKB/TrEMBL
    • Sequences automatically translated from genomes, including predicted genes as well as RefSeq genes.
    • Automated rule-based annotations.
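You can tell which UniProtKB section an entry comes from by the prefix of its FASTA header. The prefix is `sp` for Swiss-Prot and `tr` for TrEMBL. It is followed by the accession, the entry name, the protein name and tags such as OS= (organism), OX= (taxonomy ID), GN= (gene name), PE= (protein existence) and SV= (sequence version). The parser below is a minimal sketch written against that header layout, not an official UniProt tool. The function name `parse_uniprot_header` is hypothetical, and the example header is shown for illustration only.

```python
import re
from typing import Dict

# The 'db' field at the start of a UniProtKB FASTA header names the section.
SECTIONS = {"sp": "Swiss-Prot (reviewed)", "tr": "TrEMBL (unreviewed)"}

HEADER_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+)\s+(?P<rest>.*)$"
)
# A two-letter tag and its value; the value runs up to the next ' XX=' tag or the end.
TAG_RE = re.compile(r"([A-Z]{2})=(.*?)(?=\s[A-Z]{2}=|$)")


def parse_uniprot_header(header: str) -> Dict[str, str]:
    """Split a UniProtKB FASTA header into section, IDs, protein name and tags."""
    m = HEADER_RE.match(header.strip())
    if m is None:
        raise ValueError(f"not a UniProtKB FASTA header: {header!r}")
    rest = m["rest"]
    # The protein name runs up to the first ' XX=' tag (OS=, OX=, GN=, PE=, SV=).
    first_tag = re.search(r"\s[A-Z]{2}=", rest)
    info = {
        "section": SECTIONS[m["db"]],
        "accession": m["accession"],
        "entry_name": m["entry_name"],
        "protein_name": rest[: first_tag.start()] if first_tag else rest,
    }
    # Tag values may contain spaces (e.g. OS=Homo sapiens).
    for key, value in TAG_RE.findall(rest):
        info[key] = value
    return info


header = ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2"
for key, value in parse_uniprot_header(header).items():
    print(f"{key}: {value}")
```

A `>tr|...` header passes through the same code path and is reported as TrEMBL, so a mixed download can be split into reviewed and unreviewed entries.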
PROTEIN INFORMATION RESOURCE
• PIR was established in 1984 by the National Biomedical Research Foundation (NBRF) as a resource to assist researchers in the identification and interpretation of protein sequence information.
• The Protein Information Resource (PIR) is an integrated public bioinformatics resource that supports genomic, proteomic and systems biology research and scientific studies.
PFAM
• Pfam is a database of curated protein families. Each family is defined by two alignments (a seed alignment and a full alignment) and a profile hidden Markov model (HMM).
• In Pfam, the profile HMM is searched against a large sequence collection based on the UniProt Knowledgebase (UniProtKB) to find all instances of the family. These hits make up the full alignment.
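The sketch below shows the seed-alignment-to-model-to-search workflow in a much simplified form. Pfam's real profile HMMs (built and searched with HMMER) have match, insert and delete states. This sketch keeps only the match-state scores, which reduces the model to a position-specific scoring matrix. It also assumes a gap-free seed alignment, a uniform background frequency and an arbitrary score threshold. The function names, seed alignment and target sequences are hypothetical.

```python
import math
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background frequency for all 20 amino acids.
BACKGROUND = 1.0 / len(AMINO_ACIDS)

Profile = List[Dict[str, float]]


def build_profile(seed: List[str], pseudocount: float = 1.0) -> Profile:
    """Per-column log-odds scores (bits) from a gap-free seed alignment."""
    profile: Profile = []
    for col in range(len(seed[0])):
        # Start every residue at the pseudocount so unseen residues do not score -inf.
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in seed:
            counts[seq[col]] += 1
        total = sum(counts.values())
        profile.append(
            {aa: math.log2((counts[aa] / total) / BACKGROUND) for aa in AMINO_ACIDS}
        )
    return profile


def scan(profile: Profile, sequence: str, threshold: float) -> List[Tuple[int, float]]:
    """Slide the profile along a sequence; return (0-based start, score) hits."""
    width = len(profile)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Sum each residue's match-state score in its column (no gaps allowed).
        score = sum(col[aa] for col, aa in zip(profile, window))
        if score >= threshold:
            hits.append((start, round(score, 2)))
    return hits


# Hypothetical seed alignment and target sequences (not real Pfam data).
seed = ["GKSTL", "GKSTV", "GKTTL", "GRSTL"]
targets = {"seq1": "MAAGKSTLPQ", "seq2": "MPPPPPPPPP"}

profile = build_profile(seed)
for name, seq in targets.items():
    print(name, scan(profile, seq, threshold=5.0))
```

Only the window that resembles the seed columns scores above the threshold. A real profile HMM search would also allow insertions and deletions relative to the model and report statistical significance (E-values).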
PROSITE DATABASE

PROSITE is a database of protein families and domains. It is based on the observation that, while there is a huge number of different proteins, most of them can be grouped, on the basis of similarities in their sequences, into a limited number of families.

Proteins or protein domains belonging to a particular family generally share functional attributes and are derived from a common ancestor.
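PROSITE describes many families with short sequence patterns. The slide does not cover these, but they are part of PROSITE's documented format. In the pattern syntax, elements are separated by '-', 'x' means any residue, [..] lists allowed residues, {..} lists excluded residues, (n) or (n,m) gives a repeat count, '<' and '>' anchor the N- and C-terminus, and a pattern ends with '.'. The sketch below translates that syntax into a Python regular expression. It is a simplification under these assumptions: it does not handle every case in the PROSITE manual, such as '>' inside brackets. The function names and the example sequence are hypothetical. N-{P}-[ST]-{P} is the usual N-glycosylation site pattern.

```python
import re
from typing import List, Tuple

# One element: optional '<', a residue or class, optional (n) or (n,m), optional '>'.
ELEMENT_RE = re.compile(r"(<?)(.+?)(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE pattern into a Python regular expression."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = ELEMENT_RE.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        n_term, core, low, high, c_term = m.groups()
        if core == "x":
            regex = "."                      # x: any residue
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"  # {..}: any residue except these
        else:
            regex = core                     # single residue or [..] class
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        pieces.append(("^" if n_term else "") + regex + ("$" if c_term else ""))
    return "".join(pieces)


def find_sites(pattern: str, sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for every match, including overlapping ones."""
    # A capturing group inside a lookahead lets matches overlap.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


glyco = "N-{P}-[ST]-{P}."
print(prosite_to_regex(glyco))                # N[^P][ST][^P]
print(find_sites(glyco, "MANGSAKNPTQNNSTV"))  # [(3, 'NGSA'), (12, 'NNST'), (13, 'NSTV')]
```

The lookahead is a deliberate choice. Motif sites such as NNST and NSTV can overlap, and a plain non-overlapping search would miss the second one.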