Protein Database

Uploaded by

Yash Kumar
Copyright © All Rights Reserved

PROTEIN DATABASE

Protein databases are biological databases that collect information about proteins.
The information they contain includes the amino acid sequence, the domain structure, the biological function of the protein, its three-dimensional structure, and its interactions with other proteins.

Several protein databases are publicly available. Based on the type of information stored, protein databases can be classified into several categories.

Protein Sequence Databases


A protein sequence database contains the amino acid sequences of proteins and related information. The amino acid sequence of a protein is important because it determines the protein's three-dimensional structure and function, as well as its identity.
Some of the most popular protein sequence databases are:
NDB: Nucleic Acid Database
• The NDB contains information about experimentally determined three-dimensional structures of nucleic acids and their complex assemblies. Although it is often listed alongside protein databases, it covers nucleic acids rather than protein sequences.
• Use the NDB to search by annotations relating to sequence, structure and function, and to download, analyze, and learn about nucleic acids.
• http://ndbserver.rutgers.edu/
PIR
• PIR (Protein Information Resource) is a popular protein sequence database that provides information on functionally annotated protein sequences.
• PIR maintains three databases: the Protein Sequence Database (PSD), the Non-redundant Reference (NREF) sequence database, and the integrated Protein Classification (iProClass) database, which contains annotated protein sequences, classification information, and protein family, function, and structure information.
SWISS-PROT
• SWISS-PROT is a protein sequence database that provides a high level of annotation, including information on the protein's function, domain structure, post-translational modifications, and variants.
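Swiss-Prot entries are commonly distributed through UniProt as FASTA text, where each header begins with `sp|` (reviewed Swiss-Prot) or `tr|` (unreviewed TrEMBL), followed by the accession and the entry name. The Python sketch below is a minimal illustration, not a complete parser: it assumes well-formed `db|accession|entry_name description` headers, the helper names `read_fasta` and `parse_uniprot` are hypothetical, and the example sequence is shortened to its first 20 residues.

```python
from dataclasses import dataclass


@dataclass
class UniProtRecord:
    reviewed: bool      # True for "sp" (Swiss-Prot), False for "tr" (TrEMBL)
    accession: str
    entry_name: str
    description: str
    sequence: str


def read_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs, joining wrapped sequence lines."""
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def parse_uniprot(header: str, sequence: str) -> UniProtRecord:
    """Assumes a header of the form 'db|accession|entry_name description'."""
    identifier, _, description = header.partition(" ")
    db, accession, entry_name = identifier.split("|")
    return UniProtRecord(db == "sp", accession, entry_name, description, sequence)


# Shortened example entry (first 20 residues only, for illustration).
EXAMPLE = """>sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens
MVLSPADKTN
VKAAWGKVGA
"""

for hdr, seq in read_fasta(EXAMPLE):
    print(parse_uniprot(hdr, seq))
```

Keeping the header split separate from the FASTA reader means the same reader also works for non-UniProt FASTA files.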

PDB – Protein Structure database

• The PDB contains information about the 3D structures of proteins.
• PDB files contain experimentally determined 3D structures of biological macromolecules.
• The structure of a protein can be determined by methods such as X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy.
• The PDB allows users to search for information on the structure, sequence, and function of molecules, and to visualize, download, and analyze them.
• PDB files also contain information on data collection, molecule name, primary and secondary structure, ligands, atomic coordinates, crystallographic structure factors, NMR experimental data, etc.
• The data are submitted by scientists from all over the world. The PDB is maintained by the Worldwide Protein Data Bank (wwPDB), and all of its data are publicly accessible.
• Each entry in the PDB has a unique identifier called the PDB ID, a four-character alphanumeric code.
• A PDB file is plain text. Without a proper visualization tool, it reads as a list of atoms and their numerical coordinates in 3D space.
• Several databases contain data derived from the PDB.
For example:
• Structural Classification of Proteins (SCOP), which groups protein structures by their structural and evolutionary relationships,
• HSSP (Homology-Derived Secondary Structure of Proteins), which links 3D structures with the 1D sequences of homologous proteins,
• CATH, which classifies protein structures according to their evolutionary relationships.
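Because a PDB file is plain text with fixed-width columns, its atomic coordinates can be read with a few string slices. The sketch below assumes the classic PDB format (not the newer PDBx/mmCIF format), reads only `ATOM` and `HETATM` records, and checks the classic ID form (a digit from 1 to 9 followed by three alphanumeric characters). The helper names are hypothetical, and the sample atom line is made up for illustration.

```python
import re
from dataclasses import dataclass

# Classic PDB ID: a digit 1-9 followed by three alphanumeric characters (e.g. 4HHB).
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def is_classic_pdb_id(code: str) -> bool:
    return PDB_ID.fullmatch(code) is not None


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> list[Atom]:
    """Read ATOM/HETATM records using the fixed column positions of the PDB format."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append(Atom(
                serial=int(line[6:11]),        # columns 7-11
                name=line[12:16].strip(),      # columns 13-16
                res_name=line[17:20].strip(),  # columns 18-20
                chain=line[21],                # column 22
                res_seq=int(line[22:26]),      # columns 23-26
                x=float(line[30:38]),          # columns 31-38
                y=float(line[38:46]),          # columns 39-46
                z=float(line[46:54]),          # columns 47-54
            ))
    return atoms


def centroid(atoms: list[Atom]) -> tuple[float, float, float]:
    """Geometric centre of the parsed atoms."""
    n = len(atoms)
    return (sum(a.x for a in atoms) / n,
            sum(a.y for a in atoms) / n,
            sum(a.z for a in atoms) / n)


# Made-up example record in classic PDB layout.
SAMPLE_LINE = "ATOM      1  N   MET A   1      38.198  19.582  28.622  1.00 22.73           N"
print(parse_atoms(SAMPLE_LINE)[0])
```

For real work, an established library such as Biopython's `Bio.PDB` also handles mmCIF files and alternate locations, which this sketch ignores.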

Swiss-Prot
• Swiss-Prot is jointly managed by the SIB (Swiss Institute of Bioinformatics) and the EBI (European Bioinformatics Institute).
• The database distinguishes itself from other protein sequence databases by three criteria: (i) annotations, which cover a broad range of information; (ii) minimal redundancy, which ensures that each sequence is represented only once; and (iii) integration with other databases, which enables cross-referencing and retrieval of information from related databases.
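The "minimal redundancy" criterion can be pictured as collapsing repeated submissions of the same sequence into one entry that keeps a list of where the sequence was reported. The sketch below is a deliberate simplification: it merges only exact duplicates (after removing spaces and normalizing case), whereas Swiss-Prot curators merge different reports of the same protein through manual curation. The function name and source IDs are hypothetical.

```python
def merge_identical_sequences(entries):
    """Group (source_id, sequence) pairs so each distinct sequence appears once.

    Returns a dict mapping each normalized sequence to the list of source IDs
    that reported it; this list plays the role of a cross-reference list.
    """
    merged: dict[str, list[str]] = {}
    for source_id, sequence in entries:
        key = sequence.replace(" ", "").upper()  # normalize spacing and case
        merged.setdefault(key, []).append(source_id)
    return merged


# Hypothetical submissions from two sources.
submissions = [("EMBL:X1", "MKT"), ("PIR:Y2", "mk t"), ("EMBL:X3", "MAA")]
print(merge_identical_sequences(submissions))
```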
TrEMBL
• TrEMBL is a computer-annotated supplement to Swiss-Prot. TrEMBL entries follow the Swiss-Prot format.
• It contains translations of all coding sequences in the EMBL (European Molecular Biology Laboratory) nucleotide sequence database that have not yet been integrated into Swiss-Prot.
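TrEMBL entries originate from automatic translation of nucleotide coding sequences. A minimal sketch of that translation step follows, assuming the standard genetic code (NCBI translation table 1) and a coding sequence that is already in frame. It does not handle alternative genetic codes, introns, or frame detection, and `translate` is a hypothetical helper rather than the EMBL pipeline.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate an in-frame coding sequence into a one-letter protein sequence."""
    cds = cds.upper().replace("U", "T")  # accept RNA as well as DNA
    protein = []
    # Step through complete codons only; trailing partial codons are ignored.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # 'X' for codons with ambiguous bases
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))                 # MAIVMGR
print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", to_stop=False))  # MAIVMGR*KGAR*
```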

Protein Structure Databases


Protein structure databases are collections of information related to the three-dimensional
structure and secondary structure of proteins.
Examples include:
SCOP
• SCOP (Structural Classification of Proteins) is a protein structure database that classifies proteins of known structure.
• SCOP organizes proteins into hierarchical levels based on their evolutionary relationships and structural similarities.
• Proteins with high sequence identity, or with similar structure and function, are grouped into families, and families with similar structures but low sequence identity are placed into superfamilies.
• Proteins with the same major secondary structures in the same arrangement are placed into the same fold category, and folds are further grouped into structural classes such as all-α, all-β, α/β, and α+β.
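SCOP writes a domain's position in this hierarchy as a dotted "concise classification string" of the form class.fold.superfamily.family (for example `a.1.1.1`). The sketch below assumes that four-level form and reports the most specific level two domains share. The strings in the usage lines are illustrative, and the function names are hypothetical.

```python
from typing import Optional

LEVELS = ("class", "fold", "superfamily", "family")


def parse_sccs(sccs: str) -> dict[str, str]:
    """Split 'class.fold.superfamily.family' into a level -> identifier dict."""
    parts = sccs.split(".")
    if len(parts) != len(LEVELS):
        raise ValueError(f"expected class.fold.superfamily.family, got {sccs!r}")
    return dict(zip(LEVELS, parts))


def deepest_shared_level(sccs_a: str, sccs_b: str) -> Optional[str]:
    """Return the most specific SCOP level two domains share, or None.

    Levels are compared top-down as a prefix: fold '1' in class 'a' and fold '1'
    in class 'b' are different folds, so comparison stops at the first difference.
    """
    a, b = parse_sccs(sccs_a), parse_sccs(sccs_b)
    shared = None
    for level in LEVELS:
        if a[level] != b[level]:
            break
        shared = level
    return shared


print(deepest_shared_level("a.1.1.1", "a.1.1.2"))  # superfamily
print(deepest_shared_level("a.1.1.1", "b.1.1.1"))  # None
```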

Protein Pattern and Profile Databases


Protein pattern and profile databases contain information on sequence motifs, which correspond to structural or functional features of proteins. Protein sequence patterns and profiles are therefore valuable tools for determining the function of proteins.
InterPro
• InterPro is a database that contains information on protein families, domains, and functional sites.
• It was created by combining several major protein signature databases, including PROSITE, Pfam, PRINTS, ProDom, and SMART, into a single comprehensive resource.
PROSITE
• PROSITE is a collection of signatures that identify patterns or profiles in proteins, which can provide information on their biological functions.
• The signatures in the database are linked to annotation documents that provide information on the protein family or domain detected, including its name, function, 3D structure, and references.
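A PROSITE pattern such as the N-glycosylation site `N-{P}-[ST]-{P}` (entry PS00001) maps naturally onto a regular expression. The converter below is a minimal sketch that covers only the core pattern syntax: `x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, repeat counts such as `x(2,4)`, and the `<`/`>` terminus anchors. It ignores edge cases such as `>` inside brackets, and the function names are hypothetical, not part of PROSITE's own tools.

```python
import re

# One pattern element: x, [ABC], {ABC} or a single residue, optionally with (n) or (n,m).
_ELEMENT = re.compile(r"(x|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a basic PROSITE pattern into an equivalent Python regex string."""
    regex = []
    for element in pattern.rstrip(".").split("-"):
        start_anchor = element.startswith("<")  # '<' = N-terminus
        end_anchor = element.endswith(">")      # '>' = C-terminus
        element = element.strip("<>")
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            core = "."
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"   # forbidden residues
        else:
            core = residue                      # single residue or [ABC] class
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        regex.append(("^" if start_anchor else "") + core + ("$" if end_anchor else ""))
    return "".join(regex)


def find_sites(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}."))           # N[^P][ST][^P]
print(find_sites("N-{P}-[ST]-{P}", "MNGTANPSANKTW"))  # [(2, 'NGTA'), (10, 'NKTW')]
```

The lookahead in `find_sites` lets overlapping sites be reported, which matters for short motifs.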

Applications of protein databases


Protein databases have numerous applications, including:
• Protein databases can be used in sequence analysis to identify homologous sequences and to predict protein functions based on sequence similarity.
• They can also be used to predict protein structure by comparing the amino acid sequence of a protein with proteins of known structure in the database.
• Protein databases also provide tools to study protein-protein interactions.
• Protein pattern and profile databases can be used to identify protein families by detecting conserved motifs.
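Sequence similarity searches ultimately rest on scoring alignments. The sketch below computes a global (Needleman-Wunsch) alignment score with a simple linear gap penalty, keeping only two rows of the dynamic-programming table. The match, mismatch, and gap values are illustrative assumptions; real database search tools such as BLAST use local alignment, substitution matrices like BLOSUM62, and heuristics for speed.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch alignment score with a linear gap penalty."""
    # previous[j] = best score aligning a[:i-1] with b[:j]; row 0 is all gaps.
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            current[j] = max(diagonal,              # align a[i-1] with b[j-1]
                             previous[j] + gap,     # gap in b
                             current[j - 1] + gap)  # gap in a
        previous = current
    return previous[-1]


print(global_alignment_score("HEAGAWGHEE", "HEAGAWGHEE"))  # 10
print(global_alignment_score("AB", "A"))                   # -1
```

A higher score indicates more similar sequences; in practice the score is compared against what would be expected by chance before inferring homology.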

Related resources such as metabolic pathway databases can be used in drug discovery and disease research to study the metabolic pathways involved in diseases.
Protein Visualization Tools

RasMol
• RasMol is a molecular graphics program intended for the visualisation of proteins, nucleic acids and small molecules.
• The program reads in molecule coordinate files and interactively displays the molecule on the screen in a variety of colour schemes and molecule representations.
• Currently available representations include depth-cued wireframes, 'Dreiding' sticks, spacefilling spheres, ball and stick, solid and strand biomolecular ribbons, atom labels and dot surfaces.

PyMOL
• PyMOL is a molecular graphics system with an embedded Python interpreter designed for real-time visualization and rapid generation of high-quality molecular graphics images and animations.
• PyMOL is powerful molecule visualization software with the following main features:
• Able to produce high-quality graphics ready for publications
• Able to create movies
• Able to measure bond distances and angles
• Has an extensive help system
• Structures can be sliced, diced and reassembled on the fly and written out to standard files
• Both a command-line interface and a graphical user interface are provided
• Its functionality can be accessed from Python

To view the structures encoded by these atomic coordinate files (which have the extension .pdb), and to manipulate the images to view the molecules from various perspectives, a molecular graphics visualization tool is required.
• Go to the Structure homepage: https://www.ncbi.nlm.nih.gov/sites/entrez?db=Structure
• Enter the PDB code in the search box and press the Go button.
• Click the structure image to access its record page.
• Scroll to the Molecular Graphics section and click on the spin icon to load an interactive view of the structure within the web page.

One protein, one true structure: ways to look at it using a molecular visualization tool

• Pink is for helix and yellow for sheet.
• This view shows the main atoms in all of the protein's amino acids.
• You can see how much space each atom actually occupies. This is closest to the protein's actual shape.
• The details of the different amino acids have been removed, and all that remains is a stick drawing showing where the protein's backbone goes.
• The individual amino acid structures have again been removed, but the flattened ribbon areas highlight places where the amino acids come together to form spirals (helices) or sheets. This makes it easier to visualize important secondary structures in the protein.
Ab initio-based methods (use single-sequence information)
• These methods predict secondary structures based on statistical calculations of the residues of a single query sequence.
• They measure the relative propensity, or tendency, of each amino acid to belong to a certain secondary structure element.
• The propensity scores are derived from known crystal structures.
• Examples of ab initio prediction are the Chou-Fasman and Garnier, Osguthorpe, Robson (GOR) algorithms.

Homology-based methods (use multiple sequence alignment information)
• Homology-based methods do not rely on statistics of residues of a single sequence, but on common secondary structural patterns conserved among multiple similar sequences.
• The idea behind this approach is that close protein homologs should adopt the same secondary and tertiary structure.
• This homology-based approach has further improved prediction accuracy over the second-generation methods.
• Using evolutionary information from multiple sequence alignments increases the prediction accuracy.
• Some methods use two levels of neural networks, a sequence-to-structure level and a structure-to-structure level, to predict secondary structure.

Chou-Fasman algorithm
• Determines the propensity, or intrinsic tendency, of each residue to be in the helix, strand, and β-turn conformation using observed frequencies found in protein crystal structures.
• http://fasta.bioch.virginia.edu/fasta/chofas.htm
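The ab initio idea of deriving propensity scores from known structures can be sketched in a few lines. The propensity of residue r for state s is computed as (fraction of state-s positions occupied by r) / (overall fraction of r), which equals n(r,s)·N / (n(r)·n(s)). A sliding-window average then flags regions that favour a state. This is a simplified illustration, not the full Chou-Fasman rule set (which adds nucleation and extension rules). The training pair, the window size, and the choice to score unseen pairs as 0 are assumptions, and the function names are hypothetical.

```python
from collections import Counter


def propensity_table(training):
    """training: list of (sequence, states) pairs with one state letter per residue
    (e.g. 'H' helix, 'E' strand, 'C' coil). Returns {(residue, state): propensity}."""
    residue_counts, state_counts, pair_counts = Counter(), Counter(), Counter()
    for sequence, states in training:
        if len(sequence) != len(states):
            raise ValueError("sequence and state string must have equal length")
        for residue, state in zip(sequence, states):
            residue_counts[residue] += 1
            state_counts[state] += 1
            pair_counts[(residue, state)] += 1
    total = sum(residue_counts.values())
    # P(state | residue) / P(state) = n(r,s) * N / (n(r) * n(s))
    return {
        (r, s): n * total / (residue_counts[r] * state_counts[s])
        for (r, s), n in pair_counts.items()
    }


def window_scores(sequence, table, state="H", window=6):
    """Average propensity for `state` over each sliding window; unseen pairs count as 0."""
    scores = []
    for i in range(len(sequence) - window + 1):
        segment = sequence[i:i + window]
        scores.append(sum(table.get((r, state), 0.0) for r in segment) / window)
    return scores


# Toy "known structure" (hypothetical), just to show the calculation.
table = propensity_table([("AAGG", "HHCC")])
print(table[("A", "H")])                    # 2.0
print(window_scores("AAGG", table, "H", 2))  # [2.0, 1.0, 0.0]
```

A propensity above 1 means the residue appears in that state more often than its overall frequency would predict.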
