Bioinformatics Notebook: By: Abdul Hannan Malik
BCS-8B
Graphic Summary
T-Coffee
ClustalW
What is NCBI?
NCBI is one of the leading information repositories for understanding the
language of human cells, i.e., how they are made and what they do. There are only 4 letters,
from which millions, if not trillions, of individual living organisms are composed.
With such a large pool of data, it is difficult to keep track of the various functions and
Thus, enter molecular biology, a discipline that allows us to make sense of these “alphabets”
as they combine into words and phrases. The challenge is to find new ways to handle data at
this scale and to give researchers better access to analysis and computing tools, so that
they can advance our understanding of our own genetic legacy.
Basic Research
and disease.
● NCBI has also created automated systems for storing and analyzing knowledge about
molecular biology.
Responsibilities
● Maintain collaborations with several NIH institutes, academia, industry and other
governmental agencies.
series.
Structure of NCBI
● The Computational Biology Branch conducts basic and applied research in multiple
fields of molecular biology and genetics, which includes genome analysis, sequence
● The Information Resources Branch plans, directs and manages technical operations
of NCBI, including the computer systems used for research and development,
biology and genetics, through sponsoring meetings, workshops and lecture series. A
Scientific Visitors Program has been established to foster collaborations with extramural
scientists.
BioNLP
IBIS
LogOddsLogo
MutaBind
MutaGene
SNPDelScore
Structure
GenBank:
Collaboration), with other known organizations such as DDBJ, ENA and NCBI.
● A GenBank release occurs every two months and is available from the FTP site.
● The release notes for the current version of GenBank provide detailed information
Accessing:
● Search GenBank for sequence identifiers and annotations with Entrez Nucleotide.
● Search and align GenBank sequences to a query sequence using BLAST (Basic Local
Alignment Search Tool). See BLAST info for more information about the numerous
BLAST databases.
● The ASN.1 and flat file formats are available at NCBI's anonymous FTP server.
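The Entrez access route above can also be scripted. The following is a minimal sketch, assuming the public NCBI E-utilities `efetch` endpoint and the `nuccore` database name; the helper names `efetch_url` and `fetch_record` are hypothetical, and AE008975 is simply the accession used in the ORF Finder procedure later in this notebook.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: NCBI E-utilities efetch.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, rettype="fasta"):
    """Build an efetch URL for one nucleotide accession (hypothetical helper)."""
    params = {
        "db": "nuccore",      # Entrez Nucleotide database
        "id": accession,
        "rettype": rettype,   # "fasta" for FASTA, "gb" for the GenBank flat file
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def fetch_record(accession, rettype="fasta"):
    """Download the record as text. Requires network access."""
    with urlopen(efetch_url(accession, rettype)) as resp:
        return resp.read().decode("utf-8")


print(efetch_url("AE008975"))
```

Calling `fetch_record("AE008975")` would return the FASTA text that the manual procedure copies from the web page; it is not called here so that the sketch runs without a network connection.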
Confidentiality
Some authors are concerned that the appearance of their data in GenBank prior to
publication will compromise their work. GenBank will, upon request, withhold release of
new submissions for a specified period of time. However, if the accession number or
sequence data appears in print or online prior to the specified date, your sequence will be
released. In order to prevent the delay in the appearance of published sequence data, we
available, please send the full publication data--all authors, title, journal, volume, pages and
Privacy
If you are submitting human sequences to GenBank, do not include any data that could
reveal the personal identity of the source. GenBank assumes that the submitter has
sequences.
Europe-wide, global impact, infinite curiosity. The European Molecular Biology Laboratory
Governance
EMBL Council comprises national representatives of our member states and is EMBL’s
EMBL pursues five missions in research, services, training, technology transfer and policy
development. Our five-year plans are set out in our Scientific Programme.
Our public engagement and outreach activities seek to ensure wider awareness and
inclusive research and work culture, including the provision of independent and impartial
Mission
It is generally accepted that research in biology today relies on computers and
experimental equipment in equal measure. Information obtained from enormous, exhaustive
datasets has greatly contributed to the paradigm shift in biology.
In silico and in vitro / in vivo analyses together will push back the frontiers of life sciences.
In particular, researchers in life science must rely on computers to analyze nucleotide
sequence data accumulating at a remarkably rapid rate.
The DDBJ Center's role is to carry out research in information biology and to run DDBJ
operations worldwide.
Nucleotide sequences record organismal evolution more directly than other biological
materials and are thus invaluable not only for research in the life sciences but also for
human welfare in general. The database is, so to speak, a common treasure of humankind.
With this in mind, we make the database accessible online to anyone in the world.
Governing Structure
The DDBJ Center operates at the National Institute of Genetics (NIG), Research Organization
of Information and Systems, in Mishima, Japan, with the endorsement of MEXT, the Japanese
Ministry of Education, Culture, Sports, Science and Technology.
The DDBJ Center is reviewed and advised by its own advisory board, the DNA Database
Advisory Committee (an outside committee of NIG), and also by the advisory board to INSDC,
the International Advisory Committee.
Procedure
1. Open the NCBI website (https://www.ncbi.nlm.nih.gov).
a. Select the Nucleotide category and enter AE008975.
b. Click on FASTA and copy a few kb of the sequence.
2. On the NCBI site, go to ORF Finder.
3. Paste the copied sequence and search.
4. Observe the start codon and the ORFs found.
5. Result: start codon = ATG, ORFs found = 14.
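Steps 3 to 5 can be approximated offline. The following is a minimal sketch, not the algorithm ORF Finder itself uses: it scans the three forward frames and the three reverse-complement frames for stretches that begin with ATG and end at the first in-frame stop codon. The function name `find_orfs`, the `min_len` threshold and the toy sequence are assumptions for illustration; ORF Finder applies its own minimum length, start codon and genetic code settings, so its count need not match.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_len=30):
    """List (strand, frame, start, end, orf) for ATG...stop ORFs.

    Coordinates are 0-based, end-exclusive, and for the "-" strand they
    refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # open a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:   # length includes the stop codon
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None                   # look for the next ATG
    return orfs


# Toy sequence (made up for illustration), with one short ORF in frame 2.
toy = "CCATGAAATTTGGGTAACC"
for orf in find_orfs(toy, min_len=9):
    print(orf)
```

Pasting the FASTA sequence copied in step 1b (header line removed) into `find_orfs` gives a rough cross-check of the web result.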
Output
Output
Sequence
Predicted protein(s):
>FGENESH:[mRNA] 1 1 exon (s) 1083 - 4811 3729 bp, chain +
ATGGCGAGCCCTCCGGAGAGCGATGGCTTCTCGGACGTGCGCAAGGTGGGCTACCTGCGC
AAACCCAAGAGCATGCACAAACGCTTCTTCGTACTGCGCGCGGCCAGCGAGGCTGGGGGC
CCGGCGCGCCTCGAGTACTACGAGAACGAGAAGAAGTGGCGGCACAAGTCGAGCGCCCCC
AAACGCTCGATCCCCCTTGAGAGCTGCTTCAACATCAACAAGCGGGCTGACTCCAAGAAC
AAGCACCTGGTGGCTCTCTACACCCGGGACGAGCACTTTGCCATCGCGGCGGACAGCGAG
GCCGAGCAAGACAGCTGGTACCAGGCTCTCCTACAGCTGCACAACCGTGCTAAGGGCCAC
CACGACGGAGCTGCGGCCCTCGGGGCGGGAGGTGGTGGGGGCAGCTGCAGCGGCAGCTCC
GGCCTTGGTGAGGCTGGGGAGGACTTGAGCTACGGTGACGTGCCCCCAGGACCCGCATTC
AAAGAGGTCTGGCAAGTGATCCTGAAGCCCAAGGGCCTGGGTCAGACAAAGAACCTGATT
GGTATCTACCGCCTTTGCCTGACCAGCAAGACCATCAGCTTCGTGAAGCTGAACTCGGAG
GCAGCGGCCGTGGTGCTGCAGCTGATGAACATCAGGCGCTGTGGCCACTCGGAAAACTTC
TTCTTCATCGAGGTGGGCCGTTCTGCCGTGACGGGGCCCGGGGAGTTCTGGATGCAGGTG
GATGACTCTGTGGTGGCCCAGAACATGCACGAGACCATCCTGGAGGCCATGCGGGCCATG
AGTGATGAGTTCCGCCCTCGCAGCAAGAGCCAGTCCTCGTCCAACTGCTCTAACCCCATC
AGCGTCCCCCTGCGCCGGCACCATCTCAACAATCCCCCGCCCAGCCAGGTGGGGCTGACC
CGCCGATCACGCACTGAGAGCATCACCGCCACCTCCCCGGCCAGCATGGTGGGCGGGAAG
CCAGGCTCCTTCCGTGTCCGCGCCTCCAGTGACGGCGAAGGCACCATGTCCCGCCCAGCC
TCGGTGGACGGCAGCCCTGTGAGTCCCAGCACCAACAGAACCCACGCCCACCGGCATCGG
GGCAGCGCCCGGCTGCACCCCCCGCTCAACCACAGCCGCTCCATCCCCATGCCGGCTTCC
CGCTGCTCGCCTTCGGCCACCAGCCCGGTCAGTCTGTCGTCCAGTAGCACCAGTGGCCAT
GGCTCCACCTCGGATTGTCTCTTCCCACGGCGATCTAGTGCTTCGGTGTCTGGTTCCCCC
AGCGATGGCGGTTTCATCTCCTCGGATGAGTATGGCTCCAGTCCCTGCGATTTCCGGAGT
TCCTTCCGCAGTGTCACTCCGGATTCCCTGGGCCACACCCCACCAGCCCGCGGTGAGGAG
GAGCTAAGCAACTATATCTGCATGGGTGGCAAGGGGCCCTCCACCCTGACCGCCCCCAAC
GGTCACTACATTTTGTCTCGGGGTGGCAATGGCCACCGCTGCACCCCAGGAACAGGCTTG
GGCACGAGTCCAGCCTTGGCTGGGGATGAAGCAGCCAGTGCTGCAGATCTGGATAATCGG
TTCCGAAAGAGAACTCACTCGGCAGGCACATCCCCTACCATTACCCACCAGAAGACCCCG
TCCCAGTCCTCAGTGGCTTCCATTGAGGAGTACACAGAGATGATGCCTGCCTACCCACCA
GGAGGTGGCAGTGGAGGCCGACTGCCGGGACACAGGCACTCCGCCTTCGTGCCCACCCGC
TCCTACCCAGAGGAGGGTCTGGAAATGCACCCCTTGGAGCGTCGGGGGGGGCACCACCGC
CCAGACAGCTCCACCCTCCACACGGATGATGGCTACATGCCCATGTCCCCAGGGGTGGCC
CCAGTGCCCAGTGGCCGAAAGGGCAGTGGAGACTATATGCCCATGAGCCCCAAGAGCGTA
TCTGCCCCACAGCAGATCATCAATCCCATCAGACGCCATCCCCAGAGAGTGGACCCCAAT
GGCTACATGATGATGTCCCCCAGCGGTGGCTGCTCTCCTGACATTGGAGGTGGCCCCAGC
AGCAGCAGCAGCAGCAGCAACGCCGTCCCTTCCGGGACCAGCTATGGAAAGCTGTGGACA
AACGGGGTAGGGGGCCACCACTCTCATGTCTTGCCTCACCCCAAACCCCCAGTGGAGAGC
AGCGGTGGTAAGCTCTTACCTTGCACAGGTGACTACATGAACATGTCACCAGTGGGGGAC
TCCAACACCAGCAGCCCCTCCGACTGCTACTACGGCCCTGAGGACCCCCAGCACAAGCCA
GTCCTCTCCTACTACTCATTGCCAAGATCCTTTAAGCACACCCAGCGCCCCGGGGAGCCG
GAGGAGGGTGCCCGGCATCAGCACCTCCGCCTTTCCACTAGCTCTGGTCGCCTTCTCTAT
GCTGCAACAGCAGATGATTCTTCCTCTTCCACCAGCAGCGACAGCCTGGGTGGGGGATAC
TGCGGGGCTAGGCTGGAGCCCAGCCTTCCACATCCCCACCATCAGGTTCTGCAGCCCCAT
CTGCCTCGAAAGGTGGACACAGCTGCTCAGACCAATAGCCGCCTGGCCCGGCCCACGAGG
CTGTCCCTGGGGGATCCCAAGGCCAGCACCTTACCTCGGGCCCGAGAGCAGCAGCAGCAG
CAGCAGCCCTTGCTGCACCCTCCAGAGCCCAAGAGCCCGGGGGAATATGTCAATATTGAA
TTTGGGAGTGATCAGTCTGGCTACTTGTCTGGCCCGGTGGCTTTCCACAGCTCACCTTCT
GTCAGGTGTCCATCCCAGCTCCAGCCAGCTCCCAGAGAGGAAGAGACTGGCACTGAGGAG
TACATGAAGATGGACCTGGGGCCGGGCCGGAGGGCAGCCTGGCAGGAGAGCACTGGGGTC
GAGATGGGCAGACTGGGCCCTGCACCTCCCGGGGCTGCTAGCATTTGCAGGCCTACCCGG
GCAGTGCCCAGCAGCCGGGGTGACTACATGACCATGCAGATGAGTTGTCCCCGTCAGAGC
TACGTGGACACCTCGCCAGCTGCCCCTGTAAGCTATGCTGACATGCGAACAGGCATTGCT
GCAGAGGAGGTGAGCCTGCCCAGGGCCACCATGGCTGCTGCCTCCTCATCCTCAGCAGCC
TCTGCTTCCCCGACTGGGCCTCAAGGGGCAGCAGAGCTGGCTGCCCACTCGTCCCTGCTG
GGGGGCCCACAAGGACCTGGGGGCATGAGCGCCTTCACCCGGGTGAACCTCAGTCCTAAC
CGCAACCAGAGTGCCAAAGTGATCCGTGCAGACCCACAAGGGTGCCGGCGGAGGCATAGC
TCCGAGACTTTCTCCTCAACACCCAGTGCCACCCGGGTGGGCAACACAGTGCCCTTTGGA
GCGGGGGCAGCAGTAGGGGGCGGTGGCGGTAGCAGCAGCAGCAGCGAGGATGTGAAACGC
CACAGCTCTGCTTCCTTTGAGAATGTGTGGCTGAGGCCTGGGGAGCTTGGGGGAGCCCCC
AAGGAGCCAGCCAAACTGTGTGGGGCTGCTGGGGGTTTGGAGAATGGTCTTAACTACATA
GACCTGGATTTGGTCAAGGACTTCAAACAGTGCCCTCAGGAGTGCACCCCTGAACCGCAG
CCTCCCCCACCCCCACCCCCTCATCAACCCCTGGGCAGCGGTGAGAGCAGCTCCACCCGC
CGCTCAAGTGAGGATTTAAGCGCCTATGCCAGCATCAGTTTCCAGAAGCAGCCAGAGGAC
CGTCAGTAG
>FGENESH: 1 1 exon (s) 1083 - 4811 1242 aa, chain +
MASPPESDGFSDVRKVGYLRKPKSMHKRFFVLRAASEAGGPARLEYYENEKKWRHKSSAP
KRSIPLESCFNINKRADSKNKHLVALYTRDEHFAIAADSEAEQDSWYQALLQLHNRAKGH
HDGAAALGAGGGGGSCSGSSGLGEAGEDLSYGDVPPGPAFKEVWQVILKPKGLGQTKNLI
GIYRLCLTSKTISFVKLNSEAAAVVLQLMNIRRCGHSENFFFIEVGRSAVTGPGEFWMQV
DDSVVAQNMHETILEAMRAMSDEFRPRSKSQSSSNCSNPISVPLRRHHLNNPPPSQVGLT
RRSRTESITATSPASMVGGKPGSFRVRASSDGEGTMSRPASVDGSPVSPSTNRTHAHRHR
GSARLHPPLNHSRSIPMPASRCSPSATSPVSLSSSSTSGHGSTSDCLFPRRSSASVSGSP
SDGGFISSDEYGSSPCDFRSSFRSVTPDSLGHTPPARGEEELSNYICMGGKGPSTLTAPN
GHYILSRGGNGHRCTPGTGLGTSPALAGDEAASAADLDNRFRKRTHSAGTSPTITHQKTP
SQSSVASIEEYTEMMPAYPPGGGSGGRLPGHRHSAFVPTRSYPEEGLEMHPLERRGGHHR
PDSSTLHTDDGYMPMSPGVAPVPSGRKGSGDYMPMSPKSVSAPQQIINPIRRHPQRVDPN
GYMMMSPSGGCSPDIGGGPSSSSSSSNAVPSGTSYGKLWTNGVGGHHSHVLPHPKPPVES
SGGKLLPCTGDYMNMSPVGDSNTSSPSDCYYGPEDPQHKPVLSYYSLPRSFKHTQRPGEP
EEGARHQHLRLSTSSGRLLYAATADDSSSSTSSDSLGGGYCGARLEPSLPHPHHQVLQPH
LPRKVDTAAQTNSRLARPTRLSLGDPKASTLPRAREQQQQQQPLLHPPEPKSPGEYVNIE
FGSDQSGYLSGPVAFHSSPSVRCPSQLQPAPREEETGTEEYMKMDLGPGRRAAWQESTGV
EMGRLGPAPPGAASICRPTRAVPSSRGDYMTMQMSCPRQSYVDTSPAAPVSYADMRTGIA
AEEVSLPRATMAAASSSSAASASPTGPQGAAELAAHSSLLGGPQGPGGMSAFTRVNLSPN
RNQSAKVIRADPQGCRRRHSSETFSSTPSATRVGNTVPFGAGAAVGGGGGSSSSSEDVKR
HSSASFENVWLRPGELGGAPKEPAKLCGAAGGLENGLNYIDLDLVKDFKQCPQECTPEPQ
PPPPPPPHQPLGSGESSSTRRSSEDLSAYASISFQKQPEDRQ
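The two records above are consistent with each other: 1242 amino acids plus one stop codon is 1243 codons, and 1243 × 3 = 3729 bp. The following is a minimal sketch of the translation step, assuming the standard genetic code; the `translate` helper is hypothetical and is not part of FGENESH. It is applied only to the first and last few codons of the mRNA shown above.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence codon by codon, stopping at the first stop."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]   # raises KeyError on non-ACGT characters
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCGAGCCCTCCGGAG"))   # first six codons of the mRNA
print(translate("GAGGACCGTCAGTAG"))      # last five codons, including the stop
print(3 * (1242 + 1))                    # expected mRNA length in bp
```

The first call reproduces the opening residues MASPPE of the predicted protein, and the second reproduces its final residues EDRQ.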
Procedure
1. Go to the NCBI database (https://www.ncbi.nlm.nih.gov).
2. Select Gene and search for “Human Insulin”.
Output
5. Go to NCBI BLAST and paste the protein sequence into the search query box.
6. In the Program Selection section, select the ‘PSI-BLAST’ algorithm and click the
BLAST button.
7. When the search finishes, the results are shown in the form of a graph.
Graphic
Multiple Alignment
MSA Viewer
● Graphic Summary
Muscle
1. Go to the NCBI database (https://www.ncbi.nlm.nih.gov).
4. Repeat the same steps for another protein (e.g., cat insulin).
5. Go to the Muscle MSA tool and paste the protein sequences into the input field.
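Multiple sequence alignment tools generally expect all input sequences together in FASTA format. The following is a minimal sketch of assembling that input; the helper name `to_fasta`, the record names and the short sequences are placeholders for illustration, not real insulin sequences.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as multi-FASTA text with wrapped lines."""
    blocks = []
    for name, seq in records:
        seq = "".join(seq.split()).upper()   # drop whitespace and line breaks
        lines = [seq[i:i + width] for i in range(0, len(seq), width)]
        blocks.append("\n".join([">" + name] + lines))
    return "\n".join(blocks) + "\n"


# Placeholder records: replace with the sequences copied from NCBI.
records = [
    ("protein_1_placeholder", "MKVLAAGIVGLLLAQ"),
    ("protein_2_placeholder", "MKVLTAGIVALLLAQ"),
]
print(to_fasta(records), end="")
```

The printed text can be pasted as a single block into the input field of the alignment tool.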
T-Coffee
1. Go to the NCBI database (https://www.ncbi.nlm.nih.gov).
4. Repeat the same steps for another protein (e.g., cat insulin).
5. Go to the T-Coffee tool and paste the protein sequences into the input field.
ClustalW
1. Go to the NCBI database (https://www.ncbi.nlm.nih.gov).
4. Repeat the same steps for another protein (e.g., cat insulin).
5. Go to the ClustalW MSA tool and paste the protein sequences into the input field.
2. Select the ‘Align’ option from the toolbar and select the ‘create new alignment’
option.
4. Then, under the ‘Edit’ option, select the ‘Input Sequence From File’ option. You can
get this sequence file from NCBI.
5. You will need to get more than one sequence in order to align them.
7. After that, go to the ‘Phylogeny’ option on the toolbar and open the file saved above
in MEGA format.
2. Select Nucleotide and search for an RNA sequence (example: C. glauca symC mRNA for
haemoglobin)
c. Graphical Output
Output
Minimum Free Energy Prediction
Graphical Output
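To give a feel for what a secondary-structure predictor computes, the following is a minimal sketch of the Nussinov base-pair maximization algorithm. This is a simplified stand-in and not a minimum free energy method: it only counts the largest number of non-crossing base pairs and ignores stacking and loop energies. The function names, the `min_loop` parameter and the toy sequence are assumptions for illustration.

```python
def can_pair(a, b):
    """Watson-Crick pairs plus the G-U wobble pair."""
    return {a, b} in ({"A", "U"}, {"G", "C"}, {"G", "U"})


def max_base_pairs(rna, min_loop=3):
    """Nussinov dynamic programming: maximum number of non-crossing pairs.

    min_loop is the minimum number of unpaired bases inside a hairpin loop.
    """
    rna = rna.upper().replace("T", "U")
    n = len(rna)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]       # dp[i][j]: best count for rna[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]            # case 1: base j stays unpaired
            for k in range(i, j - min_loop):
                if can_pair(rna[k], rna[j]):   # case 2: base j pairs with base k
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


# Toy hairpin (made up): a three-pair stem closing a four-base loop.
print(max_base_pairs("GGGAAAUCCC"))
```

Energy-based predictors follow the same recursive decomposition but score each substructure with thermodynamic parameters instead of counting pairs.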