Pr1 Biological databases practical
1. Using the National Center for Biotechnology Information (NCBI) GenBank to find information on nucleotide sequences (15 marks)
Then, search for “5.3 class delta-endotoxin gene” using the search bar. This will return multiple hits. Filter the results by selecting Bacillus thuringiensis as the organism in the “Top organisms” list on the right side. Finally, select the best hit for your query based on its name (usually this is the top hit, but not always).
Answer the following questions based on the best-hit record.
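The same search can also be expressed as an NCBI E-utilities `esearch` query. This is a minimal sketch, assuming that the “Top organisms” filter is equivalent to restricting the query with the `[Organism]` field tag; the helper name `build_esearch_url` is hypothetical, and the code only builds the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities esearch service
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db: str, query: str, organism: str, retmax: int = 20) -> str:
    """Return an esearch URL restricting `query` to one organism (hypothetical helper)."""
    # Quoting the organism keeps the two-word name together as a single term
    term = f'{query} AND "{organism}"[Organism]'
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


# "nuccore" is the E-utilities name of the Nucleotide database
url = build_esearch_url("nuccore", "5.3 class delta-endotoxin gene", "Bacillus thuringiensis")
print(url)
```

Fetching this URL returns an XML list of matching record UIDs; choosing the best hit by its name remains a manual step.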
a. What are the GenBank accession and the version for the record? (2 marks)
GenBank: M37263.1 (accession M37263, version 1)
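GenBank writes the version as the accession followed by a dot and an integer that increases each time the sequence itself is changed. A small sketch that splits such an identifier; the function name `split_accession_version` is hypothetical.

```python
def split_accession_version(acc_ver: str) -> tuple[str, int]:
    """Split an 'ACCESSION.VERSION' identifier such as 'M37263.1'."""
    # rpartition splits on the last dot, so the accession part stays whole
    accession, sep, version = acc_ver.rpartition(".")
    if not sep or not version.isdigit():
        raise ValueError(f"not an accession.version identifier: {acc_ver!r}")
    return accession, int(version)


print(split_accession_version("M37263.1"))  # ('M37263', 1)
```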
b. What are the length and the type of the nucleotide sequence? (2 marks)
c. Give the title of the main reference for this record. (1 mark)
Sequence of a lepidopteran toxin gene of Bacillus thuringiensis subsp kurstaki NRD-12
1428
MAM 5108: Microbial Bioinformatics
Dr. Pasan Fernando
e. What are the sequence coordinates for the coding sequence? (1 mark)
f. What is the NCBI protein ID of the protein encoded by the nucleotide sequence? (1 mark)
AAA22420.1
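The protein record can also be downloaded directly by its ID. A sketch that builds an E-utilities `efetch` URL requesting the AAA22420.1 sequence as FASTA text; the helper name `build_efetch_url` is hypothetical, and nothing is downloaded.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch service
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(db: str, uid: str, rettype: str = "fasta") -> str:
    """Return an efetch URL for one record as plain text (hypothetical helper)."""
    params = {"db": db, "id": uid, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


print(build_efetch_url("protein", "AAA22420.1"))
```

Passing `rettype="gp"` instead would request the GenPept flat file, which includes the feature table with the annotated regions.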
g. Access the corresponding protein record by clicking on the protein ID. What is
the length of the amino acid sequence according to the protein record? (1 mark)
1155
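The CDS coordinates and the protein length can be cross-checked: a GenBank CDS location includes the stop codon, which adds three nucleotides but no amino acid. A sketch of that arithmetic, assuming a complete CDS with no partial ends, frameshifts or introns; the function names are hypothetical.

```python
def protein_length_from_cds(start: int, end: int) -> int:
    """Residues encoded by a complete CDS spanning start..end (1-based, inclusive)."""
    cds_length = end - start + 1
    if cds_length % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    # Subtract one codon for the stop codon, which encodes no amino acid
    return cds_length // 3 - 1


def cds_length_for_protein(n_residues: int) -> int:
    """Nucleotides in a complete CDS (stop codon included) for n_residues amino acids."""
    return 3 * (n_residues + 1)


print(cds_length_for_protein(1155))  # 3468
```

Under these assumptions, a complete CDS for the 1155-residue protein would span 3 × (1155 + 1) = 3468 nt.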
h. Give the names and the amino acid sequence coordinates of three distinct
regions of the protein. (6 marks)
Endotoxin_N: 48..251
Endotoxin_M: 259..460
delta_endotoxin_C: 463..606
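The region coordinates use the GenBank `start..end` notation, in which both end points are counted. A sketch that converts them to region lengths; it handles only simple spans (not `join(...)`, `complement(...)` or partial `<`/`>` locations), and the function names are hypothetical.

```python
def parse_span(location: str) -> tuple[int, int]:
    """Parse a simple 'start..end' location (1-based, inclusive)."""
    start, end = (int(x) for x in location.split(".."))
    return start, end


def span_length(location: str) -> int:
    """Number of residues in a 'start..end' span."""
    start, end = parse_span(location)
    # Both end points belong to the region, hence the +1
    return end - start + 1


regions = {
    "Endotoxin_N": "48..251",
    "Endotoxin_M": "259..460",
    "delta_endotoxin_C": "463..606",
}
for name, loc in regions.items():
    print(name, loc, span_length(loc))
```

This gives 204, 202 and 144 residues respectively, all lying within the 1155-residue protein.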
2. Using the NCBI Gene database to find information about genes (15 marks)
Now, search for “endotoxin” in the search bar. This will result in multiple hits. Filter the
results by selecting Bacillus thuringiensis as the organism in the “Top organisms” list on
the right side. Then, click on the “BTHUR0008_RS29305” gene and access the gene
record.
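If the Gene search is run through E-utilities (`esearch` with `db=gene` and `retmode=json`) instead of the web page, the Gene IDs come back inside a JSON object. A sketch of reading them, assuming the IDs are nested under `esearchresult` and then `idlist`; the helper name is hypothetical and the sample string is a hand-written, trimmed stand-in, not a real NCBI response.

```python
import json


def esearch_ids(response_text: str) -> list[str]:
    """Return the UID list from an esearch response requested with retmode=json."""
    result = json.loads(response_text)["esearchresult"]
    # 'count' is the total number of hits; 'idlist' holds at most retmax UIDs
    return list(result["idlist"])


# Hand-written, trimmed stand-in for a response (not fetched from NCBI)
sample = '{"esearchresult": {"count": "1", "retmax": "1", "retstart": "0", "idlist": ["67470296"]}}'
print(esearch_ids(sample))  # ['67470296']
```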
a. What are the Gene ID and gene symbol for this record? (2 marks)
Gene ID: 67470296
Gene symbol: BTHUR0008_RS29305
protein coding
Bacillus thuringiensis serovar berliner ATCC 10792 (strain: ATCC 10792, serovar: berliner, culture-collection: ATCC:10792, type-material: type strain of Bacillus thuringiensis)
d. Give the gene symbols of two adjacent genes to this particular gene in its
genome. (2 marks)
BTHUR0008_RS3434
BTHUR0008_RS29310
f. Access the RefSeq genomic sequence for the gene. What is the GenBank
accession number of this RefSeq genomic sequence (with the sequence region)?
What is the original sequence record corresponding to this genomic sequence?
(4 marks)
g. List two reasons for preferring the NCBI Gene database over NCBI GenBank when retrieving gene sequences. (2 marks)
Now, use the UniProtKB ID you found in question 2(h) to find the corresponding protein
record and answer the following questions based on the record.
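The UniProtKB entry can also be retrieved through the UniProt REST API, which serves an entry at a URL of the form `https://rest.uniprot.org/uniprotkb/<accession>.<format>`. A sketch; `ACCESSION_FROM_2H` is a hypothetical placeholder for the ID found in question 2(h), and the helper name is hypothetical.

```python
# Hypothetical placeholder: replace with the UniProtKB accession from question 2(h)
ACCESSION = "ACCESSION_FROM_2H"

UNIPROT_REST = "https://rest.uniprot.org/uniprotkb"


def uniprot_entry_url(accession: str, fmt: str = "json") -> str:
    """URL of one UniProtKB entry in the requested format, e.g. 'json', 'fasta' or 'txt'."""
    return f"{UNIPROT_REST}/{accession}.{fmt}"


print(uniprot_entry_url(ACCESSION, "fasta"))
```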
a. What are the protein and gene names as given in the record? (2 marks)
c. Give one molecular function Gene Ontology (GO) term and one biological
process GO term associated with this protein. (2 marks)
e. Is a 3-dimensional structure available for this protein? If it is, what is the source
of the structure? (2 marks)
Yes. Source: AlphaFoldDB (a computationally predicted structure)
f. What is the protein family that contains this particular protein? (1 mark)
h. Give sequence coordinates of a region where polar amino acid residues are
overrepresented in this protein sequence. (1 mark)
423-439
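UniProt coordinates such as 423-439 are 1-based and inclusive, so in Python that region is `seq[422:439]`. The sketch below extracts a region that way and reports the fraction of residues in an assumed polar set (S, T, N, Q). That set and the simple counting are illustrative assumptions, not the method behind UniProt's compositional-bias annotation, and the sequence is a made-up toy string.

```python
# Assumed set of polar residues (illustrative only)
POLAR = set("STNQ")


def region(seq: str, start: int, end: int) -> str:
    """Residues start..end of seq, using 1-based inclusive coordinates."""
    return seq[start - 1:end]


def polar_fraction(segment: str) -> float:
    """Fraction of residues in the segment that belong to POLAR."""
    return sum(aa in POLAR for aa in segment) / len(segment)


toy = "MKLV" + "SNTQSS" + "LLAG"  # toy sequence, not the real protein
segment = region(toy, 5, 10)
print(segment, polar_fraction(segment))  # SNTQSS 1.0
print(polar_fraction(toy))
```

A region counts as "overrepresented" when its polar fraction is clearly higher than that of the sequence as a whole.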
i. Give the accession numbers of the European Nucleotide Archive (ENA) and the
Protein Information Resource (PIR) databases for this record. (2 marks)
ENA: CP004134
PIR: S00873
j. Give the UniProt IDs and the organism names for two similar proteins which
have 100% identity with this protein sequence. (2 marks)
M1QWV7_BACTU: Bacillus thuringiensis serovar thuringiensis str. IS5056
A3RLZ7_BACTU: Bacillus thuringiensis
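100% identity means that every aligned position carries the same residue. A sketch of that calculation for two already-aligned, gap-free sequences of equal length; real tools compute identity over alignments that may contain gaps and may use a different denominator, so this shows only the basic idea. The function name and toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions with identical residues (gap-free, equal-length input)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


print(percent_identity("MKTAYIAK", "MKTAYIAK"))  # 100.0
print(percent_identity("MKTAYIAK", "MKTAYLAK"))  # 87.5
```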
k. Access the Pfam entry for this particular protein using the cross-references
section. According to the Pfam record, list four distinct Pfam domains found in
this protein. (4 marks)
Cry1Ac_D5
Endotoxin_C
Endotoxin_C2
Endotoxin_M