lecture2-BGGN213 F17
Sections 1 and 2 deal with querying and searching GenBank, GENE and OMIM
databases at NCBI. Sections 3 and 4 provide exposure to EBI resources for comparing
proteins and visualizing protein structures. Finally, section 5 provides an opportunity to
explore these and other databases further with additional examples.
Section 1
The following transcript was found to be abundant in a human patient’s blood sample.
>example1
ATGGTGCATCTGACTCCTGTGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAG
TTGGTGGTGAGGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGG
GGATCTGTCCACTCCTGATGCAGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGT
GCCTTTAGTGATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACT
GTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCA
TCACTTTGGCAAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAAT
GCCCTGGCCCACAAGTATCACTAAGCTCGCTTTCTTGCTGTCCAATTT
The only information you are given is the above sequence so you must begin your
investigation with a sequence search - for this example we will use NCBI’s BLAST
service at: http://blast.ncbi.nlm.nih.gov/
Note that there are several different “basic BLAST” programs available at NCBI
(including nucleotide BLAST, protein BLAST, and BLASTx).
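As a rough guide to choosing among these programs, the sketch below maps the type of the query and the type of the database to the usual BLAST program name. The function name `choose_blast_program` and the type labels are illustrative assumptions for this handout, not part of any NCBI tool.

```python
# Illustrative lookup: (query type, database type) -> BLAST program name.
# The labels "nucleotide" and "protein" are this sketch's own convention.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",  # nucleotide query vs nucleotide database
    ("protein", "protein"): "blastp",        # protein query vs protein database
    ("nucleotide", "protein"): "blastx",     # translated nucleotide query vs protein database
    ("protein", "nucleotide"): "tblastn",    # protein query vs translated nucleotide database
}


def choose_blast_program(query_type: str, database_type: str) -> str:
    """Return the BLAST program matching the query and database types."""
    key = (query_type.lower(), database_type.lower())
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"no program known for query/database types {key}")
    return BLAST_PROGRAMS[key]


# example1 is a nucleotide sequence:
print(choose_blast_program("nucleotide", "nucleotide"))  # search of a nucleotide database
print(choose_blast_program("nucleotide", "protein"))     # search of a protein database
```

Because example1 is a nucleotide sequence, a nucleotide database calls for nucleotide BLAST, while a protein database (such as the PDB sequences used in section 4) calls for BLASTx.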
Searching against the “Nucleotide collection” (nr/nt database), which includes GenBank, is
a good place to start your investigation of this sequence.
Q2: What are the names and accession numbers of the top four hits from your
BLAST search?
Q3: What are the percent identities for the top few hits?
[HINT: scroll down to the alignment section of your BLAST result page for details of
matched nucleotides]
Q4: How many identical and non-identical nucleotides are there in your top hit
compared to your last reported hit?
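The identity figures that BLAST reports can be reproduced by hand from any pairwise alignment. The following is a minimal sketch, assuming the two aligned strings have equal length and use `-` for gaps; the function name `identity_stats` and the toy alignment are hypothetical and are not taken from the BLAST results.

```python
def identity_stats(aligned_query: str, aligned_subject: str) -> dict:
    """Count identical, mismatched and gapped columns in a pairwise alignment."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    identical = mismatched = gapped = 0
    for q, s in zip(aligned_query.upper(), aligned_subject.upper()):
        if q == "-" or s == "-":
            gapped += 1          # a gap in either sequence
        elif q == s:
            identical += 1       # matching nucleotides
        else:
            mismatched += 1      # aligned but different nucleotides
    columns = len(aligned_query)
    percent = 100.0 * identical / columns if columns else 0.0
    return {
        "identical": identical,
        "mismatched": mismatched,
        "gapped": gapped,
        "percent_identity": percent,
    }


# Toy alignment (hypothetical): one mismatch and one gap in ten columns.
stats = identity_stats("ACGTACGTAC", "ACGTTCG-AC")
print(stats)
```

Here percent identity is taken as identical columns divided by the alignment length, gaps included, which corresponds to the "Identities" fraction shown for each alignment.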
From the results of your BLAST search you can link to the GENE entry for one of your
top hits. This link is located under the “Related Information” heading at the right-hand
side of each displayed alignment (i.e. scroll down to the “Alignments” section).
Q5: What is the “Official Symbol” and “Official Full Name” for this gene?
Q8: How many exons and introns are annotated for this gene?
Q10: Does the protein have a role in human disease(s)? If so what diseases?
[HINT: Scroll down to the “Phenotypes” section of the GENE entry page and also
explore the link to the OMIM database]
Section 2
By now you should be aware that the example sequence corresponds to human sickle
cell beta-globin mRNA and that this disease results from a point mutation in the β globin
gene. In the following section, you will compare sickle cell and normal β globin
sequences to reveal the nature of the sickle cell mutation at the protein level.
To do this you need to find at least one sequence representing the normal beta globin
gene. Open a new window and visit the NCBI home page (http://www.ncbi.nlm.nih.gov)
and select “Nucleotide” from the drop-down menu associated with the top search box. Then
enter the search term: HBB
Note that lots of irrelevant results are returned, so let's apply some “Filters” (available by
clicking in the left-hand sidebar) to focus on RefSeq entries for Homo sapiens.
Remember that we are after mRNA so we can compare to the mRNA sequence from
section 1 above.
Q11: What is the ACCESSION number of the “Homo sapiens hemoglobin, beta
(HBB), mRNA” entry?
NOTE: Boolean operators (NOT, AND, OR) as well as fielded queries (e.g. “HBB[Gene
Name] AND Human[Organism]”) can be used in Entrez searches to filter results
for more efficient searching.
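The same fielded query can also be assembled programmatically. The sketch below only builds the query string and a search URL; it does not contact NCBI. The E-utilities `esearch.fcgi` endpoint and its `db` and `term` parameters are used here as an assumption about how one might submit the query outside the web page, and the helper name `build_term` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint for NCBI E-utilities searches (not mentioned in this handout).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(clauses, operator="AND"):
    """Join (value, field) pairs into an Entrez fielded query string."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError("operator must be AND, OR or NOT")
    return f" {operator} ".join(f"{value}[{field}]" for value, field in clauses)


term = build_term([("HBB", "Gene Name"), ("Human", "Organism")])
# "nuccore" is assumed here as the database name for nucleotide records.
url = ESEARCH + "?" + urlencode({"db": "nuccore", "term": term})

print(term)
print(url)
```

The printed term is exactly the string you would type into the search box; the URL form simply percent-encodes the brackets and spaces.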
Select “Homo sapiens hemoglobin, beta (HBB), mRNA” from the results and scroll down
to the FEATURES section to answer the following.
Section 3
Here we will compare the retrieved sequences by creating a sequence alignment. This
will make the difference between the two sequences easy to spot.
To generate the alignment, we will use MUSCLE available on the EBI website
at: http://www.ebi.ac.uk/Tools/msa/muscle/
Select the FASTA display for the “Homo sapiens hemoglobin, beta (HBB),
mRNA” (NM_000518) entry from section 2.
The two sequences should now be aligned. Where the aligned sequences are identical,
an * is placed under the alignment. Examine the results and note that your sequences
are nearly identical. However, being much shorter, the sickle cell sequence has many
padding gap characters (-----) to bring equivalent regions into the correct register.
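To make the `*` line and the padding gaps concrete, here is a minimal sketch that rebuilds both for a pair of already-aligned sequences. The function names and the toy sequences are hypothetical; the real alignment is much longer.

```python
def conservation_line(aligned_a: str, aligned_b: str) -> str:
    """Return a marker line with '*' under every identical, non-gap column."""
    return "".join(
        "*" if a == b and a != "-" else " "
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
    )


def leading_gaps(aligned: str) -> int:
    """Count the gap characters padding the start of an aligned sequence."""
    return len(aligned) - len(aligned.lstrip("-"))


# Toy example (hypothetical): the second sequence lacks the first three bases.
longer = "GGCATGGTGCAT"
shorter = "---ATGGTGCAT"

print(longer)
print(shorter)
print(conservation_line(longer, shorter))
print(leading_gaps(shorter))
```

In this toy case the shorter sequence needs three leading gaps, and every remaining column carries a `*`.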
You can also click on the “Results Summary” tab and launch the JalView plugin to
display a colored version of the alignment.
Q14: How many gap characters (-) are added to the beginning of the sickle cell beta-
globin sequence in order to align it with the beta globin sequence? How might you
have guessed this number from information you read in the GenBank annotation?
[HINT: See section 2, Q13]
Q15: Ignoring ambiguity codes (Y and N), what is the single difference between the
two sequences?
Q16: Which codon position from the start of the sickle cell sequence would this
difference affect? What amino acid would the different codons encode in the two
sequences?
[HINT: use the codon table above to help]
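The reasoning behind this question (find the differing nucleotide, work out which codon it falls in, then translate both codons) can be sketched as follows. This assumes both sequences start at the first base of the start codon and are in the same reading frame; the toy sequences and function names are hypothetical, and the table is the standard genetic code.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def first_difference(seq_a: str, seq_b: str):
    """Return the 1-based position of the first mismatch, or None if there is none."""
    for index, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if a != b:
            return index + 1
    return None


def codon_at(cds: str, position: int):
    """Return (codon number, codon) for a 1-based nucleotide position."""
    codon_number = (position - 1) // 3 + 1
    start = 3 * (codon_number - 1)
    return codon_number, cds[start:start + 3].upper()


# Toy coding sequences (hypothetical), differing at a single base.
reference = "ATGAAACCCGGG"
variant = "ATGAAACTCGGG"

position = first_difference(reference, variant)
number, ref_codon = codon_at(reference, position)
_, var_codon = codon_at(variant, position)
print(position, number, ref_codon, CODON_TABLE[ref_codon], var_codon, CODON_TABLE[var_codon])
# prints: 8 3 CCC P CTC L
```

Note that the codon number here counts from the start codon, so codon 1 is ATG.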
Section 4
In this section we will retrieve and visualize the 3D protein structure of sickle cell
haemoglobin. The aim here is to ascertain how the Glu6 -> Val6 mutation might cause
the mutant molecules to oligomerise into fibers, hence deforming erythrocytes. This will
require you to examine the structural context of the mutation in the β globin chains.
We could find sickle cell haemoglobin structures via a text search of the main PDB website
at http://www.rcsb.org/. However, as we know the nucleotide sequence from our
previous work, let's use BLASTX to search the PDB database from the NCBI site.
Note the accession numbers and alignment statistics for the top few hits.
Q17: Is there a PDB structure with 100% identity to your example1 query sequence?
For this section we will use the online NGL Viewer, which has more advanced
display options than the viewers currently available at NCBI or the PDB itself.
Now we can display the porphyrin by entering “not polymer and not water” in the white box.
To have a close look at Val 6 in Chain H, we can add “or 6:H” here also.
NOTE: Some folks have reported issues using NGL with older versions of the Chrome
browser. The workaround is to use a different web browser. If the structure is still not
displayed correctly for you, download its coordinates from the PDB database at
http://www.rcsb.org/ and ask for assistance.
If deemed appropriate, and you are working on your own computer, you may consider
updating your version of Java by downloading it from:
https://www.java.com/en/download/manual.jsp
Try zooming (via scrolling up and down) and rotating (via clicking and moving your
mouse). You can always “reset” the view by clicking the target-like circular icon. Also
experiment with different settings and views.
Q18: What do you notice about the location of the Val6 residue in chain H of the
2HBS structure in relation to porphyrin?
[HINT: see Figure below.]
In this representation, one of the central mutant β chains is highlighted in orange ribbon.
Also highlighted are the side chain of the E6V (i.e. Val6) mutation (white) and the
porphyrin prosthetic group (ball and stick representation).
Discussion:
The original paper discussing the 2HBS crystal structure is available online:
http://www.sciencedirect.com/science/article/pii/S0022283697912535
In this article, Figure 3 demonstrates how the Glu6->Val6 mutation could result in the
characteristic "sickle" phenotype. The charged Glu6 mutating to Val6 creates a
superficial hydrophobic patch on one HbS molecule that interacts with hydrophobic
surface residues of another. The molecules thus polymerize, creating extended fibers
that distort the shape of the red blood cell.
The sickled blood cells have a short lifetime and cannot be replaced fast enough,
leading to chronic anaemia. Sickle cell anemia was one of the first diseases to be linked
to a defect at the molecular level, providing a clear demonstration that a single base
mutation can change a single amino acid, which in turn can result in a defective protein.
Section 5 (Optional)
Pick one of the following three genes to investigate. Again the only identifying
information you are given is a nucleotide or peptide sequence. Use the various NCBI
and EBI resources to answer questions 5 to 10 from section 1. However, do not limit
yourself to these five questions as you may find other directions of exploration more
interesting. As always, please ask for assistance if you get stuck.
>Transcript 1
gccactgccaacatttcccttcttccagttgcactattctgagggaaaatctgacaccta
agaaatttactgtgaaaaagcattttaaaaagaaaaggttttagaatatgatctatttta
tgcatattgtttataaagacacatttacaatttacttttaatattaaaaattaccatatt
atgaaattgctgatagta
Gene 2 – the following cDNA sequence maps to a human genomic location identified by
mapping as being of interest.
>Transcript 2
ctgcgagaagagcagcgacacttgcaaccccctgtcaggcgccttctcaggagtgtccaa
cattttcagcttctggggggacagtcggggccgccagtaccaggagctccctcgatgccc
cgcccccacccccagcctcctcaacatccccctctccagcccgggtcggcggccccgggg
cgacgtggagagcaggctggatgccctccagcgccagctcaacaggctggagacccggct
gagtgcagacatggccactgtcctgcagctgctacagaggcagatgacgctggtcccgcc
Gene 3 – The following peptide was found to be more abundant in human patient
samples vs. control samples by mass spec analysis.
>Peptide 3
achpcspmck
Q19: What one part of this exercise or associated lecture material is still confusing?
If appropriate please also indicate the question number from this document and
answer the question in the following anonymous form:
http://tinyurl.com/bggn13-02
[Your comments will let us know which material needs to be further clarified and will
help us gain stronger control of the material in this course. Thank you!]
Appendix
http://www.rcsb.org/pdb/files/2hbs.pdb
The mutation causing sickle cell anemia is a single nucleotide substitution (A to T) in the
codon for amino acid 6. The change converts a glutamic acid codon (GAG) to a valine
codon (GTG), replacing a hydrophilic amino acid with a hydrophobic one; see
http://themedicalbiochemistrypage.org/sicklecellanemia.php
Note there is also a T -> A difference at position 162 (162/3 => codon 54 GCT -> GCA).
This is in the third position of the codon and hence does not change the corresponding
amino-acid.
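The arithmetic in these notes can be checked with a few lines of code. This is a minimal sketch: the small codon dictionary lists only the four codons mentioned in this appendix, and the helper names are hypothetical.

```python
import math

# Only the codons discussed in this appendix (standard genetic code).
CODONS = {"GAG": "Glu", "GTG": "Val", "GCT": "Ala", "GCA": "Ala"}


def codon_number(position: int) -> int:
    """Codon containing a 1-based nucleotide position, counting from the start codon."""
    return math.ceil(position / 3)


def position_in_codon(position: int) -> int:
    """Return 1, 2 or 3: where the nucleotide sits within its codon."""
    return (position - 1) % 3 + 1


def is_synonymous(codon_a: str, codon_b: str) -> bool:
    """True if both codons encode the same amino acid."""
    return CODONS[codon_a] == CODONS[codon_b]


print(codon_number(162), position_in_codon(162))  # 54 3
print(is_synonymous("GCT", "GCA"))                # True: both encode Ala
print(is_synonymous("GAG", "GTG"))                # False: Glu versus Val
```

The first difference is therefore silent at the protein level, while the second changes the encoded amino acid.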