Human Variation Disease Handson
The Gene database is the central place to obtain biomolecular data related to a
particular gene. This exercise shows how to find data (sequence, SNPs, and
phenotypes) associated with the human apolipoprotein E gene and maps variants
associated with disease onto the genomic sequence for the human reference genome
and J. Craig Venter’s personal genome.
Search for APOE in the PubMed database. This search triggers the gene sensor,
one of the discovery-related sensors that provide rapid access to potentially more
relevant records, in this case from the Entrez Gene database.
Follow the link to the human APOE gene (Gene ID: 348) in the gene sensor ad.
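The same lookup can also be scripted through the NCBI E-utilities service rather than the web pages. The sketch below only builds the request URLs and does not contact NCBI. The query string (gene symbol restricted to human) is an assumption about how one might phrase the search, not something taken from the workshop.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch_url(db, term):
    """Build an ESearch URL that returns matching record IDs for a query."""
    return f"{EUTILS}/esearch.fcgi?" + urlencode({"db": db, "term": term})

def esummary_url(db, uid):
    """Build an ESummary URL for one record ID."""
    return f"{EUTILS}/esummary.fcgi?" + urlencode({"db": db, "id": uid})

# Assumed query: the gene symbol, restricted to human records.
search = esearch_url("gene", "APOE[sym] AND human[orgn]")
# Gene ID 348 is the human APOE record named in the text.
summary = esummary_url("gene", 348)
print(search)
print(summary)
```

Opening either URL in a browser returns XML by default.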
The Gene record provides rapid access to NCBI reference sequence transcript, protein,
and genomic sequences including three different genome builds as shown in the
Sequences, Genomes and Maps module. The graphical sequence viewer allows rapid
access to the various features on these reference sequences.
The Gene record also provides access to phenotypes associated with variations in the
gene. Use the Table of Contents to jump to the Phenotypes section of the record.
2 NCBI Discovery Workshops: Human Variation and Disease Genes 08/28/2012
The data in this table come from NHGRI’s compiled results of Genome Wide
Association Studies and from traditional sources such as Online Mendelian Inheritance
in Man (OMIM). Click open the Alzheimer disease-2 link and follow the link to
MIM:104310.
OMIM contains literature reviews on human diseases with genetic components as well
as separate articles on genes. MIM:104310 is a phenotype article about late-onset
Alzheimer’s disease, associated with the risk allele called APOE-E4. Follow the link in
the first paragraph of the Alzheimer disease-2 article to MIM:107741.
Use the Table of Contents to jump to the Allelic Variants section of the record and scroll
down to .0016 ALZHEIMER DISEASE 2 [APOE, CYS112ARG].
Retrieve the linked SNP summary. The SNP summary shows the source and provides
analysis for the RefSNP record. The RefSNP is a non-redundant record that
summarizes information from a number of submitted SNP records. Jump to the
Submitted records section of the RefSNP record. Notice that there are 25 submitted
SNPs. A SNP submission contains the variable residue and flanking sequence. The
longest submitted SNP record (ss76884559) was used to create the RefSNP record
(rs429358).
Look at the integrated maps section of the SNP report. Notice that the Celera
and HuRef (JC Venter) genomes both have the ‘C’ allele at this position, while the
reference genome, GRCh37, has the ‘T’ allele.
The Gene View section shows that the ‘C’ allele codes for Arg at the corresponding
position in the protein. This is the risk allele for late-onset Alzheimer disease.
A more detailed Gene View is available by clicking the “Go” button in the Gene View
section of the report. First check the box for “include clinically associated” and then click
“Go”.
This view is also available through the “SNP: GeneView” link from the Gene record.
The more detailed Gene View maps all SNPs onto the sequences for the APOE gene
and indicates their effect on the coding regions. The radio buttons adjust which
SNPs are displayed: those in the coding region only, those near the gene, or those
with population genetic data. Find rs429358 in the Gene View table.
This residue change is only one of the two positions that define the e-4 allele. The
other polymorphic position is represented by rs7412, another Cys/Arg change. Both of
the most common alleles, e-3 (in the reference genome) and e-4 (in the JC Venter
genome), have an Arg at the rs7412 position.
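The relationship between the two SNPs and the named APOE alleles can be written as a small lookup table. This is a minimal sketch that assumes alleles are reported on the forward strand of the reference genome, where T encodes Cys and C encodes Arg at both positions. The function name and the "unclassified" fallback are illustrative choices.

```python
# Keys are (rs429358 allele, rs7412 allele) on the forward strand.
# At both positions T encodes Cys and C encodes Arg.
APOE_HAPLOTYPES = {
    ("T", "T"): "e-2",  # Cys at rs429358, Cys at rs7412
    ("T", "C"): "e-3",  # Cys at rs429358, Arg at rs7412
    ("C", "C"): "e-4",  # Arg at rs429358, Arg at rs7412
}

def apoe_allele(rs429358, rs7412):
    """Return the APOE allele name for one chromosome's pair of SNP alleles."""
    key = (rs429358.upper(), rs7412.upper())
    return APOE_HAPLOTYPES.get(key, "unclassified")

print(apoe_allele("T", "C"))  # haplotype described for the reference genome
print(apoe_allele("C", "C"))  # haplotype described for the HuRef genome
```

The function describes a single chromosome. A person's genotype would be a pair of such results, which requires phased data.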
Another view that emphasizes clinically associated variations is available through the
Variation Viewer, linked through the VarView icon at the upper right of the Gene View.
http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes/
To see the distribution of genotypes for rs429358, enter the position of this variant on
chromosome 19 in the search box: chr19:45,411,941.
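When positions like this one are collected for scripted use, it helps to normalize them into a chromosome name and an integer coordinate. The following sketch handles the format used above. The function name is hypothetical, and the pattern is an assumption about which input styles are worth accepting.

```python
import re

def parse_position(text):
    """Split a 'chrN: 1,234,567' style string into (chromosome, position)."""
    match = re.fullmatch(r"\s*(?:chr)?(\w+)\s*:\s*([\d,]+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized position: {text!r}")
    chrom, pos = match.groups()
    # Thousands separators are dropped before converting to an integer.
    return chrom, int(pos.replace(",", ""))

print(parse_position("chr19: 45,411,941"))
```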
You can also link to the browser from the RefSNP cluster report for rs429358 as shown
immediately below.
Expand rows for populations to see individual level genotypes, for example the Luhya,
from Kenya, as shown above.
-Search literature citations in PubMed, using the disease name and the name of an
expert investigator.
-Select PMID: 19277060, which will provide a general background on the disease. You
can go to the free article at the publisher’s site and view Table 1 for genetic
classification of the disease. Numerous variants/genes can lead to a CMT-type phenotype.
For the purpose of this exercise, we are going to follow the discovery of one of the
disease genes, namely FIG4 (select PMID: 17572665 and link to the free full text in
PMC), and simulate some steps of the discovery process on the NCBI web pages.
In a mouse colony, the mutant mice had impaired motor coordination and muscle
weakness. The phenotype resembled human CMT. By genetic mapping, the mutation
was localized to the genomic region between two markers (microsatellite
and SNP): D10Umi13 and D10Mit184. D10Mit184 is annotated on the mouse reference
genomic sequence. D10Umi13 is not annotated, but the primers are listed within the
article.
Step two: Investigation of the genomic region between D10Umi13 and D10Mit184
(NCBI website):
What genes/transcripts are annotated for the region between the two markers?
To address this question, find the location of D10Umi13 on the mouse genome.
-Use primer information provided in the article and Primer BLAST to locate D10Umi13 on
the genomic contig
Forward: 5′-CCACCACATCAACAGGCTCACAGG
Reverse: 5′-AATGCAACCGTGACACAAGTACAC
-Access the genomic region (Nucleotide database) and display it graphically (Display
Settings → Graphics).
-Find D10Mit184 in the surrounding genomic region and interrogate the region between
these two markers. For example, locate the Fig4 (A530089I17Rik) transcript.
What type of sequence information (e.g. genomic, mRNA, protein) can you obtain for
the Fig4 (A530089I17Rik) transcript? (Do you see exon numbering on the genomic
sequence?) How could a researcher use this information in an experiment?
In the study, reverse transcription polymerase chain reaction (RT-PCR) was used to
analyze transcripts from brain tissue of healthy and diseased mice. Although
individual primers are not listed in the study, we can assume that several exon-specific
primers were generated for each candidate transcript. The group eventually
demonstrated that the Fig4 (A530089I17Rik) transcript was truncated after exon 18 and
further established that the cause was a transposon insertion in the
genomic region of this gene.
In the published study, a mouse model was used to mimic a human neurological
disorder. The genetic basis of the mouse disease helped focus attention on a single
gene in CMT individuals lacking clinical variants in already known CMT-associated genes.
With FIG4 as the candidate gene for the human disease, the investigators were able
to sequence this gene in affected individuals and find single nucleotide variants that
introduced a stop codon in the genomic region of exon 4 and therefore resulted in
protein truncation. However, this mutation resulted in the disease phenotype *only*
when accompanied by the single nucleotide variant I41T.
To address this question, link to the mouse Gene record and then link to the
Homologene database.
Link from Homologene to the human Gene record. Find information/links pertaining to
disease phenotypes and variation.
Find the ARG183TER variant within the protein sequence. Is this variant reported in the
NCBI SNP database as a clinically associated variant?
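Variant names appear in two styles in this exercise: the OMIM-style three-letter form (ARG183TER, CYS112ARG) and the one-letter form (I41T). A short sketch of converting the first style to the second is shown here. The function names are hypothetical, and the pattern only covers simple substitutions and stop codons.

```python
import re

# Three-letter amino acid codes mapped to one-letter codes.
# "TER" marks a stop codon and is written as "*".
AA = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
    "TER": "*",
}

def parse_variant(name):
    """Split a name such as 'ARG183TER' into (reference, position, variant)."""
    match = re.fullmatch(r"([A-Z]{3})(\d+)([A-Z]{3})", name.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized variant name: {name!r}")
    ref, pos, alt = match.groups()
    return AA[ref], int(pos), AA[alt]

def short_form(name):
    """Render the variant in compact one-letter style, e.g. 'R183*'."""
    ref, pos, alt = parse_variant(name)
    return f"{ref}{pos}{alt}"

print(short_form("ARG183TER"))
print(short_form("CYS112ARG"))
```

Having the residue position as an integer makes it easier to locate the variant in a protein sequence, bearing in mind that sequence positions in most languages are counted from zero.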
-Search literature citations in PubMed using the disease name and one of the expert
investigators: Charcot-marie-tooth disease AND lupski[au]
-Select PMID: 19277060 which will provide a general background on the disease.
- Go to the free article in PubMed Central. Table 1 in the article has genetic
classification of the disease indicating that numerous variants/genes can lead to the
disease phenotype.
Use primer information provided in the article and Primer BLAST to locate D10Umi13 on
the genomic contig
Forward: 5′-CCACCACATCAACAGGCTCACAGG
Reverse: 5′-AATGCAACCGTGACACAAGTACAC
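Conceptually, a primer search looks for the forward primer on one strand and the reverse complement of the reverse primer downstream of it. The sketch below illustrates that idea with exact string matching on a toy template. It is not how Primer BLAST itself works, which tolerates mismatches and searches a whole genome, and the template is an invented sequence rather than mouse genomic sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_amplicon(template, forward, reverse):
    """Return (start, end) of the product, 1-based inclusive, or None."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer binds the opposite strand, so search for its
    # reverse complement downstream of the forward primer.
    site = reverse_complement(reverse)
    hit = template.find(site, start + len(forward))
    if hit == -1:
        return None
    return start + 1, hit + len(site)

FORWARD = "CCACCACATCAACAGGCTCACAGG"  # from the article
REVERSE = "AATGCAACCGTGACACAAGTACAC"  # from the article

# Toy template (an assumption): filler bases around the two binding sites.
toy = "GATTACA" + FORWARD + "ACGT" * 5 + reverse_complement(REVERSE) + "TTGACC"
print(find_amplicon(toy, FORWARD, REVERSE))
```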
- Mark/lock the region: right-click on an empty area of the screen to get
the marker option.
Find D10Mit184 in the surrounding genomic region by using the search box.
Select a feature from the region, in this case Lace1. Since the feature is on the reverse
strand, the coordinates are negative values. Use the Tools menu to flip the strands.
Note the D10Mit184 annotation in the STS markers track. Mark/lock the position.
To preserve this page with locked markers, use the “Link To this Page” option:
The resulting URL specifies the settings for this view. The tiny URL is provided to easily
bookmark or share this complex URL.
Investigate the region between these two markers. For example, locate the Fig4
(A530089I17Rik) transcript. There are several ways to navigate the graphic display. For
instance, you can slide along the genome using the sliding window on the top ruler, or
scroll the features window by clicking the screen and dragging. You can also zoom
in or out on the region using the zoom slider.
Click on the Configure menu. Set the Rendering options for the Gene track to show all.
The transcript, NM_133999.1, is represented in blue, with bars indicating exons (no
numbering provided) and arrows indicating the orientation. Note the accession formats
of the sequences that you find: NT_, NM_, NP_. Right-click on the mRNA bar to access
the Properties menu or Views & Tools.
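The accession prefixes noted above each denote a molecule type, which can be checked programmatically. This is a minimal sketch covering a handful of common RefSeq prefixes. The table is not exhaustive, and the function name is illustrative.

```python
import re

# Common RefSeq accession prefixes and the molecule type each one denotes.
REFSEQ_PREFIXES = {
    "NC_": "complete genomic molecule",
    "NG_": "genomic region",
    "NT_": "genomic contig",
    "NM_": "mRNA",
    "NR_": "non-coding RNA",
    "NP_": "protein",
}

def describe_accession(accession):
    """Return (molecule type, version) for an accession.version string."""
    match = re.fullmatch(r"([A-Z]{2}_)\d+\.(\d+)", accession)
    if match is None or match.group(1) not in REFSEQ_PREFIXES:
        return None
    return REFSEQ_PREFIXES[match.group(1)], int(match.group(2))

for acc in ("NM_133999.1", "NG_007977.1"):
    print(acc, describe_accession(acc))
```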
The Sequence Text View, available through the Tools menu, displays the
sequence with or without translation and provides a text version of the sequence that
can be selected and copied.
Close the sequence window, click on the mRNA graphic, and select the link to the Gene
record that is available through the Properties or Views & Tools links.
From the mouse Gene record, link to the HomoloGene database, which lists homologous
genes and sequences from several completely sequenced eukaryotic genomes.
Determine the protein percent identity between the mouse and human sequences.
Explore the information in the record to learn more about the gene. Within the record,
find information/links pertaining to the CMT disease.
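Percent identity, mentioned above for the mouse and human proteins, is a simple ratio once two sequences are aligned. The sketch below uses one common definition: identical residues divided by aligned columns that contain no gap. Other tools may divide by the full alignment length instead, so the numbers need not match what HomoloGene reports. The peptides are toy strings, not FIG4 sequence.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free aligned columns where both sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)

# Toy aligned peptides (assumptions for illustration only).
print(round(percent_identity("MKT-LLVA", "MKTALIVA"), 1))
```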
Under the Links menu, find the “RefSeqGene” link. RefSeqGene is a genomic
sequence record for the gene. RefSeqGene records are generated in part to
provide a standard and stable genomic coordinate system for reporting mutations.
You can explore the NG_007977.1 record in the graphical sequence viewer. Notice that
exon numbering is provided in this record. Find exon 6 and single nucleotide
polymorphisms annotated for this exon.
Expand the Table of Contents on the record and jump to the Allelic Variants section.
Perform the same tasks performed above for APOE using the human mitochondrial
short-chain acyl-CoA dehydrogenase gene (ACADS). Use the NCBI website to find the
following information.
Step one: Investigation of the genomic region between D2S156 and D2S399
(NCBI website):
A candidate gene region for generalized epilepsy with febrile seizures was established
(by radiation-hybrid panel) between markers D2S156 and D2S399 on human
chromosome 2, NC_000002.
1. What genes/transcripts are annotated for the region between the two markers?
-Access the genomic region (the NC_000002 record in the Nucleotide database) and
display it graphically (Display Settings → Graphics).
-Search for D2S156, and mark/lock its position. Flip the strand so that the genomic
coordinates are positive.
-Locate the SCN1A transcript (that the above study identified as a causative disease
gene).
1. Learn more about the SCN1A human gene (e.g. what type of complexes are
voltage-sensitive sodium channels). Find information/links pertaining to
disease phenotypes and variation.
-Link to the Gene record and explore the Summary, Bibliography and Phenotypes
section of the record.
2. What is the accession number of the *genomic* sequence associated with this
gene?
-Return to the Gene record. The Table of Contents allows navigation within the record.
Navigate to the “Reference sequences” section of the record, and find the genomic
record. Note the accession format and the length of the sequence.
3. In the graphic display of the above genomic record, find exon 4. Assume that
an investigator wants to assess 20 bp of surrounding intronic region. How can
you obtain this sequence?
4. Are there clinically associated variants reported in the SNP database? How is
each reported SNP represented in the table?
To answer this question, return to the Gene record and link to the SNP database, using
the SNP: GeneView link. On the resulting page, select the “Include clinically associated”
check box and refresh the page.
5. Link to the rs28934003 entry. Note the variant’s nomenclature under the HGVS
table. What is the importance of representing the same variants on different
sequences?
6. Link to the VarView display. Was this variant reported to NCBI by a
locus-specific database (LSDB)?
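For question 3 above, one scripted approach is to widen the exon coordinates by the desired flank and request just that subrange of the genomic record from E-utilities EFetch. The sketch below only computes the range and builds the URL. The accession and exon coordinates are placeholders, to be replaced with the values read from the record.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def flanked_range(exon_start, exon_end, flank, seq_length):
    """Extend a 1-based inclusive range by `flank` bases, clipped to the record."""
    return max(1, exon_start - flank), min(seq_length, exon_end + flank)

def efetch_fasta_url(accession, start, stop):
    """Build an EFetch URL returning one subrange of a nucleotide record as FASTA."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta",
              "seq_start": start, "seq_stop": stop}
    return EFETCH + "?" + urlencode(params)

# Placeholder values (assumptions): substitute the genomic accession,
# its length, and the exon 4 coordinates shown in the graphic display.
start, stop = flanked_range(exon_start=5000, exon_end=5128,
                            flank=20, seq_length=100000)
print(start, stop)
print(efetch_fasta_url("ACCESSION.VERSION", start, stop))
```

The same subrange can be obtained interactively by selecting the range in the graphical viewer and using the Sequence Text View.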
http://tinyurl.com/2wdvtor
http://tinyurl.com/3y3qmkj