1 NCBI Discovery Workshops: Human Variation and Disease Genes 08/28/2012

Guided Exercise 1: APOE and Late Onset Alzheimer Disease

The Gene database is the central place to obtain biomolecular data related to a
particular gene. This exercise shows how to find data (sequence, SNPs, and
phenotypes) associated with the human apolipoprotein E gene, and how to map variants
associated with disease onto the genomic sequences of the human reference genome
and J. Craig Venter's personal genome.

Finding the human apolipoprotein E gene record

Use the query APOE in the PubMed database. This search triggers the gene sensor,
one of the discovery-related sensors that provide rapid access to potentially more
relevant records from the Entrez Gene database.

Follow the link to the human APOE gene (Gene ID: 348) in the gene sensor.
The Gene record provides rapid access to NCBI Reference Sequence (RefSeq) transcript,
protein, and genomic sequences, including three different genome builds, as shown in the
Sequences, Genomes and Maps module. The graphical sequence viewer gives rapid
access to the various features annotated on these reference sequences.
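The same Gene lookup can also be scripted. The sketch below only builds an NCBI E-utilities ESearch request for the Gene database; `[sym]` (gene symbol) and `[orgn]` (organism) are standard Entrez Gene search tags, but the exact query string is our choice for illustration. Actually fetching the result (for example with `urllib.request.urlopen`) needs network access and should respect NCBI's E-utilities usage limits.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an E-utilities ESearch URL for an Entrez database and query string."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)

# Restrict the search to the official symbol in human; the hit should be Gene ID 348.
url = esearch_url("gene", "APOE[sym] AND human[orgn]")
print(url)
```

Building the URL separately from fetching it keeps the query easy to inspect and reuse in other Entrez databases.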

Phenotypes and genotypes associated with APOE

The Gene record also provides access to phenotypes associated with variations in the
gene. Use the Table of Contents to jump to the Phenotypes section of the record.

The data in this table come from NHGRI's catalog of published Genome-Wide
Association Studies (GWAS) and from traditional sources such as Online Mendelian
Inheritance in Man (OMIM). Expand the Alzheimer disease-2 entry and follow the link to
MIM:104310.

OMIM contains literature reviews on human diseases with genetic components, as well
as separate articles on genes. MIM:104310 is a phenotype article about late-onset
Alzheimer disease associated with the risk allele called APOE-E4. Follow the link in
the first paragraph of the Alzheimer disease-2 article to MIM:107741, the APOE gene entry.

Use the Table of Contents to jump to the Allelic Variants section of the record and scroll
down to .0016 ALZHEIMER DISEASE 2 [APOE, CYS112ARG].

Follow the link to the NCBI SNP database for rs429358.

Retrieve the linked SNP summary. The SNP summary shows the source of the RefSNP
record and provides analysis for it. A RefSNP is a non-redundant record that
summarizes information from a number of submitted SNP records. Jump to the
Submitted records section of the RefSNP record. Notice that there are 25 submitted
SNPs. A SNP submission contains the variable residue and its flanking sequence. The
longest submitted SNP record (ss76884559) was used to create the RefSNP record
(rs429358).

Alleles in HuRef and GRCh37

Look at the integrated maps section of the SNP report. Notice that both the Celera
and HuRef (JC Venter) genomes have the 'C' allele at this position, while the
reference genome, GRCh37, has the 'T' allele.

The Gene View section shows that the 'C' allele codes for Arg at the corresponding
position in the protein (the 'T' allele codes for Cys). This is the risk allele for late-onset
Alzheimer disease.
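To see how a single T/C change swaps Cys for Arg, translate the two codons. A sketch using the standard genetic code; that the variant sits at the first base of a TGC codon is our assumption for illustration, so confirm the actual codon in the Gene View.

```python
BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate_codon(codon: str) -> str:
    """Return the one-letter amino acid for a DNA codon."""
    return CODON_TABLE[codon.upper()]

# Assumed for illustration: a T>C change at the first base of a Cys (TGC) codon.
ref_codon = "TGC"
alt_codon = "C" + ref_codon[1:]
print(ref_codon, translate_codon(ref_codon), "->", alt_codon, translate_codon(alt_codon))
# TGC C -> CGC R
```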

The HuRef sequence is a haploid representation of the JC Venter genome, so it shows
only one allele at each position. The Genotypes section of the report shows that
JC Venter is heterozygous for this SNP.

A more detailed Gene View is available through the "Go" button in the Gene View
section of the report. First check the box for "include clinically associated", then click
"Go".

This view is also available through the "SNP: GeneView" link in the Gene record.

The more detailed Gene View maps all SNPs onto the sequences of the APOE gene
and indicates their effect on the coding regions. The radio buttons filter the SNPs
displayed: coding region only, near the gene, or only those with population genetic
data. Find rs429358 in the Gene View table.

This residue change is only one of the two positions that define the e-4 allele. The other
polymorphic position is represented by rs7412, another Cys/Arg change (at residue 158).
Both of the most common alleles, e-3 (in the reference genome) and e-4 (in the JC Venter
genome), have an Arg at this position.
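Because the e-2, e-3 and e-4 alleles are defined by the combination of the two positions, the lookup can be written as a small table. A sketch, assuming forward-strand alleles and already phased haplotypes; a person heterozygous at both SNPs cannot be resolved without phasing (e-2/e-4 and e-1/e-3 give the same unphased genotype).

```python
# Key: (allele at rs429358, allele at rs7412), forward strand of chr19.
APOE_HAPLOTYPES = {
    ("T", "T"): "e2",  # Cys112, Cys158
    ("T", "C"): "e3",  # Cys112, Arg158
    ("C", "C"): "e4",  # Arg112, Arg158
    ("C", "T"): "e1",  # Arg112, Cys158 (very rare)
}

def apoe_allele(rs429358: str, rs7412: str) -> str:
    """Name the APOE allele carried on one haplotype."""
    return APOE_HAPLOTYPES[(rs429358.upper(), rs7412.upper())]

def apoe_genotype(hap1: tuple[str, str], hap2: tuple[str, str]) -> str:
    """Combine two phased haplotypes into a label such as 'e3/e4'."""
    return "/".join(sorted([apoe_allele(*hap1), apoe_allele(*hap2)]))

# One e3 haplotype and one e4 haplotype:
print(apoe_genotype(("T", "C"), ("C", "C")))  # e3/e4
```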

Another view that emphasizes clinically associated variations is available through the
Variation Viewer, linked through the VarView icon at the upper right of the Gene View.

1000 Genomes Project Browser and rs429358

Data from the 1000 Genomes Project (http://www.1000genomes.org/), a large dataset
of human sequence variation, can be viewed aligned to the human genome sequence in
the 1000 Genomes Browser at the NCBI. The browser is available at the following
address:

http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes/

To see the distribution of genotypes for rs429358, enter the position of this variant on
chromosome 19 (chr19:45,411,941) in the search box.
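Positions such as chr19:45,411,941 are specific to a genome build (this 2012 handout uses GRCh37-era coordinates, and the same SNP has a different coordinate on other builds), so it helps to parse them into a chromosome name and an integer before comparing. A minimal parser, assuming the "chrN:position" form with optional thousands separators:

```python
import re

def parse_locus(text: str) -> tuple[str, int]:
    """Parse a locus like 'chr19:45,411,941' into ('19', 45411941)."""
    match = re.fullmatch(r"\s*(?:chr)?(\w+)\s*:\s*([\d,]+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized locus: {text!r}")
    chrom, pos = match.groups()
    return chrom, int(pos.replace(",", ""))

print(parse_locus("chr19: 45,411,941"))  # ('19', 45411941)
```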

You can also link to the browser from the RefSNP cluster report for rs429358.

Expand the rows for individual populations, for example the Luhya from Kenya, to see
individual-level genotypes.
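Individual-level genotypes can be summarized as population allele frequencies (the same quantity asked for later for the CEU population). A minimal sketch, assuming diploid genotypes written as "T|C" (phased) or "T/C" (unphased); the sample genotypes are hypothetical, not real 1000 Genomes data.

```python
from collections import Counter

def allele_frequencies(genotypes: list[str]) -> dict[str, float]:
    """Allele frequencies from diploid genotype strings such as 'T|C' or 'T/C'."""
    counts = Counter()
    for gt in genotypes:
        counts.update(gt.replace("|", "/").split("/"))
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

# Hypothetical genotypes for five individuals (10 chromosomes).
sample = ["T|T", "T|C", "C|T", "T|T", "C|C"]
freqs = allele_frequencies(sample)
print(freqs)  # {'T': 0.6, 'C': 0.4}
```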

Guided Exercise 2: Charcot-Marie-Tooth Disease and FIG4

-Search literature citations in PubMed using the disease name and the name of an
expert investigator:

Charcot-marie-tooth disease AND lupski[au]

-Access those citations that link to free full-text articles.
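The same search, including the free full-text restriction, can be expressed as an E-utilities request. A sketch, assuming PubMed's `free full text[sb]` filter and the standard ESearch XML layout (`<IdList>` of `<Id>` elements); the XML string used to exercise the parser is made up, not a real response.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_query(term: str, free_full_text: bool = True) -> str:
    """Optionally combine a topic query with PubMed's free full text filter."""
    return f"({term}) AND free full text[sb]" if free_full_text else term

def esearch_pubmed_url(term: str) -> str:
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term})

def parse_pmids(esearch_xml: str) -> list[str]:
    """Return the PMIDs listed under <IdList> in an ESearch XML response."""
    root = ET.fromstring(esearch_xml)
    return [el.text for el in root.findall("./IdList/Id")]

term = pubmed_query("Charcot-Marie-Tooth disease AND lupski[au]")
print(esearch_pubmed_url(term))

# Made-up response fragment shaped like ESearch XML, used only to exercise the parser.
sample = "<eSearchResult><Count>2</Count><IdList><Id>111</Id><Id>222</Id></IdList></eSearchResult>"
print(parse_pmids(sample))  # ['111', '222']
```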

-Select PMID: 19277060, which provides a general background on the disease. You
can go to the free article at the publisher's site and view Table 1 for the genetic
classification of the disease: numerous variants and genes can lead to a CMT phenotype.

For the purpose of this exercise, we follow the discovery of one of the disease
genes, FIG4 (select PMID: 17572665 and link to the free full text in PMC), and
simulate some steps of the discovery process on the NCBI web pages.

Step one: Description of genetic mapping of mouse disease (outside NCBI):

In a mouse colony, the mutant mice had impaired motor coordination and muscle
weakness, a phenotype resembling human CMT. Genetic mapping placed the
mutation in the genomic region between two markers (microsatellite and SNP),
D10Umi13 and D10Mit184. D10Mit184 is annotated on the mouse reference
genomic sequence. D10Umi13 is not annotated, but its primers are listed in the
article.

Step two: Investigation of the genomic region between D10Umi13 and D10Mit184
(NCBI website):

What genes/transcripts are annotated for the region between the two markers?

To address this question, find the location of D10Umi13 on the mouse genome.

-Use the primer information provided in the article and Primer-BLAST to locate D10Umi13 on
the genomic contig.

Forward: 5′-CCACCACATCAACAGGCTCACAGG
Reverse: 5′-AATGCAACCGTGACACAAGTACAC

-Access the genomic region (Nucleotide database) and display it graphically (Display
Settings → Graphics).

-Mark/lock the region.

-Find D10Mit184 in the surrounding genomic region and interrogate the region between
these two markers. For example, locate the Fig4 (A530089I17Rik) transcript.
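Stated as an interval problem, "genes between two markers" means keeping the annotated features whose coordinates overlap the span between the two marker positions. A minimal sketch, assuming 1-based inclusive coordinates on one strand; the feature names and positions are hypothetical, not real mouse annotation.

```python
from dataclasses import dataclass

@dataclass
class Feature:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

def features_between(marker_a: int, marker_b: int, features: list[Feature],
                     require_inside: bool = False) -> list[str]:
    """Names of features overlapping (or, optionally, lying fully inside) the marker interval."""
    lo, hi = sorted((marker_a, marker_b))
    hits = []
    for f in features:
        if require_inside:
            keep = lo <= f.start and f.end <= hi
        else:
            keep = f.start <= hi and f.end >= lo  # any overlap counts
        if keep:
            hits.append(f.name)
    return hits

# Hypothetical annotation, not real coordinates from the mouse contig.
annotation = [
    Feature("geneA", 500, 1200),
    Feature("geneB", 2000, 3000),
    Feature("geneC", 6000, 7000),
]
print(features_between(5000, 1000, annotation))                       # ['geneA', 'geneB']
print(features_between(5000, 1000, annotation, require_inside=True))  # ['geneB']
```

Counting partially overlapping genes matters here: a causative gene may straddle a marker rather than sit wholly inside the interval.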

Step three: Analysis of transcripts.

What type of sequence information (e.g. genomic, mRNA, protein) can you obtain for
the Fig4 (A530089I17Rik) transcript? (Do you see exon numbering on the genomic
sequence?) How could a researcher use this information in an experiment?

In the study, reverse transcription polymerase chain reaction (RT-PCR) was used to
analyze transcripts from brain tissue of healthy and diseased mice. Although
individual primers are not listed in the study, we can assume that several exon-specific
primers were designed for each candidate transcript. The group eventually
demonstrated that the Fig4 (A530089I17Rik) transcript was truncated after exon 18 and
further established that the causative factor was the insertion of a transposon in the
genomic region of this gene.

Step four: From mouse model to human neurological disorder

In the published study, a mouse model was used to mimic a human neurological
disorder. The genetic basis of the mouse disease helped the investigators focus on a
single gene in CMT patients lacking clinical variants in already known CMT-associated
genes. With FIG4 as the candidate gene for the human disease, the investigators
sequenced this gene in affected individuals and found single nucleotide variants that
introduce a stop codon in exon 4 and therefore truncate the protein. However, this
mutation resulted in the disease phenotype *only* when accompanied by a second single
nucleotide variant, the missense change I41T.
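The handout writes protein changes in two styles: OMIM's three-letter form (CYS112ARG, ARG183TER) and one-letter shorthand (I41T). A small parser can normalize both and flag nonsense changes. This is a sketch that assumes only these two simple substitution forms; it does not handle full HGVS syntax such as p.Arg183Ter, frameshifts, or indels.

```python
import re

# Three-letter amino acid codes (upper case) to one-letter codes; TER is a stop codon.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
    "TER": "*",
}
ONE_LETTER = set(THREE_TO_ONE.values())

def parse_protein_change(text: str) -> tuple[str, int, str]:
    """Parse 'ARG183TER', 'Cys112Arg' or 'I41T' into (ref, position, alt) one-letter codes."""
    t = text.strip().upper()
    m = re.fullmatch(r"([A-Z]{3})(\d+)([A-Z]{3})", t)
    if m and m.group(1) in THREE_TO_ONE and m.group(3) in THREE_TO_ONE:
        ref, pos, alt = m.groups()
        return THREE_TO_ONE[ref], int(pos), THREE_TO_ONE[alt]
    m = re.fullmatch(r"([A-Z*])(\d+)([A-Z*])", t)
    if m and m.group(1) in ONE_LETTER and m.group(3) in ONE_LETTER:
        ref, pos, alt = m.groups()
        return ref, int(pos), alt
    raise ValueError(f"unrecognized protein change: {text!r}")

def classify(change: tuple[str, int, str]) -> str:
    """Label a substitution as nonsense, synonymous, or missense."""
    ref, _, alt = change
    if alt == "*" and ref != "*":
        return "nonsense"
    return "synonymous" if ref == alt else "missense"

for name in ["CYS112ARG", "ARG183TER", "I41T"]:
    change = parse_protein_change(name)
    print(name, change, classify(change))
```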

Investigate the following questions on the NCBI website:



Are there human and other homologs of this gene?

To address this question, link to the mouse Gene record and then link to the
HomoloGene database.

Link from HomoloGene to the human Gene record. Find information/links pertaining to
disease phenotypes and variation.

Find the ARG183TER variant within the protein sequence. Is this variant reported in the
NCBI SNP database as a clinically associated variant?

Guided Exercise - Illustrated steps

-Search literature citations in PubMed using the disease name and one of the expert
investigators: Charcot-marie-tooth disease AND lupski[au]

-Access those citations that link to free full-text articles:

-Select PMID: 19277060 which will provide a general background on the disease.

- Go to the free article in PubMed Central. Table 1 in the article has genetic
classification of the disease indicating that numerous variants/genes can lead to the
disease phenotype.

-Select PMID: 17572665 and link to the free full text in PMC.



Use the primer information provided in the article and Primer-BLAST to locate D10Umi13 on
the genomic contig:
Forward: 5′-CCACCACATCAACAGGCTCACAGG
Reverse: 5′-AATGCAACCGTGACACAAGTACAC
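Primer-BLAST searches whole databases and tolerates mismatches. As a much simpler offline sketch, assuming the template sequence is already in hand and both primers match it exactly, the product can be located by finding the forward primer and then the reverse complement of the reverse primer downstream of it. The template below is synthetic, built only to exercise the function.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def locate_amplicon(template: str, forward: str, reverse: str):
    """1-based inclusive (start, end) of the product of an exact-match primer pair, or None."""
    template = template.upper()
    f_idx = template.find(forward.upper())
    if f_idx < 0:
        return None
    # The reverse primer anneals to the top strand as its reverse complement.
    r_site = reverse_complement(reverse.upper())
    r_idx = template.find(r_site, f_idx + len(forward))
    if r_idx < 0:
        return None
    return f_idx + 1, r_idx + len(r_site)

FORWARD = "CCACCACATCAACAGGCTCACAGG"
REVERSE = "AATGCAACCGTGACACAAGTACAC"

# Synthetic template: 4 bp, forward primer, 50 bp spacer, reverse-primer site, 4 bp.
template = "GGGG" + FORWARD + "A" * 50 + reverse_complement(REVERSE) + "TTTT"
print(locate_amplicon(template, FORWARD, REVERSE))  # (5, 102)
```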

-Display the genomic contig NT_039492.7, region 32667739 to 32667881, in the
Nucleotide database and link to the Graphics display (Display Settings → Graphics).

- Mark/lock the region -- use a right-mouse click on an empty area of the screen to get
the marker option.

Find D10Mit184 in the surrounding genomic region by using the search box that is
opened by the magnifying glass icon.


Select a feature from the region, in this case Lace1. Because the feature is on the reverse
strand, its coordinates are shown as negative values. Use the Tools menu to flip the strands.

Note the D10Mit184 annotation in the STS markers track. Mark/lock the position.

Zoom out to display the region between the markers.

To preserve this page with the locked markers, use the "Link To this Page" option.
The resulting URL specifies the settings for this view; the tiny URL makes this long URL
easy to bookmark or share.

Investigate the region between these two markers. For example, locate the Fig4
(A530089I17Rik) transcript. There are several ways to navigate the graphic display.
For instance, you can slide along the genome with the sliding window on the top ruler,
or scroll the features window by clicking on it and dragging. You can also zoom
in or out on the region using the zoom slider.

Click on the Configure menu. Set the Rendering options for the Gene track to show all.

The transcript, NM_133999.1, is represented in blue, with bars indicating exons (no
numbering provided) and arrows indicating the orientation. Note the accession formats
of the sequences that you find: NT_, NM_, and NP_. Right-click on the mRNA bar to access
the Properties menu or Views & Tools.
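The accession prefix tells you what kind of record you are looking at. A small lookup covering common RefSeq prefixes; this is a partial list for illustration, not the complete set NCBI defines.

```python
import re

# Common RefSeq accession prefixes (partial list).
REFSEQ_PREFIXES = {
    "NC": "complete genomic molecule",
    "NG": "genomic region",
    "NT": "genomic contig or scaffold",
    "NW": "genomic contig or scaffold (WGS)",
    "NM": "mRNA",
    "NR": "non-coding RNA",
    "NP": "protein",
    "XM": "predicted mRNA",
    "XR": "predicted non-coding RNA",
    "XP": "predicted protein",
}

def parse_refseq(accession: str):
    """Split 'NM_133999.1' into (prefix, molecule type, version or None)."""
    m = re.fullmatch(r"([A-Z]{2})_(\d+)(?:\.(\d+))?", accession.strip())
    if m is None or m.group(1) not in REFSEQ_PREFIXES:
        raise ValueError(f"not a recognized RefSeq accession: {accession!r}")
    prefix, _number, version = m.groups()
    return prefix, REFSEQ_PREFIXES[prefix], int(version) if version else None

for acc in ["NT_039492.7", "NM_133999.1", "NG_007977.1"]:
    print(acc, parse_refseq(acc))
```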

Link to the Graphical View of NM_133999.1.



The Sequence Text View, available through the Tools menu, displays the
sequence with or without its translation and provides a text version of the sequence that
can be selected and copied.

Close the sequence window, click on the mRNA graphic, and select the link to the Gene
record that is available through the Properties or Views & Tools links.

From the mouse Gene record, link to the HomoloGene database, which lists homologous
genes and sequences from several completely sequenced eukaryotic genomes.

Determine the protein percent identity between the mouse and human sequences.
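Percent identity is the fraction of aligned positions with identical residues. Definitions differ in how gaps are treated; this sketch skips any column containing a gap, which may not match how HomoloGene computes its figure. The aligned fragments are toy sequences, not the real mouse and human proteins.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns, ignoring columns where either sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # gapped column: not counted under this convention
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

# Toy alignment: 6 ungapped columns, 5 identical.
print(round(percent_identity("MKT-LLV", "MKTALIV"), 1))  # 83.3
```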

Follow the link to the human Gene record.



Explore the information in the record to learn more about the gene. Within the record,
find information/links pertaining to the CMT disease.

Under the Links menu, find the "RefSeqGene" link. RefSeqGene is a genomic
sequence record for the gene. RefSeqGene records are generated in part to
provide a standard and stable genomic coordinate system for reporting mutations.

You can explore the NG_007977.1 record in the graphical sequence viewer. Notice that
exon numbering is provided in this record. Find exon 6 and single nucleotide
polymorphisms annotated for this exon.

Notice that rs121908299 is a clinically associated change that results in early
termination of the protein. This is analogous to the disruption of Fig4 in the pale tremor
mouse. There are links to alleles in Online Mendelian Inheritance in Man (OMIM)
records. Follow the link to OMIM 609390.003, which is one of the two OMIM allelic variants
associated with the ARG183TER change. OMIM is now hosted outside of the NCBI site.

Expand the Table of Contents on the record and jump to the Allelic Variants section.

Independent Practice 1: ACADS



Repeat the tasks performed above for APOE using the human mitochondrial
short-chain acyl-CoA dehydrogenase gene (ACADS). Use the NCBI website to find the
following information:

• The ACADS Gene record
• Reference Sequences
• A clinically significant mapped variation in the 5th exon of ACADS with different
alleles in the Reference Genome and the HuRef (JC Venter) genome
• The functional consequence of this variant on the protein and the overall human
• JC Venter’s genotype at this position
• The 1000 Genomes allele frequencies for the CEU (Utah) population.
• Homologous gene in the mouse, cow, and chicken
• UniGene cluster for homologous sequences in Daphnia
• Homologous protein in the naked mole rat
• Expression information in human tissues / organs
• The genomic context and nearby genes using the graphical sequence viewer and
the Map Viewer

Independent Practice 2: Generalized Epilepsy with Febrile Seizures and SCN1A
This exercise is based on the study reported in PMID:10742094, which demonstrated
that mutations in the SCN1A gene cause generalized epilepsy with febrile seizures.

Step one: Investigation of the genomic region between D2S156 and D2S399
(NCBI website):

A candidate gene region for generalized epilepsy with febrile seizures was established
(by radiation-hybrid panel) between markers D2S156 and D2S399 on human
chromosome 2, NC_000002.

1. What genes/transcripts are annotated for the region between the two markers?

-Access the genomic region (the NC_000002 record in the Nucleotide database) and
display it graphically (Display Settings → Graphics).

-Search for D2S156 and mark/lock its position. Flip the strands (Tools menu) so that
the genomic coordinates are positive.

-Search for D2S399.

-Select XIRP2 as the label so that the region included is maximized.



-Mark/lock the position of D2S399.


-Examine the region between these two markers.
What is the approximate number of annotated transcripts in the region?

-Locate the SCN1A transcript (the gene that the study above identified as causing the
disease).

Step two: Gene information on SCN1A

1. Learn more about the human SCN1A gene (e.g., what type of complex a
voltage-sensitive sodium channel is). Find information/links pertaining to
disease phenotypes and variation.

-Link to the Gene record and explore the Summary, Bibliography and Phenotypes
sections of the record.

2. What is the accession number of the *genomic* sequence associated with this
gene?

-Return to the Gene record. The Table of Contents allows navigation within the record.
Navigate to the “Reference sequences” section of the record, and find the genomic
record. Note the accession format and the length of the sequence.

3. In the graphic display of the above genomic record, find exon 4. Assume that
an investigator wants to examine the exon together with 20 bp of surrounding
intronic sequence. How can you obtain this sequence?
4. Are there clinically associated variants reported in the SNP database? How is
each reported SNP represented in the table?

To answer this question, return to the Gene record and link to the SNP database, using
the SNP: GeneView link. On the resulting page, select the “Include clinically associated”
check box and refresh the page.

5. Link to the rs28934003 entry. Note the variant's nomenclature in the HGVS
table. Why is it important to represent the same variant on different
sequences?

6. Link to the VarView display. Was this variant reported to NCBI by a locus
specific database (LSDB)?
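For question 3, the requested range is simple arithmetic: start 20 bp before the exon's first base and stop 20 bp after its last, then read that range from the genomic record. A minimal sketch, assuming 1-based inclusive exon coordinates taken from the exon feature of the genomic record; the sequence and coordinates below are made up.

```python
def exon_with_flanks(sequence: str, exon_start: int, exon_end: int, flank: int = 20) -> str:
    """Exon plus `flank` bases on each side; coordinates are 1-based and inclusive."""
    if not 1 <= exon_start <= exon_end <= len(sequence):
        raise ValueError("exon coordinates fall outside the sequence")
    left = max(1, exon_start - flank)            # clip at the start of the record
    right = min(len(sequence), exon_end + flank)  # clip at the end of the record
    return sequence[left - 1:right]

# Made-up record: 30 intronic 'a', a 10 bp exon of 'C' at 31-40, then 30 intronic 'g'.
seq = "a" * 30 + "C" * 10 + "g" * 30
print(exon_with_flanks(seq, 31, 40))  # 20 'a' + 10 'C' + 20 'g'
```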

Independent Practice - Illustrated steps



http://tinyurl.com/2wdvtor

http://tinyurl.com/3y3qmkj