Clinical Genomic Database
Edited by C. Thomas Caskey, Baylor College of Medicine, Houston, TX, and approved May 2, 2013 (received for review February 19, 2013)
Technological advances have greatly increased the availability of human genomic sequencing. However, the capacity to analyze genomic data in a clinically meaningful way lags behind the ability to generate such data. To help address this obstacle, we reviewed all conditions with genetic causes and constructed the Clinical Genomic Database (CGD) (http://research.nhgri.nih.gov/CGD/), a searchable, freely Web-accessible database of conditions based on the clinical utility of genetic diagnosis and the availability of specific medical interventions. The CGD currently includes a total of 2,616 genes organized clinically by affected organ systems and interventions (including preventive measures, disease surveillance, and medical or surgical interventions) that could be reasonably warranted by the identification of pathogenic mutations. To aid independent analysis and optimize new data incorporation, the CGD also includes all genetic conditions for which genetic knowledge may affect the selection of supportive care, informed medical decision-making, prognostic considerations, reproductive decisions, and allow avoidance of unnecessary testing, but for which specific interventions are not otherwise currently available. For each entry, the CGD includes the gene symbol, conditions, allelic conditions, clinical categorization (for both manifestations and interventions), mode of inheritance, affected age group, description of interventions/rationale, links to other complementary databases, including databases of variants and presumed pathogenic mutations, and links to PubMed references (>20,000). The CGD will be regularly maintained and updated to keep pace with scientific discovery. Further content-based expert opinions are actively solicited. Eventually, the CGD may assist the rapid curation of individual genomes as part of active medical care.
genome sequencing | genomic medicine | whole-genome sequencing
diploid genome, depending on the platform used, are sequenced (reviewed in ref. 1). Whole-genome sequencing, which additionally includes introns and gene regulatory regions, as well as the rest of the genome, is anticipated to become more widely used as methodologies evolve to allow decreased cost and to meet informatics challenges. Whole-genome sequencing may supplant exome sequencing in the relatively near future.
To date, the most impressive applications of human genomic sequencing (we refer here to both exome and genome sequencing as “genomic sequencing”) have been the detection of the genetic causes of relatively rare conditions (1–3). However, genomic sequencing has myriad potential applications in more general clinical medicine, including in healthy individuals (3–8).
Despite the promise of the “age of genomic medicine,” a key barrier to translating the power of genomic sequencing to the general clinical setting involves the time and resources required for clinically relevant analysis beyond searching for the cause of a single, usually relatively severe, disease. A number of freely or commercially available tools allow curation of individual genomes, including analysis of variant type, predicted pathogenicity of a particular variant, and associations of the gene or specific variant with known health conditions. After this level of curation, however, a key obstacle arises when trying to determine which detected variants may warrant further follow-up, including potential clinical interventions, or would otherwise alter patient-based care. Typically, clinically oriented analysis involves an approach in which detected potentially pathogenic variants affecting known disease-associated loci are largely individually queried to determine their clinical applicability (9–13).
To help address this problem, we manually investigated all conditions with known genetic causes. We constructed a database focusing on genetic data as relates to the availability of condition-specific interventions and how finding a pathogenic mutation would be anticipated to affect medical care. The current goal of this project is to disseminate the database to solicit content-oriented input, related to both clinical and molecular aspects of the database, from experts in individual genes and conditions. Eventually, this database may be used to aid in the efficient analysis of individual genomes for clinically significant health information. The Clinical Genomic Database (CGD) is freely available at: http://research.nhgri.nih.gov/CGD.
Results
At the time of writing (April 2013), the CGD includes 2,616 genes in which mutations are known to cause human disease or have clinically significant pharmacogenomic implications. For 1,333 of these genes, medical interventions meeting the described criteria are available (Materials and Methods). The CGD includes an additional 1,283 genes for which these types of clinical interventions are not yet available based on current medical knowledge, but in which mutations may nonetheless be clinically relevant. Knowledge of mutations resulting in one of this latter group of conditions
manifestations and interventions). The CGD can be queried using single or multiple search terms, including large files of gene names or terms. For each entry, the database includes the gene symbol, conditions, allelic conditions, clinical categorization (by manifestation and intervention categories), mode of inheritance, age category (pediatric or adult) in which interventions are indicated based on descriptions in the medical literature, general descriptions of the interventions/rationale, and individually linked references (>20,000). See Table 1 for a summary of the categories included and the numbers of genes within each category; see Table
Author contributions: B.D.S. designed research; B.D.S., A.-D.N., and K.A.B. performed research; B.D.S., A.-D.N., K.A.B., and T.G.W. contributed new reagents/analytic tools; B.D.S., A.-D.N., K.A.B., and T.G.W. analyzed data; and B.D.S. and T.G.W. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
1 To whom correspondence should be addressed. E-mail: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1302575110/-/DCSupplemental.
The CGD can be queried with a single gene symbol or condition, or with a list. Searches can also be limited by organ-system–based clinical categories (for both Manifestation and Intervention categories). To allow efficient access to complementary resources, including databases of variants and apparently pathogenic mutations, each gene/condition entry links to the relevant entry (where available) in: 1000 Genomes (www.1000genomes.org); the Short Genetic Variations Database (Database of Single Nucleotide Polymorphisms, dbSNP) (www.ncbi.nlm.nih.gov/snp/); GeneTests (www.ncbi.nlm.nih.gov/sites/GeneTests/); the HGMD (www.hgmd.cf.ac.uk/); the National Center for Biotechnology Information gene database (www.ncbi.nlm.nih.gov/gene); the National Heart, Lung, and Blood Institute Grand Opportunity Exome Sequencing Project Exome Variant Server (http://evs.gs.washington.edu/EVS); OMIM (http://www.omim.org); Genetics Home Reference (http://ghr.nlm.nih.gov/); and ClinVar (http://www.ncbi.nlm.nih.gov/clinvar/). Each reference is directly linked to a PubMed abstract (>20,000 articles are individually referenced).
Information shown above may differ from the updated version available on the CGD website. For the sake of viewability, the “Comments” column (which
appears on the CGD website) has been omitted from this example entry.
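The query behavior described in this note (one or many search terms, optionally limited by Manifestation or Intervention category) can be illustrated with a minimal sketch over an in-memory list of entries. This is not the CGD's actual implementation or interface: the `Entry` class, the `search` function, and the placeholder records (`GENE1`, `Condition A`, and so on) are hypothetical, and only the field names follow the entry description in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One gene/condition record; field names follow the text, layout is assumed."""
    gene: str
    condition: str
    inheritance: str
    age_group: str
    manifestation_categories: frozenset
    intervention_categories: frozenset


# Placeholder records for illustration only; these are not real CGD content.
ENTRIES = [
    Entry("GENE1", "Condition A", "AD", "Adult",
          frozenset({"Cardiovascular"}), frozenset({"Cardiovascular"})),
    Entry("GENE2", "Condition B", "AR", "Pediatric",
          frozenset({"Neurologic"}), frozenset({"General"})),
    Entry("GENE3", "Condition C", "XL", "Pediatric",
          frozenset({"Renal", "Cardiovascular"}), frozenset({"Renal"})),
]


def search(entries, terms, manifestation=None, intervention=None):
    """Return entries whose gene symbol or condition matches any term
    (case-insensitive), optionally limited to one Manifestation and/or
    one Intervention category."""
    wanted = {t.strip().lower() for t in terms if t.strip()}
    hits = []
    for e in entries:
        if e.gene.lower() not in wanted and e.condition.lower() not in wanted:
            continue
        if manifestation and manifestation not in e.manifestation_categories:
            continue
        if intervention and intervention not in e.intervention_categories:
            continue
        hits.append(e)
    return hits
```

A list of gene symbols read from a file could be passed directly as `terms`, which mirrors the list-based search mentioned above; exact matching is a simplifying assumption, and a real interface may match partial terms.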
genetic/genomic diagnosis (i.e., before diagnosis on clinical grounds alone) of a disorder is challenging in itself, but is only one piece of the puzzle. An equally large piece involves the evaluation of specific variants, and strategies have been devised to address this issue. One such strategy involves “binning” variants (11, 18). According to this type of algorithm, a variant would be judged to be pathogenic for one of two reasons. The first reason is that the type of variant may by itself predict pathogenicity: for example, a truncating mutation or a large deletion in a gene in which loss-of-function is predicted to cause disease. The second reason, used for variants whose pathogenicity is more difficult to assess (e.g., novel missense variants), would involve the presence of strong evidence showing that the exact detected variant had been previously established as disease-causing (11, 12, 18, 20).
Along these lines, in any approach involving the clinical management of individual patients, genotypic information should not
Solomon et al. PNAS | June 11, 2013 | vol. 110 | no. 24 | 9853
operate in a vacuum (6, 29). Just as genotypic data, such as the type, location, and novelty of a particular amino acid substitution can aid in the interpretation of a variant, clinical information, including family and medical history, can help determine the consideration of a specific variant (5, 12, 29).
The first goal of the CGD is broad dissemination to solicit content-oriented feedback and input from experts studying relevant genes and conditions. This input can be used to continually revise and improve this resource, as the CGD will be regularly updated through a combination of automated and manual curation. The first long-term objective is the establishment of a user-friendly resource relevant to a wide group of clinicians that can be used as a reference resource in a variety of situations. Eventually, in addition to serving as a reference tool, the CGD may be used as a filter superimposed on automated binning algorithms to help allow efficient, clinically relevant annotation of human genomes.
Materials and Methods
To investigate conditions with identified genetic underpinnings, we individually read all entries in OMIM (www.omim.org) that included conditions with genetic causes, then cross-referenced all entries (and searched for additional entries) within the following publicly available databases: GeneTests (www.ncbi.nlm.nih.gov/sites/GeneTests), Pharmacogenomics Knowledge Base (www.pharmgkb.org), HGMD (www.hgmd.cf.ac.uk), and HGMD Professional (https://portal.biobase-international.com/hgmd) through a site-specific license. Pertaining to each gene and condition described in these databases, we directly analyzed the content of all cited primary references. Published literature that was not included in these databases was also queried through independent PubMed search (by gene and condition name). The most recent date of query was April 17, 2013.
The CGD has been constructed to reflect the multisystemic nature of many genetic conditions to allow more comprehensive browsing by clinical categories. In the CGD, genes were first categorized into Manifestation categories, or the organ systems primarily affected by mutations in the corresponding gene. For many of these organ systems, recognition of the condition’s effects and related supportive care may be clinically beneficial. Conditions not grouped within a specific organ system under the Manifestation categories were included in the General category.
Next, genes were separately categorized under Intervention categories by the organ systems for which specific medical interventions were available. In determining the Intervention categories, the following points were considered. These points are based in part on arguments related to the selection of targets for routine newborn screening (30): (i) the condition must be clinically significant (i.e., at least some manifestations must result in morbidity or mortality); (ii) there must be a currently available, potentially beneficial intervention (this intervention may include preventive measures, surveillance, or medical or surgical treatments, although experimental/research-based interventions were not included); (iii) there should be advantage to early (genomic) diagnosis as opposed to discovery of the condition on purely clinical grounds (i.e., without genetic/genomic testing). Regarding this last point, precise diagnosis is challenging for many conditions, and correct recognition based on genetic/genomic diagnosis may allow interventions related to specific manifestations. The efficacy of these interventions would be diminished or lost with later diagnosis, such as might occur based primarily upon clinical presentation. For example, in certain types of Ehlers-Danlos syndrome, which may not always be recognized early enough to allow optimal medical care, genotype-based recognition may allow interventions related to certain cardiovascular manifestations, which may reduce associated morbidity and mortality (31).
For the Intervention categories, all genes not meeting the above criteria were included in the General category. As described above, for many such conditions, although a more specific intervention may not be currently available, genetic knowledge may be beneficial related to a number of issues, including the selection of optimal supportive care, prognostic considerations related to medical decision-making, informing reproductive decisions, and avoidance of unnecessary testing as part of the diagnostic process. These entries contain similar information to those classified by organ system, with the exception that the interventions and rationale are not specifically described. Individual experts were contacted in many instances where the availability or efficacy of interventions was unclear.
The Web interface to the CGD allows searching by gene or condition or browsing by categories. For each entry, the database includes the gene symbol, conditions, allelic conditions (conditions resulting from mutations in the same gene, but which themselves may not have a specific intervention available; it must be noted that for many reportedly distinct conditions, there is clearly a phenotypic continuum, such that division into clinically separate conditions can be challenging), clinical categorization (as described above, by both manifestations as well as more specific interventions), inheritance, age [designated as either pediatric (less than 18 y of age) or adult] in which interventions are indicated based on descriptions in the medical literature, and general descriptions of the interventions/rationale. This latter category is not intended to serve in place of comprehensive treatment guidelines nor act as a clinical guide, but rather briefly describes the types of interventions that may be considered.
The CGD currently includes only single gene alterations; it does not include contiguous gene syndromes, although conditions with, for example, demonstrated digenic inheritance are included. Similarly, somatic alterations, such as commonly occur in cancerous processes, are not included, although if a germ-line change in the same gene has been shown to result in disease, those latter conditions are included. The current version does not include susceptibilities or genetic associations, such as those identified through an association-based study. As the database expands in the future, these types of additions would be considered.
ACKNOWLEDGMENTS. The authors thank Leslie G. Biesecker, Derek A. T. Cummings, James P. Evans, Donald W. Hadley, and Maximilian Muenke for support, mentorship, and critical input; Andreas D. Baxevanis for bioinformatic discussions and support; Mark Fredriksen for programming assistance; and all the experts who provided input related to individual genes and conditions. This research was supported by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health.
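As a supplementary illustration, the three Intervention-category criteria from Materials and Methods, together with the two-reason binning rule and the idea of using the CGD as a filter on binned variants, can be sketched in a few lines. This is a simplified reading of the text, not the authors' code or any published binning tool: all class, field, and function names are hypothetical, each criterion is reduced to a boolean that a curator would supply, and assigning "General" when no organ system is recorded is an added assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    gene: str
    clinically_significant: bool      # criterion (i)
    established_intervention: bool    # criterion (ii); experimental-only does not count
    early_diagnosis_advantage: bool   # criterion (iii)
    intervention_organ_systems: tuple = ()


def intervention_categories(c):
    """Organ-system categories if all three criteria hold, else General."""
    if (c.clinically_significant and c.established_intervention
            and c.early_diagnosis_advantage and c.intervention_organ_systems):
        return sorted(c.intervention_organ_systems)
    return ["General"]


@dataclass(frozen=True)
class Variant:
    gene: str
    kind: str                     # e.g. "truncating", "large_deletion", "missense"
    lof_causes_disease: bool      # loss of function in this gene predicted to cause disease
    previously_established: bool  # exact variant previously established as disease-causing


def judged_pathogenic(v):
    """Two-reason rule: variant type alone predicts pathogenicity, or the
    exact variant has strong prior evidence of being disease-causing."""
    by_type = v.kind in {"truncating", "large_deletion"} and v.lof_causes_disease
    return by_type or v.previously_established


def flag_for_follow_up(variants, conditions):
    """Keep pathogenic-binned variants in genes with a specific
    (non-General) Intervention category."""
    by_gene = {c.gene: c for c in conditions}
    flagged = []
    for v in variants:
        c = by_gene.get(v.gene)
        if c is None or not judged_pathogenic(v):
            continue
        categories = intervention_categories(c)
        if categories != ["General"]:
            flagged.append((v.gene, categories))
    return flagged
```

In practice each boolean stands in for a literature-based judgment, and the text notes that clinical information such as family and medical history should also inform how a variant is considered, which this sketch does not model.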
1. Bamshad MJ, et al. (2011) Exome sequencing as a tool for Mendelian disease gene discovery. Nat Rev Genet 12(11):745–755.
2. Biesecker LG, Shianna KV, Mullikin JC (2011) Exome sequencing: The expert view. Genome Biol 12(9):128.
3. Gonzaga-Jauregui C, Lupski JR, Gibbs RA (2012) Human genome sequencing in health and disease. Annu Rev Med 63:35–61.
4. Bainbridge MN, et al. (2011) Whole-genome sequencing for optimized patient management. Sci Transl Med 3(87):87re3.
5. Solomon BD, et al.; NISC Comparative Sequencing Program (2011) Personalized genomic medicine: Lessons from the exome. Mol Genet Metab 104(1-2):189–191.
6. Ball MP, et al. (2012) A public resource facilitating clinical use of genomes. Proc Natl Acad Sci USA 109(30):11920–11927.
7. Johnston JJ, et al. (2012) Secondary variants in individuals undergoing exome sequencing: Screening of 572 individuals identifies high-penetrance mutations in cancer-susceptibility genes. Am J Hum Genet 91(1):97–108.
8. Solomon BD, Pineda-Alvarez DE, Bear KA, Mullikin JC, Evans JP; NISC Comparative Sequencing Program (2012) Applying genomic analysis to newborn screening. Mol Syndromol 3(2):59–67.
9. Lupski JR, et al. (2010) Whole-genome sequencing in a patient with Charcot-Marie-Tooth neuropathy. N Engl J Med 362(13):1181–1191.
10. Tong P, et al. (2010) Sequencing and analysis of an Irish human genome. Genome Biol 11(9):R91.
11. Berg JS, Khoury MJ, Evans JP (2011) Deploying whole genome sequencing in clinical practice and public health: Meeting the challenge one bin at a time. Genet Med 13(6):499–504.
12. Solomon BD, et al.; NISC Comparative Sequencing Program (2012) Incidental medical information in whole-exome sequencing. Pediatrics 129(6):e1605–e1611.
13. Teer JK, Green ED, Mullikin JC, Biesecker LG (2012) VarSifter: Visualizing and analyzing exome-scale sequence variation data on a desktop computer. Bioinformatics 28(4):599–600.
14. Oetting WS, et al. (2013) Getting ready for the Human Phenome Project: The 2012 Forum of the Human Variome Project. Hum Mutat 34(4):661–666.
15. Hamosh A, et al. (2013) PhenoDB: A new web-based tool for the collection, storage, and analysis of phenotypic features. Hum Mutat 34(4):566–571.
16. Green RC, et al. (2012) Exploring concordance and discordance for return of incidental findings from clinical sequencing. Genet Med 14(4):405–410.
17. Manolio TA, et al. (2013) Implementing genomic medicine in the clinic: The future is here. Genet Med 15(4):258–267.
18. Berg JS, et al. (2013) An informatics approach to analyzing the incidentalome. Genet Med 15(1):36–44.
19. Evans JP, Berg JS (2011) Next-generation DNA sequencing, regulation, and the limits of paternalism: The next challenge. JAMA 306(21):2376–2377.
20. Xue Y, et al.; 1000 Genomes Project Consortium (2012) Deleterious- and disease-allele prevalence in healthy individuals: Insights from current predictions, mutation databases, and population-scale resequencing. Am J Hum Genet 91(6):1022–1032.