Bmir 2001 0898
Russ B Altman
Stanford University
All content following this page was uploaded by Russ B Altman on 30 May 2014.
Biomedical informatics is the study of information flow within biology and
medicine. The use of computational techniques in biomedical research dates back
to the first general-purpose computers, but interest in the techniques has
exploded in the last decade (1). The increased interest stems from the
availability of experimental techniques that create data that cannot be analyzed
manually and therefore require computational intervention. Many areas of biology
and medicine are being revolutionized by the introduction of new experimental
techniques, accompanied by informatics methodologies that fundamentally change
the way that investigators do their work.
Informatics studies two flows of information: the flow of information from the
DNA code to biological function, and the flow of information in the design and
analysis of experiments. In the first flow, we are interested in the transfer of
information within biology; in the second, we are interested in the transfer of
information about biology. Thus, the first information flow deals with
0362-1642/02/0210-0113$14.00 113
114 ALTMAN KLEIN
the central dogma of biology: DNA is transcribed into RNA, RNA is translated into
protein, and protein molecules have functions that carry out biological processes.
Interacting proteins produce signaling and metabolic pathways that coalesce to
form networks at the cellular level, and cells interact at an organismal level to
produce physiology. Informatics approaches to studying different aspects of this
flow, therefore, include methods for gene finding (2–5), 3D structure prediction
(6, 7), modeling of genetic networks (8–14), and statistical population biology
(15, 16).
In the second flow, we are interested in the ways in which biological and medi-
cal information is gathered. This flow begins with a scientific hypothesis, followed
by a plan to collect data, execution of an experiment, analysis of the results, and
subsequent refinement of the hypothesis. Informatics applications within this flow
are usually created to support investigators in the practice of science. Informat-
ics approaches to studying this flow, therefore, include methods for organizing
and searching databases of literature, sequence, and function, as well as meth-
ods for helping to create and evaluate scientific models (17). If both of these
information flows are included in a definition of biomedical informatics, then vir-
tually all biomedical informatics research can be placed in one or both of these
areas.
Biomedical informatics has gained prominence recently because biologists can
now collect more data. The success of the genome sequencing projects has cat-
alyzed a new way of thinking in biology, whereby data are collected on a large scale
and without a particular hypothesis in mind. The data are then placed in a database,
and scientists with hypotheses can extract information from the database in order
to evaluate the merits of the hypotheses. This leads to a fundamental change in
how some investigators do their work: Instead of first moving to the laboratory,
they first move to the database, and only after assessment of the available data
are experiments planned. There has been much debate about the merits of such an
approach, but there is no doubt that the emergence of these large-scale,
high-throughput methods for data collection makes such an approach feasible (18).
The data explosion is not limited to DNA sequencing: we are also seeing increased
capacity to assess the levels of mRNA expression (19, 20), to detect
protein-protein interactions (21), to locate gene products within the cell (22),
to detect and identify compounds using mass spectrometry (23), and even to
determine the detailed atomic three-dimensional structure of macromolecules and
their small-molecule ligands (24).
As long as clever experimentalists continue to create these high-throughput
experimental methods, informatics professionals will have a surfeit of data and
data-analytic challenges. Success in informatics usually means faster
understanding of the processes of interest and better access to the information
required to generate and test scientific hypotheses. One of the areas that has
recently attracted the attention of biomedical informaticians is pharmacology,
particularly pharmacogenetics and pharmacogenomics.
INFORMATICS FOR PHARMACOGENOMICS 115
textbooks indicates a core set of 500 to 1000 genes, as shown in the PharmGKB
Web site.¹
1 http://www.pharmgkb.org/
Phenotype-to-Genotype Approaches
Phenotype-to-genotype approaches toward pharmacogenomic discovery start from the
opposite end. Instead of identifying a family of genes in which to characterize
genetic variation, investigators search for a phenotypic measure that shows
significant variation. This measure can be a clinical measure (such as the rate of
clearance of a drug or the peak level of the drug for a given dose), a cellular
measure (the rate of cellular uptake of a drug or the profile of gene expression),
or a molecular measure (the turnover rate of an enzyme or a substrate binding
constant). In each case, the phenotypic variation draws attention first, and a
search for the genes responsible for it follows. The steps of a
phenotype-to-genotype approach can be summarized as follows:
1. Identify a phenotype that shows significant variation.
2. Search for genes that may explain this variation.
3. Characterize genetic variations and check for association with the phenotype.
4. Confirm proposed genetic basis for the variation and its clinical relevance.
The challenge in the first step is to identify phenotypes that are both clinically
relevant and measurable. The second step is the most difficult and requires
the investigator to use any means available to identify genes that could be involved
with the phenotypes. It may involve using animal models and comparative
genomics, DNA microarray analysis to measure changes in expression in response to
drugs, database (literature and sequence) searches for associations between genes
and related phenotypes, or analytical chemistry methods to identify gene products
contributing to variation (47). The third step is similar to the second step of the
genotype-to-phenotype process. A major challenge in this step is the large amount
of variability in human genes that is not functionally significant, so investigators
must focus their efforts on variations that can be shown to have functional
consequences. The final step addresses this problem directly by ensuring that the
discovered genetic component really explains the phenotypic variation of interest.
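The association check in the third step can be illustrated with a small worked example. The sketch below is only one possible way to test association, assuming the phenotype has been dichotomized (for instance, slow versus normal drug clearance) and that subjects are grouped by carrier status for a single variant. The function name and the counts are hypothetical and do not come from any study cited here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical):
      rows    = variant carriers / non-carriers
      columns = phenotype present / phenotype absent
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that x carriers show the phenotype,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table that is no more likely than the
    # observed one (small tolerance guards against floating-point ties).
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: 8 of 10 carriers are slow clearers, versus 1 of 10
# non-carriers.
p_value = fisher_exact_two_sided(8, 2, 1, 9)
print(f"two-sided p = {p_value:.4f}")
```

A small p-value here only flags a candidate association. As the fourth step stresses, the proposed genetic basis still has to be confirmed functionally and clinically.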
Both approaches to pharmacogenomics have strengths and weaknesses. Investi-
gators must assess the current knowledge base for a given drug class of interest
in order to determine whether there is enough genetic information to justify a
genotype-to-phenotype approach, or whether there are more striking phenotypic
data suggesting a phenotype-to-genotype approach.
for contributions that now exists. One of the key themes in pharmacogenomics is
that the relevant informatics expertise includes information from molecular biol-
ogy (sequences, structures, pathways) as well as from clinical medicine (medica-
tions, diseases, side effects), and of course from pharmacology (pharmacokinetics
and pharmacodynamics). Thus it represents a new wave of informatics problems
where both basic biological and clinical information must be combined and ana-
lyzed. Whereas previously bioinformatics focused solely on issues of relevance to
molecular biology (sequence and structure analysis), applications are now mov-
ing closer to the parts of clinical informatics that focus on the organization of
clinical information, particularly for research purposes. The main challenges for
biomedical informatics within pharmacogenomics fall into nine areas:
1. Representing the diversity of pharmacogenomic data
2. Developing standards for data exchange
3. Integrating data from multiple data resources
4. Mining literature for knowledge
5. Using expression data to understand regulation
6. Understanding the structural basis for variability
7. Using comparative genomics
8. Managing laboratory information
9. Protecting sensitive patient information
structures behind GenBank (48), Human Genome Database (49), BIOML², and
others. The human genome browsers offered by UC Santa Cruz³, Ensembl⁴,
National Center for Biotechnology Information (NCBI)⁵, and Celera⁶ offer a basic
look at gene structure, but these are still evolving because the genome is in draft.
In addition to the representation of basic gene structure, it is critical to also under-
stand the locations and types of genetic variation. The dbSNP resource at NCBI
provides an excellent source of reported SNPs (50), including those submitted by
The SNP Consortium (51), an industrial group that is performing large-scale SNP
detection and submitting many of these to the public domain.
Genome data are also made more useful by their connections to databases of
biological function, including the Online Mendelian Inheritance in Man (OMIM)
database of inherited human disorders (52, 53), and a number of specialty databases
that provide valuable in-depth information about individual gene families, such
as the Cell Signaling Network Database (54), the transcription factor database
TRANSFAC (55), and the protein kinase database (56).
2 http://www.bioml.com/BIOML/
3 http://genome.ucsc.edu/
4 http://www.ensembl.org/genome/central/
5 http://www.ncbi.nlm.nih.gov/genome/guide/central.html
6 http://.public.celera.com/index.cfm
7 http://www.ccdc.cam.ac.uk/
8 http://www.rcsb.org/pdb/
9 http://www.mged.org/
10 http://www.cdc.gov/nchs/about/otheract/icd9/abticd10.htm
11 http://www.snomed.org/
12 http://www.ama-assn.org/ama/pub/category/3113.html
13 http://www.usc.edu/dept/biomed/BMSR/Software/adptmenu.html
14 http://c255.ucsf.edu/nonmem0.html
15 http://www-saam.nci.nih.gov/index.html
Although the raw data are used to compute these intermediate representations (for
example, the raw time points of blood levels are used to compute pharmacokinetic
parameters), it can be difficult to determine the appropriate level of data to make
routinely available in databases. The raw data can be cumbersome, and the computed
parameters are often what most users want. However, there are times when the raw
data must be retrieved in order to check conclusions or explore alternative
interpretations. Thus, a major challenge for pharmacogenomic information resources
is to provide easy access both to the computed or derived parameters and to the
raw data on which they are based.
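As a concrete illustration of keeping raw and derived data together, the sketch below stores the raw concentration time points and computes two common pharmacokinetic parameters from them on demand: the area under the concentration-time curve (by the linear trapezoidal rule) and the peak concentration with its time. The class name, field names, and units are assumptions made for this example and do not describe any existing database schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ConcentrationProfile:
    """Raw blood-level time points for one subject (hypothetical record layout)."""

    times_h: List[float]        # sampling times, in hours
    conc_mg_per_l: List[float]  # measured drug concentrations, in mg/L

    def auc_trapezoid(self) -> float:
        """Area under the concentration-time curve, linear trapezoidal rule."""
        points = list(zip(self.times_h, self.conc_mg_per_l))
        return sum(
            (t1 - t0) * (c0 + c1) / 2.0
            for (t0, c0), (t1, c1) in zip(points, points[1:])
        )

    def peak(self) -> Tuple[float, float]:
        """Return (Cmax, Tmax): highest observed concentration and its time."""
        c_max = max(self.conc_mg_per_l)
        return c_max, self.times_h[self.conc_mg_per_l.index(c_max)]

    def summary(self) -> Dict[str, float]:
        """Derived parameters for routine queries; the raw points stay available."""
        c_max, t_max = self.peak()
        return {
            "auc_mg_h_per_l": self.auc_trapezoid(),
            "cmax_mg_per_l": c_max,
            "tmax_h": t_max,
            "n_raw_points": len(self.times_h),
        }


# Hypothetical measurements for a single subject.
profile = ConcentrationProfile([0.0, 1.0, 2.0, 4.0], [0.0, 4.0, 3.0, 1.0])
print(profile.summary())
```

Because the derived values are computed from the stored time points rather than replacing them, a user who doubts a reported parameter can recompute it or apply a different model to the same raw data.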
16 http://www.ncbi.nlm.nih.gov/PubMed/
genes, and this should stabilize over time and provide a useful set of indices
for the human genome browsers. Included in this activity is the identification
of the function of new genes of pharmacogenomic interest, particularly
transporters [which are classified in a taxonomy by Saier¹⁷ (61)] and the
cytochrome P450 system (classified based on isoform similarity) (62, 63).
2. Drug and compound names. Efforts have been proposed to build a controlled
list of drug categories, their structural and biological features, and
the associated specific compounds. The Unified Medical Language System
(UMLS) contains the 1997 Food and Drug Administration Standard Product
Nomenclature.¹⁸
3. Side effects. Standards are required for coding drug side effects, at a clinical
level and perhaps at a lower biological level. A vocabulary that is used for
clinical trials is a good starting point. The UMLS contains the World Health
Organization Adverse Drug Reaction Terminology and the Coding Symbols
for a Thesaurus of Adverse Reaction Terms (COSTART) from the Food and
Drug Administration.¹⁹
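The practical effect of a controlled list of names can be shown with a minimal sketch: free-text drug names are mapped to one preferred term, so that records from different sources can be matched. The three-entry vocabulary below is purely illustrative and is not drawn from the UMLS or from any FDA nomenclature. The function names are likewise hypothetical.

```python
# Illustrative controlled vocabulary: preferred name -> known synonyms.
# A real resource would be far larger and centrally curated.
DRUG_VOCABULARY = {
    "acetaminophen": {"paracetamol"},
    "albuterol": {"salbutamol"},
    "warfarin": {"coumadin"},
}


def build_lookup(vocabulary):
    """Flatten the vocabulary into a case-insensitive name -> preferred-term map."""
    lookup = {}
    for preferred, synonyms in vocabulary.items():
        lookup[preferred.lower()] = preferred
        for synonym in synonyms:
            lookup[synonym.lower()] = preferred
    return lookup


def normalize_drug_name(name, lookup):
    """Return the preferred term for a name, or None if it is not in the vocabulary."""
    return lookup.get(name.strip().lower())


lookup = build_lookup(DRUG_VOCABULARY)
for raw_name in ["Salbutamol", " Coumadin", "aspirin"]:
    print(repr(raw_name), "->", normalize_drug_name(raw_name, lookup))
```

Returning None for an unknown name, rather than guessing, keeps unmapped terms visible so that curators can decide whether the vocabulary needs a new entry.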
17 http://www-biology.ucsd.edu/~msaier/transport/titlepage.html
18 http://www.fda.gov/cder/ndc/database/default.htm and http://www.fda.gov/cder/ob/default.htm
19 http://www.fda.gov/cder/aers/index.htm
20 http://www.w3.org/XML/
21 http://pharmgkb.stanford.edu/xml-schemas.html
22 http://www.w3.org/RDF/
to avoid the loss of this information and to assist in the automatic population of
databases, informatics researchers are building systems for extracting information
from text. The general problem of understanding the full details of a natural
language text has been studied for more than four decades and remains unsolved (70);
however, the more tractable goal of reliably identifying relationships within text
is within reach. For example, texts can be analyzed to extract protein names (71)
and protein-protein (11, 72, 73) or protein-drug interactions (74), based on the
occurrence of protein names and verbs such as “inhibits,” “activates,”
“represses,” and “enhances.”
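The idea of finding relationships from the occurrence of names and interaction verbs can be sketched in a few lines. The example below is a deliberately simplified pattern matcher, not the method of any of the cited systems. It assumes a list of entity names is already available and only recognizes the literal pattern "name verb name" within a sentence. The function name and the example sentences are hypothetical.

```python
import re

# Interaction verbs taken from the examples given in the text.
INTERACTION_VERBS = ("inhibits", "activates", "represses", "enhances")


def extract_interactions(text, known_names):
    """Return (subject, verb, object) triples for 'A <verb> B' patterns.

    known_names is an assumed, pre-compiled list of protein or drug names.
    Matching is case-sensitive and limited to directly adjacent name-verb-name.
    """
    # Longer names first so that a short name never shadows a longer one.
    names = "|".join(
        re.escape(n) for n in sorted(known_names, key=len, reverse=True)
    )
    verbs = "|".join(INTERACTION_VERBS)
    pattern = re.compile(rf"\b({names})\b\s+({verbs})\s+\b({names})\b")

    triples = []
    # Naive sentence splitting on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for match in pattern.finditer(sentence):
            triples.append((match.group(1), match.group(2), match.group(3)))
    return triples


sample = (
    "Quinidine inhibits CYP2D6. RAF1 activates MEK1 in vitro. "
    "Expression was measured after 24 hours."
)
print(extract_interactions(sample, {"Quinidine", "CYP2D6", "RAF1", "MEK1"}))
```

A pattern this rigid misses passive constructions, negation, and names outside the list, which is part of why the general problem remains hard even though this narrower task is tractable.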
Within pharmacogenomics there are good opportunities for natural language
processing (NLP) techniques to assist in the organization of data. First, there is no
definitive list of drug-gene interactions, and the literature (both published medical
literature and the U.S. patent application literature²³) is filled with associations
that are pharmacokinetic (e.g., “X is metabolized by CYP2D6”) and
pharmacodynamic (e.g., “Y is active at the beta-adrenergic receptor”). In general,
NLP techniques work best in well-defined domains that use standardized vocabulary. A
second area of opportunity within pharmacogenomics is the extraction of cellular
localization information (“X is localized to the Golgi”) from text (75). A third
area that holds much promise is the processing of mRNA expression data from
microarrays (8, 20, 57, 59, 76–79). Because of the great volume of information
generated by these experiments, it is often useful to cluster the genes based on
expression pattern, but it is then very challenging to summarize the key
features of each cluster. NLP-based techniques may be able to combine the
published information about genes with information about how they cluster to create
automatic cluster labels that provide biological insight. A fourth area for NLP
applications is the identification of genes of pharmacogenomic interest from text
and the classification of these genes as primarily of pharmacokinetic or
pharmacodynamic importance. The language within pharmacokinetic papers is quite
idiosyncratic (including discussions of “area under curve” and “bioavailability”),
so it may be relatively straightforward to classify abstracts that discuss this
topic and then to extract key data elements from them.
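To make the last point concrete, the sketch below flags an abstract as pharmacokinetic when it contains enough characteristic vocabulary. It is a naive keyword counter offered only as an illustration. The term list (apart from "area under curve" and "bioavailability", which appear in the text above), the threshold, and the function names are all assumptions, and a practical classifier would be trained and evaluated on labeled abstracts.

```python
import re

# Illustrative pharmacokinetic vocabulary. Only "area under curve" and
# "bioavailability" come from the text; the other terms are assumptions.
PK_TERMS = (
    "area under curve",
    "area under the curve",
    "bioavailability",
    "clearance",
    "half-life",
    "auc",
)


def pk_score(abstract):
    """Count whole-word occurrences of pharmacokinetic terms in an abstract."""
    text = abstract.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(term) + r"\b", text))
        for term in PK_TERMS
    )


def is_pharmacokinetic(abstract, threshold=2):
    """Flag an abstract as pharmacokinetic if it reaches the (assumed) threshold."""
    return pk_score(abstract) >= threshold


examples = [
    "Oral bioavailability was 40% and the area under the curve doubled "
    "in poor metabolizers.",
    "The receptor is expressed in airway smooth muscle.",
]
for abstract in examples:
    print(is_pharmacokinetic(abstract), "-", abstract)
```

Abstracts that pass such a filter could then be handed to a more targeted extraction step that pulls out the reported parameter values.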
23 http://www.uspto.gov/
options for building 3D models of proteins that are globular and share 30%
or more sequence identity with a known structure (88). In these applica-
tions, the location and possible functional significance of nonsynonymous
SNPs in the coding portion of proteins can be evaluated (89). A recent paper
estimated that 30% of all nonsynonymous SNPs may be associated with
significant changes in function (90) based on an analysis of a large set of
mutations in DNA binding proteins. There are also some early indications
that even synonymous SNPs may change RNA stability and affect the level
of activity for some proteins.
3. Methods for predicting protein-protein interactions. It is clear that many
proteins have multiple partners with which they interact as activators, in-
hibitors, or otherwise as modifiers. There has been progress in molecular
docking algorithms (91–96) that allows investigators to combine geometric
and energetic properties for the purpose of understanding how two protein
surfaces may interact.
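The distinction between synonymous and nonsynonymous SNPs drawn above can be made concrete with the standard genetic code. The sketch below classifies a single-base change within one codon. The function name and the example codons are chosen for illustration, and the classification says nothing about whether a given amino acid change actually alters function, which is the harder question addressed by the structural methods cited.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ...
# ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(codon, position, alt_base):
    """Classify a single-base substitution inside a coding-strand DNA codon.

    position is 0-based within the codon. Returns (kind, ref_aa, alt_aa),
    where kind is "synonymous" or "nonsynonymous". A change to or from a
    stop codon is reported as nonsynonymous in this simplified scheme.
    """
    codon = codon.upper()
    variant = codon[:position] + alt_base.upper() + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[variant]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return kind, ref_aa, alt_aa


# GAA -> GAG leaves glutamate unchanged; GAG -> GTG replaces it with valine.
print(classify_snp("GAA", 2, "G"))
print(classify_snp("GAG", 1, "T"))
```

In a pipeline of the kind described, only the nonsynonymous changes would be passed on to structure-based assessment, while synonymous changes would need a separate analysis of effects such as RNA stability.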
24 http://stke.sciencemag.org/
25 http://www.nigms.nih.gov/pharmacogenetics/
26 http://www.pharmgkb.org/
there will be opportunities for the community to create and distribute informatics
methodologies that address each of the challenges outlined here. There is no doubt
that other, perhaps unexpected, informatics challenges will arise in the course of
creating these resources.
LITERATURE CITED
1. Tarter ME. 1979. Biocomputational methodology: an adjunct to theory and applications. Biometrics 35:9–24
2. Besemer J, Lomsadze A, Borodovsky M. 2001. GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res. 29:2607–18
3. Burge CB, Karlin S. 1998. Finding the genes in genomic DNA. Curr. Opin. Struct. Biol. 8:346–54
4. Lewis S, Ashburner M, Reese MG. 2000. Annotating eukaryote genomes. Curr. Opin. Struct. Biol. 10:349–54
5. Mural RJ. 1999. Current status of computational gene finding: a perspective. Methods Enzymol. 303:77–83
6. Landgraf R, Xenarios I, Eisenberg D. 2001. Three-dimensional cluster analysis identifies interfaces and functional residue clusters in proteins. J. Mol. Biol. 307:1487–502
7. Al-Lazikani B, Jung J, Xiang Z, Honig B. 2001. Protein structure prediction. Curr. Opin. Chem. Biol. 5:51–56
8. Altman RB, Raychaudhuri S. 2001. Whole-genome expression analysis: challenges beyond clustering. Curr. Opin. Struct. Biol. 11:340–47
9. Vohradsky J. 2001. Neural model of genetic network. J. Biol. Chem. 6:6
10. Reis BY, Butte AS, Kohane IS. 2001. Extracting knowledge from dynamics in gene expression. J. Biomed. Inform. 34:15–27
11. Jenssen TK, Laegreid A, Komorowski J, Hovig E. 2001. A literature network of human genes for high-throughput analysis of gene expression. Nat. Genet. 28:21–28
12. Legrain P, Wojcik J, Gauthier J. 2001. Protein-protein interaction maps: a lead towards cellular functions. Trends Genet. 17:346–52
13. Hasty J, McMillen D, Isaacs F, Collins JJ. 2001. Computational studies of gene regulatory networks: in numero molecular biology. Nat. Rev. Genet. 2:268–79
14. Salazar-Ciudad I, Newman SA, Sole RV. 2001. Phenotypic and dynamical transitions in model genetic networks. I. Emergence of patterns and genotype-phenotype relationships. Evol. Dev. 3:84–94
15. Terwilliger JD, Goring HH, Besemer J, Lomsadze A, Borodovsky M. 2000. Gene mapping in the 20th and 21st centuries: statistical methods, data analysis, and experimental design. Hum. Biol. 72:63–132
16. Jansen RC, Nap J. 2001. Genetical genomics: the added value from segregation. Trends Genet. 17:388–91
17. Chen RO, Altman RB. 1999. Automated diagnosis of data-model conflicts using metadata. J. Am. Med. Inform. Assoc. 6:374–92
18. Broder S, Venter JC. 2000. Sequencing the entire genomes of free-living organisms: the foundation of pharmacology in the new millennium. Annu. Rev. Pharmacol. Toxicol. 40:97–132
19. Bartlett J. 2001. Technology evaluation: SAGE, Genzyme molecular oncology. Curr. Opin. Mol. Ther. 3:85–96
20. Hu Y. 2001. An integrated approach for genome-wide gene expression analysis. Comput. Methods Programs Biomed. 65:163–74
21. Tucker CL, Gera JF, Uetz P. 2001. Towards an understanding of complex protein networks. Trends Cell Biol. 11:102–6
22. Cubitt AB, Heim R, Adams SR, Boyd AE, Gross LA, Tsien RY. 1995. Understanding, improving and using green fluorescent proteins. Trends Biochem. Sci. 20:448–55
23. Papac DI, Shahrokh Z. 2001. Mass spectrometry innovations in drug discovery and development. Pharm. Res. 18:131–45
24. Teichmann SA, Murzin AG, Chothia C. 2001. Determination of protein function, evolution and interactions by structural genomics. Curr. Opin. Struct. Biol. 11:354–63
25. Weber W. 1997. Pharmacogenetics. Oxford, UK: Oxford Univ. Press
26. Evans WE, Relling M. 1999. Pharmacogenomics: translating functional genomics into rational therapeutics. Science 286:487–91
27. Rusnak JM, Kisabeth RM, Herbert DP, McNeil DM. 2001. Pharmacogenomics: a clinician’s primer on emerging technologies for improved patient care. Mayo Clin. Proc. 76:299–309
28. Meyer UA. 1991. Genotype or phenotype: the definition of a pharmacogenetic polymorphism. Pharmacogenetics 1:66–67
29. McLeod HL, Evans WE. 2001. Pharmacogenomics: unlocking the human genome for better drug therapy. Annu. Rev. Pharmacol. Toxicol. 41:101–21
30. Murphy MP. 2000. Current pharmacogenomic approaches to clinical drug development. Pharmacogenomics 1:115–23
31. Murphy MP, Beaman ME, Clark LS, Cayouette M, Benson L, et al. 2000. Prospective CYP2D6 genotyping as an exclusion criterion for enrollment of a phase III clinical trial. Pharmacogenetics 10:583–90
32. Leushner J, Chiu NH. 2000. Automated mass spectrometry: a revolutionary technology for clinical diagnostics. Mol. Diagn. 5:341–48
33. Hess P, Cooper D. 1999. Impact of pharmacogenomics on the clinical laboratory. Mol. Diagn. 4:289–98
34. Meisel C, Roots I, Cascorbi I, Brinkmann U, Brockmoller J, et al. 2000. How to manage individualized drug therapy: application of pharmacogenetic knowledge of drug metabolism and transport. Clin. Chem. Lab. Med. 38:869–76
35. Yan L, Otterness DM, Weinshilboum RM. 1999. Human nicotinamide N-methyltransferase pharmacogenetics: gene sequence analysis and promoter characterization. Pharmacogenetics 9:307–16
36. Glatt CE, DeYoung JA, Delgado S, Service SK, Giacomini KM, et al. 2001. Screening a large reference sample to identify very low frequency sequence variants: comparisons between two genes. Nat. Genet. 27:435–38
37. Kuehl P, Zhang J, Lin Y, Lamba J, Assem M, et al. 2001. Sequence diversity in CYP3A promoters and characterization of the genetic basis of polymorphic CYP3A5 expression. Nat. Genet. 27:383–91
38. Israel E, Drazen JM, Liggett SB, Boushey HA, Cherniack RM, et al. 2001. Effect of polymorphism of the beta(2)-adrenergic receptor on response to regular use of albuterol in asthma. Int. Arch. Allergy Immunol. 124:183–86
39. Ewesuedo RB, Iyer L, Das S, Koenig A, Mani S, et al. 2001. Phase I clinical and pharmacogenetic study of weekly TAS-103 in patients with advanced cancer. J. Clin. Oncol. 19:2084–90
40. Furuya H, Fernandez-Salguero P, Gregory W, Taber H, Steward A, et al. 1995. Genetic polymorphism of CYP2C9 and its effect on warfarin maintenance dose requirement in patients undergoing anticoagulation therapy. Pharmacogenetics 5:389–92
41. O’Brien SJ, Menotti-Raymond M, Murphy WJ, Nash WG, Wienberg J, et al. 1999. The promise of comparative genomics in mammals. Science 286:458–62, 479–81
42. Clark MS. 1999. Comparative genomics: the key to understanding the Human Genome Project. Bioessays 21:121–30
43. Bray MS, Boerwinkle E, Doris PA. 2001. High-throughput multiplex SNP genotyping with MALDI-TOF mass spectrometry: practice, problems and promise. Hum. Mutat. 17:296–304
44. Brookes AJ. 1999. The essence of SNPs. Gene 234:177–86
45. Schork NJ, Fallin D, Lanchbury JS. 2000. Single nucleotide polymorphisms and the future of genetic epidemiology. Clin. Genet. 58:250–64
46. Judson R, Stephens JC. 2001. Notes from the SNP vs. haplotype front. Pharmacogenomics 2:7–10
47. Mann M, Hendrickson RC, Pandey A. 2001. Analysis of proteins and proteomes by mass spectrometry. Annu. Rev. Biochem. 70:437–73
48. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Rapp BA, Wheeler DL. 2000. GenBank. Nucleic Acids Res. 28:15–8
49. Cuticchia AJ. 2000. Future vision of the GDB human genome database. Hum. Mutat. 15:62–67
50. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, et al. 2001. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29:308–11
51. Sachidanandam R, Weissman D, Schmidt SC, Kakol JM, Stein LD, et al. 2001. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409:928–33
52. Hamosh A, Scott AF, Amberger J, Valle D, McKusick VA. 2000. Online Mendelian Inheritance in Man (OMIM). Hum. Mutat. 15:57–61
53. Boyadijiev SA, Jabs EW. 2000. Online Mendelian Inheritance in Man (OMIM) as a knowledgebase for human developmental disorders. Clin. Genet. 57:253–66
54. Takai-Igarashi T, Nadaoka Y, Kaminuma T. 1998. A database for cell signaling networks. J. Comput. Biol. 5:747–54
55. Wingender E, Chen X, Hehl R, Karas H, Liebich I, et al. 2000. TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res. 28:316–19
56. Smith CM, Shindyalov IN, Veretnik S, Gribskov M, Taylor SS, et al. 1997. The protein kinase resource. Trends Biochem. Sci. 22:444–46
57. Jain KK. 2000. Applications of biochip and microarray systems in pharmacogenomics. Pharmacogenomics 1:289–307
58. Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, et al. 2000. A gene expression database for the molecular pharmacology of cancer. Nat. Genet. 24:236–44
59. Slonim DK. 2001. Transcriptional profiling in cancer: the path to clinical pharmacogenomics. Pharmacogenomics 2:123–36
60. Lindberg DA, Humphreys BL, McCray AT. 1993. The Unified Medical Language System. Methods Inf. Med. 32:281–91
61. Saier MH, Jr. 2000. A functional-phylogenetic classification system for transmembrane solute transporters. Microbiol. Mol. Biol. Rev. 64:354–411
62. Nelson DR, Kamataki T, Waxman DJ, Guengerich FP, Estabrook RW, et al. 1993. The P450 superfamily: update on new sequences, gene mapping, accession. DNA Cell Biol. 12:1–51
63. Nelson DR, Koymans L, Kamataki T, Stegeman JJ, Feyereisen R, et al. 1996. P450 superfamily: update on new sequences, gene mapping, accession numbers. Pharmacogenetics 6:1–42
64. Harold E, Means W. 2001. XML in a Nutshell: A Desktop Quick Reference. Cambridge, MA: O’Reilly
65. Abernethy N, Wu J, Hewett M, Altman R. 1999. SOPHIA: a flexible, web-based knowledge server. IEEE Intell. Sys. Appl. 14:79–85
66. Musen MA. 1998. Domain ontologies in software engineering: use of Protege with the EON architecture. Methods Inf. Med. 37:540–50
67. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, et al. 2000. The Protein Data Bank. Nucleic Acids Res. 28:235–42
68. Chung SY, Wong L, Blaschke C, Andrade MA, Ouzounis C, et al. 1999. Kleisli: a new tool for data integration in biology. Trends Biotechnol. 17:351–55
69. McEntyre J. 1998. Linking up with Entrez. Trends Genet. 14:39–40
70. Allen J. 1995. Natural Language Understanding. Redwood City, CA: Benjamin/Cummings
71. Yoshida M, Fukuda K, Takagi T. 2000. PNAD-CSS: a workbench for constructing a protein name abbreviation dictionary. Bioinformatics 16:169–75
72. Jenssen TK, Vinterbo S. 2000. A set-covering approach to specific search for literature about human genes. Proc. AMIA Symp. pp. 384–88. Philadelphia, PA: Hanley & Belfus
73. Marcotte EM, Xenarios I, Eisenberg D. 2001. Mining literature for protein-protein interactions. Bioinformatics 17:359–63
74. Rindflesch TC, Tanabe L, Weinstein JN, Hunter L. 2000. EDGAR: extraction of drugs, genes and relations from the biomedical literature. Pac. Symp. Biocomput. pp. 517–28
75. Drawid A, Jansen R, Gerstein M. 2000. Genome-wide analysis relating expression level with protein subcellular localization. Trends Genet. 16:426–30
76. Marcotte EM. 2000. Computational genetics: finding protein function by nonhomology methods. Curr. Opin. Struct. Biol. 10:359–65
77. Masys DR, Welsh JB, Lynn Fink J, Gribskov M, Klacansky I, Corbeil J. 2001. Use of keyword hierarchies to interpret gene expression patterns. Bioinformatics 17:319–26
78. Kurella M, Hsiao LL, Yoshida T, Randall JD, Chow G, et al. 2001. DNA microarray analysis of complex biologic processes. J. Am. Soc. Nephrol. 12:1072–78
79. Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, et al. 2001. Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292:929–34
80. Raychaudhuri S, Sutphin PD, Chang JT, Altman RB. 2001. Basic microarray analysis: grouping and feature reduction. Trends Biotechnol. 19:189–93
81. Srivastava M, Eidelman O, Pollard HB. 1999. Pharmacogenomics of the cystic fibrosis transmembrane conductance regulator (CFTR) and the cystic fibrosis drug CPX using genome microarray analysis. Mol. Med. 5:753–67
82. Kawanishi Y, Tachikawa H, Suzuki T. 2000. Pharmacogenomics and schizophrenia. Eur. J. Pharmacol. 410:227–41
83. Ekins S, de Groot MJ, Jones JP. 2001. Pharmacophore and three-dimensional quantitative structure activity relationship methods for modeling cytochrome p450 active sites. Drug Metab. Dispos. 29:936–44
84. Blundell TL, Mizuguchi K. 2000. Structural genomics: an overview. Prog. Biophys. Mol. Biol. 73:289–95
85. Ewing TJ, Makino S, Skillman AG, Kuntz ID. 2001. DOCK 4.0: search strategies for automated molecular docking of flexible molecule databases. J. Comput. Aided Mol. Des. 15:411–28
86. Pang YP, Kozikowski AP. 1994. Prediction of the binding site of 1-benzyl-4-[(5,6-dimethoxy-1-indanon-2-yl)methyl]piperidine in acetylcholinesterase by docking studies with the SYSDOC program. J. Comput. Aided Mol. Des. 8:683–93
87. Sun Y, Ewing TJ, Skillman AG, Kuntz ID. 1998. CombiDOCK: structure-based combinatorial docking and library design. J. Comput. Aided Mol. Des. 12:597–604
88. Sanchez R, Sali A. 2000. Comparative protein structure modeling. Introduction and practical examples with modeller. Methods Mol. Biol. 143:97–129
89. Sunyaev S, Lathe W, Bork P III. 2001. Integration of genome data and protein structures: prediction of protein folds, protein interactions and “molecular phenotypes” of single nucleotide polymorphisms. Curr. Opin. Struct. Biol. 11:125–30
90. Chasman D, Adams RM. 2001. Predicting the functional consequences of non-synonymous single nucleotide polymorphisms: structure-based assessment of amino acid variation. J. Mol. Biol. 307:683–706
91. Claussen H, Buning C, Rarey M, Lengauer T. 2001. FlexE: efficient molecular docking considering protein structure variations. J. Mol. Biol. 308:377–95
92. Goldman BB, Wipke WT. 2000. QSD quadratic shape descriptors. 2. Molecular docking using quadratic shape descriptors (QSDock). Proteins 38:79–94
93. Moont G, Gabb HA, Sternberg MJ. 1999. Use of pair potentials across protein interfaces in screening predicted docked complexes. Proteins 35:364–73
94. Morris GM, Goodsell DS, Huey R, Olson AJ. 1996. Distributed automated docking of flexible ligands to proteins: parallel applications of AutoDock 2.4. J. Comput. Aided Mol. Des. 10:293–304
95. Ritchie DW, Kemp GJ. 2000. Protein docking using spherical polar Fourier correlations. Proteins 39:178–94
96. Sternberg MJ, Aloy P, Gabb HA, Jackson RM, Moont G, et al. 1998. A computational system for modelling flexible protein-protein and protein-DNA docking. Proc. Int. Conf. Intell. Syst. Mol. Biol. 6:183–92
97. Dubchak I, Brudno M, Loots GG, Pachter L, Mayor C, et al. 2000. Active conservation of noncoding sequences revealed by three-way species comparisons. Genome Res. 10:1304–6
98. Delcher AL, Kasif S, Fleischmann RD, Peterson J, White O, Salzberg SL. 1999. Alignment of whole genomes. Nucleic Acids Res. 27:2369–76
99. Karp PD, Riley M, Saier M, Paulsen IT, Paley SM, Pellegrini-Toole A. 2000. The EcoCyc and MetaCyc databases. Nucleic Acids Res. 28:56–59
100. Wixon J, Kell D. 2000. The Kyoto encyclopedia of genes and genomes--KEGG. Yeast 17:48–55
101. Werner T. 2001. Cluster analysis and promoter modelling as bioinformatics tools for the identification of target genes from expression array data. Pharmacogenomics 2:25–36
102. Zien A, Küffner R, Zimmer R, Lengauer T. 2000. Analysis of gene expression data with pathway scores. ISMB 2000:407–17
103. Nolan PM. 2000. Generation of mouse mutants as a tool for functional genomics. Pharmacogenomics 1:243–55
104. Nadkarni PM, Marenco L, Chen R, Skoufos E, Shepherd G, Miller P. 1999. Organization of heterogeneous scientific data using the EAV/CR representation. J. Am. Med. Inform. Assoc. 6:478–93
105. Collins FS, Brooks LD, Chakravarti A. 1998. A DNA polymorphism discovery resource for research on human genetic variation. Genome Res. 8:1229–31
106. Sweeney L. 1998. Privacy and medical-records research. N. Engl. J. Med. 338:1077; discussion-8
107. Wiederhold G, Bilello M, Sarathy V, Qian X. 1996. A security mediator for health care information. Proc. AMIA Annu. Fall Symp. 120–24
108. Sweeney L, Nolan PM. 1997. Guaranteeing anonymity when sharing medical data, the Datafly System. Proc. AMIA Annu. Fall Symp. 1:51–55. Philadelphia, PA: Hanley & Belfus
109. Malin B, Sweeney L. 2000. Determining the identifiability of DNA database entries. Proc. AMIA Symp. pp. 537–41. Philadelphia, PA: Hanley & Belfus