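Bioinformatics: The Machine Learning Approach

This document is a review, published in Briefings in Bioinformatics (2002), of the second edition of Pierre Baldi and Søren Brunak's book on applying machine learning approaches to bioinformatics. The review outlines the topics the book covers, including Bayesian probability, machine learning algorithms, neural networks, hidden Markov models, graphical modelling, phylogeny, grammars and linguistics, and DNA microarray analysis. It praises the book's broad coverage, its many worked examples and its clear, practical treatment of the mathematics. It also criticises the phylogeny chapter, the thin coverage of microarrays, the relegation of kernel methods and SVMs to an appendix, and the outdated links in the Internet resources chapter.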

Bioinformatics: The Machine Learning Approach

Book · January 2001

2 authors:

Pierre Baldi, University of California, Irvine
Søren Brunak, University of Copenhagen

All content following this page was uploaded by Søren Brunak on 17 August 2015.

Book reviews

© Henry Stewart Publications 1477-4054. Briefings in Bioinformatics. Vol 3. No 3. 318–323. September 2002
Downloaded from http://bib.oxfordjournals.org/ by guest on February 11, 2013

shorten the time-scale. The second is that, in contrast to building computers, biological systems are probabilistic rather than deterministic in their development. In particular, they are chaotically sensitive to initial conditions and random fluctuations. Thus, while I can believe it may become possible to create a human being, it will prove impossible to replicate a specific human being. A bit of Brownian motion in the zygote and, whoops, Cyrano has a cutely retroussé nose and no story to tell. Indeed we know this already from our experience of monozygotic, but emphatically not identical, twins. So the problem of self and self-identity in a world swamped with clones and spare body parts remains much the same as it has been since Socrates was at the Symposium.

On the other hand, technology has delivered us situations that could not have been imagined, let alone agonised over, by any reasonable Athenian up until the present generation. Baldi's response to most of these issues, and I applaud him in this, is to accept and develop the principle of utilitarianism – the greatest good for the greatest number – by asserting that there are very few absolutes in ethical matters. In the difficult areas of abortion, cloning, stem cells and genetically modified organisms we must take it one step at a time and deal with the problems as they occur. If those steps have to come faster than we'd like (and progress will relentlessly ensure that they will) then we are bound to make some poor decisions. The adaptability of humans is such, however, that we will learn from these decisions and do better next time. There is no future, both literally and metaphorically, in trying to stop the deluge of technological advance. Indeed, the most heartening aspect of the book is Baldi's boundless optimism for the future. So another excellent reason for reading it is that it will be good for morale.

Andrew T. Lloyd
INCBI, the Irish EMBnet Node

Bioinformatics: The Machine Learning Approach
Pierre Baldi and Søren Brunak
MIT Press; 2nd edn; ISBN 0 262 02506 X; 400pp; US$49.95/£34.95 (hbk); 2001

Although this second edition of a classic book is of general interest to any bioinformatician, its main audience is anyone with a keen interest in machine learning. Ranging from optimisation techniques to neural networks, and from hidden Markov models (HMMs) to grammars and linguistics, most, if not all, relevant topics are covered in this book.

My personal problem with books such as this is that they usually presuppose an unhealthy appetite for mathematics in their readers, which I definitely have not got! In this case the maths is presented clearly and in chewable amounts. Moreover, the appendices contain concise introductions to topics ranging from statistics, information theory and graphical (Bayesian) networks up to the technical details of HMMs. 'The Machine Learning Approach' is a highly practical book, though it presents an appropriate amount of detail on the theory behind the practical applications.

Chapter 1 contains the obligatory introduction to molecular biology. Fortunately the focus is on the information content of biological sequences, and the chapter does not try to provide a crash course in molecular biology, as is so often the case in bioinformatics books, although it still contains some interesting biology of a more general nature. Did you know, for example, of the existence of crosses between lions and tigers, called ligers and tigrons (as with mules and hinnies, the name depends on which species supplies the male parent)? I didn't.

Chapters 2, 3 and 4 lay down the framework for machine learning methods. Chapters 2 and 3 explain Bayesian probability theory, while Chapter 4 introduces the commonly used algorithms, such as dynamic programming, expectation maximisation (EM),
Markov chain Monte Carlo methods, simulated annealing and genetic algorithms.

After having laid down these foundations, we arrive at the core of this book, in which the basic theory is applied to real-world methods and problems. The following chapters cover diverse subjects such as neural networks (Chapters 5 and 6), HMMs (Chapters 7 and 8), graphical modelling (Chapter 9), phylogeny (Chapter 10) and grammars and linguistics (Chapter 11). Chapter 12, new in the second edition of this book, is devoted to the analysis of DNA microarray data.

What I particularly liked about these chapters is the inclusion of a plethora of examples to accompany and exemplify the theory. Neural network examples include protein secondary structure prediction, signal peptide sites, gene finding and splice sites. The examples described in the HMM chapters include a number of protein applications, such as protein classification, detection of G-protein coupled receptors in expressed sequence tag (EST) databases, signal peptides and signal anchors. In the DNA and RNA field, topics such as gene finding, splice site, intron and exon prediction, and prediction of promoter regions are covered.

Chapter 9 moves on to more exotic models, such as hybrid models in which HMMs are combined with neural networks. The applications again include protein secondary structure prediction and gene finding.

The odd one out in my view is the chapter on phylogeny. When looked upon in the light of Bayesian probabilistic models of evolution, it fits in with the general concept of the book. But describing phylogeny reconstruction purely in the light of probability theory brings the subject down to the mere mechanics of tree construction. This does not do justice to such a broad and complex field of research as phylogenetics.

Chapter 12, on DNA microarrays and gene expression, is still a bit thin for a subject that is widely considered an important field of research in bioinformatics. See, for example, the December 2001 issue of Briefings in Bioinformatics (Vol. 2, No. 4). It is also a pity that methods such as kernel methods and SVMs (support vector machines) have been tucked away in an appendix and have not received more attention. I would have liked to see these topics covered in more detail and, as in the other book chapters, accompanied by some relevant examples.

Chapter 13, 'Internet resources and public databases', the final chapter, is somewhat of a disappointment. As the great Dutch soccer legend and homemade philosopher Johan Cruyff once said: 'every advantage 'as its drawback'. The same principle applies here. Of course it is always dangerous to start compiling lists of links to servers, software and databases, as such a list will never be complete. For example, consider the references to the obsolete SRS 5 server in Heidelberg (why not use SRS 6 at the EBI?) or to the NRL_3D database (which has not been updated for ages, and whose link does not even exist anymore). Fortunately there is a reference to the web page[1] from which most of these links were taken, maintained at Brunak's Center for Biological Sequence Analysis. But even this site suffers from the same problems as the book does. It puzzles me, for example, why the reference to the WhatIf program by Gert Vriend (previously EMBL, now CMBI) points to the HGMP in Hinxton, UK. Likewise, the link for Terri Attwood's PRINTS database still points to UCL, while the database has actually been in Manchester for over three years.

Besides these minor points of criticism, having second editions of books such as 'The Machine Learning Approach' and Baxevanis and Ouellette's 'Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins' (reviewed in Briefings in Bioinformatics, Vol. 2, No. 4) is further proof that the field of bioinformatics has
finally come to a state of maturity. Will we live to see the first edition of this book become a collector's item? I think not: it is not as if we are collecting firsts of R. L. Stevenson's 'Treasure Island'. For a field as young as bioinformatics, however, it may be considered a classic.

Jack Leunissen
CMBI, EMBnet
The Netherlands

Reference
1. URL: http://www.cbs.dtu.dk/biolink.html
