The Elements of Statistical Learning: Data Mining, Inference and Prediction
This has the advantage that an instructor can choose articles depending on the level of the students.

Another example is the choice of papers concerned with reaction-diffusion equations and pattern formation in Chapter 18. We all know that the Turing mechanism creates wonderful patterns, but it is still unclear if this mechanism is responsible for animal skin patterns, for example. Taubes's selection of papers shows the controversy quite nicely. An initial publication on fish-pattern is opposed by a second article, which then is commented on by the authors of the first article. This leaves the true impression that this discussion is still open.

Other topics of the text: ODEs, phase-plane analysis, linearization, vector-matrix notation, advection, diffusion, separation of variables, reaction-diffusion equations, pattern formation, traveling waves, periodic solutions, fast and slow dynamics, and chaos.

Although the text is not suitable for a course in mathematics, the enormous number of well-chosen references makes it a useful addition for the shelf of a generally interested researcher. As Taubes says in his preface, his "goal is to introduce to future experimental biologists some potentially useful tools and modes of thought."

Department of Mathematical and Statistical Sciences
University of Alberta
Edmonton, Alberta T6G 2G1
Canada
e-mail: [email protected]

The Elements of Statistical Learning: Data Mining, Inference and Prediction
by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
NEW YORK, SPRINGER-VERLAG, 2001. 533 PP. $82.95 HARDBACK, ISBN 0-387-95284-5
REVIEWED BY JAMES FRANKLIN

A standard view of probability and statistics centres on distributions and hypothesis testing. To solve a real problem, say in the spread of disease, one chooses a "model," a distribution or process that is believed from tradition or intuition to be appropriate to the class of problems in question. One uses data to estimate the parameters of the model, and then delivers the resulting exactly specified model to the customer for use in prediction and classification. As a gateway to these mysteries, the combinatorics of dice and coins are recommended; the energetic youth who invest heavily in the calculation of relative frequencies will be inclined to protect their investment through faith in the frequentist philosophy that probabilities are all really relative frequencies. Those with a taste for foundational questions are referred to measure theory, an excursion from which few return.

That picture, standardised by Fisher and Neyman in the 1930s, has proved in many ways remarkably serviceable. It is especially reasonable where it is known that the data are generated by

Like the Incredible Hulk, statistics has burst out of its constricting garments in several directions. In the foundational direction, Bayesians, especially those of an objectivist stamp like E. T. Jaynes, have reconnected statistics with inference under uncertainty, or rational degree of belief on non-conclusive evidence. In the direction of engagement with the large and messy data sets thrown up by the computer revolution, the disciplines of data mining and risk measurement, represented by the books of Hastie et al. and Marrison, have developed data analysis tools well outside the traditional boundaries.

The essence of Jaynes's position is that (some) probability is logic, a relation of partial implication between evidence and conclusion. According to this point of view, statistical inference is in the same line of business as "proof beyond reasonable doubt" in law and the evaluation of scientific hypotheses in the light of experimental evidence. Just as "all ravens are black and this is a raven" makes it logically certain that this is black, so "99% of ravens are black and this is a raven" makes it logically highly probable that this is black (in the absence of further relevant evidence). That is why the results of drug trials give rational confidence in the effects of drugs. Galileo and Kepler used the language of objective probability about the way evidence supported their theories, and in the last hundred years a number of books have filled out the theory of logical probability--Keynes's Treatise on Probability (the great work of his early years, before he went on to easier pickings in econom-
84 THE MATHEMATICAL INTELLIGENCER
Elements of Statistical Learning is the ideal introduction. Assuming basic statistical concepts and an ability to read formulas, it runs through the methods of supervised learning (that is, generalisation from data) that have come from many sources: neural networks, kernel smoothing, smoothed splines, nearest-neighbour techniques, logistic regression and newer techniques like bagging and boosting. The unified treatment and illustration with well-chosen (and well-graphed) real datasets makes for efficient understanding of the whole field. It is possible to appreciate how different methods are really attempting the same task--for example, that classification trees developed by computer scientists to suit their discrete mindset are really performing non-linear regression. But the differences between methods are well laid out too: the table on p. 313 compares the methods with respect to such crucial qualities as scalability to large data sets, robustness to outliers, handling of missing values, and interpretability. The less-tamed territory of unsupervised learning, such as cluster analysis, is also well covered. One topic of current interest missing is the attempt to infer causes from data, but, as is clear from Richard Neapolitan's Learning Bayesian Networks (Harlow, Prentice Hall, 2004), that theory is still in a primitive state. Spatial statistics and text mining are not covered either; they too await readable textbooks of their own.

Mathematicians, pure and applied, think there is something weirdly different about statistics. They are right. It is not part of combinatorics or measure theory but an alien science with its own modes of thinking. Inference is essential to it, so it is, as Jaynes says, more a form of (non-deductive) logic. And, unlike mathematics, it does have a nice line in colourful polemic.

School of Mathematics
University of New South Wales
Sydney 2052
Australia
e-mail: [email protected]

Mathematics Across Cultures
Helaine Selin and Ubiratan D'Ambrosio, editors
KLUWER ACADEMIC PUBLISHERS, 2000, 479 PAGES
HARDBOUND, ISBN 0-7923-6481-3, ~ 195.50
PAPERBACK, ISBN 1-4020-0260-2, ~63.00
REVIEWED BY HELENE BELLOSTA

This book is meant as a supplement to the Encyclopaedia of the History of Science, Technology and Medicine in Non-Western Cultures (Kluwer Academic Publishers, 1997) and is aimed at a more scholarly audience; the aim is to explore the same topics in greater depth.

The book is divided into two parts: the authors of the six essays in the first section try to define the field of ethno-mathematics and to make a general study of the connection between mathematics and culture as well as the variability of the concept of rationality, while the second part is devoted to the description of fifteen individual cultures and their mathematics in various "non Euro-American" areas: the Middle East, America (native cultures), the Pacific and Australia, Africa, and the Far East.

If the intention behind the book--to rehabilitate the so-called non-Western cultures and to denounce the damaging effects of cultural imperialism and eurocentrism, the consequence of which is a certain contemptuous disregard for these cultures--is highly laudable, this enterprise is not entirely free from danger. The main difficulty is defining and naming the field of study: how should we divide sciences into Western and non-Western, or European and non-European? The criterion is not geographical but cultural (H. Selin, p. v), for the studies in this book deal with mathematics in the Far and Middle East, as well as mathematics in Aboriginal, Amerindian, or African societies. Should we, as some authors do, speak of "non-modern" or "traditional" sciences, even though this mixes up different eras, from the 3rd millennium BC to today? Should we then group these sciences together under the heading "ethno-sciences"? But what then are the criteria that include science in Mesopotamia or ancient Egypt in ethno-sciences, but exclude Greek science, although all the authors in Greco-Hellenistic Antiquity regard science in ancient Egypt as the origin of Greek science? Why should Arab mathematics, the heir to Greek mathematics, whose contribution is essential to understand the constitution of classical mathematics in 17th-century Europe, be included in ethno-mathematics?

This book seems to make a rather strange division. On one side we have ethno-sciences, bringing together sciences as different as science in ancient China and science in present-day Aboriginal societies, these being viewed as sciences of unusual societies, the peculiarities of which, together with their incommunicability, some papers diligently stress; and on the other, by default, Greek science, European science from the Renaissance to nowadays as well as science in the USA, would be left as non-ethnic sciences (white science versus colored science?). If we continue to follow the unspoken logic of this division, these sciences should then show the opposite qualities and be a contrario universal. We should not be surprised then to find here and there in some papers hasty judgments and worn-out commonplaces on these "different" civilizations, which could be defined as the eurocentrism the editors intended to stigmatize: "The transformation of the word science as a distinct rationality valued above magic is uniquely European" (H. Selin, p. vi) or "the development of this concept of rationality (i.e., European's 17th century) was not universal. For example, it was not paralleled in Islamic society where men were denied rational agency; they were held to lack the capacity to change nature or to understand it. Knowledge was instead to be derived from traditional authority" (D. Turnbull, Rationality and the disunity of the sciences, p. 47). One of the authors (R. Eglash, Anthropological perspectives on ethnomathematics) is clearly conscious of the difficulty of defining what ethno-mathematics or non-Western mathematics actually are, and also