A Handbook of Statistical Analyses Using R
Introductory comments
This text is one of a series of five handbooks that present an overview of how to use a major
statistical software package. The handbooks cover S-PLUS, Stata, SPSS, SAS, and R. Although
R is not, strictly speaking, a statistical package, it is a currently popular statistical language
that is downloaded onto one's computer from various mirror sites. It is similar in logic to the
S language of the 1980s, which later became transformed to the S-PLUS commercial package.
Brian Everitt, an Emeritus Professor with the Department of Biostatistics and Computing at
King’s College, London, is the primary author of the series. He is a co-author of each book
in the series, serving as the lead author in several, including the subject text of this review.
Other series authors are uniquely competent in the use of the particular statistical package
of the title.
Like the other handbooks, this text on R comes in both hardback and paperback. Libraries tend
to purchase the hardback editions; all others tend to prefer the paperback. The list price of
the paperback edition is USD 49.95, but it can be purchased through Amazon or Barnes &
Noble for USD 44.95. This is a reasonable cost for an academic text of 304 pages.
The book has fifteen chapters, each devoted to a particular aspect of the software. Each chapter
ends with a list of three to five exercise questions, based on the subject of the related chapter.
The bibliography contains in excess of two hundred entries, providing the reader with an
excellent resource of primary readings.
R’s higher-level computing language and statistical, data management, and graphical
capabilities are outlined in the text. Useful examples are presented to assist understanding. In
addition, examples incorporate the R commands which produce the output of interest. A
package containing the data sets used for examples can be downloaded from the
Comprehensive R Archive Network (CRAN). I shall outline the contents of each chapter, offering
comments along the way.
Chapter 1: An introduction to R
This chapter introduces the reader to the R language. Hints are given regarding installation
of R from mirror sites, and examples are provided about using the R help and documentation
facility. In addition, the authors give a summary of how to understand R data objects, how to
import and export data, how to engage in simple data manipulation, and how to produce both
summary statistics and essential graphical plots, e.g., histograms, bar graphs, and so forth.
Chapter 1 should be considered necessary reading for those without prior knowledge of
R. Those with competency using R can quickly skim the chapter.
in evaluating factor levels. Other tests demonstrated include Hotelling-Lawley, Wilks, Pillai,
and so forth. The authors also show how R can be used to develop scatterplot matrices and
comparison graphs to accompany ANOVA results.
that the multiplication of a value to the variance function makes a quasi-likelihood model.
However, the quasi-Poisson “family” function used by R is not a quasi-likelihood model. All
that was done to the original Poisson model shown on page 105 was to apply a χ² scaling
factor to the variance. The χ² scale is based on the χ² dispersion, that is, the Pearson χ² statistic
divided by the residual degrees of freedom (model observations minus predictors, including
the constant). The model standard errors are multiplied by the square root of the dispersion
statistic after the model parameters have been estimated. It is not a quasi-likelihood model;
rather, it is traditionally termed a model with scaled standard errors. The authors are aware
of these distinctions, yet did not point them out in the chapter.
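
The scaling described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not R code from the book; the function names and the numbers are hypothetical, and the fitted means are assumed to come from an already estimated Poisson model.

```python
import math

def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square statistic divided by the residual degrees of freedom.

    y        : observed counts
    mu       : fitted Poisson means (assumed to come from an already fitted model)
    n_params : number of estimated parameters, including the constant
    """
    # For a Poisson model the variance function is V(mu) = mu.
    pearson_chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    residual_df = len(y) - n_params
    return pearson_chi2 / residual_df

def scale_standard_errors(std_errors, dispersion):
    """Multiply each model standard error by the square root of the dispersion."""
    return [se * math.sqrt(dispersion) for se in std_errors]

# Hypothetical values, chosen only to make the arithmetic easy to follow.
y = [0, 4, 2, 6]
mu = [2.0, 2.0, 4.0, 4.0]
dispersion = pearson_dispersion(y, mu, n_params=2)
scaled = scale_standard_errors([0.5, 0.1], dispersion)
```

The parameter estimates themselves are untouched; only the standard errors change, which is why the result is better described as a model with scaled standard errors.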
Another shortcoming of the chapter relates to logistic regression diagnostics. Most major
software packages incorporate Hosmer & Lemeshow diagnostics into their logistic regression
procedure. Usually these diagnostics follow when using ML methods of estimation – however,
this does not need to be the case. Regardless, Hosmer & Lemeshow diagnostics are now
considered something of a standard when dealing with logistic regression models. Yet nothing
at all is mentioned of these diagnostics, nor does it appear that R has this capability. If
another R function allows it, the authors should mention it.
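
For readers unfamiliar with the diagnostic, the Hosmer & Lemeshow goodness-of-fit statistic can be sketched as follows. This is an illustrative Python sketch, not R code and not taken from the book; the function name is hypothetical, and splitting the sorted observations into roughly equal-sized groups is an assumption about how ties and uneven group sizes are handled.

```python
def hosmer_lemeshow_statistic(y, p, groups=10):
    """Hosmer-Lemeshow statistic for binary outcomes y and fitted probabilities p.

    Observations are sorted by fitted probability and split into `groups`
    roughly equal-sized bins (an assumption; software differs on tie handling).
    """
    pairs = sorted(zip(p, y))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # skip empty bins when there are fewer observations than groups
        n_g = len(chunk)
        observed = sum(yi for _, yi in chunk)   # observed number of events
        expected = sum(pi for pi, _ in chunk)   # expected number of events
        p_bar = expected / n_g                  # mean fitted probability in the bin
        # Note: a bin whose mean fitted probability is exactly 0 or 1
        # would divide by zero here.
        statistic += (observed - expected) ** 2 / (n_g * p_bar * (1 - p_bar))
    return statistic

# Hypothetical toy data: two bins of two observations each.
hl = hosmer_lemeshow_statistic([0, 0, 1, 1], [0.1, 0.1, 0.9, 0.9], groups=2)
```

The statistic is conventionally compared with a χ² distribution on the number of groups minus two degrees of freedom; the p-value calculation is omitted here.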
nice graph. Diagnostics, e.g., martingale residuals, are discussed and plotted.
One of the two examples used relates to how members of the New Jersey US House of
Representatives voted on some 19 environmental bills. The other dealt with a comparison of 13
characteristics of British and continental water voles. Although both example data sets were
used to demonstrate the use of R code, primary emphasis was given to the water vole data.
In any case, theory is presented, including a good discussion of measurement distances, e.g.,
Euclidean distance.
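
As a reminder of the distance measure mentioned above, a minimal sketch is given here in Python for illustration only. It is not R code from the book, the function names are hypothetical, and the profiles are made-up numbers rather than the water vole or voting data.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of measurements")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(profiles):
    """Symmetric matrix of pairwise Euclidean distances between profiles."""
    return [[euclidean_distance(p, q) for q in profiles] for p in profiles]

# Hypothetical profiles with two measurements each.
profiles = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
d = distance_matrix(profiles)
```

A matrix of this kind, with zeros on the diagonal and symmetric off-diagonal entries, is the usual input to a scaling procedure.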
Summary
Everitt and Hothorn have written an excellent tutorial on using R to analyze data with a
wide range of standard statistical methods. They use numerous examples throughout the
text, present 100 figures, and show 54 tables to augment discussion. And this is all done in a
book of only 275 pages in length.
I highly recommend the text for anyone learning R who wants to use it for the sophisticated
analysis of data. No knowledge of R is presumed, but it is expected that the reader have a
basic well-rounded knowledge of statistics.
Reviewer:
Joseph M. Hilbe
Emeritus Professor, University of Hawaii, and
Adjunct Professor, Sociology and Statistics, Arizona State University
Tempe, Arizona, United States of America
E-mail: [email protected] or [email protected]