An Introduction to
Generalized Linear
Models
Fourth Edition
CHAPMAN & HALL/CRC
Texts in Statistical Science Series
Series Editors
Joseph K. Blitzstein, Harvard University, USA
Julian J. Faraway, University of Bath, UK
Martin Tanner, Northwestern University, USA
Jim Zidek, University of British Columbia, Canada
Nonlinear Time Series: Theory, Methods, and Applications with R Examples
R. Douc, E. Moulines, and D.S. Stoffer
Stochastic Modeling and Mathematical Statistics: A Text for Statisticians and
Quantitative Scientists
F.J. Samaniego
Introduction to Multivariate Analysis: Linear and Nonlinear Modeling
S. Konishi
Linear Algebra and Matrix Analysis for Statistics
S. Banerjee and A. Roy
Bayesian Networks: With Examples in R
M. Scutari and J.-B. Denis
Linear Models with R, Second Edition
J.J. Faraway
Introduction to Probability
J.K. Blitzstein and J. Hwang
Analysis of Categorical Data with R
C.R. Bilder and T.M. Loughin
Statistical Inference: An Integrated Approach, Second Edition
H.S. Migon, D. Gamerman, and F. Louzada
Modelling Survival Data in Medical Research, Third Edition
D. Collett
Design and Analysis of Experiments with R
J. Lawson
Mathematical Statistics: Basic Ideas and Selected Topics, Volume I, Second Edition
P.J. Bickel and K.A. Doksum
Statistics for Finance
E. Lindström, H. Madsen, and J.N. Nielsen
Spatio-Temporal Methods in Environmental Epidemiology
G. Shaddick and J.V. Zidek
Mathematical Statistics: Basic Ideas and Selected Topics, Volume II
P.J. Bickel and K.A. Doksum
Discrete Data Analysis with R: Visualization and Modeling Techniques for
Categorical and Count Data
M. Friendly and D. Meyer
Statistical Rethinking: A Bayesian Course with Examples in R and Stan
R. McElreath
Analysis of Variance, Design, and Regression: Linear Modeling for Unbalanced
Data, Second Edition
R. Christensen
Essentials of Probability Theory for Statisticians
M.A. Proschan and P.A. Shaw
Extending the Linear Model with R: Generalized Linear, Mixed Effects and
Nonparametric Regression Models, Second Edition
J.J. Faraway
Modeling and Analysis of Stochastic Systems, Third Edition
V.G. Kulkarni
Pragmatics of Uncertainty
J.B. Kadane
Stochastic Processes: From Applications to Theory
P. Del Moral and S. Penev
Modern Data Science with R
B.S. Baumer, D.T. Kaplan, and N.J. Horton
Logistic Regression Models
J.M. Hilbe
Generalized Additive Models: An Introduction with R, Second Edition
S. Wood
Design of Experiments: An Introduction Based on Linear Models
Max Morris
Introduction to Statistical Methods for Financial Models
T.A. Severini
Statistical Regression and Classification: From Linear Models to Machine Learning
N. Matloff
Introduction to Functional Data Analysis
P. Kokoszka and M. Reimherr
Stochastic Processes: An Introduction, Third Edition
P.W. Jones and P. Smith
An Introduction to
Generalized Linear
Models
Fourth Edition
By
Annette J. Dobson
and
Adrian G. Barnett
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2018 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Printed on acid-free paper
Version Date: 20180306
International Standard Book Number-13: 978-1-138-74168-3 (Hardback)
International Standard Book Number-13: 978-1-138-74151-5 (Paperback)
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc.
(CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization
that provides licenses and registration for a variety of users. For organizations that have been granted
a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
Names: Dobson, Annette J., 1945- author. | Barnett, Adrian G., author.
Title: An introduction to generalized linear models / by Annette J. Dobson,
Adrian G. Barnett.
Other titles: Generalized linear models
Description: Fourth edition. | Boca Raton : CRC Press, 2018. | Includes
bibliographical references and index.
Identifiers: LCCN 2018002845| ISBN 9781138741683 (hardback : alk. paper) |
ISBN 9781138741515 (pbk. : alk. paper) | ISBN 9781315182780 (e-book : alk.
paper)
Subjects: LCSH: Linear models (Statistics)
Classification: LCC QA276 .D589 2018 | DDC 519.5--dc23
LC record available at https://lccn.loc.gov/2018002845
Preface
1 Introduction
1.1 Background
1.2 Scope
1.3 Notation
1.4 Distributions related to the Normal distribution
1.4.1 Normal distributions
1.4.2 Chi-squared distribution
1.4.3 t-distribution
1.4.4 F-distribution
1.4.5 Some relationships between distributions
1.5 Quadratic forms
1.6 Estimation
1.6.1 Maximum likelihood estimation
1.6.2 Example: Poisson distribution
1.6.3 Least squares estimation
1.6.4 Comments on estimation
1.6.5 Example: Tropical cyclones
1.7 Exercises
2 Model Fitting
2.1 Introduction
2.2 Examples
2.2.1 Chronic medical conditions
2.2.2 Example: Birthweight and gestational age
2.3 Some principles of statistical modelling
2.3.1 Exploratory data analysis
2.3.2 Model formulation
2.3.3 Parameter estimation
2.3.4 Residuals and model checking
2.3.5 Inference and interpretation
2.3.6 Further reading
2.4 Notation and coding for explanatory variables
2.4.1 Example: Means for two groups
2.4.2 Example: Simple linear regression for two groups
2.4.3 Example: Alternative formulations for comparing the means of two groups
2.4.4 Example: Ordinal explanatory variables
2.5 Exercises
4 Estimation
4.1 Introduction
4.2 Example: Failure times for pressure vessels
4.3 Maximum likelihood estimation
4.4 Poisson regression example
4.5 Exercises
5 Inference
5.1 Introduction
5.2 Sampling distribution for score statistics
5.2.1 Example: Score statistic for the Normal distribution
5.2.2 Example: Score statistic for the Binomial distribution
5.3 Taylor series approximations
5.4 Sampling distribution for maximum likelihood estimators
5.4.1 Example: Maximum likelihood estimators for the Normal linear model
5.5 Log-likelihood ratio statistic
5.6 Sampling distribution for the deviance
5.6.1 Example: Deviance for a Binomial model
5.6.2 Example: Deviance for a Normal linear model
5.6.3 Example: Deviance for a Poisson model
5.7 Hypothesis testing
5.7.1 Example: Hypothesis testing for a Normal linear model
5.8 Exercises
Postface
Appendix
Software
References
Index
Preface
The original purpose of the book was to present a unified theoretical and
conceptual framework for statistical modelling in a way that was accessible
to undergraduate students and researchers in other fields.
The second edition was expanded to include nominal and ordinal logistic
regression, survival analysis and analysis of longitudinal and clustered data.
It relied more on numerical methods, visualizing numerical optimization and
graphical methods for exploratory data analysis and checking model fit.
The third edition added three chapters on Bayesian analysis for generalized
linear models. To help with the practical application of generalized linear
models, Stata, R and WinBUGS code were added.
This fourth edition includes new sections on the common problems of
model selection and non-linear associations. Non-linear associations have a
long history in statistics as the first application of the least squares method
was when Gauss correctly predicted the non-linear orbit of an asteroid in
1801.
Statistical methods are essential for many fields of research, but a
widespread lack of knowledge of their correct application is creating
inaccurate results. Untrustworthy results undermine the scientific process of
using data to make inferences and inform decisions. There are established
practices for creating reproducible results which are covered in a new
Postface to this edition.
The data sets and outline solutions of the exercises are available on
the publisher’s website: http://www.crcpress.com/9781138741515. We also
thank Thomas Haslwanter for providing a set of solutions using Python:
https://github.com/thomas-haslwanter/dobson.
We are grateful to colleagues and students at the Universities of Queensland
and Newcastle, Australia, and those taking postgraduate courses through
the Biostatistics Collaboration of Australia for their helpful suggestions and
comments about the material.
Annette J. Dobson and Adrian G. Barnett
Brisbane, Australia
Chapter 1
Introduction
1.1 Background
This book is designed to introduce the reader to generalized linear models,
which provide a unifying framework for many commonly used statistical
techniques. They also illustrate the ideas of statistical modelling.
The reader is assumed to have some familiarity with classical statistical
principles and methods. In particular, understanding the concepts of
estimation, sampling distributions and hypothesis testing is necessary.
Experience in the use of t-tests, analysis of variance, simple linear
regression and chi-squared tests of independence for two-dimensional
contingency tables is assumed. In addition, some knowledge of matrix algebra
and calculus is required.
The reader will find it necessary to have access to statistical computing
facilities. Many statistical programs, languages or packages can now perform
the analyses discussed in this book. Often, however, they do so with a
different program or procedure for each type of analysis so that the unifying
structure is not apparent.
Some programs or languages which have procedures consistent with the
approach used in this book are Stata, R, S-PLUS, SAS and Genstat. For
Chapters 13 to 14, programs to conduct Markov chain Monte Carlo methods
are needed and WinBUGS has been used here. This list is not comprehensive
as appropriate modules are continually being added to other programs.
In addition, anyone working through this book may find it helpful to be
able to use mathematical software that can perform matrix algebra,
differentiation and iterative calculations.
1.2 Scope
The statistical methods considered in this book all involve the analysis of
relationships between measurements made on groups of subjects or objects.
For example, the measurements might be the heights or weights and the ages
of boys and girls, or the yield of plants under various growing conditions.
We use the terms response, outcome or dependent variable for measurements
that are free to vary in response to other variables called explanatory
variables or predictor variables or independent variables, although this
last term can sometimes be misleading. Responses are regarded as random
variables. Explanatory variables are usually treated as though they are
non-random measurements or observations; for example, they may be fixed by
the experimental design.
Responses and explanatory variables are measured on one of the following
scales.
1. Nominal classifications: e.g., red, green, blue; yes, no, do not know, not
applicable. In particular, for binary, dichotomous or binomial variables
there are only two categories: male, female; dead, alive; smooth leaves,
serrated leaves. If there are more than two categories the variable is called
polychotomous, polytomous or multinomial.
2. Ordinal classifications in which there is some natural order or ranking
between the categories: e.g., young, middle aged, old; diastolic blood
pressures grouped as ≤ 70, 71–90, 91–110, 111–130, ≥ 131 mmHg.
3. Continuous measurements where observations may, at least in theory, fall
anywhere on a continuum: e.g., weight, length or time. This scale includes
both interval scale and ratio scale measurements; the latter have a
well-defined zero. A particular example of a continuous measurement is the time
until a specific event occurs, such as the failure of an electronic component;
the length of time from a known starting point is called the failure time.
Nominal and ordinal data are sometimes called categorical or discrete
variables and the numbers of observations, counts or frequencies in each
category are usually recorded. For continuous data the individual
measurements are recorded. The term quantitative is often used for a variable
measured on a continuous scale and the term qualitative for nominal and
sometimes for ordinal measurements. A qualitative, explanatory variable is
called a factor and its categories are called the levels for the factor. A
quantitative explanatory variable is sometimes called a covariate.
Methods of statistical analysis depend on the measurement scales of the
response and explanatory variables.
This book is mainly concerned with those statistical methods which are
relevant when there is just one response variable although there will usually
be several explanatory variables. The responses measured on different
subjects are usually assumed to be statistically independent random variables
although this requirement is dropped in Chapter 11, which is about correlated
data, and in subsequent chapters. Table 1.1 shows the main methods of
statistical analysis for various combinations of response and explanatory
variables and the chapters in which these are described. The last three
chapters are devoted to Bayesian methods which substantially extend these
analyses.
The present chapter summarizes some of the statistical theory used
throughout the book. Chapters 2 through 5 cover the theoretical framework
that is common to the subsequent chapters. Later chapters focus on methods
for analyzing particular kinds of data.
Chapter 2 develops the main ideas of classical or frequentist statistical
modelling. The modelling process involves four steps:
1. Specifying models in two parts: equations linking the response and
explanatory variables, and the probability distribution of the response
variable.
2. Estimating fixed but unknown parameters used in the models.
3. Checking how well the models fit the actual data.
4. Making inferences; for example, calculating confidence intervals and
testing hypotheses about the parameters.
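The four steps can be illustrated with a minimal sketch for a simple linear regression, using only the Python standard library. The data values are hypothetical and invented for illustration, and the interval in step 4 relies on a Normal quantile as a simplifying assumption; it is not a method taken from the book.

```python
import math
import statistics

# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
n = len(x)

# Step 1: specify the model.
# Y_i = b0 + b1 * x_i + e_i, with independent errors e_i ~ N(0, sigma^2).

# Step 2: estimate the parameters by least squares.
x_bar = statistics.fmean(x)
y_bar = statistics.fmean(y)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Step 3: check the fit through the residuals.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
rss = sum(r ** 2 for r in residuals)
sigma2_hat = rss / (n - 2)  # estimate of sigma^2 on n - 2 degrees of freedom

# Step 4: inference, here an approximate 95% interval for the slope.
# Assumption: a Normal quantile is used for brevity; the t-distribution
# with n - 2 degrees of freedom would give a somewhat wider interval.
se_b1 = math.sqrt(sigma2_hat / s_xx)
z = statistics.NormalDist().inv_cdf(0.975)
ci_b1 = (b1 - z * se_b1, b1 + z * se_b1)

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"residual sum of squares = {rss:.4f}")
print(f"approximate 95% CI for b1: ({ci_b1[0]:.3f}, {ci_b1[1]:.3f})")
```

In practice a statistical package would handle steps 2 to 4; writing them out by hand only makes the separation between the steps visible.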
The next three chapters provide the theoretical background. Chapter 3 is
about the exponential family of distributions, which includes the Normal,
Poisson and Binomial distributions. It also covers generalized linear models
(as defined by Nelder and Wedderburn (1972)). Linear regression and many
other models are special cases of generalized linear models. In Chapter 4
methods of classical estimation and model fitting are described.
Chapter 5 outlines frequentist methods of statistical inference for
generalized linear models. Most of these methods are based on how well a model
describes the set of data. For example, hypothesis testing is carried out by
first specifying alternative models (one corresponding to the null hypothesis
and the other to a more general hypothesis). Then test statistics are calculated
which measure the “goodness of fit” of each model and these are compared.
Typically the model corresponding to the null hypothesis is simpler, so if it
fits the data about as well as a more complex model it is usually preferred on
the grounds of parsimony (i.e., we retain the null hypothesis).
Chapter 6 is about multiple linear regression and analysis of variance
(ANOVA). Regression is the standard method for relating a continuous
response variable to several continuous explanatory (or predictor) variables.
ANOVA is used for a continuous response variable and categorical or
qualitative explanatory variables (factors). Analysis of covariance (ANCOVA)
is used when at least one of the explanatory variables is continuous. Nowa-
Table 1.1 Major methods of statistical analysis for response and explanatory
variables measured on various scales and chapter references for this book.
Extensions of these methods from a Bayesian perspective are illustrated in
Chapters 12–14.

Response (chapter)       Explanatory variables     Methods
Continuous (Chapter 6)   Binary                    t-test
                         Nominal, >2 categories    Analysis of variance
1.3 Notation
Generally we follow the convention of denoting random variables by uppercase
italic letters and observed values by the corresponding lowercase letters.
For example, the observations $y_1, y_2, \ldots, y_n$ are regarded as
realizations of the random variables $Y_1, Y_2, \ldots, Y_n$. Greek letters are
used to denote parameters and the corresponding lowercase Roman letters are
used to denote estimators and estimates; occasionally the symbol $\hat{\ }$
(hat) is used for estimators or estimates. For example, the parameter $\beta$
is estimated by $\hat{\beta}$ or $b$. Sometimes these conventions are not
strictly adhered to, either to avoid excessive notation in cases where the
meaning should be apparent from the context, or when there is a strong
tradition of alternative notation (e.g., $e$ or $\varepsilon$ for random error
terms).
Vectors and matrices, whether random or not, are denoted by boldface
lower- and uppercase letters, respectively. Thus, $\mathbf{y}$ represents a
vector of observations
$$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$$
or a vector of random variables
$$\begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix},$$
$f(y; \theta)$
$$\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i = \frac{1}{N}\, y_{\cdot}.$$
$$E(W) = a_1 \mu_1 + a_2 \mu_2 + \cdots + a_n \mu_n \qquad (1.2)$$
because each of the variables $Z_i = (Y_i - \mu_i)/\sigma_i$ has the standard
Normal distribution $N(0, 1)$.
4. Let $Z_1, \ldots, Z_n$ be independent random variables each with the
distribution $N(0, 1)$ and let $Y_i = Z_i + \mu_i$, where at least one of the
$\mu_i$'s is non-zero. Then the distribution of
1.4.3 t-distribution
The t-distribution with $n$ degrees of freedom is defined as the ratio of two
independent random variables. The numerator has the standard Normal
distribution and the denominator is the square root of a central chi-squared
random variable divided by its degrees of freedom; that is,
$$T = \frac{Z}{(X^2/n)^{1/2}} \qquad (1.6)$$
where $Z \sim N(0, 1)$, $X^2 \sim \chi^2(n)$ and $Z$ and $X^2$ are independent.
This is denoted by $T \sim t(n)$.
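Definition (1.6) can be checked by simulation. The sketch below uses only the Python standard library and builds the chi-squared variable as a sum of $n$ squared independent standard Normal draws; the seed, the choice $n = 10$ and the number of draws are arbitrary assumptions made for illustration. For $n > 2$ the variance of $t(n)$ is $n/(n-2)$, which gives a value to compare against.

```python
import math
import random
import statistics

def draw_t(n, rng):
    """One draw of T = Z / sqrt(X2 / n), following Equation (1.6)."""
    z = rng.gauss(0.0, 1.0)                               # Z ~ N(0, 1)
    x2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))  # X2 ~ chi-squared(n)
    return z / math.sqrt(x2 / n)                          # Z and X2 are independent

rng = random.Random(12345)  # fixed seed, chosen arbitrarily
n = 10
draws = [draw_t(n, rng) for _ in range(50_000)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)
theory_var = n / (n - 2)    # variance of t(n), valid for n > 2

print(f"sample mean     = {sample_mean:.4f} (theory: 0)")
print(f"sample variance = {sample_var:.4f} (theory: {theory_var:.4f})")
```

The sample variance exceeds 1, reflecting the heavier tails of the t-distribution relative to the standard Normal distribution.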
1.4.4 F-distribution
1. The central F-distribution with $n$ and $m$ degrees of freedom is defined
as the ratio of two independent central chi-squared random variables, each
divided by its degrees of freedom,
$$F = \frac{X_1^2}{n} \bigg/ \frac{X_2^2}{m}, \qquad (1.7)$$
where $X_1^2 \sim \chi^2(n)$, $X_2^2 \sim \chi^2(m)$ and $X_1^2$ and $X_2^2$ are
independent. This is denoted by $F \sim F(n, m)$.
2. The relationship between the t-distribution and the F-distribution can be
derived by squaring the terms in Equation (1.6) and using definition (1.7)
to obtain
$$T^2 = \frac{Z^2}{1} \bigg/ \frac{X^2}{n} \sim F(1, n), \qquad (1.8)$$
that is, the square of a random variable with the t-distribution, $t(n)$, has
the F-distribution, $F(1, n)$.
3. The non-central F-distribution is defined as the ratio of two independent
random variables, each divided by its degrees of freedom, where the
numerator has a non-central chi-squared distribution and the denominator has
a central chi-squared distribution, that is,
$$F = \frac{X_1^2}{n} \bigg/ \frac{X_2^2}{m},$$
where $X_1^2 \sim \chi^2(n, \lambda)$ with
$\lambda = \boldsymbol{\mu}^T \mathbf{V}^{-1} \boldsymbol{\mu}$,
$X_2^2 \sim \chi^2(m)$, and $X_1^2$ and $X_2^2$ are independent. The mean of a
non-central F-distribution is larger than the mean of the central
F-distribution with the same degrees of freedom.
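The three statements in this list can be explored by simulation. The sketch below is an illustration under stated assumptions, not the book's own code: it uses only the Python standard library, builds chi-squared variables as sums of squared Normal draws, and takes $\mathbf{V}$ to be the identity matrix so that $\lambda = \boldsymbol{\mu}^T \boldsymbol{\mu}$. The helper names `chi2` and `f_ratio`, the seed, the degrees of freedom and the shift vector are all hypothetical choices.

```python
import math
import random
import statistics

rng = random.Random(2018)  # fixed seed, chosen arbitrarily
N = 20_000                 # number of simulated draws per quantity
n, m = 3, 12               # degrees of freedom, chosen for illustration

def chi2(df, rng, shifts=None):
    """Sum of df squared Normal draws; non-zero shifts make it non-central."""
    shifts = shifts or [0.0] * df
    return sum((rng.gauss(0.0, 1.0) + mu) ** 2 for mu in shifts)

def f_ratio(n, m, rng, shifts=None):
    """(X1^2 / n) / (X2^2 / m) with independent numerator and denominator."""
    return (chi2(n, rng, shifts) / n) / (chi2(m, rng) / m)

# Item 1: central F(n, m), whose mean is m / (m - 2) for m > 2.
central = [f_ratio(n, m, rng) for _ in range(N)]

# Item 2: squared t(m) draws compared with direct F(1, m) draws.
t_squared = [
    (rng.gauss(0.0, 1.0) / math.sqrt(chi2(m, rng) / m)) ** 2 for _ in range(N)
]
f_1_m = [f_ratio(1, m, rng) for _ in range(N)]

# Item 3: non-central F with mu = (1, 1, 1), so lambda = 3 when V = I.
noncentral = [f_ratio(n, m, rng, shifts=[1.0] * n) for _ in range(N)]

mean_central = statistics.fmean(central)
mean_t_squared = statistics.fmean(t_squared)
mean_f_1_m = statistics.fmean(f_1_m)
mean_noncentral = statistics.fmean(noncentral)

print(f"central F({n},{m}) mean:  {mean_central:.3f}")
print(f"T^2 mean, T ~ t({m}):    {mean_t_squared:.3f}")
print(f"F(1,{m}) mean:           {mean_f_1_m:.3f}")
print(f"non-central F mean:      {mean_noncentral:.3f}")
```

Comparing means is only a coarse check; comparing empirical quantiles of `t_squared` and `f_1_m` would be a stricter test of the relationship in Equation (1.8).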
[Figure 1.1: diagram linking the standard Normal, Normal, multivariate Normal,
t, chi-square, F and Wishart distributions; image not reproduced.]
Figure 1.1 Some relationships between common distributions related to the Normal
distribution, adapted from Leemis (1986). Dotted line indicates an asymptotic
relationship and solid lines a transformation.