
Statistical Analysis with Measurement Error or Misclassification: Strategy, Method and Application



More information about this series at http://www.springer.com/series/692
Grace Y. Yi

Statistical Analysis with Measurement Error or Misclassification
Strategy, Method and Application
Grace Y. Yi
Department of Statistics and Actuarial Science
University of Waterloo
Waterloo, Canada

ISSN 0172-7397 ISSN 2197-568X (electronic)


Springer Series in Statistics
ISBN 978-1-4939-6638-7 ISBN 978-1-4939-6640-0 (eBook)
DOI 10.1007/978-1-4939-6640-0

Library of Congress Control Number: 2016951935

© Springer Science+Business Media, LLC 2017


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, express or implied, with respect to the material contained herein or for any errors
or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims
in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by Springer Nature


The registered company is Springer Science+Business Media LLC
The registered company address is: 233 Spring Street, New York, NY 10013, U.S.A.
To my husband, Wenqing, my children, Morgan and Joy,
and my parents, Liangyu and Zhizhen
Foreword

This book is an authoritative addition to the literature on measurement error and misclassification. I like to think of the field more broadly as statistical analysis when variables are subject to uncertainty of measurement, although the context of measurement error and misclassification is different from the context of uncertainty quantification in applied mathematics and computer modeling.
This book differs considerably from previous books by Fuller (1987), Carroll et al. (1995, 2006), Gustafson (2004), and Buonaccorsi (2010) because of its comprehensive overview of topics in lifetime data analysis, often called survival analysis. If those books touch at all on this important topic, which has quite a large literature, they touch it only very lightly. Grace Yi’s book covers proportional hazards (Cox) regression, additive hazards survival models, and recurrent event data, and it is the first text to cover these important topics in detail. Of course, the fact that the author is an expert on these topics is very important, and anyone wanting to know about uncertainty of measurement in lifetime data analysis will want this text as their guide.
Three other chapters are also unique: (a) longitudinal data analysis, (b) multistate and Markov models, and (c) case–control studies. Again, these topics are touched upon only lightly by the other books, but Grace Yi has given us a terrific overview of the literature, one not available elsewhere. I happen to know quite a lot about case–control and other retrospective studies, and I am impressed by the book’s coverage of the area and by the important warnings that go with this form of sampling.
Not only are new topics covered in this book, but they are also covered extremely well. The coverage is not just authoritative: Grace Yi has also made great efforts to communicate the important ideas clearly. The book can be used in teaching courses at all levels, all the way up to advanced seminars. Personally, though, I treasure the book because it gives me a resource for understanding issues in lifetime data analysis, which is not an area I am comfortable with but one I confront on a regular basis.

Raymond J. Carroll
Department of Statistics
Texas A&M University
College Station, TX 77843-3143, USA
and
School of Mathematical and Physical Sciences
University of Technology Sydney
Broadway, NSW 2007, Australia
Preface

Measurement error and misclassification arise ubiquitously and have been a long-standing concern in statistical analysis. The effects of measurement error and misclassification have been well documented for many settings, such as linear and nonlinear regression models. The consequences of ignoring measurement error or misclassification vary from problem to problem: sometimes the effects are negligible, while at other times they can be drastic. The general consensus is that each problem should be examined case by case in order to reach a valid statistical analysis of error-contaminated data.
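As a minimal illustration of how large these effects can be, the Python sketch below (not taken from the book) simulates the textbook case of simple linear regression under a classical additive error model $X^* = X + e$, where $e$ has variance $\sigma_e^2$ and is independent of $X$ and $Y$, and $\sigma_x^2$ denotes the variance of $X$. Under these assumptions the naive least-squares slope of $Y$ on $X^*$ is attenuated toward zero by the reliability ratio $\sigma_x^2/(\sigma_x^2 + \sigma_e^2)$; dividing by an estimate of that ratio, assuming $\sigma_e^2$ is known, is one simple method-of-moments correction. The function names, parameter values, and seed are hypothetical choices for illustration.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope of y on x, fitted with an intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def corrected_slope(x_star, y, sigma_e2):
    """Divide the naive slope by the estimated reliability ratio.

    Assumes classical additive error with known error variance sigma_e2.
    """
    var_x_star = statistics.variance(x_star)
    reliability = (var_x_star - sigma_e2) / var_x_star
    return ols_slope(x_star, y) / reliability


def simulate(n=20000, beta=1.0, sigma_x=1.0, sigma_e=1.0, seed=1):
    """Generate true X, surrogate X* = X + e, and Y that depends on the true X only."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sigma_x) for _ in range(n)]
    x_star = [xi + rng.gauss(0.0, sigma_e) for xi in x]
    y = [beta * xi + rng.gauss(0.0, 0.5) for xi in x]  # response noise has sd 0.5
    return x, x_star, y


x, x_star, y = simulate()
print("slope using true X:      ", round(ols_slope(x, y), 3))
print("naive slope using X*:    ", round(ols_slope(x_star, y), 3))
print("corrected slope using X*:", round(corrected_slope(x_star, y, 1.0), 3))
```

With $\sigma_x^2 = \sigma_e^2 = 1$ the reliability ratio is 0.5, so the naive slope should come out near half of the true slope 1, while the other two estimates should be close to 1. Shrinking `sigma_e` toward zero makes the attenuation negligible, which mirrors the problem-dependence described above.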
Over the past few decades, such problems have been studied extensively in various fields. Research interest in measurement error and misclassification has grown rapidly for a wide spectrum of data, including event history data (such as survival data and recurrent event data), correlated data (such as longitudinal data and clustered data), multi-state event data, and data arising from case–control studies. The literature on this topic is enormous, with many methods scattered across diverse sources. The goal of this monograph is to bring these assorted methods together under one umbrella and to provide an update on recent developments for a variety of settings. Measurement error effects and strategies for handling mismeasurement are examined closely for different models, in combination with applications to specific problems.
A number of books concerning measurement error and misclassification have been published, each with a distinct focus. An early book by Fuller (1987) summarizes the development of linear regression models with errors-in-variables. Focusing on nonlinear measurement error models, Carroll, Ruppert and Stefanski (1995) provide analysis strategies for regression problems in which covariates are measured with error; the second edition, Carroll et al. (2006), further documents up-to-date methods with a comprehensive discussion of many topics on nonlinear measurement error models, including Bayesian analysis methods. With an emphasis on relatively simple methods, Buonaccorsi (2010) describes methods to correct for measurement error and misclassification effects in regression models. Under the Bayesian paradigm, Gustafson (2004) provides a dual treatment of mismeasurement in both continuous and categorical variables. Other relevant books on this topic include Biemer et al. (1991), Cheng and Van Ness (1999), Wansbeek and Meijer (2000), and Dunn (2004).
This monograph covers material that complements those books, although some topics overlap. While general principles and strategies share certain similarities, this book emphasizes unique features in modeling and analyzing measurement error and misclassification problems arising from medical research and epidemiological studies. The emphasis is on gaining insight into problems coming from a wide range of fields. This book aims to present both statistical theory and applications in a self-contained and coherent manner. To improve readability and accessibility, the necessary background and basic inference frameworks for error-free contexts are presented at the beginning of Chapters 3–8, in addition to the discussion in Chapter 1. Each chapter concludes with bibliographic notes and discussion, supplemented by exercise problems that may be used for graduate course teaching. Extensive references to recent developments are given for readers interested in research on various measurement error and misclassification problems. Applications and numerical illustrations are supplied.
This monograph is designed for multiple purposes. It can serve as a reference book for researchers who are interested in statistical methodology for handling data with measurement error or misclassification. It may be used as a textbook for graduate students, especially those majoring in Statistics and Biostatistics. It may also be used by applied statisticians whose work focuses on the analysis of error-contaminated data.
This monograph is intended for readers with diverse backgrounds. Familiarity with inference methods (such as likelihood and estimating function theory) or with modeling schemes in various settings (such as survival analysis and longitudinal data analysis) helps in fully appreciating the text, but it is not essential. Readers who are not familiar with those topics may still enjoy the book by first going through the relevant background material: Chapters 1–2 and the first section of each subsequent chapter provide basic inference frameworks and background material that are useful for such readers. The book does not have to be read in the sequential order of the chapters; readers may go directly to a chapter of interest, skipping prior chapters. The exercises at the end of each chapter supplement the development in the text. Some problems are organized to provide justification of the results discussed in the text; some are modified from research papers or monographs to serve as applications of the methods discussed in the text; and some are designed as potential research topics worthy of further exploration. References at the end of the problems indicate the sources from which the problems are modified.

The book is laid out in three parts: Chapters 1 and 2, Chapters 3–8, and Chapter 9. Chapter 1 provides a broad overview of general statistical theory on modeling and inference for the error-free context, followed by an introductory chapter, Chapter 2, on measurement error and misclassification. Chapter 2 introduces examples and issues of mismeasurement and outlines a number of measurement error models. This chapter also describes the scope of the book’s coverage and lays out general strategies for handling measurement error models.
The second part is the central body of the book, with six chapters, each devoted to a particular field. Chapter 3 concerns the basic ideas and methods for survival analysis with covariate measurement error, with the main emphasis on proportional hazards models and additive hazards models. Chapter 4 shares a similar theme but focuses on recurrent event data analysis with error-prone covariates. Chapter 5 discusses various strategies for handling longitudinal data with covariate measurement error. In particular, methods of dealing with covariate measurement error in combination with other features of longitudinal data, such as missing observations and joint modeling with survival data, are described in detail. Chapter 6 concerns multi-state models with error-contaminated variables, with Markov models considered in many cases. Unlike the previous chapters, which pertain to prospective studies, Chapter 7 considers issues of measurement error and misclassification that arise from retrospective studies; in particular, it presents measurement error effects and inference techniques that account for mismeasurement in case–control studies. Most of the discussion in Chapters 2–7 addresses measurement error and misclassification in covariates, although some sections in Chapter 6 touch on error-prone response variables (i.e., state misclassification). To complement those topics, Chapter 8 takes up mismeasurement in response variables; both univariate and multivariate response variables are considered for settings where measurement error or misclassification may arise. Finally, Chapter 9 outlines miscellaneous topics that are not touched on in the previous chapters.
I aim to include the main themes and typical methods that have emerged on the subject of measurement error and misclassification. However, like any other monograph, this book cannot comprehensively include all relevant research. The selection of topics, methods, and references reflects my own research interests. I apologize to those authors whose work was not cited or should have been better presented in this book. Incompleteness in citations is not a sign of under-appreciation of relevant work but simply an outcome of limited space and incomplete access to the daunting volume of literature on this subject.
I am indebted to many people who, directly or indirectly, helped with the birth of this book. I gratefully acknowledge collaboration with Wenqing He, Raymond Carroll, Yanyuan Ma, Donna Spiegelman, Jerry Lawless, Richard Cook, and Lang Wu on measurement error problems. I thank my students, Ying Yan, Zhijian Chen, Feng He, and Di Shu, for their interest in working in this direction for their Ph.D. thesis research. I am extremely thankful to Raymond Carroll, Donna Spiegelman, Nancy Reid, and Len Stefanski for their useful comments and discussion during the course of writing the book. In particular, I would like to thank Raymond Carroll for reading the manuscript and writing a foreword to this book. I am deeply grateful to Jerry Lawless, Mary Thompson, Ross Prentice, and J.N.K. Rao for reading through the manuscript; I can’t thank them enough for their detailed and constructive suggestions. This book grew out of a research topics course for graduate students in the Department of Statistics and Actuarial Science at the University of Waterloo that I have taught over the past 10 years, and the students who took this course deserve thanks as well. I would also like to acknowledge the Department of Statistics and Actuarial Science at the University of Waterloo for providing a stimulating research environment and the Natural Sciences and Engineering Research Council of Canada (NSERC) for funding my research.
Above all, I owe my family big thanks for their tremendous support. My parents have long been eager to see a hard copy of this book at the earliest possible date. I am particularly grateful to my husband, Wenqing He, my son, Morgan He, and my daughter, Joy He, for their strong and everlasting support during the long process of writing this book, as well as throughout my career. My husband, who deserves the most credit and has been my close collaborator on many research projects, is always critical; he carefully read through this book and provided numerous constructive suggestions, criticisms, and corrections. My son, who has just entered a Master’s program in Engineering, has always been supportive and has offered his best to help. He assisted me with typing and formatting the material to comply with the required template, read through the book draft as an amateur reader with little background in Statistics, and provided comments as a general reader. The development of this book also accompanied my daughter’s growth from Grade 4 to her current year in Grade 10. She constantly asked me why I was so slow in writing the book, and then became eager to learn to edit with LaTeX so that she could help me type some exercise problems. My family is my inspiration and the momentum that constantly pushes me forward to many new and exciting destinations. Without their support, criticism, encouragement, and appreciation, this book would not have been possible.

Grace Y. Yi
University of Waterloo
Waterloo, Canada
June 8, 2016
Guide to Notation and Terminology
• Parameters are represented by Greek letters. Random variables and their realizations are usually denoted by upper case letters and the corresponding lower case letters, respectively, except that $T_i$ and $t_i$ represent different quantities in Chapter 3.
• Usually we differentiate random variables and their realizations by using upper and lower case letters, respectively, but sometimes we simply use upper case letters to highlight the presence of the variables, especially when discussing the probability behavior of estimators.
• A binary random variable assumes the value 0 or 1 unless otherwise stated.
• In the context of mismeasurement in covariates alone, the response variable is often denoted by $Y$; $X$ and $Z$ are used to differentiate error-prone and error-free covariates, respectively. The surrogate measurement of $X$ is denoted by $X^*$.
• In the context of measurement error in the response alone, covariates are simply expressed as $Z$; the true response variable is denoted by $Y$ and its surrogate version is written as $Y^*$.
• In the case where both response and covariate variables are subject to mismeasurement, $Y$ and $X$ represent the true, error-prone response and covariate variables, respectively, and $Y^*$ and $X^*$ represent the corresponding surrogate measurements. Error-free covariates are denoted by $Z$.
• The subscript $i$ is often used with random variables to label measurements for individuals or units; occasionally, we dispense with the subscript for ease of exposition. For example, if $Y_i$ represents the response variable for the $i$th subject, then $Y$ would represent the same type of random variable, whose distribution is identical to that of $Y_i$.
• The dependence of a random variable on time may be indicated by an attached argument $t$ or a subscript. For example, $Y(t)$ represents the response measurement at time $t$, and $Y_{ij}$ may stand for the response measurement for subject $i$ at time point $j$.
• Vectors are written in column form; the superscript $T$ is used to denote the transpose of a vector or matrix.
• The terms “distribution”, “conditional distribution”, and “marginal distribution” are liberally used to refer to “probability density or mass function”, “conditional probability density or mass function”, and “marginal probability density or mass function”, respectively.
• When referring to “estimating function(s)”, “parameter(s)”, and “random variable (vector)”, we usually use the singular form for simplicity.
• The notation $E_U\{g(U)\}$ or $E\{g(U)\}$ represents the expectation of $g(U)$ taken with respect to the model for the distribution of $U$; $E_{U \mid V}\{g(U)\}$ or $E\{g(U) \mid V\}$ stands for the conditional expectation of $g(U)$ taken with respect to the model of the conditional distribution of $U$ given $V$. Similar notation applies to the variance or conditional variance of $g(U)$.

The following list provides quick access to the key notation used in the book. For precise definitions, refer to the text.

Key Notation Throughout the Book

Symbol    Description
$\mathbb{R}$    The set of all real numbers
$1_r$    $r \times 1$ unit vector
$I_r$    $r \times r$ unit matrix
$0_r$    $r \times 1$ zero vector
$0_{r \times q}$    $r \times q$ zero matrix
0 (or zero)    Depending on the context, it may represent the real number zero, a zero vector, or a zero matrix without confusion
$a^{\otimes 2}$    $a^{\otimes 2} = aa^T$ for column vector $a$
$h(\cdot)$ or $h(\cdot \mid \cdot)$    True (conditional) probability mechanism for the random variable(s) indicated by the argument(s)
$f(\cdot)$ or $f(\cdot \mid \cdot)$    Statistical (conditional) model that represents a (conditional) probability density or mass function for the random variable(s) indicated by the argument(s)
$I(\cdot)$    Indicator function
$M(\cdot)$    Moment generating function
$\Phi(\cdot)$    Cumulative distribution function of the distribution $N(0, 1)$
$d(\cdot)$    Lebesgue or counting measure featuring a continuous or discrete variable (vector)
$g^{-1}(\cdot)$    Inverse function of $g(\cdot)$
$J^{-1}$    Inverse matrix of nonsingular matrix $J$
$X$    Error-prone covariate (vector) of dimension $p_x$
$X^*$    Surrogate version of $X$
$Z$    Precisely measured covariate (vector) of dimension $p_z$
$\beta_x$    Effects of error-prone covariates $X$
$\beta_z$    Effects of precisely measured covariates $Z$
$\beta$    Parameter (vector) of interest which includes $\beta_x$ and $\beta_z$
$\sigma_e^2$ or $\Sigma_e$    Variance or covariance matrix for measurement error terms
$n$    Sample size
$u_i$    Random effects for $i = 1, \dots, n$
$\mathcal{M}$    Subject index set for the main study
$\mathcal{V}$    Subject index set for the validation sample
$t^-$ or $t^+$    A time that is infinitesimally smaller or larger than $t$
$\xrightarrow{p}$    Convergence in probability
$\xrightarrow{d}$    Convergence in distribution

Chapter 1

Symbol    Description
$Y$    Random variable (or vector)
$\theta$    Parameter (vector) that takes values in the parameter space $\Theta$
$\Theta$    Parameter space, which is a subset of the Euclidean space $\mathbb{R}^p$; $p$ is the dimension of $\theta$
$\theta_0$    True value of parameter $\theta$
$\theta = (\alpha^T, \beta^T)^T$    $\alpha$ is a nuisance parameter subvector; $\beta$ is a subvector of interest
$\mathcal{Y}$    Random sample $\{Y_1, \dots, Y_n\}$ with each $Y_i$ independently chosen from the same population
$y^{(n)}$    Measurements $\{y_1, \dots, y_n\}$ of $\mathcal{Y}$
$\hat{\theta}$ or $\hat{\theta}_n$    Estimator (or estimate) of $\theta$
$L(\theta)$    Likelihood function
$S(\theta)$    Likelihood score function
$U(\theta; y)$    Estimating function (or a vector of estimating functions) for parameter $\theta$

Chapter 2

Symbol    Description
$Y$    Response variable (or vector)
$e$    Measurement error variable (vector)
$\mathcal{O}$    Observed data $\{(y_i, x_i^*, z_i): i = 1, \dots, n\}$
$L_O(\theta)$    Likelihood for the observed data
$L_C(\theta)$    Likelihood for the complete data
$U(\cdot)$    Estimating function (or a vector of estimating functions) expressed in terms of $\{Y, X, Z\}$ or their realizations
$U^*(\cdot)$    Estimating function (or a vector of estimating functions) expressed in terms of $\{Y, X^*, Z\}$ or their realizations
${}_{jk}$    (Mis)classification probabilities $P(X^* = k \mid X = j, Z)$ for $j, k = 0, 1$
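To make the (mis)classification probabilities concrete, the Python sketch below (an illustration, not code from the book) arranges them as a $2 \times 2$ matrix whose $(j, k)$ entry is $P(X^* = k \mid X = j)$, assuming for simplicity that they do not depend on $Z$. It generates a binary $X$, draws the surrogate $X^*$, and uses the identity $P(X^* = 1) = P(X = 0)P(X^* = 1 \mid X = 0) + P(X = 1)P(X^* = 1 \mid X = 1)$ to recover $P(X = 1)$ from the observed proportion of $X^* = 1$. This simple inversion requires $P(X^* = 1 \mid X = 1) \ne P(X^* = 1 \mid X = 0)$. The matrix entries, prevalence, sample size, and function names are hypothetical.

```python
import random


def draw_surrogate(x, misclass, rng):
    """Draw X* given X = x, where misclass[j][k] = P(X* = k | X = j)."""
    return 1 if rng.random() < misclass[x][1] else 0


def surrogate_prevalence(p, misclass):
    """P(X* = 1) implied by P(X = 1) = p and the misclassification matrix."""
    return (1 - p) * misclass[0][1] + p * misclass[1][1]


def corrected_prevalence(p_star, misclass):
    """Invert surrogate_prevalence to recover P(X = 1) from P(X* = 1)."""
    return (p_star - misclass[0][1]) / (misclass[1][1] - misclass[0][1])


# Hypothetical misclassification matrix: rows index the true X, columns X*.
misclass = [[0.9, 0.1],   # X = 0: P(X* = 0), P(X* = 1)
            [0.2, 0.8]]   # X = 1: P(X* = 0), P(X* = 1)

rng = random.Random(2016)
p_true = 0.1
x = [1 if rng.random() < p_true else 0 for _ in range(50000)]
x_star = [draw_surrogate(xi, misclass, rng) for xi in x]

p_star_hat = sum(x_star) / len(x_star)
print("naive prevalence from X*:", round(p_star_hat, 3))
print("corrected prevalence:    ", round(corrected_prevalence(p_star_hat, misclass), 3))
```

Here the implied surrogate prevalence is $0.9 \times 0.1 + 0.1 \times 0.8 = 0.17$, so the naive estimate should land near 0.17, well above the true value 0.1, while the corrected estimate should land near 0.1.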

Chapter 3

Symbol    Description
$T_i$    Survival time for subject $i$
$C_i$    Censoring time for subject $i$
$t_i$    Observed time $\min(T_i, C_i)$
$\delta_i$    Censoring indicator for subject $i$
$\lambda(t)$ or $\lambda(t \mid X, Z)$    (Conditional) hazard function
$\lambda_0(t)$    Baseline hazard function
$S(t)$ or $S(t \mid X, Z)$    (Conditional) survivor function
$\mathcal{H}_{it}^X$    History $\{X_i(v): 0 \le v \le t\}$ for subject $i$ up to time $t$
$dN_i(t)$    Indicator variable $I\{T_i \in [t, t + \Delta t);\ \delta_i = 1\}$
$R_i(t)$    At risk indicator $I(t_i \ge t)$
${}_i$    Indicator variable $I(i \in \mathcal{V})$
$\mathcal{O}$    Observed data $\{(t_i, \delta_i, x_i^*, z_i): i = 1, \dots, n\}$
$\mathcal{T}$    Collection of $\{T_1, \dots, T_n\}$
$\mathcal{C}$    Collection of $\{C_1, \dots, C_n\}$
$\mathcal{X}$    Collection of $\{X_1, \dots, X_n\}$
$\mathcal{X}^*$    Collection of $\{X_1^*, \dots, X_n^*\}$
$\mathcal{Z}$    Collection of $\{Z_1, \dots, Z_n\}$
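The observed-data quantities for survival data can be computed mechanically from latent survival and censoring times. The Python sketch below is illustrative and not taken from the book: it uses the common convention $\delta_i = I(T_i \le C_i)$ for the censoring indicator, evaluates $dN_i(t)$ over a small interval $[t, t + \Delta t)$, and, to show how $dN_i$ and $R_i$ combine, adds the standard error-free Nelson–Aalen estimator of the cumulative hazard, which ignores measurement error entirely. The function names and data values are hypothetical.

```python
def observed_data(T, C):
    """Return observed times t_i = min(T_i, C_i) and indicators delta_i = I(T_i <= C_i)."""
    t_obs = [min(Ti, Ci) for Ti, Ci in zip(T, C)]
    delta = [1 if Ti <= Ci else 0 for Ti, Ci in zip(T, C)]
    return t_obs, delta


def at_risk(t_obs, s):
    """R_i(s) = I(t_i >= s) for every subject."""
    return [1 if ti >= s else 0 for ti in t_obs]


def event_increments(t_obs, delta, s, ds):
    """dN_i(s) = I{T_i in [s, s + ds), delta_i = 1}.

    When delta_i = 1 the observed time t_i equals T_i, so observed data suffice.
    """
    return [1 if (s <= ti < s + ds and di == 1) else 0 for ti, di in zip(t_obs, delta)]


def nelson_aalen(t_obs, delta, t):
    """Error-free Nelson-Aalen estimate of the cumulative hazard at time t."""
    total = 0.0
    failure_times = sorted({ti for ti, di in zip(t_obs, delta) if di == 1 and ti <= t})
    for s in failure_times:
        d = sum(1 for ti, di in zip(t_obs, delta) if ti == s and di == 1)  # events at s
        r = sum(at_risk(t_obs, s))  # number of subjects at risk at s
        total += d / r
    return total


# Hypothetical latent survival and censoring times for four subjects.
T = [2.0, 5.0, 3.5, 1.2]
C = [4.0, 3.0, 6.0, 1.0]
t_obs, delta = observed_data(T, C)
print(t_obs, delta)                               # [2.0, 3.0, 3.5, 1.0] [1, 0, 1, 0]
print(at_risk(t_obs, 2.5))                        # [0, 1, 1, 0]
print(event_increments(t_obs, delta, 3.0, 1.0))   # [0, 0, 1, 0]
print(round(nelson_aalen(t_obs, delta, 4.0), 4))  # 1.3333
```

In the error-contaminated setting, the covariates $X_i$ entering the hazard model are replaced by surrogates $X_i^*$, but these response-side quantities are computed exactly as above.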

Chapter 4

Symbol    Description
$T_{ij}$    Time of the $j$th event for individual $i$
$W_{ij}$    Waiting (or gap) time between events $(j - 1)$ and $j$ for individual $i$
$N_i(t)$    Number of events over $[0, t]$ experienced by subject $i$
$\mathcal{H}_{it}^N$    Event history $\{N_i(v): 0 \le v < t\}$ until (not including) time $t$ for subject $i$
$\mathcal{H}_{it}^{XZ}$    Covariate history $\{(X_i(v), Z_i(v)): 0 \le v \le t\}$ up to and including time $t$ for subject $i$
$\mu_i(t)$ or $\mu(t \mid X_i, Z_i)$    (Conditional) mean function $E\{N_i(t) \mid X_i, Z_i\}$ at time $t$
$\tau_i$    Stopping time for individual $i$
$R_i(t)$    At risk indicator $I(t \le \tau_i)$
${}_{jk}$    (Mis)classification probabilities $P(X = k \mid X^* = j, Z)$ for $j, k = 0, 1$
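For recurrent events, the gap times, the counting process, and the at-risk indicator follow directly from one subject's ordered event times. The Python sketch below is illustrative only and not from the book: it assumes the event times are sorted and positive, takes $T_{i0} = 0$ so that the first gap time equals $T_{i1}$, and treats the stopping time as the end of follow-up. The function names and numbers are hypothetical.

```python
from bisect import bisect_right


def gap_times(event_times):
    """W_ij = T_ij - T_i,j-1 for sorted event times, taking T_i0 = 0."""
    gaps, previous = [], 0.0
    for t in event_times:
        gaps.append(t - previous)
        previous = t
    return gaps


def count_events(event_times, t):
    """N_i(t): number of events in [0, t] for sorted event times."""
    return bisect_right(event_times, t)


def at_risk(t, stopping_time):
    """R_i(t) = I(t <= stopping time)."""
    return 1 if t <= stopping_time else 0


# Hypothetical subject: three events observed before follow-up stops at 4.0.
events = [0.5, 1.25, 3.0]
stopping_time = 4.0
print(gap_times(events))                                          # [0.5, 0.75, 1.75]
print([count_events(events, t) for t in (0.0, 1.25, 2.0, 4.0)])   # [0, 2, 2, 3]
print(at_risk(3.5, stopping_time), at_risk(4.5, stopping_time))   # 1 0
```

Using `bisect_right` makes $N_i(t)$ right-continuous: an event at exactly time $t$ is counted, matching the closed interval $[0, t]$ in the definition.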
