Unified Methods for Censored Longitudinal Data and Causality

The document discusses unified methods for analyzing censored longitudinal data and addressing causal questions in various fields such as epidemiology. It includes methodologies for estimation, robustness, and examples of applications, particularly in the context of air pollution and health outcomes. The book aims to provide a comprehensive framework for researchers dealing with complex data structures and causal inference challenges.

Unified Methods for Censored Longitudinal Data and Causality



Preface

bard, Nick Jewell, Susan Murphy, Dan Scharfstein and Aad van der Vaart.
The enthusiasm of epidemiologists John Colford, Bill Satariano, and Ira
Tager for statistical methods suitable for addressing causal questions of
interest in their studies on AIDS, air pollution, drinking water, and the
effect of exercise on survival, has been particularly stimulating.
As always, efforts such as writing a book are a product of the community
one works in. Therefore, it is difficult to mention everybody who has
contributed in one way or another to this book. Mark van der Laan would
like to give special thanks to Martine since writing this book would have
been hard without her wonderful support, and to Laura, Lars, and Robin
as well for not having lost track of the important things in life. Mark van
der Laan also thanks Nick Jewell for his continuous intellectual interest and
moral support during the writing of this book, and Bonnie Hutchings for
being an incredible asset to our Department. James Robins would like to
specially thank Andrea Rotnitzky for being both intellectual soulmate and
supportive friend, Susanna Kaysen for her extended hospitality at Casa
Kaysen, and Valerie Ventura for just about everything.
Mark van der Laan has been supported in his research efforts over the
period of work on this book (1994-2002) by a FIRST award (GM53722)
grant from the National Institute of General Medical Sciences and a NIAID
grant (1-R01-AI46182-01), National Institutes of Health. James Robins has
been supported by an NIAID grant.

Mark J. van der Laan

James M. Robins
Contents

Preface v

Notation 1

1 Introduction 8
1.1 Motivation, Bibliographic History, and an Overview of the
book. . . . . . . . . . . . . . . . . . . . 8
1.2 Tour through the General Estimation Problem. . . 16
1.2.1 Estimation in a high-dimensional full data model 17
1.2.2 The curse of dimensionality in the full data model 21
1.2.3 Coarsening at random . . 23
1.2.4 The curse of dimensionality revisited . . . . 27
1.2.5 The observed data model. . . . . . . . . . 40
1.2.6 General method for construction of locally efficient
estimators . . . 40
1.2.7 Comparison with maximum likelihood estimation 45
1.3 Example: Causal Effect of Air Pollution on Short-Term
Asthma Response . . . . . . . . . 48
1.4 Estimating Functions . . . . . . . . . . 55
1.4.1 Orthogonal complement of a nuisance tangent space 55
1.4.2 Review of efficiency theory. 61
1.4.3 Estimating functions. . . . . . . . . . . . 62
1.4.4 Orthogonal complement of a nuisance tangent space
in an observed data model . . 64

1.4.5 Basic useful results to compute projections . . .. 68


1.5 Robustness of Estimating Functions . . . . . . . . . . .. 69
1.5.1 Robustness of estimating functions against misspecification
of linear convex nuisance parameters . . . 69
1.5.2 Double robustness of observed data estimating
functions. . . . . . . . . . . . . . . 77
1.5.3 Understanding double robustness for a general
semiparametric model . . . . . . . . . . . . . . . 79
1.6 Doubly robust estimation in censored data models. . .. 81
1.7 Using Cross-Validation to Select Nuisance Parameter
Models . . . 93
1.7.1 A semiparametric model selection criterion . . . 94
1.7.2 Forward/backward selection of a nuisance parameter model
based on cross-validation with respect to
the parameter of interest . . . 97
1.7.3 Data analysis example: Estimating the causal relationship
between boiled water use and
diarrhea in HIV-positive men . . . 99

2 General Methodology 102


2.1 The General Model and Overview. . . . . . . . . . . .. 102
2.2 Full Data Estimating Functions. . . . . . . . . . . . . .. 103
2.2.1 Orthogonal complement of the nuisance tangent
space in the multivariate generalized linear
regression model (MGLM) . . . 105
2.2.2 Orthogonal complement of the nuisance tangent
space in the multiplicative intensity model . . .. 107
2.2.3 Linking the orthogonal complement of the nuisance
tangent space to estimating functions. . . . . .. 111
2.3 Mapping into Observed Data Estimating Functions . .. 114
2.3.1 Initial mappings and reparametrizing the full data
estimating functions . . . . . . . . . . . . . . . . 114
2.3.2 Initial mapping indexed by censoring and protected
nuisance parameter . . . . . . . . . . . . . . . .. 124
2.3.3 Extending a mapping for a restricted censoring
model to a complete censoring model . . . . . .. 125
2.3.4 Inverse weighting a mapping developed for a
restricted censoring model . . . . . . . . . . . .. 126
2.3.5 Beating a given RAL estimator . . . . . . . . .. 128
2.3.6 Orthogonalizing an initial mapping w.r.t. G: Double
robustness . . . . . . . . . . . . . . . . . . . . .. 131
2.3.7 Ignoring information on the censoring mechanism
improves efficiency . . . . . . . . . . . . . . . .. 135
2.4 Optimal Mapping into Observed Data Estimating Functions . . . 137

2.4.1 The corresponding estimating equation. . . . .. 139


2.4.2 Discussion of ingredients of a one-step estimator. 141
2.5 Guaranteed Improvement Relative to an Initial Estimating
Function . . . . . . . . . . . . . . . . . . 142
2.6 Construction of Confidence Intervals . . . . . . . . . . . 144
2.7 Asymptotics of the One-Step Estimator. . . . . . . . .. 145
2.7.1 Asymptotics assuming consistent estimation of the
censoring mechanism . . . . . . . . . . . . . . .. 146
2.7.2 Proof of Theorem 2.4 . . . . . . . . . . . . . . .. 150
2.7.3 Asymptotics assuming that either the censoring
mechanism or the full data distribution is estimated
consistently . . . . . . 151
2.7.4 Proof of Theorem 2.5 . . . . . . . . . . . . . . .. 152
2.8 The Optimal Index . . . . . . . . . . . . . . . . . . . .. 153
2.8.1 Finding the optimal estimating function among a
given class of estimating functions . . . . . . . .. 159
2.9 Estimation of the Optimal Index . . . . . . . . . . . . . 166
2.9.1 Reparametrizing the representations of the optimal
full data function . . . . . . . . . . . . . . . . .. 167
2.9.2 Estimation of the optimal full data structure
estimating function . . . . . . . . . . . . . . . .. 169
2.10 Locally Efficient Estimation with Score-Operator
Representation . . . 170

3 Monotone Censored Data 172


3.1 Data Structure and Model 172
3.1.1 Cause-specific censoring 175
3.2 Examples . . . 176
3.2.1 Right-censored data on a survival time 176
3.2.2 Right-censored data on quality-adjusted survival
time .. . . . . . . . . . . . . . . . . . . . . . .. 177
3.2.3 Right-censored data on a survival time with
reporting delay ... . . . . . . . . . . . . . . .. 179
3.2.4 Univariately right-censored multivariate failure
time data . . . . . . . . . . . . . . . . . . . . . . 181
3.3 Inverse Probability Censoring Weighted (IPCW)
Estimators . . . 183
3.3.1 Identifiability condition. . . . . . . . . . . . . . . 183
3.3.2 Estimation of a marginal multiplicative intensity
model . . . . . . . . . . . . . . . . . . . . . . . . 184
3.3.3 Extension to proportional rate models. . . . . . . 191
3.3.4 Projecting on the tangent space of the Cox
proportional hazards model of the censoring
mechanism . . . . . . . . . . . . . . . 192
3.4 Optimal Mapping into Estimating Functions . . . . . . . 195

3.5 Estimation of Q . . . . . . . . . . . . . . . . . . . . . .. 196


3.5.1 Regression approach: Assuming that the censoring
mechanism is correctly specified . . . . . . . . .. 197
3.5.2 Maximum likelihood estimation according to a
multiplicative intensity model: Doubly robust .. 198
3.5.3 Maximum likelihood estimation for discrete models:
Doubly robust . . . . . . . . . . . . . 200
3.5.4 Regression approach: Doubly robust . . . . . 201
3.6 Estimation of the Optimal Index . . . . . . . . . . . 204
3.6.1 The multivariate generalized regression model 205
3.6.2 The multivariate generalized regression model when
covariates are always observed. . . . . . . . . .. 206
3.7 Multivariate failure time regression model . . . . . . . . 208
3.8 Simulation and data analysis for the nonparametric full
data model. . . . . . . . . . . . . . . . . . . . . . . 211
3.9 Rigorous Analysis of a Bivariate Survival Estimate 217
3.9.1 Proof of Theorem 3.2 . 221
3.10 Prediction of Survival. . . . . . . . . . . . . . . . . 224
3.10.1 General methodology. . . . . . . . . . . . . 225
3.10.2 Prediction of survival with Regression Trees 230

4 Cross-Sectional Data and Right-Censored Data Combined 232
4.1 Model and General Data Structure 232
4.2 Cause Specific Monitoring Schemes 234
4.2.1 Overview . . . 235
4.3 The Optimal Mapping into Observed Data Estimating
Functions. . . . . . . . . . . . . . . . . . . . . . . . . .. 236
4.3.1 Identifiability condition. . . . . . . . . . . . . .. 239
4.3.2 Estimation of a parameter on which we have current
status data. . . . . . . . . . . . . . . . . . . . .. 241
4.3.3 Estimation of a parameter on which we have right-
censored data . . . . . . . . . . . . . . . . . . .. 243
4.3.4 Estimation of a joint-distribution parameter on
which we have current status data and right-
censored data . . . . . . . . . . . . . . . . . . .. 244
4.4 Estimation of the Optimal Index in the MGLM . . . .. 245
4.5 Example: Current Status Data with Time-Dependent
Covariates . . . . . . . . . . . . . . . . . . . . . . . . . 246
4.5.1 Regression with current status data. . . . . . . 248
4.5.2 Previous work and comparison with our results 250
4.5.3 An initial estimator . . . . . . . . . . . 251
4.5.4 The locally efficient one-step estimator 252
4.5.5 Implementation issues . . . . . . . . 253
4.5.6 Construction of confidence intervals. . 255

4.5.7 A doubly robust estimator . . . . . . . . . . . .. 256


4.5.8 Data-adaptive selection of the location parameter 257
4.5.9 Simulations . . . . . . . . . . . . . . 257
4.5.10 Example 1: No unmodeled covariate. . . . 258
4.5.11 Example 2: Unmodeled covariate . . . . . 258
4.5.12 Data Analysis: California Partners' Study 260
4.6 Example: Current Status Data on a Process Until Death 262

5 Multivariate Right-Censored Multivariate Data 266


5.1 General Data Structure. . . . . . . . . . . 266
5.1.1 Modeling the censoring mechanism . . . . 268
5.1.2 Overview . . . 270
5.2 Mapping into Observed Data Estimating Functions 271
5.2.1 The initial mapping into observed data estimating
functions . . . 271
5.2.2 Generalized Dabrowska estimator of the survival
function in the nonparametric full data model .. 273
5.2.3 Simulation study of the generalized Dabrowska
estimator . . . 275
5.2.4 The proposed mapping into observed data estimating
functions . . . 276
5.2.5 Choosing the full data estimating function in
MGLM. . . . . . . . . . . . . . . . . 282
5.3 Bivariate Right-Censored Failure Time Data 282
5.3.1 Introduction . . . 282
5.3.2 Locally efficient estimation with bivariate right-
censored data . . . . . . . . . . . . . . . . . . .. 286
5.3.3 Implementation of the locally efficient estimator. 290
5.3.4 Inversion of the information operator . . . . . .. 292
5.3.5 Asymptotic performance and confidence intervals 293
5.3.6 Asymptotics . . . 294
5.3.7 Simulation methods and results for the nonparametric
full data model . . . 299
5.3.8 Data analysis: Twin age at appendectomy . . .. 302

6 Unified Approach for Causal Inference and Censored Data 311
6.1 General Model and Method of Estimation . . 311
6.2 Causal Inference with Marginal Structural Models. . 318
6.2.1 Closed Form Formula for the Inverse of the
Nonparametric Information Operator in Causal
Inference Models. . . . . . . . . . . . . . . 324
6.3 Double Robustness in Point Treatment MSM .. 326
6.4 Marginal Structural Model with Right-Censoring 329

6.4.1 Doubly robust estimators in marginal structural


models with right-censoring . . . . . . . . . . . . 334
6.4.2 Data Analysis: SPARCS . . . . . . . . . . . . . . 338
6.4.3 A simulation for estimators of a treatment-specific
survival function . . . . . . . . . . . . . . . . . . 343
6.5 Structural Nested Model with Right-Censoring . . . . . . 347
6.5.1 The orthogonal complement of a nuisance tangent
space in a structural nested model without censoring 353
6.5.2 A class of estimating functions for the marginal
structural nested model. . . . . . . . . . . . . . . 357
6.5.3 Analyzing dynamic treatment regimes . . . . . . 359
6.5.4 Simulation for dynamic regimes in point treatment
studies . . . . . . . . . . . . 360
6.6 Right-Censoring with Missingness. . . . . . . . . . . . . 362
6.7 Interval Censored Data . . . . . . . . . . . . . . . . . . 366
6.7.1 Interval censoring and right-censoring combined 368

References 371

Author index 388

Subject index 394

Example index 397


Notation

CAN: consistent and asymptotically normal.


RAL: regular and asymptotically linear.
SRA: sequential randomization assumption.
CAR: coarsening at random.
i.i.d.: identically and independently distributed.
MGLM: multivariate generalized linear regression model.
argmax_{x ∈ S} f(x): this denotes the argument at which the function f : S → ℝ
is maximal.
argmin_{x ∈ S} f(x): this denotes the argument at which the function f : S → ℝ
is minimal.
P_1 ≡ P_2 for two probability measures P_1, P_2: this means that P_1 is
absolutely continuous w.r.t. P_2 (which we denote with P_1 ≪ P_2) and P_2 is
absolutely continuous w.r.t. P_1 (P_2 ≪ P_1). In other words, dP_1/dP_2 and
dP_2/dP_1 exist.
dGl/dG < 00: same as e l «: e, but where eI, e refer to the conditional
distributions of Y, given X.
Y = Φ(X, C): denotes the observed data on a subject (or, more generally, the
experimental unit), which is a function of the full data structure X and a
censoring variable C. It is always assumed that we observe n i.i.d. copies
Y_1, …, Y_n of Y.
𝒳: outcome space of X.
𝒴: outcome space of Y.
X̄(t) = (X(s) : s ≤ t): the full data process up to time t.
X ≡ X̄(T): the time-dependent full data process up to a possibly random
endpoint T.

Fx: the probability distribution of the full data X.


G: the conditional probability distribution of C, given X, also called the
censoring mechanism, treatment mechanism, or action mechanism, depend-
ing on what C stands for. When we define CAR, we do this in terms of the
conditional distribution of Y, given X, which is determined by G, and, by
CAR, it is the identifiable part of G. In this book we frequently denote the
conditional distribution of Y, given X, with G as well.
PFx,G: the distribution of the observed data Y, which only depends on G
through the conditional distribution of Y, given X.
𝒢: a model for the censoring mechanism G (i.e., it is known that G ∈ 𝒢).
𝒢(CAR): all conditional distributions G of C, given X, satisfying coarsening
at random (CAR).
M^F: a model for F_X (i.e., the full data model).
M = {P_{F_X,G} : F_X ∈ M^{F,w}, G ∈ 𝒢(CAR)} ∪ {P_{F_X,G} : F_X ∈ M^F, G ∈ 𝒢}:
the observed data model allowing that either the working model M^{F,w} for
F_X or the censoring model 𝒢 for G is misspecified, but not both.
M(𝒢) = {P_{F_X,G} : F_X ∈ M^F, G ∈ 𝒢}: the observed data model when assuming
a correctly specified model 𝒢 for G.
M(G) = {P_{F_X,G} : F_X ∈ M^F}: the observed data model if the censoring
mechanism G is known.
M(CAR) = {P_{F_X,G} : F_X ∈ M^F, G ∈ 𝒢(CAR)}: the observed data model
if the censoring mechanism is only known to satisfy CAR.
μ = μ(F_X) ∈ ℝ^k: the Euclidean parameter of F_X of interest.
Z = g(X* | α) + ε: a multivariate generalized regression model (a particular
choice of full data model), where Z is a p-variate outcome, X* is a
vector of covariates, g(X* | α) is a p-dimensional vector whose components
are regression curves parametrized with a regression parameter α ∈ ℝ^k,
and ε is a p-variate residual satisfying E(K(ε_j) | X*) = 0, j = 1, …, p, for a
given monotone nondecreasing function K.
K(·): monotone function specifying the location parameter (e.g., mean,
median, truncated mean, smooth median) of the conditional distribution
of Z, given X*, modeled by g(X* | α). For example, 1) K(ε) = ε, 2)
K(ε) = I(ε > 0) − (1 − p), 3) K(ε) = ε on [−τ, τ] with K(ε) = τ for
ε > τ and K(ε) = −τ for ε < −τ correspond with mean regression, pth quantile
regression (e.g., p = 0.5 gives median regression), and truncated mean
regression, respectively.
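The three K(·) choices above can be illustrated numerically. The sketch below (with illustrative values τ = 1 and p = 0.5, not taken from the book) evaluates each K on a symmetric, zero-mean residual, for which all three moment conditions E K(ε) = 0 hold at the true location parameter.

```python
import numpy as np

def K_mean(eps):
    # 1) K(eps) = eps: zero conditional mean gives mean regression
    return eps

def K_quantile(eps, p=0.5):
    # 2) K(eps) = I(eps > 0) - (1 - p): zero mean gives the pth quantile
    return (eps > 0).astype(float) - (1 - p)

def K_truncated(eps, tau=1.0):
    # 3) K(eps) = eps clipped to [-tau, tau]: truncated-mean regression
    return np.clip(eps, -tau, tau)

rng = np.random.default_rng(0)
eps = rng.normal(size=100_000)  # residuals centered at 0 and symmetric
# All three empirical means are close to zero for this residual:
for K in (K_mean, K_quantile, K_truncated):
    print(K.__name__, round(K(eps).mean(), 3))
```

In the censored-data setting these K(·) choices index which location feature of the conditional distribution of Z given X* the estimating functions target.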
N(t): a counting process that is part of the full data X.
λ(t)dt = E(dN(t) | Z(t−)) = Y(t)λ_0(t)exp(βW(t))dt: a multiplicative intensity
model (a particular full data model), where Z(t) is a given function
of X̄(t) including the past N̄(t) of the counting process N, Y(t) is an
indicator function of Z(t−) (the indicator that N(·) is at risk of jumping at
time t), and W(t) is a vector of covariates extracted from Z(t−). We also
consider the case where Z(t−) does not include the past of N. In this case, we
refer to these models as proportional rate models. We also consider discrete
multiplicative intensity models, where λ(t) = Y(t)Λ_0(dt)exp(βW(t)) is

now a conditional probability.
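As a concrete reading of the discrete multiplicative intensity model, the sketch below simulates a single counting process whose jump probability at each grid point t is λ(t) = Y(t)λ_0 exp(βW(t)). The values of λ_0, β, and the covariate path W(t) are made up for illustration; they are not the book's examples.

```python
import numpy as np

# Discrete-time multiplicative intensity sketch (illustrative values):
# baseline hazard lam0, coefficient beta, covariate process W(t).
rng = np.random.default_rng(1)
beta, lam0 = 0.5, 0.05
T = 50
W = rng.normal(size=T)          # covariate path W(t)
N = np.zeros(T)                 # counting process (at most one jump here)
Y = np.ones(T)                  # at-risk indicator Y(t)
for t in range(T):
    lam_t = Y[t] * lam0 * np.exp(beta * W[t])   # P(dN(t) = 1 | past)
    if rng.uniform() < min(lam_t, 1.0):
        N[t:] = 1               # the process jumps at time t
        Y[t + 1:] = 0           # no longer at risk after the jump
        break
```

Once Y(t) drops to zero the intensity vanishes, which is exactly the role of the at-risk indicator in the model above.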


F̄ = 1 − F.
⟨S_1, …, S_k⟩: linear span of k elements (typically scores in L²_0(F_X) or
L²_0(P_{F_X,G})) in a Hilbert space.
⟨S⟩ ≡ ⟨S_1, …, S_k⟩: linear span of the k components of S.
(f, g): inner product defined in a Hilbert space.
(f, (g_1, …, g_k)) ≡ ((f, g_1), …, (f, g_k)).
H_1 ⊕ H_2 = {h_1 + h_2 : h_j ∈ H_j, j = 1, 2}: the sum space spanned by two
orthogonal sub-Hilbert spaces H_1, H_2 of a certain Hilbert space.
H_1 + H_2 = {h_1 + h_2 : h_j ∈ H_j, j = 1, 2}: the sum space spanned by two
sub-Hilbert spaces H_1, H_2 of a certain Hilbert space.
Π(· | H): the projection operator onto a subspace H of a certain Hilbert
space.
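For intuition, Π(f | H) can be approximated empirically: when H is the linear span of finitely many basis functions evaluated on a sample, the projection is a least-squares fit, and the residual is orthogonal to H. The functions below are illustrative choices, not examples from the book.

```python
import numpy as np

# Empirical sketch of the projection operator Pi(f | H): with H the linear
# span of basis functions evaluated on a sample, Pi(f | H) is the
# least-squares fit of f on that basis.
rng = np.random.default_rng(2)
n = 10_000
x = rng.normal(size=n)
f = x + 0.3 * x**2                  # element to project
B = np.column_stack([x, x**3])      # basis whose span plays the role of H
coef, *_ = np.linalg.lstsq(B, f, rcond=None)
proj = B @ coef                     # empirical projection Pi(f | H)
resid = f - proj
# By the normal equations, the residual is orthogonal to each basis element:
print(np.abs(B.T @ resid).max() / n)
```

The same orthogonality property is what characterizes the projections onto tangent spaces used throughout the book.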
L²_0(F_X): Hilbert space of functions h(X) with E_{F_X} h(X) = 0, with inner
product (h, g)_{F_X} = E_{F_X} h(X)g(X) and corresponding norm
‖h‖_{F_X} = √(E_{F_X} h²(X)).
T^F(F_X) ⊂ L²_0(F_X): the tangent space at F_X in the full data model M^F.
This is the closure of the linear space spanned by scores of a given class of
one-dimensional submodels ε → F_ε that cross F_X at ε = 0.
T^F_nuis(F_X) ⊂ L²_0(F_X): the nuisance tangent space at F_X in the full data
model M^F. This is the closure of the linear space spanned by scores of a
given class of one-dimensional submodels ε → F_ε that cross F_X at ε = 0
and satisfy d/dε μ(F_ε)|_{ε=0} = 0.
T^{F,⊥}_nuis(F_X) ⊂ L²_0(F_X): the orthogonal complement of the nuisance
tangent space T^F_nuis(F_X) in model M^F, where μ is the parameter of
interest. The class of full data estimating functions D_h(· | μ, ρ), h ∈ ℋ^F,
is chosen so that T^{F,⊥}_nuis(F_X) ⊇ {D_h(X | μ(F_X), ρ(F_X)) : h ∈ ℋ^F},
where the right-hand side is chosen as rich as possible so that we might even
have equality.
D_h, defined in the full data model M^F: full data estimating function D_h :
𝒳 × {(μ(F_X), ρ(F_X)) : F_X ∈ M^F} → ℝ for the parameter μ with nuisance
parameter ρ. Here h ∈ ℋ^F indexes different possible choices of full data
estimating functions.
ℋ^F: index set providing a rich class of full data estimating functions
satisfying:
D_h(X | μ(F_X), ρ(F_X)) ∈ T^{F,⊥}_nuis(F_X) for all h ∈ ℋ^F.
D_h, h ∈ ℋ^{F,k}: for h = (h_1, …, h_k) ∈ ℋ^{F,k}, D_h ≡ (D_{h_1}, …, D_{h_k}).

A full data structure estimating function D_h, h ∈ ℋ^{F,k}, defines an
estimating equation for μ: given an estimate ρ_n of ρ, one can estimate μ with
the solution of the k-dimensional equation 0 = Σ_{i=1}^n D_h(X_i | μ, ρ_n).
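To make the estimating-equation recipe concrete, here is a minimal one-dimensional sketch (k = 1). The choice of D, with K(ε) = I(ε > 0) − 0.5 so that the solution is the median, and the bisection solver are illustrative assumptions, not the book's estimators.

```python
import numpy as np

# Solve the estimating equation 0 = sum_i D(X_i | mu) for a scalar mu,
# with D(X | mu) = I(X - mu > 0) - 0.5, whose solution is the sample median.
rng = np.random.default_rng(3)
X = rng.exponential(size=100_001)

def ee(mu):
    # estimating equation evaluated at mu; decreasing in mu
    return np.sum((X - mu > 0) - 0.5)

lo, hi = X.min(), X.max()           # bracketing interval: ee(lo) > 0 > ee(hi)
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if ee(mid) > 0:
        lo = mid
    else:
        hi = mid
mu_hat = 0.5 * (lo + hi)            # the sign change sits at the sample median
```

For a k-dimensional μ one solves the k equations jointly, typically with a Newton-type step built from the derivative matrix c(μ) defined later in this section.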
D_h(X | μ, ρ): full data estimating function D_h evaluated at X, μ, ρ.
Sometimes, D_h(X | μ, ρ) is used to denote the actual estimating function,
just to make its arguments explicit.


𝒟(μ, ρ) = {D_h(X | μ, ρ) : h ∈ ℋ^F}: all possible full data functions
obtained by varying h, but fixing μ, ρ.
𝒟 = {D_h(X | μ(F_X), ρ(F_X)) : F_X ∈ M^F, h ∈ ℋ^F}: all possible full data
structure estimating functions obtained by varying h and F_X.
S*_eff(· | F_X): the canonical gradient (also called the efficient influence
curve) of the pathwise derivative of the parameter μ(F_X) in the full data
model M^F.
T^{F,⊥}_nuis(F_X)-gradients: the set of all gradients of the pathwise
derivative at F_X of the parameter μ(F_X) in the full data model M^F whose
components span T^{F,⊥}_nuis(F_X).
D_{h_eff}(· | μ(F_X), ρ(F_X)) = S*_eff(· | F_X): that is, h_eff indexes the
optimal estimating function in the full data structure model. Here
h_eff = h_eff(F_X) depends on F_X. Of course, one still obtains an optimal
estimating function by putting a fixed k × k matrix in front of S*_eff.
F(t): a predictable observed subject-specific history up to time t, typically
representing all observed data up to time point t on a subject.
A: a time-dependent, possibly multivariate process A(t) = (A_1(t), …, A_k(t))
whose components describe specific censoring (e.g., treatment) actions at
time t. Here A represents the censoring variable C for the observed data
structure. Typically, A_j(t), j = 1, …, k, are counting processes.
𝒜: the support of the marginal distribution of A.
α(t) = E(dA(t) | F(t)): the intensity (possibly discrete, α(t) = P(dA(t) =
1 | F(t)) at given grid points) of the counting process A(t) w.r.t. the
history F(t).
Y = (A, X_A): a particular type of observed censored data, where for the
full data we have X = (X_a : a ∈ 𝒜), and A tells us what component of
X we observe. For example, A(t) can be the indicator I(C ≤ t) of being
right-censored by a dropout time C. If the full data model is a causal
model and there is no censoring, then A(t) is the treatment that the
subject receives at time t. If the observed data structure includes both
treatment assignment and censoring, then A(t) is the multivariate process
describing the treatment actions and censoring actions assigned to the
subject at time t.
L²_0(P_{F_X,G}): Hilbert space of functions V(Y) with E_{P_{F_X,G}} V(Y) = 0,
with inner product (h, g)_{P_{F_X,G}} = E_{P_{F_X,G}} h(Y)g(Y) and
corresponding norm ‖h‖_{P_{F_X,G}} = √(E_{P_{F_X,G}} h²(Y)).
T(P_{F_X,G}) ⊂ L²_0(P_{F_X,G}), T_nuis(P_{F_X,G}) ⊂ L²_0(P_{F_X,G}), and
T^⊥_nuis(P_{F_X,G}) ⊂ L²_0(P_{F_X,G}) are the observed data tangent space,
observed data nuisance tangent space, and the orthogonal complement of the
observed data nuisance tangent space at P_{F_X,G}, respectively, in model
M(CAR) (or, if made explicit, in M(𝒢)), where μ is the parameter of interest.
T_CAR(P_{F_X,G}) = {V(Y) : E_G(V(Y) | X) = 0} ⊂ L²_0(P_{F_X,G}): the nuisance
tangent space of G in model M(CAR).
T_2(P_{F_X,G}) ⊂ T_CAR(P_{F_X,G}) or T_G(P_{F_X,G}) ⊂ T_CAR(P_{F_X,G}): the
nuisance tangent space of G in the observed data model M(𝒢).
D → IC_0(Y | Q_0, G, D), D → IC(Y | F_X, G, D), D → IC(Y | Q, G, D):
mappings from a full data function into an observed data function, indexed
by nuisance parameters Q_0(F_X, G), G; F_X, G; or Q(F_X, G), G.
IC_0(Y | Q_0, G, D) stands for an initial mapping, and IC(Y | F_X, G, D)
and IC(Y | Q(F_X, G), G, D) for the optimal mapping orthogonalized
w.r.t. T_CAR, or a mapping orthogonalized w.r.t. a subspace of T_CAR. In
many cases, it is not convenient to parametrize IC in terms of F_X, G, but
instead to parametrize it by a parameter Q = Q(F_X, G) and G. We note
that the dependence of these functions on F_X and G is only through the
F_X-part of the density of Y and the conditional distribution of Y, given
X, respectively.
The mapping IC_0 satisfies for each P_{F_X,G} ∈ M(𝒢): for a nonempty set of
full data functions 𝒟(ρ_1(F_X), G), we have

E_G(IC_0(Y | Q, G, D) | X) = D(X) F_X-a.e. for all Q ∈ Q_0.   (1)

For IC, we have the additional property at each P_{F_X,G} ∈ M(CAR):

IC(Y | Q(F_X, G), G, D) = IC_0(Y | Q_0(F_X, G), G, D)
− Π_{F_X,G}(IC_0(Y | Q_0(F_X, G), G, D) | T_CAR),

or the projection term can be a projection on a subspace of T_CAR. Here
Π(· | T_CAR) denotes the projection operator in the Hilbert space
L²_0(P_{F_X,G}) with inner product (f, g)_{P_{F_X,G}} = E_{P_{F_X,G}} f(Y)g(Y).
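Property (1) can be checked by simulation for the simplest kind of initial mapping, inverse-probability weighting. In the sketch below the data-generating choices are illustrative assumptions, not the book's example: X = (W, Z), Z is observed only when Δ = 1, π(W) = P(Δ = 1 | W) plays the role of the known censoring mechanism G, and IC_0 = Δ D(X)/π(W), so that E_G(IC_0 | X) = D(X).

```python
import numpy as np

# Simulation sketch of property (1) for an inverse-probability-weighted
# initial mapping IC_0 under CAR (all model choices are illustrative).
rng = np.random.default_rng(4)
n = 200_000
W = rng.normal(size=n)
Z = W + rng.normal(size=n)          # full data outcome with E[Z] = 0
pi = 1 / (1 + np.exp(-(0.5 + W)))   # known censoring mechanism P(Delta=1|W)
Delta = rng.uniform(size=n) < pi    # observation indicator

mu = 0.0                            # true parameter mu = E[Z]
D_full = Z - mu                     # full data estimating function D(X)
IC0 = Delta * D_full / pi           # IC_0; Delta = 0 zeroes out unseen Z

# Since E(Delta | X) = pi(W), E_G(IC_0 | X) = D(X), and both sample
# averages estimate the same quantity (zero at the true mu):
print(IC0.mean(), D_full.mean())
```

Orthogonalizing IC_0 by subtracting its projection onto T_CAR, as in the display above, keeps this unbiasedness while reducing variance.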
𝒟(ρ_1(F_X), G): the set of full data functions in 𝒟 for which (1) holds.
Thus, these are the full data structure functions that are mapped by
IC_0 into unbiased observed data estimating functions. By making the
appropriate assumption on the censoring mechanism, one will have that
𝒟(ρ_1(F_X), G) = 𝒟, but one can also decide to make this membership
requirement D_h(· | μ(F_X), ρ(F_X)) ∈ 𝒟(ρ_1(F_X), G) a nuisance parameter of
the full data structure estimating function: see the next entry.
D_h(· | μ(F_X), ρ(F_X, G)), h ∈ ℋ^F: these are full data structure estimating
functions satisfying D_h(· | μ(F_X), ρ(F_X, G)) ∈ 𝒟(ρ_1(F_X), G) for all
h ∈ ℋ^F. Formally, they are defined in terms of initially defined full data
estimating functions D_h as

D*_h(· | μ, ρ, ρ_1, G) ≡ D_{Π(h | ℋ^F(μ, ρ, ρ_1, G))}(· | μ, ρ),

where ℋ^F(μ, ρ, ρ_1, G) ⊂ ℋ^F are the indexes that guarantee that
E_G(IC_0(Y | Q_0, G, D_h(· | μ, ρ)) | X) = D_h(X | μ, ρ) F_X-a.e., and
Π(· | ℋ^F(μ, ρ, ρ_1, G)) is a mapping from ℋ^F into ℋ^F(μ, ρ, ρ_1, G) that is
the identity mapping on ℋ^F(μ, ρ, ρ_1, G). Thus, if D_h(· | μ(F_X), ρ(F_X)) ∈
𝒟(ρ_1(F_X), G) for all P_{F_X,G} ∈ M(𝒢), then D*_h = D_h. For notational
convenience, we denote D*_h(· | μ, ρ, ρ_1, G) with D_h(· | μ, ρ) again, but
where ρ now includes the old ρ, ρ_1, and G.


IC(Y | Q, G, D_h(· | μ, ρ)): an observed data estimating function for μ with
nuisance parameters Q(F_X, G), G, and ρ, which is obtained by applying
the mapping D → IC(Y | Q, G, D) to the particular full data estimating
function D_h. If h = (h_1, …, h_k) ∈ ℋ^{F,k}, then IC(Y | Q, G, D_h(· | μ, ρ))
denotes

(IC(Y | Q, G, D_{h_1}(· | μ, ρ)), …, IC(Y | Q, G, D_{h_k}(· | μ, ρ))).


S_eff(Y | F_X, G): the canonical gradient (also called the efficient influence
curve) of the pathwise derivative of the parameter μ in the observed data
model M.
IC(Y | F_X, G, D_{h_opt}(· | μ(F_X), ρ(F_X, G))) = S_eff(Y | F_X, G): that is,
h_opt indexes the choice of full data estimating function that results in the
optimal observed data estimating function for μ in the observed data model
M, M(𝒢), and M(CAR). Here h_opt = h_opt(F_X, G) depends on F_X and G.
h_{ind,F_X} : L²_0(F_X) → ℋ^F: we call it the index mapping since it maps a
full data function into an index h defining the projection onto
T^{F,⊥}_nuis(F_X). It is defined by

D_{h_{ind,F_X}(D)}(X | μ(F_X), ρ(F_X)) = Π(D | T^{F,⊥}_nuis(F_X)).

A_{F_X} : L²_0(F_X) → L²_0(P_{F_X,G}): A_{F_X}(h)(Y) = E_{F_X}(h(X) | Y): the
nonparametric score operator, which maps a score of a one-dimensional
fluctuation F_ε at F_X into the score of the corresponding one-dimensional
fluctuation P_{F_ε,G} at P_{F_X,G}.
A_G : L²_0(P_{F_X,G}) → L²_0(F_X): A_G(V)(X) = E_G(V(Y) | X): the adjoint of
the nonparametric score operator A_{F_X}.
I_{F_X,G} = A_G A_{F_X} : L²_0(F_X) → L²_0(F_X): I_{F_X,G}(h)(X) =
E_G(E_{F_X}(h(X) | Y) | X): the nonparametric information operator. If we
write I^{-1}_{F_X,G}(h), then it is implicitly assumed that I_{F_X,G} is 1-1
and h lies in the range of I_{F_X,G}.
IC(Y | F_X, G, D) ≡ A_{F_X} I^{-}_{F_X,G}(D) is an optimal mapping (assuming
that the generalized inverse is defined) from full data estimating functions
into observed data estimating functions. For any IC_0(Y | F_X, G, D)
satisfying E(IC_0(Y | F_X, G, D) | X) = D(X) F_X-a.e., the optimal mapping
can be more generally defined by IC(Y | F_X, G, D) = IC_0(Y |
F_X, G, D) − Π(IC_0(Y | F_X, G, D) | T_CAR(P_{F_X,G})).
I_{F_X,G} = Π(I_{F_X,G} | T^F(F_X)): the information operator. If we write
I^{-1}_{F_X,G}(h), then it is implicitly assumed that I_{F_X,G} is 1-1 and h
lies in the range of I_{F_X,G}.
The projection operator can be expressed as a sum of a projection on
a finite-dimensional space and the projection on T^F_nuis since T^F(F_X) =
⟨S*_eff(· | F_X)⟩ ⊕ T^F_nuis(F_X).
S_eff(Y | F_X, G) = A_{F_X} I^{-1}_{F_X,G}(S*_eff(· | F_X)): that is, the
efficient influence curve can be expressed in terms of the inverse of the
information operator, assuming that the inverse is defined.

R(B): the range of a previously defined linear operator B.


N(B): the null space of a previously defined linear operator B.
⟨H⟩ for a set of elements H in a Hilbert space L²_0(P_{F_X,G}) is defined as
the closure of its linear span.
Pf ≡ ∫ f(y) dP(y).
𝓛(X): all real-valued functions of X that are uniformly bounded on a set
that contains the true X with probability one.
ℋ: some index set indexing observed data estimating functions.
c(μ) = d/dμ E IC(Y | Q, G, D_h(· | μ, ρ)) (h = (h_1, …, h_k)): the derivative
matrix of the expected value of the observed data estimating function.
