Graham2009 Missing Values Analysis

ANRV364-PS60-21 ARI 27 October 2008 16:22

Missing Data Analysis: Making It Work in the Real World
Annu. Rev. Psychol. 2009.60:549-576. Downloaded from arjournals.annualreviews.org



John W. Graham
by Pennsylvania State University on 08/18/10. For personal use only.

Department of Biobehavioral Health and the Prevention Research Center, The Pennsylvania
State University, University Park, Pennsylvania 16802; email: [email protected]

Annu. Rev. Psychol. 2009. 60:549–576

First published online as a Review in Advance on July 24, 2008

The Annual Review of Psychology is online at psych.annualreviews.org

This article’s doi: 10.1146/annurev.psych.58.110405.085530

Copyright © 2009 by Annual Reviews. All rights reserved

0066-4308/09/0110-0549$20.00

Key Words

multiple imputation, maximum likelihood, attrition, nonignorable missingness, planned missingness

Abstract

This review presents a practical summary of the missing data literature, including a sketch of missing data theory and descriptions of normal-model multiple imputation (MI) and maximum likelihood methods. Practical missing data analysis issues are discussed, most notably the inclusion of auxiliary variables for improving power and reducing bias. Solutions are given for missing data challenges such as handling longitudinal, categorical, and clustered data with normal-model MI; including interactions in the missing data model; and handling large numbers of variables. The discussion of attrition and nonignorable missingness emphasizes the need for longitudinal diagnostics and for reducing the uncertainty about the missing data mechanism under attrition. Strategies suggested for reducing attrition bias include using auxiliary variables, collecting follow-up data on a sample of those initially missing, and collecting data on intent to drop out. Suggestions are given for moving forward with research on missing data and attrition.


Contents

INTRODUCTION AND OVERVIEW
    Goals of This Review
    What’s to Come
MISSING DATA THEORY
    Causes or Mechanisms of Missingness
    Old Analyses: A Brief Summary
    “Modern” Missing Data Analysis Methods
    Dispelling Myths About MAR Missing Data Methods
PRACTICAL ISSUES: MAKING MISSING DATA ANALYSIS WORK IN THE REAL WORLD
    Inclusive versus Restrictive Variable Inclusion Strategies (MI versus FIML)
    Small Sample Sizes
    Rounding
    Number of Imputations in MI
    Making EM (and MI) Perform Better (i.e., Faster)
    Including Interactions in the Missing Data Model
    Longitudinal Data and Special Longitudinal Missing Data Models
    Categorical Missing Data and Normal-Model MI
    Normal-Model MI with Clustered Data
    Large Numbers of Variables
    Practicalities of Measurement: Planned Missing Data Designs
ATTRITION AND MNAR MISSINGNESS
    Some Clarifications About Missing Data Mechanisms
    Measuring the Biasing Effects of Attrition
    Missing Data Diagnostics
    Nonignorable (MNAR) Methods
    Strategies for Reducing the Biasing Effects of Attrition
    Suggestions for Reporting About Attrition in an Empirical Study
    Suggestions for Conduct and Reporting of Simulation Studies on Attrition
    More Research is Needed
SUMMARY AND CONCLUSIONS

INTRODUCTION AND OVERVIEW

Missing data have challenged researchers since the beginnings of field research. The challenge has been particularly acute for longitudinal research, that is, research involving multiple waves of measurement on the same individuals. The main issue is that the analytic procedures researchers use, many of which were developed early in the twentieth century, were designed to have complete data. Until relatively recently, there was simply no mechanism for handling the responses that were sometimes missing within a particular survey, or whole surveys that were missing for some waves of a multiwave measurement project.

Problems brought about by missing data began to be addressed in an important way starting in 1987, although a few highly influential articles did appear before then (e.g., Dempster et al. 1977, Heckman 1979, Rubin 1976). What happened in 1987 was nothing short of a revolution in thinking about analysis of missing data. The revolution began with two major books that were published that year. Little & Rubin (1987) published their classic book, Statistical Analysis with Missing Data (the second edition was published in 2002). Also, Rubin (1987) published his book, Multiple Imputation for Nonresponse in Surveys. These two books, coupled with the advent of powerful personal computing, would lay the groundwork for missing data software to be developed over the next 20 years and beyond. Also published in 1987 were two articles describing the first truly accessible method for dealing with missing data using existing structural equation modeling (SEM) software (Allison 1987, Muthén et al. 1987). Finally, Tanner & Wong (1987) published their article on data augmentation, which would become a cornerstone of the multiple imputation (MI) software that would be developed a decade later.

Goals of This Review

A major goal of this review is to present ideas and strategies that will make missing data analyses useful to researchers. My aim here is to encourage researchers to use the missing data procedures that are already known to be good ones. Efforts toward this goal involve summarizing the major research in missing data analysis over the past several years. However, much of the reluctance to adopt these procedures is related to the myths and misconceptions that continue to abound about the impact of missing data with and without using these procedures. Thus, a goal of this review is to clear up many of the myths and misconceptions surrounding missing data and analysis with missing data.

Work is required to become a practiced user of the acceptable (i.e., MI and maximum-likelihood, or ML) procedures. But that work would be a lot less onerous if one had confidence that learning these procedures would truly make one’s work better and that criticisms surrounding missing data would be materially reduced.

Researchers should use MI and ML procedures (see Schafer & Graham 2002). They are good procedures that are based on strong statistical traditions. They can certainly be improved on, but by how much? I would argue that using MI and ML procedures gets us at least 90% of the way to the hypothetical ideal from where we were 25 years ago. Newer procedures will continually fine-tune the existing MI and ML procedures, but the main missing data solutions are already available and should be used now.

Above all, my goal is that this review will be of practical value. I hope that my words will facilitate the use of MI and ML missing data methods. This is not intended to be a thorough review of all work and methods relating to missing data. I have focused on what I believe to be most useful.

What’s to Come

In the following sections, I discuss three major missing data topics: missing data theory, analysis in practice, and attrition and missingness that is not missing at random. In the first major section, I lay out the main tenets of what I refer to as “missing data theory.” One central focus in this section is the causes or mechanisms of missingness. In this section, I discuss what I refer to as the “old” methods for dealing with missing data, but as much as possible, my discussion is limited to methods that remain useful at least in some circumstances. This section briefly presents the methods I fully endorse: MI and ML.

In the second major section, I focus on the practical side of performing missing data analyses. Over the years, I have faced all of these problems as a data analyst; these are real solutions. Sometimes the solutions are a bit ad hoc. Better solutions may become available in the future, but the solutions I present are known to have minimal harmful impact on statistical inference, and they will keep you doing analysis, which is the most important thing. In this section, I also touch on the developing area of planned missingness designs, an area that opens up new design possibilities for researchers who are already making use of the recommended MI and ML missing data procedures. Contrary to the old adage that the best solution to missing data is not to have them, there are times when building missing data into the overall measurement design is the best use of limited resources.


In the final major section, I describe the area of attrition and missingness that is not missing at random. This kind of missingness has proven to be a major obstacle, especially in longitudinal and intervention research. A good bit of the problem in this area stems from the fact that the framework for thinking about these issues was developed and solidified well before the missing data revolution. In this section, I propose a different framework for thinking about attrition and make several suggestions (pleas?) as to how researchers might proceed in this area.

MISSING DATA THEORY

Causes or Mechanisms of Missingness

Statisticians talk about missingness mechanisms. But what they mean by that term differs from what social and behavioral scientists think of as mechanisms. When I (trained as an experimental social psychologist) use that word, I think of causal mechanisms. What is the reason the data are missing? Statisticians, on the other hand, often are thinking more along the lines of a description of the missingness. For example, it is not uncommon to talk about a vector R for each variable, which takes on the value “1” if the variable has data for that case, and “0” if the value is missing for that case. This leads naturally to descriptions of the missing data, that is, patterns of missingness. For example, suppose that one has three variables (X, Y1, and Y2), and suppose that X is never missing but Y1 is missing for some individuals, and Y2 is missing for a few more. Or, thinking about it the other way, suppose one has data for all three variables for some number of cases, but partial data (X and Y1) for some number of cases and partial data (X only) for some other number of cases. The patterns of missingness for a hypothetical N = 100 cases might look like those shown in Table 1. Also, as shown in Table 1, it is not uncommon for a small number of cases to be present at one wave, missing at a later wave, and then give data at a still later wave.

Table 1 Hypothetical patterns of missingness

Variable
X     Y1    Y2    N
1     1     1     65
1     1     0     20
1     0     0     10
1     0     1     5

1 = value present; 0 = value missing.

When people talk about the mechanisms of missingness, three terms come up: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Although statisticians prefer not to use the word “cause,” they do often use the words “due to” or “depends on” in this context.

With MAR, the missingness (i.e., whether the data are missing or not) may depend on observed data, but not on unobserved data (Schafer & Graham 2002).

MCAR is a special case of MAR in which missingness does not depend on the observed data either (Schafer & Graham 2002).

With MNAR, missingness does depend on unobserved data.

More about MAR, MCAR, and MNAR. Although these three important terms do have specific statistical definitions, their practical meaning is often elusive. MCAR is perhaps the easiest to understand. If the cases for which the data are missing can be thought of as a random sample of all the cases, then the missingness is MCAR. This means that everything one might want to know about the data set as a whole can be estimated from any of the missing data patterns, including the pattern in which data exist for all variables, that is, for complete cases (e.g., see the top row in Table 1).

Another aspect of MCAR that is particularly easy to understand for psychologists is that the word “random” in MCAR means what psychologists generally think of when they use the term. The word “random” in MAR, however, means something rather different from what psychologists typically think of as random. In fact, the randomness in MAR missingness means that once one has conditioned on (e.g., controlled for) all the data one has, any remaining missingness is completely random (i.e., it does not depend on some unobserved variable). Because of this, I often say that a more precise term for missing at random would be conditionally missing at random. However, if such a term were in common use, its acronym (CMAR) would often be confused with MCAR. Thus, my feeling about this is that psychologists should continue to refer to MCAR and MAR but simply understand that the latter term refers to conditionally missing at random.

Another distinction that is often used with missingness is the distinction between ignorable and nonignorable missingness. Without going into detail here, suffice it to say that ignorable missingness applies to MCAR and MAR, whereas nonignorable missingness is often used synonymously with MNAR.

An important wrinkle with these terms, especially MAR and MNAR (or ignorable and nonignorable), is that they do not apply just to the data. Rather, they apply jointly to the data and to the analysis that is being used. For example, suppose one develops a smoking prevention intervention and has a treatment group and a control group (represented by a dummy variable called Program). Suppose that one measures smoking status at time 2, one year after implementation of the prevention intervention (Smoking2). Finally, suppose that some people have missing data for Smoking2, and that missingness on Smoking2 depends on smoking status measured at time 1, just before the program implementation (Smoking1). If one includes Smoking1 in one of the acceptable missing data procedures (MI or ML), then the missingness on Smoking2 is conditioned on Smoking1 and is thus MAR (note that even with complete case analysis, the regression analysis of Program predicting Smoking2 is MAR as long as Smoking1 is also included in the model; e.g., see Graham & Donaldson 1993). However, if the researcher tested a model in which the Program alone predicted Smoking2, then the missingness would become MNAR because the researcher failed to condition on Smoking1, the cause of missingness.

Many researchers have suggested modifying the names of the missing data mechanisms in order to have labels that are a bit closer to regular language usage. However, missing data theorists believe that these mechanism names should remain as is. I agree. I now believe we would be doing psychologists a disservice if we encouraged them to abandon these terms, which are so well entrenched in the statistics literature. Rather, we should continue to use these terms (MCAR, MAR, and MNAR), but always define them very carefully using regular language.

Consequences of MCAR, MAR, and MNAR. The main consequence of MCAR missingness is loss of statistical power. The good thing about MCAR is that analyses yield unbiased parameter estimates (i.e., estimates that are close to population values). MAR missingness (i.e., when the cause of missingness is taken into account) also yields unbiased parameter estimates. The reason MNAR missingness is considered a problem is that it yields biased parameter estimates (discussed at length below).

Old Analyses: A Brief Summary

This summary is not intended to be a thorough examination of the “old” approaches for dealing with missing data. Rather, in order to be of most practical value, the discussion below focuses on the old approaches that can still be useful, at least under some circumstances.

Yardsticks for evaluating methods. I have judged the various methods (old and new) by three means. First, the method should yield unbiased parameter estimates over a wide range of parameters. That is, the parameter estimate should be close to the population value for that parameter. Some of the methods I would judge to be unacceptable (e.g., mean substitution) may yield a mean for a particular variable that is close to the true parameter value (e.g., under MCAR), but other parameters using this method can be seriously biased. Second, there should be a method for assessing the degree of uncertainty about parameter estimates. That is, one should be able to obtain reasonable estimates of the standard error or confidence intervals. Third, once bias and standard errors have been dealt with, the method should have good statistical power.

Complete cases analysis (AKA, listwise deletion). This approach can be very useful even today. One concern about listwise deletion is that it may yield biased parameter estimates (e.g., see Wothke 2000). For example, groups with complete data, especially in a longitudinal study, often are quite different from those that have missing data. Nevertheless, the difference in those two groups often is embodied completely in the pretest variables (for which everyone has data). Thus, as long as those variables can reasonably be included in the model as covariates, the bias is often minimal, even with listwise deletion, especially for multiple regression models (e.g., see Graham & Donaldson 1993).

However, there will always be some loss of power with listwise deletion because of the unused partial data. And in some instances, this loss of power can be huge, making this method an undesirable option. Still, if the loss of cases due to missing data is small (e.g., less than about 5%), biases and loss of power are both likely to be inconsequential. I, personally, would still use one of the missing data approaches even with just 5% missing cases, and I encourage you to get used to doing the same. However, if a researcher chose to stay with listwise deletion under these special circumstances, I believe it would be unreasonable for a critic to argue that it was a bad idea to do so. It is also important that standard errors based on listwise deletion are meaningful.

Pairwise deletion. Pairwise deletion is usually used in conjunction with a correlation matrix. Each correlation is estimated based on the cases having data for both variables. The issue with pairwise deletion is that different correlations (and variance estimates) are based on different subsets of cases. Because of this, it is possible that parameter estimates based on pairwise deletion will be biased. However, in my experience, these biases tend to be small in empirical data. On the other hand, because different correlations are based on different subsets of cases, there is no guarantee that the matrix will be positive definite (see sidebar Positive Definite). Nonpositive definite matrices cannot be used for most multivariate statistical analyses. A bigger concern with pairwise deletion is that there is no basis for estimating standard errors.

Because of all these problems, I cannot recommend pairwise deletion as a general solution. However, I do still use pairwise deletion in one specific instance. When I am conducting preliminary exploratory factor analysis with a large number of variables, and publication of the factor analysis results, per se, is not my goal, I sometimes find it useful to conduct this analysis with pairwise deletion. As a preliminary analysis for conducting missing data analysis, I sometimes examine the preliminary eigenvalues from principal components analysis (see sidebar Eigenvalue). If the last eigenvalue is positive, then the matrix is positive definite. Many failures of the expectation-maximization (better known as EM) algorithm (and MI) are due to the correlation matrix not being positive definite.

POSITIVE DEFINITE

One good way to think of a matrix that is not positive definite is that the matrix contains less information than implied by the number of variables in the matrix. For example, if a matrix contained the correlations of three variables (A, B, and C) and their sum, then there would be just three variables’ worth of information, even though it contained four variables. Because the sum is perfectly predicted by the three variables, it adds no new information, and the matrix would not be positive definite.

Other “old” methods. Other old methods include mean substitution, which I do not recommend. In the “Modern” Missing Data Analysis Methods section, I describe a method based on the EM algorithm that is much preferred over mean substitution (see the sections on Good Uses of the EM Algorithm and Imputing a Single Data Set from EM Parameters). A second method that has been described in the literature involves using a missingness dummy variable in addition to the specially coded missing value. This approach has been discredited and should not be used (e.g., see Allison 2002). Finally, regression-based single imputation has been employed in the past. Although the concept is a sound one and is the basis for many of the modern procedures, this method is not recommended in general.

EIGENVALUE

Eigenvalues are part of the decomposition of a correlation matrix during factor analysis or principal components analysis. Each eigenvalue represents the variance of the linear combination of items making up that factor. In principal components, the total variance for the correlation matrix is the number of items in the matrix. If the matrix is positive definite, the last eigenvalue will be positive, that is, it will have variance. However, if the matrix is not positive definite, one or more of the eigenvalues will be 0, implying that those factors have no variance, or that they add no new information over and above the other factors.
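
The eigenvalue check described in the two sidebars can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the procedure used in any of the programs cited in this review: the function names (pairwise_corr, smallest_eigenvalue, is_positive_definite), the tolerance of 1e-8, and the simulated data are all hypothetical choices made for the example. It builds a correlation matrix under pairwise deletion, then reproduces the sidebar case of three variables plus their sum.

```python
import numpy as np


def pairwise_corr(data):
    """Correlation matrix under pairwise deletion (NaN marks a missing value)."""
    n_vars = data.shape[1]
    corr = np.eye(n_vars)
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            # Each correlation uses only the cases with data for both variables.
            both = ~np.isnan(data[:, i]) & ~np.isnan(data[:, j])
            r = np.corrcoef(data[both, i], data[both, j])[0, 1]
            corr[i, j] = corr[j, i] = r
    return corr


def smallest_eigenvalue(corr):
    """Return the last (smallest) eigenvalue of a symmetric matrix."""
    return float(np.linalg.eigvalsh(corr)[0])  # eigvalsh sorts ascending


def is_positive_definite(corr, tol=1e-8):
    """Treat eigenvalues at or below tol as zero (tol is an assumed cutoff)."""
    return smallest_eigenvalue(corr) > tol


# Simulated data for the sidebar example: variables A, B, C and their sum.
rng = np.random.default_rng(0)
abc = rng.normal(size=(200, 3))
abc_plus_sum = np.column_stack([abc, abc.sum(axis=1)])

corr_abc = pairwise_corr(abc)
corr_sum = pairwise_corr(abc_plus_sum)

print(is_positive_definite(corr_abc))      # three variables only
print(is_positive_definite(corr_sum))      # the sum adds no new information
print(np.linalg.eigvalsh(corr_sum).sum())  # total variance = number of variables
```

A tolerance is used because floating-point arithmetic rarely returns an eigenvalue of exactly 0. With real data containing missing values, a pairwise-deleted matrix can also produce a smallest eigenvalue that is slightly negative, which the same check flags as not positive definite.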

“Modern” Missing Data Analysis Methods

The “modern” missing data procedures I consider here are (a) the EM algorithm, (b) multiple imputation under the normal model, and (c) ML methods, often referred to as full-information maximum likelihood (FIML) methods.

EM algorithm. It is actually a misnomer to refer to this as “the” EM algorithm because there are different EM algorithms for different applications. Each version of the EM algorithm reads in the raw data and reads out a different product, depending on the application. I describe here the EM algorithm that reads in the raw data, with missing values, and reads out an ML variance-covariance matrix and vector of means. Definitive technical treatments of various EM algorithms are given in Little & Rubin (1987, 2002) and Schafer (1997). Graham and colleagues provide less-technical descriptions of the workings of the EM algorithm for covariance matrices (Graham & Donaldson 1993; Graham et al. 1994, 1996, 1997, 2003).

In brief, the EM algorithm is an iterative procedure that produces maximum likelihood estimates. For the E-step at one iteration, cases are read in, one by one. If a value is present, the sums, sums of squares, and sums of cross-products are incremented. If the value is missing, the current best guess for that value is used instead. The best guess is based on regression-based single imputation with all other variables in the model used as predictors. For the sums, the best guess value is used as is. For sums of squares and sums of cross-products, if just one value is missing, then the quantity is incremented directly. However, if both values are missing, then the quantity is incremented, and a correction factor is added. This correction is conceptually equivalent to adding a random residual error term in MI (described below).

In the M-step of the same iteration, the parameters (variances, covariances, and means) are estimated (calculated) based on the current values of the sums, sums of squares, and sums of cross-products. Based on the covariance matrix at this iteration, new regression equations are calculated for each variable predicted by all others. These regression equations are then used to update the best guess for missing values during the E-step of the next iteration. This two-step process continues until the elements of the covariance matrix stop changing. When the changes from iteration to iteration are so small that they are judged to be trivial, EM is said to have converged.

Being ML, the parameter estimates (means, variances, and covariances) from the EM algorithm are excellent. However, the EM algorithm does not provide standard errors as an automatic part of the process. One could obtain an estimate of these standard errors using bootstrap procedures (e.g., see Graham et al. 1997). Although bootstrap procedures (Efron 1982) are often criticized, they can be quite useful in this context as a means of dealing with nonnormal data distributions.

Good uses of the EM algorithm. Although the EM algorithm provides excellent parameter estimates, the lack of convenient standard errors means that EM is not particularly good for hypothesis testing. On the other hand, several important analyses, often preliminary analyses, don’t use standard errors anyway, so the EM estimates are very useful. First, it is often desirable to report means, standard deviations, and sometimes a correlation matrix in one’s paper. I would argue that the best estimates for these quantities are the ML estimates provided by EM. Second, data quality analyses, for example, coefficient alpha analyses, because they typically do not involve standard errors, can easily be based on the EM covariance matrix (e.g., see Enders 2003; Graham et al. 2002, 2003). The EM covariance matrix is also an excellent basis for exploratory factor analysis with missing data. This is especially easy with the SAS/STAT® software program (SAS Institute); one simply includes the relevant variables in Proc MI, asking for the EM matrix to be output. That matrix may then be used as input for Proc Factor using the “type = cov” option.

Although direct analysis of the EM covariance matrix can be useful, a more widely useful EM tool is to impute a single data set from EM parameters (with random error). This procedure has been described in detail in Graham et al. (2003). This single imputed data set is known to yield good parameter estimates, close to the population average. But more importantly, because it is a complete data set, it may be read in using virtually any software, including SPSS. Once read into the software, coefficient alpha and exploratory factor analyses may be carried out in the usual way. One caution is that this data set should not be used for hypothesis testing. Standard errors based on this data set, say from a multiple regression analysis, will be too small, sometimes to a substantial extent. Hypothesis testing should be carried out with MI or one of the FIML procedures. Note that the procedure in SPSS for writing out a single imputed data set based on the EM algorithm is not recommended unless random error residuals are added after the fact to each imputed value; the current implementation of SPSS, up to version 16 at least, writes data out without adding error (e.g., see von Hippel 2004). This is known to produce important biases in the data set (Graham et al. 1996).

Implementations of the EM algorithm. Good implementations of the EM algorithm for covariance matrices are widely available. SAS Proc MI estimates the EM covariance matrix as a by-product of its MI analysis (SAS Institute). Schafer’s (1997) NORM program, a stand-alone Microsoft Windows program, also estimates the EM covariance matrix as a step in the MI process. Graham et al. (2003) have described utilities for making use of that covariance matrix. Graham & Hofer (1992) have created a stand-alone DOS-based EM algorithm, EMCOV, which can be useful in simulations.

Multiple imputation under the normal model. I describe in this section MI under the normal model as it is implemented in Schafer’s (1997) NORM program. MI as implemented in SAS Proc MI is also based on Schafer’s (1997) algorithms, and thus is the same kind of program as NORM. Detailed, step-by-step instructions for running NORM are available in Graham et al. (2003; also see Graham & Hofer 2000, Schafer 1999, Schafer & Olsen 1998). Note that Schafer’s NORM program is also available as part of the Splus missing data library (http://www.insightful.com/).

The key to any MI program is to restore the error variance lost from regression-based single imputation. Imputed values from single imputation always lie right on the regression line. But real data always deviate from the regression line by some amount. In order to restore this lost variance, the first part of imputation is to add random error variance (random normal error in this case). The second part of restoring lost variance relates to the fact that each imputed value is based on a single regression

equation, because the regression equation, and and Proc MI to verify that the number of DA
the underlying covariance matrix, is based on a steps selected was good enough.
single draw from the population of interest.
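The first part of restoring lost variance, adding random normal error to regression-based imputed values, can be illustrated with a small simulation. This is a minimal sketch and not the NORM algorithm: the simulated data, the 50% missingness rate, and the helper names `fit_line` and `impute` are all illustrative assumptions.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; also returns the residual SD."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sd = (sum(r * r for r in resid) / (len(x) - 2)) ** 0.5
    return a, b, sd


def impute(x, y, add_error, rng):
    """Fill missing y values (None) from the regression of observed y on x."""
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    a, b, sd = fit_line([p[0] for p in obs], [p[1] for p in obs])
    out = []
    for xi, yi in zip(x, y):
        if yi is None:
            # On the regression line, optionally plus random normal error.
            yi = a + b * xi + (rng.gauss(0, sd) if add_error else 0.0)
        out.append(yi)
    return out


rng = random.Random(2009)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
y_full = [0.5 * xi + rng.gauss(0, 1) for xi in x]  # population variance of y is 1.25
y_miss = [yi if rng.random() > 0.5 else None for yi in y_full]  # about 50% missing

var_full = statistics.variance(y_full)
var_line = statistics.variance(impute(x, y_miss, add_error=False, rng=rng))
var_error = statistics.variance(impute(x, y_miss, add_error=True, rng=rng))

print(f"complete data:             {var_full:.2f}")
print(f"imputed on the line:       {var_line:.2f}")
print(f"imputed with random error: {var_error:.2f}")
```

For this setup the on-the-line variance should fall well below the complete-data variance (theory gives about 0.75 versus 1.25), whereas adding random error brings it back close to the complete-data value. The sketch covers only the first part of the correction; it still uses a single set of regression estimates, which is the second source of lost variance that data augmentation is meant to address.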
In order to adjust the lost error completely, Implementations of MI under the normal
one should obtain multiple random draws from model. Implementations of MI under the nor-
the population and impute multiple times, each mal model are also widely available. Schafer’s
with a different random draw from the popu- (1997) NORM software is a free program
lation. Of course, this is almost never possible; (see http://methodology.psu.edu/ for the free
researchers have just a single sample. One op- download). SAS Proc MI (especially version 9,
tion might be to simulate random draws from but to a large extent version 8.2; SAS Insti-
the population by using bootstrap procedures tute) provides essentially the same features as
(Efron 1982). Another approach is to simulate NORM. For analyses conducted in SAS, Proc
random draws from the population using data MI is best. Other implementations of MI are
augmentation (DA; Tanner & Wong 1987). not guaranteed to be as robust as are those based
The key to Schafer’s NORM program is DA. on DA or other Markov-Chain Monte Carlo
NORM first runs EM to obtain starting val- routines, although such programs may be use-
ues for DA. DA can be thought of as a kind ful under specific circumstances. For example,
of stochastic (probabilistic) version of EM. It, Amelia II (see Honaker et al. 2007, King et al.
too, is an iterative, two-step process. There is 2001) and IVEware (Raghunathan 2004) are
an imputation step during which DA simulates two MI programs that merit a look. See Horton
the missing data based on the current parame- & Kleinman (2007) for a recent review of MI
ter estimates, and a posterior step during which software.
DA simulates the parameters given the current
(imputed) data. DA is a member of the Markov- Special MI software for categorical, longi-
Chain Monte Carlo family of algorithms. It is tudinal/cluster, and semi-continuous data.
Markov-like in the sense that all of the infor- In this category are Schafer’s (1997) CAT pro-
mation from one step of DA is contained in the gram (for categorical data) and MIX program
previous step. Because of this, the parameter es- (for mixed continuous and categorical prob-
timates and imputed values from two adjacent lems). Both of these are available (along with
steps of DA are more similar than one would NORM) as special commands in the latest ver-
expect from two random draws from the popu- sion of Splus. Although CAT can certainly be
lation. However, after, say, 50 steps of DA, the used to handle imputation with categorical data,
parameter estimates and imputed values from it presents the user with some limitations. Most
the initial step and those 50 steps removed are importantly, the default in CAT involves what
much more like two random draws from the amounts to the main effects and all possible in-
population. The trick is to determine how many teractions. Thus, even with a few variables, the
steps are required before the two imputed data default model can involve a huge number of pa-
sets are sufficiently similar to two random draws rameters. For example, with just five input vari-
from the population. Detailed guidance for this ables, CAT estimates parameters for 31 vari-
process is given in Graham et al. (2003). In gen- ables (five main effects, ten 2-way interactions,
eral, the number of iterations it takes EM to ten 3-way interactions, five 4-way interactions,
converge is an excellent estimate of the num- and one 5-way interaction).
ber of steps there should be between imputed Also included in this category is the PAN
data sets from DA (this rule applies best to the program (for special panel and cluster-data de-
NORM program; different MI programs have signs, see Schafer 2001, Schafer & Yucel 2002).
different convergence criteria, and this rule may PAN was created for the situation in which a
be slightly different with those MI programs). variable, Posatt (beliefs about the positive social
In addition, diagnostics are available in NORM consequences of alcohol use), was measured in


grades 5, 6, 7, 9, and 10 of a longitudinal study, but was omitted for all subjects in grade 8. Other variables (e.g., alcohol use), however, were measured in all six grades. Because no subject had data for Posatt measured in eighth grade, MI under the normal model could not be used to impute that variable. However, because PAN also takes into account growth (change) over time, Posatt 8 can be imputed with PAN. PAN is also very good for imputing clustered data (e.g., students within schools) where the number of clusters is large. Although the potential for the PAN program is huge, its availability remains limited.

Olsen & Schafer (2001) have described multiple imputation for semi-continuous data, especially in the growth modeling context. Semi-continuous data come from variables that have many responses at one value (e.g., 0) and are more or less normally distributed for values greater than 0.

Imputing a single data set from EM parameters. An often useful alternative to analyzing the EM covariance matrix directly is to impute a single data set based on EM parameters (+random error, an option available in Schafer’s NORM program; see Graham et al. 2003 for details). With data sets imputed using data augmentation, parameter estimates can be anywhere in legitimate parameter space. However, when the single imputation (+error) is based on the EM covariance matrix, all parameter estimates are near the center of the parameter space. For this reason, if one analyzes just one imputed data set, it should be this one. This data set is very useful for analyses that do not require hypothesis testing, such as coefficient alpha analysis and exploratory factor analysis (see Graham et al. 2003 for additional details).

FIML methods. FIML methods deal with the missing data, do parameter estimation, and estimate standard errors all in a single step. This means that the regular, complete-cases algorithms must be completely rewritten to handle missing data. Because this task is somewhat daunting, software written with the FIML missing data feature is limited. At present, the feature is most common in SEM software (in alphabetical order, Amos: Arbuckle & Wothke 1999; LISREL: Jöreskog & Sörbom 1996, also see du Toit & du Toit 2001; Mplus: Muthén & Muthén 2007; and Mx: Neale et al. 1999). Although each of these programs was written specifically for SEM applications, they can be used for virtually any analysis that falls within the general linear model, most notably multiple regression. For a review of FIML SEM methods, see Enders (2001a).

Other FIML (or largely FIML) software for latent class analysis includes Proc LTA (e.g., Lanza et al. 2005; also see http://methodology.psu.edu) and Mplus (Muthén & Muthén 2007).

Other “older” methods. One other method deserves special mention in this context. Although SEM analysis with missing data is currently handled almost exclusively by SEM/FIML methods (see previous section), an older method involving the multiple group capabilities of SEM programs is very useful for some applications. This approach was described initially by Allison (1987) and Muthén et al. (1987). Among other things, this method has proven to be extremely useful with simulations involving missing data (e.g., see Graham et al. 2001, 2006).

This method also continues to be useful for measurement designs described as “accelerated longitudinal” or “cohort sequential” (see Duncan & Duncan 1994; Duncan et al. 1994, 1996; McArdle 1994; McArdle & Hamagami 1991, 1992). With these designs, one collects data for two or more sets of participants of different ages over, say, three consecutive years. For example, one group is 10, 11, and 12 years old over the three years of a study, and another group is 11, 12, and 13 years old over the same three study years. Because no participants have data for both ages 10 and 13, regular FIML-based SEM software and normal-model MI cannot be used in this context. However, the multiple-group SEM approach may be used to test a growth model covering growth over all four ages.


Dispelling Myths About MAR Missing model. The fear is that including the DV in the
Data Methods imputation model might lead to bias in estimat-
ing the important relationships (e.g., the regres-
Myths abound regarding missing data and anal-
sion coefficient of a program variable predicting
ysis with missing data. Many of these myths
the DV). However, the opposite actually hap-
originated with thinking that was developed
pens. When the DV is included in the model, all
well before the missing data revolution. Parts of
relevant parameter estimates are unbiased, but
that earlier thinking, of course, remain an im-
excluding the DV from the imputation model
portant element of modern psychological sci-
for the IVs and covariates can be shown to pro-
ence. But the parts relating to missing data need
duce biased estimates. The problem with leav-
to be revised. I address three of the most com-
ing the DV out of the imputation model is this:
mon myths in this section. Other myths are
When any variable is omitted from the model,
dealt with in the sections that follow.
imputation is carried out under the assumption
that the correlation is r = 0 between the omit-


Imputation is making up the data. It is true ted variable and variables included in the impu-
that imputation is the process of plugging in
tation model. Thus, when the DV is omitted,


plausible values where none exist. But the point the correlations between it and the IVs (and
of this process is not to obtain the individual covariates) included in the model are all sup-
values themselves. Rather, the point is to plug pressed (i.e., biased) toward 0.
in these values (multiple times) in order to pre-
serve important characteristics of the data set as MAR methods don’t work if the MAR as-
a whole. By “preserve,” I mean that parameter sumption does not hold (AKA, complete
estimates should be unbiased. That is, the es- cases are preferred if MAR does not hold).
timated mean, for example, should be close to With some procedures, such as multiple linear
the true population value for the mean; the es- regression, it is assumed that the data are mul-
timated variance should be close to the true tivariate normal. Violation of this assumption
population value for the variance. In this re- is known to affect the results (most notably the
view, I talk mainly about multiple imputation standard errors of the regression coefficients).
under the normal model. Normal-model MI So with multiple regression, if the normality
“preserves” means, variances, covariances, cor- assumption has been violated, one should use
relations, and linear regression coefficients. a different procedure. This logic makes good
sense with multiple regression analysis, but it
You are unfairly helping yourself by imput- does not apply to analysis with missing data be-
ing (AKA, it is okay to impute the indepen- cause with multiple regression, when the nor-
dent variable, but not the dependent vari- mality assumption is violated, other common
able). There are several versions of this myth. procedures work better. But with missing data,
In the past, some researchers were convinced when the MAR assumption has been violated,
that imputation procedures such as normal- the violation affects the old procedures (e.g.,
model MI were fine for imputing missing data listwise deletion) as well, and typically this vio-
that might occur within the set of independent lation has greater effect on the old procedures.
variables (IVs) (and covariates) of a study. How- In short, MI and ML methods are always at
ever, these researchers were very reluctant to least as good as the old procedures (e.g., list-
include the dependent variable (DV) in the MI wise deletion, except in artificial, unrealistic cir-
model when it, too, included missing values. cumstances), and MI/ML methods are typically
They felt that it was somehow unfair to impute better than old methods, and often very much
the DV. better.
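The claim that omitting the DV from the imputation model suppresses its correlations with the imputed IVs toward 0 can be checked with a small simulation. This is a minimal sketch under stated assumptions: one IV with about 50% of its values missing completely at random, a complete DV, a population correlation of 0.5, and a single stochastic imputation rather than full MI. The variable names and the helper `corr` are illustrative.

```python
import random
import statistics


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5


rng = random.Random(60)
n = 20000
dv = [rng.gauss(0, 1) for _ in range(n)]
iv = [0.5 * d + rng.gauss(0, 0.75 ** 0.5) for d in dv]  # population r = 0.5
missing = [rng.random() < 0.5 for _ in range(n)]        # IV missing for about half

iv_obs = [v for v, m in zip(iv, missing) if not m]
dv_obs = [d for d, m in zip(dv, missing) if not m]
m_iv, sd_iv = statistics.fmean(iv_obs), statistics.stdev(iv_obs)
m_dv = statistics.fmean(dv_obs)

# Imputation model WITHOUT the DV: the IV is drawn from its own distribution,
# which implicitly assumes r = 0 between the IV and the omitted DV.
iv_without = [rng.gauss(m_iv, sd_iv) if m else v for v, m in zip(iv, missing)]

# Imputation model WITH the DV: regression of the IV on the DV plus random error.
slope = (sum((d - m_dv) * (v - m_iv) for d, v in zip(dv_obs, iv_obs))
         / sum((d - m_dv) ** 2 for d in dv_obs))
intercept = m_iv - slope * m_dv
resid_sd = (sum((v - intercept - slope * d) ** 2 for d, v in zip(dv_obs, iv_obs))
            / (len(iv_obs) - 2)) ** 0.5
iv_with = [intercept + slope * d + rng.gauss(0, resid_sd) if m else v
           for v, d, m in zip(iv, dv, missing)]

r_without = corr(iv_without, dv)
r_with = corr(iv_with, dv)
print(f"r with DV omitted from imputation:  {r_without:.2f}")
print(f"r with DV included in imputation:   {r_with:.2f}")
```

With half of the IV values imputed under an implied r = 0, the estimated correlation should be pulled roughly halfway toward 0, whereas including the DV should recover a value close to the population correlation.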
The truth is that all variables in the analy- An important difference between MI/ML
sis model must be included in the imputation methods and complete cases analysis is that


auxiliary variables (see next section) may be used with MI/ML in order to reduce the impact of MNAR missingness. However, there is no good way of incorporating auxiliary variables into a complete cases model unless they can reasonably be incorporated (e.g., as covariates) into the model of substantive interest.

PRACTICAL ISSUES: MAKING MISSING DATA ANALYSIS WORK IN THE REAL WORLD

The suggestions given here are designed to make missing data analyses useful in real-world data situations. Some of the suggestions given here are necessarily brief. Many other practical suggestions are given in Graham et al. (2003) and elsewhere.

Inclusive versus Restrictive Variable Inclusion Strategies (MI versus FIML)

In some ways, this is the most important lesson that can be learned when doing missing data analysis in the real world. Collins et al. (2001) discussed the differences between “inclusive” and “restrictive” variable inclusion strategies in missing data analysis. An inclusive strategy is one in which auxiliary variables are included in the model. An auxiliary variable is a variable that is not part of the model of substantive interest, but is highly correlated with the variables in the substantive model. Collins et al. (2001) showed that including auxiliary variables in the missing data model can be very helpful in two important ways. It can reduce estimation bias due to MNAR missingness, and it can partially restore lost power due to missingness.

Collins et al. (2001) note that the potential auxiliary variable benefit is the same for MI and FIML analyses but that the typical use of MI is different from typical use of FIML. For MI analyses, including auxiliary variables in the imputation model has long been practiced and is very easy to accomplish: Simply add the variables to the imputation model. Furthermore, once the auxiliary variables have been included in the imputation model, subsequent analyses involving the imputed data benefit from the auxiliary variables, whether or not those variables appear in the analysis of substantive interest (this latter benefit also applies to analysis of the EM covariance matrix). On the other hand, FIML analyses have typically included only the variables that are part of the model of substantive interest. Thus, researchers who use FIML models have found it difficult to incorporate auxiliary variables in a reasonable way. Fortunately, reasonable approaches for including auxiliary variables into SEM/FIML models now exist (e.g., see Graham 2003; also see the recently introduced feature in Mplus for easing the process of including auxiliary variables). Note that although these methods work well for SEM/FIML models, no corresponding strategies are available at present for incorporating auxiliary variables into latent class FIML models.

Small Sample Sizes

Graham & Schafer (1999) showed that MI performs very well in small samples (as low as N = 50), even with very large multiple regression models (as large as 18 predictors) and even with as much as 50% missing data in the DV. The biggest issue with such small samples is not the missingness, per se, but rather that one simply does not have much data to begin with and missingness depletes one’s data even further. MI was shown to perform very well under these circumstances; the analyses based on MI data were as good as the same analyses performed on complete data.

The simulations performed by Graham & Schafer (1999) also showed that normal-model MI with nonnormal data works well, as well as analysis with the same data with no missing values. Although analysis of imputed data works as well as analysis with complete datasets, nothing in the imputation process, per se, fixes the nonnormal data. Thus, in order to correct the problems with standard errors often found with nonnormal data, analysis procedures must be used that give correct standard errors (e.g., the correction given for SEM by Satorra & Bentler


1994). Enders (2001b) has drawn similar con- tion to guarantee less than a 1% power falloff
clusions regarding the use of the FIML missing compared to the comparable FIML analysis.
data feature for SEM programs.
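The relative efficiency argument discussed under Number of Imputations in MI can be made concrete with the usual formula from the MI literature (Rubin 1987), RE = 1 / (1 + γ/m), where γ is the fraction of missing information and m is the number of imputations. The formula is not reproduced in this section, so the sketch below is offered as an assumption about what "relative efficiency" refers to; the function name is illustrative.

```python
def relative_efficiency(gamma, m):
    """Efficiency of m imputations relative to infinitely many.

    gamma: fraction of missing information (0 to 1)
    m:     number of imputed data sets
    """
    return 1.0 / (1.0 + gamma / m)


for gamma in (0.3, 0.5):
    for m in (3, 5, 20, 40):
        re = relative_efficiency(gamma, m)
        print(f"gamma = {gamma:.1f}, m = {m:2d}: relative efficiency = {re:.3f}")
```

Even with 50% missing information, m = 5 yields a relative efficiency above 0.90, which is why small m looked sufficient on this criterion. The power falloff figures reported by Graham et al. (2007) come from their simulations and cannot be derived from this formula.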

Making EM (and MI) Perform


Rounding Better (i.e., Faster)
Rounding should be kept to a minimum. MI was Factors that affect the speed of MI are the same
designed to restore the lost variability found in as those that affect the speed of EM, so I focus
single imputation, and the MI strategy was de- here on the latter. EM involves matrix manip-
signed to yield the correct variability. Rounding ulations to a large extent, so sample size has
is tantamount to adding more variability to the relatively little effect. However, the number of
imputed values. The added variability is ran- variables (k) affects EM tremendously. Con-
dom, to be sure, but there is definitely more of sider that EM estimates [k (k+1)/2 + k] param-
it with rounding than without. This additional eters [k variances, (k(k−1)/2) covariances, and
variance is evident in coefficient alpha analyses k means]. That means that with 25, 50, 100,
with rounded and unrounded imputed values 150, and 200 variables, EM must estimate 350,
in a single data set imputed from EM parame- 1325, 5150, 11,475, and 20,300 parameters, re-
ters. Coefficient alpha is always a point or two spectively. Note that as the number of variables
lower (showing more random error variance) gets large, the number of estimated parame-
with rounding than without (also, see the Cat- ters in EM gets huge. Although there is leeway
egorical Missing Data and Normal-Model MI here, I generally try to keep the total number
section below for a discussion regarding round- of variables under 100 even with large sample
ing for categorical variables). sizes of N = 1000 or more. With smaller sam-
ple sizes, a smaller number of variables should
be used.
Number of Imputations in MI Also affecting the speed of EM and MI is
Missing data theorists have often claimed that the amount of missing information (similar to,
good inferences can be made with the num- but not the same as, the amount of missing
ber of imputed data sets (m) as few as m = data). More missing information means EM
3 to 5. They have argued that the relative ef- converges more slowly. Finally, the distribu-
ficiency of estimation is very high under these tions of the variables can affect speed of conver-
circumstances, compared to an infinite number gence. With highly skewed data, EM generally
of imputations. However, Graham et al. (2007) converges much more slowly. For this reason, it
have recently shown that the effects of m on is often a good idea to transform the data (e.g.,
statistical power for detecting a small effect size with a log transformation) prior to imputation.
(ρ = 0.10) can be strikingly different from The imputed values can be back-transformed
what is observed for relative efficiency. They (e.g., using the antilog) after imputation, if
showed that if statistical power is the main con- necessary.
sideration, the number of imputations typically If EM is very slow to converge, for exam-
must be much higher than previously thought. ple, if it takes more than about 200 iterations,
For example, with 50% missing information, the speed of convergence can generally be im-
Graham et al. (2007) showed that MI with proved. If EM converges in 200 iterations, then
m = 5 has a 13% power falloff compared to one should ask for 200 steps of data augmenta-
the equivalent FIML analysis; with 30% miss- tion between each imputed data set. With m =
ing information and m = 5, there was a 7% 40 imputed data sets, one would need to run
power falloff compared to FIML. Graham et al. 40 × 200 = 8000 steps of DA. If EM con-
(2007) recommend that at least m = 40 impu- verged in 1000 iterations, one would need to
tations are needed with 50% missing informa- run 40 × 1000 = 40,000 steps of DA. The


additional time can be substantial, especially if able with a large sample. But if the sample is
the time between iterations is large. too small for this strategy, then including a few
carefully selected product terms may be the best
option.
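The arithmetic in the section on making EM and MI perform faster, namely the number of parameters EM estimates for k variables and the total number of data augmentation steps for m imputations, can be reproduced as follows. The function names are illustrative and are not part of NORM or Proc MI.

```python
def em_parameter_count(k):
    """Parameters EM estimates for k variables: variances, covariances, and means."""
    variances = k
    covariances = k * (k - 1) // 2
    means = k
    return variances + covariances + means  # equals k(k+1)/2 + k


def total_da_steps(m, em_iterations):
    """DA steps needed when the spacing between imputed data sets
    is set equal to the number of iterations EM took to converge."""
    return m * em_iterations


for k in (25, 50, 100, 150, 200):
    print(f"k = {k:3d} variables: {em_parameter_count(k):6d} parameters")

for iterations in (200, 1000):
    print(f"m = 40, EM converged in {iterations} iterations: "
          f"{total_da_steps(40, iterations)} DA steps")
```

The counts match the figures given in the text (350, 1325, 5150, 11,475, and 20,300 parameters; 8000 and 40,000 DA steps) and show how quickly the estimation burden grows with the number of variables.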
Including Interactions in the Missing
Data Model
An issue that comes up frequently in missing Longitudinal Data and Special
data analysis has to do with omitting certain Longitudinal Missing Data Models
variables from the missing data model. This Missing data models have been created for han-
issue is sometimes referred to as being sure dling special longitudinal data sets (e.g., the
that the imputation model is at least as general PAN program; Schafer 2001). Some people be-
as the analysis model. A clear example is a test lieve that programs such as PAN must be used
of the effect of an interaction (e.g., the product) to impute longitudinal data, for example, in
of two variables on some third variable. Because connection with growth curve modeling (see
the product is a nonlinear combination of the “Modern” Missing Data Analysis Methods sec-
two variables, it is not part of the regular linear tion above). However, this is not the case. It
imputation model. The problem with exclud- is easiest to see this by examining the various
ing such variables from the imputation model ways in which growth curve analyses can be
is that all imputation is done under the assump- performed. Special hierarchical linear model-
tion that the correlation is r = 0 between the ing programs (e.g., HLM; Raudenbush & Bryk
omitted variable and all other variables in the 2002) can be used for this purpose. However,
imputation model. In this case, the correlation standard SEM programs can also be used (e.g.,
between the interaction term and the DVs of see Willett & Sayer 1994). When analyses are
interest will be suppressed toward 0. The solu- conducted with these models under identical
tion is to anticipate any interaction terms and conditions (e.g., assuming homogeneity of er-
include the relevant product terms in the im- ror variances over time), the results of these two
putation model. procedures are identical.
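One way to include an anticipated interaction in the imputation model is to compute the product term before imputation, so that it enters the model as one more variable. The sketch below shows only that preparation step. The text does not say how the product should be formed when a component is missing, so leaving it missing (to be imputed with the other variables) is an assumption here, as are the function name and the example variable names.

```python
def add_product_term(rows, a, b, name):
    """Return copies of rows with a new product variable name = a * b.

    rows is a list of dicts; None marks a missing value. The product is
    left missing whenever either component is missing.
    """
    out = []
    for row in rows:
        new_row = dict(row)
        if row[a] is None or row[b] is None:
            new_row[name] = None
        else:
            new_row[name] = row[a] * row[b]
        out.append(new_row)
    return out


data = [
    {"program": 1, "baseline": 2.0},
    {"program": 0, "baseline": None},
    {"program": 1, "baseline": 3.5},
]
with_product = add_product_term(data, "program", "baseline", "program_x_baseline")
for row in with_product:
    print(row)
```

The resulting data set, product column included, would then be passed to whatever imputation program is in use.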
Another way to conceive of an interaction The key for the present review is that
is to think of one of the variables as a group- a variance-covariance matrix and vector of
ing variable (e.g., gender). The interaction in means provide all that is needed for perform-
this case means that the correlation between ing growth modeling in SEM. Thus, any miss-
two variables is different for males and females. ing data procedure that preserves (i.e., estimates
In the typical imputation model, one imputes without bias) variances, covariances, and means
under the model that all correlations are the is acceptable. This is exactly what results from
same for females and males. A good way to im- the EM algorithm, and, asymptotically, with
pute under a model that allows these correla- normal-model MI. In summary, MI under the
tions to be different is to impute separately for normal model, or essentially equivalent SEM
males and females. The advantage of this ap- models with a FIML missing data feature, may
proach is that all interactions involving gender safely be used in conjunction with longitudinal
can be tested during analysis even if a specific in- data.
teraction was not anticipated beforehand. This
approach works very well in program effects
analyses. If the program and control groups are Categorical Missing Data
imputed separately, then it is possible to test and Normal-Model MI
any interaction involving the program dummy Although some researchers believe that missing
variable after the fact. One drawback to imput- categorical data requires special missing data
ing separately within groups is that it cuts the procedures for categorical data, this is not true
sample size at least in half. This may be accept- in general. The proportion of people giving the


“1” response for a two-level categorical variable coded “1” and “0” is the same as the mean for that variable. Thus, the important characteristics of this variable are preserved, even using normal-model MI. If the binary variable (e.g., gender) is to be used as a covariate in a regression analysis, then the imputed values should be used, as is, without rounding (see Rounding section above). If the binary variable must be used in analysis as a binary variable, then each imputed value should be rounded to the nearest observed value (0 or 1). There are variations on this rounding procedure (e.g., see Bernaards et al. 2007), but the simple rounding is known to perform very well in empirical data.

With normal-model MI (this also applies to SEM analysis with FIML), categorical variables with two levels may be used directly. However, categorical variables with more than two levels must first be dummy coded. If there are p levels in the categorical variable, then p − 1 dummy variables must be created to represent the categorical variable. For example, with a categorical variable with four levels, this dummy coding is completed as shown in Table 2.

Table 2 Example of dummy coding with four-level categorical variable

                     Dummy variable
Original category    D1    D2    D3
1                    1     0     0
2                    0     1     0
3                    0     0     1
4                    0     0     0

If the original categorical variable has no missing data, then creating these dummy variables and using them in the missing data analysis is all that must be done. However, if the original categorical variable does have missing data, then imputation under the normal model may not work perfectly, and an ad hoc fix must be used. The problem is that dummy coding has precise meaning when all dummy-code values for a particular person are 0 or if there is exactly one 1 for the person. If there is a missing value for the original categorical variable, then all of the dummy variables will also be missing and it is possible that a missing value for two (or more) of the dummy variables could be imputed as a 1 after rounding. If any people have 1 values for more than one of these dummy variables, then the meaning of the dummy variables is changed. If the number of “illegal” imputed values in this situation is small compared with the overall sample size, then one could simply leave them. However, a clever, ad hoc fix for the problem has been suggested by Paul Allison (2002). To employ Allison’s fix, it is important to impute without rounding. Whenever there is an illegal pattern of imputed values (more than a single 1 for the dummy variables), the value 1 is assigned to the dummy variable with the highest imputed value, and 0 is assigned to all others in the dummy variable set. The results of this fix will be excellent under most circumstances.

Estimating proportions and frequencies with normal-model MI. Although normal-model MI does a good job of preserving many important characteristics of the data set as a whole, it is important to note that it does not preserve proportions and frequencies, except in the special case of a variable with just two levels (e.g., yes and no coded as 1 and 0), in which case the proportion of people giving the 1 response is the same as the mean and is thus preserved. However, consider the question, “How many cigarettes did you smoke yesterday?” (0 = none, 1 = 1–5, 2 = 6 or more). Researchers may be interested in knowing the proportion of people who have smoked cigarettes. This is the same as the proportion of people who did not respond 0, or one minus the proportion who gave the 0 response. Although the mean of this three-level smoking variable will be correct with normal-model MI, the proportion of people with the 0 response is not guaranteed to be correct unless the three-level smoking variable happens to be normally distributed (which is unlikely in most populations). This problem can be corrected simply by performing a separate EM analysis with the two-level version of this smoking variable (e.g., 0 versus other). The EM mean provides the correct proportion. Correct frequencies for all


response categories, if needed, may be obtained learning new procedures. In addition, the per-
in this case by recasting the three-level categor- formance of PAN has not been adequately eval-
ical variable as two dummy variables. uated at this time. Alternatively, the number of
clusters can be reduced in a reasonable way. For
example, if it is known that certain clusters have
Normal-Model MI similar means, then these clusters could be com-
with Clustered Data bined (i.e., they would have the same dummy
The term “clustered data” refers to the situa- variable) prior to imputation (note that this kind
tion in which cases are clustered in naturally of combining of clusters must be done within
occurring groups, for example, students within experimental groups). For every combination
schools. In this situation, the members of each of this sort that can be reasonably formed, the
cluster are often more similar to one another number of dummy variables is reduced by one.
in some important ways than they are to mem- I have even seen the strategy of performing a
bers of other clusters. This type of multilevel k-means cluster analysis on the key study vari-
structure requires special methods of analysis ables using school averages as input. This type
(e.g., Raudenbush & Bryk 2002; also see Murray of analysis would help identify the clusters of
1998). These multilevel analysis models allow clusters for which means on key variables are
variable means to be different across the dif- similar.
ferent clusters (random intercepts model), and One factor to take into consideration when
sometimes they allow the covariances to be dif- employing the dummy variable approach to
ferent in the different clusters (random slopes handling cluster data is that sometimes vari-
and intercepts model). If the analysis of choice ables, especially binary variables, that have very
is a random slopes and intercepts model, then, low counts (e.g., marijuana use among fifth
just as described above, the imputation model graders) will be constants within one or more
should involve imputing separately within each of the clusters. The data should be examined
cluster. However, if the analysis of choice in- for this kind of problem prior to attempting the
volves only random intercepts, then a some- dummy variable approach. A good way to start is
what easier option is available. In this situation, to perform a principal components analysis on
the cluster membership variable can be dummy the variables to be included in the imputation
coded. That is, for p clusters, p − 1 dummy model, along with the dummy variables. If the
variables would be specified (see Table 2 for a last eigenvalue from this analysis is positive, the
simple dummy variable example). dummy coding strategy will most likely work.
With the dummy coding strategy, the p −
1 dummy variables are included in the imputa-
tion model. As long as the number of clusters is Large Numbers of Variables
relatively small compared with the sample size, This problem represents perhaps the biggest
this dummy coding strategy works well. I have challenge for missing data analyses in large
seen this strategy work well with as many as field studies, especially longitudinal field stud-
35 dummy variables, although a smaller num- ies. Consider that most constructs are measured
ber is desirable. Remember that the number of with multiple items. Five constructs with four
dummy variables takes away from the number items per construct translates into 20 individual
of substantive variables that reasonably can be variables. Five waves of measurement produce
used in the imputation model. 100 variables. If more constructs are included in
When the number of clusters is too high to the analysis, it is difficult to keep the total down
work with normal-model MI, several options to the k = 100 that I have recommended. It is
are available. A specialty MI model (such as even more difficult to keep the variables in the
PAN; Schafer 2001) can be employed, but that imputation model to a reasonable number if one
strategy can sometimes be costly in terms of has cluster data and is employing the dummy

variable strategy (see previous section). Below, I describe briefly two strategies that have worked well in practice.

Imputing whole scales. If the analysis involves latent variable analysis (e.g., SEM) such that the individual variables must be part of the analysis, then imputing whole scales is not possible. However, analyses often involve the whole scales anyway, so imputing at that level is an excellent compromise. As long as study participants have data either for all or for none of the scale items, then this strategy is easy. The problem with using this strategy is how to deal with the partial data on the scale. Schafer & Graham (2002) suggested that forming a scale score based on partial data can cause problems in some situations, but may be fine in others. In my experience, forming a scale score based on partial data will be acceptable (a) if a relatively high proportion of variables are used to form the scale score (and never fewer than half of the variables), (b) when the variables are consistent with the domain sampling model (Nunnally 1967), and (c) when the variables have relatively high coefficient alpha. Variables can be considered consistent with the domain sampling model when the item-total correlations (or factor loadings) for the variables within the scale are all similar. Conceptually, it must be reasonable that dropping one variable from the scale has essentially the same meaning as dropping any other variable from the scale. Otherwise, one must either discard any partial data or impute at the individual item level for that scale. Imputing at the scale level for even some of the study scales will often help with the problem of having too many variables.

Think FIML. Another strategy that works well in practice is to start with the variables that would be included in the analysis model of substantive interest. If the FIML approaches are used, then these are the only variables that would be included using the typical practice. In addition, one should carefully select the few auxiliary variables that will have the most beneficial impact on the model. Adding auxiliary variables that are correlated r = 0.50 or better with the variables of interest will generally help the analysis to have less bias and more power. However, adding auxiliary variables with lower correlations will typically have little incremental benefit, especially when these additional variables are correlated with the auxiliary variables already being used.

Good candidates for auxiliary variables are the same variables used in the analytic model, but measured at different waves. For example, in a test of a program’s effect on smoking at wave 4, with smoking at wave 1 as a covariate, smoking at waves 2 and 3 is not being used in the analysis model but should be rather highly correlated with smoking at wave 4. These two variables would make excellent auxiliary variables.

A related strategy is to impute variables that are relatively highly correlated with one another. Suppose, for example, that one would like to conduct program effect analyses on a large number of DVs, say 30. But including 30 DVs, 30 wave-1 covariates, and the corresponding 60 variables from waves 2 and 3 (as auxiliary variables) would be too many, especially considering that there are perhaps 10 background variables as covariates and perhaps 20 dummy variables representing cluster membership. All these hypothetical variables total 150 variables in the imputation model.

In this instance, a principal components analysis on the 30 DVs could identify three or four sets of items that are relatively highly correlated. This approach does not need to be precise; it is simply used as a device for identifying variables that are more correlated. These groupings of variables would then be imputed together. By conducting imputation analyses and analyses in these groupings, nearly all of the benefits of the auxiliary variables would be gained at a minimum cost of time for imputation and analysis.

Practicalities of Measurement: Planned Missing Data Designs

In this section, I outline the developing area of planned missingness designs. If you are
following my advice, you are already (or soon will be) using the recommended MI and ML missing data procedures. Thus, it is time to explore new possibilities with respect to data collection.

3-Form design. Measurement is the cornerstone of science. Without measurement, there is no science. When researchers design the measurement component of their study, they are universally faced with the dilemma of wanting to ask more questions than their study participants are willing to answer, given what they are being paid. Invariably, researchers are faced with the choice of asking fewer questions or paying their participants more. The 3-form design (Graham et al. 2006), which has been in use since the early 1980s (e.g., see Graham et al. 1984), gives researchers a third choice. In its generic form, the 3-form design allows researchers to increase by 33% the number of questions for which data are collected without changing the number of questions asked of each respondent. The trick is to divide all questions asked into four item sets. The X set, which contains questions most central to the study outcomes, is asked of everyone. But the A, B, and C sets of items are rotated, such that one set is omitted from each of the three forms. Table 3 describes the basic idea of the design (Raghunathan & Grizzle 1995 suggested a similar design involving all possible two-set combinations of five item sets).

The benefit of the 3-form design is that one gathers data for 33% more questions than can be asked of any one participant. Also, importantly, at least one-third of the participants provide data for every pair of questions. That is, all correlations are estimable. This feature is not shared by most other measurement designs of this general sort (generically described as matrix sampling). The drawback to the 3-form design is that some correlations, because they are based on only one-third of the sample, are tested with lower power. However, as Graham et al. (2006) show, virtually all of the possible drawbacks are under the researcher’s control and can generally be avoided.

Table 3  3-form design

        Respondent received item set?
Form    X      A      B      C
1       Yes    Yes    Yes    No
2       Yes    Yes    No     Yes
3       Yes    No     Yes    Yes

Two-method measurement. Graham et al. (2006) also described a planned missingness design called two-method measurement (also see Allison & Hauser 1991, who describe a related design). The two-method measurement design stems from the need to obtain good, valid measures of the main DV. Researchers in many domains face a dilemma: (a) collect data from everyone with a relatively inexpensive, but questionably valid measure (e.g., a self-administered, self-report measure of recent physical activity), or (b) collect data from a small proportion of study participants using a more valid, but much more expensive, measure (e.g., using an expensive accelerometer). The two-method measurement design allows the collection of both kinds of data: complete data for the less expensive measure, and partial data (on a random sample of participants) for the expensive measure. SEM models are then tested in which the two kinds of data are both used as indicators of a latent variable of the construct of interest (e.g., recent physical activity). If certain assumptions are met (and they are commonly met), then this SEM approach allows for the test of the main study hypotheses with more statistical power than is possible with the expensive measure alone and with more construct validity than is possible with the inexpensive measure alone. For details on this design, see Graham et al. (2006).

ATTRITION AND MNAR MISSINGNESS

The methods described up to this point are clear. I believe that the directions we have
headed, and the MI and ML methods that have been developed, are the right way to go. My advice continues to be to learn and use these methods. I am certain that the major software writers will continue to help in this regard, and in a very few years, the software solutions will be abundant (e.g., see these recent missing data developments: the feature in Mplus for easing the process of including auxiliary variables; the association between SPSS and Amos software; and the inclusion of MI into SAS/STAT® software). But as we move forward into the arena of attrition and MNAR missingness, the waters get a bit murky. The following section describes more of a work in progress. Nevertheless, I believe that we are making strides, and the way to move forward is becoming clearer.

It has been said that attrition in longitudinal studies is virtually ubiquitous. Although that may be true in large part, also ubiquitous is the fear researchers and critics express that attrition is a threat to the internal validity of the study and the readiness with which researchers are willing to discount study results when more than a few percent of the participants have dropped out along the way. To make matters worse, even the missing data experts often make statements that, if taken out of context, seem to verify that the fears about attrition were well founded. To be fair, the concerns about attrition began well before the missing data revolution, and in the absence of the current knowledge base about missing data, perhaps taking the conservative approach was the right thing to do.

But now we have this knowledge base, and it is time to take another, careful look at attrition. It is important to acknowledge that the conclusions in some studies will be adversely affected by attrition and MNAR missingness, and I am not suggesting that we pretend that isn’t the case. But I do believe that the effects of attrition on study conclusions in a general sense are not nearly as severe as commonly feared. In this section, I describe the beginnings of a framework for measuring the extent to which attrition has biasing effects, and I present evidence that the biasing effects of attrition can be tolerably low, even with what is normally considered substantial attrition. Furthermore, I cite research showing that if the recommended safeguards are put in place, the effects of attrition can be further diminished.

With MAR missingness, missing scores at one wave can be predicted from the scores at previous waves. This is true even though the previous scores might be markedly different for stayers and leavers. With MNAR missingness, the missing scores cannot be predicted based on the previous scores; the model that applies to complete cases for describing how scores change over time does not apply to those who have missing data. The practical problem with MNAR missingness is that the missing scores could be anywhere (high or low). And because of this uncertainty, it is possible that clear judgments cannot be made about the study conclusions. The strategies I present in this section serve to reduce that uncertainty. Some of these strategies can be employed after the fact (especially for longitudinal studies), but many of the strategies must be planned in advance for maximum effectiveness.

Some Clarifications About Missing Data Mechanisms

The major three missingness mechanisms are MCAR, MAR, and MNAR. These three kinds of missingness should not be thought of as mutually exclusive categories of missingness, despite the fact that they are often misperceived as such. In particular, MCAR, pure MAR, and pure MNAR really never exist because the pure form of any of these requires almost universally untenable assumptions. The best way to think of all missing data is as a continuum between MAR and MNAR. Because all missingness is MNAR (i.e., not purely MAR), then whether it is MNAR or not should never be the issue. Rather than focusing on whether the MI/ML assumptions are violated, we should answer the question of whether the violation is big enough to matter to any practical extent.
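
Two quantities taken up later in this review give a concrete handle on "big enough to matter": the standardized bias used by Collins et al. (2001), and the strength of the lever for missingness, the correlation between the cause of missingness Z and missingness itself R. The Python sketch below is illustrative only and is not code from any of the cited studies. It assumes that Z is summarized by its quartile (coded 1 to 4, each equally likely) and that the lever strength is the ordinary Pearson correlation between that quartile code and a 0/1 missingness indicator. Under that assumption, the quartile probabilities quoted in this review reproduce the reported correlations of 0.447 and 0.224. The function names and the inputs to the standardized bias example are hypothetical.

```python
import math


def standardized_bias(avg_estimate, population_value, standard_error):
    """Bias expressed as a percent of the standard error of the estimate."""
    return 100.0 * (avg_estimate - population_value) / standard_error


def lever_strength(missing_probs):
    """Pearson correlation between a group code (1..k) and a 0/1 missingness
    indicator R, given P(R = 1) within each group.

    Assumption: the k groups (here, quartiles of Z) are equally likely.
    """
    k = len(missing_probs)
    codes = range(1, k + 1)
    mean_code = sum(codes) / k
    var_code = sum((q - mean_code) ** 2 for q in codes) / k
    p_missing = sum(missing_probs) / k            # overall proportion missing
    var_r = p_missing * (1.0 - p_missing)         # variance of a 0/1 indicator
    cov = sum(q * p for q, p in zip(codes, missing_probs)) / k - mean_code * p_missing
    return cov / math.sqrt(var_code * var_r)


if __name__ == "__main__":
    # Quartile probabilities of a missing Y quoted in this review.
    strong = [0.20, 0.40, 0.60, 0.80]
    weak = [0.35, 0.45, 0.55, 0.65]
    print(round(lever_strength(strong), 3))   # 0.447
    print(round(lever_strength(weak), 3))     # 0.224

    # Hypothetical numbers: estimate 0.52, true value 0.50, SE 0.04.
    sb = standardized_bias(0.52, 0.50, 0.04)
    # Collins et al. (2001) treated more than 40% of a standard error
    # as bias of practical significance.
    print(round(sb, 1), abs(sb) > 40)         # 50.0 True
```

Both levers yield 50% missingness overall; only the spread of the quartile probabilities differs, which is why the second lever is about half as strong as the first.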

Measuring the Biasing Effects of Attrition

In order for researchers to move toward a missing data approach that focuses on the likely impact of MNAR missingness, they need tools for measuring the effects of missingness (attrition) on estimation bias. Collins et al. (2001) made effective use of the standardized bias (presented as a percent of the standard error) for this purpose: Standardized Bias = 100 × (average parameter estimate − population value)/SE, where SE = the standard error of the estimate, or the standard deviation of the sampling distribution for the parameter estimate in question. Collins et al. (2001) argued that standardized bias greater than 40% (i.e., more than 40% of a standard error) represented estimation bias that was of practical significance. Armed with this tool for judging whether MNAR bias was of practical significance, Collins et al. (2001) showed that the regression coefficient (X predicting Y), where the cause of missingness was Z, was biased to a practical degree only when there was 50% missing on Y and r_ZY = 0.90. With 50% missing on Y and r_ZY = 0.40, 25% missing on Y and r_ZY = 0.90, and 25% missing on Y and r_ZY = 0.40, the bias was judged not to be of practical significance when the cause of missingness was omitted from the model (i.e., for MNAR missingness). (Note that although Collins et al. 2001 found that the regression coefficient was largely unbiased, the mean of Y did show a practical level of bias in all four of their missingness and r_ZY scenarios described above. This may be an issue in some studies.)

The findings of the Collins et al. (2001) study are important for a variety of reasons: (a) they demonstrated the usefulness of standardized bias as a way of measuring the practical effects of bias due to attrition, and (b) they showed that MNAR missingness alone is often not sufficient to affect the internal validity of an experimental study to any practical extent.

Suppression and inflation bias. Two kinds of attrition bias are inflation and suppression bias. Inflation bias makes a truly ineffective program, or experimental manipulation, appear to be effective. Suppression bias, on the other hand, makes a truly effective program look less effective or an ineffective program look as if it had a harmful effect. When evaluation researchers write about attrition bias, they usually are talking about inflation bias, which is a major concern because it calls into question the internal validity of a study. Suppression bias is much less important if it occurs along with a significant program effect in the desired direction and thus does not undermine the internal validity of the study. On the other hand, suppression bias can be a significant factor if it keeps the truly effective nature of a program from being observed. The possibility of suppression bias is an especially important factor during the planning (and proposal) stages of a project. The chances are reduced that a project will be funded if the power to detect true effects is likely to be diminished unduly because of suppression bias.

Important quantities in describing missingness. In any discussion of missing data and attrition, three quantities are prominently featured: (a) the amount of missingness (i.e., percent missing or percent attrition), (b) r_ZY, the correlation between the cause of missingness, Z, and the model variable containing missingness, Y, and (c) r_ZR, the correlation between the cause of missingness, Z, and missingness itself, R. This last quantity, r_ZR, is often manipulated as MAR linear by allowing the probability of a missing Y to be dependent on the quartiles of Z. In Collins et al. (2001), for example, the probabilities of missing on Y were 0.20, 0.40, 0.60, and 0.80 for the first, second, third, and fourth quartiles of Z, respectively. The magnitude of this correlation, r_ZR, which depends on the range of these probabilities, can be thought of as the strength of the lever for missingness. That is, a wide range means greater impact on missingness (higher r_ZR), and a narrow range means less impact (lower r_ZR). Collins et al.
(2001) used a rather strong lever for missingness in their simulations. Their lever for 50% missingness produces r_ZR = 0.447. A weaker lever for missingness would involve setting the four probabilities for Y missing to different values, for example, 0.35, 0.45, 0.55, and 0.65 for the four quartiles of Z. This lever also produces 50% missingness on Y, but corresponds to r_ZR = 0.224.

Missing Data Diagnostics

Researchers often want to examine the difference between stayers and leavers on the pretest variables of a longitudinal study. Knowledge will be gained from this practice, but not as much as researchers might think. Because the missingness is not MCAR, any differences observed on pretest variables should not be unexpected. And the information gained cannot indicate the degree to which the missingness is MNAR. Perhaps the best value of this kind of analysis is to identify the pretest variables that most strongly predict missingness later in the study; these variables can be included in the missing data model (e.g., see Heckman 1979, Leigh et al. 1993).

The diagnostics of Hedeker & Gibbons. However, making use of longitudinal data and examining the missingness and the patterns of change over time on the main DV can be very enlightening. Hedeker & Gibbons (1997) provided an excellent example of this kind of diagnostic. In the empirical study they used to illustrate their analytic technique (a pattern mixture model), they plotted the main DV over the four main measurement points in four groups: (a) drug group, data for week 6; (b) placebo group, data for week 6; (c) drug group, data missing for week 6; (d) placebo group, data missing for week 6 (where week 6 was the final measure in the study).

These plots (Hedeker & Gibbons 1997, p. 72) show clearly that the changes over time in all four groups were nearly linear. Among those with data for the last time point, the participants in the drug group were clearly doing better than those in the placebo group. Among those who did not have data for the last measure (week 6), the people in the drug condition appeared to be doing even better, and those in the placebo condition appeared to be doing even worse.

Although it is highly speculative to extrapolate from a single pretest to the last posttest, extrapolation to the last posttest makes much more sense when one is working from a clearly established longitudinal trend. That is, there is much less uncertainty about what the missing scores might be on the final measure. In the Hedeker & Gibbons (1997) study, for example, it can be safely assumed that those without data for the final wave continued along the same (or similar) trajectory that could be observed through three of the four time points.

[Figure 1: line plot. Vertical axis ticks run from 1.2 to 2.6; horizontal axis shows waves smk7, smk8, smk9, smk10, and smk11; separate lines for the Norm and No Norm groups, with and without data for the last wave.]

Figure 1  Smoking levels over time for those with and without data for the last wave of measurement. Students in the Norm program group (square markers) reported lower levels of cigarette smoking than did students in the Comparison group (circle markers) for those who had data for the last wave of measurement (eleventh grade; white markers) as well as for those who had missing data for the last wave of measurement (black markers).

Figure 1 displays the same kind of data from the Adolescent Alcohol Prevention Trial (AAPT; Hansen & Graham 1991). The figure illustrates the same four plots, this time for the program group (Norm) and comparison group (No Norm) for those who did have data for the final follow-up measure at eleventh grade and for those who did not. It is evident that just as in the study described by Hedeker & Gibbons (1997), the plots look reasonably smooth, and it would be reasonable to assume
that the missing smoking scores for eleventh grade followed this same or a similar trajectory. All of these plotted points make it more difficult to imagine that the missing points are hugely different from where one would guess they would be based on the observed trajectories. On the other hand, it would make little sense to try to predict the missing eleventh-grade data based on the seventh-grade pretest data alone. Based only on those pretest scores, the missing eleventh-grade scores could indeed be anywhere.

The data described by Hedeker & Gibbons (1997) and the data presented here are very well behaved. However, not all plots of the main DV over time will be this smooth. In a longitudinal study, for example, the changes in the DV over time may not conform to a simple, well-defined curve (linear or curvilinear). Under those conditions, predicting the missing final data point would be much more difficult.

Nonignorable (MNAR) Methods

Nonignorable methods (e.g., see Demirtas 2005; Demirtas & Schafer 2003; Hedeker & Gibbons 1997; Little 1993, 1994, 1995) may be very useful. However, it is not necessarily true that any particular method will be better than MAR methods (e.g., normal-model MI or ML) for any particular empirical study. It is well known that methods for handling nonignorable data require the analyst to make assumptions about the model of missingness. If this model is incorrect, the MNAR model may perform even less well than standard MAR methods (e.g., see Demirtas & Schafer 2003).

On the other hand, MNAR methods such as pattern mixture models may, as argued above, be excellent tools for describing the missingness in longitudinal fashion, thereby increasing one’s confidence in many instances about the true nature of the missingness. As suggested by Little (1993, 1994, 1995), this type of model can be a good way to perform sensitivity analyses about the model structure. If the same general study conclusions are made over a wide variety of possible missing data models, then one has greater confidence in those study conclusions. In addition, the models suggested by Hedeker & Gibbons (1997) may prove to be especially useful when the longitudinal patterns of the main DV are as smooth as they described.

Strategies for Reducing the Biasing Effects of Attrition

Use auxiliary variables. Probably the single best strategy for reducing bias (and increasing statistical power lost owing to missing data) is to include good auxiliary variables in the missing data model (see Collins et al. 2001). As Little (1995) put it, “one should collect covariates that are useful for predicting missing values.” It is important that these variables need only be good for predicting the missing values; they need not be related to missingness, per se. Good candidate variables for auxiliary variables are measures of the main DV that happen not to be in the analysis model. However, if the analysis already involves all measures of the DV (e.g., with latent growth modeling analyses), then the incremental benefit of other potential auxiliary variables is likely to be small.

Collins et al. (2001) showed that including an auxiliary variable with r_ZY = 0.40 reduced relatively little bias in any of the parameters examined. However, including an auxiliary variable with r_ZY = 0.90 had a major impact on bias. Later simulations have suggested that the benefit from auxiliary variables begins to be noticeable at about r_ZY = 0.50 or 0.60. Furthermore, it appears that one or two auxiliary variables with r_ZY = 0.60 are better than 20 auxiliary variables whose correlations with Y are all less than r_ZY = 0.40. This is true because such variables are often intercorrelated, and the incremental benefit of adding them to the model is very small.

Longitudinal missing data diagnostics. The excellent strategy described by Hedeker & Gibbons (1997) is discussed above. If longitudinal data are available, this strategy is a good way to describe the missingness patterns. Not
all patterns will be good. The best longitudinal patterns are those that reduce the uncertainty about what the missing scores might be at the last waves of measurement.

Measuring intent to drop out. Schafer & Graham (2002) suggested that a potentially good way of reducing attrition bias is to measure participants’ intent to drop out of the study. Some people say they will drop out and do drop out, whereas others say they will drop out and do not. Those who do not drop out provide a good basis for imputing the scores of those who do. Demirtas & Schafer (2003) suggested that this approach might be one good way of dealing with MNAR missingness. Leon et al. (2007) performed a simulation that suggested that this approach can be useful.

Collecting follow-up data from those initially missing. Perhaps the best way of dealing with MNAR missingness is to follow up and measure a random sample of those initially missing from the measurement session (e.g., see Glynn et al. 1993, Graham & Donaldson 1993). This strategy is not easy, and it is impossible in some research settings (e.g., where study participants have died). However, even if only some studies are conducted that include these follow-up measures from a random sample of those initially missing, they could shed enormous light on the issues surrounding MNAR missingness. With a few well-placed studies of this sort, we would be in an excellent position to establish the true bias from using MAR methods and a variety of MNAR methods.

One wrinkle with this approach is that although collecting data from a random sample of those initially missing can be very difficult, collecting data from a nonrandom sample of those initially missing is much easier. Although inferences from this nonrandom sample are generally weaker than inferences possible from a random sample, data such as these may be of much value (e.g., see Glynn et al. 1993).

Suggestions for Reporting About Attrition in an Empirical Study

1. Avoid generic, possibly misleading, statements about the degree to which attrition plagues longitudinal research.
2. Be precise about the amount of attrition; avoid vague terms that connote a missing data problem. Use precise percentages of dropout from treatment and control groups if that is relevant.
3. Missingness on the main DV can be caused by (a) the program itself, (b) the DV itself, (c) the program × DV interaction, or (d) any combination of these factors. Perform analyses that lay out as clearly as possible which version of attrition is most likely in the particular study (e.g., see Hedeker & Gibbons 1997).
4. Based on longitudinal diagnostics, assess the degree of estimation bias, for example, using standardized bias (e.g., see Collins et al. 2001) for this configuration and percent of attrition, and determine the kind of bias (suppression or inflation).
5. Draw study conclusions in light of these facts.

Suggestions for Conduct and Reporting of Simulation Studies on Attrition

1. Avoid generic, possibly misleading, statements about the degree to which attrition plagues longitudinal research. Limit the number of assertions about the possible problems associated with attrition. Those of us who do this kind of simulation study must shoulder the responsibility of being precise in how we talk about these topics; what we say can be very influential. Be careful to give proper citations for any statement about the degree to which attrition is known to be a problem or not. Try to focus on the constructiveness of taking proper steps to minimize any biasing effects of attrition.
2. Be precise about the amount of attrition in the simulation study. Provide a
sufficient number of citations to demon- used as one of the primary yardsticks of prac-
strate that this amount of attrition is of tical importance of missing data bias. In their
plausible relevance in the substantive area study, they used 40% bias (parameter estimate
to which the simulation study applies. is four-tenths of a standard error different from
3. Be precise about the configuration of at- the population value) as the cutoff for MNAR
trition simulated in the study. If the con- bias that would be of practical concern. Any-
figuration of attrition is simulated in a thing greater than 40% would be considered to
manner different from that used in other be of practical concern. This implies that any-
simulation studies, then provide a de- thing 40% or less would be considered to be of
scription of the procedure in plain terms no practical concern.
and in comparison with the approaches Collins et al. (2001) went out on a limb
of other simulation researchers (e.g., see with an estimate of a cutoff for practical effect
the difference in style between Collins of MNAR bias, and this study is an excellent
Annu. Rev. Psychol. 2009.60:549-576. Downloaded from arjournals.annualreviews.org

et al. 2001 and Leon et al. 2007). Dif- starting place for this kind of research. I am
ferent approaches are all valuable; read- reminded of the early days of SEM research,
by Pennsylvania State University on 08/18/10. For personal use only.

ers with varying degrees of technical skill when researchers were struggling to find in-
just need to know how they relate to one dices of practical model fit and to find cut-
another. offs for such indices above which a model’s fit
might be judged as “good.” Bentler & Bonett
Be precise about the strength of attrition (1980) were the first to provide any kind of cut-
used in the simulation. For example, Collins off (0.90 for their nonnormed fit index). SEM
et al. (2001) specified increasing missingness researchers were eager to employ this 0.90 cut-
probabilities for the four quartiles of the vari- off, but with considerable experience with this
able Z (0.20 for Q1, 0.40 for Q2, 0.60 for Q3, fit index, eventually began to realize that per-
and 0.80 for Q4). As it turns out, this was very haps 0.95 was a better cutoff.
strong attrition compared to what it could have I suggest that researchers involved with
been (e.g., 0.35 for Q1, 0.45 for Q2, 0.55 for work where attrition is a factor (both empirical
Q3, and 0.65 for Q4). This is important because and simulation studies) begin to develop expe-
the strength used produced bias in the Collins rience with the standardized bias concept used
et al. (2001) study that would present practi- by Collins et al. (2001). But after years of expe-
cal problems, whereas the latter strength would rience, will we still believe that 40% bias is the
produce a level of bias that Collins and cowork- best cutoff? It is easy to show that a standard-
ers would have judged to be acceptably low, ized bias of 40% corresponds to a change in the
even with 50% missingness on Y, and rZY = t-value of 0.4. Can we tolerate such a change?
0.90 in both cases. Also, present a sufficient Other issues surround the use of standardized
number of citations from the empirical liter- bias. For example, larger sample sizes produce
ature to demonstrate that the strength of ef- more standardized bias. Future research should
fect used in the simulation actually occurs in address the possibility that different cut points
empirical research to an extent that makes the are needed for different sample sizes.
study useful. As noted above, the strength of
attrition used in the Collins et al. (2001) study Other indices of the practical impact of
was greater than is typically seen in empirical attrition bias. In the SEM literature, re-
research. searchers now enjoy a plethora of indices of
practical fit. There are even three or four funda-
mentally different approaches to these indices
More Research is Needed of practical fit. We need more such approaches
Collins et al. (2001) did a nice job of describ- to the practical effects of attrition. I encourage
ing the standardized bias concept, which they the development of such indices.
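The arithmetic behind the attrition strengths and the standardized bias cutoff discussed in this section can be checked with a short sketch. This is only an illustration under stated assumptions, not the procedure used by Collins et al. (2001): the function names, the population value, the standard error, the raw bias, and the sample sizes are all hypothetical, the quartiles of Z are assumed to be equal in size, and the sample-size comparison assumes the simple case in which the standard error of a mean is sd / sqrt(n).

```python
from math import sqrt


def overall_missing_rate(quartile_probs):
    """Average missingness probability across equal-sized quartiles of Z."""
    return sum(quartile_probs) / len(quartile_probs)


def standardized_bias(estimate, population_value, std_error):
    """Bias expressed as a percentage of the standard error."""
    return 100.0 * (estimate - population_value) / std_error


# Quartile-specific missingness probabilities quoted in the text.
strong = [0.20, 0.40, 0.60, 0.80]
weaker = [0.35, 0.45, 0.55, 0.65]
rate_strong = overall_missing_rate(strong)
rate_weaker = overall_missing_rate(weaker)
# Gap between Q4 and Q1, used here as a crude (assumed) indicator of strength.
gap_strong = strong[-1] - strong[0]
gap_weaker = weaker[-1] - weaker[0]

# Hypothetical estimate lying 0.4 standard errors above the population value.
population_value = 0.50
std_error = 0.10
estimate = population_value + 0.4 * std_error
sb = standardized_bias(estimate, population_value, std_error)
# Shift in the t-value (estimate / SE) produced by that bias.
t_shift = estimate / std_error - population_value / std_error

# Same hypothetical raw bias at two sample sizes; SE of a mean is sd / sqrt(n).
raw_bias = 0.04
sd = 1.0
sb_n100 = standardized_bias(population_value + raw_bias, population_value, sd / sqrt(100))
sb_n400 = standardized_bias(population_value + raw_bias, population_value, sd / sqrt(400))

print(f"overall missingness: strong={rate_strong:.2f}, weaker={rate_weaker:.2f}")
print(f"Q4 minus Q1 gap: strong={gap_strong:.2f}, weaker={gap_weaker:.2f}")
print(f"standardized bias={sb:.1f}%, shift in t={t_shift:.2f}")
print(f"standardized bias at n=100: {sb_n100:.1f}%, at n=400: {sb_n400:.1f}%")
```

Both quartile patterns average to the same overall missingness on Y; they differ only in how steeply missingness rises across the quartiles of Z. In the last comparison, quadrupling the sample size halves the standard error and therefore doubles the standardized bias for the same raw bias, which is why a single cutoff may not suit every sample size.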


Collecting data on a random sample of those initially missing. Many authors have recommended collecting data on a random sample of those initially missing. However, most of this has involved simulation work and not actual data collection. Carefully conducted empirical studies along the lines suggested by Glynn et al. (1993) and Graham & Donaldson (1993) to determine the actual extent of MNAR biases would be valuable, not just to the individual empirical study, but also to the study of attrition in general. If this type of study is conducted properly, it will give us much-needed information about the effect of MNAR processes on estimation bias. It is possible that the kinds of studies for which MNAR biases are greatest are precisely the studies for which collection of additional data on the initial dropouts is most difficult. Nevertheless, even a few such studies will be a great benefit.

Empirical studies testing benefit of intent-to-drop-out questions. The simulation study conducted by Leon et al. (2007) suggests that this strategy is promising. However, empirical studies are needed that make use of these kinds of procedures. Best, perhaps, would be a few carefully conducted studies that examined the combination of this approach with the approach of collecting data on a random sample of those initially missing.

SUMMARY AND CONCLUSIONS

1. Several excellent, useful, and accessible programs exist for performing analysis with missing data with multiple imputation under the normal model and maximum-likelihood (or FIML) methods. Use them! My wish is that 10 years from now, everyone will be making use of these procedures as a matter of course. Having these methods serve as our basic platform will raise the quality of everyone’s research.
2. Gain experience with MNAR missing data (e.g., from attrition), especially with the measures of the practical effect of MNAR data. Use the indices that exist, and evaluate them. Come up with your own standards for what constitutes an acceptable level of estimation bias. Where possible, publish articles describing a new approach to evaluating the practical impact of MNAR missingness on study conclusions.
3. Try to move away from the fear of missing data and attrition. Situations will occur in which missing data and attrition will affect your research conclusions in an undesirable way. But don’t fear that eventuality. Embrace the knowledge that you will be more confident in your research conclusions, either way. Don’t see this possible situation as a reason not to understand missing data issues. Focus instead on the idea that your new knowledge means that when your research conclusions are desirable, you needn’t have the fear that you got away with something. Rather, you can go ahead with the cautious optimism that your study really did work.

DISCLOSURE STATEMENT
The author is not aware of any biases that might be perceived as affecting the objectivity of this
review.

LITERATURE CITED
Allison PD. 1987. Estimation of linear models with incomplete data. In Sociological Methodology 1987, ed. C
Clogg, pp. 71–103. San Francisco, CA: Jossey-Bass
Allison PD. 2002. Missing Data. Thousand Oaks, CA: Sage


Allison PD, Hauser RM. 1991. Reducing bias in estimates of linear models by remeasurement of a random
subsample. Soc. Method Res. 19:466–92
Arbuckle JL, Wothke W. 1999. Amos 4.0 User’s Guide. Chicago: Smallwaters
Bentler PM, Bonett DG. 1980. Significance tests and goodness of fit in the analysis of covariance structures.
Psychol. Bull. 88:588–606
Bernaards CA, Belin TR, Schafer JL. 2007. Robustness of a multivariate normal approximation for imputation
of incomplete binary data. Stat. Med. 26:1368–82
Collins LM, Schafer JL, Kam CM. 2001. A comparison of inclusive and restrictive strategies in modern missing
data procedures. Psychol. Methods 6:330–51
Demirtas H. 2005. Multiple imputation under Bayesianly smoothed pattern-mixture models for nonignorable
drop-out. Stat. Med. 24:2345–63
Demirtas H, Schafer JL. 2003. On the performance of random-coefficient pattern-mixture models for non-
ignorable dropout. Stat. Med. 21:1–23
Dempster AP, Laird NM, Rubin DB. 1977. Maximum likelihood from incomplete data via the EM algorithm
(with discussion). J. R. Stat. Soc. B39:1–38


Duncan SC, Duncan TE. 1994. Modeling incomplete longitudinal substance use data using latent variable
growth curve methodology. Multivar. Behav. Res. 29:313–38
Duncan SC, Duncan TE, Hops H. 1996. Analysis of longitudinal data within accelerated longitudinal designs.
Psychol. Methods 1:236–48
Duncan TE, Duncan SC, Hops H. 1994. The effect of family cohesiveness and peer encouragement on the
development of adolescent alcohol use: a cohort-sequential approach to the analysis of longitudinal data.
J. Stud. Alcohol 55:588–99
du Toit M, du Toit S. 2001. Interactive LISREL: User’s Guide. Lincolnwood, IL: Sci. Software Intl.
Efron B. 1982. The Jackknife, the Bootstrap, and Other Resampling Plans. Philadelphia, PA: Soc. Industrial Appl.
Math.
Enders CK. 2001a. A primer on maximum likelihood algorithms available for use with missing data. Struct.
Equ. Model. 8:128–41
Enders CK. 2001b. The impact of nonnormality on full information maximum-likelihood estimation for
structural equation models with missing data. Psychol. Methods 6:352–70
Enders CK. 2003. Using the expectation maximization algorithm to estimate coefficient alpha for scales with
item-level missing data. Psychol. Methods 8:322–37
Glynn RJ, Laird NM, Rubin DB. 1993. Multiple imputation in mixture models for nonignorable nonresponse
with followups. J. Am. Stat. Assoc. 88:984–93
Graham JW. 2003. Adding missing-data relevant variables to FIML-based structural equation models. Struct.
Equ. Model. 10:80–100
Graham JW, Cumsille PE, Elek-Fisk E. 2003. Methods for handling missing data. In Research Methods in
Psychology, ed. JA Schinka, WF Velicer, pp. 87–114. Volume 2 of Handbook of Psychology, ed. IB Weiner.
New York: Wiley
Graham JW, Donaldson SI. 1993. Evaluating interventions with differential attrition: the importance of
nonresponse mechanisms and use of followup data. J. Appl. Psychol. 78:119–28
Graham JW, Flay BR, Johnson CA, Hansen WB, Grossman LM, Sobel JL. 1984. Reliability of self-report
measures of drug use in prevention research: evaluation of the Project SMART questionnaire via the
test-retest reliability matrix. J. Drug Educ. 14:175–93
Graham JW, Hofer SM. 1992. EMCOV Users Guide. Univ. S. Calif. Unpubl. documentation
Graham JW, Hofer SM. 2000. Multiple imputation in multivariate research. In Modeling Longitudinal and
Multiple-Group Data: Practical Issues, Applied Approaches, and Specific Examples, ed. TD Little, KU Schnabel,
J Baumert, 1:201–18. Hillsdale, NJ: Erlbaum
Graham JW, Hofer SM, Donaldson SI, MacKinnon DP, Schafer JL. 1997. Analysis with missing data in
prevention research. In The Science of Prevention: Methodological Advances from Alcohol and Substance Abuse
Research, ed. K Bryant, M Windle, S West, 1:325–66. Washington, DC: Am. Psychol. Assoc.
Graham JW, Hofer SM, MacKinnon DP. 1996. Maximizing the usefulness of data obtained with planned
missing value patterns: an application of maximum likelihood procedures. Multivar. Behav. Res. 31:197–
218


Graham JW, Hofer SM, Piccinin AM. 1994. Analysis with missing data in drug prevention research. In Advances
in Data Analysis for Prevention Intervention Research, National Institute on Drug Abuse Research Monograph,
ed. LM Collins, L Seitz, 142:13–63. Washington, DC: Natl. Inst. Drug Abuse
Graham JW, Olchowski AE, Gilreath TD. 2007. How many imputations are really needed? Some practical
clarifications of multiple imputation theory. Prev. Sci. 8:206–13
Graham JW, Roberts MM, Tatterson JW, Johnston SE. 2002. Data quality in evaluation of an alcohol-related
harm prevention program. Evaluation Rev. 26:147–89
Graham JW, Schafer JL. 1999. On the performance of multiple imputation for multivariate data with small
sample size. In Statistical Strategies for Small Sample Research, ed. R Hoyle, 1:1–29. Thousand Oaks, CA:
Sage
Graham JW, Taylor BJ, Cumsille PE. 2001. Planned missing data designs in analysis of change. In New Methods
for the Analysis of Change, ed. LM Collins, A Sayer, 1:335–53. Washington, DC: Am. Psychol. Assoc.
Graham JW, Taylor BJ, Olchowski AE, Cumsille PE. 2006. Planned missing data designs in psychological
research. Psychol. Methods 11:323–43
Hansen WB, Graham JW. 1991. Preventing alcohol, marijuana, and cigarette use among adolescents: peer
pressure resistance training versus establishing conservative norms. Prev. Med. 20:414–30
Heckman JJ. 1979. Sample selection bias as a specification error. Econometrica 47:153–61
Hedeker D, Gibbons RD. 1997. Application of random-effects pattern-mixture models for missing data in
longitudinal studies. Psychol. Methods 2:64–78
Honaker J, King G, Blackwell M. 2007. Amelia II: A Program for Missing Data. Unpubl. users guide. Cambridge,
MA: Harvard Univ. http://gking.harvard.edu/amelia/
Horton NJ, Kleinman KP. 2007. Much ado about nothing: a comparison of missing data methods and software
to fit incomplete data regression models. Am. Stat. 61:79–90
Jöreskog KG, Sörbom D. 1996. LISREL 8 User’s Reference Guide. Chicago: Sci. Software
King G, Honaker J, Joseph A, Scheve K. 2001. Analyzing incomplete political science data: an alternative
algorithm for multiple imputation. Am. Polit. Sci. Rev. 95:49–69
Lanza ST, Collins LM, Schafer JL, Flaherty BP. 2005. Using data augmentation to obtain standard errors
and conduct hypothesis tests in latent class and latent transition analysis. Psychol. Methods 10:84–100
Leigh JP, Ward MM, Fries JF. 1993. Reducing attrition bias with an instrumental variable in a regression
model: results from a panel of rheumatoid arthritis patients. Stat. Med. 12:1005–18
Leon AC, Demirtas H, Hedeker D. 2007. Bias reduction with an adjustment for participants’ intent to drop
out of a randomized controlled clinical trial. Clin. Trials 4:540–47
Little RJA. 1993. Pattern-mixture models for multivariate incomplete data. J. Am. Stat. Assoc. 88:125–34
Little RJA. 1994. A class of pattern-mixture models for normal incomplete data. Biometrika 81:471–83
Little RJA. 1995. Modeling the drop-out mechanism in repeated-measures studies. J. Am. Stat. Assoc. 90:1112–
21
Little RJA, Rubin DB. 1987. Statistical Analysis with Missing Data. New York: Wiley
Little RJA, Rubin DB. 2002. Statistical Analysis with Missing Data. New York: Wiley. 2nd ed.
McArdle JJ. 1994. Structural factor analysis experiments with incomplete data. Multivar. Behav. Res. 29(4):
409–54
McArdle JJ, Hamagami F. 1991. Modeling incomplete longitudinal and cross-sectional data using latent
growth structural models. Exp. Aging Res. 18:145–66
McArdle JJ, Hamagami F. 1992. Modeling incomplete longitudinal data using latent growth structural equation
models. In Best Methods for the Analysis of Change, ed. L Collins, JL Horn, 1:276–304. Washington, DC:
Am. Psychol. Assoc.
Murray DM. 1998. Design and Analysis of Group-Randomized Trials. New York: Oxford Univ. Press
Muthén B, Kaplan D, Hollis M. 1987. On structural equation modeling with data that are not missing
completely at random. Psychometrika 52:431–62
Muthén LK, Muthén BO. 2007. Mplus User’s Guide. Los Angeles, CA: Muthén & Muthén. 4th ed.
Neale MC, Boker SM, Xie G, Maes HH. 1999. Mx: Statistical Modeling. Richmond: Virginia Commonwealth
Univ. Dept. Psychiatry. 5th ed.
Nunnally JC. 1967. Psychometric Theory. New York: McGraw-Hill


Olsen MK, Schafer JL. 2001. A two-part random-effects model for semicontinuous longitudinal data. J. Am.
Stat. Assoc. 96:730–45
Raghunathan TE. 2004. What do we do with missing data? Some options for analysis of incomplete data.
Annu. Rev. Public Health 25:99–117
Raghunathan TE, Grizzle J. 1995. A split questionnaire survey design. J. Am. Stat. Assoc. 90:54–63
Raudenbush SW, Bryk AS. 2002. Hierarchical Linear Models. Thousand Oaks, CA: Sage. 2nd ed.
Rubin DB. 1976. Inference and missing data. Biometrika 63:581–92
Rubin DB. 1987. Multiple Imputation for Nonresponse in Surveys. New York: Wiley
SAS Institute. 2000–2004. SAS 9.1.3 Help and Documentation. Cary, NC: SAS Inst.
Satorra A, Bentler PM. 1994. Corrections to test statistics and standard errors in covariance structure analysis.
In Latent Variables Analysis: Applications for Developmental Research, ed. A von Eye, CC Clogg, 1:399–419.
Thousand Oaks, CA: Sage
Schafer JL. 1997. Analysis of Incomplete Multivariate Data. New York: Chapman & Hall
Schafer JL. 1999. Multiple imputation: a primer. Stat. Methods Med. Res. 8:3–15
Schafer JL. 2001. Multiple imputation with PAN. In New Methods for the Analysis of Change, ed. LM Collins,
AG Sayer, 1:357–77. Washington, DC: Am. Psychol. Assoc.
Schafer JL, Graham JW. 2002. Missing data: our view of the state of the art. Psychol. Methods 7:147–77
Schafer JL, Olsen MK. 1998. Multiple imputation for multivariate missing data problems: a data analyst’s
perspective. Multivar. Behav. Res. 33:545–71
Schafer JL, Yucel RM. 2002. Computational strategies for multivariate linear mixed-effects models with
missing values. J. Comput. Graph. Stat. 11:437–57
Tanner MA, Wong WH. 1987. The calculation of posterior distributions by data augmentation (with discus-
sion). J. Am. Stat. Assoc. 82:528–50
von Hippel PT. 2004. Biases in SPSS 12.0 Missing Value Analysis. Am. Stat. 58:160–64
Willett JB, Sayer AG. 1994. Using covariance structure analysis to detect correlates and predictors of individual
change over time. Psychol. Bull. 116(2):363–81
Wothke W. 2000. Longitudinal and multigroup modeling with missing data. In Modeling Longitudinal and
Multiple-Group Data: Practical Issues, Applied Approaches, and Specific Examples, ed. TD Little, KU Schnabel,
J Baumert, 1:219–40. Hillsdale, NJ: Erlbaum

