
Brief comments on computational issues with multiple imputation

James Carpenter and Mike Kenward


September 2008

1 Introduction

Multiple Imputation (MI) is a Monte-Carlo (i.e. stochastic) method for parameter estimation in partially observed datasets. It is increasingly used in practice. This document highlights some of the computational issues that may arise.
MI is usually performed under the assumption that the mechanism causing the missing data is 'Missing At Random' (MAR). We discuss the practical implications of this elsewhere (Carpenter and Kenward, 2008). Here we note that although this assumption may be plausible, it cannot be verified from the data at hand, and therefore the analysis of a partially observed dataset under the MAR assumption can never have the same status as the analysis of the fully observed dataset would have had. It is therefore helpful to present any analysis carried out on the partially observed data (e.g. using MI) alongside the analysis based only on those units/individuals with no missing data (so-called 'Complete Cases' (CC)), to see how the conclusions differ. If they do differ, it is important, where possible, to explain why, as this increases confidence in any conclusions drawn.
Of course, the MI and CC analyses will often differ precisely because the mechanism causing the missing data is MAR, whereas a CC analysis is generally only valid if data are Missing Completely At Random (MCAR). In this case it is useful to describe the visible ways in which the mechanism departs from MCAR, and how this distorts the results of a CC analysis.
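As a concrete illustration, the sketch below compares a complete-case fit with a multiple-imputation fit using the chained-equations implementation in statsmodels (one of several possible tools); the data frame df and the variables y, x1 and x2 are illustrative assumptions, not part of the discussion above.

```python
# Minimal sketch (not a prescribed procedure): compare a complete-case (CC)
# fit with a multiple-imputation (MI) fit. Assumes a pandas DataFrame `df`
# with a fully observed outcome y and partially observed covariates x1, x2.
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.imputation import mice

# Complete-case analysis: drop every row containing a missing value.
cc_fit = smf.ols("y ~ x1 + x2", data=df.dropna()).fit()

# Multiple imputation via chained equations, then the same model of interest.
imp_data = mice.MICEData(df)
mi_model = mice.MICE("y ~ x1 + x2", sm.OLS, imp_data)
mi_fit = mi_model.fit(n_burnin=10, n_imputations=20)

# Present the two sets of estimates side by side and ask whether any
# differences are plausible given the likely missing data mechanism.
print(cc_fit.summary())
print(mi_fit.summary())
```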
The purpose of this document is to briefly mention some computational issues which could invalidate the results of an MI analysis. These should be checked when preparing an MI analysis for publication.

2 Convergence

In order to impute completed datasets, the imputation model is typically fitted to all the observed data (partial, as well as complete, cases). This is usually done by specifying (either explicitly, or implicitly [1]) a joint model for the observed data in which the partially observed variables are responses. Most software packages set the imputation model in a Bayesian framework, fit it using Markov Chain Monte Carlo (MCMC) methods [2], and draw the imputed data from the resulting posterior distributions of the unobserved measurements.
Fitting models using MCMC methods requires care. First, the simulation process needs to be run for a number of iterations until it converges to the correct distribution (termed the 'burn-in'). Second, the simulation process needs to be run for a number of iterations between drawing imputations, so that, informally speaking, the imputed data sets are sufficiently different.
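The sketch below makes these two settings explicit using the MICEData class in statsmodels (other packages expose analogous options); the data frame df and the particular iteration counts are illustrative assumptions.

```python
# Minimal sketch of explicit burn-in and between-imputation spacing.
# Assumes a pandas DataFrame `df` with missing values; the iteration
# counts are illustrative and should be chosen for the problem at hand.
from statsmodels.imputation.mice import MICEData

imp = MICEData(df)

imp.update_all(n_iter=20)            # 'burn-in': let the sampler settle first

imputed_sets = []
for m in range(10):                  # draw 10 imputed data sets
    imp.update_all(n_iter=5)         # extra cycles between draws, so that
                                     # successive data sets differ sufficiently
    imputed_sets.append(imp.data.copy())
```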
There is a large literature on fitting Bayesian models by MCMC methods (Gilks et al., 1996), and
this needs to be kept in mind when performing MI. However, many software packages downplay
this aspect, focusing instead on specifying the imputation model.
[1] Such as using the chained equations approach, also referred to as full conditional specification.
[2] These are simulation-based methods for obtaining parameter estimates in Bayesian models.

However, with complex imputation models and/or large data sets, the default settings in software
packages may be inappropriate. If possible, graphical diagnostics should be examined to check the
process is working satisfactorily.
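One simple graphical check, sketched below under the assumption that the chained-equations sampler is driven by hand, is to plot the mean of each partially observed variable against iteration number and look for drift or trends; the variable names are illustrative.

```python
# Minimal sketch of a graphical convergence check: track the mean of each
# partially observed variable across iterations and plot the 'trace'.
# Drifting or trending traces suggest more iterations are needed.
# Assumes a pandas DataFrame `df`; x1 and x2 are illustrative names.
import matplotlib.pyplot as plt
from statsmodels.imputation.mice import MICEData

imp = MICEData(df)
trace = {col: [] for col in ["x1", "x2"]}

for iteration in range(50):
    imp.update_all(n_iter=1)
    for col in trace:
        trace[col].append(imp.data[col].mean())

for col, means in trace.items():
    plt.plot(means, label=col)
plt.xlabel("iteration")
plt.ylabel("mean of completed variable")
plt.legend()
plt.show()
```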

3 Distributional assumptions

In statistical modelling, considerable care is usually taken to ensure that the distributional model for
the data is adequate. For example, binary data is rarely analysed using a linear regression. However,
the imputation model is often based on a multivariate normal distribution. In other words, a mix
of quantitative, ordinal and categorical data will be treated as joint multivariate normal for the
purpose of imputation.
The extent to which this will invalidate the results is difficult to quantify, as it depends critically
on the main analysis and the pattern and extent of missing data. A number of simulation studies
have appeared which suggest that it is not as misleading as it might appear when:

1. the extent of missing information in non-quantitative variables is not too great.


In this case, multiple imputation allows a large number of individuals with only a few missing observations each to be included in the analysis, so that very accurate predictions of their missing data are not necessary.

2. for binary/ordinal data the probabilities are not too close to 0 or 1.


In this setting, logistic/probit analysis is reasonably close to linear regression.

In practice, one should check whether the results of MI are resting inappropriately on these assumptions. Analysts should be on the lookout for binary or ordinal variables with a high proportion of identical values, especially if their effect changes markedly between the CC and MI analyses.
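A minimal screen along these lines, assuming a pandas DataFrame df and illustrative column names, is:

```python
# Minimal sketch: for each binary/ordinal variable, report the proportion
# of observed values taken by the single most common category.
# Proportions near 1 deserve scrutiny. Column names are illustrative.
for col in ["smoker", "severity_grade"]:
    top_share = df[col].value_counts(normalize=True, dropna=True).max()
    print(col, round(top_share, 3))
```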
An attraction of MI is that data can be transformed before imputation to be more nearly normal, and back-transformed after imputation for the analysis of interest. Some recent work, informally reported to us, confirms that this generally reduces bias and improves statistical properties. Both skewed quantitative variables and ordinal variables can benefit from transformation before imputation. Where possible, categorical variables can benefit from re-ordering so that they are closer to ordinal. In practice, the problem of ill-specified imputation distributions is also likely to be greater when the aim of the analysis is to estimate properties of distributions, such as means and percentiles, rather than regression coefficients.
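The transform/impute/back-transform idea can be sketched as follows, assuming a strictly positive, right-skewed variable (here called income, an illustrative name) in a pandas DataFrame df:

```python
# Minimal sketch: impute a skewed variable on a log scale and back-transform.
import numpy as np
from statsmodels.imputation.mice import MICEData

df["log_income"] = np.log(df["income"])   # more nearly normal scale
work = df.drop(columns=["income"])        # impute on the log scale only

imp = MICEData(work)
imp.update_all(n_iter=20)

completed = imp.data.copy()
completed["income"] = np.exp(completed["log_income"])  # back to original scale
```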
This problem may also be alleviated by using the so called ‘chained equations’ or ‘full conditional
specification’ imputation methods. These methods use appropriate conditional models for each
variable (i.e. logistic for binary variables and so on). Quantitative variables can still benefit from
transformation to approximate normality.

4 Perfect prediction

Handling the imputation of a large number of categorical variables can be tricky (whatever approach is used) because of problems of perfect prediction. If near perfect prediction occurs, fitted values for a particular subset of the data become very close to 1 (or 0), and their standard errors become very large. This can result in a non-trivial fraction of the imputed data being inappropriate, which in turn may lead to increased variance of parameter estimates from CC to MI, and unexpected changes in direction.
While these problems are easy to spot in simpler settings, this is not the case in more complex ones. Further, many software packages suppress the information needed to spot them. Some work has been done to address this issue semi-automatically during imputation; nevertheless, analysts need to be aware of the potential for errors in these situations.
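Two simple checks, sketched below for a binary variable imputed with a logistic model (the variable names are illustrative), are to cross-tabulate the variable against its categorical predictors and to inspect the coefficients and standard errors of the fitted imputation model:

```python
# Minimal sketch of checks for (near) perfect prediction / separation.
# Assumes a pandas DataFrame `df`; smoker, region and age are illustrative.
import pandas as pd
import statsmodels.formula.api as smf

obs = df.dropna(subset=["smoker"])   # rows where the binary target is observed

# 1. Empty cells in a cross-tabulation signal a risk of perfect prediction.
print(pd.crosstab(obs["smoker"], obs["region"]))

# 2. Fit the logistic imputation model on the observed rows and inspect it:
#    coefficients of +/- 10 or more on the logit scale, or very large
#    standard errors, are warning signs.
fit = smf.logit("smoker ~ age + C(region)", data=obs).fit(disp=0)
print(fit.params)
print(fit.bse)
```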

5 Out of sample imputation

This final issue occurs when the majority of the missing values occur in a particular area of the
data set (for example at later waves in a longitudinal study, or among particular socio-economic
groups).
In this case, the imputation model, which is by necessity fitted to the observed data, may not be an appropriate choice for imputation. The problem is analogous to making predictions 'out of the original sample' in regression: for instance, estimating the relationship between age and lung function in adults, and then using this to impute missing lung function in children.
Again, most software packages used for imputation will not flag this up as an issue, so the analyst
needs to be aware of it. The issue is clearly more acute the higher the proportion of missing data,
and the greater the focus in the model of interest on contrasts between well observed and partially
observed parts of the dataset.
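A simple way to make the issue visible, sketched below with illustrative variable names echoing the lung function example, is to compare the distribution of fully observed covariates between the rows where the target variable is observed and the rows where it will be imputed:

```python
# Minimal sketch of a check for out-of-sample imputation. Assumes a pandas
# DataFrame `df` in which lung_function is partially observed and age is
# fully observed; the names are illustrative.
missing = df["lung_function"].isna()

print(df.loc[~missing, "age"].describe())  # covariate range where target observed
print(df.loc[missing, "age"].describe())   # covariate range over which we impute

# Imputing far outside the observed range (e.g. children, when the model was
# fitted to adults) is extrapolation and should be flagged and justified.
```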

6 Implications

It is not always realised that MI involves statistical modelling that is an order of magnitude more complicated than the original model of interest, because it always involves (either explicitly or implicitly) a joint model for the data.
Fitting such models is not straightforward: there are a number of pitfalls, of which the above are
the most common in our experience.
The extent to which these pitfalls will mislead depends on both the model of interest and the degree of missing information. In practice, therefore, it is a good idea to

1. compare the results of the MI and CC analyses, to check that differences between them are plausible given the scientific context and the likely missing data mechanisms, and
2. consider checking the points discussed above, possibly redoing the analysis in a different
package to give greater confidence in the results.

One important issue we have not touched on here is that the structure of the model of interest (response, interactions, non-linearities, hierarchical aspects) needs to be included in the imputation model for valid results (King et al., 2001).
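With a chained-equations approach, one way to do this, sketched below with illustrative variable names and formulas, is to write the relevant interactions and non-linear terms into the conditional imputation model for each partially observed variable:

```python
# Minimal sketch: carry the structure of the model of interest (here a
# squared term and an interaction) into the conditional imputation models.
# Assumes a pandas DataFrame `df` with variables y, x1, x2 (illustrative).
from statsmodels.imputation.mice import MICEData

imp = MICEData(df)
imp.set_imputer("x1", formula="y + x2 + I(x2**2)")   # non-linearity in x2
imp.set_imputer("x2", formula="y + x1 + x1:y")       # interaction with y
imp.update_all(n_iter=20)
```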
Although these caveats may appear discouraging, we believe multiple imputation has a key role to
play in the analysis of large, complex datasets with missing data. However, like all powerful tools,
users need to be aware of the risks and how to minimise them, if they are to reap the full benefits.

References

Carpenter, J. R. and Kenward, M. G. (2008) Missing data in clinical trials — a practical guide.
Birmingham: National Health Service Co-ordinating Centre for Research Methodology. Free from
http://www.pcpoh.bham.ac.uk/publichealth/methodology/projects/RM03 JH17 MK.shtml.

Gilks, W. R., Richardson, S. and Spiegelhalter, D. J. (1996) Markov chain Monte-Carlo in practice.
London: Chapman and Hall.

King, G., Honaker, J., Joseph, A. and Scheve, K. (2001) Analyzing incomplete political science
data: an alternative algorithm for multiple imputation. The American Political Science Review,
95, 49–69.
