40-MODULE7_ Missing Data and Outlier Prediction-03-04-2023
These terms are widely used, but are a bit misleading. When we say data
are missing completely at random (MCAR), we mean that the missingness has nothing to do
with the person being studied. For example, a questionnaire might be lost in the
post, or a blood sample might be damaged in the lab. In CADET, sex might be
MCAR. Of course, this is not truly random, but it means that whether something is
missing is not related to the subject of the missing data.
When we say data are missing at random (MAR), we mean that the missingness has to do
with the person but can be predicted from other information about the person. It is
not specifically related to the missing information. For example, if a child does not
attend an educational assessment because the child is (genuinely) ill, this might be
predictable from other data we have about the child’s health, but it would not be
related to what we would have measured had the child not been ill. Are the
depression data MAR? We cannot tell this from the data. We know that the PHQ9
scores are not MCAR, because the proportions missing in the two treatment groups
are different. We know that at least one observation is not MAR, because,
tragically, the participant had committed suicide. This is always a danger in
depression research.
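
The comparison of proportions missing in the two treatment groups can be sketched in a few lines of Python. This is only an illustration: the group labels, the counts, and the function names `missing_proportions` and `two_proportion_z` are made up for the example and are not the CADET data. A clear difference in the proportions missing is evidence against MCAR, but it cannot tell us whether the data are MAR or MNAR.

```python
import math

def missing_proportions(groups, values):
    """Return {group: (n_missing, n_total, proportion_missing)}."""
    counts = {}
    for g, v in zip(groups, values):
        n_miss, n_tot = counts.get(g, (0, 0))
        counts[g] = (n_miss + (v is None), n_tot + 1)
    return {g: (m, n, m / n) for g, (m, n) in counts.items()}

def two_proportion_z(m1, n1, m2, n2):
    """Large-sample z statistic for the difference between two proportions."""
    pooled = (m1 + m2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0  # nothing missing, or everything missing, in both groups
    return (m1 / n1 - m2 / n2) / se

# Hypothetical follow-up scores; None marks a missing observation.
groups = ["treatment"] * 10 + ["control"] * 10
values = [5, 7, None, 9, 4, 6, None, 8, 3, 5,
          None, 12, None, 10, None, 14, None, 11, None, 9]

props = missing_proportions(groups, values)
m_t, n_t, _ = props["treatment"]
m_c, n_c, _ = props["control"]
z = two_proportion_z(m_t, n_t, m_c, n_c)

for g, (m, n, p) in props.items():
    print(f"{g}: {m}/{n} missing ({p:.0%})")
print(f"z = {z:.2f}")  # about -1.41 for these made-up counts
```

With samples as small as these, an exact test would be preferable to the large-sample z statistic; the sketch only shows the shape of the comparison.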
When data are missing not at random (MNAR), the missingness is specifically related to
what is missing, e.g. a person does not attend a drug test because the person took
drugs the night before. For the participant who died by suicide, the PHQ9 at 4 months
is MNAR. The problem is to decide which of these situations we have, and in the same
dataset we may have some data missing for each reason. We had some missing data in the
foot ulcer data in Table 10.2. Some of the capillary densities were missing because
the skin biopsy could not be used to count the capillaries. We could regard these as
MCAR. Some were missing because the foot had been amputated. As a frequent
reason for foot amputation is gangrene from severe foot ulcers, I think we would
have to classify these as MNAR.
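
A small simulation may help to show why the distinction matters. The sketch below uses entirely invented numbers, not the CADET or foot ulcer data: a binary characteristic `x` that is always observed, and an outcome `y` whose mean depends on `x`. Values of `y` are then deleted under each of the three mechanisms. The function names and the deletion probabilities are assumptions made for illustration.

```python
import random
from statistics import mean

random.seed(1)
N = 20000

# x is always observed; y is the outcome that may go missing.
x = [random.random() < 0.5 for _ in range(N)]
y = [10 + 4 * xi + random.gauss(0, 2) for xi in x]

def observe(values, p_missing):
    """Replace each value by None with its own probability of being missing."""
    return [None if random.random() < p else v
            for v, p in zip(values, p_missing)]

# MCAR: the same chance of being missing for everyone.
mcar = observe(y, [0.3] * N)
# MAR: missingness depends only on the observed characteristic x.
mar = observe(y, [0.6 if xi else 0.1 for xi in x])
# MNAR: missingness depends on the value that would have been recorded.
mnar = observe(y, [0.8 if yi > 13 else 0.1 for yi in y])

def observed_mean(obs):
    """Mean of the values that were actually recorded."""
    return mean(v for v in obs if v is not None)

def stratified_mean(obs, strata):
    """Mean within each level of x, weighted by that level's share of the sample."""
    total = 0.0
    for level in (False, True):
        in_level = [v for v, s in zip(obs, strata) if s == level]
        seen = [v for v in in_level if v is not None]
        total += len(in_level) / len(obs) * mean(seen)
    return total

true_mean = mean(y)
mcar_mean = observed_mean(mcar)
mar_mean = observed_mean(mar)
mar_adjusted = stratified_mean(mar, x)
mnar_mean = observed_mean(mnar)
mnar_adjusted = stratified_mean(mnar, x)

print(f"complete data        {true_mean:.2f}")
print(f"MCAR, observed only  {mcar_mean:.2f}")
print(f"MAR, observed only   {mar_mean:.2f}")
print(f"MAR, adjusted for x  {mar_adjusted:.2f}")
print(f"MNAR, observed only  {mnar_mean:.2f}")
print(f"MNAR, adjusted for x {mnar_adjusted:.2f}")
```

In this setup the mean of the observed values is close to the complete-data mean under MCAR. Under MAR it is biased, but using the observed characteristic recovers it. Under MNAR it remains biased even after adjustment, because the reason for missingness is the unobserved value itself.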