
Missing Completely at Random (MCAR)

Suppose the probability of an observation being missing does not depend on observed or unobserved measurements. In mathematical terms, we write this as


$ \Pr(r \mid y_o, y_m) = \Pr(r)$


Then we say that the observation is Missing Completely At Random, which is often abbreviated to MCAR.

Note that in a sample survey setting MCAR is sometimes called uniform non-response.

If data are MCAR, then consistent results with missing data can be obtained by performing the analyses we would have used had there been no missing data, although there will generally be some loss of information. In practice this means that, under MCAR, the analysis of only those units with complete data gives valid inferences.
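The following small simulation is a sketch of this point, not part of the original material. It assumes a normally distributed outcome and a 30% chance of missingness drawn independently of the data; the function name `simulate_mcar` and all parameter values are illustrative choices. It compares the mean of the full data with the mean of the complete cases only.

```python
import random
import statistics


def simulate_mcar(n=100_000, p_missing=0.3, seed=0):
    """Draw an outcome, delete values completely at random, compare means."""
    rng = random.Random(seed)
    # Hypothetical outcome: normal with mean 10 and standard deviation 2.
    y = [rng.gauss(10.0, 2.0) for _ in range(n)]
    # MCAR: the response indicator is drawn without looking at y at all.
    observed = [rng.random() >= p_missing for _ in range(n)]
    complete_cases = [value for value, r in zip(y, observed) if r]
    return statistics.fmean(y), statistics.fmean(complete_cases), len(complete_cases)


full_mean, cc_mean, n_cc = simulate_mcar()
print(f"mean of full data:      {full_mean:.3f}")
print(f"mean of complete cases: {cc_mean:.3f} (from {n_cc} units)")
```

In a run like this the two means should agree closely, while the complete-case estimate rests on fewer units. That smaller sample is the loss of information mentioned above.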

An example of an MCAR mechanism would be that a laboratory sample is dropped, so the resulting observation is missing.

However, many mechanisms that initially seem to be MCAR may turn out not to be. For example, a patient in a clinical trial may be lost to follow-up after 'falling' under a bus; however, if it is a psychiatric trial, this may be an indication of poor response to treatment. Likewise, if a response to a postal questionnaire is missing because the questionnaire was lost or stolen in the post, this may not be random but rather reflect the area in which the sorting office is located.
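To see why this matters, here is a hedged sketch of what can happen when missingness is not MCAR. It assumes, purely for illustration, that units with a low outcome (standing in for poor response to treatment) go missing with probability 0.6, while the rest go missing with probability 0.1. The function name `simulate_not_mcar` and these numbers are invented for the example.

```python
import random
import statistics


def simulate_not_mcar(n=100_000, seed=0):
    """Delete values with a probability that depends on the value itself."""
    rng = random.Random(seed)
    # Hypothetical outcome: normal with mean 10 and standard deviation 2.
    y = [rng.gauss(10.0, 2.0) for _ in range(n)]
    complete_cases = []
    for value in y:
        # Assumed mechanism: low outcomes are much more likely to be lost.
        p_missing = 0.6 if value < 10.0 else 0.1
        if rng.random() >= p_missing:
            complete_cases.append(value)
    return statistics.fmean(y), statistics.fmean(complete_cases)


true_mean, biased_cc_mean = simulate_not_mcar()
print(f"mean of full data:      {true_mean:.3f}")
print(f"mean of complete cases: {biased_cc_mean:.3f}")
```

Under this assumed mechanism the complete-case mean sits noticeably above the full-data mean, so an analysis of completers alone would overstate the typical outcome.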

As we have already said, under MCAR analyses of completers only (shorthand for including in the analysis only units with fully observed data) give valid inferences.

So do analyses based on moment-based estimators (for example, generalised estimating equations), and other estimators derived from consistent estimating equations.

By consistent estimating equations we mean functions of the data and unknown parameters whose expectation, taken over the complete data at the population parameter values, is zero. Under MCAR, they still have expectation zero, and so still lead to valid inferences.

Saying the same thing mathematically, an estimating equation can be written as $ U(y, \theta)$, and at the estimate $ \hat{\theta}$ we have $ U(y, \hat{\theta}) = 0$. The estimating equation is consistent because $ E[U(Y, \theta)] = 0$ (where $ \theta$ is the population parameter value). It remains consistent if the data are missing completely at random (MCAR) because, even then, $ E[U(Y_o, \theta)] = 0$.

A simple example of a consistent estimating equation is the one that gives the sample mean, $ U(y, \theta) = \bar{y} - \theta$.
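As a rough numerical check of this example, the sketch below evaluates the sample-mean estimating equation on the observed data only, at the true parameter value, over many simulated data sets with MCAR missingness. The names `U` and `average_U_under_mcar`, the normal outcome model, and the 40% missingness rate are all assumptions made for illustration.

```python
import random
import statistics


def U(y, theta):
    """Estimating equation for the mean: sample mean of y minus theta."""
    return statistics.fmean(y) - theta


def average_U_under_mcar(theta=5.0, n=200, p_missing=0.4, reps=2000, seed=1):
    """Average U(Y_o, theta) over repeated samples with MCAR deletion."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        y = [rng.gauss(theta, 1.0) for _ in range(n)]
        # MCAR: each value is kept with a probability unrelated to its size.
        y_obs = [v for v in y if rng.random() >= p_missing]
        values.append(U(y_obs, theta))
    return statistics.fmean(values)


avg_U = average_U_under_mcar()
print(f"average of U(Y_o, theta) at the true theta: {avg_U:.4f}")

# Solving U(y_obs, theta_hat) = 0 gives the mean of the observed values.
y_obs_example = [4.2, 5.1, 5.9, 4.8]
theta_hat = statistics.fmean(y_obs_example)
print(f"theta_hat = {theta_hat:.2f}, U at theta_hat = {U(y_obs_example, theta_hat)}")
```

The average of the estimating equation over the replications should be close to zero, which is the sense in which it keeps expectation zero when only the observed values are used.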