Complete case analysis

The data on the left below has one missing observation on variable 2, unit 10.

  • Complete case analysis deletes all units with incomplete data (in the variables involved) from the analysis (here unit 10).


  • It is inefficient: the values that were observed on the deleted units are discarded along with the missing ones.


  • It is problematic in regression when covariate values are missing and models with several sets of explanatory variables need to be compared. Either we keep changing the size of the data set, as we add/remove explanatory variables with missing observations, or we use the (potentially very small, and unrepresentative) subset of the data with no missing values.


  • When the missing observations are not a completely random selection of the data, a complete case analysis may give biased estimates and invalid inferences.


  • In the context of fitting a regression model, if the probability of being a complete case does not depend on the outcome variable, conditional on the covariates (whether or not these have missing values themselves), complete case analysis is valid.
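
The last two points can be checked with a small simulation. The sketch below is illustrative only and is not taken from these notes: the model y = 1 + 2x + noise, the sample size, the cut-offs 0.5 and 2.0, the 0.8 deletion probability, and the helper names complete_cases and ols_slope are all assumptions chosen for the example. In scenario A the covariate x goes missing with a probability that depends on x alone; in scenario B it goes missing with a probability that depends on the outcome y.

```python
import random

def complete_cases(rows):
    """Keep only the units with no missing (None) value in any variable."""
    return [row for row in rows if all(v is not None for v in row)]

def ols_slope(pairs):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx

rng = random.Random(0)  # fixed seed so the run is reproducible

# Assumed data-generating model: y = 1 + 2*x + standard normal noise.
full = []
for _ in range(20000):
    x = rng.gauss(0, 1)
    y = 1.0 + 2.0 * x + rng.gauss(0, 1)
    full.append((x, y))

# Scenario A: x is set to missing with probability 0.8 when x > 0.5,
# so being a complete case depends on the covariate only.
data_a = [(None if (x > 0.5 and rng.random() < 0.8) else x, y) for x, y in full]

# Scenario B: x is set to missing with probability 0.8 when y > 2.0,
# so being a complete case depends on the outcome.
data_b = [(None if (y > 2.0 and rng.random() < 0.8) else x, y) for x, y in full]

slope_full = ols_slope(full)
slope_a = ols_slope(complete_cases(data_a))
slope_b = ols_slope(complete_cases(data_b))

print("full data slope:       ", round(slope_full, 3))
print("complete cases, A (x): ", round(slope_a, 3))
print("complete cases, B (y): ", round(slope_b, 3))
```

Under these assumptions the scenario A slope should stay close to the true value 2, although the estimate uses fewer units, while the scenario B slope should be pulled noticeably below 2 because units with large outcomes are under-represented among the complete cases.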