Missing covariate data commonly occur in epidemiological and clinical research, and are often dealt with using multiple imputation (MI). Imputing partially observed covariates is complicated if the model of interest is non-linear (e.g. a Cox proportional hazards model), or contains non-linear (e.g. squared) or interaction terms. Standard software implementations of MI may impute covariates from models that are incompatible (uncongenial) with such models of interest, which may result in biased estimates. We have recently proposed a modified version of the popular fully conditional specification (FCS), or chained equations, approach to multiple imputation, which ensures that each partially observed covariate is imputed from a model that is compatible with the specified model for the outcome.
A paper describing the method has been published in Statistical Methods in Medical Research:
Jonathan W. Bartlett, Shaun R. Seaman, Ian R. White, James R. Carpenter. Multiple imputation of covariates by fully conditional specification: accommodating the substantive model, Statistical Methods in Medical Research, 2015; 24:462-487
smcfcs in R
An R package implementing the approach is now available on CRAN, and can be installed from within R using:
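The standard way to install a CRAN package applies here. A minimal sketch, assuming a working internet connection and a configured CRAN mirror:

```r
# Install the released version of smcfcs from CRAN
install.packages("smcfcs")

# Load the package to check that the installation succeeded
library(smcfcs)
```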
The latest development version can be installed into R from GitHub using:
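A sketch of the usual GitHub installation route via the devtools package. The repository path "jwb133/smcfcs" is an assumption, inferred from the account name in the Stata URL further down this page, so check it before use:

```r
# devtools provides install_github(); install it first if needed
install.packages("devtools")

# ASSUMPTION: the development repository is jwb133/smcfcs
devtools::install_github("jwb133/smcfcs")
```

Once the package is installed, a call might look like the following. This is a minimal sketch based on the ex_linquad example data set distributed with the package (columns y, z, x, xsq and v, with x partially observed). It assumes the mitools package is also installed for pooling the results. The argument values are illustrative rather than recommended defaults; see ?smcfcs for the full interface.

```r
library(smcfcs)
library(mitools)  # ASSUMPTION: installed separately, used for Rubin's rules

set.seed(123)  # imputation is stochastic; fix the seed for reproducibility

# Substantive model: linear regression of y on z, x and x squared.
# The method vector has one entry per column of the data frame:
# "" = not imputed, "norm" = normal linear regression imputation,
# "x^2" = passively derived as the square of x.
imps <- smcfcs(ex_linquad,
               smtype = "lm",
               smformula = "y ~ z + x + xsq",
               method = c("", "", "norm", "x^2", ""))

# Fit the substantive model to each imputed data set and pool the estimates
impobj <- imputationList(imps$impDatasets)
models <- with(impobj, lm(y ~ z + x + xsq))
summary(MIcombine(models))
```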
smcfcs in Stata
A Stata program implementing the approach for linear, logistic and Cox proportional hazards outcome models is available for free download. Imputation is now supported for continuous (under the normal linear regression model), binary (under the logistic model), count (using either Poisson or negative binomial regression models), and categorical (using ordered logistic or multinomial logistic regression) covariates.
To install, start Stata and type the following in the Command window:
ssc install smcfcs
The latest development version can be installed into Stata from GitHub using:
net install smcfcs, from(https://raw.githubusercontent.com/jwb133/Stata-smcfcs/master/) replace
More details about the Stata package can be found in the accompanying Stata Journal paper:
Bartlett JW, Morris TP. 2015. Multiple imputation of covariates by substantive-model compatible fully conditional specification. The Stata Journal; 15(2): 437-456