
Imputing interaction terms in multi-level models

Question: Unfortunately, all variables in my model of interest (MOI), from both levels of analysis, include missing cases. Furthermore, I wish to examine interaction terms of some of these variables: one is an interaction of two level 1 categorical variables, another is an interaction of a level 1 continuous variable with a categorical one, and a third is a cross-level interaction of a continuous level 1 variable with a continuous level 2 variable. I'm trying to use the just another variable (JAV) approach to impute the missing cases in all these variables. Essentially, the model is:

realcomImpute X1 m.X2 o.X3 X1*X2 X2*X3 W1*X1 W1 W2 X4 W5 using MIInput.dat, replace numresponses(8) level2id(school) cons(cons)

My problem is quite technical: how do I define the interaction terms as responses when using the realcomImpute command? Obviously, the '*' symbol is wrong. I tried using '#' but it doesn’t work. Is it right to create new variables, multiplying all relevant variables and creating the dummies (in the case of the categorical interaction), although they are response variables here? Will Realcom know how to handle this case properly (it will have the same missing cases in all the dummies)? If not, what is the correct syntax?


Answer: Leaving aside for a moment whether JAV with multi-level data is a reasonable approach, you need to manually define the new variables in Stata for each of the interaction terms which you wish to impute, and then pass these as additional response variables to the realcomImpute command.
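
As a rough sketch of what this could look like, the Stata code below builds the three kinds of interaction variables by hand and then lists them as extra responses. It assumes, purely for illustration, that X2 and X3 are each coded 1, 2, 3 with 1 as the reference level; the new variable names (W1X1, X1X2_2, X2X3_22 and so on) are hypothetical. The realcomImpute call simply mirrors the option syntax used in the question with the response count updated, and has not been checked against the command's help file.

```stata
* Sketch only: assumes X2 and X3 are each coded 1, 2, 3 (1 = reference).
* All new variable names are hypothetical.

* Cross-level continuous-by-continuous interaction.
* The product is missing whenever W1 or X1 is missing.
generate W1X1 = W1 * X1

* Continuous-by-categorical interaction: one product per non-reference
* level of X2. The if qualifier keeps the product missing when X2 is
* missing, because (X2 == 2) evaluates to 0, not missing, in that case.
generate X1X2_2 = X1 * (X2 == 2) if !missing(X1, X2)
generate X1X2_3 = X1 * (X2 == 3) if !missing(X1, X2)

* Categorical-by-categorical interaction: one dummy per pair of
* non-reference levels, missing when either X2 or X3 is missing.
forvalues j = 2/3 {
    forvalues k = 2/3 {
        generate X2X3_`j'`k' = (X2 == `j') * (X3 == `k') if !missing(X2, X3)
    }
}

* Pass the new variables as additional responses, following the option
* syntax used in the question. The 12 responses are listed first;
* X4 and W5 remain fully observed covariates.
realcomImpute X1 m.X2 o.X3 X1X2_2 X1X2_3 X2X3_22 X2X3_23 X2X3_32 X2X3_33 ///
    W1X1 W1 W2 X4 W5 using MIInput.dat, replace ///
    numresponses(12) level2id(school) cons(cons)
```

Under JAV, the generated products and dummies are treated as ordinary continuous responses, so their imputed values need not equal the product of the imputed components.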

JAV is justified for single-level linear regression models when data are MCAR because the model parameters are functions only of the first and second moments of the random variables. Normality assumptions for random effects and residuals are usually made when estimating linear mixed models, yet, at least in terms of parameter estimation, I believe it can be shown that the MLEs (assuming normality) are consistent provided certain first and second moment conditions are satisfied. JAV may therefore work, under MCAR, for certain (multi-level) linear mixed models, but this is a conjecture on my part.
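
The moment argument can be made concrete for the single-level case. As a sketch, write Y for the outcome and X for the covariate vector, which under JAV contains product terms such as X1X2 as ordinary columns (this notation is introduced here for illustration only). The intercept and slopes of the linear regression of Y on X are then:

```latex
\beta = \operatorname{Var}(X)^{-1}\,\operatorname{Cov}(X, Y),
\qquad
\alpha = \operatorname{E}(Y) - \beta^{\top}\operatorname{E}(X)
```

Since α and β depend on the joint distribution of (Y, X) only through its means and covariances, an imputation model that recovers those moments under MCAR can give consistent estimates, even though a product term cannot truly be jointly normal with its components.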
An alternative approach you could consider is to impute separately within each school using the smcfcs command. This would allow you to impute the interaction terms from models which are compatible/congenial with your model for the outcome. Having generated, say, 10 imputations separately for each school, you could then combine them (i.e. join the first imputed datasets across schools, then the second imputed datasets across schools, and so on), then fit your outcome/substantive model as usual, using mi estimate. Unfortunately, the smcfcs command does not yet support m. and o. for unordered and ordered categorical imputation, but this will be available soon.