18.6 Multiple Imputation (MI)
18.6.1 Goals
- Accurately reflect available information
- Avoid bias in estimates of quantities of interest
- Estimation may involve an explicit or implicit model
- Accurately reflect uncertainty due to missingness
18.6.2 Technique
- For each missing value, impute \(m\) values (usually \(m = 5\))
- The imputation method must include a random component
- Create \(m\) complete data sets
- Perform desired analysis on each of the \(m\) complete data sets
- Combine the \(m\) estimates in a manner that accounts for both the between- and within-imputation variance
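The five steps above can be sketched in Python. This is a minimal illustration under assumptions of our own, not a reference implementation: a single variable `y` is missing while `x` is fully observed, imputations come from a stochastic linear regression of `y` on `x` fitted to a bootstrap resample of the complete cases (one common way to make imputations approximately proper; other methods exist), and the quantity of interest is the mean of `y`. The function names (`fit_ols`, `impute_once`, `analyze`, `multiply_impute`) and the toy data are hypothetical.

```python
import random
import statistics

def fit_ols(pairs):
    """Least-squares fit of y = a + b*x; returns (a, b, residual SD)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in pairs)
    return a, b, (rss / (len(pairs) - 2)) ** 0.5

def impute_once(data, rng):
    """Return one completed copy of data: (x, y) pairs, y=None where missing."""
    complete = [(x, y) for x, y in data if y is not None]
    # Bootstrap the complete cases so each imputation uses different
    # regression coefficients (reflects uncertainty in the imputation model).
    boot = [rng.choice(complete) for _ in complete]
    a, b, s = fit_ols(boot)
    # Random component: predicted value plus a normal residual draw.
    return [(x, y if y is not None else a + b * x + rng.gauss(0.0, s))
            for x, y in data]

def analyze(completed):
    """Complete-data analysis: mean of y (Q_i) and its squared SE (U_i)."""
    ys = [y for _, y in completed]
    return statistics.fmean(ys), statistics.variance(ys) / len(ys)

def multiply_impute(data, m=5, seed=0):
    """Create m completed data sets and analyze each; returns [(Q_i, U_i), ...]."""
    rng = random.Random(seed)
    return [analyze(impute_once(data, rng)) for _ in range(m)]

# Hypothetical toy data: y is about 2 + 0.5*x, missing whenever x is a multiple of 4.
gen = random.Random(1)
data = [(float(x), 2 + 0.5 * x + gen.gauss(0, 1) if x % 4 else None)
        for x in range(40)]
results = multiply_impute(data, m=5)
for q, u in results:
    print(f"Q_i = {q:.3f}, U_i = {u:.4f}")
```

The \(Q_i\) differ across the five completed data sets because of the random draws; those \((Q_i, U_i)\) pairs are what the combining rules in 18.6.4 take as input.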

18.6.3 MI as a paradigm
- Logic: “average over” uncertainty; don’t assume that the most likely scenario (single imputation) covers all plausible scenarios
- Principle: nominal 95% intervals should cover the target of estimation 95% of the time
- Simulation studies show that, when the MAR assumption holds:
  - Proper imputations yield close to nominal coverage (Rubin 87)
  - The improvement over single imputation is meaningful
  - The number of imputations can be modest: even 2 are adequate for many purposes, so 5 is plenty
Rubin 87: Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: Wiley.
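Why a handful of imputations is often enough can be seen from Rubin's (1987) large-sample relative efficiency of an estimate based on \(m\) imputations compared with infinitely many, \((1 + \gamma/m)^{-1}\) on the variance scale, where \(\gamma\) is the fraction of missing information. The short sketch below simply tabulates that expression; the function name `relative_efficiency` and the chosen \(\gamma\) values are illustrative assumptions.

```python
def relative_efficiency(gamma, m):
    """Efficiency (variance scale) of m imputations relative to infinitely many."""
    return 1.0 / (1.0 + gamma / m)

for gamma in (0.1, 0.3, 0.5):
    row = ", ".join(f"m={m}: {relative_efficiency(gamma, m):.3f}" for m in (2, 5, 20))
    print(f"gamma={gamma}: {row}")
# gamma=0.3 row prints: m=2: 0.870, m=5: 0.943, m=20: 0.985
```

For example, with 30% missing information, 5 imputations already give about 94% of the efficiency of infinitely many, and even 2 give about 87%.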
18.6.4 Inference on MI
Consider \(m\) imputed data sets. For a quantity of interest \(Q\) whose estimate has squared standard error \(U\), calculate \(Q_{1}, Q_{2}, \ldots, Q_{m}\) and \(U_{1}, U_{2}, \ldots, U_{m}\), one pair per imputed data set (e.g., carry out \(m\) regression analyses and obtain the point estimate and SE from each).
Then calculate the average estimate \(\bar{Q}\), the average within-imputation variance \(\bar{U}\), and the between-imputation variance \(B\) (the sample variance of the \(Q_{i}\)):
\[ \begin{aligned} \bar{Q} & = \sum^{m}_{i=1}Q_{i}/m \\ \bar{U} & = \sum^{m}_{i=1}U_{i}/m \\ B & = \frac{1}{m-1}\sum^{m}_{i=1}(Q_{i}-\bar{Q})^2 \end{aligned} \]
Then \(T = \bar{U} + \frac{m+1}{m}B\) is the estimated total variance of \(\bar{Q}\).
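The pooling formulas translate directly into code. The sketch below is a minimal version using only the Python standard library; the function name `pool_rubin` and the five \((Q_i, U_i)\) pairs are hypothetical numbers chosen to make the arithmetic easy to follow. Note that `statistics.variance` uses the divisor \(m-1\), matching the definition of \(B\).

```python
import statistics

def pool_rubin(estimates, variances):
    """Combine m completed-data results into Q_bar, U_bar, B and T."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates and one variance per estimate")
    q_bar = statistics.fmean(estimates)   # average estimate
    u_bar = statistics.fmean(variances)   # average within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (divisor m - 1)
    t = u_bar + (m + 1) / m * b           # total variance of q_bar
    return q_bar, u_bar, b, t

# Hypothetical results from m = 5 analyses: estimates Q_i and squared SEs U_i.
Q = [1.0, 1.2, 0.9, 1.1, 1.3]
U = [0.04, 0.05, 0.04, 0.06, 0.05]
q_bar, u_bar, b, t = pool_rubin(Q, U)
print(f"Q_bar={q_bar:.4f}, U_bar={u_bar:.4f}, B={b:.4f}, T={t:.4f}")
# prints: Q_bar=1.1000, U_bar=0.0480, B=0.0250, T=0.0780
```

Here the between-imputation term \(\frac{6}{5} \times 0.025 = 0.030\) makes up a sizeable part of \(T\), which a single imputation would have ignored.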
Significance tests and interval estimates can be based on
\[\frac{\bar{Q}-Q}{\sqrt{T}} \sim t_{df}, \mbox{ where } df = (m-1)\left(1+\frac{m}{m+1}\frac{\bar{U}}{B}\right)^2\]
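A minimal sketch of the resulting test statistic and degrees of freedom, again on hypothetical numbers. The function name `rubin_t` and the null value `q0 = 0` are assumptions for illustration; the df follow Rubin's formula \((m-1)\left(1 + \frac{m}{m+1}\frac{\bar{U}}{B}\right)^2\), treated as infinite when all imputations give the same estimate (\(B = 0\)).

```python
import math
import statistics

def rubin_t(estimates, variances, q0=0.0):
    """Return (t statistic for H0: Q = q0, Rubin's df) from m completed-data results."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    u_bar = statistics.fmean(variances)
    b = statistics.variance(estimates)
    t_var = u_bar + (1 + 1 / m) * b
    # df = (m - 1) * (1 + m/(m+1) * U_bar/B)^2; infinite if the imputations agree exactly
    df = math.inf if b == 0 else (m - 1) * (1 + m / (m + 1) * u_bar / b) ** 2
    return (q_bar - q0) / math.sqrt(t_var), df

Q = [1.0, 1.2, 0.9, 1.1, 1.3]
U = [0.04, 0.05, 0.04, 0.06, 0.05]
stat, df = rubin_t(Q, U)
print(f"t = {stat:.2f}, df = {df:.2f}")
# prints: t = 3.94, df = 27.04
```

The statistic is then compared with a \(t_{df}\) reference distribution; if SciPy is available, `scipy.stats.t.ppf(0.975, df)` gives the multiplier for a 95% interval \(\bar{Q} \pm t \sqrt{T}\).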
- These df are derived in a similar way to those for comparing two normal means with unequal variances, i.e., using the Satterthwaite approximation.
- The share of the total variance \(T\) that is due to between-imputation variance, \(\frac{(1+1/m)B}{T}\), approximates the fraction of missing information (FMI).
- The FMI has been proposed as a way to monitor ongoing data collection and to estimate the potential bias resulting from survey nonresponse (Wagner, 2018).
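The FMI can be computed from the same pooled quantities. The sketch below returns two versions: the simple ratio \(\lambda = (1+1/m)B/T\) described above, and what we understand to be Rubin's (1987) df-adjusted estimate \(\gamma = \frac{r + 2/(df+3)}{r+1}\), where \(r = (1+1/m)B/\bar{U}\) is the relative increase in variance due to nonresponse. The function name `fmi` and the input numbers are hypothetical.

```python
import statistics

def fmi(estimates, variances):
    """Return (lambda, gamma): simple and df-adjusted fraction of missing information."""
    m = len(estimates)
    u_bar = statistics.fmean(variances)
    b = statistics.variance(estimates)
    if b == 0:
        return 0.0, 0.0                 # imputations agree: no missing information
    between = (1 + 1 / m) * b           # between-imputation part of T
    t_var = u_bar + between
    lam = between / t_var               # share of T due to missing data
    r = between / u_bar                 # relative increase in variance due to nonresponse
    df = (m - 1) * (1 + 1 / r) ** 2     # Rubin's (1987) degrees of freedom
    gamma = (r + 2 / (df + 3)) / (r + 1)
    return lam, gamma

Q = [1.0, 1.2, 0.9, 1.1, 1.3]
U = [0.04, 0.05, 0.04, 0.06, 0.05]
lam, gamma = fmi(Q, U)
print(f"lambda = {lam:.3f}, gamma = {gamma:.3f}")
# prints: lambda = 0.385, gamma = 0.426
```

The two versions converge as the df grow large; with only five imputations the adjustment here raises the estimate from about 0.38 to about 0.43.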