## 18.6 Multiple Imputation (MI)

### 18.6.1 Goals

• Accurately reflect available information
• Avoid bias in estimates of quantities of interest
• Estimation could involve an explicit or implicit model
• Accurately reflect uncertainty due to missingness

### 18.6.2 Technique

1. For each missing value, impute $$m$$ estimates (usually $$m$$ = 5)
   • Imputation method must include a random component
2. Create $$m$$ complete data sets
3. Perform desired analysis on each of the $$m$$ complete data sets
4. Combine final estimates in a manner that accounts for both the between-imputation and the within-imputation variance.

Diagram of Multiple Imputation process. Credit: https://stefvanbuuren.name/fimd/sec-nutshell.html
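Steps 1 and 2 can be illustrated with a minimal Python sketch. It assumes a single incomplete variable `y` and one fully observed predictor `x`, and uses stochastic regression imputation (a fitted line plus random noise) as the imputation method. The function names `impute_once` and `multiply_impute` are hypothetical and chosen only for illustration. This is a simplification: a proper imputation in Rubin's sense would also draw the regression parameters from their posterior distribution rather than fixing them at the least-squares estimates.

```python
import numpy as np


def impute_once(x, y, rng):
    """Fill missing y values with a regression prediction plus random noise.

    Assumption: x is fully observed and missing y values are coded as NaN.
    """
    obs = ~np.isnan(y)
    # Fit y ~ intercept + slope * x on the observed cases only.
    X_obs = np.column_stack([np.ones(obs.sum()), x[obs]])
    beta, *_ = np.linalg.lstsq(X_obs, y[obs], rcond=None)
    resid = y[obs] - X_obs @ beta
    sigma = np.sqrt(resid @ resid / (obs.sum() - 2))

    # The random component: add a fresh normal draw to every prediction.
    y_completed = y.copy()
    n_missing = int((~obs).sum())
    noise = rng.normal(0.0, sigma, n_missing)
    y_completed[~obs] = beta[0] + beta[1] * x[~obs] + noise
    return y_completed


def multiply_impute(x, y, m=5, seed=0):
    """Return a list of m completed copies of y (step 2)."""
    rng = np.random.default_rng(seed)
    return [impute_once(x, y, rng) for _ in range(m)]
```

Each of the `m` returned arrays keeps the observed values unchanged and differs only in the imputed entries, which is what lets the later steps measure how much the results vary across imputations.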

### 18.6.3 MI as a paradigm

• Logic: “Average over” uncertainty, don’t assume most likely scenario (single imputation) covers all plausible scenarios
• Principle: Want nominal 95% intervals to cover targets of estimation 95% of the time
• Simulation studies show that, when the MAR assumption holds:
  • Proper imputations will yield close to nominal coverage (Rubin 87)
  • Improvement over single imputation is meaningful
  • Number of imputations can be modest: even 2 is adequate for many purposes, so 5 is plenty

Rubin 87: Rubin, D. B., Multiple Imputation for Nonresponse in Surveys, Wiley, 1987.

### 18.6.4 Inference on MI

Consider $$m$$ imputed data sets. For some quantity of interest $$Q$$ with squared standard error $$U$$, calculate $$Q_{1}, Q_{2}, \ldots, Q_{m}$$ and $$U_{1}, U_{2}, \ldots, U_{m}$$ (e.g., carry out $$m$$ regression analyses and obtain a point estimate and squared SE from each).

Then calculate the average estimate $$\bar{Q}$$, the average within-imputation variance $$\bar{U}$$, and the between-imputation variance $$B$$ (the sample variance of the $$m$$ estimates).

\begin{aligned} \bar{Q} & = \sum^{m}_{i=1}Q_{i}/m \\ \bar{U} & = \sum^{m}_{i=1}U_{i}/m \\ B & = \frac{1}{m-1}\sum^{m}_{i=1}(Q_{i}-\bar{Q})^2 \end{aligned}

Then $$T = \bar{U} + \frac{m+1}{m}B$$ is the estimated total variance of $$\bar{Q}$$.
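The pooling formulas above translate directly into a few lines of Python. The sketch below is illustrative only: the function name `pool_estimates` and the `Pooled` container are hypothetical, and it assumes the analyst has already collected the $$m$$ point estimates and their squared standard errors in two lists.

```python
from typing import NamedTuple, Sequence


class Pooled(NamedTuple):
    q_bar: float  # average estimate
    u_bar: float  # average within-imputation variance
    b: float      # between-imputation variance
    t: float      # estimated total variance of q_bar


def pool_estimates(estimates: Sequence[float],
                   variances: Sequence[float]) -> Pooled:
    """Combine m estimates Q_i and squared standard errors U_i."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least 2 estimates and one variance per estimate")
    q_bar = sum(estimates) / m
    u_bar = sum(variances) / m
    # Sample variance of the estimates across imputations (divisor m - 1).
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    # Total variance: within-imputation part plus inflated between-imputation part.
    t = u_bar + (m + 1) / m * b
    return Pooled(q_bar, u_bar, b, t)
```

For example, `pool_estimates([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])` gives an average estimate of 2.0, an average within-imputation variance of 0.5, a between-imputation variance of 1.0, and a total variance of 0.5 + (4/3)(1.0), roughly 1.83.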

Significance tests and interval estimates can be based on

$\frac{\bar{Q}-Q}{\sqrt{T}} \sim t_{df}, \mbox{ where } df = (m-1)\left(1+\frac{m}{m+1}\frac{\bar{U}}{B}\right)^2$

• The df are similar to those for a comparison of normal means with unequal variances, i.e., using the Satterthwaite approximation.
• The ratio of $$B$$ (the between-imputation variance) to $$T$$ (the total variance, between plus within-imputation) is known as the fraction of missing information (FMI).
• The FMI has been proposed as a way to monitor ongoing data collection and estimate the potential bias resulting from survey non-responders (Wagner, 2018).