18.5 Imputation Methods
- Unconditional mean substitution. Never use this method.
- Impute all missing data using the mean of observed cases
- Artificially decreases the variance.
- Hot deck imputation
- Impute values by randomly sampling values from observed data.
- Good for categorical data
- Reasonable for MCAR and MAR
- Model based imputation
- Conditional Mean imputation: Use regression on observed variables to estimate missing values
- Predictive Mean Matching: Fills in a value randomly by sampling observed values whose regression-predicted values are closest to the regression-predicted value for the missing point.
- Cross between hot-deck and conditional mean
- Categorical data can be imputed using classification models
- Less biased than mean substitution
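- but standard errors can still be distorted (typically understated, because conditional mean imputations fall exactly on the regression line)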
- Adding a residual
- Impute the regression-predicted value \(\pm\) a randomly drawn residual whose spread is based on the estimated residual variance
- In the long run, this reduces bias on average
…but we can do better.
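
The single-imputation methods listed above can be compared side by side in a small simulation. The following is a minimal sketch, not a recommended workflow: the data are simulated, the variable names (`x`, `y`, `miss`, `fill`) and the choice of `k = 5` donors for predictive mean matching are assumptions made for illustration, and the outcome is made missing completely at random. It shows how each method fills in the same missing values and how the resulting variance of `y` differs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data: y depends linearly on x,
# and about 30% of y is set missing completely at random (MCAR).
n = 500
x = rng.normal(size=n)
y_true = 2.0 + 1.5 * x + rng.normal(size=n)
miss = rng.random(n) < 0.3
obs = ~miss
y = y_true.copy()
y[miss] = np.nan
n_miss = int(miss.sum())


def fill(values):
    """Return a copy of y with the missing entries replaced by `values`."""
    out = y.copy()
    out[miss] = values
    return out


# 1. Unconditional mean substitution: every gap gets the observed mean.
y_mean = fill(y[obs].mean())

# 2. Hot deck: sample donors (with replacement) from the observed values.
y_hot = fill(rng.choice(y[obs], size=n_miss, replace=True))

# Regression of y on x, fit on the observed cases only.
slope, intercept = np.polyfit(x[obs], y[obs], deg=1)
pred = intercept + slope * x
resid_sd = np.std(y[obs] - pred[obs], ddof=2)

# 3. Conditional mean imputation: use the regression prediction as is.
y_reg = fill(pred[miss])

# 4. Adding a residual: prediction plus a random draw with the
#    estimated residual standard deviation.
y_stoch = fill(pred[miss] + rng.normal(scale=resid_sd, size=n_miss))

# 5. Predictive mean matching: for each missing case, find the k observed
#    cases with the closest predicted values and borrow one donor's
#    observed y at random.
k = 5
dist = np.abs(pred[miss][:, None] - pred[obs][None, :])
nearest = np.argsort(dist, axis=1)[:, :k]
pick = rng.integers(k, size=n_miss)
donors = nearest[np.arange(n_miss), pick]
y_pmm = fill(y[obs][donors])

imputed = {
    "mean": y_mean,
    "hot deck": y_hot,
    "regression": y_reg,
    "regression + residual": y_stoch,
    "pmm": y_pmm,
}
variances = {name: float(np.var(v, ddof=1)) for name, v in imputed.items()}

print(f"observed cases only: {np.var(y[obs], ddof=1):.3f}")
for name, var in variances.items():
    print(f"{name}: {var:.3f}")
```

In this sketch, mean substitution gives the smallest variance because every imputed value sits at the same point, and conditional mean imputation still understates the spread because its imputations have no residual scatter. Hot deck and predictive mean matching only ever impute values that were actually observed.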