18.5 Imputation Methods

  • Unconditional mean substitution: never use it
    • Impute every missing value with the mean of the observed cases
    • Artificially shrinks the variance (every imputed point sits exactly at the mean) and weakens correlations with other variables
  • Hot deck imputation
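    • Impute each missing value by randomly sampling a value from the observed cases (donors)
    • Good for categorical data, since every imputed value is a category that was actually observed
    • Reasonable under MCAR, and under MAR when donors are restricted to cases similar on the variables that drive missingness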
  • Model based imputation
    • Conditional Mean imputation: Use regression on observed variables to estimate missing values
    • Predictive mean matching (PMM): for each missing case, randomly draw an observed value from the donors whose regression-predicted values are closest to the missing case's predicted value
      • A cross between hot deck and conditional mean imputation
    • Categorical data can be imputed using classification models
    • Less biased than mean substitution
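    • but SEs are typically understated: imputed values are treated as if observed, and conditional mean imputations lie exactly on the regression line, which also inflates correlations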
  • Adding a residual
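    • Impute the regression prediction plus a random residual, \(\hat{y}_i + e_i\), where \(e_i\) is drawn using the estimated residual variance \(\hat{\sigma}^2\), e.g. \(e_i \sim N(0, \hat{\sigma}^2)\) (stochastic regression imputation)
    • Reduces bias on average, because the added noise restores the variability that fitted values alone lack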
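
A minimal NumPy sketch of why unconditional mean substitution shrinks the variance. The helper name `mean_impute` and the toy vector are made up for illustration.

```python
import numpy as np

def mean_impute(y):
    """Unconditional mean substitution: replace every NaN with the observed mean."""
    y = np.asarray(y, dtype=float)
    filled = y.copy()
    filled[np.isnan(y)] = np.nanmean(y)
    return filled

# Made-up data: 4 observed values, 2 missing.
y = np.array([2.0, 4.0, np.nan, 6.0, np.nan, 8.0])
y_obs = y[~np.isnan(y)]
y_imp = mean_impute(y)

print(y_imp)                   # [2. 4. 5. 6. 5. 8.]
print(np.var(y_obs, ddof=1))   # 20/3, about 6.67
print(np.var(y_imp, ddof=1))   # 20/5 = 4.0
```

The sum of squared deviations stays at 20 (the imputed points contribute nothing), but the divisor grows from 3 to 5, so the estimated variance drops.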
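
A sketch of random hot deck imputation in plain Python, assuming missing entries are stored as `None`. The optional `groups` argument (a hypothetical name) restricts donors to records in the same adjustment cell, one common way to make hot deck more defensible under MAR. The data are invented.

```python
import random
from collections import defaultdict

def hot_deck_impute(values, groups=None, rng=None):
    """Random hot deck: replace each None with a value drawn at random
    (with replacement) from observed donors in the same group."""
    if rng is None:
        rng = random.Random()
    if groups is None:
        groups = [0] * len(values)  # one cell containing every record
    donors = defaultdict(list)
    for v, g in zip(values, groups):
        if v is not None:
            donors[g].append(v)
    out = []
    for v, g in zip(values, groups):
        if v is None:
            if not donors[g]:
                raise ValueError(f"no observed donors in group {g!r}")
            out.append(rng.choice(donors[g]))
        else:
            out.append(v)
    return out

# Hypothetical categorical variable with two missing entries.
colors = ["red", None, "blue", "red", None, "green"]
region = ["N", "N", "S", "S", "S", "N"]
imputed = hot_deck_impute(colors, groups=region, rng=random.Random(0))
print(imputed)
```

Every imputed value is a real observed category, so the method never produces an impossible "average" category the way mean substitution would.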
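
A NumPy sketch of conditional mean imputation and a simplified predictive mean matching. Assumptions: one incomplete outcome `y`, fully observed predictors `X`, ordinary least squares, and a donor pool of size `k` (a tuning parameter; the value here is arbitrary). Many PMM implementations also draw the regression coefficients from their posterior before matching; this sketch uses the point estimate. All names are illustrative.

```python
import numpy as np

def fit_ols(X, y):
    """OLS coefficients for y ~ 1 + X (intercept first)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    """Linear predictor for the rows of X."""
    return beta[0] + X @ beta[1:]

def conditional_mean_impute(X, y):
    """Replace missing y with its OLS prediction fitted on the observed cases."""
    miss = np.isnan(y)
    beta = fit_ols(X[~miss], y[~miss])
    out = y.copy()
    out[miss] = predict(beta, X[miss])
    return out

def pmm_impute(X, y, k=5, rng=None):
    """Simplified PMM: for each missing case, take the k observed cases with
    the closest predicted y and copy the observed y of one, chosen at random."""
    if rng is None:
        rng = np.random.default_rng()
    miss = np.isnan(y)
    beta = fit_ols(X[~miss], y[~miss])
    yhat_obs = predict(beta, X[~miss])
    yhat_mis = predict(beta, X[miss])
    y_obs = y[~miss]
    out = y.copy()
    for i, target in zip(np.flatnonzero(miss), yhat_mis):
        donors = np.argsort(np.abs(yhat_obs - target))[:k]
        out[i] = y_obs[rng.choice(donors)]
    return out

# Simulated data: y depends linearly on one predictor; about 30% of y is MCAR.
rng = np.random.default_rng(42)
n = 200
X = rng.normal(size=(n, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)
y[rng.random(n) < 0.3] = np.nan

y_cm = conditional_mean_impute(X, y)
y_pmm = pmm_impute(X, y, k=5, rng=rng)
print(np.var(y_cm), np.var(y_pmm))  # the conditional-mean version is usually less spread out
```

PMM only ever imputes values that were actually observed, which is its hot deck side, while the matching step borrows the regression model from conditional mean imputation.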
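
A NumPy sketch of the "adding a residual" approach (stochastic regression imputation). It assumes a normal residual whose variance is estimated from the observed cases; resampling the observed residuals would be another reasonable reading of "randomly selected residual". Function and variable names are illustrative.

```python
import numpy as np

def stochastic_regression_impute(X, y, rng=None):
    """Impute missing y as OLS prediction + N(0, sigma_hat^2) noise.

    Returns the imputed vector and sigma_hat, the residual standard error
    estimated from the observed cases.
    """
    if rng is None:
        rng = np.random.default_rng()
    miss = np.isnan(y)
    A_obs = np.column_stack([np.ones(np.count_nonzero(~miss)), X[~miss]])
    beta, *_ = np.linalg.lstsq(A_obs, y[~miss], rcond=None)
    resid = y[~miss] - A_obs @ beta
    dof = A_obs.shape[0] - A_obs.shape[1]
    sigma_hat = np.sqrt(resid @ resid / dof)
    A_mis = np.column_stack([np.ones(np.count_nonzero(miss)), X[miss]])
    out = y.copy()
    # Fitted value plus a random residual drawn with the estimated variance.
    out[miss] = A_mis @ beta + rng.normal(0.0, sigma_hat, size=np.count_nonzero(miss))
    return out, sigma_hat

# Simulated data with true residual SD 1; about 30% of y set to missing at random.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 1))
y_full = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)
y = y_full.copy()
y[rng.random(n) < 0.3] = np.nan

y_sr, sigma_hat = stochastic_regression_impute(X, y, rng)
print(sigma_hat)                     # should be close to 1
print(np.var(y_full), np.var(y_sr))  # similar spread; no collapse onto the fitted line
```

Compared with plugging in the fitted values alone, the random residual keeps the imputed points scattered around the regression line, so the overall variance is not understated.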

…but we can do better.