## 18.5 Imputation Methods

• Unconditional mean substitution (never use)
  • Imputes all missing data using the mean of the observed cases
  • Artificially decreases the variance
• Hot deck imputation
  • Imputes values by randomly sampling from the observed data
  • Good for categorical data
  • Reasonable for MCAR and MAR
• Model-based imputation
  • Conditional mean imputation: uses regression on the observed variables to estimate the missing values
  • Predictive mean matching: fills in a value by randomly sampling from the observed values whose regression-predicted values are closest to the regression-predicted value for the missing point
    • A cross between hot deck and conditional mean imputation
  • Categorical data can be imputed using classification models
  • Less biased than mean substitution, but SEs could be inflated
  • Stochastic regression imputation: impute the regression-predicted value $$\pm$$ a randomly selected residual, based on the estimated residual variance
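
The methods listed above can be compared side by side on simulated data. The following is a minimal sketch, not a reference implementation: it assumes a single numeric predictor `x`, an outcome `y` that is missing completely at random, and single (not multiple) imputation. The helper names `fit_line` and `impute`, the donor pool size `k`, and the simulated data are all hypothetical choices made for illustration. The predictive mean matching branch is simplified: it matches on one fitted regression and does not redraw the regression coefficients, as a full implementation typically would.

```python
import math
import random
import statistics

METHODS = ["mean", "hot_deck", "conditional_mean", "stochastic", "pmm"]

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((xi - mx) ** 2 for xi in xs)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys)) / sxx
    return my - b * mx, b

def impute(x, y, method, rng, k=5):
    """Return a copy of y with None entries filled in by the named method."""
    obs = [i for i, v in enumerate(y) if v is not None]
    y_obs = [y[i] for i in obs]
    # Regression fitted on complete cases only (used by the model-based methods)
    a, b = fit_line([x[i] for i in obs], y_obs)
    pred = [a + b * xi for xi in x]
    resid = [y[i] - pred[i] for i in obs]
    # Residual standard deviation estimated with n - 2 degrees of freedom
    sigma = math.sqrt(sum(r * r for r in resid) / (len(obs) - 2))
    filled = list(y)
    for i, v in enumerate(y):
        if v is not None:
            continue
        if method == "mean":
            # Unconditional mean substitution
            filled[i] = statistics.fmean(y_obs)
        elif method == "hot_deck":
            # Random draw from the observed values
            filled[i] = rng.choice(y_obs)
        elif method == "conditional_mean":
            # Regression prediction, no noise
            filled[i] = pred[i]
        elif method == "stochastic":
            # Regression prediction plus a normal draw with the residual SD
            filled[i] = pred[i] + rng.gauss(0, sigma)
        elif method == "pmm":
            # k observed cases whose predictions are closest to this one;
            # borrow the observed value of one donor chosen at random
            donors = sorted(obs, key=lambda j: abs(pred[j] - pred[i]))[:k]
            filled[i] = y[rng.choice(donors)]
        else:
            raise ValueError(f"unknown method: {method}")
    return filled

# Simulated data (assumed): y = 2 + 3x + noise, about 30% of y missing (MCAR)
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y_full = [2 + 3 * xi + rng.gauss(0, 1) for xi in x]
y = [None if rng.random() < 0.3 else v for v in y_full]

y_obs = [v for v in y if v is not None]
print(f"{'observed only':>17}: variance = {statistics.variance(y_obs):.2f}")
results = {}
for m in METHODS:
    results[m] = impute(x, y, m, rng)
    print(f"{m:>17}: variance = {statistics.variance(results[m]):.2f}")
```

Comparing the printed variances shows the point made about mean substitution: every imputed value sits exactly at the observed mean, so the variance of the completed data is always smaller than the variance of the observed cases. Hot deck and predictive mean matching only ever return values that were actually observed, which is why they carry over naturally to categorical data.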