18.4 General strategies

Strategies for handling missing data include:

Complete-case/available-case analysis: drop cases that make analysis inconvenient.
If variables are known to contribute to the missing values, then appropriate modeling can often account for the missingness.
Imputation procedures: fill in missing values, then analyze the completed data sets using complete-data methods.
Weighting procedures: modify “design weights” (i.e., inverse probabilities of selection from the sampling plan) to account for the probability of response.
Model-based approaches: develop a model for the partially missing data and base inferences on the likelihood under that model.
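
A minimal sketch of the weighting idea, assuming a weighting-class adjustment: each respondent's design weight is divided by the weighted response rate of its class. The sample, the class labels, and the function names are hypothetical and chosen only for illustration; other ways of estimating the response probability are possible.

```python
# Hypothetical sample rows: (design_weight, adjustment_class, responded, y).
# design_weight is the inverse probability of selection; y is None for nonrespondents.
sample = [
    (10.0, "A", True, 4.0),
    (10.0, "A", True, 6.0),
    (10.0, "A", False, None),
    (10.0, "A", False, None),
    (20.0, "B", True, 10.0),
    (20.0, "B", True, 12.0),
    (20.0, "B", True, 14.0),
    (20.0, "B", False, None),
]


def response_rates(rows):
    """Weighted response rate within each adjustment class."""
    total, responded_total = {}, {}
    for w, cls, responded, _ in rows:
        total[cls] = total.get(cls, 0.0) + w
        if responded:
            responded_total[cls] = responded_total.get(cls, 0.0) + w
    return {cls: responded_total.get(cls, 0.0) / total[cls] for cls in total}


def adjusted_weights(rows):
    """(adjusted_weight, y) for respondents: design weight / class response rate."""
    rates = response_rates(rows)
    return [(w / rates[cls], y) for w, cls, responded, y in rows if responded]


def weighted_mean(pairs):
    """Weighted mean of y over (weight, y) pairs."""
    return sum(w * y for w, y in pairs) / sum(w for w, _ in pairs)


rates = response_rates(sample)
adjusted = adjusted_weights(sample)

# Respondents only, with the original design weights.
unadjusted_mean = weighted_mean([(w, y) for w, _, r, y in sample if r])
# Respondents only, with nonresponse-adjusted weights.
adjusted_mean = weighted_mean(adjusted)
```

In this toy sample the adjusted weights sum to the full-sample total of the design weights, so respondents in the class with the lower response rate count for more in the estimate.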

Complete-case analysis: if any variable is unobserved, delete the case from the analysis.
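
Deleting every case with an unobserved variable can be sketched as follows. The data, the use of `None` to mark a missing value, and the name `complete_cases` are assumptions made for illustration.

```python
# Hypothetical data set; None marks a missing value.
rows = [
    {"x": 1.0, "y": 2.0},
    {"x": 2.0, "y": None},
    {"x": None, "y": 3.0},
    {"x": 4.0, "y": 5.0},
    {"x": 5.0, "y": 7.0},
]


def complete_cases(rows, variables):
    """Keep only the cases in which every listed variable is observed."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


cc = complete_cases(rows, ["x", "y"])
n_dropped = len(rows) - len(cc)  # cases lost to the deletion
```

Every later analysis then uses the same reduced set of cases, at the cost of discarding the observed values in the deleted cases.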

Available-case analysis: use all cases where the variable of interest is present.
- Potentially different sets of cases for the means of \(X\) and \(Y\)
- Only complete pairs for \(r_{XY}\)
It is tempting to think that available-case analysis will be superior to complete-case analysis, but it can distort relationships between variables by not using a common base of observations for all quantities being estimated.
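
The differing bases of observations can be seen in a small sketch, using hypothetical data with `None` marking a missing value. The helper names are illustrative only.

```python
# Hypothetical paired data; None marks a missing value.
x = [1.0, 2.0, None, 4.0, 5.0]
y = [2.0, None, 3.0, 5.0, 7.0]


def mean(values):
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation of two equal-length lists with no missing values."""
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


# Available-case: each mean uses every case where that variable is observed.
x_avail = [u for u in x if u is not None]
y_avail = [v for v in y if v is not None]
mean_x_avail = mean(x_avail)
mean_y_avail = mean(y_avail)

# The correlation can only use complete pairs, a different (smaller) set of cases.
pairs = [(u, v) for u, v in zip(x, y) if u is not None and v is not None]
r_xy = pearson([u for u, _ in pairs], [v for _, v in pairs])

# Complete-case mean of x, for comparison: same cases as the correlation.
mean_x_cc = mean([u for u, _ in pairs])
```

Here the two means are each based on four cases (not the same four), while the correlation is based on three, so the estimates do not describe one common set of observations.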

Imputation: fill in missing values, then analyze the completed data set.

Advantages:
- Rectangular data set is easier to analyze
- Analysis data set \(n\) matches summary table \(n\)
Disadvantages:
- “Both seductive and dangerous” (Little and Rubin)
- Can understate uncertainty due to missing values
- Can induce bias if imputing under the wrong model
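
As a rough illustration of how imputation can understate uncertainty, the sketch below fills missing values with the observed mean and then treats the completed data as if all values had been observed. Mean imputation is used here only because it is the simplest case; the numbers are hypothetical.

```python
import statistics

observed = [2.0, 4.0, 6.0, 8.0]  # hypothetical observed values
n_missing = 4                    # hypothetical number of missing values

# Single mean imputation: every missing value is replaced by the observed mean.
fill = statistics.mean(observed)
completed = observed + [fill] * n_missing

# Sample variances before and after imputation.
var_observed = statistics.variance(observed)
var_completed = statistics.variance(completed)

# Naive standard errors of the mean, treating each data set as fully observed.
se_observed = (var_observed / len(observed)) ** 0.5
se_completed = (var_completed / len(completed)) ** 0.5
```

The mean is unchanged, but the completed data set has a smaller variance and a larger apparent \(n\), so its naive standard error is smaller even though no new information was added.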