18.4 General strategies
Strategies for handling missing data include:
- Complete-case/available-case analysis: drop cases that make analysis inconvenient.
- If variables known to be related to the missingness are observed, then appropriate modeling can often account for the missingness.
- Imputation procedures: fill in missing values, then analyze the completed data sets using complete-data methods
- Weighting procedures: modify “design weights” (i.e., inverse probabilities of selection from the sampling plan) to account for the probability of response
- Model-based approaches: develop a model for the partially missing data and base inferences on the likelihood under that model
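To make the weighting idea concrete, here is a minimal sketch of a weighting-class adjustment. This is one common way to estimate response probabilities, not necessarily the method these notes go on to use. The adjustment cells, the field names (`cell`, `w`, `responded`, `y`) and the data are all hypothetical. Within each cell, the response probability is estimated by the weighted response rate, and each respondent's design weight is divided by that estimate.

```python
from collections import defaultdict

def adjust_weights(units):
    """Weighting-class nonresponse adjustment (illustrative sketch).

    Each unit is a dict with hypothetical keys: 'cell' (adjustment class),
    'w' (design weight), 'responded' (bool) and 'y' (None for nonrespondents).
    Returns (adjusted_weight, y) pairs for respondents only.
    """
    total_w = defaultdict(float)  # sum of design weights per cell
    resp_w = defaultdict(float)   # sum of respondents' design weights per cell
    for u in units:
        total_w[u["cell"]] += u["w"]
        if u["responded"]:
            resp_w[u["cell"]] += u["w"]
    pairs = []
    for u in units:
        if u["responded"]:
            # Estimated probability of response within this cell.
            rate = resp_w[u["cell"]] / total_w[u["cell"]]
            pairs.append((u["w"] / rate, u["y"]))
    return pairs

def weighted_mean(pairs):
    return sum(w * y for w, y in pairs) / sum(w for w, _ in pairs)

# Hypothetical sample: cell "A" responds well, cell "B" poorly, and y differs by cell.
units = (
    [{"cell": "A", "w": 1.0, "responded": True, "y": 10.0}] * 3
    + [{"cell": "A", "w": 1.0, "responded": False, "y": None}]
    + [{"cell": "B", "w": 1.0, "responded": True, "y": 20.0}]
    + [{"cell": "B", "w": 1.0, "responded": False, "y": None}] * 3
)

pairs = adjust_weights(units)
naive = sum(y for _, y in pairs) / len(pairs)  # unweighted respondent mean
adjusted = weighted_mean(pairs)                 # nonresponse-adjusted mean
```

Here the unweighted respondent mean is 12.5, while the adjusted mean is 15 (up to floating-point rounding). The adjusted value is what the full sample would give if nonrespondents resembled respondents within each cell. The adjusted weights also sum to the original total design weight of 8.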
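18.4.1 Complete-case analysis
If any variable is not observed for a case, delete that case from the analysis.
- Advantages:
  - Simplicity
  - Common sample for all estimates
- Disadvantages:
  - Loss of valid information (observed values in incomplete cases are discarded)
  - Potential bias when the data are not MCAR (missing completely at random)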
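A minimal sketch of complete-case (listwise) deletion, assuming rows are tuples with `None` marking a missing value. The data are hypothetical. Both means are computed on the same three rows, which gives the common sample, but the observed x = 2.0 and y = 4.0 in the incomplete rows are thrown away.

```python
import statistics

def complete_cases(rows):
    """Listwise deletion: keep only rows in which every variable is observed.

    None marks a missing value.
    """
    return [r for r in rows if all(v is not None for v in r)]

# Hypothetical (x, y) data with missing values in both variables.
rows = [(1.0, 2.0), (2.0, None), (3.0, 5.0), (None, 4.0), (5.0, 9.0)]

cc = complete_cases(rows)
print(f"kept {len(cc)} of {len(rows)} cases")  # kept 3 of 5 cases
mean_x = statistics.mean([x for x, _ in cc])
mean_y = statistics.mean([y for _, y in cc])
```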
18.4.2 Available-case analysis
- Use all cases where the variable of interest is present
  - Potentially different sets of cases for the means of \(X\) and \(Y\), and the complete \((X, Y)\) pairs for \(r_{XY}\)
- It is tempting to think that available-case analysis will be superior to complete-case analysis, since it uses more of the observed data
  - But it can distort relationships between variables, because the quantities being estimated are not all computed on a common base of observations
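The following sketch shows available-case (pairwise) analysis on hypothetical data, with `None` marking a missing value. Each statistic uses every case where the variables it needs are observed. The mean of x comes from rows 1, 2, 3 and 5, the mean of y from rows 1, 3, 4 and 5, and \(r_{XY}\) only from the three complete pairs, so the three quantities rest on different bases of observation.

```python
import math
import statistics

def pearson_r(pairs):
    """Sample Pearson correlation computed from complete (x, y) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def available_case_summary(rows):
    """Each statistic uses all cases where the variables it needs are observed."""
    xs = [x for x, _ in rows if x is not None]
    ys = [y for _, y in rows if y is not None]
    pairs = [(x, y) for x, y in rows if x is not None and y is not None]
    return {
        "n_x": len(xs), "mean_x": statistics.mean(xs),
        "n_y": len(ys), "mean_y": statistics.mean(ys),
        "n_xy": len(pairs), "r_xy": pearson_r(pairs),
    }

rows = [(1.0, 2.0), (2.0, None), (3.0, 5.0), (None, 4.0), (5.0, 9.0)]
summary = available_case_summary(rows)
# n_x = 4, n_y = 4, n_xy = 3: three different bases of observation.
```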
18.4.3 Imputation
Fill in missing values, analyze completed data set
- Advantages:
  - Rectangular data set easier to analyze
  - Analysis data set \(n\) matches summary table \(n\)
- Disadvantages:
  - “Both seductive and dangerous” (Little and Rubin)
  - Can understate uncertainty, because imputed values are then treated as if they had been observed
  - Can induce bias if values are imputed under the wrong model
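To illustrate how imputation can understate uncertainty, here is a deliberately crude sketch using single mean imputation. This is a stand-in chosen for simplicity, not a method these notes endorse. The data are hypothetical. Filling each gap with the observed mean leaves the mean unchanged but shrinks the sample variance. A naive standard error that treats the completed data as fully observed (dividing by \(n = 6\) instead of 4) is therefore too small.

```python
import math
import statistics

def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = statistics.mean(observed)
    return [m if v is None else v for v in values]

# Hypothetical variable with two missing values.
y = [2.0, None, 5.0, 4.0, None, 9.0]
observed = [v for v in y if v is not None]
completed = mean_impute(y)

var_obs = statistics.variance(observed)         # uses the 4 observed values
var_completed = statistics.variance(completed)  # imputed values add no spread
se_obs = math.sqrt(var_obs / len(observed))
se_naive = math.sqrt(var_completed / len(completed))  # treats imputations as data
```

With these numbers the variance drops from about 8.67 to 5.2, and the naive standard error falls from about 1.47 to about 0.93, even though no new information was collected.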