18.4 General strategies

Strategies for handling missing data include:

  • Complete-case/available-case analysis: drop cases with missing values on the variables needed for the analysis.
  • If variables are known to contribute to the missing values, then appropriate modeling can often account for the missingness.
  • Imputation procedures: fill in missing values, then analyze completed data sets using complete-data methods.
  • Weighting procedures: modify “design weights” (i.e., inverse probabilities of selection from the sampling plan) to account for the probability of response.
  • Model-based approaches: develop a model for the partially missing data and base inferences on the likelihood under that model.
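
As a rough illustration of the weighting idea, the sketch below applies a weighting-class adjustment: within each class, a respondent's design weight is divided by the class response rate. The data, the class labels, and the function names `adjust_weights` and `weighted_mean` are hypothetical, and the adjustment assumes that respondents and nonrespondents within a class are alike.

```python
from collections import defaultdict

# Hypothetical sample: design weight w, adjustment class cls, outcome y.
# None marks a nonrespondent. All values are made up for illustration.
sample = [
    {"cls": "A", "w": 10.0, "y": 4.0},
    {"cls": "A", "w": 10.0, "y": 6.0},
    {"cls": "A", "w": 10.0, "y": None},
    {"cls": "A", "w": 10.0, "y": None},
    {"cls": "B", "w": 10.0, "y": 10.0},
    {"cls": "B", "w": 10.0, "y": 12.0},
    {"cls": "B", "w": 10.0, "y": 14.0},
    {"cls": "B", "w": 10.0, "y": 12.0},
]


def adjust_weights(sample):
    """Return (adjusted_weight, y) pairs for respondents.

    Each respondent's design weight is divided by the weighted response
    rate of its class, i.e. multiplied by the inverse of the estimated
    probability of response.
    """
    total = defaultdict(float)      # sum of design weights, all sampled units
    responded = defaultdict(float)  # sum of design weights, respondents only
    for unit in sample:
        total[unit["cls"]] += unit["w"]
        if unit["y"] is not None:
            responded[unit["cls"]] += unit["w"]
    return [
        (unit["w"] * total[unit["cls"]] / responded[unit["cls"]], unit["y"])
        for unit in sample
        if unit["y"] is not None
    ]


def weighted_mean(pairs):
    """Weighted mean of y over (weight, y) pairs."""
    return sum(w * y for w, y in pairs) / sum(w for w, _ in pairs)


respondents = [(u["w"], u["y"]) for u in sample if u["y"] is not None]
unadjusted = weighted_mean(respondents)           # design weights only
adjusted = weighted_mean(adjust_weights(sample))  # nonresponse-adjusted
```

In this toy sample, class A responds at half the rate of class B, so its respondents are weighted up and the adjusted mean moves toward class A's lower values.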

18.4.1 Complete-case analysis

If any variable is unobserved for a case, delete that case from the analysis.

  • Advantages:
    • Simplicity
    • Common sample for all estimates
  • Disadvantages:
    • Loss of valid information
    • Bias if the data are not missing completely at random (MCAR)
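
A minimal sketch of complete-case deletion, using made-up (age, income) records in which income is assumed to be missing for the older respondents, so the missingness is not MCAR. The data and the helper name `complete_cases` are hypothetical.

```python
from statistics import mean

# Hypothetical (age, income) records; None marks a missing income.
# Assumed mechanism: older respondents did not report income,
# so the data are not MCAR.
rows = [
    (25, 20.0), (30, 25.0), (35, 30.0), (40, 35.0), (45, 40.0),
    (50, None), (55, None), (60, None),
]


def complete_cases(rows):
    """Keep only the rows in which every variable is observed."""
    return [r for r in rows if all(v is not None for v in r)]


cc = complete_cases(rows)

n_dropped = len(rows) - len(cc)          # cases discarded
mean_age_all = mean(r[0] for r in rows)  # uses every observed age
mean_age_cc = mean(r[0] for r in cc)     # uses complete cases only
```

Three cases with perfectly valid ages are discarded, and the complete-case mean age (35) falls well below the mean of all observed ages (42.5). This shows both disadvantages in one toy data set.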

18.4.2 Available-case analysis

  • Use all cases where the variable of interest is present
    • Potentially different sets of cases for the means of X and Y
    • Only the complete pairs for \(r_{XY}\)
  • Tempting to think that available-case analysis will be superior to complete-case analysis
  • But it can distort relationships between variables by not using a common base of observations for all quantities being estimated.
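
The shifting base of observations can be seen in a small sketch. The (x, y) pairs and the helper name `pearson_r` are hypothetical; each quantity is computed from whichever cases happen to have the needed values.

```python
import math
from statistics import mean

# Hypothetical (x, y) pairs; None marks a missing value.
rows = [
    (1.0, 2.0),
    (2.0, None),
    (3.0, 6.0),
    (None, 8.0),
    (5.0, 10.0),
    (6.0, None),
]

# Each estimate uses every case in which the needed variable(s) are present.
x_obs = [x for x, _ in rows if x is not None]
y_obs = [y for _, y in rows if y is not None]
pairs = [(x, y) for x, y in rows if x is not None and y is not None]


def pearson_r(pairs):
    """Pearson correlation computed from complete (x, y) pairs only."""
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


mean_x = mean(x_obs)   # based on 5 cases
mean_y = mean(y_obs)   # based on 4 cases
r_xy = pearson_r(pairs)  # based on 3 complete pairs
```

The mean of X, the mean of Y, and \(r_{XY}\) rest on 5, 4, and 3 cases respectively, so the three estimates do not describe the same set of observations.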

18.4.3 Imputation

Fill in missing values, then analyze the completed data set.

  • Advantage:
    • Rectangular data set easier to analyze
  • Disadvantages:
    • “Both seductive and dangerous” (Little and Rubin)
    • Can understate uncertainty due to missing values.
    • Can induce bias if imputing under the wrong model.
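
To see how imputation can understate uncertainty, the sketch below uses unconditional mean imputation. This is one simple imputation method chosen only for illustration, not a recommended procedure, and the values are made up.

```python
from statistics import mean, variance

observed = [2.0, 4.0, 6.0, 8.0]  # hypothetical observed values
n_missing = 2                    # hypothetical number of missing values

# Mean imputation: replace every missing value with the observed mean.
fill = mean(observed)
completed = observed + [fill] * n_missing

var_observed = variance(observed)    # sample variance, observed values only
var_completed = variance(completed)  # sample variance after imputation
```

The mean is unchanged, but the sample variance of the completed data is smaller than that of the observed values, because the imputed points sit exactly at the mean. Treating the completed data as six real observations would therefore make standard errors look smaller than the data justify.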