1.3 Data Management

Ask yourself the following questions while reviewing the codebook to choose the variables for an analysis.

  1. Are there codes that indicate missing values, e.g. MISSING or -99?
  2. Do you need to make response codes more logical?
    • Some systems will record 1=YES and 2=NO. This should be changed to 1=YES and 0=NO.
  3. Do you need to recode numerical variables to categorical?
    • Sometimes categorical data will be recorded as 1, 2, 3, etc. when those numbers represent named categories.
  4. Do you need to create secondary variables, such as an average across measures to create a score?
  5. Are the variable names user friendly? Names that mix CAPS and lower case, or that contain spaces or special characters, should all be changed.
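
To make the five questions concrete, here is a minimal sketch in plain Python. Everything in it is made up for illustration: the records, the column names ("Resp ID", "SMOKES", "Edu Level", "Q1", "Q2"), the education labels, and the choice of -99 and MISSING as missing codes. Your own codebook decides the real values, and this is only one of many ways to do each step.

```python
import re
from statistics import mean

# Hypothetical raw records; names and codes are invented for illustration.
raw = [
    {"Resp ID": 1, "SMOKES": 1, "Edu Level": 2, "Q1": 4, "Q2": 5},
    {"Resp ID": 2, "SMOKES": 2, "Edu Level": 3, "Q1": -99, "Q2": 3},
    {"Resp ID": 3, "SMOKES": -99, "Edu Level": 1, "Q1": 2, "Q2": 2},
]

MISSING_CODES = {-99, "MISSING"}   # assumed missing-value codes
YES_NO = {1: 1, 2: 0}              # 1=YES, 2=NO becomes 1=YES, 0=NO
EDU_LABELS = {1: "High school", 2: "College", 3: "Graduate"}  # assumed labels


def clean_name(name):
    """Lower-case a name and replace spaces/special characters with '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def clean_record(record):
    # Questions 5 and 1: friendly names, and missing codes turned into None.
    row = {
        clean_name(key): (None if value in MISSING_CODES else value)
        for key, value in record.items()
    }
    # Question 2: more logical response codes (unknown codes become None).
    row["smokes"] = YES_NO.get(row["smokes"])
    # Question 3: numeric codes replaced by named categories.
    row["edu_level"] = EDU_LABELS.get(row["edu_level"])
    # Question 4: secondary variable, the mean of the non-missing items.
    items = [row[q] for q in ("q1", "q2") if row[q] is not None]
    row["score"] = mean(items) if items else None
    return row


cleaned = [clean_record(r) for r in raw]
for row in cleaned:
    print(row)
```

Averaging only the non-missing items is a choice made for this sketch. Depending on the measure, you may instead want the score to be missing whenever any item is missing.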

Some of these answers will come only after you look at your data. This can mean looking at the raw data itself, but also at tables and charts generated from the data. Often when you try to create a plot or table you will encounter an error or something odd-looking, and that is your notification that something has to be adjusted.
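
As a small illustration of how a table can flag a problem, the sketch below tabulates a hypothetical yes/no variable and lists any values the codebook does not account for. The data and the assumption that only 1 and 2 are valid codes are invented for this example.

```python
from collections import Counter

# Hypothetical responses to a yes/no item; assume the codebook says 1=YES, 2=NO.
smokes = [1, 2, 2, 1, -99, 2, "MISSING", 1, 2, 3]

# Frequency table: each distinct value and how often it occurs.
counts = Counter(smokes)
for value, n in counts.most_common():
    print(f"{value!r:>10}  {n}")

# Anything outside the documented codes deserves a closer look.
expected = {1, 2}
unexpected = sorted((v for v in counts if v not in expected), key=str)
print("Values to investigate:", unexpected)
```

Here -99 and MISSING are probably missing-value codes, while 3 may be a data entry error. The table only tells you that something needs attention; the codebook, or whoever collected the data, has to tell you what.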

The next sections go over a few of the common data management processes, but they are not comprehensive and may show only one method for cleaning. There are always different ways to accomplish a task.