For unbiased and accurate results from a statistical analysis, sufficient data has to be present. Once you start slicing and dicing the data to look only at certain groups, or to examine the behavior of certain variables across levels of another variable, you often run into small sample size problems.
For example, consider marital status again. Only 13 people report being separated. That group could be too small for valid statistical analysis.
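A quick way to spot small groups is to tabulate the variable and flag any level below a cutoff. This is only a sketch: the data are made-up toy values (not the real survey), and the cutoff `min_n` is a hypothetical threshold chosen for illustration, not a rule.

```r
# Toy data standing in for the real survey; these counts are made up.
marital <- factor(c(rep("Married", 20),
                    rep("Never Married", 12),
                    rep("Separated", 3)))

# Frequency table of each category
counts <- table(marital)
counts

# Hypothetical minimum group size (an assumption, pick what suits your analysis)
min_n <- 5

# Names of the categories that fall below the cutoff
small_groups <- names(counts)[as.vector(counts) < min_n]
small_groups
```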
One way to deal with insufficient data within a certain category is to collapse categories. The following code uses the recode() function from the car package to create a new variable, which I am calling marital2, that combines the small Separated group with another category.
⚠️ Note: See Math 130 lesson 06 for a better method using forcats
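A minimal sketch of that recode, assuming the car package is installed. The data frame name `survey`, the toy values, the choice to merge Separated with Divorced, and the new label `Sep/Div` are all assumptions made for illustration; substitute the categories that make sense for your data.

```r
library(car)  # provides recode()

# Toy data; every level name other than "Separated" is assumed.
survey <- data.frame(
  marital = factor(c("Married", "Divorced", "Separated",
                     "Never Married", "Separated", NA))
)

# Collapse two levels into one combined level (label is hypothetical).
# car::recode() is called with its namespace to avoid clashing with dplyr::recode().
survey$marital2 <- car::recode(survey$marital,
                               "c('Separated', 'Divorced') = 'Sep/Div'")

levels(survey$marital2)
```

Values not mentioned in the recode string, including missing values, are left unchanged.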
Always confirm your recodes. Check a table of the old variable (marital) against the new one (marital2).
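One way to run that check is a two-way table with `useNA = "always"`, so missing values show up as their own row and column. The toy values and the `Sep/Div` label below are assumptions for illustration, and the collapse is done in base R so the sketch has no package dependencies.

```r
# Toy data; the values and the combined label are assumed.
marital <- factor(c("Married", "Divorced", "Separated", "Separated", NA))

# Build the collapsed variable in base R
marital2 <- as.character(marital)
marital2[marital2 %in% c("Separated", "Divorced")] <- "Sep/Div"
marital2 <- factor(marital2)

# Old variable in rows, new variable in columns, NAs shown explicitly
check <- table(marital, marital2, useNA = "always")
check
```

Each row should have all of its counts in a single column, and the NA row should only have counts in the NA column.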
This confirms that records where marital (rows) is Separated now fall under the combined category of marital2 (columns), and that no missing data crept in during the process. Now I can drop the temporary marital2 variable and actually fix marital, which keeps the data set clean.
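The cleanup step can be sketched as follows: overwrite the original variable with the checked recode, then remove the temporary column. The data frame name `survey`, the toy values, and the `Sep/Div` label are again assumptions for illustration.

```r
# Toy data frame with the original variable and the already-checked recode
survey <- data.frame(
  marital  = factor(c("Married", "Divorced", "Separated", NA)),
  marital2 = factor(c("Married", "Sep/Div",  "Sep/Div",   NA))
)

# Replace the original variable with the collapsed version
survey$marital <- survey$marital2

# Drop the temporary variable
survey$marital2 <- NULL

names(survey)
levels(survey$marital)
```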