Situation: You want to add model predictions to the data set, but you have missing data that was automatically dropped prior to analysis.
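R objects created by methods such as glm store data in the model object itself: model$data holds the data set as it was supplied to the call, and model$model holds the model frame, i.e. only the rows actually used in the fit. See Chapter 12 for an example.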
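As a quick check of what ends up inside a fitted glm object, here is a minimal sketch. The data frame toy and its columns y and x are made up for illustration; they are not from Chapter 12.

```r
# Toy data (hypothetical names) with one missing predictor value
toy <- data.frame(
  y = c(0, 1, 0, 1, 1, 0, 1, 0),
  x = c(1.2, 2.5, NA, 3.1, 0.7, 1.9, 1.5, 2.8)
)

# The default na.action (na.omit) silently drops the incomplete row
fit <- glm(y ~ x, data = toy, family = binomial)

nrow(fit$data)   # 8: the data as supplied to glm()
nrow(fit$model)  # 7: the model frame, i.e. the rows used in the fit
```

Comparing the two row counts is an easy way to see how many records were dropped before the analysis.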
If your original data had missing values, here are two ways to get the PCs / factor scores for the available data back onto the data set.
Method 1) Create an ID column and merge new variables onto original data. (add columns)
- If no ID column exists, create one on the original data set: id = 1:NROW(data)
- Use select() to extract the ID and all variables used in the factor analysis, then use na.omit() to drop rows with any missing data. Save this as a new complete case data set.
- Conduct PCA / Factor analysis on this new complete case data set (MINUS THE ID). Extract the PCs or factor scores.
- Use bind_cols() to add the ID variable to the data containing factor scores.
- Use left_join(original_data, factor_score_data) to merge the factor scores back onto the original data, using the ID variable as the joining key.
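The steps of Method 1 can be sketched as follows. This is only an illustration under some assumptions: the data frame dat and its columns x1 to x3 are simulated, prcomp() stands in for whichever PCA / factor analysis function you actually use, and keeping the first two PCs is an arbitrary choice.

```r
library(dplyr)

# Toy data (hypothetical): three numeric variables with two incomplete rows
set.seed(1)
dat <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
dat$x1[2] <- NA
dat$x3[7] <- NA

# 1. Create an ID column on the original data
dat$id <- 1:NROW(dat)

# 2. Keep the ID and the analysis variables, then drop incomplete rows
cc <- dat %>% select(id, x1, x2, x3) %>% na.omit()

# 3. Run the PCA on the complete cases, MINUS THE ID
pca <- prcomp(cc %>% select(-id), scale. = TRUE)
scores <- as.data.frame(pca$x[, 1:2])  # keep the first two PCs

# 4. Attach the ID to the scores (rows are in the same order as cc)
scores <- bind_cols(cc %>% select(id), scores)

# 5. Join the scores back onto the original data by ID
dat_scored <- left_join(dat, scores, by = "id")
```

Rows that were dropped before the PCA (here rows 2 and 7) are kept in dat_scored, with NA in the PC1 and PC2 columns, and the original row order is preserved.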
Method 2) Split the data, analyze one part then concatenate back together. (add rows)
- Use the complete.cases() function to create a boolean vector indicating whether each row is complete.
- Split the data into complete and incomplete.
- Do the analysis on the complete rows, extracting the PCs / factor scores.
- Add the PC/Factor data onto the complete rows using bind_cols().
- Use bind_rows() to put the two parts back together.
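Method 2 might look like the sketch below. As before, the data frame dat and its columns are simulated for illustration, prcomp() is an assumed stand-in for your own analysis, and the choice of two PCs is arbitrary.

```r
library(dplyr)

# Toy data (hypothetical): three numeric variables with two incomplete rows
set.seed(1)
dat <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
dat$x1[2] <- NA
dat$x3[7] <- NA

# 1. Boolean vector: TRUE where a row has no missing values.
#    With real data, pass only the analysis variables to complete.cases().
is_complete <- complete.cases(dat)

# 2. Split the data into complete and incomplete rows
dat_complete   <- dat[is_complete, ]
dat_incomplete <- dat[!is_complete, ]

# 3. Run the analysis on the complete rows and extract the scores
pca <- prcomp(dat_complete, scale. = TRUE)
scores <- as.data.frame(pca$x[, 1:2])  # keep the first two PCs

# 4. Add the scores onto the complete rows (add columns)
dat_complete <- bind_cols(dat_complete, scores)

# 5. Stack the two parts back together (add rows);
#    bind_rows() fills PC1 and PC2 with NA for the incomplete rows
dat_scored <- bind_rows(dat_complete, dat_incomplete)
```

One difference from Method 1: the incomplete rows end up at the bottom of dat_scored, so the original row order is not preserved unless you carry an ID column along and sort by it afterwards.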