13.8 Use in Multiple Regression

  • Choose a handful of principal components to use as predictors in a regression model
    • Leads to more stable regression estimates.
  • Alternative to variable selection
    • Ex: several measures of behavior.
    • Use PC\(_{1}\), or PC\(_{1}\) and PC\(_{2}\), as summary measures of all of them.
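
One way to write down what these bullets describe is sketched here. The symbols \(Y\), \(\beta\), \(\varepsilon\), \(m\), and \(p\) are notation assumed for illustration: \(p\) original predictors are replaced by the first \(m\) components.

\[
\text{PC}_{k} = a_{k1}x_{1} + a_{k2}x_{2} + \cdots + a_{kp}x_{p},
\qquad
Y = \beta_{0} + \beta_{1}\text{PC}_{1} + \cdots + \beta_{m}\text{PC}_{m} + \varepsilon,
\quad m < p
\]

Because the PC scores are uncorrelated with one another, the estimates of \(\beta_{1}, \ldots, \beta_{m}\) do not suffer from collinearity among the original \(x\)’s, which is where the added stability comes from.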

13.8.1 Example: Modeling acute illness

The 20 depression questions C1:C20 were designed to be added together to create the CESD scale directly. While this is a validated measure, what if some components (e.g. had crying spells) contribute more to someone’s level of depression than others (e.g. people were unfriendly)? Since the PC’s are linear combinations of the \(x\)’s, the coefficients \(a\), or loadings, aren’t all equal, as we’ve seen. So let’s see whether the first two PC’s (the number chosen from the scree plot) can predict chronic illness better than the straight summative score, cesd.
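
The contrast between the two summaries can be written side by side. This is a sketch: the index \(j\) and the weights \(a_{1j}\) are notation assumed here, and the items entering the PC are centered or standardized depending on how the PCA was run.

\[
\text{cesd} = \sum_{j=1}^{20} 1 \cdot C_{j}
\qquad \text{versus} \qquad
\text{PC}_{1} = \sum_{j=1}^{20} a_{1j}\, C_{j}
\]

The summative score gives every item a weight of 1, while the PC lets the data determine a separate weight for each item.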

1. Extract PC scores and attach them to the data.

The score of each observation on each PC is stored in the scores element of the pc_dep object. A few rows of scores on the first five components:

   Comp.1      Comp.2      Comp.3      Comp.4      Comp.5
-2.446342   0.6236068   0.1288289  -0.2546597  -0.1624772
-1.452116  -0.1763085   0.5861563  -0.6781969  -0.3225529
-1.468211  -0.4350019   0.2893955  -0.3243790  -0.2513590
-1.324852   1.7766419   1.0833599   1.2651869  -1.1339350
-1.449606   2.3576522  -0.7489288   1.9464680   1.2229057
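
As a rough illustration of this step, the sketch below computes scores on the first two components and attaches them to each record as new columns. It uses plain Python on made-up data, not the depression data. The names rows, items, pc1, and pc2 are hypothetical, and a small power iteration on the correlation matrix stands in for whatever routine produced pc_dep.

```python
import math
import statistics

# Hypothetical toy data: four item responses for eight respondents.
rows = [
    {"c1": 0, "c2": 1, "c3": 0, "c4": 1},
    {"c1": 1, "c2": 1, "c3": 0, "c4": 0},
    {"c1": 2, "c2": 3, "c3": 1, "c4": 0},
    {"c1": 3, "c2": 2, "c3": 2, "c4": 1},
    {"c1": 0, "c2": 0, "c3": 1, "c4": 3},
    {"c1": 1, "c2": 0, "c3": 0, "c4": 2},
    {"c1": 3, "c2": 3, "c3": 3, "c4": 2},
    {"c1": 2, "c2": 2, "c3": 1, "c4": 3},
]
items = ["c1", "c2", "c3", "c4"]
n, p = len(rows), len(items)


def standardize(values):
    """Center a column and scale it to unit sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Standardized columns, one list per item.
z = [standardize([row[name] for row in rows]) for name in items]

# Sample correlation matrix of the items.
corr = [
    [sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
    for i in range(p)
]


def leading_eigenpair(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its unit-length eigenvector."""
    size = len(matrix)
    vec = [float(i + 1) for i in range(size)]  # arbitrary non-zero start
    for _ in range(iterations):
        prod = [sum(matrix[i][j] * vec[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in prod))
        vec = [x / norm for x in prod]
    value = sum(
        vec[i] * matrix[i][j] * vec[j] for i in range(size) for j in range(size)
    )
    return value, vec


def scores_for(loadings):
    """Score of each observation: weighted sum of its standardized items."""
    return [sum(loadings[j] * z[j][i] for j in range(p)) for i in range(n)]


eigenvalue1, loadings1 = leading_eigenpair(corr)

# Deflate: subtract the first component so the next run finds the second.
deflated = [
    [corr[i][j] - eigenvalue1 * loadings1[i] * loadings1[j] for j in range(p)]
    for i in range(p)
]
eigenvalue2, loadings2 = leading_eigenpair(deflated)

pc1 = scores_for(loadings1)
pc2 = scores_for(loadings2)

# Attach the scores to the original records as new columns.
for row, s1, s2 in zip(rows, pc1, pc2):
    row["pc1"] = s1
    row["pc2"] = s2

print(rows[0])
```

The sign of each loading vector is arbitrary, so scores from a different PCA routine may come out with the opposite sign without changing their meaning.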

2. Fit a model using those PC scores as covariates.

Other covariates, chosen by other methods, can be included alongside the PC scores.
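
A minimal sketch of this step follows, assuming made-up values throughout: pc1 and pc2 are hypothetical scores, age is a hypothetical extra covariate, and y is a hypothetical numeric outcome. It fits ordinary least squares through the normal equations to keep the example self-contained. For a binary outcome such as an illness indicator, a logistic model would be the more usual choice.

```python
# Hypothetical toy values: two PC scores, one extra covariate, a numeric outcome.
pc1 = [-2.4, -1.5, -0.3, 0.2, 0.9, 1.1, 2.0]
pc2 = [0.6, -0.2, -0.4, 1.8, -1.2, 0.3, -0.9]
age = [34, 51, 28, 45, 60, 39, 55]
y = [1.2, 2.0, 2.9, 3.1, 4.4, 3.8, 5.1]


def fit_ols(predictors, outcome):
    """Least-squares fit with an intercept, solved via the normal equations."""
    design = [[1.0, *values] for values in zip(*predictors)]
    k = len(design[0])
    # Augmented system [X'X | X'y].
    system = [
        [sum(r[i] * r[j] for r in design) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(design, outcome))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(system[r][col]))
        system[col], system[pivot] = system[pivot], system[col]
        pivot_value = system[col][col]
        system[col] = [v / pivot_value for v in system[col]]
        for r in range(k):
            if r != col:
                factor = system[r][col]
                system[r] = [a - factor * b for a, b in zip(system[r], system[col])]
    return [row[k] for row in system]


coefs = fit_ols([pc1, pc2, age], y)
fitted = [
    coefs[0] + coefs[1] * a + coefs[2] * b + coefs[3] * c
    for a, b, c in zip(pc1, pc2, age)
]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print(dict(zip(["intercept", "pc1", "pc2", "age"], coefs)))
```

Fitting the same outcome on the summative score alone with the same function, and then comparing the residuals of the two fits, mirrors the comparison made in this example.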

In this example, the model using the PC’s and the model using cesd were very similar. However, this is a case where an aggregate measure such as cesd has already been developed scientifically and validated. That is often not the case, especially in exploratory data analysis, when you are not sure how the measures are correlated.