14.4 Generating PCs using R
Corresponding reading: PMA6 Ch 14.3-14.4
Principal components can be calculated in R using either prcomp() or princomp(). This section of notes uses princomp(), for no specific reason. STHDA has a good overview of the difference between prcomp() and princomp(). It appears that prcomp() may have more post-analysis visualizations available.
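To make the comparison concrete, here is a minimal sketch that runs both functions on the same data. The data frame `sim` is simulated for illustration only and is not the data from section 14.2. It relies on the documented difference that princomp() uses the divisor n for the covariance matrix while prcomp() uses n - 1.

```r
# Hypothetical simulated data; not the data set from section 14.2.
set.seed(1)
n  <- 100
X1 <- rnorm(n, mean = 100, sd = 10)
X2 <- 0.5 * X1 + rnorm(n, sd = 5)
sim <- data.frame(X1, X2)

pc_a <- princomp(sim)  # eigendecomposition of the covariance matrix, divisor n
pc_b <- prcomp(sim)    # singular value decomposition, divisor n - 1

# The standard deviations differ only by the factor sqrt((n - 1) / n).
sd_ratio <- unname(pc_a$sdev) / pc_b$sdev
sd_ratio
sqrt((n - 1) / n)

# The coefficients agree up to the sign of each column.
load_a <- abs(unname(unclass(pc_a$loadings)))
load_b <- abs(unname(pc_b$rotation))
all.equal(load_a, load_b)
```

For a sample of this size the two sets of results are nearly identical, so the choice between the functions is mostly a matter of which downstream tools you want to use.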
- Requirements of data: This must be a numeric matrix, or a data frame in which every column is numeric. Since I made up this example by generating data in section 14.2, I know all the values are numeric.
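When the data did not come from a simulation, it may help to check the column types first. The sketch below uses a hypothetical data frame `raw` with one character column, keeps only the numeric columns, and then calls princomp(). The names `raw`, `is_num`, `num_only`, and `pr_check` are made up for this example.

```r
# Hypothetical data frame with one non-numeric column, for illustration only.
set.seed(2)
raw <- data.frame(X1 = rnorm(20), X2 = rnorm(20),
                  group = rep(c("a", "b"), each = 10))

is_num <- vapply(raw, is.numeric, logical(1))  # TRUE for each numeric column
is_num

num_only <- raw[, is_num, drop = FALSE]        # keep the numeric columns only
pr_check <- princomp(num_only)
```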
pr <- princomp(data)
summary(pr)
## Importance of components:
## Comp.1 Comp.2
## Standard deviation 11.4019265 4.2236767
## Proportion of Variance 0.8793355 0.1206645
## Cumulative Proportion 0.8793355 1.0000000
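The three rows of the summary are linked: each component's variance is its standard deviation squared, and the proportion of variance is that variance divided by the total. A short sketch, using the standard deviations copied from the output above, reproduces the other two rows.

```r
# Standard deviations copied from the summary output above.
sdev <- c(Comp.1 = 11.4019265, Comp.2 = 4.2236767)

variances <- sdev^2                      # variance of each component
prop_var  <- variances / sum(variances)  # proportion of variance
cum_prop  <- cumsum(prop_var)            # cumulative proportion

round(prop_var, 3)
round(cum_prop, 3)
```

With a fitted object, the same calculation would start from pr$sdev instead of typed-in values.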
- The summary output above shows that the first PC (Comp.1) explains the highest proportion of variance, about 88%.
- The values of the matrix \(\mathbf{A}\) are contained in pr$loadings.
pr$loadings
##
## Loadings:
## Comp.1 Comp.2
## X1 0.854 0.519
## X2 0.519 -0.854
##
## Comp.1 Comp.2
## SS loadings 1.0 1.0
## Proportion Var 0.5 0.5
## Cumulative Var 0.5 1.0
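The "SS loadings" row is 1.0 for every component because each column of the loadings matrix has unit length, and the columns are orthogonal to each other. The sketch below checks this on simulated data (the object `sim` is hypothetical, not the data from section 14.2) and also shows that the component scores are the centered data multiplied by the loadings.

```r
# Hypothetical simulated data, for illustration only.
set.seed(3)
X1 <- rnorm(50, sd = 10)
X2 <- 0.6 * X1 + rnorm(50, sd = 4)
sim <- data.frame(X1, X2)
pr_sim <- princomp(sim)

A <- unclass(pr_sim$loadings)  # plain matrix of coefficients
colSums(A^2)                   # sum of squared loadings for each component
gram <- crossprod(A)           # t(A) %*% A, the identity if A is orthonormal

# Scores are the centered data projected onto the columns of A.
centered      <- scale(sim, center = TRUE, scale = FALSE)
scores_manual <- centered %*% A
all.equal(unname(scores_manual), unname(pr_sim$scores))
```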
To visualize these new axes, we plot the centered data.
a <- pr$loadings
x1 <- with(data, X1 - mean(X1))
x2 <- with(data, X2 - mean(X2))

plot(c(-40, 40), c(-20, 20), type="n",xlab="x1", ylab="x2")
points(x=x1, y=x2, pch=16)
abline(0, a[2,1]/a[1,1]); text(30, 10, expression(C[1]))
abline(0, a[2,2]/a[1,2]); text(-10, 20, expression(C[2]))
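The slopes passed to abline() come straight from the loadings: for each component, the slope is the X2 coefficient divided by the X1 coefficient. As a small sketch, the rounded loadings printed earlier can be typed in by hand to see the two slopes and confirm that the axes are perpendicular.

```r
# Loadings copied (rounded) from the output above.
a <- matrix(c(0.854,  0.519,
              0.519, -0.854),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("X1", "X2"), c("Comp.1", "Comp.2")))

slope_c1 <- a[2, 1] / a[1, 1]  # change in x2 per unit of x1 along C1
slope_c2 <- a[2, 2] / a[1, 2]  # change in x2 per unit of x1 along C2

slope_c1 * slope_c2            # perpendicular lines have slopes whose product is -1
```

The two lines only look perpendicular on screen when both plot axes use the same scale, for example by passing asp = 1 to plot().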
Plotting the original data on the new axes, we see that PC1 and PC2 are uncorrelated. The red vectors show the directions of the original coordinate axes (X1 and X2).
biplot(pr)
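The claim that the components are uncorrelated can also be checked numerically. The following sketch uses simulated data (the object `sim` is hypothetical, not the data from section 14.2) and compares the correlation of the original variables with the correlation of the component scores stored in the scores element of the fitted object.

```r
# Hypothetical simulated data, for illustration only.
set.seed(4)
X1 <- rnorm(80, sd = 10)
X2 <- 0.6 * X1 + rnorm(80, sd = 4)
sim <- data.frame(X1, X2)
pr_sim <- princomp(sim)

cor_orig   <- cor(sim$X1, sim$X2)                          # clearly non-zero
cor_scores <- cor(pr_sim$scores[, 1], pr_sim$scores[, 2])  # zero up to rounding error

cor_orig
cor_scores
```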