## 14.4 R commands

Principal components can be calculated in R with prcomp(), princomp(), or functions from the factoextra package. This section of the notes uses princomp() to generate the PC’s and helper functions from the factoextra package. STHDA is a great reference for these functions.

### 14.4.1 Generating PC’s

The matrix that is used in princomp must be fully numeric.

pr <- princomp(data)
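
As a minimal sketch of the numeric-only requirement, the following drops non-numeric columns before calling princomp(). The built-in iris data set and the names dat, dat_num and pr_iris are stand-ins chosen for illustration; this is not the data behind the output shown in these notes.

```r
# iris is a stand-in data set (an assumption, not the data used in these notes)
dat <- iris

# flag the numeric columns; Species is a factor and must be left out
num_cols <- vapply(dat, is.numeric, logical(1))
dat_num  <- dat[, num_cols]

# PCA on the covariance matrix (the princomp() default)
pr_iris <- princomp(dat_num)

# one component is produced per numeric variable
length(pr_iris$sdev)
```

If the variables are on very different scales, princomp(dat_num, cor = TRUE) runs the PCA on the correlation matrix instead.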

### 14.4.2 Viewing the amount of variance contained by each PC

Use summary or get_eigenvalue to see the variance breakdown.

summary(pr)
## Importance of components:
##                            Comp.1    Comp.2
## Standard deviation     11.4019265 4.2236767
## Proportion of Variance  0.8793355 0.1206645
## Cumulative Proportion   0.8793355 1.0000000
factoextra::get_eigenvalue(pr)
##       eigenvalue variance.percent cumulative.variance.percent
## Dim.1  130.00393         87.93355                    87.93355
## Dim.2   17.83944         12.06645                   100.00000

The first PC (Comp.1) will always explain the highest proportion of variance (by mathematical design).
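
The two tables above carry the same information: each eigenvalue is the squared standard deviation of its component (for example, 11.4019265 squared is about 130.004). The sketch below rebuilds the variance breakdown by hand from pr$sdev. It uses simulated two-variable data, so sim and pr_sim are hypothetical names and the numbers will not match the output above.

```r
set.seed(1)
# simulated two-variable data (hypothetical; not the data used in these notes)
x1  <- rnorm(100, sd = 10)
x2  <- 0.5 * x1 + rnorm(100, sd = 5)
sim <- data.frame(X1 = x1, X2 = x2)

pr_sim <- princomp(sim)

eigenvalue <- pr_sim$sdev^2                 # variance of each PC
prop_var   <- eigenvalue / sum(eigenvalue)  # "Proportion of Variance" row
cum_var    <- cumsum(prop_var)              # "Cumulative Proportion" row

round(cbind(eigenvalue, prop_var, cum_var), 4)
```

Because the components are ordered by decreasing eigenvalue, the first entry of prop_var is always the largest.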

#### 14.4.3.1 As a matrix of values

• The values of the matrix $$\mathbf{A}$$ are contained in pr$loadings. Alternatively, the loadings() function will extract this matrix.

pr$loadings
##
##    Comp.1 Comp.2
## X1  0.854  0.519
## X2  0.519 -0.854
##
##                Comp.1 Comp.2
## Proportion Var    0.5    0.5
## Cumulative Var    0.5    1.0

$$
\begin{aligned}
C_{1} &= 0.854X_1 + 0.519X_2 \\
C_{2} &= 0.519X_1 - 0.854X_2
\end{aligned}
$$
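
To connect the equations to the fitted object, the sketch below applies the loadings to centered data and checks the result against the scores that princomp() stores. It assumes the X's in the equations are centered at their means, which is what princomp() does internally. The data are simulated, so sim, pr_sim and scores_by_hand are hypothetical names.

```r
set.seed(1)
# simulated two-variable data (hypothetical; not the data used in these notes)
x1  <- rnorm(100, sd = 10)
x2  <- 0.5 * x1 + rnorm(100, sd = 5)
sim <- data.frame(X1 = x1, X2 = x2)

pr_sim <- princomp(sim)

# loadings as a plain numeric matrix: one column of coefficients per PC
A <- unclass(pr_sim$loadings)

# subtract the column means that princomp() used
centered <- scale(sim, center = pr_sim$center, scale = FALSE)

# each PC score is a linear combination of the centered variables
scores_by_hand <- centered %*% A

# should agree with the scores stored in the fitted object
all.equal(unname(scores_by_hand), unname(pr_sim$scores))
```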

#### 14.4.3.2 As a vector plot

To visualize how these two PC’s define new axes, we plot the centered data.
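
One way to draw such a plot in base R is sketched below: the centered points are plotted and each column of the loading matrix is drawn as an arrow from the origin. This is an illustration on simulated data, not the plot from these notes; the arrow length of two standard deviations and the colors are arbitrary choices.

```r
set.seed(1)
# simulated two-variable data (hypothetical; not the data used in these notes)
x1  <- rnorm(100, sd = 10)
x2  <- 0.5 * x1 + rnorm(100, sd = 5)
sim <- data.frame(X1 = x1, X2 = x2)

pr_sim   <- princomp(sim)
A        <- unclass(pr_sim$loadings)
centered <- scale(sim, center = TRUE, scale = FALSE)

# asp = 1 keeps the two PC directions visually perpendicular
plot(centered, asp = 1, pch = 16, col = "grey50",
     xlab = "X1 (centered)", ylab = "X2 (centered)")

# draw each PC as an arrow from the origin, two standard deviations long
for (j in 1:2) {
  tip <- 2 * pr_sim$sdev[j] * A[, j]
  arrows(0, 0, tip[1], tip[2], lwd = 2, col = c("red", "blue")[j])
}
```

The first arrow points along the direction of greatest spread in the cloud of points, and the second is perpendicular to it.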

#### 14.4.3.4 As a strength of representation

fviz_contrib() plots the contribution of rows/columns to the PC’s. For a given dimension, any row/column with a contribution above the reference line can be considered important in contributing to that dimension.

factoextra::fviz_contrib(pr, choice = "var", axes = 1)

X1 contributes more than half of the information in PC1 (its squared loading, 0.854^2, is about 0.73), so it contributes considerably more than X2.
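
The contributions can also be approximated by hand. The sketch below assumes that a variable's contribution to a component is its share of the squared loadings on that component, expressed as a percentage; this is my reading of how factoextra computes it rather than something stated in these notes. The matrix A is typed in from the rounded loadings printed earlier.

```r
# loadings copied from the rounded output above
A <- matrix(c(0.854, 0.519,
              0.519, -0.854),
            nrow = 2,
            dimnames = list(c("X1", "X2"), c("Comp.1", "Comp.2")))

# percentage contribution: squared loading over the column total
contrib <- 100 * sweep(A^2, 2, colSums(A^2), "/")

round(contrib, 1)

# with two variables, an equal split would give each one 50%
100 / nrow(A)
```

Under that assumption, X1 accounts for roughly 73% of PC1, which is above the 50% that an equal split would give.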

#### 14.4.3.5 As a correlation circle

With only 2 PC’s this isn’t that informative. The later example and the vignette are likely more helpful.

See STHDA correlation circle for detailed information.

factoextra::fviz_pca_var(pr, col.var = "contrib", axes = c(1, 2))
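
A correlation circle places each variable at its correlation with the two chosen PC’s. The sketch below computes those coordinates directly, under two assumptions that differ from the example above: the PCA is run on the correlation matrix (cor = TRUE), and the data are simulated. The names sim, pr_cor and circle_xy are hypothetical.

```r
set.seed(1)
# simulated two-variable data (hypothetical; not the data used in these notes)
x1  <- rnorm(100, sd = 10)
x2  <- 0.5 * x1 + rnorm(100, sd = 5)
sim <- data.frame(X1 = x1, X2 = x2)

# PCA on the correlation matrix, so variables are standardized first
pr_cor <- princomp(sim, cor = TRUE)

# coordinates on the circle: correlation of each variable with each PC
circle_xy <- cor(sim, pr_cor$scores)

# the same values follow from loading times component standard deviation
by_loading <- sweep(unclass(pr_cor$loadings), 2, pr_cor$sdev, "*")

all.equal(unname(circle_xy), unname(by_loading))
```

With all components retained, the squared coordinates of each variable sum to 1, so with two PC’s every arrow reaches the edge of the circle. That is one reason the plot says little in the two-variable case.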