## 14.3 More Generally

We want

• From $$P$$ original variables $$X_{1}, \ldots , X_{P}$$ get $$P$$ principal components $$C_{1}, \ldots , C_{P}$$
• Where each $$C_{j}$$ is a linear combination of the $$X_{i}$$’s: $$C_{j} = a_{j1}X_{1} + a_{j2}X_{2} + \ldots + a_{jP}X_{P}$$
• The coefficients are chosen such that $$Var(C_{1}) \geq Var(C_{2}) \geq \ldots \geq Var(C_{P})$$
• Variance is a measure of information. Consider modeling prostate cancer:
• Gender has zero variance (every patient is male), so it carries no information.
• Size of tumor has variance > 0, so it provides useful information.
• Any two PC’s are uncorrelated: $$Cov(C_{i}, C_{j})=0, \quad \forall i \neq j$$

We have

$\left[ \begin{array}{r} C_{1} \\ C_{2} \\ \vdots \\ C_{P} \end{array} \right] = \left[ \begin{array}{cccc} a_{11} & a_{12} & \ldots & a_{1P} \\ a_{21} & a_{22} & \ldots & a_{2P} \\ \vdots & \vdots & \ddots & \vdots \\ a_{P1} & a_{P2} & \ldots & a_{PP} \end{array} \right] \left[ \begin{array}{r} X_{1} \\ X_{2} \\ \vdots \\ X_{P} \end{array} \right]$

• Hotelling (1933) showed that the coefficient vectors $$\mathbf{a}_{j} = (a_{j1}, \ldots, a_{jP})$$ — the rows of the matrix above — are solutions to $$(\mathbf{\Sigma} -\lambda\mathbf{I})\mathbf{a}=\mathbf{0}$$.
• $$\mathbf{\Sigma}$$ is the variance-covariance matrix of the $$\mathbf{X}$$ variables.
• This means $$\lambda$$ is an eigenvalue and $$\mathbf{a}$$ an eigenvector of the covariance matrix $$\mathbf{\Sigma}$$.
• Problem: there are infinitely many possible $$\mathbf{a}$$’s, since any scalar multiple of an eigenvector is also an eigenvector.
• Solution: choose the $$a_{ij}$$’s so that each eigenvector has unit length, i.e., $$\sum_{i} a_{ji}^{2} = 1$$.
• This yields $$P$$ eigenvalues $$\lambda_{1} \geq \lambda_{2} \geq \ldots \geq \lambda_{P}$$ with $$P$$ corresponding eigenvectors, and $$Var(C_{j}) = \lambda_{j}$$.
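The eigendecomposition above can be sketched in a few lines of NumPy. The data matrix below is made up purely for illustration; the point is that solving $$(\mathbf{\Sigma} - \lambda\mathbf{I})\mathbf{a} = \mathbf{0}$$ amounts to an eigendecomposition of the covariance matrix, with eigenvectors normalized to unit length:

```python
import numpy as np

# Hypothetical data: n = 200 observations of P = 3 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.5]])

# Variance-covariance matrix Sigma of the X variables.
Sigma = np.cov(X, rowvar=False)

# Solve (Sigma - lambda I) a = 0: eigendecomposition of Sigma.
# eigh returns eigenvalues in ascending order, so reorder to get
# Var(C_1) >= Var(C_2) >= ... >= Var(C_P).
eigenvalues, eigenvectors = np.linalg.eigh(Sigma)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
A = eigenvectors[:, order].T  # rows of A are the coefficient vectors a_j

# Each eigenvector has unit length: sum of squared coefficients = 1.
print(np.sum(A**2, axis=1))  # each entry is 1 up to floating point
```

`np.linalg.eigh` is used rather than `np.linalg.eig` because $$\mathbf{\Sigma}$$ is symmetric, which guarantees real eigenvalues and orthogonal eigenvectors.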

Which gives us

• Variances of the $$C_{j}$$’s add up to the sum of the variances of the original variables (total variance).
• This can be thought of as a decomposition of the total variance into orthogonal (uncorrelated) components.
• With $$Var(C_{1}) \geq Var(C_{2}) \geq \ldots \geq Var(C_{P})$$.
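Both properties — total variance is preserved, and the components are uncorrelated — can be verified numerically. A minimal check, again on made-up data:

```python
import numpy as np

# Hypothetical data with unequal variances across the 3 variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3)) * np.array([3.0, 1.0, 0.5])

# Eigendecomposition of the covariance matrix, sorted by variance.
Sigma = np.cov(X, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(Sigma)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, A = eigenvalues[order], eigenvectors[:, order].T

# Components: C_j = a_j1 X_1 + ... + a_jP X_P, i.e. C = A X row by row.
C = X @ A.T

# Total variance is preserved: sum Var(C_j) = sum Var(X_i) = trace(Sigma).
print(np.allclose(np.var(C, axis=0, ddof=1).sum(), np.trace(Sigma)))  # True

# Components are uncorrelated: Cov(C_i, C_j) = 0 for all i != j.
cov_C = np.cov(C, rowvar=False)
print(np.allclose(cov_C, np.diag(np.diag(cov_C))))  # True
```

The off-diagonal entries of `cov_C` vanish (up to floating point) because $$Cov(\mathbf{C}) = \mathbf{A}\mathbf{\Sigma}\mathbf{A}^{T}$$ is diagonal when the rows of $$\mathbf{A}$$ are the eigenvectors of $$\mathbf{\Sigma}$$.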