## 10.3 Multicollinearity

• Multicollinearity occurs when some of the X variables are highly intercorrelated.
• As a result, the computed estimates of the regression coefficients are unstable and have large standard errors.
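A small simulation makes the second point concrete. The sketch below uses simulated data. The function name `slope_se`, the sample size, and the true coefficients are illustrative assumptions. It fits the same model twice, first with uncorrelated predictors and then with predictors correlated at about 0.99, and compares the standard error of the slope on `x1`.

```r
# Simulate two predictors with correlation close to `rho` and return the
# standard error of the OLS slope on x1 (illustrative helper, not from a package).
slope_se <- function(rho, n = 200, seed = 1) {
  set.seed(seed)
  x1 <- rnorm(n)
  # x2 = rho * x1 + independent noise, so cor(x1, x2) is close to rho
  x2 <- rho * x1 + sqrt(1 - rho^2) * rnorm(n)
  y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)
  fit <- lm(y ~ x1 + x2)
  summary(fit)$coefficients["x1", "Std. Error"]
}

se_low  <- slope_se(rho = 0)     # nearly uncorrelated predictors
se_high <- slope_se(rho = 0.99)  # highly intercorrelated predictors
c(low = se_low, high = se_high)
```

Because the same seed is used, the residuals are identical in both fits in this setup. The much larger standard error in the second fit therefore comes from the correlation between `x1` and `x2` alone.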

To see why, note that the squared standard error of the $$i$$th slope coefficient, $$[SE(\beta_{i})]^2$$, can be written as:

$$[SE(\beta_{i})]^2 = \frac{S^{2}}{(N-1)S_{i}^{2}} \cdot \frac{1}{1 - R_{i}^{2}}$$

where $$S^{2}$$ is the residual mean square, $$S_{i}$$ the standard deviation of $$X_{i}$$, and $$R_{i}$$ the multiple correlation between $$X_{i}$$ and all other $$X$$’s (so $$R_{i}^{2}$$ is the $$R^{2}$$ obtained by regressing $$X_{i}$$ on the other X variables).
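The formula can be checked numerically. The sketch below uses R's built-in `mtcars` data purely as a stand-in for the chapter's data; choosing `wt`, `disp`, and `hp` as predictors and `wt` as $$X_{i}$$ is an illustrative assumption. It compares the standard error computed from the formula with the one reported by `summary()`.

```r
# Stand-in data: built-in mtcars, with X_i = wt and the other X's = disp, hp
fit <- lm(mpg ~ wt + disp + hp, data = mtcars)

N   <- nrow(mtcars)
S2  <- summary(fit)$sigma^2    # residual mean square, S^2
Si  <- sd(mtcars$wt)           # standard deviation of X_i
# R_i^2: R^2 from regressing X_i on all the other X's
Ri2 <- summary(lm(wt ~ disp + hp, data = mtcars))$r.squared

se_formula <- sqrt(S2 / ((N - 1) * Si^2) * 1 / (1 - Ri2))
se_lm      <- summary(fit)$coefficients["wt", "Std. Error"]
all.equal(se_formula, se_lm)   # TRUE up to numerical tolerance
```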

When $$R_{i}$$ is close to 1, $$1 - R_{i}^{2}$$ is close to 0, so $$\frac{1}{1 - R_{i}^{2}}$$ becomes very large and the standard error of $$\beta_{i}$$ is inflated accordingly.
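A quick calculation shows how sharply this factor grows as $$R_{i}$$ approaches 1. Since the standard error scales with the square root of the factor, $$R_{i} = 0.99$$ makes the standard error roughly 7 times larger than it would be with uncorrelated predictors ($$\sqrt{50.25} \approx 7.1$$).

```r
# How 1 / (1 - R_i^2) grows as R_i approaches 1
R_i <- c(0.5, 0.9, 0.95, 0.99)
inflation <- 1 / (1 - R_i^2)
round(inflation, 2)   # approximately 1.33, 5.26, 10.26, 50.25
```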

This factor, $$\frac{1}{1 - R_{i}^{2}}$$, is called the variance inflation factor (VIF) of $$X_{i}$$, and most regression diagnostic tools report it.
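The VIF can also be computed directly from its definition. The helper `vif_manual()` below is a hypothetical function written for illustration, not part of any package: it regresses each predictor on the remaining ones and returns $$1/(1 - R_{i}^{2})$$. It again uses `mtcars` as stand-in data.

```r
# Hypothetical helper: VIF_i = 1 / (1 - R_i^2), where R_i^2 comes from
# regressing predictor X_i on all the other predictors.
vif_manual <- function(data, predictors) {
  sapply(predictors, function(x) {
    others <- setdiff(predictors, x)
    f  <- reformulate(others, response = x)   # e.g. wt ~ disp + hp
    r2 <- summary(lm(f, data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_manual(mtcars, c("wt", "disp", "hp"))
round(vifs, 2)
```

For numeric predictors, the same values are the diagonal of the inverse of the predictors' correlation matrix, `diag(solve(cor(...)))`, which gives a quick cross-check.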

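```r
# Fit a model with three numeric predictors of body mass
# (assumes `pen` holds the Palmer penguins data, e.g. palmerpenguins::penguins)
big.pen.model <- lm(body_mass_g ~ bill_length_mm + bill_depth_mm + flipper_length_mm,
                    data = pen)

# Report the VIF of each predictor and plot it (plot() needs the `see` package)
performance::check_collinearity(big.pen.model) |> plot()
```

• Solution: use variable selection to drop some of the X variables.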
• Alternatively, use dimension reduction techniques such as Principal Components (Chapter 14).
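Both remedies can be sketched on simulated data. In the example below, the variables `x1`, `x2`, `x3` are hypothetical, with `x2` built as a near copy of `x1`. Dropping `x2` brings the VIFs back close to 1. Replacing the predictors by their principal component scores removes the correlation entirely, because the scores are uncorrelated by construction. The helper `vif_from_cor()` is illustrative and relies on the fact that the VIFs are the diagonal of the inverse correlation matrix.

```r
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # nearly a copy of x1
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# Illustrative helper: VIF_i is the i-th diagonal element of solve(cor(X))
vif_from_cor <- function(d) diag(solve(cor(d)))

vif_all  <- vif_from_cor(X)                    # x1 and x2 are heavily inflated
vif_drop <- vif_from_cor(X[, c("x1", "x3")])   # remedy 1: drop x2

# Remedy 2: use principal component scores, which are mutually uncorrelated
pcs     <- prcomp(X, scale. = TRUE)$x
vif_pca <- vif_from_cor(as.data.frame(pcs))    # all 1 up to floating-point error
```

The cost of the principal components route is interpretability: each component is a linear combination of all of the original X variables.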