9.2 Parameter Estimation

Recall that the goal of regression analysis is to minimize the unexplained, or residual, error: the difference between the observed value of the dependent variable and the value predicted by the model,

\[ y_{i} - \hat{y}_{i}, \]

where the predicted values \(\hat{y}_{i}\) are calculated as

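\[\hat{y}_{i} = \sum_{j=1}^{p}X_{ij}\beta_{j}.\]

Here \(X_{ij}\) is the value of the \(j\)th explanatory variable for observation \(i\), \(\beta_{j}\) is the coefficient of that variable, and \(p\) is the number of explanatory variables (including a constant column if the model has an intercept).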
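As a concrete illustration, the short Python sketch below evaluates this sum for every observation. The design matrix `X`, the coefficients `beta`, and the helper name `predict` are hypothetical, chosen only for illustration; the column of ones in `X` is an assumption that lets \(\beta_{1}\) play the role of an intercept.

```python
# Hypothetical data for illustration: n = 3 observations, p = 2 columns.
# The first column of ones is assumed to act as an intercept.
X = [
    [1.0, 2.0],
    [1.0, 3.0],
    [1.0, 5.0],
]
beta = [0.5, 2.0]  # hypothetical coefficients beta_1, beta_2


def predict(X, beta):
    """Return the fitted values y_hat_i = sum_j X[i][j] * beta[j] for each row i."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]


y_hat = predict(X, beta)
print(y_hat)  # [4.5, 6.5, 10.5]
```

Each fitted value is simply a weighted sum of one row of `X`, with the coefficients as weights.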

The sum of the squared residuals (each residual being the distance between the observed value \(y_{i}\) and the fitted value \(\hat{y}_{i}\)) then has the following form:

\[ \sum_{i=1}^{n} \left( y_{i} - \sum_{j=1}^{p}X_{ij}\beta_{j} \right)^{2} \]

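Or, in matrix notation,

\[ \lVert \mathbf{Y} - \mathbf{X}\boldsymbol{\beta} \rVert^{2}, \]

where \(\mathbf{Y}\) is the \(n \times 1\) vector of observed values \(y_{i}\), \(\mathbf{X}\) is the \(n \times p\) matrix with entries \(X_{ij}\), and \(\boldsymbol{\beta}\) is the \(p \times 1\) vector of coefficients \(\beta_{j}\).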
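The summation form and the matrix form describe the same quantity. The following sketch computes it both ways: once term by term, following the double sum, and once as the squared Euclidean norm of the residual vector. The data and the helper names `fitted`, `rss_sum`, and `rss_norm` are hypothetical and serve only to illustrate the equivalence.

```python
import math

# Hypothetical design matrix, coefficients, and observed responses.
X = [
    [1.0, 2.0],
    [1.0, 3.0],
    [1.0, 5.0],
]
beta = [0.5, 2.0]
y = [5.0, 6.0, 11.0]


def fitted(row, beta):
    """Fitted value for one observation: sum_j X_ij * beta_j."""
    return sum(x_ij * b_j for x_ij, b_j in zip(row, beta))


def rss_sum(X, y, beta):
    """Sum over i of (y_i - sum_j X_ij * beta_j)^2, written as an explicit loop."""
    total = 0.0
    for row, y_i in zip(X, y):
        total += (y_i - fitted(row, beta)) ** 2
    return total


def rss_norm(X, y, beta):
    """||Y - X beta||^2: squared Euclidean norm of the residual vector."""
    residuals = [y_i - fitted(row, beta) for row, y_i in zip(X, y)]
    # math.hypot accepts any number of arguments in Python 3.8+
    return math.hypot(*residuals) ** 2


print(rss_sum(X, y, beta))   # 0.75
print(rss_norm(X, y, beta))  # approximately 0.75 (floating-point rounding)
```

Each of the three residuals here is \(\pm 0.5\), so both computations give \(3 \times 0.25 = 0.75\); the norm-based version can differ only in the last few floating-point digits.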

Finding the coefficients \(\beta_{j}\) that minimize this quantity, that is, solving the least squares problem for multiple regression, requires multivariable calculus and linear algebra, so the derivation is left to a course in mathematical statistics.
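Although the derivation is beyond this section, a standard result is that, when \(\mathbf{X}^{T}\mathbf{X}\) is invertible, the minimizing coefficients satisfy the normal equations \(\mathbf{X}^{T}\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^{T}\mathbf{Y}\). The pure-Python sketch below forms and solves those equations by Gaussian elimination, only to show that the minimizer can be computed; the helper names (`least_squares`, `solve`, `matmul`, `transpose`) and the data are hypothetical. In practice, statistical software or a library routine such as NumPy's `numpy.linalg.lstsq`, which avoids forming \(\mathbf{X}^{T}\mathbf{X}\) and is more numerically stable, would be the usual choice.

```python
def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Matrix product A B for matrices stored as lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b_i] for row, b_i in zip(A, b)]  # augmented matrix [A | b]
    for k in range(n):
        # Choose the row with the largest entry in column k as the pivot.
        pivot = max(range(k, n), key=lambda r: abs(M[r][k]))
        if abs(M[pivot][k]) < 1e-12:
            raise ValueError("X^T X is (numerically) singular; no unique solution")
        M[k], M[pivot] = M[pivot], M[k]
        # Eliminate column k from the rows below the pivot.
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


def least_squares(X, y):
    """Coefficients minimizing ||y - X beta||^2, via the normal equations."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x_ij * y_i for x_ij, y_i in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)


# Hypothetical data lying exactly on the line y = 1 + 2x.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [3.0, 5.0, 7.0, 9.0]
print(least_squares(X, y))  # approximately [1.0, 2.0]
```

Because these hypothetical points lie exactly on \(y = 1 + 2x\), the fitted coefficients recover 1 and 2; with noisy data the same function returns the coefficients that make the sum of squared residuals as small as possible.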