7.2 Mathematical Model
The mathematical model that we use for regression has three features.
- \(Y\) values are normally distributed at any given \(X\).
- The mean of \(Y\) values at any given \(X\) follows a straight line, \(\mu_{Y|X} = \beta_{0} + \beta_{1} X\).
- The variance of \(Y\) values at any \(X\) is \(\sigma^2\) (the same for all \(X\)). This is known as homoscedasticity, or homogeneity of variance.
Mathematically this is written as:
\[
\begin{aligned}
Y|X &\sim N(\mu_{Y|X}, \sigma^{2}) \\
\mu_{Y|X} &= \beta_{0} + \beta_{1} X \\
\operatorname{Var}(Y|X) &= \sigma^{2}
\end{aligned}
\]
and can be visualized as:
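The same three features can also be checked numerically. The following is a minimal sketch, not part of the original text: the parameter values (`BETA0`, `BETA1`, `SIGMA`), the helper `draw_y`, and the chosen values of \(X\) are all hypothetical and picked only for illustration. It draws many \(Y\) values at a few fixed values of \(X\) and compares the sample mean with the line and the sample variance with \(\sigma^2\).

```python
import random
import statistics

# Hypothetical parameter values, chosen only for illustration.
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.5


def draw_y(x, n, rng):
    """Draw n values of Y at a fixed x from N(beta0 + beta1 * x, sigma^2)."""
    mu = BETA0 + BETA1 * x
    return [rng.gauss(mu, SIGMA) for _ in range(n)]


rng = random.Random(42)  # fixed seed so the run is reproducible
summary = {}
for x in (0, 5, 10):
    ys = draw_y(x, 20000, rng)
    summary[x] = (statistics.mean(ys), statistics.variance(ys))
    mean_y, var_y = summary[x]
    print(
        f"x={x}: sample mean={mean_y:.2f} (line gives {BETA0 + BETA1 * x:.2f}), "
        f"sample variance={var_y:.2f} (sigma^2={SIGMA ** 2:.2f})"
    )
```

Under these assumptions, the sample means should move along the line as \(X\) changes, while the sample variances should stay close to the same \(\sigma^2\) at every \(X\).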
7.2.1 Unifying model framework
The mathematical model above describes the theoretical relationship between \(Y\) and \(X\). To describe observed data, we place this model in our unifying model framework:
DATA = MODEL + RESIDUAL
Our observed data values \(y_{i}\) can be modeled as being centered on \(\mu_{Y|X}\), with normally distributed residuals.
\[
\begin{aligned}
y_{i} &= \beta_{0} + \beta_{1} x_{i} + \epsilon_{i} \\
\epsilon_{i} &\sim N(0, \sigma^{2})
\end{aligned}
\]
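To make the DATA = MODEL + RESIDUAL decomposition concrete, here is a minimal sketch under stated assumptions. The parameter values, the sample size, and the helper names `simulate` and `fit_line` are hypothetical. Ordinary least squares is used here only as a convenient way to estimate the line; it is an assumption of this sketch, not something this section has introduced.

```python
import random
import statistics


def simulate(xs, beta0, beta1, sigma, rng):
    """DATA = MODEL + RESIDUAL: y_i = beta0 + beta1 * x_i + eps_i."""
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


def fit_line(xs, ys):
    """Ordinary least squares estimates (b0, b1) for a straight line."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


rng = random.Random(7)  # fixed seed so the run is reproducible
xs = [rng.uniform(0, 10) for _ in range(5000)]
# Hypothetical true values: beta0 = 2.0, beta1 = 0.5, sigma = 1.5.
ys = simulate(xs, beta0=2.0, beta1=0.5, sigma=1.5, rng=rng)

b0, b1 = fit_line(xs, ys)
fitted = [b0 + b1 * x for x in xs]                # MODEL part
residuals = [y - f for y, f in zip(ys, fitted)]   # RESIDUAL part

print(f"estimated intercept={b0:.3f}, slope={b1:.3f}")
print(f"residual mean={statistics.mean(residuals):.3f}, "
      f"residual sd={statistics.stdev(residuals):.3f}")
```

Each observed value is recovered exactly as fitted value plus residual, and with a sample this large the residuals should be centered on zero with a spread close to the assumed \(\sigma\).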