7.2 Mathematical Model

The mathematical model that we use for regression has three features.

  1. Y values are normally distributed at any given X.
  2. The mean of Y values at any given X follows a straight line $Y = \beta_0 + \beta_1 X$.
  3. The variance of Y values at any X is $\sigma^2$ (the same for all X). This is known as homoscedasticity, or homogeneity of variance.

Mathematically this is written as:

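$$
\begin{aligned}
Y \mid X &\sim N(\mu_{Y|X}, \sigma^2) \\
\mu_{Y|X} &= \beta_0 + \beta_1 X \\
\mathrm{Var}(Y \mid X) &= \sigma^2
\end{aligned}
$$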

and can be visualized as:

Figure 6.2
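
The three features of the model can also be checked by simulation. The sketch below is only an illustration: the values of `beta0`, `beta1`, and `sigma`, the chosen X values, and the helper `draw_y_given_x` are all hypothetical and do not come from the text. It draws many Y values at a few fixed X values and compares the sample mean and variance with the theoretical ones.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
beta0, beta1, sigma = 2.0, 0.5, 1.0

rng = np.random.default_rng(seed=0)


def draw_y_given_x(x, n):
    """Draw n values of Y from N(beta0 + beta1 * x, sigma^2)."""
    mu = beta0 + beta1 * x  # mean of Y at this X lies on the line
    return rng.normal(loc=mu, scale=sigma, size=n)


for x in (0.0, 5.0, 10.0):
    y = draw_y_given_x(x, n=100_000)
    print(
        f"x={x:5.1f}  theoretical mean={beta0 + beta1 * x:5.2f}  "
        f"sample mean={y.mean():5.2f}  sample variance={y.var(ddof=1):5.2f}"
    )
```

Under these assumptions the sample mean should move along the line as X changes, while the sample variance should stay close to `sigma**2` at every X, which is the homoscedasticity feature.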

7.2.1 Unifying model framework

The mathematical model above describes the theoretical relationship between Y and X. Within our unifying model framework for describing observed data,

DATA = MODEL + RESIDUAL

the observed data values $y_i$ can be modeled as being centered on $\mu_{Y|X}$, with normally distributed residuals:

$$
\begin{aligned}
y_i &= \beta_0 + \beta_1 X + \epsilon_i \\
\epsilon_i &\sim N(0, \sigma^2)
\end{aligned}
$$
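
As a rough illustration of DATA = MODEL + RESIDUAL, the sketch below simulates observations from this model and splits each one into a model part and a residual part. The parameter values, the sample size, the uniform spread of the x values, and the use of `np.polyfit` for a least-squares fit are all assumptions made for this example. They are not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical "true" parameters and sample size, for illustration only.
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = 200

# Simulate data: y_i = beta0 + beta1 * x_i + eps_i, with eps_i ~ N(0, sigma^2).
x = rng.uniform(0, 10, size=n)
eps = rng.normal(loc=0.0, scale=sigma, size=n)
y = beta0 + beta1 * x + eps

# Estimate the line by least squares (one common choice, assumed here).
# np.polyfit returns coefficients from highest degree to lowest.
b1, b0 = np.polyfit(x, y, deg=1)

model = b0 + b1 * x      # MODEL: fitted mean of Y at each x
residual = y - model     # RESIDUAL: what the line does not explain

# DATA = MODEL + RESIDUAL holds exactly by construction.
assert np.allclose(y, model + residual)

print(f"estimated intercept: {b0:.3f}, estimated slope: {b1:.3f}")
print(f"residual mean: {residual.mean():.3e}, residual SD: {residual.std(ddof=2):.3f}")
```

The decomposition itself is exact for any fitted line. What the model adds is the claim that the residuals behave like draws from a normal distribution with mean 0 and the same variance at every X.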