Chapter 10 Generalized Linear Models

One of the primary assumptions of linear regression is that the error terms follow a specific distribution, namely:

\[ \epsilon_{i} \sim \mathcal{N}(0, \sigma^{2}) \qquad i=1, \ldots, n, \quad \mbox{and } \epsilon_{i} \perp \epsilon_{j}, i \neq j \]

When your outcome variable \(y\) is not continuous, or is clearly not normally distributed, the above assumption fails dramatically.
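
To see why, consider a worked case that is not part of the original text: a binary outcome \(y_{i} \in \{0, 1\}\) with \(P(y_{i} = 1) = p_{i}\) (the symbol \(p_{i}\) is introduced here only for illustration). The error \(\epsilon_{i} = y_{i} - p_{i}\) can then take only two values:

\[ \epsilon_{i} = \begin{cases} 1 - p_{i} & \mbox{with probability } p_{i} \\ -p_{i} & \mbox{with probability } 1 - p_{i} \end{cases} \qquad \mbox{so that } E(\epsilon_{i}) = 0, \quad Var(\epsilon_{i}) = p_{i}(1 - p_{i}) \]

Such an error cannot be normally distributed, and its variance changes with \(p_{i}\) rather than staying at a constant \(\sigma^{2}\).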

Generalized Linear Models (GLMs) accommodate different types of outcome data by relating the linear portion of the model (\(\mathbf{X}\beta\)) to the mean of the outcome variable \(y\) through a link function. They also allow the variance of the errors (\(\sigma^{2}\)) to depend on the predicted values themselves, rather than requiring it to be constant.
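
As a sketch of the usual textbook form, with notation that is assumed here rather than taken from this chapter (\(g\) for the link function, \(\mu_{i}\) for the mean of \(y_{i}\), \(V\) for the variance function, and \(\phi\) for a dispersion parameter), a GLM can be written as:

\[ g(\mu_{i}) = \mathbf{x}_{i}^{T}\beta, \qquad \mu_{i} = E(y_{i}), \qquad Var(y_{i}) = \phi \, V(\mu_{i}) \]

Ordinary linear regression is the special case with the identity link \(g(\mu) = \mu\), \(V(\mu) = 1\), and \(\phi = \sigma^{2}\). Logistic regression uses \(g(\mu) = \log\left(\mu / (1 - \mu)\right)\) with \(V(\mu) = \mu(1 - \mu)\), and Poisson regression uses \(g(\mu) = \log(\mu)\) with \(V(\mu) = \mu\).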

There are a few overarching types of non-continuous outcomes that can be modeled with GLMs.

  • Binary data: Logistic or Probit regression
  • Log-linear models
  • Multinomial/categorical data: Multinomial or Ordinal Logistic regression
  • Count data: Poisson regression
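
The models listed above differ mainly in how the linear predictor is mapped back onto the scale of the outcome. The following minimal Python sketch illustrates the inverse link functions commonly paired with these models. It is an illustration only, not a fitting procedure: the function names, the coefficient values in `beta`, and the covariate values in `x` are all hypothetical, and only the standard library is used.

```python
import math


def linear_predictor(x, beta):
    """Compute x'beta for a single observation."""
    return sum(xj * bj for xj, bj in zip(x, beta))


def inv_logit(eta):
    """Inverse logit link (logistic regression): returns a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Inverse probit link (probit regression): the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def inv_log(eta):
    """Inverse log link (Poisson regression): returns a positive mean count."""
    return math.exp(eta)


def softmax(etas):
    """Inverse link often used for multinomial logistic regression.

    Takes one linear predictor per category and returns category
    probabilities that sum to 1.
    """
    m = max(etas)  # subtract the max for numerical stability
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical coefficients and covariates, chosen only for illustration.
beta = [-1.0, 0.5]  # intercept and one slope
x = [1.0, 2.0]      # leading 1.0 is the intercept term

eta = linear_predictor(x, beta)  # -1.0 * 1.0 + 0.5 * 2.0

p_logit = inv_logit(eta)    # predicted probability under a logit link
p_probit = inv_probit(eta)  # predicted probability under a probit link
mu_count = inv_log(eta)     # predicted mean count under a log link
probs = softmax([eta, 0.0, 1.0])  # three hypothetical category predictors

print(p_logit, p_probit, mu_count, probs)
```

Whatever value the linear predictor takes, the logit and probit inverse links return a value between 0 and 1, the log inverse link returns a positive number, and the softmax returns probabilities that sum to 1. This is what lets a single linear predictor serve binary, count, and categorical outcomes.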