8.2 Stratification

A stratified model fits a separate regression equation (or any other bivariate analysis) to each subgroup of the population.

The mathematical model describing the relationship between petal length (\(Y\)) and sepal length (\(x\)), fit separately for each species, is written as follows:

\[ Y_{is} = \beta_{0s} + \beta_{1s} x_{is} + \epsilon_{is} \qquad \epsilon_{is} \sim \mathcal{N}(0,\sigma^{2}_{s}) \]
\[ Y_{iv} = \beta_{0v} + \beta_{1v} x_{iv} + \epsilon_{iv} \qquad \epsilon_{iv} \sim \mathcal{N}(0,\sigma^{2}_{v}) \]
\[ Y_{ir} = \beta_{0r} + \beta_{1r} x_{ir} + \epsilon_{ir} \qquad \epsilon_{ir} \sim \mathcal{N}(0,\sigma^{2}_{r}) \]

where the subscripts \(s\), \(v\), and \(r\) indicate the species setosa, versicolor, and virginica, respectively, and \(i\) indexes the observations within a species.

In each model, the intercept, slope, and residual variance can all differ. This flexibility is the distinctive strength of stratified models. The downside is that each model is fit using only the observations in its own subgroup. Furthermore, each model has 3 parameters to estimate: \(\beta_{0}\), \(\beta_{1}\), and \(\sigma^{2}\), for a total of 9 across the three models. The more parameters there are to estimate, the more data we need.
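To make the parameter count concrete, here is a minimal sketch in plain Python of fitting one simple linear regression per stratum. The helper names `fit_simple_ols` and `fit_stratified` are hypothetical, and the data values are made up for illustration; they are not the actual iris measurements. The residual variance is estimated with \(n - 2\) degrees of freedom, which is one common choice.

```python
from collections import defaultdict


def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x. Returns (b0, b1, sigma2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)  # residual variance on n - 2 degrees of freedom
    return b0, b1, sigma2


def fit_stratified(rows):
    """Fit one model per group; rows is an iterable of (group, x, y)."""
    by_group = defaultdict(lambda: ([], []))
    for group, x, y in rows:
        by_group[group][0].append(x)
        by_group[group][1].append(y)
    # Each fit sees only the observations in its own stratum.
    return {g: fit_simple_ols(xs, ys) for g, (xs, ys) in by_group.items()}


# Made-up (species, sepal length, petal length) values, NOT the real iris data.
rows = [
    ("setosa", 4.8, 1.4), ("setosa", 5.0, 1.5),
    ("setosa", 5.2, 1.5), ("setosa", 5.4, 1.7),
    ("versicolor", 5.5, 3.9), ("versicolor", 5.9, 4.2),
    ("versicolor", 6.1, 4.6), ("versicolor", 6.5, 4.7),
    ("virginica", 6.3, 5.1), ("virginica", 6.7, 5.6),
    ("virginica", 7.1, 5.9), ("virginica", 7.4, 6.2),
]

fits = fit_stratified(rows)
for species, (b0, b1, sigma2) in fits.items():
    print(f"{species}: b0={b0:.3f}, b1={b1:.3f}, sigma2={sigma2:.4f}")

# Three parameters (b0, b1, sigma2) per stratum.
n_params = sum(len(params) for params in fits.values())
print("parameters estimated:", n_params)
```

Because each stratum is fit independently, nothing constrains the three sets of estimates to resemble one another, and each fit here rests on only four observations rather than all twelve.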