17.4 Fitting models in R

Complete Pooling

The complete pooling model is fit with the function lm, and is only modeled by 1 and no covariates. This is the simple mean model, and is equivelant to estimating the mean.

No Pooling

The no pooling model is also fit with the function lm, but gives each county a unique intercept in the model.

Dependent variable:
log_radon
(1) (2)
Constant 0.715* (0.383)
countyAITKIN 0.715* (0.383)
countyANOKA 0.891*** (0.106) 0.176 (0.398)
countyBECKER 1.090** (0.443) 0.375 (0.585)
Note: p<0.1; p<0.05; p<0.01
  • The first model (fit_nopool) is coded as lm(log_radon ~ -1 + county, data=radon), and so does not have the global intercept (that’s what the -1 does). Each \(\beta\) coefficient is the estimate of the mean log_radon for that county.
  • The second model (fit_nopool.withint) is coded as lm(log_radon ~ county, data=radon) and is what we are typically used to fitting.
    • Each estimate is the difference in log(radon) for that county compared to a reference county.
    • Because county is alphabetical, the reference group is AITKIN.
    • Aitkin’s mean level of log(radon) shows up as the intercept or Constant term.
  • For display purposes only, only the first 3 county estimates are being shown.

Partial Pooling

  • The partial pooling model is fit with the function lmer(), which is part of the lme4 package.
  • The extra notation around the input variable (1|county) dictates that each county should get its own unique intercept \(\alpha_{j[n]}\).

The fixed effects portion of the model output of lmer is similar to output from lm, except no p-values are displayed. The fact that no p-values are displayed is a much discussed topic. The author of the library lme4, Douglas Bates, believes that there is no “obviously correct” solution to calculating p-values for models with randomly varying intercepts (or slopes); see here for a general discussion.

  • The random effects portion of the lmer output provides a point estimate of the variance of component \(\sigma^2_{\alpha} = 0.09\) and the model’s residual variance, \(\sigma_\epsilon = 0.57\).
  • The fixed effect here is interpreted in the same way that we would in a normal fixed effects mean model, as the global predicted value of the outcome of log_radon.
  • The random intercepts aren’t automatically shown in this output. We can visualize these using what some call a forest plot. A very easy way to accomplish this is to use the sjPlot package. We use the plot_model() function, on the fit_partpool model, we want to see the random effects (type="re"), and we want to sort on the name of the random variable, here it’s "(Intercept)".

Notice that these effects are centered around 0. Refering back ((???)), the intercept \(\beta_{0j}\) was modeled equal to some average intercept across all groups \(\gamma_{00}\), plus some difference. What is plotted above is listed in a table below, showing that if you add that random effect to the fixed effect of the intercept, you get the value of the random intercept for each county.

Random_Effect Fixed_Intercept RandomIntercept
AITKIN -0.2390577 1.34983 1.1107726
ANOKA -0.4071257 1.34983 0.9427046
BECKER -0.0809978 1.34983 1.2688325
BELTRAMI -0.0804278 1.34983 1.2694025
BENTON -0.0254506 1.34983 1.3243796
BIGSTONE 0.0582831 1.34983 1.4081134

17.4.1 Comparison of estimates

  • By allowing individuals within counties to be correlated, and at the same time let counties be correlated, we allow for some information to be shared across counties.
  • Thus we come back to that idea of shrinkage. Below is a numeric table version of the plot in Section ((???)).
Mean_Model Random_Intercept Fixed_Effects
AITKIN 1.264779 1.1107726 0.7149352
ANOKA 1.264779 0.9427046 0.8908486
BECKER 1.264779 1.2688325 1.0900084
BELTRAMI 1.264779 1.2694025 1.1933029
BENTON 1.264779 1.3243796 1.2822379
BIGSTONE 1.264779 1.4081134 1.5367889