## 17.4 Fitting models in R

Complete Pooling

The complete pooling model is fit with the function lm, and is only modeled by 1 and no covariates. This is the simple mean model, and is equivelant to estimating the mean.

fit_completepool <- lm(log_radon ~ 1, data=radon)
fit_completepool
##
## Call:
##
## Coefficients:
## (Intercept)
##       1.265
mean(radon$log_radon) ##  1.264779 No Pooling The no pooling model is also fit with the function lm, but gives each county a unique intercept in the model. fit_nopool <- lm(log_radon ~ -1 + county, data=radon) fit_nopool.withint <- lm(log_radon ~ county, data=radon)  Dependent variable: log_radon (1) (2) Constant 0.715* (0.383) countyAITKIN 0.715* (0.383) countyANOKA 0.891*** (0.106) 0.176 (0.398) countyBECKER 1.090** (0.443) 0.375 (0.585) Note: p<0.1; p<0.05; p<0.01 • The first model (fit_nopool) is coded as lm(log_radon ~ -1 + county, data=radon), and so does not have the global intercept (that’s what the -1 does). Each $$\beta$$ coefficient is the estimate of the mean log_radon for that county. • The second model (fit_nopool.withint) is coded as lm(log_radon ~ county, data=radon) and is what we are typically used to fitting. • Each estimate is the difference in log(radon) for that county compared to a reference county. • Because county is alphabetical, the reference group is AITKIN. • Aitkin’s mean level of log(radon) shows up as the intercept or Constant term. • For display purposes only, only the first 3 county estimates are being shown. Partial Pooling • The partial pooling model is fit with the function lmer(), which is part of the lme4 package. • The extra notation around the input variable (1|county) dictates that each county should get its own unique intercept $$\alpha_{j[n]}$$. fit_partpool <- lmer(log_radon ~ (1 |county), data=radon) The fixed effects portion of the model output of lmer is similar to output from lm, except no p-values are displayed. The fact that no p-values are displayed is a much discussed topic. The author of the library lme4, Douglas Bates, believes that there is no “obviously correct” solution to calculating p-values for models with randomly varying intercepts (or slopes); see here for a general discussion. summary(fit_partpool) ## Linear mixed model fit by REML ['lmerMod'] ## Formula: log_radon ~ (1 | county) ## Data: radon ## ## REML criterion at convergence: 2184.9 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -4.6880 -0.5884 0.0323 0.6444 3.4186 ## ## Random effects: ## Groups Name Variance Std.Dev. ## county (Intercept) 0.08861 0.2977 ## Residual 0.58686 0.7661 ## Number of obs: 919, groups: county, 85 ## ## Fixed effects: ## Estimate Std. Error t value ## (Intercept) 1.350 0.047 28.72 • The random effects portion of the lmer output provides a point estimate of the variance of component $$\sigma^2_{\alpha} = 0.09$$ and the model’s residual variance, $$\sigma_\epsilon = 0.57$$. • The fixed effect here is interpreted in the same way that we would in a normal fixed effects mean model, as the global predicted value of the outcome of log_radon. • The random intercepts aren’t automatically shown in this output. We can visualize these using a forestplot. We use the plot_model() function from the sjPlot package, on the fit_partpool model, we want to see the random effects (type="re"), and we want to sort on the name of the random variable, here it’s "(Intercept)". sjPlot::plot_model(fit_partpool, type="re", sort.est = "(Intercept)", y.offset = .4) Notice that these effects are centered around 0. Refering back 17.2, the intercept $$\beta_{0j}$$ was modeled equal to some average intercept across all groups $$\gamma_{00}$$, plus some difference. What is plotted above is listed in a table below, showing that if you add that random effect to the fixed effect of the intercept, you get the value of the random intercept for each county. showri <- data.frame(Random_Effect = unlist(ranef(fit_partpool)), Fixed_Intercept = fixef(fit_partpool), RandomIntercept = unlist(ranef(fit_partpool))+fixef(fit_partpool)) rownames(showri) <- rownames(coef(fit_partpool)$county)
kable(head(showri))
Random_Effect Fixed_Intercept RandomIntercept
AITKIN -0.2390574 1.34983 1.1107728
ANOKA -0.4071256 1.34983 0.9427047
BECKER -0.0809977 1.34983 1.2688325
BELTRAMI -0.0804277 1.34983 1.2694025
BENTON -0.0254506 1.34983 1.3243796
BIGSTONE 0.0582831 1.34983 1.4081133

### 17.4.1 Comparison of estimates

• By allowing individuals within counties to be correlated, and at the same time let counties be correlated, we allow for some information to be shared across counties.
• Thus we come back to that idea of shrinkage. Below is a numeric table version of the plot in Section ??.
cmpr.est <- data.frame(Mean_Model       = coef(fit_completepool),
Random_Intercept = unlist(ranef(fit_partpool))+fixef(fit_partpool),
Fixed_Effects    = coef(fit_nopool))
rownames(cmpr.est) <- rownames(coef(fit_partpool)\$county)
kable(head(cmpr.est))
Mean_Model Random_Intercept Fixed_Effects
AITKIN 1.264779 1.1107728 0.7149352
ANOKA 1.264779 0.9427047 0.8908486
BECKER 1.264779 1.2688325 1.0900084
BELTRAMI 1.264779 1.2694025 1.1933029
BENTON 1.264779 1.3243796 1.2822379
BIGSTONE 1.264779 1.4081133 1.5367889