17.4 Fitting models in R

Complete Pooling

The complete pooling model is fit with the function lm, and is only modeled by 1 and no covariates. This is the simple mean model, and is equivelant to estimating the mean.

fit_completepool <- lm(log_radon ~ 1, data=radon)
fit_completepool
## 
## Call:
## lm(formula = log_radon ~ 1, data = radon)
## 
## Coefficients:
## (Intercept)  
##       1.265
mean(radon$log_radon)
## [1] 1.264779

No Pooling

The no pooling model is also fit with the function lm, but gives each county a unique intercept in the model.

fit_nopool <- lm(log_radon ~ -1 + county, data=radon)
fit_nopool.withint <- lm(log_radon ~ county, data=radon)


	Dependent variable:

	log_radon
	(1)	(2)

Constant		0.715^* (0.383)
countyAITKIN	0.715^* (0.383)
countyANOKA	0.891^*** (0.106)	0.176 (0.398)
countyBECKER	1.090^** (0.443)	0.375 (0.585)


Note:	p<0.1; p<0.05; p<0.01

The first model (fit_nopool) is coded as lm(log_radon ~ -1 + county, data=radon), and so does not have the global intercept (that’s what the -1 does). Each \(\beta\) coefficient is the estimate of the mean log_radon for that county.
The second model (fit_nopool.withint) is coded as lm(log_radon ~ county, data=radon) and is what we are typically used to fitting.
- Each estimate is the difference in log(radon) for that county compared to a reference county.
- Because county is alphabetical, the reference group is AITKIN.
- Aitkin’s mean level of log(radon) shows up as the intercept or Constant term.
For display purposes only, only the first 3 county estimates are being shown.

Partial Pooling

The partial pooling model is fit with the function lmer(), which is part of the lme4 package.
The extra notation around the input variable (1|county) dictates that each county should get its own unique intercept \(\alpha_{j[n]}\).

fit_partpool <- lmer(log_radon ~ (1 |county), data=radon)

The fixed effects portion of the model output of lmer is similar to output from lm, except no p-values are displayed. The fact that no p-values are displayed is a much discussed topic. The author of the library lme4, Douglas Bates, believes that there is no “obviously correct” solution to calculating p-values for models with randomly varying intercepts (or slopes); see here for a general discussion.

summary(fit_partpool)
## Linear mixed model fit by REML ['lmerMod']
## Formula: log_radon ~ (1 | county)
##    Data: radon
## 
## REML criterion at convergence: 2184.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.6880 -0.5884  0.0323  0.6444  3.4186 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  county   (Intercept) 0.08861  0.2977  
##  Residual             0.58686  0.7661  
## Number of obs: 919, groups:  county, 85
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)    1.350      0.047   28.72

The random effects portion of the lmer output provides a point estimate of the variance of component \(\sigma^2_{\alpha} = 0.09\) and the model’s residual variance, \(\sigma_\epsilon = 0.57\).
The fixed effect here is interpreted in the same way that we would in a normal fixed effects mean model, as the global predicted value of the outcome of log_radon.
The random intercepts aren’t automatically shown in this output. We can visualize these using a forestplot. We use the plot_model() function from the sjPlot package, on the fit_partpool model, we want to see the random effects (type="re"), and we want to sort on the name of the random variable, here it’s "(Intercept)".

sjPlot::plot_model(fit_partpool, type="re", sort.est = "(Intercept)", y.offset = .4)

Notice that these effects are centered around 0. Refering back 17.2, the intercept \(\beta_{0j}\) was modeled equal to some average intercept across all groups \(\gamma_{00}\), plus some difference. What is plotted above is listed in a table below, showing that if you add that random effect to the fixed effect of the intercept, you get the value of the random intercept for each county.

showri <- data.frame(Random_Effect   = unlist(ranef(fit_partpool)), 
                     Fixed_Intercept = fixef(fit_partpool), 
                     RandomIntercept = unlist(ranef(fit_partpool))+fixef(fit_partpool))
                
rownames(showri) <- rownames(coef(fit_partpool)$county)
kable(head(showri))

	Random_Effect	Fixed_Intercept	RandomIntercept
AITKIN	-0.2390574	1.34983	1.1107728
ANOKA	-0.4071256	1.34983	0.9427047
BECKER	-0.0809977	1.34983	1.2688325
BELTRAMI	-0.0804277	1.34983	1.2694025
BENTON	-0.0254506	1.34983	1.3243796
BIGSTONE	0.0582831	1.34983	1.4081133

17.4.1 Comparison of estimates

By allowing individuals within counties to be correlated, and at the same time let counties be correlated, we allow for some information to be shared across counties.
Thus we come back to that idea of shrinkage. Below is a numeric table version of the plot in Section ??.

cmpr.est <- data.frame(Mean_Model       = coef(fit_completepool), 
                       Random_Intercept = unlist(ranef(fit_partpool))+fixef(fit_partpool), 
                       Fixed_Effects    = coef(fit_nopool))
rownames(cmpr.est) <- rownames(coef(fit_partpool)$county)
kable(head(cmpr.est))

	Mean_Model	Random_Intercept	Fixed_Effects
AITKIN	1.264779	1.1107728	0.7149352
ANOKA	1.264779	0.9427047	0.8908486
BECKER	1.264779	1.2688325	1.0900084
BELTRAMI	1.264779	1.2694025	1.1933029
BENTON	1.264779	1.3243796	1.2822379
BIGSTONE	1.264779	1.4081133	1.5367889