## 7.7 Example

Using a cleaned version of the lung function data set from PMA5, let's explore the relationship between height and FEV1 for fathers in this data set.

ggplot(fev, aes(y=FFEV1, x=FHEIGHT)) + geom_point() +
xlab("Height") + ylab("FEV1") +
ggtitle("Scatter Diagram with Regression (blue) and Lowess (red) Lines
of FEV1 Versus Height for Fathers.") +
geom_smooth(method="lm", se=FALSE, col="blue") +
geom_smooth(se=FALSE, col="red")

There does appear to be a tendency for taller men to have higher FEV1. Let’s fit a linear model and report the regression parameter estimates.

model <- lm(FFEV1 ~ FHEIGHT, data=fev)
summary(model)
##
## Call:
## lm(formula = FFEV1 ~ FHEIGHT, data = fev)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.56688 -0.35290  0.04365  0.34149  1.42555
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.08670    1.15198  -3.548 0.000521 ***
## FHEIGHT      0.11811    0.01662   7.106 4.68e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5638 on 148 degrees of freedom
## Multiple R-squared:  0.2544, Adjusted R-squared:  0.2494
## F-statistic:  50.5 on 1 and 148 DF,  p-value: 4.677e-11

The least squares equation is $$Y = -4.087 + 0.118X$$.
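As a quick illustration of how the fitted equation is used, the sketch below plugs a height into it by hand. The coefficients are copied from the summary output above. The 70-inch height is a hypothetical value chosen only for illustration, and the sketch assumes height is recorded in inches.

```r
# Coefficients copied from the summary(model) output above
b0 <- -4.08670   # intercept
b1 <-  0.11811   # slope for FHEIGHT

# Hypothetical father's height in inches, chosen only for illustration
height <- 70

# Predicted FEV1 from the least squares line
fev1_hat <- b0 + b1 * height
round(fev1_hat, 2)
```

With the fitted object available, `predict(model, newdata = data.frame(FHEIGHT = 70))` should give the same value, up to rounding of the printed coefficients.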

confint(model)
##                   2.5 %     97.5 %
## (Intercept) -6.36315502 -1.8102499
## FHEIGHT      0.08526328  0.1509472

For every inch taller a father is, his FEV1 measurement is, on average, 0.12 higher (95% CI: 0.09, 0.15; p < .0001).
The correlation between FEV1 and height is $$\sqrt{.2544}$$ = 0.50, positive because the slope is positive.
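To see where these numbers come from, the following sketch rebuilds the slope's 95% confidence interval and the correlation from the printed summary alone. All inputs are copied from the output above. The variable names are illustrative and are not part of the original analysis.

```r
# Values copied from the summary(model) output above
est <- 0.11811   # slope estimate for FHEIGHT
se  <- 0.01662   # standard error of the slope
df  <- 148       # residual degrees of freedom
r2  <- 0.2544    # multiple R-squared

# 95% CI: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df)
ci <- est + c(-1, 1) * t_crit * se
round(ci, 3)

# In simple linear regression, the correlation is the square root of
# R-squared, carrying the sign of the slope
r <- sign(est) * sqrt(r2)
round(r, 2)
```

The interval should agree with the `confint(model)` output to about three decimal places; small differences come from rounding in the printed estimate and standard error.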

Lastly, check assumptions on the residuals to see if the model results are valid.

• Homogeneity of variance
plot(model$residuals ~ fev$FHEIGHT)
lines(lowess(model$residuals ~ fev$FHEIGHT), col="red")

• Normal residuals
qqnorm(model$residuals)
qqline(model$residuals, col="red")

No major deviations away from what is expected.

### 7.7.1 Confidence and Prediction Intervals

If we set the se argument in geom_smooth to TRUE, the shaded region is the confidence band for the mean. To get the prediction interval, we have to use the predict function.

pred.int <- predict(model, interval="predict") %>% data.frame()

ggplot(fev, aes(y=FFEV1, x=FHEIGHT)) + geom_point() +
geom_smooth(method="lm", se=TRUE, col="blue") +
geom_line(aes(y=pred.int$lwr), linetype="dashed", col="red", lwd=1.5) +
geom_line(aes(y=pred.int$upr), linetype="dashed", col="red", lwd=1.5)
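
For reference, the standard simple linear regression formulas behind the two bands can be sketched as follows. The notation is introduced here rather than taken from the text above: $x_0$ is the height at which the interval is evaluated, $\hat{y}_0$ the fitted value there, $s$ the residual standard error, and $n$ the number of observations. The confidence interval for the mean response at $x_0$ is

$$\hat{y}_0 \pm t_{1-\alpha/2,\,n-2}\; s\,\sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i}(x_i-\bar{x})^2}}$$

and the prediction interval for a single new observation at $x_0$ is

$$\hat{y}_0 \pm t_{1-\alpha/2,\,n-2}\; s\,\sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i}(x_i-\bar{x})^2}}$$

The extra 1 under the square root accounts for the variability of an individual father around the mean, which is why the dashed prediction band is wider than the shaded confidence band.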