7.1 Example

Lung function data were obtained from an epidemiological study of households living in four areas with different amounts and types of air pollution. The data set used in this book is a subset of the total data. In this chapter we use only the data taken on the fathers, all of whom are nonsmokers (see PMA6 Appendix A for more details).

# Read in the data from a version stored online. 
fev <- read.delim("https://norcalbiostat.netlify.com/data/Lung_081217.txt", sep="\t", header=TRUE)

One of the major early indicators of reduced respiratory function is FEV1, the forced expiratory volume in the first second (the amount of air exhaled in 1 second). Since taller males are known to have higher FEV1 on average, we wish to determine the relationship between height and FEV1. Regression analysis can serve both a descriptive and a predictive purpose.

  • Descriptive: Describing the relationship between FEV1 and height
  • Predictive: Using the equation to determine the expected or normal FEV1 for a given height

# Load ggplot2, then plot FEV1 against height with the fitted least-squares line.
library(ggplot2)
ggplot(fev, aes(y=FFEV1, x=FHEIGHT)) + geom_point() + 
      xlab("Height") + ylab("FEV1") + 
      ggtitle("Scatterplot and Regression line of FEV1 Versus Height for Males.") + 
      geom_smooth(method="lm", se=FALSE, col="blue") 

In this graph, height is given on the horizontal axis since it is the independent or predictor variable and FEV1 is given on the vertical axis since it is the dependent or outcome variable.

Interpretation: Taller men do tend to have higher FEV1. The regression line added to the graph slopes upward, indicating that we expect larger values of FEV1 at larger values of height.

Specifically, the equation of the regression line is \[ Y = -4.087 + 0.118 X \] where \(Y\) is FEV1 and \(X\) is height in inches.
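The coefficients in this equation come from a least-squares fit of FEV1 on height, the same fit that `geom_smooth(method="lm")` draws on the scatterplot. A minimal sketch of how they could be obtained with `lm()` follows. The helper name `fit_fev_line` is hypothetical (it does not appear in the original text), and the usage lines assume the `fev` data frame read in earlier in this section is loaded.

```r
# Hypothetical helper (not from the original text): fit a simple linear
# regression of FEV1 (FFEV1) on height (FHEIGHT) and return the two coefficients.
fit_fev_line <- function(data) {
  model <- lm(FFEV1 ~ FHEIGHT, data = data)
  coef(model)  # named vector: "(Intercept)" and "FHEIGHT" (the slope)
}

# Usage, assuming `fev` has already been read in as shown earlier.
if (exists("fev")) {
  print(round(fit_fev_line(fev), 3))
}
```

If the data match those used in the text, the intercept and slope printed here should be close to the values in the equation above.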

The quantity 0.118 in front of \(X\) is greater than zero, indicating that as we increase \(X\), \(Y\) will increase. For example, we would expect a father who is 70 inches tall to have an FEV1 value of

\[\mbox{FEV1} = -4.087 + (0.118) (70) = 4.173\]

If the height were 66 inches, we would expect an FEV1 value of only 3.70.
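The two predictions above can be checked with a few lines of R. This is a sketch that hard-codes the rounded coefficients from the stated equation rather than refitting the model. The function name `predict_fev1` is hypothetical and introduced only for illustration.

```r
# Coefficients copied from the regression equation stated in the text.
b0 <- -4.087  # intercept
b1 <- 0.118   # slope: expected change in FEV1 per one-inch increase in height

# Hypothetical helper (not from the original text): expected FEV1 for a
# height given in inches.
predict_fev1 <- function(height_in) {
  b0 + b1 * height_in
}

predict_fev1(c(66, 70))
# [1] 3.701 4.173
```

The two heights differ by 4 inches, so the predictions differ by 4 × 0.118 = 0.472.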

7.1.1 Caution on out of range predictions

To take an extreme example, suppose a father was 2 feet (24 inches) tall. Then the equation would predict a negative value of FEV1 (\(-1.255\)), which is impossible for a volume of air.

A safe policy is to restrict the use of the equation to the range of \(X\) values observed in the sample.
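One way to enforce this policy in code is to refuse to predict outside the observed range of heights. The sketch below is an illustration only: the function `predict_in_range` and its arguments are hypothetical, the default coefficients are the rounded values from the equation in this section, and the usage lines assume the `fev` data frame is loaded.

```r
# Hypothetical helper (not from the original text): return the predicted FEV1
# for heights inside the observed range, and NA (with a warning) otherwise.
predict_in_range <- function(height_in, observed_heights,
                             b0 = -4.087, b1 = 0.118) {
  x_range <- range(observed_heights, na.rm = TRUE)
  inside <- height_in >= x_range[1] & height_in <= x_range[2]
  if (any(!inside, na.rm = TRUE)) {
    warning("Some heights fall outside the observed range; returning NA for them.")
  }
  ifelse(inside, b0 + b1 * height_in, NA_real_)
}

# Usage, assuming `fev` has already been read in as shown earlier:
# a 24-inch height should come back as NA rather than a negative FEV1.
if (exists("fev")) {
  print(predict_in_range(c(24, 70), fev$FHEIGHT))
}
```

Returning `NA` rather than stopping with an error is a design choice: it keeps vectorized calls usable while making out-of-range requests visible.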