11.2 Log-linear models

A log-linear model models the natural log of the response variable as a linear combination of predictors.

\[ln(Y) \sim X\beta +\epsilon\]

Recall that in statistics, when we refer to the log, we mean the natural log ln.

This type of model is often used to model count data using the Poisson distribution (Section 11.5).

Why are we transforming the outcome? Typically to achieve normality when the response variable is highly skewed.

Interpreting results

Since we transformed our outcome before performing the regression, we have to back-transform the coefficient before interpretation. Similar to logistic regression, we need to exponentiate the regression coefficient before interpreting.

When using log transformed outcomes, the effect on Y becomes multiplicative instead of additive.

  • Additive: For every 1 unit increase in X, y increases by \(b_{1}\)
  • Multiplicative: For every 1 unit increase in X, y is multiplied by \(e^{b_{1}}\)
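To see where the multiplicative effect comes from, compare the fitted values at \(x\) and \(x+1\), holding any other predictors fixed. The notation \(\hat{y}_{x}\) for the fitted value at \(x\) is introduced here only for illustration.

\[ln(\hat{y}_{x+1}) - ln(\hat{y}_{x}) = b_{1} \quad \Longrightarrow \quad \frac{\hat{y}_{x+1}}{\hat{y}_{x}} = e^{b_{1}}\]

A one unit increase in X adds \(b_{1}\) on the log scale, which is the same as multiplying by \(e^{b_{1}}\) on the original scale.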

For example, let \(b_{1} = 0.2\).

  • Additive: For every 1 unit increase in X, y increases by 0.2 units.
  • Multiplicative: For every 1 unit increase in X, y is multiplied by \(e^{0.2} \approx 1.22\), which is a 22% increase.

Thus we interpret the coefficient as a percentage change in \(Y\) for a unit increase in \(x_{j}\).

  • \(b_{j}<0\) : Negative slope, negative association. The expected value of \(Y\) when \(x=1\) is \(100(1 - e^{b_{j}})\) percent lower than when \(x=0\).
  • \(b_{j} > 0\) : Positive slope, positive association. The expected value of \(Y\) when \(x=1\) is \(100(e^{b_{j}} - 1)\) percent higher than when \(x=0\).
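The back-transformation can be checked quickly in R. This is a minimal sketch; the helper name pct_change is hypothetical and not part of any package.

```r
# Hypothetical helper: percent change in Y for a one unit increase in x,
# given a coefficient b from a model fit to log(Y).
pct_change <- function(b) 100 * (exp(b) - 1)

pct_change(0.2)   # positive coefficient: roughly a 22% increase
pct_change(-0.2)  # negative coefficient: roughly an 18% decrease
```

Note that the two results are not mirror images of each other: a coefficient of 0.2 and one of -0.2 give percent changes of different magnitude, because the effect is multiplicative.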

This UCLA resource is my “go-to” reference on how to interpret the results when your response, predictor, or both variables are log transformed.

https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faqhow-do-i-interpret-a-regression-model-when-some-variables-are-log-transformed/

11.2.1 Example

We are going to analyze personal income from the AddHealth data set. First I need to clean up and log transform the personal earnings variable H4EC2 by following the steps below in order.

  1. Remove values above 999995 (structural missing).
  2. Create a new variable called income, that sets all values of personal income to be NA if below the federal poverty line.
    • First set income= H4EC2
    • Then set income to missing, if H4EC2 < 10210 (the federal poverty limit from 2008)
  3. Then create a new variable: logincome that is the natural log (ln) of income. e.g. addhealth$logincome = log(addhealth$income)
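The steps above could be sketched as follows. The small data frame is a made-up stand-in for the real AddHealth data, and step 1 is assumed here to mean setting the missing-value codes to NA rather than dropping rows.

```r
# Toy stand-in for the AddHealth data; these earnings values are invented.
addhealth <- data.frame(H4EC2 = c(5000, 10210, 25000, 60000, 999996, 999998))

# Step 1: values above 999995 are missing-value codes (assumption: recode to NA).
addhealth$H4EC2[which(addhealth$H4EC2 > 999995)] <- NA

# Step 2: copy earnings to income, then set income below the 2008 poverty line to NA.
addhealth$income <- addhealth$H4EC2
addhealth$income[which(addhealth$income < 10210)] <- NA

# Step 3: natural log of income; NA values stay NA.
addhealth$logincome <- log(addhealth$income)
```

Using which() keeps the assignments safe when the comparison itself returns NA for rows that are already missing.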

Why are we transforming income? To achieve normality.

par(mfrow=c(2,2))
hist(addhealth$income, probability = TRUE); lines(density(addhealth$income, na.rm=TRUE), col="red")
hist(addhealth$logincome, probability = TRUE); lines(density(addhealth$logincome, na.rm=TRUE), col="blue")
qqnorm(addhealth$income); qqline(addhealth$income, col="red")
qqnorm(addhealth$logincome); qqline(addhealth$logincome, col="blue")

Identify variables

  • Quantitative outcome that has been log transformed: Income (variable logincome)
  • Quantitative predictor: Time one wakes up in the morning, in hours (variable wakeup)
  • Binary confounder: Gender (variable female_c)

The mathematical multivariable model looks like:

\[ln(Y) \sim \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2}\]

Fit a linear regression model

library(dplyr)   # provides the %>% pipe
library(pander)  # formats the model summary as a table
ln.mod.2 <- lm(logincome~wakeup + female_c, data=addhealth)
summary(ln.mod.2) %>% pander()
                 Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)      10.65      0.026        409.8     0
wakeup           -0.01491   0.003218     -4.633    3.73e-06
female_cFemale   -0.1927    0.017        -11.34    2.564e-29

Fitting linear model: logincome ~ wakeup + female_c

Observations   Residual Std. Error   \(R^2\)   Adjusted \(R^2\)
3813           0.5233                0.03611   0.0356
1-exp(confint(ln.mod.2)[-1,])
##                     2.5 %      97.5 %
## wakeup         0.02099299 0.008561652
## female_cFemale 0.20231394 0.147326777
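In the output above, the value in the 2.5 % column is larger than the value in the 97.5 % column. This happens because 1-exp() is a decreasing function, so it reverses the order of the interval endpoints. The sketch below shows the same behavior on simulated data; the variables x and y and the true slope of -0.2 are made up for illustration.

```r
set.seed(1)

# Simulated data where log(y) is linear in x with a negative slope.
n <- 500
x <- rnorm(n)
y <- exp(1 - 0.2 * x + rnorm(n, sd = 0.5))

fit <- lm(log(y) ~ x)

# Confidence interval for the slope on the log scale (lower, upper).
ci_log <- unname(confint(fit)["x", ])

# Percent decrease in y per unit of x; the endpoints come out in reverse order.
ci_pct_lower <- 100 * (1 - exp(ci_log))

# Sort so the interval reads from smallest to largest.
sort(ci_pct_lower)
```

When reporting the interval, list the smaller value first regardless of which column it came from.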

Interpret the results

  • For every hour later one wakes up in the morning, one can expect to earn 1-exp(-0.015) = 1.5% less income than someone who wakes up one hour earlier. This is after controlling for gender.
  • Females have on average 1-exp(-0.1927) = 17.5% lower income than males, after controlling for wake up time.

Both gender and the time one wakes up are significantly associated with personal earnings. Each additional hour later that one wakes up in the morning is associated with 1.5% (95% CI 0.9%-2.1%, p<.0001) lower income. Females have 17.5% (95% CI 14.7%-20.2%, p<.0001) lower income than males.