9.3 Interactions

If we care about how species changes the relationship between petal length and sepal length, we can fit a model with an interaction between sepal length (\(x_{1}\)) and species. For this first example, let \(x_{2}\) be an indicator that equals 1 when species == setosa and 0 otherwise. Note that the main effects of both sepal length and setosa species are also included in the model. Mathematically, an interaction is represented as the product of the two interacting variables.

\[ Y_{i} \sim \beta_{0} + \beta_{1}x_{1i} + \beta_{2}x_{2i} + \beta_{3}x_{1i}x_{2i}\]

If we evaluate this model for both levels of \(x_{2}\), the resulting models are the same as the stratified models.

When \(x_{2} = 0\), the record is for an iris that is not from the setosa species.

\[ Y_{i} \sim \beta_{0} + \beta_{1}x_{1i} + \beta_{2}(0) + \beta_{3}x_{1i}(0)\] which simplifies to \[ Y_{i} \sim \beta_{0} + \beta_{1}x_{1i}\]

When \(x_{2} = 1\), the record is for an iris of the setosa species.

\[ Y_{i} \sim \beta_{0} + \beta_{1}x_{1i} + \beta_{2}(1) + \beta_{3}x_{1i}(1)\] which simplifies to \[ Y_{i} \sim (\beta_{0} + \beta_{2}) + (\beta_{1} + \beta_{3})x_{1i}\]

Each subgroup model has its own intercept and slope. The interaction model with the setosa indicator estimates 4 regression parameters, whereas stratifying by all three species requires 6 (an intercept and a slope for each species).

In R, an interaction is fit by joining the two variables with * in the model statement. This includes both main effects as well as their product.
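As a minimal sketch of what such a model statement can look like, the following uses the built-in iris data. The copy iris2, the indicator column setosa, and the object name fit_int are names chosen here for illustration, not necessarily the ones used in the original analysis.

```r
# Work on a copy of the built-in iris data so the original is left untouched.
iris2 <- iris

# 0/1 indicator for the setosa species (column name is an assumption).
iris2$setosa <- as.numeric(iris2$Species == "setosa")

# Sepal.Length * setosa expands to both main effects plus their interaction:
# Sepal.Length + setosa + Sepal.Length:setosa
fit_int <- lm(Petal.Length ~ Sepal.Length * setosa, data = iris2)

# Estimates, standard errors, t statistics and p-values for all four terms.
summary(fit_int)$coefficients
```

The row labelled Sepal.Length:setosa in this table corresponds to the interaction coefficient \(b_{3}\).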

The coefficient \(b_{3}\) for the interaction term is statistically significant, which indicates that the relationship between sepal length and petal length differs between setosa and the other species.

9.3.1 Example 1

  • If \(x_{2}=0\), then the effect of \(x_{1}\) on \(Y\) simplifies to: \(\beta_{1}\)
    • \(b_{1}\) is the effect of sepal length on petal length for non-setosa species of iris (setosa=0).
    • For non-setosa species, the petal length increases 1.03 cm for every additional cm of sepal length.
  • If \(x_{2}=1\), then the effect of \(x_{1}\) on \(Y\) simplifies to: \(\beta_{1} + \beta_{3}\)
    • For setosa species, the petal length increases by 1.03 - 0.90 = 0.13 cm for every additional cm of sepal length.
The main effects (\(b_{1}\), \(b_{2}\)) cannot be interpreted by themselves when there is an interaction in the model: each one describes the effect of its variable only when the other interacting variable equals 0.
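The subgroup slopes can also be recovered directly from the fitted coefficients. The sketch below refits the model on the built-in iris data (the names iris2, setosa and fit_int are illustrative assumptions) and compares the implied setosa slope with a fit restricted to setosa records only.

```r
iris2 <- iris
iris2$setosa <- as.numeric(iris2$Species == "setosa")
fit_int <- lm(Petal.Length ~ Sepal.Length * setosa, data = iris2)
b <- coef(fit_int)

# Slope when setosa = 0 is b1; slope when setosa = 1 is b1 + b3.
slope_non_setosa <- unname(b["Sepal.Length"])
slope_setosa     <- unname(b["Sepal.Length"] + b["Sepal.Length:setosa"])

# Stratified fit using the setosa records only, for comparison.
fit_setosa_only  <- lm(Petal.Length ~ Sepal.Length,
                       data = subset(iris2, setosa == 1))
slope_stratified <- unname(coef(fit_setosa_only)["Sepal.Length"])

round(c(non_setosa        = slope_non_setosa,
        setosa            = slope_setosa,
        setosa_stratified = slope_stratified), 2)
```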

Let’s up the game now and look at the full interaction model with a categorical version of species. Recall \(x_{1}\) is Sepal Length, \(x_{2}\) is the indicator for versicolor, and \(x_{3}\) is the indicator for virginica.

\[ Y_{i} \sim \beta_{0} + \beta_{1}x_{1i} + \beta_{2}x_{2i} + \beta_{3}x_{3i} + \beta_{4}x_{1i}x_{2i} + \beta_{5}x_{1i}x_{3i}+\epsilon_{i}\]

The slope of the relationship between sepal length and petal length is calculated as follows, for each species:

  • setosa \((x_{2}=0, x_{3}=0): b_{1}=0.13\)
  • versicolor \((x_{2}=1, x_{3}=0): b_{1} + b_{4} = 0.13+0.55 = 0.68\)
  • virginica \((x_{2}=0, x_{3}=1): b_{1} + b_{5} = 0.13+0.62 = 0.75\)

Compare this to the estimates gained from the stratified model:

They’re the same! This demonstrates that a model in which species interacts with sepal length reproduces the estimates from the stratified models.
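One way to check this equivalence numerically is sketched below, using the built-in iris data with Species as a factor (setosa is the reference level by default). The object names are illustrative.

```r
# Full interaction model: species-specific intercepts and slopes.
fit_full <- lm(Petal.Length ~ Sepal.Length * Species, data = iris)
b <- coef(fit_full)

# Species-specific slopes implied by the interaction model.
slopes_interaction <- c(
  setosa     = unname(b["Sepal.Length"]),
  versicolor = unname(b["Sepal.Length"] + b["Sepal.Length:Speciesversicolor"]),
  virginica  = unname(b["Sepal.Length"] + b["Sepal.Length:Speciesvirginica"])
)

# Slopes from three separate (stratified) regressions, one per species.
slopes_stratified <- sapply(split(iris, iris$Species), function(d) {
  unname(coef(lm(Petal.Length ~ Sepal.Length, data = d))["Sepal.Length"])
})

# The two rows should agree column by column.
round(rbind(interaction = slopes_interaction,
            stratified  = slopes_stratified), 2)
```

Small differences from the hand calculation above can appear in the last digit because the hand calculation adds coefficients that were already rounded.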

9.3.2 Example 2

What if we now wanted to include other predictors in the model? How does sepal length relate to petal length after controlling for petal width? We add the variable for petal width into the model.

So far, both petal width and the combination of species and sepal length are significantly associated with petal length.
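A sketch of how this model could be fit on the built-in iris data follows. The nested-model F test used to assess the interaction terms jointly is one reasonable choice made here for illustration, not necessarily the test used in the original analysis.

```r
# Interaction between sepal length and species, adjusting for petal width.
fit_pw <- lm(Petal.Length ~ Sepal.Length * Species + Petal.Width, data = iris)
summary(fit_pw)$coefficients

# Same model without the interaction terms, for a joint test of the
# two Sepal.Length:Species coefficients.
fit_no_int <- lm(Petal.Length ~ Sepal.Length + Species + Petal.Width,
                 data = iris)
int_test <- anova(fit_no_int, fit_pw)
int_test
```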

Note of caution: Stratification implies that the stratifying variable interacts with all other variables. So if we were to go back to the stratified model where we fit the model of petal length on sepal length AND petal width, stratified by species, we would be implying that species interacts with both sepal length and petal width.

For example, the following stratified model

  • \(Y = A + B + C\), when D=1
  • \(Y = A + B + C\), when D=0

is the same as the following interaction model:

  • \(Y = A + B + C + D + A*D + B*D + C*D\)
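The same point can be illustrated with the built-in iris data, where species plays the role of the stratifying variable D and sepal length and petal width stand in for two of the other predictors. This is a sketch with illustrative object names.

```r
# Species interacts with every other predictor in the model.
fit_all_int <- lm(Petal.Length ~ (Sepal.Length + Petal.Width) * Species,
                  data = iris)
b <- coef(fit_all_int)

# Versicolor slopes implied by the interaction model.
from_interaction <- c(
  Sepal.Length = unname(b["Sepal.Length"] + b["Sepal.Length:Speciesversicolor"]),
  Petal.Width  = unname(b["Petal.Width"] + b["Petal.Width:Speciesversicolor"])
)

# Versicolor slopes from a model fit to the versicolor records only.
fit_versicolor <- lm(Petal.Length ~ Sepal.Length + Petal.Width,
                     data = subset(iris, Species == "versicolor"))
from_stratified <- coef(fit_versicolor)[c("Sepal.Length", "Petal.Width")]

rbind(interaction = from_interaction, stratified = from_stratified)
```

If only one of the interactions is scientifically justified, a model with that single interaction term is more parsimonious than full stratification.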

9.3.3 Example 3: The relationship between income, employment status and depression.

This example follows section 10.3.3.

Here I create the binary indicators of lowincome (annual income <$10k/year) and underemployed (part time or unemployed).

The Main Effects model assumes that the effect of income on depression is independent of employment status, and the effect of employment status on depression is independent of income.

To formally test whether an interaction term is necessary, we add the interaction term into the model and assess whether the coefficient for the interaction term is significantly different from zero.
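The mechanics of this test can be sketched as follows. The data here are simulated purely for illustration and are not the depression data. The column name depressed, the binary outcome, and the logistic model are all assumptions, since neither the outcome coding nor the model family is stated in this section.

```r
# Simulated stand-in data (NOT the real depression data set).
set.seed(1)
n <- 300
dat <- data.frame(
  lowincome     = rbinom(n, 1, 0.3),
  underemployed = rbinom(n, 1, 0.3)
)
# Hypothetical binary outcome generated without any true interaction.
dat$depressed <- rbinom(
  n, 1, plogis(-1.5 + 0.5 * dat$lowincome + 0.5 * dat$underemployed)
)

# Main effects model and interaction model.
main_fx <- glm(depressed ~ lowincome + underemployed,
               data = dat, family = binomial)
int_fx  <- glm(depressed ~ lowincome * underemployed,
               data = dat, family = binomial)

# Wald test for the interaction coefficient.
summary(int_fx)$coefficients["lowincome:underemployed", ]

# Likelihood ratio test comparing the two nested models.
anova(main_fx, int_fx, test = "Chisq")
```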