## 9.2 Moderation

Moderation occurs when the relationship between two variables depends on a third variable.

• The third variable is referred to as the moderating variable or simply the moderator.
• The moderator affects the direction and/or strength of the relationship between the explanatory ($$x$$) and response ($$y$$) variables.
• This tends to be an important
• When testing a potential moderator, we are asking whether there is an association between two constructs, but separately for different subgroups within the sample.
• This is also called a stratified model, or a subgroup analysis.

Here are 3 scenarios demonstrating how a third variable can modify the relationship between the original two variables.

Scenario 1 - The relationship is significant at the bivariate level (we expect the effect to exist in the entire population). When we test for moderation, the third variable is a moderator if the strength of the relationship changes (e.g., the p-value becomes non-significant). The strength may change for only one level of the third variable, not necessarily all levels.

Scenario 2 - The relationship is non-significant at the bivariate level (we do not expect the effect to exist in the entire population). When we test for moderation, the third variable is a moderator if the relationship becomes significant: we expect to see the effect in at least one of the subgroups or levels of the third variable, even though it was not significant in the entire population. The relationship may become significant in only one level of the third variable, not necessarily all levels.

Scenario 3 - The relationship is significant at the bivariate level (we expect the effect to exist in the entire population). When we test for moderation, the third variable is a moderator if the direction of the relationship changes (e.g., the means change order). The direction may change for only one level of the third variable, not necessarily all levels.
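To make Scenario 2 concrete, here is a minimal sketch using made-up toy data (the variables `x`, `y`, and the grouping variable `g` are hypothetical and not from any dataset in these notes). The relationship is built to be positive in group A and negative in group B, so the two groups cancel each other out when pooled.

```r
# Toy data: 10 observations per group, constructed (not real) for illustration
x <- rep(1:10, times = 2)
g <- rep(c("A", "B"), each = 10)
noise <- rep(c(0.5, -0.5), times = 10)

# Group A: y increases with x; group B: y decreases with x
y <- ifelse(g == "A", x, 11 - x) + noise

# Bivariate test, ignoring the third variable
p_overall <- cor.test(x, y)$p.value

# Same test, stratified by the potential moderator
toy <- data.frame(x = x, y = y)
p_by_group <- sapply(split(toy, g), function(d) cor.test(d$x, d$y)$p.value)

p_overall
p_by_group
```

A large p-value overall alongside small p-values within each group is the Scenario 2 pattern: no detectable relationship in the pooled sample, but a clear one inside the subgroups.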

Recall that common analysis methods for analyzing bivariate relationships come in very few flavors:

• Correlation (Q~Q)
• Linear Regression (Q~Q)
• $$\chi^{2}$$ (C~C)
• ANOVA (Q~C)
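As a quick reminder, each of these methods has a base R function. The sketch below uses the built-in `iris` and `mtcars` data purely for illustration; the variable pairings are assumptions chosen for convenience, not analyses from this chapter.

```r
# Correlation (Q~Q)
r_test <- cor.test(iris$Sepal.Length, iris$Petal.Length)

# Linear regression (Q~Q)
lin_mod <- lm(Petal.Length ~ Sepal.Length, data = iris)

# Chi-squared test of association (C~C): transmission type vs engine shape
chi_test <- chisq.test(table(mtcars$am, mtcars$vs))

# ANOVA (Q~C)
aov_mod <- aov(Petal.Length ~ Species, data = iris)

r_test$estimate
coef(lin_mod)
chi_test$p.value
summary(aov_mod)
```

Testing for moderation means re-running whichever of these fits your variable types, once within each level of the third variable.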

### 9.2.1 Example 1: Sepal vs Petal Length

We just finished looking at the relationship between the length (cm) of an iris’s sepal and the length (cm) of its petal.

library(ggplot2)
library(gridExtra)

# Scatterplot ignoring species
overall <- ggplot(iris, aes(x=Sepal.Length, y=Petal.Length)) +
  geom_point() + geom_smooth(se=FALSE) +
  theme_bw()

# Same plot, colored and smoothed separately by species
by_spec <- ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, col=Species)) +
  geom_point() + geom_smooth(se=FALSE) +
  theme_bw() + theme(legend.position="top")

# Show both plots side by side
grid.arrange(overall, by_spec, ncol=2)

Is the relationship between sepal length and petal length the same within each species?

Let’s look at the correlation between these two continuous variables.

Overall:

cor(iris$Sepal.Length, iris$Petal.Length)
## [1] 0.8717538

Stratified by species:

by(iris, iris$Species, function(x) cor(x$Sepal.Length, x$Petal.Length))
## iris$Species: setosa
## [1] 0.2671758
## --------------------------------------------------------
## iris$Species: versicolor
## [1] 0.754049
## --------------------------------------------------------
## iris$Species: virginica
## [1] 0.8642247

There is a strong, positive, linear relationship between the sepal length of the flower and the petal length when ignoring the species. The correlation coefficients $$r$$ for virginica and versicolor are similar to the overall $$r$$ value: 0.86 and 0.75 respectively, compared to 0.87. However, the correlation between sepal and petal length for species setosa is only 0.27.
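The correlations above describe strength, but the scenarios earlier are phrased in terms of significance. One way to check that, sketched here with `cor.test()` from base R, is to compare the p-value for the pooled data against the p-value within each species. The object names (`overall_test`, `by_species`, `p_values`) are just illustrative.

```r
# Test the correlation ignoring species
overall_test <- cor.test(iris$Sepal.Length, iris$Petal.Length)

# Repeat the same test within each species
by_species <- lapply(split(iris, iris$Species), function(d) {
  cor.test(d$Sepal.Length, d$Petal.Length)
})

# Collect the p-values into one named vector for comparison
p_values <- c(overall = overall_test$p.value,
              sapply(by_species, function(res) res$p.value))
round(p_values, 4)
```

If the p-value for a species is above 0.05 while the overall p-value is below it, that matches Scenario 1: a relationship that is significant at the bivariate level but loses significance in at least one level of the third variable.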

The points are clearly clustered by species. The smoothed lines for virginica and versicolor have similar slopes, whereas the slope of the line for setosa is closer to zero. This suggests that, within setosa, petal length has little association with the length of the sepal.
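To put rough numbers on those slopes, the sketch below fits a separate straight line within each species using `lm()`. Note that this swaps in a simple linear fit for the smoother drawn in the plot, so the values only approximate what the curves show.

```r
# Fit Petal.Length ~ Sepal.Length separately within each species
# and keep only the slope for Sepal.Length
slopes <- sapply(split(iris, iris$Species), function(d) {
  coef(lm(Petal.Length ~ Sepal.Length, data = d))[["Sepal.Length"]]
})

round(slopes, 2)
```

A slope near zero for one species alongside clearly positive slopes for the others is another way of seeing that species moderates the relationship.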

### 9.2.2 Example 2: Simpson’s Paradox

Sometimes moderating variables can result in what’s known as Simpson’s Paradox.