17.8 Centering terms
- Sometimes it is more meaningful to measure the effect of a predictor's value relative to the average within its cluster, rather than relative to the overall average.
- The “frog pond” effect
- A student with an average IQ may feel confident and excel in a group of students with below-average IQ.
- The same student may be discouraged and not perform to their potential in a group of students with above-average IQ.
- If the effect of a predictor depends more on where a value falls relative to the other members of its cluster than on where it falls relative to all participants, adjust the model as follows:
- Instead of putting the raw value into the regression model, you would:
- calculate the cluster-specific average
- calculate the difference between each individual's value and their cluster average
- include both the cluster average (macro) and the difference (micro) in the model.
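Written as a random-intercept model, the steps above give the equation below. The notation is an assumption for illustration, not taken from these notes: $x_{ij}$ is the variable of interest for individual $i$ in cluster $j$, $\bar{x}_j$ is its cluster average, $u_j$ is the cluster random intercept and $\varepsilon_{ij}$ is the residual. The second line just substitutes $x_{ij} - \bar{x}_j$ back in and regroups terms.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 \bar{x}_{j} + \beta_2 \left( x_{ij} - \bar{x}_{j} \right) + u_j + \varepsilon_{ij} \\
       &= \beta_0 + \beta_2 x_{ij} + \left( \beta_1 - \beta_2 \right) \bar{x}_{j} + u_j + \varepsilon_{ij}
\end{aligned}
```

Here $\beta_2$ is the within-cluster (micro) effect and $\beta_1$ the between-cluster (macro) effect. The difference $\beta_1 - \beta_2$ is the effect of the cluster average for an individual whose own value is held fixed, so a "frog pond" effect would show up as $\beta_1 - \beta_2 < 0$.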
17.8.1 A generic dplyr approach to centering
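library(dplyr)
group.means <- data %>% group_by(cluster) %>% summarise(c.ave = mean(variable))  # one row per cluster: cluster, c.ave
newdata <- data %>% left_join(group.means, by = "cluster") %>% mutate(diff = variable - c.ave)  # individual value minus cluster average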
- Create a new data set that I call group.means that
  - takes the original data set and then (%>%)
  - groups it by the clustering variable so that all subsequent actions are done on each group
  - makes a new variable that I call c.ave, which is the average of the variable of interest.
- I then take the original data set, and then
  - merge onto data this group.means data set, which contains only the clustering variable and the cluster average variable c.ave.
  - I also toss in a mutate to create a new variable, diff, that is the difference between the variable of interest and the group average,
  - and assign all of this to a newdata set.
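The same centering can be sketched in base R with `ave()`, which returns each group's mean repeated for every row of that group. The example below is a minimal sketch on hypothetical simulated data: the names `data`, `cluster`, `variable` and `y`, and the true coefficients (+0.5 for the individual's own value, -0.3 for the cluster average), are assumptions chosen only for illustration. For simplicity it fits an ordinary `lm()` because the simulated outcome has no cluster-level random effect; with real clustered data you would normally add a random intercept (for example with `lme4::lmer(y ~ c.ave + diff + (1 | cluster), data = data)`).

```r
# Hypothetical simulated data: 30 clusters of 20 individuals each.
set.seed(42)
n.clusters <- 30
n.per <- 20
cluster <- rep(seq_len(n.clusters), each = n.per)
# Each cluster gets its own shift in the variable, plus individual variation.
variable <- rnorm(n.clusters, mean = 100, sd = 10)[cluster] +
  rnorm(n.clusters * n.per, sd = 15)
data <- data.frame(cluster = cluster, variable = variable)

# Cluster-mean centering: ave() repeats each cluster's mean on every row.
data$c.ave <- ave(data$variable, data$cluster, FUN = mean)  # macro term
data$diff <- data$variable - data$c.ave                     # micro term

# Assumed "frog pond" outcome: own value helps (+0.5 per point), but a
# higher cluster average hurts (-0.3 per point) at a fixed own value.
data$y <- 0.5 * data$variable - 0.3 * data$c.ave + rnorm(nrow(data), sd = 5)

# Include both the cluster average and the difference in the model.
fit <- lm(y ~ c.ave + diff, data = data)
b <- coef(fit)

# Effect of the cluster average when the individual's own value is held fixed.
contextual <- unname(b["c.ave"] - b["diff"])
print(round(c(b, contextual = contextual), 2))
```

In this setup the coefficient on `diff` should land near 0.5, the coefficient on `c.ave` near 0.2 (that is, 0.5 - 0.3), and their difference near -0.3, recovering the simulated frog-pond effect. Note that `diff` averages to zero within every cluster, which is what makes the macro and micro terms separable.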