17.8 Centering terms

  • Sometimes it is more meaningful to measure the effect of a specific level of a predictor relative to the average within its cluster, rather than relative to the overall average.
  • The “frog pond” effect
    • A student with an average IQ may be more confident and excel in a group of students with below-average IQ.
    • But the same student may be discouraged and not perform to their potential in a group of students with above-average IQ.
  • If the effect of a given level of a predictor depends more on where that level falls relative to the other members of its cluster than on where it falls relative to all participants, the model should be adjusted as follows:
  • Instead of entering the raw value in the regression model, you would:
    • calculate the cluster-specific average
    • calculate the difference between each individual's value and their cluster's average
    • include both the cluster average (macro) and the difference (micro) in the model.
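The three steps above can be sketched in base R as follows. This is a minimal illustration only: the data frame school and its columns classroom, iq, and score are hypothetical, the scores are simulated, and an ordinary lm() stands in for whatever (multilevel) model you would actually fit. ave() computes the cluster mean and repeats it for every row of that cluster.

```r
# Hypothetical data: 3 classrooms (clusters) of 4 students each
set.seed(42)
school <- data.frame(
  classroom = rep(c("A", "B", "C"), each = 4),
  iq        = c(85, 90, 95, 100, 95, 100, 105, 110, 110, 115, 120, 125)
)
# Simulated outcome, purely for illustration
school$score <- 50 + 0.3 * school$iq + rnorm(12, sd = 3)

# Macro term: the classroom (cluster) average IQ, repeated for every student
school$iq_ave <- ave(school$iq, school$classroom, FUN = mean)

# Micro term: each student's IQ relative to their own classroom average
school$iq_diff <- school$iq - school$iq_ave

# Both the cluster average and the within-cluster difference enter the model
fit <- lm(score ~ iq_ave + iq_diff, data = school)
coef(fit)
```

Note the frog-pond pattern in these made-up numbers: the student with an IQ of 110 in classroom B sits 7.5 points above their classroom average (102.5), while the student with the same IQ in classroom C sits 7.5 points below theirs (117.5), so the two get opposite-signed micro terms.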

17.8.1 A generic dplyr approach to centering

library(dplyr)  # provides %>%, group_by(), summarise(), left_join(), mutate()
group.means <- data %>% group_by(cluster) %>% summarise(c.ave = mean(variable))
newdata <- data %>% left_join(group.means, by = "cluster") %>% mutate(diff = variable - c.ave)
  1. Create a new data set that I call group.means that
    • takes the original data set and then (%>%)…
    • groups it by the clustering variable so that all subsequent actions are done on each group separately
    • collapses each group to a single row with a new variable that I call c.ave, the average of the variable of interest within that cluster
  2. I then take the original data set, and then
    • merge onto it the group.means data set, which contains only the clustering variable and the cluster average c.ave, matching rows on the clustering variable
    • toss in a mutate to create a new variable, diff, that is the difference between the variable of interest and its cluster average
    • assign all of this to a new data set called newdata.
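A runnable version of the two steps above, on a small hypothetical data set, might look like the sketch below. The values of cluster and variable are made up for illustration. The second pipeline is an alternative not taken from the notes: a grouped mutate() keeps every row, so it produces the same c.ave and diff columns without a separate summarise-and-join.

```r
library(dplyr)

# Hypothetical data: 'cluster' is the grouping variable, 'variable' the predictor to center
data <- data.frame(
  cluster  = rep(c("A", "B", "C"), each = 3),
  variable = c(1, 2, 3, 4, 6, 8, 10, 10, 13)
)

# Two-step approach: one row of cluster means, then join it back onto every row
group.means <- data %>%
  group_by(cluster) %>%
  summarise(c.ave = mean(variable))
newdata <- data %>%
  left_join(group.means, by = "cluster") %>%
  mutate(diff = variable - c.ave)

# One-step alternative: a grouped mutate keeps all rows, so no join is needed
newdata2 <- data %>%
  group_by(cluster) %>%
  mutate(c.ave = mean(variable), diff = variable - c.ave) %>%
  ungroup()

newdata
```

Here the cluster averages are 2, 6, and 11, so diff is -1, 0, 1 in cluster A; -2, 0, 2 in B; and -1, -1, 2 in C. Within every cluster the differences sum to zero, which is what makes them a purely within-cluster (micro) term.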