8.4 Example 2 (cont.) Correlation & Regression
Is the relationship between sepal length and petal length the same within each species?
Let’s look at the correlation between these two continuous variables:
overall
stratified by species
by(iris, iris$Species, function(x) cor(x$Sepal.Length, x$Petal.Length))
## iris$Species: setosa
## [1] 0.2671758
## ------------------------------------------------------------
## iris$Species: versicolor
## [1] 0.754049
## ------------------------------------------------------------
## iris$Species: virginica
## [1] 0.8642247
Ignoring species, there is a strong, positive, linear relationship between the sepal length of the flower and the petal length (\(r = 0.87\)). The correlation coefficients for virginica and versicolor, 0.86 and 0.75 respectively, are similar to this overall value. However, the correlation between sepal and petal length for setosa is only 0.27.
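The overall value quoted above can be checked next to the per-species values in one step. The following is a minimal sketch, assuming only base R and the built-in `iris` data; the object names `overall_r` and `by_species_r` are illustrative and not part of the original analysis.

```r
# Overall correlation, ignoring species
overall_r <- cor(iris$Sepal.Length, iris$Petal.Length)

# Per-species correlations, returned as a named numeric vector
by_species_r <- sapply(
  split(iris, iris$Species),
  function(d) cor(d$Sepal.Length, d$Petal.Length)
)

# Show all four values side by side, rounded to two decimals
round(c(overall = overall_r, by_species_r), 2)
```

Putting the four coefficients in a single vector makes the contrast easy to see: setosa stands apart from the overall value, while versicolor and virginica sit close to it.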
How does the species change the regression equation?
overall
lm(iris$Petal.Length ~ iris$Sepal.Length) |> summary() |> broom::tidy()
## # A tibble: 2 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -7.10 0.507 -14.0 6.13e-29
## 2 iris$Sepal.Length 1.86 0.0859 21.6 1.04e-47
stratified by species
# Fit a separate simple linear regression within each species
by(iris, iris$Species, function(x) {
  lm(x$Petal.Length ~ x$Sepal.Length) |> summary() |> broom::tidy()
})
## iris$Species: setosa
## # A tibble: 2 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.803 0.344 2.34 0.0238
## 2 x$Sepal.Length 0.132 0.0685 1.92 0.0607
## ------------------------------------------------------------
## iris$Species: versicolor
## # A tibble: 2 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.185 0.514 0.360 7.20e- 1
## 2 x$Sepal.Length 0.686 0.0863 7.95 2.59e-10
## ------------------------------------------------------------
## iris$Species: virginica
## # A tibble: 2 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.610 0.417 1.46 1.50e- 1
## 2 x$Sepal.Length 0.750 0.0630 11.9 6.30e-16
- Overall: -7.10 + 1.86x, significant positive slope, \(p = 1.04 \times 10^{-47}\)
- Setosa: 0.80 + 0.13x, non-significant slope, \(p = 0.06\)
- Versicolor: 0.19 + 0.69x, significant positive slope, \(p = 2.6 \times 10^{-10}\)
- Virginica: 0.61 + 0.75x, significant positive slope, \(p = 6.3 \times 10^{-16}\)
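So we can say that iris species moderates the relationship between sepal and petal length: the slope is small and non-significant for setosa (0.13), but clearly positive for versicolor (0.69) and virginica (0.75).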
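To compare the fitted lines directly, the intercepts, slopes, and slope p-values can be collected into one table. This is a minimal sketch using base R only; it uses the `data =` form of `lm()` rather than the `iris$...` form shown earlier, so the slope term is named `Sepal.Length`. The names `fits` and `line_table` are illustrative assumptions, not part of the original analysis.

```r
# Fit Petal.Length ~ Sepal.Length separately within each species
fits <- lapply(
  split(iris, iris$Species),
  function(d) lm(Petal.Length ~ Sepal.Length, data = d)
)

# Prepend the pooled (overall) fit for comparison
fits <- c(list(overall = lm(Petal.Length ~ Sepal.Length, data = iris)), fits)

# One row per model: intercept, slope, and the p-value for the slope
line_table <- t(sapply(fits, function(m) {
  est <- coef(summary(m))
  c(
    intercept = est["(Intercept)", "Estimate"],
    slope     = est["Sepal.Length", "Estimate"],
    p_slope   = est["Sepal.Length", "Pr(>|t|)"]
  )
}))

# Display to three significant digits
signif(line_table, 3)
```

Each row of `line_table` corresponds to one of the four lines listed above, so the differences in slope across species can be read down a single column.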