## 8.4 Example 2 (cont.) Correlation & Regression

Is the relationship between sepal length and petal length the same within each species?

Let’s look at the correlation between these two continuous variables, first overall and then stratified by species.

overall

cor(iris$Sepal.Length, iris$Petal.Length)
## [1] 0.8717538

stratified by species

by(iris, iris$Species, function(x) cor(x$Sepal.Length, x$Petal.Length))
## iris$Species: setosa
## [1] 0.2671758
## ------------------------------------------------------------
## iris$Species: versicolor
## [1] 0.754049
## ------------------------------------------------------------
## iris$Species: virginica
## [1] 0.8642247

Ignoring species, there is a strong, positive, linear relationship between sepal length and petal length. The correlation coefficients $r$ for virginica and versicolor are similar to the overall value: 0.86 and 0.75, respectively, compared to 0.87. However, the correlation between sepal and petal length for setosa is only 0.27.
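To compare these values side by side, the overall and per-species correlations can be collected into one named vector. This is a minimal sketch using base R's `split()` and `sapply()` instead of `by()`; the object names `r_overall`, `r_by_species`, and `r_all` are illustrative and not part of the original example.

```r
# Overall correlation between sepal length and petal length.
r_overall <- cor(iris$Sepal.Length, iris$Petal.Length)

# Split the data frame by species, then compute the same correlation
# within each piece. sapply() returns a named numeric vector.
r_by_species <- sapply(
  split(iris, iris$Species),
  function(d) cor(d$Sepal.Length, d$Petal.Length)
)

# Combine and round to two decimals for easy comparison.
r_all <- round(c(overall = r_overall, r_by_species), 2)
print(r_all)
```

Because the result is an ordinary numeric vector, it can be passed directly to functions such as `barplot()` or `sort()`, which is harder to do with the printed output of `by()`.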

How does the species change the regression equation?

overall

lm(iris$Petal.Length ~ iris$Sepal.Length) |> summary() |> broom::tidy()
## # A tibble: 2 × 5
##   term              estimate std.error statistic  p.value
##   <chr>                <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)          -7.10    0.507      -14.0 6.13e-29
## 2 iris$Sepal.Length     1.86    0.0859      21.6 1.04e-47

stratified by species

by(iris, iris$Species, function(x) {
  lm(x$Petal.Length ~ x$Sepal.Length) |> summary() |> broom::tidy()
})
## iris$Species: setosa
## # A tibble: 2 × 5
##   term           estimate std.error statistic p.value
##   <chr>             <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)       0.803    0.344       2.34  0.0238
## 2 x$Sepal.Length    0.132    0.0685      1.92  0.0607
## ------------------------------------------------------------
## iris$Species: versicolor
## # A tibble: 2 × 5
##   term           estimate std.error statistic  p.value
##   <chr>             <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)       0.185    0.514      0.360 7.20e- 1
## 2 x$Sepal.Length    0.686    0.0863     7.95  2.59e-10
## ------------------------------------------------------------
## iris$Species: virginica
## # A tibble: 2 × 5
##   term           estimate std.error statistic  p.value
##   <chr>             <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)       0.610    0.417       1.46 1.50e- 1
## 2 x$Sepal.Length    0.750    0.0630     11.9  6.30e-16

• Overall: -7.10 + 1.86x, significant positive slope, $p = 1.04 \times 10^{-47}$
• Setosa: 0.80 + 0.13x, non-significant slope, $p = 0.06$
• Versicolor: 0.19 + 0.69x, significant positive slope, $p = 2.6 \times 10^{-10}$
• Virginica: 0.61 + 0.75x, significant positive slope, $p = 6.3 \times 10^{-16}$
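The fitted equations above can be read off the model output by hand, but it is less error-prone to extract the coefficients programmatically. The following sketch fits the same simple regression within each species and stacks the intercepts and slopes into one matrix. The `data =` form of the `lm()` call and the object name `coefs` are choices made here for illustration; they are not from the original example.

```r
# Fit Petal.Length ~ Sepal.Length separately for each species and
# keep only the two coefficients (intercept and slope) from each fit.
coefs <- t(sapply(
  split(iris, iris$Species),
  function(d) coef(lm(Petal.Length ~ Sepal.Length, data = d))
))

# One row per species; columns are "(Intercept)" and "Sepal.Length".
print(round(coefs, 2))
```

Each row of `coefs` gives the intercept and slope of that species' regression line, so the rows can be compared directly against the equations listed above.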

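So we can say that iris species moderates the relationship between sepal and petal length: the within-species slope ranges from 0.13 for setosa to 0.75 for virginica, and every within-species slope is much flatter than the overall slope of 1.86.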
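One common way to assess moderation formally is to fit a single model with an interaction term and compare it to a model without one. This goes a step beyond the stratified fits and is offered only as a sketch. The model names `fit_additive` and `fit_interact` and the vector `slopes` are illustrative, and the nested-model F-test is one possible approach, not necessarily the one this example intends.

```r
# Additive model: one common slope, species-specific intercepts.
fit_additive <- lm(Petal.Length ~ Sepal.Length + Species, data = iris)

# Interaction model: the slope for Sepal.Length may differ by species.
fit_interact <- lm(Petal.Length ~ Sepal.Length * Species, data = iris)

# F-test of the nested models: does allowing different slopes
# improve the fit beyond different intercepts alone?
print(anova(fit_additive, fit_interact))

# Recover the species-specific slopes from the interaction model.
# Setosa is the reference level; the other slopes add an offset to it.
b <- coef(fit_interact)
slopes <- c(
  setosa     = unname(b["Sepal.Length"]),
  versicolor = unname(b["Sepal.Length"] + b["Sepal.Length:Speciesversicolor"]),
  virginica  = unname(b["Sepal.Length"] + b["Sepal.Length:Speciesvirginica"])
)
print(round(slopes, 2))
```

Because the interaction model gives each species its own intercept and slope, the values in `slopes` match the slopes from the stratified regressions. The advantage of the single model is that the difference between slopes gets its own test.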