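library(dplyr)
batting2 = filter(batting, yearID > 1961) %>%
  select(yearID, H, X2B, X3B, HR, AB, BB, HBP, SF) %>% na.omit() %>%
  mutate(TB = H + X2B + 2*X3B + 3*HR) %>%  # H already counts one base per hit
  mutate(SLG = TB/AB) %>% na.omit() %>%
  mutate(OBP = (H + BB + HBP)/(AB + BB + SF + HBP), OPS = SLG + OBP) %>%
  mutate(steroidEra = (yearID > 1983 & yearID < 2003)) %>%  # seasons 1984 through 2002
  filter(AB > 160)
We used the filter function from the “dplyr” package to keep only seasons after 1961. The American League moved to today’s standard 162-game schedule in 1961 and the National League followed in 1962, so from 1962 on both leagues played the modern schedule. Then we used the select function to pick our desired variables and omitted any rows with missing values (NAs). Next we used mutate to create a series of new variables that build up to OPS: total bases (TB), slugging percentage (SLG), on-base percentage (OBP), and finally OPS. Because H already counts every hit (including doubles, triples, and home runs) as one base, TB only adds the extra bases: one more for a double, two for a triple, and three for a home run. We also flagged seasons from 1984 through 2002 as the steroid era, and lastly filtered out players with 160 or fewer at-bats. The formulas are:
\[ TB = H + X2B + 2(X3B) + 3(HR) \]
\[ SLG = TB/AB \]
\[ OBP = (H + BB + HBP)/(AB + BB + SF + HBP) \]
\[ OPS = SLG + OBP \]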
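As a worked check of the OPS formulas, the base-R sketch below computes TB, SLG, OBP, and OPS for a single stat line. Since H already includes doubles, triples, and home runs, total bases add only the extra bases on top of H. The helper name `ops_from_line` and the input numbers are hypothetical illustrations, not values from the data; the argument names simply mirror the batting columns used above.

```r
# Hypothetical helper: compute the OPS building blocks for one stat line.
# Argument names mirror the batting columns (H, X2B, X3B, HR, AB, BB, HBP, SF).
ops_from_line <- function(H, X2B, X3B, HR, AB, BB, HBP, SF) {
  # Every hit counts one base via H; doubles, triples and home runs
  # add 1, 2 and 3 extra bases respectively.
  TB  <- H + X2B + 2 * X3B + 3 * HR
  SLG <- TB / AB
  OBP <- (H + BB + HBP) / (AB + BB + SF + HBP)
  c(TB = TB, SLG = SLG, OBP = OBP, OPS = SLG + OBP)
}

# Made-up season: 150 hits (30 doubles, 5 triples, 25 home runs) in 500 at-bats
line_stats <- ops_from_line(H = 150, X2B = 30, X3B = 5, HR = 25,
                            AB = 500, BB = 60, HBP = 5, SF = 5)
print(round(line_stats, 3))
```

Counting singles explicitly gives the same total: 90 singles + 2(30) + 3(5) + 4(25) = 265 bases, so SLG = 265/500 = 0.530, OBP = 215/570 ≈ 0.377, and OPS ≈ 0.907.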
table = batting2 %>% group_by(yearID,steroidEra) %>% summarise(avgOPS = mean(OPS))
Next, we created a summary table with one row per season: the year, whether that year falls in the steroid era, and a new variable, avgOPS, the average OPS of all qualifying batters that year.
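To make the grouping step concrete, here is a base-R sketch on a handful of made-up player-seasons (the data frame `toy` and its OPS values are hypothetical). It applies the same steroid-era flag as the pipeline, which is TRUE for seasons 1984 through 2002, and uses `aggregate()` as a stand-in for dplyr's `group_by()` plus `summarise()`.

```r
# Toy player-season rows with made-up OPS values, standing in for batting2
toy <- data.frame(
  yearID = c(1983, 1983, 1984, 1984, 2002, 2003),
  OPS    = c(0.700, 0.800, 0.750, 0.850, 0.900, 0.780)
)

# Same flag as in the pipeline: TRUE only for seasons 1984 through 2002
toy$steroidEra <- toy$yearID > 1983 & toy$yearID < 2003

# Base-R stand-in for group_by(yearID, steroidEra) %>% summarise(avgOPS = mean(OPS))
yearly <- aggregate(OPS ~ yearID + steroidEra, data = toy, FUN = mean)
names(yearly)[names(yearly) == "OPS"] <- "avgOPS"

# Sort by year and reset row names for a clean printout
yearly <- yearly[order(yearly$yearID), ]
rownames(yearly) <- NULL
print(yearly)
```

Because each season lies entirely inside or outside the era, grouping by `steroidEra` as well as `yearID` never splits a year; it only carries the flag into the summary. For example, the two 1983 rows collapse into one average of 0.75.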
library(ggplot2)
ggplot(batting2, aes(x = yearID, y = OPS)) + geom_point(aes(col = steroidEra)) + geom_smooth(se = FALSE) + ggtitle("OPS Over Time") + xlab("Year") + ylab("OPS") + scale_color_manual(name = "Era", labels = c("Non-Steroid Era", "Steroid Era"), values = c("green", "red")) + theme_bw()
ggplot(table, aes(x = yearID, y = avgOPS)) + geom_point(aes(col = steroidEra)) + geom_smooth(se = FALSE) + ggtitle("Average OPS Over Time") + xlab("Year") + ylab("Average OPS") + scale_color_manual(name = "Era", labels = c("Non-Steroid Era", "Steroid Era"), values = c("green", "red")) + theme_bw()
pitching2 = filter(pitching, yearID > 1961) %>%
  select(yearID, ERA, G) %>% na.omit() %>%
  mutate(steroidEra = (yearID > 1983 & yearID < 2003)) %>% filter(G > 30)
table2 = pitching2 %>% group_by(yearID,steroidEra) %>% summarise(avgERA = mean(ERA))
We applied the same filter to the pitching data, keeping seasons after 1961. This time no extra calculation was needed, because ERA is already a column in the pitching table. We selected the year, ERA, and the number of games each pitcher appeared in (G), omitted missing values, and added the same steroid-era flag. Finally, we filtered out pitchers with 30 or fewer games.
Again, we created a summary table with one row per season: the year, whether that year falls in the steroid era, and the average ERA (avgERA) of the qualifying pitchers.
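ERA comes ready-made in the pitching table, but its definition is simple enough to check by hand: earned runs allowed per nine innings pitched. The sketch below assumes Lahman-style inputs, `ER` (earned runs) and `IPouts` (outs recorded, so innings = IPouts / 3); the helper `era_from_outs` and the numbers are hypothetical. The stored ERA column may be rounded, so a recomputed value can differ slightly in the last decimal place.

```r
# Hypothetical helper illustrating the ERA definition:
# earned runs allowed per nine innings pitched.
# Assumes ER (earned runs) and IPouts (outs recorded), so innings = IPouts / 3.
era_from_outs <- function(ER, IPouts) {
  innings <- IPouts / 3
  9 * ER / innings
}

# Made-up season: 60 earned runs over 180 innings (540 outs)
era <- era_from_outs(ER = 60, IPouts = 540)
print(era)
```

Here 9 · 60 / 180 gives an ERA of 3.00. Since ERA counts runs allowed, a lower value means better pitching, which is how the ERA trends are read below.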
ggplot(pitching2, aes(x = yearID, y = ERA)) + geom_point(aes(col = steroidEra)) + geom_smooth(se = TRUE) + ggtitle("ERA Over Time") + xlab("Year") + ylab("ERA") + scale_color_manual(name = "Era", labels = c("Non-Steroid Era", "Steroid Era"), values = c("green", "red")) + theme_bw()
ggplot(table2, aes(x = yearID, y = avgERA)) + geom_point(aes(col = steroidEra)) + geom_smooth(se = FALSE) + ggtitle("Average ERA Over Time") + xlab("Year") + ylab("Average ERA") + scale_color_manual(name = "Era", labels = c("Non-Steroid Era", "Steroid Era"), values = c("green", "red")) + theme_bw()
# Join on both keys so the shared steroidEra flag is kept as a single column
opsera = left_join(table, table2, by = c("yearID", "steroidEra"))
ggplot(opsera, aes(avgERA, avgOPS, col = steroidEra)) + geom_point() + geom_smooth(se = FALSE) + ggtitle("Average OPS vs. Average ERA") + xlab("Average ERA") + ylab("Average OPS") + scale_color_manual(name = "Era", labels = c("Non-Steroid Era", "Steroid Era"), values = c("green", "red")) + theme_bw()
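Both yearly summary tables carry a `steroidEra` column, so joining them on `yearID` alone produces suffixed copies (`steroidEra.x`, `steroidEra.y`). The base-R sketch below, on made-up yearly values, contrasts that with joining on both `yearID` and `steroidEra`; `merge(..., all.x = TRUE)` stands in for dplyr's `left_join()`, and the table names are hypothetical.

```r
# Toy yearly summaries (made-up values) shaped like the OPS and ERA tables
ops_by_year <- data.frame(yearID = c(1983, 1984, 1985),
                          steroidEra = c(FALSE, TRUE, TRUE),
                          avgOPS = c(0.72, 0.74, 0.75))
era_by_year <- data.frame(yearID = c(1983, 1984, 1985),
                          steroidEra = c(FALSE, TRUE, TRUE),
                          avgERA = c(3.60, 3.55, 3.70))

# Left join on yearID alone: the shared flag is duplicated with .x / .y suffixes
by_year_only <- merge(ops_by_year, era_by_year, by = "yearID", all.x = TRUE)
print(names(by_year_only))

# Left join on both keys: one steroidEra column. This is safe because the flag
# is determined by yearID, so both tables always agree on it.
by_both <- merge(ops_by_year, era_by_year, by = c("yearID", "steroidEra"), all.x = TRUE)
print(names(by_both))
```

Either join gives the same rows here; the two-key version simply avoids having to pick between `steroidEra.x` and `steroidEra.y` when coloring the plot.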
From the 1960s up to the steroid era, average OPS increased while average ERA decreased, suggesting that performance was improving for both hitters and pitchers.
During the steroid era, average ERA rose along with average OPS, and both began to decline around the end of the era.
Because average ERA rose together with average OPS during those years, it appears that steroids gave batters an advantage over pitchers during the steroid era.