--- title: "Lesson 10 - Graphing Enhancements" date: "Last Updated `r format(Sys.Date(), '%m-%d-%Y')`" output: html_document: highlight: tango theme: yeti css: ../css/yeti.css toc: yes toc_float: yes pdf_document: toc: no --- # [Introduction (video)]() The first lesson on graphing focused on creating the correct graphics for the data types. This lesson expands on those tools and demonstrates how to customize some features and add enhancements to clarify or to add information to a plot. ## Student Learning Outcomes After completing this lesson students will be able to * Change the color of plots * Create a grid/panel of plots. * Create several multivariable graphics * Know of a few new packages that contain functions that enhance plots > There is *NO* need to remember all of these adjustments when there are incredibly handy reference guides available. The resource section of the [Week 4 overview](../wk4.html) has links to some of my favorite sites. ## Preparation Prior to this lesson students should 1. Install the following packages: `gridExtra`, `RColorBrewer` and `sjPlot` packages. 2. Download the notes file for this lesson 3. Update the paths to read in the data and ensure the first code chunk of your notes work. ```{r, warning=FALSE, message=FALSE} ``` # [Themes (video)]() The standard theme has a gray background, white grid lines etc. Themes can be changed by adding `theme_X()` where `X` has several options ```{r} ``` See more about themes here: http://www.sthda.com/english/wiki/ggplot2-themes-and-background-colors-the-3-elements # [Legends (video)]() ## Titles Add the `name=` argument to whatever layer you added that created the legend. Here I specified a `fill`, and it was a `discrete` variable. So I use the `scale_fill_discrete()` layer. ```{r} ``` Here I `col`or the box outlines so the layer is `scale_color_discrete()`. ```{r} ``` ## Removing the legend In this case the legend show the same information. That's redundant and takes up space, so let's remove it entirely by adding `guide=FALSE` to the `scale_` layer. ```{r} ``` # [Titles (video)]() A good plot needs to stand alone and convey as much information as possible. * `ggtitle()` adds an overall plot title * `ylab()` and `xlab()` adjust the axis labels. ```{r} ``` # [Removing NA categories (video)]() This happens when there is no data is present for one or more levels. One way to get around this is to use `select` to only choose variables that we are going to use in the plot directly, delete all rows with any missing values using `na.omit()`, then pipe `ggplot` directly in after deleting all rows with missing data. ```{r} ``` > Warning: `na.omit()` can be dangerous to use. It will drop EVERY row with ANY value missing. We are using it safely here because we are only selecting the variables that we will use in the plot. # [Changing colors (video)]() ## Manual ```{r} ``` There's some spectacularly named colors as well, here is the [full list of names](https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/colorPaletteCheatsheet.pdf). ```{r} ``` And you can use [hex colors](https://www.w3schools.com/colors/colors_picker.asp) ```{r} ``` ## Using a color palette (recommended) ```{r} ``` Review the links in the additional resources section of the [Week 4 overview](../wk4.html) on how to choose colors. With great power comes great responsibility. Colors can just as easily detract from a plot as they can enhance it. ---- # [Barchart enhancements (video)]() ## Numbers to the top of bars. The biggest addition to a barchart is the numbers on top of the bars. ```{r} ``` * Play with the `+50` on the y axis to move the numbers up/down the bars. * Play with the `size` modifier to find something readable. ## Plotting Proportions Often you don't want to compare counts but percents. To accomplish this in ggplot is doable, but involves pre-calculating the percentages first and then plotting the summary numbers. This is a great place to introduce a different package: `sjPlot`. `ggplot2` is pretty great for easy out-of-the-box plots that don't look half bad. Well for something a bit better when it comes to labeled barcharts let's turn to the `sjPlot` package. Note this package uses ggplot graphics, but does not follow the syntax. So you'll need to reference the [vignette](https://strengejacke.github.io/sjPlot/) often to to learn more of what this package can do and what syntax it uses. To plot the frequency distribution of a single categorical variable we use the "plot freq" function `plot_frq` ```{r} ``` To plot the results of a contingency table we can use the `plot_xtab` function. ```{r} ``` As before, you *MUST* do a common sense check on the proportions that are shown, which also means knowing what the default settings for a function are. The defaults for `plot_xtab` show the proportion of the fill variable, within the variable that's on the x. In context of the plot above, 17.5% of women who are not married are mature moms. 86.8% of all moms are younger moms. You can see the available adjustments using function arguments by reviewing the [vignette for that function](https://strengejacke.github.io/sjPlot/reference/plot_xtab.html). Here I will turn off the total column by adding `show.total=FALSE` and turn it into row percents by adding `margin='row'`. ```{r} ``` Now 81.1% of mature moms are not married, and 41.6% of younger moms are married. Always make sure your %'s being displayed match your interpretation. ---- # [Multivariate Plotting (video)]() So far we've only looked at how to plot 2 variables on the same plot. Let's up our game some. ## Boxplots with three variables. In the first plotting lesson, we saw that for boxplots the continuous variable goes on the y, categorical on the x. Well you can also fill (or color) the boxes by a third variable if you want an additional comparison within the categories on the x axis. ```{r} ``` This lets us compare the average gestation period for smokers vs non smokers, and mature vs younger moms. Displaying the boxes this way allows for easy comparison of weeks between smokers & non smokers, within maturity status. If we flip the x axis and fill, ```{r} ``` Now the emphasis is on comparing weeks between mature & younger moms, within each smoking status. This way of grouping levels of one aspect within categories of another aspect is extended and enhanced using faceting or paneling. ## Faceting / paneling ggplot introduces yet another term called `faceting`. The definition is _a particular aspect or feature of something_, or _one side of something many-sided, especially of a cut gem_. Basically instead of plotting the grouped graphics on the same plotting area, we let each group have it's own plot, or facet. We add a `facet_wrap()` and specify that we want to panel on the color group. Note the twiddle in front of color. (Not here we're switching to the `NCbirths` data set.) ```{r} ``` The grid placement can be semi-controlled by using the `ncol` argument in the `facet_wrap()` statement. ```{r, fig.height=6} ``` It is important to compare distributions across groups on the same scale, and our eyes can compare items vertically better than horizontally. ## Multiple plots per window This method works only for plots created with `ggplot`. Here we use the `grid.arrange` function in the `gridExtra` package. 1. assign the output of a ggplot object to an object (here it's `plot1` and `plot2`). 2. Use `grid.arrange()` to arrange them either side by side or top and bottom. ```{r} ``` ## Paneling on two variables Who says we're stuck with only faceting on one variable? A variant on `facet_wrap` is `facet_grid`. Here we can specify multiple variables to panel on. ```{r, fig.width=10, fig.height=5} ``` ----- #