GBBO continued: an interactive Plotly jitter plot

How to chart baker performance across all seasons?

As part of the good old Great British Bake Off (GBBO) project, I wanted to create a chart that would succinctly capture every baker’s performance in every season. First, this necessitated the invention of a “score” feature. On the actual show, no one gets a numerical score. However, on the (very helpful) Wikipedia pages for each season, fans have recorded which bakers were favorites, least favorites, Star Bakers and eliminated in each episode (read this post to learn more about how I wrangled the data from Wikipedia tables). This information can be used to create our own numerical representation of baker performance. I decided to give each baker 1 point for “favorite”, -1 point for “least favorite”, and 2 points for being named the Star Baker.
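As a rough illustration of that scoring rule, here is a minimal base R sketch. The data frame, its column names (baker, episode, result) and the rows are all made up for the example, and it assumes each baker gets at most one label per episode, with anything else scoring 0.

```r
# Hypothetical per-episode results; names and rows are invented for illustration.
results <- data.frame(
  baker   = c("Baker A", "Baker A", "Baker B", "Baker B"),
  episode = c(1, 2, 1, 2),
  result  = c("Star Baker", "Favorite", "Least favorite", "Neutral"),
  stringsAsFactors = FALSE
)

# Points per label, following the rule described above.
points <- c("Favorite" = 1, "Least favorite" = -1, "Star Baker" = 2)

# Look up each label; labels without a defined value (assumed here) score 0.
results$score <- unname(points[results$result])
results$score[is.na(results$score)] <- 0

print(results)
```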

To compare every baker’s performance in every season, I summed each baker’s episode scores to get their “final score” at the point when they were eliminated (or when they won). For each season, a dot plot comparing each baker’s final score could look something like this:
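The summing step could be sketched in base R as follows. The toy data is invented; endsum and maxep are the column names that appear in the plotting code later in this post, and reading maxep as "the last episode a baker appeared in" is my assumption here.

```r
# Hypothetical per-episode scores for one season; values are invented.
scores <- data.frame(
  season  = c(2, 2, 2, 2, 2),
  baker   = c("Baker A", "Baker A", "Baker A", "Baker B", "Baker B"),
  episode = c(1, 2, 3, 1, 2),
  score   = c(2, 1, 0, -1, 0)
)

# Final score: sum of episode scores within each season and baker.
final <- aggregate(score ~ season + baker, data = scores, FUN = sum)
names(final)[names(final) == "score"] <- "endsum"

# Last episode each baker appeared in (assumed meaning of maxep).
last_ep <- aggregate(episode ~ season + baker, data = scores, FUN = max)
names(last_ep)[names(last_ep) == "episode"] <- "maxep"

# One row per season and baker, with both summary columns.
final <- merge(final, last_ep, by = c("season", "baker"))
print(final)
```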

Combining all seasons would give you something like this:

Full code here

The dot-plot-specific portion of the code is below:

geom_dotplot(binaxis = 'y', stackdir = 'center',
             aes(fill = winflag),
             dotsize = .5,
             alpha = .8,
             stackgroups = TRUE)

We can see that the points at around 0 Final Score start to run into each other when stacked, and things begin to look a little cluttered. This is where the jitter plot comes in.

Jitter plots

The jitter plot strikes me as a strange being in the world of data visualization. Unlike almost all other plots, the jitter plot adds an element of chaos. It forgoes absolute precision for the sake of readability.

In essence, a jitter plot is just a dot plot for situations where there are too many dots to stack. Instead of spacing the dots evenly without overlap, a jitter plot “jitters” the dots in a random manner, within a given area.
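To make the idea concrete, here is a small base R sketch of jittering done by hand. It is not how the plots in this post were produced; it just assumes uniform random offsets, with toy values, and uses the variable names width and height to mirror the geom_jitter() arguments discussed below.

```r
set.seed(1)  # fixed seed so the offsets are the same on every run

# Toy final scores for bakers who all sit in season 3 (invented values).
score  <- c(0, 0, 0, 0, 1, 1, 2)
season <- rep(3, length(score))

width  <- 0.25  # maximum offset along the season axis
height <- 0.1   # maximum offset along the score axis

# Each point is nudged by a random amount in [-width, width] and [-height, height].
season_jit <- season + runif(length(season), -width, width)
score_jit  <- score + runif(length(score), -height, height)

# Points that shared exact coordinates are now spread over a small box.
plot(score_jit, season_jit, xlab = "Final Score", ylab = "Season")
```

Because the offsets go in both directions, the total spread along each axis is twice the value given.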

Using geom_jitter() instead of geom_dotplot():

The jitter-plot-specific portion of the code is below:

geom_jitter(height = .1, width = .25,
            aes(color = winflag),
            alpha = .8,
            size = 4)

The points here are less precise, but flow better. There is some overlap, but the randomness makes it easier to tell points apart, and a sense of the density of the points is not lost.

The width and height arguments determine how much wiggle room the points have in the x and y dimensions, respectively. Smaller numbers mean a tighter range and more overlapping. Bigger numbers give a wider range to jitter within.

It is possible to achieve jittering with geom_dotplot() by passing position = position_jitter(width = ?, height = ?), though there are other advantages to using geom_jitter(). We will use geom_jitter() for the rest of this post.
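One side note on position_jitter(): since the offsets are random, the plot changes slightly every time it is drawn. If that matters, position_jitter() also takes a seed argument. The sketch below uses invented toy data and an arbitrary seed value, and checks that two builds of the same plot place the points identically.

```r
library(ggplot2)

# Toy data with deliberately overlapping points (invented values).
toy <- data.frame(season = c(2, 2, 3, 3), endsum = c(0, 0, 1, 1))

# Build the same jitter plot from a given seed.
make_plot <- function(seed) {
  ggplot(toy, aes(season, endsum)) +
    geom_jitter(position = position_jitter(width = .25, height = .1, seed = seed))
}

# layer_data() returns the data as drawn, i.e. after jittering.
a <- layer_data(make_plot(123))
b <- layer_data(make_plot(123))

# Same seed, same jittered positions.
print(isTRUE(all.equal(a$x, b$x)) && isTRUE(all.equal(a$y, b$y)))
```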

One advantage of geom_jitter() is that it is easy to change dot attributes based on a group variable (not so with geom_dotplot() – this is because there is no size or shape variable built into the parameters, only “dotsize”). To highlight winners and runners-up in my jitter plot, I was able to change the code as follows:

Full code here

To change size based on a categorical variable, I added size as an attribute to the geom_jitter() aesthetics. I then had to add a scale_size_manual() function to define the size for each category. If you have multiple aesthetics that you want to show up in one legend, it’s important to define the same name (even if it’s blank) for all of the scale functions:

geom_jitter(height = .1,width = .25,
              aes(color = winflag,
                  size = winflag), 
              alpha =.8) +
scale_color_manual(name = "",
                     values = c('Winner' = 'goldenrod',
                                'Runner-up' = '#86dba5',
                                'Eliminated before final' = '#e68a95')) +
scale_size_manual(name = "",
                    values = c('Winner' = 5,
                               'Runner-up' = 4,
                               'Eliminated before final' = 2))

The trouble with labels

I’m sure you noticed that there were helpful data labels in my single-season example. If we were to add the same labels to our jitter plot, we would get this beauty:

Clearly, there are too many observations here to have both intelligible labels and visible points. Wouldn’t it be nice if we could have an interactive plot that would allow a user to choose a point and see more information dynamically? We can!

Plotly

Using Plotly, we can turn our static plot into a dynamic plot that provides much more information upon hover or click:

Charts or apps can be made completely in Plotly, or we can use the ggplotly() function to turn a plot originally made in ggplot into an interactive Plotly visualization.
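For comparison, here is a minimal sketch of the first route, a scatter built directly with plot_ly(). It is not the approach used in this post: the toy data is invented, and because this sketch does not rely on a built-in jitter option, the offsets are added by hand with runif().

```r
library(plotly)

set.seed(42)  # fixed seed so the hand-made jitter is repeatable

# Toy final scores (invented values).
toy <- data.frame(
  season = c(2, 2, 3),
  endsum = c(5, -1, 3),
  baker  = c("Baker A", "Baker B", "Baker C")
)

# Manual jitter along the season axis.
toy$season_jit <- toy$season + runif(nrow(toy), -.25, .25)

# Scatter trace whose hover box shows only the custom text.
fig <- plot_ly(
  toy,
  x = ~endsum, y = ~season_jit,
  type = "scatter", mode = "markers",
  text = ~paste("Baker:", baker, "<br>Final Score:", endsum),
  hoverinfo = "text"
)

fig
```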

On a very basic level, all I did to turn our jitter plot into a Plotly plot was save the jitter plot and apply the ggplotly() function. The full code to get the formatted, interactive Plotly visualization is below:

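#load the packages used below (plotly also provides the %>% pipe)
library(ggplot2)
library(plotly)

#save fully formatted ggplot jitterplot
#(the original call passed group = baker outside aes(), where ggplot() ignores it, so it is dropped here)

p <- ggplot(jitter, aes(season, endsum)) +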
  geom_jitter(height = .1,width = .25,
              aes(color = winflag,
                  size = winflag,
                  text = paste('Baker:', baker,
                               '<br>Status:', winflag,
                               '<br>Max Episode:', maxep,
                               '<br>Final Score:', endsum)), 
              alpha =.8
  ) +
  scale_x_continuous(limits = c(1.8,12.2), breaks=seq(2,12,by=1)) +
  scale_y_continuous(limits = c(-4,13), breaks=seq(-4,13,by=2)) +
  coord_flip() +
  #one dashed separator between each pair of seasons (use size instead of linewidth on ggplot2 versions before 3.4.0)
  geom_vline(xintercept = seq(2.5, 11.5, by = 1),
             color = "gray30", linetype = "dashed", linewidth = .5) +
  labs(x = "Season", y = "Final Score") +
  theme_minimal() +
  theme(
    panel.grid.major.x = element_blank(),
    panel.grid.major.y = element_blank(),
    axis.text.x = element_text(family = "Arial"),
    text = element_text(size = 14, family = 'Arial')
  ) +
  scale_color_manual(name = "",
                     values = c('Winner' = 'goldenrod',
                                'Runner-up' = '#86dba5',
                                'Eliminated before final' = '#e68a95')) +
  scale_size_manual(name = "",
                    values = c('Winner' = 5,
                               'Runner-up' = 4,
                               'Eliminated before final' = 2)) 

#apply ggplotly() function to our plot, specify the text for the tooltip, remove the toolbar and format the legend

ggplotly(p,tooltip = "text") %>%
  config(displayModeBar = FALSE) %>%
  layout(legend = list(orientation = "v", 
                       xanchor = "center", 
                       x = 1,
                       y=.3,
                       bordercolor = "#edd99f",
                       borderwidth = 2,
                       bgcolor = "#ffdbfa",
                       font = list(
                         family = "Arial",
                         size = 14,
                         color = "#000")))

Only one tweak had to be made to the original ggplot jitter plot code to get the Plotly visualization to work correctly. This was the addition of the “text” aesthetic in the geom_jitter() function:

text = paste('Baker:', baker,
                               '<br>Status:', winflag,
                               '<br>Max Episode:', maxep,
                               '<br>Final Score:', endsum)

You’ll notice that in the ggplotly() function, the tooltip parameter was set to “text”.

ggplotly(p,tooltip = "text")

This code is crucial in defining what text the user will see when hovering over a point.

The other additions to the ggplotly code are aesthetic. I use config(displayModeBar = FALSE) to remove the toolbar that automatically gets added to Plotly plots (not necessary, I just don’t like how it looks). The layout() function is used to create the custom legend.
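If you want to try the pattern without the full GBBO data, a stripped-down version might look like the sketch below. The toy data frame is invented and geom_point() stands in for geom_jitter() to keep the output stable. ggplot2 will warn that text is an unknown aesthetic; that is expected, since only ggplotly() makes use of it.

```r
library(ggplot2)
library(plotly)

# Toy final scores (invented values).
toy <- data.frame(
  season = c(2, 2, 3),
  endsum = c(5, -1, 3),
  baker  = c("Baker A", "Baker B", "Baker C")
)

# The text aesthetic holds the tooltip content; <br> starts a new line in the hover box.
p_toy <- ggplot(toy, aes(season, endsum)) +
  geom_point(aes(text = paste("Baker:", baker,
                              "<br>Final Score:", endsum)))

# Show only the custom text on hover and hide the toolbar.
gp <- ggplotly(p_toy, tooltip = "text") %>%
  config(displayModeBar = FALSE)

gp
```

Leaving out tooltip = "text" makes ggplotly() fall back to its default tooltip, which lists the mapped aesthetics rather than only the custom text.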

And there you have it: making a static plot dynamic was that easy!