Visualizing gender-neutral baby names with ggplot and Plotly

I’m finally taking a much-anticipated (by me) class for my MA program called “Data Visualization.” An optional exercise was to play around with a dataset of US baby names from Social Security Administration (SSA) records, available in R as the babynames package. I had some fun creating this interactive chart of the most popular gender-neutral baby names over time.

Names included in this chart must have been in the top 10% (by combined count) of the names given to both boys and girls in a given year, with a boy:girl or girl:boy ratio of no more than 100:1.
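The two selection rules can be restated as small base-R helpers. This is a minimal sketch, not the code used for the chart: `in_top_decile()` and `within_ratio()` are hypothetical names, `in_top_decile()` is meant to be applied to one year's totals at a time, and its quantile cutoff only approximates `dplyr::ntile(total, 10) == 10`, so the two can disagree slightly at ties and decile boundaries.

```r
# Hypothetical helpers (not from the original post) restating the two rules.

# Rule 1: the name's combined boy + girl count is in the top 10% for the year.
# Names recorded for only one sex have an NA total and are never selected.
# The quantile cutoff approximates dplyr::ntile(total, 10) == 10.
in_top_decile <- function(total) {
  !is.na(total) & total >= quantile(total, 0.9, na.rm = TRUE)
}

# Rule 2: the boy:girl ratio lies between 1:max_ratio and max_ratio:1.
# On a log10 scale this is |log10(boys / girls)| <= log10(max_ratio),
# so a 100:1 limit becomes a band from -2 to 2.
within_ratio <- function(boys, girls, max_ratio = 100) {
  abs(log10(boys / girls)) <= log10(max_ratio)
}

within_ratio(boys = 5000, girls = 60)   # about 83:1, so TRUE
within_ratio(boys = 50, girls = 6000)   # 1:120, so FALSE
sum(in_top_decile(c(NA, 1:20)))         # 2 of the 20 non-missing totals
```

Working on a log10 scale is what makes the band symmetric: 100:1 and 1:100 sit at +2 and -2, the same distance from the 1:1 line.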

The design of this chart exposes patterns in the predominant sex of a given name over time. Interestingly, it looks like a majority of these names move from a higher ratio of boys to girls to a lower ratio over time. There are many more fascinating insights to find!
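One rough way to put a number on that impression is to compare each name's log10 boy:girl ratio in the first and last years it appears in the filtered data, then count how often it fell. The sketch uses a tiny made-up data frame (`toy`, with invented values, not real babynames counts). Because names can drop in and out of the filter from year to year, first versus last year shown is only a coarse proxy for a trend.

```r
# Toy data standing in for the filtered table behind the chart (values are
# invented for illustration): one row per name and year, with log10(boys/girls).
toy <- data.frame(
  name     = c("Avery", "Avery", "Avery", "Jordan", "Jordan", "Riley", "Riley"),
  year     = c(1990, 2000, 2010, 1990, 2010, 1995, 2015),
  logratio = c(0.8, 0.1, -0.9, 1.2, 0.6, -0.2, 0.3)
)

# For each name, the change in log10 ratio between its first and last year.
drift <- sapply(split(toy, toy$name), function(d) {
  d <- d[order(d$year), ]
  d$logratio[nrow(d)] - d$logratio[1]
})

drift              # negative values: the name shifted toward girls
mean(drift < 0)    # share of names whose boy:girl ratio fell
```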

The code I wrote to generate this chart is below:

library(babynames)
library(ggplot2)
library(magrittr)
library(dplyr)
library(plotly)       # highlight_key(), ggplotly(), highlight(), layout()
library(RColorBrewer)
library(colorways2)   # my color package
library(ggthemes)     # theme_tufte()

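# Split the data by sex, then join so each row holds one name-year pair:
# the .x columns are girls (F) and the .y columns are boys (M).
# all = TRUE keeps names given to only one sex, with NA for the other.
f <- babynames %>% filter(sex == "F")
m <- babynames %>% filter(sex == "M")

unisex1 <- merge(f, m, by = c("name", "year"), all = TRUE)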

base1 <- unisex1 %>%
  group_by(year) %>%
  mutate(overall = n.x + n.y,                 # NA if the name was given to only one sex
         ratio = n.y / n.x,                   # boys per girl
         logratio = log10(ratio),             # log10, so +/-2 means 100:1 or 1:100
         overallcentile = ntile(overall, 10)) %>%   # decile within each year
  filter(!tolower(name) %in% c("unknown", "infant", "baby"),  # placeholder names
         overallcentile >= 10,
         abs(logratio) <= 2) %>%
  ungroup() %>%
  arrange(desc(ratio))

d <- highlight_key(base1, ~name)

# Make a new palette out of an existing one, with one color per name
nb.cols <- n_distinct(base1$name)
mycolors <- colorRampPalette(ballpit)(nb.cols)

p <- ggplot(d, aes(year, logratio, col = name)) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "black") +  # 1:1 line
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = seq(1880, 2020, by = 10)) +
  # Explicit breaks at -2:2 so each of the five labels has a matching break
  scale_y_continuous(breaks = -2:2,
                     labels = c("1:100", "1:10", "1:1", "10:1", "100:1")) +
  scale_color_manual(values = mycolors) +
  labs(title = "Gender Distribution of Most Popular Gender-Neutral Names Over Time",
       x = "", y = "Boy:Girl ratio (log scale)") +
  theme_tufte() +
  theme(text = element_text(family = "Helvetica", size = 14),
        plot.title = element_text(size = 14),
        axis.text = element_text(size = 12),
        axis.title = element_text(size = 14))
  
# Convert to an interactive plotly chart; clicking a line or point highlights that name
gg <- ggplotly(p)

highlight(gg, dynamic = FALSE, color = "black",
          selected = attrs_selected(showlegend = FALSE)) %>%
  layout(margin = list(b = 40),
         legend = list(title = list(text = "")))
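The palette step in the code stretches an existing palette so there is one color per name. Here is a minimal, self-contained sketch of that idea using base R's `colorRampPalette()`. `base_palette` is a hypothetical stand-in for `colorways2::ballpit` (my own package, which you may not have installed), and the names are illustrative only.

```r
# Stretch a short palette to exactly one color per name.
base_palette  <- c("#E4572E", "#29335C", "#F3A712", "#A8C686", "#669BBC")
names_to_plot <- c("Avery", "Casey", "Jessie", "Jordan", "Peyton", "Quinn", "Riley")

# colorRampPalette() (grDevices, attached by default) returns a function that
# interpolates between the given colors; calling it with n yields n hex colors.
my_colors <- colorRampPalette(base_palette)(length(names_to_plot))

# Naming the vector lets scale_color_manual(values = my_colors) match colors
# to names explicitly instead of by position.
names(my_colors) <- names_to_plot
my_colors
```

Sizing the palette from the data, rather than hard-coding a count, keeps the manual scale from running out of colors if the filters change and more names qualify.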