Visualizing gender-neutral baby names with ggplot and Plotly
I’m finally taking a much-anticipated (by me) class for my MA program called “Data Visualization.” An optional exercise was to play around with a dataset of US baby names, which the babynames R package draws from Social Security Administration records. I had some fun creating this interactive chart of the most popular gender-neutral baby names over time.
To be included in this chart, a name must have been in the top 10% of names by total births (boys plus girls) for a given year, with a boy:girl or girl:boy ratio of no more than 100:1.
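As a rough illustration of that ratio cutoff, here is a minimal base-R sketch using made-up counts (the numbers are hypothetical, not taken from the dataset). It assumes the cutoff is applied on a base-10 log scale, where a 100:1 ratio in either direction corresponds to a log ratio of +2 or -2.

```r
# Hypothetical counts for three names in a single year (not real data)
toy <- data.frame(
  name  = c("Avery", "Jordan", "Rex"),
  boys  = c(200, 900, 5000),
  girls = c(800, 300, 10)
)

# Boy:girl ratio and its base-10 log; 0 means an even split
toy$ratio    <- toy$boys / toy$girls
toy$logratio <- log10(toy$ratio)

# Keep names whose ratio is within 100:1 in either direction
kept <- toy[abs(toy$logratio) <= 2, ]
print(kept$name)
```

Working on the log scale makes the filter symmetric: 1:100 and 100:1 sit at the same distance from an even split, so one absolute-value test covers both directions.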
The design of this chart exposes patterns in the predominant sex of a given name over time. Interestingly, it looks like most of these names move from a higher boy:girl ratio to a lower one over time. There are many more fascinating insights to find!
The code I wrote to generate this chart is below:
library(babynames)
library(ggplot2)
library(magrittr)
library(dplyr)
library(RColorBrewer)
library(colorways2) #my color package
library(ggthemes)
library(plotly) # provides highlight_key(), ggplotly(), highlight(), and layout()
# Split by sex so each name/year pair can carry both counts
f <- babynames %>% filter(sex == "F")
m <- babynames %>% filter(sex == "M")
# After the merge, n.x is the girls' count and n.y is the boys' count
unisex1 <- merge(f, m, by = c("name", "year"), all = TRUE)
base1 <- unisex1 %>%
  group_by(year) %>%
  mutate(overall = n.x + n.y) %>%
  mutate(ratio = n.y / n.x) %>%   # boys per girl
  arrange(desc(ratio)) %>%
  # base-10 log, so that +/-2 corresponds to a 100:1 ratio
  mutate(logratio = log10(ratio)) %>%
  mutate(overallcentile = ntile(overall, 10)) %>%
  # drop placeholder entries that are not real names
  filter(!tolower(name) %in% c("unknown", "infant", "baby")) %>%
  filter(overallcentile >= 10) %>%   # top tenth by total births within each year
  filter(abs(logratio) <= 2) %>%
  ungroup()
d <- highlight_key(base1, ~name)
# had to make a new palette out of an existing one, with one color for each name
nb.cols <- n_distinct(base1$name)
mycolors <- colorRampPalette(ballpit)(nb.cols)
p <- ggplot(d, aes(year, logratio, col = name)) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
  geom_line() +
  geom_point() +
  theme_tufte() +
  # explicit breaks so the five labels always line up with -2 through 2
  scale_y_continuous(breaks = -2:2, labels = c("1:100", "1:10", "1:1", "10:1", "100:1")) +
  labs(title = "Gender Distribution of Most Popular Gender-Neutral Names Over Time", x = "", y = "Boy:Girl ratio (log scale)") +
  theme(text = element_text(family = "Helvetica", size = 14), plot.title = element_text(size = 14), axis.text = element_text(size = 12), axis.title = element_text(size = 14)) +
  scale_x_continuous(breaks = seq(1880, 2020, by = 10)) +
  scale_color_manual(values = mycolors)
gg <- ggplotly(p)
highlight(gg, dynamic = FALSE, color = "black", selected = attrs_selected(showlegend = FALSE)) %>%
  layout(margin = list(b = 40)) %>%
  layout(legend = list(title = list(text = '')))
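
The popularity cutoff in the code above relies on dplyr's ntile(), which splits values into roughly equal-sized ranked buckets. A small sketch with made-up totals (the names and numbers are hypothetical, not the real data) shows how keeping bucket 10 of 10 selects the top tenth within each year:

```r
library(dplyr)

# Made-up totals: 20 hypothetical names in each of two years
toy <- data.frame(
  year    = rep(c(1990, 2000), each = 20),
  name    = rep(paste0("name", 1:20), times = 2),
  overall = c(1:20, seq(200, 10, by = -10))
)

top <- toy %>%
  group_by(year) %>%
  mutate(decile = ntile(overall, 10)) %>%  # 1 = smallest totals, 10 = largest
  filter(decile == 10) %>%
  ungroup()

print(top)
```

Because the buckets are computed after group_by(year), a name only competes with other names from the same year, which keeps early years with fewer recorded births from being crowded out.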