2022
- November 1 : Sofia’s 1st Annual Halloween Spooktacular: Candy Decision Trees (Decision Trees, Machine Learning, Python, Scikitlearn, Statistical Analysis)
  To be perfectly honest, the only reason I decided to write this was to have somewhere to put this absolutely adorable little pumpkin with the fall foliage boots. Every great artist must sacrifice for their creations. The Candy Ranking Dataset: Last fall, as I browsed Kaggle searching for datasets to use for a class assignment, […]
- October 14 : Matrix of Confusion: Tuning the Logistic Regression Prediction Threshold for Imbalanced Classes (the long way) (Machine Learning, Statistical Analysis)
  I must admit that lately I’ve found myself feeling like the brain trapped in the Matrix of Confusion™ (confusion matrix) pictured above. It’s been quite a busy little month or so. I got back from a wonderful and much needed vacation at the beginning of September and subsequently whiplashed straight into a new semester (and […]
- August 19 : Predicting pro-choice vs. pro-life tweets with NLP: Part 1 (prepping and training the model) (Machine Learning, Natural Language Processing, Python, Scikitlearn)
  This summer, I decided to enroll in an accelerated, half-semester-long Natural Language Processing class. Why not enhance the already miserably hot and humid month of July with an additional 6 hours per week of sitting in a small wooden seat after work, I thought. It was indeed a brutal month that has left me […]
- July 25 : GBBO continued: an interactive Plotly jitter plot (Data Visualization, R)
  How to chart baker performance across all seasons? As part of the good old Great British Bake Off (GBBO) project, I wanted to create a chart that would succinctly capture every baker’s performance in every season. First, this necessitated the invention of a “score” feature. On the actual show, no one gets a numerical score. […]
- June 29 : ggplot’s geom_tile… not just for heat maps! (Data Visualization, R)
  If you’ve read my previous blog post, you’ll know that I was able to convince a group of unsuspecting peers (ok, maybe one was suspecting) to create a final project Shiny app all about the Great British Bake Off (GBBO) using data I had previously scraped and wrangled from Wikipedia. An unexpected challenge that came […]
- March 29 : Mapping California wine regions using Shapefiles and tmap (Data Visualization, Mapping)
  Always wanted to make your own maps using Shapefiles but never knew how? This one’s for you.
- February 17 : Visualizing Winter Olympics success (Data Visualization, Data Wrangling, R)
  Several cool visualizations of historical Winter Olympics data using ggplot and Plotly.
- February 4 : Visualizing gender-neutral baby names with ggplot and Plotly (Data Visualization, R)
  Play with this interactive chart to see how the gender split of popular gender-neutral baby names has shifted over time.
- January 23 : Designing functions to generate and display color palettes in R (Data Visualization, R)
  How I conceptualized and built out the functions in my colorways R package 🙂
- January 16 : Creating a local R package using RStudio (R)
  Learn how to create a simple package for personal use and connect it to GitHub!
- January 2 : Gender and Race in the Tech Industry – Analysis of Bias in Compensation (R, Statistical Analysis)
  I used hierarchical multiple regression to find significant differences in pay between genders and races.
- November 1 : Whipping up some Great British Bake Off Data (Data Wrangling, R)
  Well, between this and the Spongebob GAN project, I guess the secret is out: I am obsessed with TV.
- September 22 : Wrangling an Image Dataset (under the sea) (Data Wrangling, Python)
  Follow my process to scrape and wrangle a dataset of Spongebob title card images for eventual GAN processing.
- August 31 : Understanding ANOVA and Interaction Effects (Python, Statistical Analysis)
  Learn how to interpret the results of a two-way Analysis of Variance (ANOVA) test with interaction effects.