California’s Napa Valley is only about an hour’s drive from Berkeley, where I grew up. Even when I was under drinking age, I often had a good reason to spend a day in or around Napa: hiking, visiting a friend’s grandparents with a pool, driving my younger sister to artistic rollerskating practice (a story for another time). Add to this a family-wide obsession with the early-2000s classic Sideways and frequent trips through the Santa Ynez Valley for some Solvang aebelskiver, and you get the recipe for a lifelong love of the California Wine Country.
My partner works in the world of fancy beverages and has recently started cataloguing the wines he’s tasted. There are some good apps available (Vivino, for one) for tracking and reviewing wines and generating predictions based on taste profiles. However, he has found that Vivino falls short when it comes to geographically tracking specific wines and varietals. He wants to be able to see which regions and subregions he has tried wines from in an interactive map format rather than a list. I agree that this would be more interesting, though after doing my own research, I can see why Vivino lacks a robust mapping feature: there is very little reliable geographic data freely available online for world wine regions. I think the primary reason is probably gatekeeping. The information exists, but it’s locked up in very expensive published books. Who would have guessed that there would be snobbery surrounding data on wine 🙂
The good news is that there is some high-quality data available, at least for US wine regions, thanks to the UC Davis library: here. It’s a really cool project – they’ve catalogued viticultural areas for California and other wine-producing US states in great detail. I decided to start exploring this data by creating my own wine region map for California. I also used this little project as an opportunity to explore some of R’s mapping features with the tmap package.
Get CA Viticultural Area Polygons via Shapefile
The first step in plotting any kind of area on a map is getting the actual data that holds information about its shape. This can come in two main forms: raster (a grid of pixels, like a JPEG) or vector polygons (each boundary is stored as an ordered list of vertex coordinates, so it scales cleanly, like the vector graphics in a PDF). Polygon areas are commonly stored as either GeoJSON files or Shapefiles. Either can be easily read into R using a package called rgdal. The polygons I wanted for my map are available in both formats.
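To make the raster versus polygon distinction concrete, here is a minimal base R sketch. Both objects are made-up toy data (the names toy_raster and toy_ring are purely illustrative), not anything taken from the viticultural area files.

```r
# Raster: a grid of cell values, like the pixels in an image.
# The 1s mark cells that belong to a (made-up) area.
toy_raster <- matrix(c(0, 1, 1,
                       0, 1, 1,
                       0, 0, 0), nrow = 3, byrow = TRUE)

# Vector polygon: an ordered ring of (x, y) vertices.
# The last vertex repeats the first to close the ring.
toy_ring <- rbind(c(1, 1), c(3, 1), c(3, 3), c(1, 3), c(1, 1))

dim(toy_raster)  # rows and columns of cells
nrow(toy_ring)   # number of stored vertices, including the closing point
```

The raster has to store a value for every cell, whereas the polygon only stores its corner points, which is why boundary data like this is usually distributed in vector formats.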
I chose to use the Shapefile to import the polygons into R. When you download a Shapefile, it will come as a folder with several files like this:
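As a rough stand-in for the folder listing, the sketch below writes a one-polygon toy Shapefile to a temporary folder, lists the files that make it up, and reads it back. It uses the sf package rather than rgdal, which is my substitution and not the approach taken in this post; the polygon, the folder, and the layer name toy_avas are all made up for illustration.

```r
library(sf)

# A made-up square polygon in longitude/latitude (EPSG:4326).
ring <- rbind(c(-122.6, 38.2), c(-122.2, 38.2), c(-122.2, 38.7),
              c(-122.6, 38.7), c(-122.6, 38.2))
toy <- st_sf(
  name = "Toy Valley",
  geometry = st_sfc(st_polygon(list(ring)), crs = 4326)
)

# Write it as a Shapefile into a fresh temporary folder.
shp_dir <- tempfile("toy_shapefile_")
dir.create(shp_dir)
st_write(toy, file.path(shp_dir, "toy_avas.shp"), quiet = TRUE)

# One Shapefile layer is really several files sharing a base name.
list.files(shp_dir)

# Read it back: dsn is the folder, layer is the shared base name.
toy_read <- st_read(dsn = shp_dir, layer = "toy_avas", quiet = TRUE)
```

The dsn and layer pair follows the same folder-plus-base-name pattern as the rgdal call used in this post. I show sf here because rgdal was retired from CRAN in 2023, and tmap accepts sf objects directly.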
Don’t worry about what exactly all these separate files are. All you need to do to import the full Shapefile into R is run a very simple line of code using the readOGR function from the rgdal package:
library(rgdal)
cawine <- readOGR(dsn = "Desktop/CA_avas_shapefile", layer = "CA_avas")
All I did here was add the path to the Shapefile folder and specify the layer name, which is the base name shared by all the files within the folder. This will get you a “Large SpatialPolygonsDataFrame” in your RStudio environment pane, which looks like this when you click it:
You’ll notice that the data and polygons object both have 142 items, one for each viticultural area. When you expand the data object, you can see that it is formatted as a data frame with columns that provide information and additional context for the polygons:
If you expand the polygons object, you’ll see that each item contains coordinate information for a specific viticultural area:
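The same structure can be inspected from the console instead of by clicking. This is a minimal sketch on a toy SpatialPolygonsDataFrame built with the sp package; the two square "areas", their IDs, and the helper square() are made up, so the counts are 2 rather than 142.

```r
library(sp)

# Hypothetical helper: a unit square with its lower-left corner at (x0, y0).
# Vertices run clockwise, which sp treats as an exterior ring.
square <- function(x0, y0, id) {
  coords <- cbind(c(x0, x0, x0 + 1, x0 + 1, x0),
                  c(y0, y0 + 1, y0 + 1, y0, y0))
  Polygons(list(Polygon(coords)), ID = id)
}

shapes <- SpatialPolygons(list(square(0, 0, "a"), square(2, 0, "b")))
attrs  <- data.frame(name = c("Area A", "Area B"), row.names = c("a", "b"))
toy    <- SpatialPolygonsDataFrame(shapes, data = attrs)

nrow(toy@data)        # rows in the attribute table
length(toy@polygons)  # one polygons entry per row
toy@polygons[[1]]@Polygons[[1]]@coords  # vertex coordinates of the first area
```

The @ operator reaches into the slots of the object, so with the real data cawine@data and cawine@polygons should correspond to the two items described above.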
Plot the areas with tmap
Now that we have the viticultural area Shapefile loaded, we can begin to plot the wine regions using the tmap package:
library(tmap)
tm_shape(cawine) +
  tm_fill("name", legend.show = FALSE, alpha = .5) +
  tm_borders()
tmap works like ggplot in that plots are built by adding function layers on top of each other. Each plot needs a tm_shape() object, which provides the data for the layers that follow. tmap will throw an error if you call only tm_shape() without at least one layer following it. Here I’ve called tm_shape() using our cawine object, added a fill layer with tm_fill() colored by the name of each area, and added a border for each area with tm_borders(). The output looks like this:
We’re getting somewhere, but obviously the areas are floating in space with no background map for context. There are several ways we can fix this, and which one we use depends on the end goal. Perhaps the most straightforward fix is to add an additional Shapefile layer with the shape of California to our plot. This will require finding a Shapefile for the outline of California. I just googled this, clicked on one of the first links, here, and downloaded the state boundary file. I added this Shapefile as a base layer for the viticultural areas like so:
ca <- readOGR(dsn = "Desktop/CA-state-boundary", layer = "CA_State_TIGER2016")
tmap_mode("plot")
tm_shape(ca) +
  tm_fill("white") +
  tm_borders() +
  tm_shape(cawine) +
  tm_fill("name", legend.show = FALSE, alpha = .5) +
  tm_borders() +
  tm_layout(bg.color = "#cfe4e8")
This doesn’t look bad, though there is a bit of an awkward edge around the coastline. This could potentially be a result of the viticultural area Shapefile using a different projection than the California Shapefile. To check this, I set the map projection for both layers to the Equal Earth projection (EPSG:8857) like so:
tmap_mode("plot")
tm_shape(ca, projection = 8857) +
  tm_fill("white") +
  tm_borders() +
  tm_shape(cawine, projection = 8857) +
  tm_fill("name", legend.show = FALSE, alpha = .5) +
  tm_borders() +
  tm_layout(bg.color = "#cfe4e8")
A fascinating new shape for California! The edge on the coastline is still there even with both layers drawn in the same projection, so a projection mismatch is probably not the cause. Instead, I believe it is because the California Shapefile extends partially into the ocean territory that belongs to the state.
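Another way to test the projection idea is to compare the coordinate reference systems (CRSs) of two layers directly rather than by eye. The sketch below does this with the sf package on a made-up point; it is not run against the actual Shapefiles, and an sp object would first need converting with st_as_sf().

```r
library(sf)

# The same made-up location stored in two different CRSs.
pt_lonlat <- st_sfc(st_point(c(-122.3, 38.3)), crs = 4326)  # longitude/latitude
pt_merc   <- st_transform(pt_lonlat, 3857)                  # Web Mercator

# Do the two layers share a CRS?
same_before <- st_crs(pt_lonlat) == st_crs(pt_merc)

# Reproject one layer to match the other, then compare again.
pt_back    <- st_transform(pt_merc, st_crs(pt_lonlat))
same_after <- st_crs(pt_lonlat) == st_crs(pt_back)

c(before = same_before, after = same_after)
```

If the comparison on the real layers came back FALSE, reprojecting one of them with st_transform() before plotting would rule the mismatch in or out.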
This static map is nice to get a broad overview of the regions, or to build an attractive map that could be printed out, but if we wanted more detailed information or additional context about the viticultural areas, it would be worth our while to construct a more dynamic map.
Make the map interactive
Making a map interactive using tmap is surprisingly simple. To get started, all you need to do is change the mapping mode to “view” using tmap_mode():
tmap_mode("view")
tm_shape(ca) +
  tm_fill("white") +
  tm_borders() +
  tm_shape(cawine) +
  tm_fill("name", legend.show = FALSE, alpha = .5) +
  tm_borders()
Clearly there is some refining to do here, but this is a great start toward building interactivity that took essentially no work. If you click on the layers icon, you can toggle between the different default base maps and add or remove the ca and cawine layers.
In the next iteration of our interactive map, I added more information to the popup labels using the popup.vars argument in our fill layer. I also set the id argument to the name variable so that the nicely formatted viticultural area name is displayed on hover. I removed the California Shapefile, as it was no longer needed, and added a custom tm_basemap() layer with Stamen’s TonerLite map:
tmap_mode("view")
tm_shape(cawine) +
  tm_fill("name", legend.show = FALSE, alpha = .5,
          id = "name",
          popup.vars = c("Within" = "within",
                         "Counties" = "county",
                         "Created" = "created")) +
  tm_borders() +
  tm_basemap("Stamen.TonerLite")
Now we have a map that actually provides a good deal of information on the geographic location of all the viticultural areas in the Shapefile dataset. Of course there are infinite ways to iterate on this basic map, but for my purposes, simply visualizing where each region is and getting some basic information displayed is enough, and I’m pleased with the simplicity of the final product. Stay tuned for more wine data exploration in the future!
Code and data used to create the maps in this tutorial are here