Designing with Data

Creating A Book Cover with R

R
ggplot2


There can be more to a book cover than meets the eye. The following is a book cover I designed for my online Chemistry Handbook that I use in my classes. The image actually conveys quantitative data. I document the process for generating such an image using R.

You can use the provided code examples in an .Rmd, .Rmarkdown, .qmd, etc. file.

Image Description

The book cover displays a map of the state of Mississippi and its counties. Each county is shaded according to a color gradient. The number attached to each county is the number of “pageviews” that my academic website accrued from that county between 2020 and 2022. Pink represents a large number whereas blue represents a small number. The website analytics were obtained from Google Analytics. The towns of University, MS and Mississippi State, MS are each marked with a blue dot on the map based on GPS coordinates.

Data Wrangling

Google Analytics can report traffic sources filtered by “Country” (United States) -> “Region” (Mississippi) -> “City”. Here is a resulting truncated table showing the number of pageviews from Mississippi by city.

Unfortunately, I could not find the analytics broken down by county. I ended up using StatsAmerica, which allowed me to (painfully) look up the county each city resides in, giving me a table of pageviews by city and county.

Finally, I summed the pageviews within each county to obtain the total pageviews per county.
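The per-county totals could also be computed in R rather than by hand. The following is only a minimal sketch using base R's aggregate(); the data frame `city_views` and its pageview counts are made up for illustration and are not my actual analytics.

```r
# Hypothetical city-level table. The city/county pairings are real,
# but the pageview counts are invented for this example.
city_views <- data.frame(
  city      = c("Starkville", "Mississippi State", "Oxford", "University", "Natchez"),
  county    = c("oktibbeha", "oktibbeha", "lafayette", "lafayette", "adams"),
  pageviews = c(120, 300, 80, 40, 17)
)

# Sum the pageviews of all cities that share a county
county_views <- aggregate(pageviews ~ county, data = city_views, FUN = sum)

county_views
```

County names are written in lowercase here so that they line up with the `subregion` column used later on.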

This data is simply saved to an Excel file called “data.xlsx”.

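Load the maps package, which supplies the county boundary data, along with ggplot2, whose map_data() function converts that data into a data frame of coordinates.

library(maps)
library(ggplot2)                           # provides map_data()
mississippi_county <- subset(map_data("county"), region=="mississippi")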
head(mississippi_county)
           long      lat group order      region subregion
41877 -91.31801 31.74759  1368 41877 mississippi     adams
41878 -91.29510 31.71321  1368 41878 mississippi     adams
41879 -91.29510 31.68456  1368 41879 mississippi     adams
41880 -91.27790 31.67311  1368 41880 mississippi     adams
41881 -91.23207 31.63300  1368 41881 mississippi     adams
41882 -91.15759 31.63300  1368 41882 mississippi     adams

Write this data to an .xlsx file, here named “ms-counties.xlsx”, using the writexl package. By default, it will be written to the directory that contains your Rmd file.

library(writexl)                           # provides write_xlsx()
write_xlsx(mississippi_county, 'ms-counties.xlsx')

This next part can be done in a variety of ways. We must create a new column in “ms-counties.xlsx” (here called “pageviews”) that includes the number of pageviews we identified from that county (stored in “data.xlsx”). You could write a shell script for this, use some Excel functions (e.g. VLOOKUP), or something more sophisticated.

Notice that there are multiple entries for “adams” county in the table above. If I have identified 17 pageviews from that county, I will insert the value of “17” next to each entry of adams county in the new column. Our table now has the form

Now our dataset is complete.
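If you would rather stay in R for this step, a lookup with base R's match() is one option. This is a sketch under assumptions: `counties` stands in for a few rows of the coordinate table and `county_views` stands in for “data.xlsx”, both with invented values.

```r
# Stand-in for a few rows of the county coordinate table (illustrative values)
counties <- data.frame(
  long      = c(-91.32, -91.30, -89.50, -89.48),
  lat       = c(31.75, 31.71, 34.40, 34.38),
  group     = c(1368, 1368, 1390, 1390),
  order     = c(41877, 41878, 42900, 42901),
  subregion = c("adams", "adams", "lafayette", "lafayette")
)

# Stand-in for data.xlsx: one row per county (hypothetical counts)
county_views <- data.frame(
  county    = c("lafayette", "adams"),
  pageviews = c(120, 17)
)

# For every coordinate row, find the position of its county in county_views
# and copy over the matching pageview count. Row order is left untouched.
idx <- match(counties$subregion, county_views$county)
counties$pageviews <- county_views$pageviews[idx]

counties
```

I use match() here instead of merge() because the polygon vertices must stay in their original row order to be drawn correctly, and merge() may reorder rows. Counties that do not appear in `county_views` end up with NA.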

Create the Map

What I find fascinating is that the map on the book cover is actually a plot. Load the ggplot2 package along with readxl and tidyverse.

library(ggplot2)
library(readxl)
library(tidyverse)

Now read in the data from “ms-counties.xlsx” into a tibble called ‘dataset’.

dataset <- as_tibble(read_xlsx("ms-counties.xlsx"))

Here is what ‘dataset’ looks like.

head(dataset)
# A tibble: 6 × 7
   long   lat group order region      subregion pageviews
  <dbl> <dbl> <dbl> <dbl> <chr>       <chr>         <dbl>
1 -91.3  31.7  1368 41877 mississippi adams            17
2 -91.3  31.7  1368 41878 mississippi adams            17
3 -91.3  31.7  1368 41879 mississippi adams            17
4 -91.3  31.7  1368 41880 mississippi adams            17
5 -91.2  31.6  1368 41881 mississippi adams            17
6 -91.2  31.6  1368 41882 mississippi adams            17

Now I can start generating the plot! Our x-axis will be longitude while our y-axis will be latitude. I will group the data by the unique IDs given in the group column of the dataset using group=group. I need to scale the coordinate system so the aspect ratio is proper (Mississippi is more of a rectangle than a square). Here I use coord_fixed(1.3), but you could choose another value. Finally, I will outline the counties using a white border.

ms_map <- 
  ggplot(
    data=dataset,                          # Define the data
    aes(x=long, y=lat,                     # Set x/y data
        group=group                        # Group county data
        )
    ) +
    coord_fixed(1.3) +                     # Set reasonable aspect ratio
    geom_polygon(color="white")            # Add borders to counties

ms_map                                     # Show the map

Notice that I assigned the ggplot object to a variable called ms_map via the <- operator. Now we have a good start to our book cover. Let’s start tweaking the image!
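The value 1.3 passed to coord_fixed() is a matter of taste. If you want a starting point, a common rule of thumb for longitude/latitude maps is to use 1/cos(latitude) at the middle of the region. The sketch below assumes Mississippi spans roughly 30.2 to 35.0 degrees north; those bounds are my approximation, not values taken from the dataset.

```r
# Approximate latitude span of Mississippi in degrees (assumed values)
lat_range <- c(30.2, 35.0)

# Latitude at the middle of the state
mid_lat <- mean(lat_range)

# One degree of longitude covers less ground than one degree of latitude
# away from the equator; 1 / cos(latitude) compensates for that.
ratio <- 1 / cos(mid_lat * pi / 180)

ratio
```

This gives a value a little under 1.2, so 1.3 stretches the state slightly taller than the rule of thumb suggests. ggplot2 also offers coord_quickmap(), which applies a similar approximation automatically.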

Clean Up The Map

We can start by removing some elements of the plot, including the axis titles, values, tick marks, etc. I’ll do this in the theme() block.

ms_map <- 
  ggplot(
    data=dataset, 
    aes(x=long, y=lat, 
        group=group)
    ) +
    coord_fixed(1.3) + 
    geom_polygon(color="white") +
    theme(                                 # "Blank" the
      axis.title.x=element_blank(),        #   x-axis title
      axis.text.x=element_blank(),         #   x-axis numbers
      axis.ticks.x=element_blank(),        #   x-axis tickmarks
      axis.title.y=element_blank(),        #   y-axis title
      axis.text.y=element_blank(),         #   y-axis numbers
      axis.ticks.y=element_blank(),        #   y-axis tickmarks
      panel.background=element_blank()     #   background
      )

ms_map
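Since this theme() block is repeated in every later version of the plot, one option is to store it in a variable and add it wherever it is needed. The sketch below is an alternative, not what I did for the cover; `blank_theme` and the unit-square data frame `square` are hypothetical names and stand-in data used only so the example runs on its own.

```r
library(ggplot2)

# Collect the "blank everything" settings once.
# axis.title, axis.text and axis.ticks are parent elements,
# so blanking them covers both the x and y variants.
blank_theme <- theme(
  axis.title       = element_blank(),
  axis.text        = element_blank(),
  axis.ticks       = element_blank(),
  panel.background = element_blank()
)

# Tiny stand-in polygon (a unit square), not the Mississippi data
square <- data.frame(
  long  = c(0, 1, 1, 0),
  lat   = c(0, 0, 1, 1),
  group = 1
)

p <- ggplot(square, aes(x = long, y = lat, group = group)) +
  coord_fixed(1.3) +
  geom_polygon(color = "white") +
  blank_theme                              # reuse the stored theme

p
```

ggplot2 also ships theme_void(), which strips axes, grid lines, and background in a single call, if you do not need fine control over individual elements.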

Stylize the Map

Given the large variance in pageviews from different counties (Oktibbeha county unsurprisingly has the most pageviews since that is the home of Mississippi State University), I will switch to a log scale. In addition, I don’t like the default blue gradient. I will use the lovely wesanderson palettes that Karthik Ram has generously provided. I’ll use the ‘GrandBudapest2’ palette.

library(wesanderson)

Let’s add the relevant code now.

ms_map <- 
  ggplot(
    data=dataset,
    aes(x=long, y=lat,
        group=group,
        fill=pageviews
        )
    ) +
    coord_fixed(1.3) + 
    geom_polygon(color="white") +
    theme(
      axis.title.x=element_blank(),
      axis.text.x=element_blank(),
      axis.ticks.x=element_blank(),
      axis.title.y=element_blank(),
      axis.text.y=element_blank(),
      axis.ticks.y=element_blank(),
      panel.background=element_blank()
      )

ms_map <-                                  # Overwrite ms_map
  ms_map +                                 # Call in ms_map to append code
  scale_fill_gradientn(                    # Smooth gradient between n colors
    trans="log10",                         # log10 scale
    colours=rev(
      wes_palette("GrandBudapest2")        # Choose specific palette
      )
    ) 

ms_map

Here, pink represents a large number whereas dark blue is a small number.
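To see why the log scale matters, it can help to look at where each value lands along the gradient with and without the transformation. The numbers below are invented, and `range01()` is a small helper I define here for illustration.

```r
# Hypothetical, highly skewed pageview counts
views <- c(1, 17, 250, 12000)

# Rescale a vector to the interval [0, 1], i.e. its position along the gradient
range01 <- function(x) (x - min(x)) / (max(x) - min(x))

linear_pos <- range01(views)               # positions on a linear scale
log_pos    <- range01(log10(views))        # positions on a log10 scale

round(linear_pos, 3)
round(log_pos, 3)

# Caution: a county with zero pageviews has no finite log10 value
log10(0)
```

On the linear scale, all but the largest value are squeezed into the bottom few percent of the gradient, so most counties would look nearly identical. On the log scale they spread out. Note that zero counts become -Inf under log10, so counties with no pageviews would most likely be treated as missing values by the fill scale.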

Let us continue to stylize our plot. I want a softer, more pastel-like feel to the colors. We can apply some transparency to the colors. Here I add alpha to the geom_polygon() section. You can choose a value other than 0.65. Let us also remove the legend.

ms_map <- 
  ggplot(
    data=dataset,
    aes(x=long, y=lat,
        group=group,
        fill=pageviews
        )
    ) +
    coord_fixed(1.3) + 
    geom_polygon(color="white", 
                 alpha=0.65                # Add transparency
                 ) +
    guides(fill="none") +                  # Remove legend
    theme(
      axis.title.x=element_blank(),
      axis.text.x=element_blank(),
      axis.ticks.x=element_blank(),
      axis.title.y=element_blank(),
      axis.text.y=element_blank(),
      axis.ticks.y=element_blank(),
      panel.background=element_blank()
      )

ms_map <-
  ms_map +
  scale_fill_gradientn(
    trans="log10",
    colours=rev(
      wes_palette("GrandBudapest2")
      )
    ) 

ms_map

I can add data points to specific locations on the map. Given that I earned my PhD at the University of Mississippi and my subsequent employer was Mississippi State University, I will subtly mark these on the map. I can find the coordinates with a Google search for, for example, “GPS coordinates for Mississippi State University”. I will add this information to the plot via geom_point(). Notice how the GPS coordinates are simply x/y values (longitude is x and latitude is y). I have colored and sized these points as well.

ms_map <- 
  ggplot(
    data=dataset,
    aes(x=long, y=lat,
        group=group,
        fill=pageviews
        )
    ) +
    coord_fixed(1.3) + 
    geom_polygon(color="white", 
                 alpha=0.65
                 ) +
    geom_point(                            # University of Mississippi
      aes(x=-89.5384,y=34.3647), 
      colour="#7E9CD8", 
      size=1.6
      ) +
    geom_point(                            # Mississippi State University
      aes(x=-88.7944,y=33.4552), 
      colour="#7E9CD8", 
      size=1.4
      ) +
    guides(fill="none") +
    theme(
      axis.title.x=element_blank(),
      axis.text.x=element_blank(),
      axis.ticks.x=element_blank(),
      axis.title.y=element_blank(),
      axis.text.y=element_blank(),
      axis.ticks.y=element_blank(),
      panel.background=element_blank()
      )

ms_map <-
  ms_map +
  scale_fill_gradientn(
    trans="log10",
    colours=rev(
      wes_palette("GrandBudapest2")
      )
    ) 

ms_map
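One caveat with the approach above: as far as I can tell, each geom_point() layer inherits the full county dataset, so the same constant coordinate is drawn once for every row. The result looks the same, but an alternative is to give the points their own small data frame. The following is a sketch of that idea; `campuses` and the rectangular stand-in polygon `box` are names and data I made up so the example runs without “ms-counties.xlsx”.

```r
library(ggplot2)

# One row per campus, using the same coordinates and sizes as above
campuses <- data.frame(
  name = c("University of Mississippi", "Mississippi State University"),
  long = c(-89.5384, -88.7944),
  lat  = c(34.3647, 33.4552),
  size = c(1.6, 1.4)
)

# Rectangular stand-in for the county polygons (not real boundary data)
box <- data.frame(
  long      = c(-91.7, -88.1, -88.1, -91.7),
  lat       = c(30.2, 30.2, 35.0, 35.0),
  group     = 1,
  pageviews = 10
)

p <- ggplot(box, aes(x = long, y = lat, group = group, fill = pageviews)) +
  coord_fixed(1.3) +
  geom_polygon(color = "white", alpha = 0.65) +
  geom_point(
    data = campuses,                       # use the two-row data frame
    aes(x = long, y = lat),
    inherit.aes = FALSE,                   # ignore group/fill from ggplot()
    colour = "#7E9CD8",
    size = campuses$size                   # one size per point
  ) +
  guides(fill = "none")

p
```

With this layout, adding a third location only requires another row in `campuses`.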

Warning

The size of the dots looks unnecessarily large in the image. Later you will see that they are actually much smaller than what is shown, because the image is rendered differently on screen than when it is saved to a file. The take-home message is that, depending on the dimensions and dpi you save your image with, you may want to readjust the size in geom_point().

Finally, I can change the color of the background from white to something else if desired. Note that I also set the grid lines to the background color. I will use hexadecimal notation to denote the color.

ms_map <- 
  ggplot(
    data=dataset,
    aes(x=long, y=lat,
        group=group,
        fill=pageviews
        )
    ) +
    coord_fixed(1.3) + 
    geom_polygon(color="white", 
                 alpha=0.65
                 ) +
    geom_point(
      aes(x=-89.5384,y=34.3647), 
      colour="#7E9CD8", 
      size=1.6
      ) +
    geom_point(
      aes(x=-88.7944,y=33.4552), 
      colour="#7E9CD8", 
      size=1.4
      ) +
    guides(fill="none") +
    theme(
      axis.title.x=element_blank(),
      axis.text.x=element_blank(),
      axis.ticks.x=element_blank(),
      axis.title.y=element_blank(),
      axis.text.y=element_blank(),
      axis.ticks.y=element_blank(),
      panel.background=                    # Set background color
        element_rect(fill='#FEFBEA'),
      panel.grid.major=                    # Set major grid lines color
        element_line(color='#FEFBEA', size=0),
      panel.grid.minor=                    # Set minor grid lines color
        element_line(color='#FEFBEA', size=0)
      )

ms_map <-
  ms_map +
  scale_fill_gradientn(
    trans="log10",
    colours=rev(
      wes_palette("GrandBudapest2")
      )
    ) 

ms_map

Add Text

Now I will add text to the book cover. I will use some additional packages for this. You can use any font installed on your Windows OS; however, the setup is kind of a pain.

First, use the extrafont package to index your system fonts via font_import(). This process takes a while so I recommend commenting out the line until you have to run it again (such as if you install a new font).

library(extrafont)
#font_import()                             # takes a few minutes

Next, use the showtext package. There is a lot to digest in the next code block but it should be fairly self-explanatory. In short, we add the font we want to use via font_add() and then use annotate() blocks to insert text at particular locations. Each instance of annotate() can have the font stylized however I want.

library(showtext)

font_add(                                  # Specify a specific font to add
  family="font",                           # Give it a name to use below
  regular="C:\\Windows\\Fonts\\BebasNeue Regular.otf" # Path to font file
  )

ms_map <- 
  ms_map +
  annotate(                                # Book Title
    label="CHEMISTRY",                     #   Title text
    "text",                                #   Tell ggplot2 this is text
    x=-89.873,                             #   x-coordinate position
    y=33.925,                              #   y-coordinate position
    size=165.0,                            #   Font size
    family="font"                          #   Font to use (specified above)
    ) +
  annotate(                                # Book Title (second position)
    label="STUDENT HANDBOOK", 
    "text", 
    x=-89.965, 
    y=33.515, 
    size=47.0, 
    family="font"
    ) +
  annotate(                                # Book Version
    label="Beta 1.0", 
    "text", 
    x=-88.48, 
    y=33.7119, 
    size=12, 
    family="font"
    ) +
  annotate(                                # Author Information
    label="Eric Van Dornshuld", 
    "text", 
    x=-90.81, 
    y=30.65, 
    size=35.5, 
    family="font"
    ) +
  annotate(                                # Author Affiliation
    label="Mississippi State University", 
    "text", 
    x=-90.859, 
    y=30.51, 
    size=22.8, 
    family="font"
    )

Warning

You may notice that I did not output the cover via ms_map at the end of the code block above. This is because the rendering of the text on the image is broken. We can work around this by saving the image to our computer rather than outputting it directly to the screen. I will do this using ggsave. This will save the image to your current working directory.

Also, the font sizes and x/y coordinates I specified in the previous code block are tied to the dimensions (and corresponding units) that I gave my png file. If I changed the dimensions and/or units, I would have to go back and respecify all of my annotate() blocks, a royal pain.
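Two details may help make sense of the large size values. First, ggplot2 measures text size in millimetres rather than points. Second, showtext renders text at 96 dpi unless told otherwise, which can make text look small in a 300 dpi file; this may be why values as large as 165 were needed here, though that is my assumption and I have not confirmed it. The sketch below converts sizes to points with a helper I call `mm_to_pt()` (a hypothetical name) and shows the showtext settings that are commonly used to match the output resolution.

```r
# ggplot2 text sizes are in mm; there are 72.27 points per inch
# and 25.4 mm per inch.
mm_to_pt <- function(size_mm) size_mm * 72.27 / 25.4

# Sizes used for three of the annotate() blocks above
sizes_mm <- c(CHEMISTRY = 165, `STUDENT HANDBOOK` = 47, `Beta 1.0` = 12)

# Nominal sizes in points
round(mm_to_pt(sizes_mm))

# Optional: only runs if showtext is installed
if (requireNamespace("showtext", quietly = TRUE)) {
  showtext::showtext_auto()                # let showtext handle text rendering
  showtext::showtext_opts(dpi = 300)       # match the dpi passed to ggsave()
}
```

If you adopt the showtext_opts(dpi = 300) setting, the size values in the annotate() blocks would almost certainly need to be reduced, so treat this as something to experiment with rather than a drop-in change.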

Save the Image

Save a high-quality image with ggsave().

ggsave(
  "book-cover-test.png",                   # Image name
  plot=ms_map,                             # Variable name of our plot
  device=png,                              # Output as png
  width=10,
  height=17,
  dpi=300,                                 # Save at high resolution
  units="in"                               # Dimension units
  )

Here is the final image.
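The width, height, and dpi passed to ggsave() determine the pixel dimensions of the file, which is part of why on-screen previews and the saved image can look so different. A quick calculation, using the values from the ggsave() call above:

```r
# Values passed to ggsave()
width_in  <- 10
height_in <- 17
dpi       <- 300

# Pixel dimensions of the saved image
px <- c(width = width_in * dpi, height = height_in * dpi)
px

# Height-to-width ratio of the cover
aspect <- height_in / width_in
aspect
```

The saved cover is therefore 3000 by 5100 pixels. Halving the dpi would halve both pixel dimensions while leaving the physical size in inches unchanged.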

Complete Code

Here is the full set of (tidied) code to generate the image. Feel free to download ‘ms-counties.xlsx’ here.

Final Code
library(maps)
library(ggplot2)
library(readxl)
library(tidyverse)
library(wesanderson)
library(extrafont)
library(showtext)
#font_import()                             # takes a few minutes

dataset <- 
  as_tibble(read_xlsx("ms-counties.xlsx")) # Read in Data

font_add(                                  # Add Font
  family="font",
  regular="C:\\Windows\\Fonts\\BebasNeue Regular.otf")

ms_map <-
  ggplot(                                  # Create Plot
    data=dataset,
    aes(x=long, y=lat,
        group=group,
        fill=pageviews)
    ) +
    coord_fixed(1.3) + 
    geom_polygon(color="white", 
                 alpha=0.65) +
    geom_point(
      aes(x=-89.5384,y=34.3647), 
      colour="#7E9CD8", 
      size=1.6) +
    geom_point(
      aes(x=-88.7944,y=33.4552), 
      colour="#7E9CD8", 
      size=1.4) +
    guides(fill="none") +
    theme(
      axis.title.x=element_blank(),
      axis.text.x=element_blank(),
      axis.ticks.x=element_blank(),
      axis.title.y=element_blank(),
      axis.text.y=element_blank(),
      axis.ticks.y=element_blank(),
      panel.background=
        element_rect(fill='#FEFBEA'),
      panel.grid.major=
        element_line(color='#FEFBEA', size=0),
      panel.grid.minor=
        element_line(color='#FEFBEA', size=0)
      ) +
  scale_fill_gradientn(                    # Transform Scale
    trans="log10",
    colours=rev(wes_palette("GrandBudapest2"))
    ) +
  annotate(                                # Add Text
    label="CHEMISTRY",
    "text",
    x=-89.873,
    y=33.925,
    size=165.0,
    family="font") +
  annotate(
    label="STUDENT HANDBOOK", 
    "text", 
    x=-89.965, 
    y=33.515, 
    size=47.0, 
    family="font") +
  annotate(
    label="Beta 1.0", 
    "text", 
    x=-88.48, 
    y=33.7119, 
    size=12, 
    family="font") +
  annotate(
    label="Eric Van Dornshuld", 
    "text", 
    x=-90.81, 
    y=30.65, 
    size=35.5, 
    family="font") +
  annotate(
    label="Mississippi State University", 
    "text", 
    x=-90.859, 
    y=30.51, 
    size=22.8, 
    family="font")

ggsave(                                   # Save Image
  "book-cover-test.png",
  plot=ms_map,
  device=png,
  width=10,
  height=17,
  dpi=300,
  units="in")

sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                           LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] wesanderson_0.3.6.9000 forcats_0.5.2          stringr_1.4.1          dplyr_1.0.10           purrr_0.3.5            readr_2.1.3           
 [7] tidyr_1.2.1            tibble_3.1.8           tidyverse_1.3.2        maps_3.4.1             showtext_0.9-5         showtextdb_3.0        
[13] sysfonts_0.8.8         extrafont_0.18         readxl_1.4.1           ggplot2_3.3.6         

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0    xfun_0.34           haven_2.5.1         gargle_1.2.1        colorspace_2.0-3    vctrs_0.5.0         generics_0.1.3     
 [8] utf8_1.2.2          rlang_1.0.6         pillar_1.8.1        glue_1.6.2          withr_2.5.0         DBI_1.1.3           dbplyr_2.2.1       
[15] modelr_0.1.9        lifecycle_1.0.3     munsell_0.5.0       gtable_0.3.1        cellranger_1.1.0    rvest_1.0.3         labeling_0.4.2     
[22] knitr_1.40          tzdb_0.3.0          fansi_1.0.3         Rttf2pt1_1.3.8      broom_1.0.1         scales_1.2.1        backports_1.4.1    
[29] googlesheets4_1.0.1 jsonlite_1.8.3      farver_2.1.1        fs_1.5.2            digest_0.6.30       hms_1.1.2           stringi_1.7.8      
[36] grid_4.2.2          cli_3.4.1           tools_4.2.2         magrittr_2.0.3      crayon_1.5.2        extrafontdb_1.0     pkgconfig_2.0.3    
[43] ellipsis_0.3.2      xml2_1.3.3          reprex_2.0.2        googledrive_2.0.0   lubridate_1.8.0     assertthat_0.2.1    httr_1.4.4         
[50] rstudioapi_0.14     R6_2.5.1            compiler_4.2.2