Designing with Data
Creating A Book Cover with R
Audio created by Google NotebookLM. Note: I have not verified the accuracy of the audio or transcript. Download the transcript here (created by Restream).
There can be more to a book cover than meets the eye. The following is a book cover I designed for my online Chemistry Handbook that I use in my classes. The image actually conveys quantitative data. I document the process for generating such an image using R.
You can use the provided code examples in an .Rmd, .Rmarkdown, .Qmd, etc. file.
Image Description
The book cover displays a map of the state of Mississippi and its counties. Each county is shaded with a color taken from a gradient. The number attached to each county is the number of “pageviews” that my academic website accrued from that county between 2020 and 2022. Pink represents a large number whereas blue represents a small number. The website analytics were obtained from Google Analytics. The towns of University, MS and Mississippi State, MS are each denoted by a blue dot placed on the map using GPS coordinates.
Data Wrangling
Google Analytics can report traffic sources filtered by “Country” (United States) -> “Region” (Mississippi) -> “City”. Here is a resulting truncated table showing the number of pageviews from Mississippi by city.
Unfortunately, I could not find the analytics broken down by county. I ended up using StatsAmerica, which allowed me to (painfully) look up the county that each city resides in.
Finally, I summed the pageviews of the cities within each county to obtain a total per county.
This data is simply saved to an Excel file called “data.xlsx”.
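The city-to-county roll-up described above can also be scripted instead of done by hand. The following is a minimal sketch in base R. The data frame `city_views`, its column names, and the counts in it are placeholders I made up for illustration; they are not the actual analytics data.

```r
# Hypothetical city-level pageviews together with the county each city
# belongs to. The counts are placeholders for illustration only.
city_views <- data.frame(
  city      = c("Starkville", "Mississippi State", "Oxford", "University"),
  county    = c("oktibbeha", "oktibbeha", "lafayette", "lafayette"),
  pageviews = c(120, 300, 40, 25)
)

# Sum the pageviews of all cities that fall within the same county.
county_views <- aggregate(pageviews ~ county, data = city_views, FUN = sum)
county_views
```

The county names are written in lowercase here so that they line up with the lowercase county names used by the map data later on.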
Import the maps package to fetch coordinate data of Mississippi by county.
library(maps)
library(ggplot2)  # map_data() comes from ggplot2 and relies on the maps package
mississippi_county <- subset(map_data("county"), region=="mississippi")
head(mississippi_county)
long lat group order region subregion
41877 -91.31801 31.74759 1368 41877 mississippi adams
41878 -91.29510 31.71321 1368 41878 mississippi adams
41879 -91.29510 31.68456 1368 41879 mississippi adams
41880 -91.27790 31.67311 1368 41880 mississippi adams
41881 -91.23207 31.63300 1368 41881 mississippi adams
41882 -91.15759 31.63300 1368 41882 mississippi adams
Write this data to an .xlsx file, here named “ms-counties.xlsx”. By default, it is written to the directory that contains your Rmd file.
library(writexl)  # Provides write_xlsx()
write_xlsx(mississippi_county, 'ms-counties.xlsx')
This next part can be done in a variety of ways. We must create a new column in “ms-counties.xlsx” (here called “pageviews”) that includes the number of pageviews we identified from that county (stored in “data.xlsx”). You could write a shell script for this, use some Excel functions (e.g. VLOOKUP), or something more sophisticated.
Notice that there are multiple entries for “adams” county in the table above. If I have identified 17 pageviews from that county, I will insert the value of “17” next to each entry of adams county in the new column. Our table now has the form
Now our dataset is complete.
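The lookup step described above can also be done directly in R rather than in Excel. This is only a sketch under assumptions: `county_views` and `counties` are small made-up stand-ins for the contents of “data.xlsx” and “ms-counties.xlsx”, and the column name `county` is hypothetical. The value of 17 for adams county is the one used in the text; the other number is a placeholder.

```r
# Hypothetical county totals (stand-in for the contents of data.xlsx).
county_views <- data.frame(
  county    = c("adams", "oktibbeha"),
  pageviews = c(17, 420)
)

# Hypothetical slice of the polygon table (stand-in for ms-counties.xlsx).
counties <- data.frame(
  order     = c(41877, 41878, 41879),
  subregion = c("adams", "adams", "alcorn")
)

# match() gives, for every polygon row, the position of its county in
# county_views (NA when the county has no recorded pageviews).
idx <- match(counties$subregion, county_views$county)

# Copy the county total onto every row that belongs to that county.
counties$pageviews <- county_views$pageviews[idx]
counties
```

Using `match()` keeps the rows in their original order, which matters because the polygon outlines are drawn in row order. Counties without any pageviews end up as `NA` and would need to be handled before plotting.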
Create the Map
What I find fascinating is that the map on the book cover is actually a plot. Import the ggplot2 package along with readxl and tidyverse.
library(ggplot2)
library(readxl)
library(tidyverse)
Now read in the data from “ms-counties.xlsx” into a tibble called ‘dataset’.
dataset <- as_tibble(read_xlsx("ms-counties.xlsx"))
Here is what ‘dataset’ looks like.
head(dataset)
# A tibble: 6 × 7
long lat group order region subregion pageviews
<dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl>
1 -91.3 31.7 1368 41877 mississippi adams 17
2 -91.3 31.7 1368 41878 mississippi adams 17
3 -91.3 31.7 1368 41879 mississippi adams 17
4 -91.3 31.7 1368 41880 mississippi adams 17
5 -91.2 31.6 1368 41881 mississippi adams 17
6 -91.2 31.6 1368 41882 mississippi adams 17
Now I can start generating the plot! Our x-axis will be longitude while our y-axis will be latitude. I will group the data using the unique IDs given in the group column of the dataset via group=group. I need to scale the coordinate system so the aspect ratio is proper (Mississippi is more of a rectangle than a square). Here I use coord_fixed(1.3) but you could choose another value. Finally, I will outline the counties using a white border.
ms_map <-
  ggplot(
    data=dataset,                # Define the data
    aes(x=long, y=lat,           # Set x/y data
        group=group              # Group county data
    )
  ) +
  coord_fixed(1.3) +             # Set reasonable aspect ratio
  geom_polygon(color="white")    # Add borders to counties
# Show the map
ms_map
Notice that I assigned the ggplot object to a variable called ms_map via the <- operator. Now we have a good start to our book cover. Let’s start tweaking the image!
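The value passed to coord_fixed() above is a judgment call. A common rule of thumb for maps plotted in raw longitude/latitude is to use 1/cos(latitude) at the middle of the map, since a degree of longitude covers less ground the farther you are from the equator. The sketch below works this out; the latitude bounds are approximate values I assumed for Mississippi and are not taken from the dataset.

```r
# Approximate latitude extent of Mississippi in degrees (assumed values;
# with the real data you could use range(dataset$lat) instead).
lat_range <- c(30.2, 35.0)
mid_lat   <- mean(lat_range)

# One degree of longitude spans cos(latitude) times the distance of one
# degree of latitude, so the y direction is stretched by 1 / cos(latitude).
aspect <- 1 / cos(mid_lat * pi / 180)
round(aspect, 2)
```

This comes out a little under 1.2, so 1.3 draws the state slightly taller than the rule of thumb suggests. For a decorative cover either value is reasonable.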
Clean Up The Map
We can start by removing some elements of the plot including the axis labels, values, tickmarks, etc. I’ll do this in the theme() block.
ms_map <-
  ggplot(
    data=dataset,
    aes(x=long, y=lat,
        group=group)
  ) +
  coord_fixed(1.3) +
  geom_polygon(color="white") +
  theme(                                # "Blank" the
    axis.title.x=element_blank(),       # x-axis title
    axis.text.x=element_blank(),        # x-axis numbers
    axis.ticks.x=element_blank(),       # x-axis tickmarks
    axis.title.y=element_blank(),       # y-axis title
    axis.text.y=element_blank(),        # y-axis numbers
    axis.ticks.y=element_blank(),       # y-axis tickmarks
    panel.background=element_blank()    # background
  )
ms_map
Link the Data
I now link the user data to the counties. Recall that I want to shade each county by the number of pageviews.
ms_map <-
  ggplot(
    data=dataset,
    aes(x=long, y=lat,
        group=group,
        fill=pageviews    # Color counties with pageviews
    )
  ) +
  coord_fixed(1.3) +
  geom_polygon(color="white") +
  theme(
    axis.title.x=element_blank(),
    axis.text.x=element_blank(),
    axis.ticks.x=element_blank(),
    axis.title.y=element_blank(),
    axis.text.y=element_blank(),
    axis.ticks.y=element_blank(),
    panel.background=element_blank()
  )
ms_map
Notice the default color gradient applied. This will not do.
Stylize the Map
Given the large variance in pageviews from different counties (Oktibbeha county unsurprisingly has the most pageviews since that is the home of Mississippi State University), I will switch to a log scale. In addition, I don’t like the default blue gradient. I will use the lovely wesanderson palettes that Karthik Ram has generously provided. I’ll use the ‘GrandBudapest2’ palette.
library(wesanderson)
Let’s add the relevant code now.
ms_map <-
  ggplot(
    data=dataset,
    aes(x=long, y=lat,
        group=group,
        fill=pageviews
    )
  ) +
  coord_fixed(1.3) +
  geom_polygon(color="white") +
  theme(
    axis.title.x=element_blank(),
    axis.text.x=element_blank(),
    axis.ticks.x=element_blank(),
    axis.title.y=element_blank(),
    axis.text.y=element_blank(),
    axis.ticks.y=element_blank(),
    panel.background=element_blank()
  )
ms_map <-                             # Overwrite ms_map
  ms_map +                            # Call in ms_map to append code
  scale_fill_gradientn(               # Smooth gradient between n colors
    trans="log10",                    # log10 scale
    colours=rev(
      wes_palette("GrandBudapest2")   # Choose specific palette
    )
  )
ms_map
Here, pink represents a large number whereas dark blue is a small number.
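To see why the log scale helps, it can be useful to compare where a few values would land along the gradient with and without the transformation. The sketch below uses made-up pageview counts (only the value 17 appears in the text above), and `rescale01` is a small helper I defined for illustration rather than a function from any package.

```r
# Hypothetical county pageview counts spanning several orders of magnitude.
pageviews <- c(1, 17, 250, 5000)

# Helper (defined here for illustration): rescale values to the 0-1 range,
# i.e. the relative position along a colour gradient.
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

linear_pos <- rescale01(pageviews)          # position on a linear scale
log_pos    <- rescale01(log10(pageviews))   # position on a log10 scale

round(data.frame(pageviews, linear_pos, log_pos), 3)

# Caveat: log10(0) is -Inf, so a county with zero pageviews has no finite
# position on a log scale.
log10(0)
```

On the linear scale, everything except the largest county is squeezed into one end of the gradient, whereas the log scale spreads the counties across the whole palette.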
Let us continue to stylize our plot. I want a softer, more pastel-like feel to the colors. We can apply some transparency to the colors. Here I add alpha to the geom_polygon() section. You can choose a value other than 0.65. Let us also remove the legend.
ms_map <-
  ggplot(
    data=dataset,
    aes(x=long, y=lat,
        group=group,
        fill=pageviews
    )
  ) +
  coord_fixed(1.3) +
  geom_polygon(color="white",
               alpha=0.65      # Add transparency
  ) +
  guides(fill="none") +        # Remove legend
  theme(
    axis.title.x=element_blank(),
    axis.text.x=element_blank(),
    axis.ticks.x=element_blank(),
    axis.title.y=element_blank(),
    axis.text.y=element_blank(),
    axis.ticks.y=element_blank(),
    panel.background=element_blank()
  )
ms_map <-
  ms_map +
  scale_fill_gradientn(
    trans="log10",
    colours=rev(
      wes_palette("GrandBudapest2")
    )
  )
ms_map
I can add data points to specific locations on the map. Given that I earned my PhD at the University of Mississippi and my following employer was Mississippi State University, I will subtly mark these on the map. The coordinates are easy to find with a Google search for, for example, “GPS coordinates for Mississippi State University”. I will add this information to the plot via geom_point(). Notice how the GPS coordinates are simply x/y values (longitude is x, latitude is y). I have colored and sized these points as well.
ms_map <-
  ggplot(
    data=dataset,
    aes(x=long, y=lat,
        group=group,
        fill=pageviews
    )
  ) +
  coord_fixed(1.3) +
  geom_polygon(color="white",
               alpha=0.65
  ) +
  geom_point(                  # University of Mississippi
    aes(x=-89.5384,y=34.3647),
    colour="#7E9CD8",
    size=1.6
  ) +
  geom_point(                  # Mississippi State University
    aes(x=-88.7944,y=33.4552),
    colour="#7E9CD8",
    size=1.4
  ) +
  guides(fill="none") +
  theme(
    axis.title.x=element_blank(),
    axis.text.x=element_blank(),
    axis.ticks.x=element_blank(),
    axis.title.y=element_blank(),
    axis.text.y=element_blank(),
    axis.ticks.y=element_blank(),
    panel.background=element_blank()
  )
ms_map <-
  ms_map +
  scale_fill_gradientn(
    trans="log10",
    colours=rev(
      wes_palette("GrandBudapest2")
    )
  )
ms_map
The dots look unnecessarily large in the image. Later you will see that they are actually much smaller in the saved file, because the image is rendered differently on screen than when it is saved to an image file. The take-home message is that, depending on the dimensions and dpi you save your image with, you may want to readjust the size in geom_point().
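As a side note on the marker layers: a geom_point() layer that inherits the plot's data and is given constant x/y values inside aes() is, as far as I understand ggplot2, evaluated once per row of that data, so the same dot is stacked many times. An alternative is annotate("point", ...), which carries its own single-row data. The sketch below illustrates the difference on a tiny made-up polygon (`demo` is a hypothetical stand-in for the county dataset); it is not the approach used for the cover itself.

```r
library(ggplot2)

# Hypothetical stand-in for the county dataset: one square "county".
demo <- data.frame(
  long      = c(-90, -89, -89, -90),
  lat       = c(33, 33, 34, 34),
  group     = 1,
  pageviews = 17
)

base_plot <- ggplot(demo, aes(x = long, y = lat,
                              group = group, fill = pageviews)) +
  geom_polygon(color = "white")

# Marker added as a regular layer that inherits the plot data.
with_geom <- base_plot +
  geom_point(aes(x = -89.5384, y = 34.3647),
             colour = "#7E9CD8", size = 1.6)

# Marker added as an annotation with its own one-row data.
with_annotate <- base_plot +
  annotate("point", x = -89.5384, y = 34.3647,
           colour = "#7E9CD8", size = 1.6)

# Number of rows (points) each marker layer ends up drawing.
nrow(layer_data(with_geom, 2))
nrow(layer_data(with_annotate, 2))
```

For opaque dots the result looks the same either way, so this only matters if you add transparency to the markers or care about file size.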
Finally, I can change the color of the background from white to something else if desired. Note that I set the colors of the grid lines to that of the background color as well. I will use hexadecimal notation to denote the color.
ms_map <-
  ggplot(
    data=dataset,
    aes(x=long, y=lat,
        group=group,
        fill=pageviews
    )
  ) +
  coord_fixed(1.3) +
  geom_polygon(color="white",
               alpha=0.65
  ) +
  geom_point(
    aes(x=-89.5384,y=34.3647),
    colour="#7E9CD8",
    size=1.6
  ) +
  geom_point(
    aes(x=-88.7944,y=33.4552),
    colour="#7E9CD8",
    size=1.4
  ) +
  guides(fill="none") +
  theme(
    axis.title.x=element_blank(),
    axis.text.x=element_blank(),
    axis.ticks.x=element_blank(),
    axis.title.y=element_blank(),
    axis.text.y=element_blank(),
    axis.ticks.y=element_blank(),
    panel.background=                        # Set background color
      element_rect(fill='#FEFBEA'),
    panel.grid.major=                        # Set major grid lines color
      element_line(color='#FEFBEA', size=0),
    panel.grid.minor=                        # Set minor grid lines color
      element_line(color='#FEFBEA', size=0)
  )
ms_map <-
  ms_map +
  scale_fill_gradientn(
    trans="log10",
    colours=rev(
      wes_palette("GrandBudapest2")
    )
  )
ms_map
Add Text
Now I will add text to the book cover. I will use some additional packages for this. You can use any font installed on your Windows OS; however, the setup is kind of a pain.
First, use the extrafont package to index your system fonts via font_import(). This process takes a while, so I recommend commenting out the line until you have to run it again (such as when you install a new font).
library(extrafont)
#font_import() # takes a few minutes
Next, use the showtext package. There is a lot to digest in the next code block, but it should be fairly self-explanatory. In short, we add the font we want to use via font_add() and then use annotate() blocks to insert text at particular locations. Each instance of annotate() can have the font stylized however I want.
library(showtext)
font_add( # Specify a specific font to add
family="font", # Give it a name to use below
regular="C:\\Windows\\Fonts\\BebasNeue Regular.otf" # Path to font file
)
ms_map <-
  ms_map +
  annotate(               # Book Title
    label="CHEMISTRY",    # Title text
    "text",               # Tell ggplot2 this is text
    x=-89.873,            # x-coordinate position
    y=33.925,             # y-coordinate position
    size=165.0,           # Font size
    family="font"         # Font to use (specified above)
  ) +
  annotate(               # Book Title (second position)
    label="STUDENT HANDBOOK",
    "text",
    x=-89.965,
    y=33.515,
    size=47.0,
    family="font"
  ) +
  annotate(               # Book Version
    label="Beta 1.0",
    "text",
    x=-88.48,
    y=33.7119,
    size=12,
    family="font"
  ) +
  annotate(               # Author Information
    label="Eric Van Dornshuld",
    "text",
    x=-90.81,
    y=30.65,
    size=35.5,
    family="font"
  ) +
  annotate(               # Author Affiliation
    label="Mississippi State University",
    "text",
    x=-90.859,
    y=30.51,
    size=22.8,
    family="font"
  )
You may notice that I did not output the cover via ms_map at the end of the code block above. This is because the rendering of the text on the image is broken. We can work around this by saving the image to our computer rather than outputting it directly to the screen. I will do this using ggsave. This will save the image to your current working directory.
Also, the font sizes and x/y coordinates I specified in the previous code block are tied to the dimensions (and corresponding unit) that I gave my png file. If I changed the dimensions and/or units, I would have to go back and respecify all of my annotate() blocks, a royal pain.
Save the Image
Save a high-quality image with ggsave.
ggsave(
"book-cover-test.png", # Image name
plot=ms_map, # Variable name of our plot
device=png, # Output as png
width=10,
height=17,
dpi=300, # Save at high resolution
units="in" # Dimension units
)
Here is the final image.
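To get a feel for how large the saved file is, the width, height, and dpi passed to ggsave above translate into pixel dimensions with simple arithmetic. The variable names below are mine and only illustrate the calculation.

```r
# Values passed to ggsave() above.
width_in  <- 10    # width in inches
height_in <- 17    # height in inches
dpi       <- 300   # dots (pixels) per inch

# Pixel dimensions of the saved png: inches multiplied by dots per inch.
width_px  <- width_in * dpi
height_px <- height_in * dpi
c(width_px = width_px, height_px = height_px)
```

The saved cover is far larger in pixels than a typical on-screen preview, which is one reason why dots and text that look oversized in the preview appear much smaller relative to the saved image.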
Complete Code
Here is the full set of (tidied) code to generate the image. Feel free to download the ‘ms-counties.xlsx’ here.
Final Code
library(maps)
library(ggplot2)
library(readxl)
library(tidyverse)
library(wesanderson)
library(extrafont)
library(showtext)
#font_import() # takes a few minutes
dataset <-
  as_tibble(read_xlsx("ms-counties.xlsx"))    # Read in Data
font_add(                                     # Add Font
  family="font",
  regular="C:\\Windows\\Fonts\\BebasNeue Regular.otf")
ms_map <-
  ggplot(                                     # Create Plot
    data=dataset,
    aes(x=long, y=lat,
        group=group,
        fill=pageviews)
  ) +
  coord_fixed(1.3) +
  geom_polygon(color="white",
               alpha=0.65) +
  geom_point(
    aes(x=-89.5384,y=34.3647),
    colour="#7E9CD8",
    size=1.6) +
  geom_point(
    aes(x=-88.7944,y=33.4552),
    colour="#7E9CD8",
    size=1.4) +
  guides(fill="none") +
  theme(
    axis.title.x=element_blank(),
    axis.text.x=element_blank(),
    axis.ticks.x=element_blank(),
    axis.title.y=element_blank(),
    axis.text.y=element_blank(),
    axis.ticks.y=element_blank(),
    panel.background=
      element_rect(fill='#FEFBEA'),
    panel.grid.major=
      element_line(color='#FEFBEA', size=0),
    panel.grid.minor=
      element_line(color='#FEFBEA', size=0)
  ) +
  scale_fill_gradientn(                       # Transform Scale
    trans="log10",
    colours=rev(wes_palette("GrandBudapest2"))
  ) +
  annotate(                                   # Add Text
    label="CHEMISTRY",
    "text",
    x=-89.873,
    y=33.925,
    size=165.0,
    family="font") +
  annotate(
    label="STUDENT HANDBOOK",
    "text",
    x=-89.965,
    y=33.515,
    size=47.0,
    family="font") +
  annotate(
    label="Beta 1.0",
    "text",
    x=-88.48,
    y=33.7119,
    size=12,
    family="font") +
  annotate(
    label="Eric Van Dornshuld",
    "text",
    x=-90.81,
    y=30.65,
    size=35.5,
    family="font") +
  annotate(
    label="Mississippi State University",
    "text",
    x=-90.859,
    y=30.51,
    size=22.8,
    family="font")
ggsave(                                       # Save Image
  "book-cover-test.png",
  plot=ms_map,
  device=png,
  width=10,
  height=17,
  dpi=300,
  units="in")
sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C LC_TIME=English_United States.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] wesanderson_0.3.6.9000 forcats_0.5.2 stringr_1.4.1 dplyr_1.0.10 purrr_0.3.5 readr_2.1.3
[7] tidyr_1.2.1 tibble_3.1.8 tidyverse_1.3.2 maps_3.4.1 showtext_0.9-5 showtextdb_3.0
[13] sysfonts_0.8.8 extrafont_0.18 readxl_1.4.1 ggplot2_3.3.6
loaded via a namespace (and not attached):
[1] tidyselect_1.2.0 xfun_0.34 haven_2.5.1 gargle_1.2.1 colorspace_2.0-3 vctrs_0.5.0 generics_0.1.3
[8] utf8_1.2.2 rlang_1.0.6 pillar_1.8.1 glue_1.6.2 withr_2.5.0 DBI_1.1.3 dbplyr_2.2.1
[15] modelr_0.1.9 lifecycle_1.0.3 munsell_0.5.0 gtable_0.3.1 cellranger_1.1.0 rvest_1.0.3 labeling_0.4.2
[22] knitr_1.40 tzdb_0.3.0 fansi_1.0.3 Rttf2pt1_1.3.8 broom_1.0.1 scales_1.2.1 backports_1.4.1
[29] googlesheets4_1.0.1 jsonlite_1.8.3 farver_2.1.1 fs_1.5.2 digest_0.6.30 hms_1.1.2 stringi_1.7.8
[36] grid_4.2.2 cli_3.4.1 tools_4.2.2 magrittr_2.0.3 crayon_1.5.2 extrafontdb_1.0 pkgconfig_2.0.3
[43] ellipsis_0.3.2 xml2_1.3.3 reprex_2.0.2 googledrive_2.0.0 lubridate_1.8.0 assertthat_0.2.1 httr_1.4.4
[50] rstudioapi_0.14 R6_2.5.1 compiler_4.2.2