Data Visualization Best Practices

Learn More

Below is the tweet by John Burn-Murdoch that inspired me to use this visualization throughout this section of the course.

If you're curious to see the code used to make the above visualization, it is below. The code no longer works (I believe because the structure of the data source was changed), but you can see how it is set up. Much of what you see below should be familiar to you, which is inspiring!

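# Packages the code relies on: tidyverse (readr, tidyr, dplyr, stringr, ggplot2)
# and shadowtext (for geom_shadowtext)
library(tidyverse)
library(shadowtext)

read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv") %>%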
  gather(date, cases, 5:ncol(.)) %>%
  mutate(date = as.Date(date, "%m/%d/%y")) %>%
  group_by(country = `Country/Region`, date) %>%
  summarise(cases = sum(cases)) %>%
  filter(country != "Others" & country != "Mainland China") %>%
  bind_rows(
    tibble(country = "Republic of Korea", date = as.Date("2020-03-11"), cases = 7755)
  ) %>%
  group_by(country) %>%
  mutate(days_since_100 = as.numeric(date-min(date[cases >= 100]))) %>%
  ungroup() %>%
  filter(is.finite(days_since_100)) %>% 
  group_by(country) %>%
  mutate(new_cases = cases-cases[days_since_100 == 0]) %>%
  filter(sum(cases >= 100) >= 5) %>%
  filter(cases >= 100) %>% 
  bind_rows(
    tibble(country = "33% daily rise", days_since_100 = 0:18) %>%
      mutate(cases = 100*1.33^days_since_100)
  ) %>%
  ungroup() %>%
  mutate(
    country = country %>% str_replace_all("( SAR)|( \\(.+)|(Republic of )", "")
  ) %>%
  # filter(days_since_100 <= 10) %>%
  ggplot(aes(days_since_100, cases, col = country)) +
  geom_hline(yintercept = 100) +
  geom_vline(xintercept = 0) +
  geom_line(size = 0.8) +
  geom_point(pch = 21, size = 1) +
  scale_y_log10(expand = expand_scale(add = c(0,0.1)), breaks=c(100, 200, 500, 1000, 2000, 5000, 10000)) +
  # scale_y_continuous(expand = expand_scale(add = c(0,100))) +
  scale_x_continuous(expand = expand_scale(add = c(0,1))) +
  theme_minimal() +
  theme(
    panel.grid.minor = element_blank(),
    legend.position = "none",
    plot.margin = margin(3,15,3,3,"mm")
  ) +
  coord_cartesian(clip = "off") +
  scale_colour_manual(values = c("UK" = "#ce3140", "US" = "#EB5E8D", "Italy" = "black", "France" = "#c2b7af", "Germany" = "#c2b7af", "Hong Kong" = "#1E8FCC", "Iran" = "#9dbf57", "Japan" = "#208fce", "Singapore" = "#1E8FCC", "Korea" = "#208fce", "Belgium" = "#c2b7af", "Netherlands" = "#c2b7af", "Norway" = "#c2b7af", "Spain" = "#c2b7af", "Sweden" = "#c2b7af", "Switzerland" = "#c2b7af", "33% daily rise" = "#D9CCC3")) +
  geom_shadowtext(aes(label = paste0(" ",country)), hjust=0, vjust = 0, data = . %>% group_by(country) %>% top_n(1, days_since_100), bg.color = "white") +
  labs(x = "Number of days since 100th case", y = "", subtitle = "Total number of cases")
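Since the code above can no longer be run against the original data source, here is a minimal sketch of its central idea: aligning each country on the day it reached 100 cases, then adding the "33% daily rise" reference curve. The `toy_cases` data and the object names `aligned`, `reference`, and `plot_data` are made up for illustration and are not part of the original code.

```r
library(dplyr)
library(tibble)

# Hypothetical cumulative case counts for two made-up countries
toy_cases <- tribble(
  ~country, ~date,        ~cases,
  "A",      "2020-03-01",     40,
  "A",      "2020-03-02",    120,
  "A",      "2020-03-03",    180,
  "B",      "2020-03-02",     90,
  "B",      "2020-03-03",    100,
  "B",      "2020-03-04",    150
) %>%
  mutate(date = as.Date(date))

aligned <- toy_cases %>%
  group_by(country) %>%
  # Day 0 is the first date on which a country has at least 100 cases
  mutate(days_since_100 = as.numeric(date - min(date[cases >= 100]))) %>%
  ungroup() %>%
  # Keep only observations from day 0 onward
  filter(cases >= 100)

# Reference curve: 100 cases growing by 33% per day
reference <- tibble(
  country = "33% daily rise",
  days_since_100 = 0:2
) %>%
  mutate(cases = 100 * 1.33^days_since_100)

# Combined data, ready to map days_since_100 to x and cases to y
plot_data <- bind_rows(aligned, reference)
plot_data
```

Because every country starts at day 0 with roughly 100 cases, the lines share a common origin, which is what makes the growth rates comparable on the log scale used in the chart.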

If you want to learn about high-quality data visualization more generally, I'd recommend checking out the work of Stephanie Evergreen or Ann Emery. They both offer materials to help you learn the principles of high-quality data visualization.

On the idea of small tweaks that make your data viz sparkle, you might find Andy Kirk's "The Little of Visualisation Design" series interesting.

Have any questions? Put them below and we will help you out!

Nico Schoutteet

March 6, 2023

Great introduction, thanks! Just a quick question though: any idea how to obtain the horizontal arrow in the x axis title (can't figure this out from the code that was shared)?