Going Deeper with R
Data Visualization Best Practices
This lesson is called Data Visualization Best Practices, part of the Going Deeper with R course.
Learn More
Below is the tweet that inspired me to use John Burn-Murdoch's visualization throughout this section of the course.
See the little backgrounds that make the country names in this plot easier to read? @jburnmurdoch was kind enough to share his #rstats code so I could see how it was made (https://t.co/F7K171s4FR).
— David Keyes (@dgkeyes) March 11, 2020
Learned about the shadowtext package by @guangchuangyu: https://t.co/YKz3h4piKv https://t.co/1wYTSUvZuS
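The backgrounds the tweet points to come from `geom_shadowtext()` in the shadowtext package, which draws each label with an outline in a second color (set with `bg.colour`). Here is a minimal sketch on made-up data (the series names and numbers are hypothetical) that labels each line at its last point, the same idea the full code uses. It is an illustration of the technique, not John Burn-Murdoch's code.

```r
library(ggplot2)
library(shadowtext)

# Hypothetical toy data: two made-up series, used only to demonstrate the labels
toy <- data.frame(
  day = rep(0:4, times = 2),
  cases = c(100, 140, 190, 260, 350, 100, 120, 150, 180, 220),
  country = rep(c("Country A", "Country B"), each = 5)
)

# Keep only the last point of each series; that is where each label goes
last_points <- toy[toy$day == max(toy$day), ]

p <- ggplot(toy, aes(day, cases, colour = country)) +
  geom_line() +
  # geom_shadowtext() outlines the text in bg.colour, so a white halo
  # separates each label from the gridlines and lines behind it
  geom_shadowtext(
    data = last_points,
    aes(label = paste0(" ", country)),
    hjust = 0,
    bg.colour = "white"
  ) +
  # Allow labels to run past the panel into the widened right margin
  coord_cartesian(clip = "off") +
  theme_minimal() +
  theme(legend.position = "none", plot.margin = margin(3, 25, 3, 3, "mm"))

p
```

Labeling lines directly like this replaces the legend, which is why the sketch (like the original) sets `legend.position = "none"`.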
If you're curious about the code used to make the visualization above, it's below. The code no longer works (I believe because the structure of the data source changed), but you can still see how it is set up. Much of it should look familiar to you by now, which is inspiring!
```r
library(tidyverse)   # readr, tidyr, dplyr, stringr, and ggplot2
library(shadowtext)  # geom_shadowtext() for the outlined country labels

# Confirmed cases by region, one column per date (this file is no longer available)
read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv") %>%
  # Reshape wide to long: date columns (5 onward) become date/cases pairs
  gather(date, cases, 5:ncol(.)) %>%
  mutate(date = as.Date(date, "%m/%d/%y")) %>%
  # Sum provinces/states into one total per country per day
  group_by(country = `Country/Region`, date) %>%
  summarise(cases = sum(cases)) %>%
  filter(country != "Others" & country != "Mainland China") %>%
  # Manually add a South Korea data point for March 11, 2020
  bind_rows(
    tibble(country = "Republic of Korea", date = as.Date("2020-03-11"), cases = 7755)
  ) %>%
  # Align each country on the day it reached 100 cases
  group_by(country) %>%
  mutate(days_since_100 = as.numeric(date - min(date[cases >= 100]))) %>%
  ungroup() %>%
  # Drop countries that never reached 100 cases (their value is not finite)
  filter(is.finite(days_since_100)) %>%
  group_by(country) %>%
  mutate(new_cases = cases - cases[days_since_100 == 0]) %>%
  # Keep countries with at least five days at or above 100 cases
  filter(sum(cases >= 100) >= 5) %>%
  filter(cases >= 100) %>%
  # Reference line: 100 cases growing 33% per day
  bind_rows(
    tibble(country = "33% daily rise", days_since_100 = 0:18) %>%
      mutate(cases = 100 * 1.33^days_since_100)
  ) %>%
  ungroup() %>%
  # Shorten names, e.g. "Hong Kong SAR" -> "Hong Kong", "Republic of Korea" -> "Korea"
  mutate(
    country = country %>% str_replace_all("( SAR)|( \\(.+)|(Republic of )", "")
  ) %>%
  # filter(days_since_100 <= 10) %>%
  ggplot(aes(days_since_100, cases, col = country)) +
  geom_hline(yintercept = 100) +
  geom_vline(xintercept = 0) +
  # In ggplot2 3.4.0 and later, lines use linewidth instead of size
  geom_line(size = 0.8) +
  geom_point(pch = 21, size = 1) +
  # expand_scale() was superseded by expansion() in ggplot2 3.3.0
  scale_y_log10(expand = expand_scale(add = c(0, 0.1)), breaks = c(100, 200, 500, 1000, 2000, 5000, 10000)) +
  # scale_y_continuous(expand = expand_scale(add = c(0, 100))) +
  scale_x_continuous(expand = expand_scale(add = c(0, 1))) +
  theme_minimal() +
  theme(
    panel.grid.minor = element_blank(),
    legend.position = "none",
    plot.margin = margin(3, 15, 3, 3, "mm")
  ) +
  # Let labels extend past the panel into the right margin
  coord_cartesian(clip = "off") +
  scale_colour_manual(values = c("UK" = "#ce3140", "US" = "#EB5E8D", "Italy" = "black", "France" = "#c2b7af", "Germany" = "#c2b7af", "Hong Kong" = "#1E8FCC", "Iran" = "#9dbf57", "Japan" = "#208fce", "Singapore" = "#1E8FCC", "Korea" = "#208fce", "Belgium" = "#c2b7af", "Netherlands" = "#c2b7af", "Norway" = "#c2b7af", "Spain" = "#c2b7af", "Sweden" = "#c2b7af", "Switzerland" = "#c2b7af", "33% daily rise" = "#D9CCC3")) +
  # Label each line at its last point, with a white outline behind the text
  geom_shadowtext(aes(label = paste0(" ", country)), hjust = 0, vjust = 0, data = . %>% group_by(country) %>% top_n(1, days_since_100), bg.color = "white") +
  labs(x = "Number of days since 100th case", y = "", subtitle = "Total number of cases")
```
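Because the data file the code reads no longer exists, here is a minimal sketch of its reshaping and alignment steps on a tiny, made-up wide table. The country names, dates, and counts are hypothetical, laid out like the old file with four ID columns followed by one column per date. The sketch swaps the superseded `gather()` for tidyr's `pivot_longer()` but otherwise follows the same logic: sum to one row per country per day, count days from each country's 100th case, keep rows at or above 100 cases, and append the 33% daily rise reference curve.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table shaped like the old file:
# four ID columns, then one column per date (m/d/yy)
wide <- tibble(
  `Province/State` = c(NA, NA),
  `Country/Region` = c("Country A", "Country B"),
  Lat = c(0, 0),
  Long = c(0, 0),
  `3/1/20` = c(50, 60),
  `3/2/20` = c(100, 90),
  `3/3/20` = c(160, 150),
  `3/4/20` = c(240, 200)
)

aligned <- wide %>%
  # Wide to long: every date column becomes a (date, cases) pair
  pivot_longer(-c(`Province/State`, `Country/Region`, Lat, Long),
               names_to = "date", values_to = "cases") %>%
  mutate(date = as.Date(date, "%m/%d/%y")) %>%
  # One total per country per day; result stays grouped by country
  group_by(country = `Country/Region`, date) %>%
  summarise(cases = sum(cases), .groups = "drop_last") %>%
  # Day 0 is the first date on which the country had at least 100 cases
  mutate(days_since_100 = as.numeric(date - min(date[cases >= 100]))) %>%
  ungroup() %>%
  filter(cases >= 100)

# Reference curve: start at 100 cases and grow 33% per day
reference <- tibble(country = "33% daily rise", days_since_100 = 0:18) %>%
  mutate(cases = 100 * 1.33^days_since_100)

# Reference rows have no date, so bind_rows() fills that column with NA
plot_data <- bind_rows(aligned, reference)
```

The reference curve is simply `cases = 100 * 1.33^d`, so at 33% daily growth the case count doubles roughly every `log(2) / log(1.33)`, or about 2.4 days. The resulting `plot_data` has the `country`, `days_since_100`, and `cases` columns that the plotting half of the code maps to aesthetics.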
If you want to learn about high-quality data visualization more generally, I'd recommend checking out the work of Stephanie Evergreen or Ann Emery. Both offer materials that teach its core principles.
On the idea of small tweaks that make your data viz sparkle, you might find Andy Kirk's "little of visualisation design" series interesting.
Nico Schoutteet
March 6, 2023
Great introduction, thanks! Just a quick question though: any idea how to obtain the horizontal arrow in the x axis title (can't figure this out from the code that was shared)?