Use Annotations to Explain

Code shown in the video
# Load Packages -----------------------------------------------------------

library(tidyverse)
library(fs)
library(scales)
library(ggrepel)
library(ggtext)

# Create Directory --------------------------------------------------------

dir_create("data")

# Download Data -----------------------------------------------------------

# download.file("https://github.com/rfortherestofus/going-deeper-v2/raw/main/data/third_grade_math_proficiency.rds",
#               mode = "wb",
#               destfile = "data/third_grade_math_proficiency.rds")

# Import Data -------------------------------------------------------------

third_grade_math_proficiency <- 
  read_rds("data/third_grade_math_proficiency.rds") |> 
  select(academic_year, school, school_id, district, proficiency_level, number_of_students) |> 
  mutate(is_proficient = case_when(
    proficiency_level >= 3 ~ TRUE,
    .default = FALSE
  )) |> 
  group_by(academic_year, school, district, school_id, is_proficient) |> 
  summarize(number_of_students = sum(number_of_students, na.rm = TRUE),
            .groups = "drop") |> 
  group_by(academic_year, school, district, school_id) |> 
  mutate(percent_proficient = number_of_students / sum(number_of_students, na.rm = TRUE)) |> 
  ungroup() |> 
  filter(is_proficient == TRUE) |> 
  select(academic_year, school, district, percent_proficient) |> 
  rename(year = academic_year) |> 
  mutate(percent_proficient = case_when(
    is.nan(percent_proficient) ~ NA,
    .default = percent_proficient
  ))

# Plot --------------------------------------------------------------------

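# Find the Portland school with the largest year-over-year gain in proficiency
top_growth_school <- 
  third_grade_math_proficiency |>
  filter(district == "Portland SD 1J") |> 
  group_by(school) |> 
  mutate(growth_from_previous_year = percent_proficient - lag(percent_proficient)) |> 
  ungroup() |> 
  drop_na(growth_from_previous_year) |>
  # with_ties = FALSE returns a single school name even if two schools tie
  slice_max(order_by = growth_from_previous_year,
            n = 1,
            with_ties = FALSE) |> 
  pull(school)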

third_grade_math_proficiency |>
  filter(district == "Portland SD 1J") |>
  mutate(highlight_school = case_when(
    school == top_growth_school ~ "Y",
    .default = "N"
  )) |> 
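  # Format the proficiency rate as a percentage for the highlighted school only;
  # every other school gets NA and therefore no label
  mutate(percent_proficient_formatted = case_when(
    school == top_growth_school ~ percent(percent_proficient, accuracy = 1)
  )) |> 
  # Spell out the final-year label in full and keep the first-year label short
  mutate(percent_proficient_formatted = case_when(
    highlight_school == "Y" & year == "2021-2022" ~ str_glue("{percent_proficient_formatted} of students
                                                             were proficient
                                                             in {year}"),
    highlight_school == "Y" & year == "2018-2019" ~ percent_proficient_formatted
  )) |> 
  # Move the highlighted school to the last factor level so its line is drawn
  # last, on top of the grey lines
  mutate(school = fct_relevel(school, top_growth_school, after = Inf)) |>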
  ggplot(aes(x = year,
             y = percent_proficient,
             group = school,
             color = highlight_school,
             label = percent_proficient_formatted)) +
  geom_line() +
  geom_text_repel(hjust = 0,
                  lineheight = 0.9,
                  direction = "x") +
  scale_color_manual(values = c(
    "N" = "grey90",
    "Y" = "orange"
  )) +
  scale_y_continuous(labels = percent_format()) +
  annotate(geom = "text",
           x = 2.02,
           y = 0.6,
           hjust = 0,
           lineheight = 0.9,
           color = "grey80",
           label = str_glue("Each grey line
                            represents one school")) +
  labs(title = str_glue("<b style='color: orange;'>{top_growth_school}</b> 
                        showed large growth in math proficiency over the
                        last two years")) +
  theme_minimal() +
  theme(axis.title = element_blank(),
        legend.position = "none",
        plot.title = element_markdown(),
        plot.title.position = "plot",
        panel.grid = element_blank())

Your Turn

Add an annotation that explains what the grey lines represent.

(Optional) You're welcome to add other annotations as well if you want to test out the power of the annotate() function.
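If you want a starting point before working with the full course plot, here is a minimal sketch of the idea. It assumes a small made-up data frame (toy_scores and its values are hypothetical, not the course dataset) and only ggplot2. The coordinates passed to annotate() are guesses that happen to suit this toy data, so expect to adjust them for your own plot.

```r
library(ggplot2)

# Hypothetical toy data: three schools measured in two years
toy_scores <- data.frame(
  year = rep(c("2018-2019", "2021-2022"), times = 3),
  school = rep(c("School A", "School B", "School C"), each = 2),
  percent_proficient = c(0.40, 0.45, 0.55, 0.50, 0.30, 0.62)
)

annotated_plot <-
  ggplot(toy_scores,
         aes(x = year,
             y = percent_proficient,
             group = school)) +
  geom_line(color = "grey70") +
  # annotate() draws one label at fixed coordinates.
  # On a discrete x axis the first level sits at 1 and the second at 2,
  # so x = 2.02 places the text just to the right of the last year.
  annotate(geom = "text",
           x = 2.02,
           y = 0.55,
           hjust = 0,
           lineheight = 0.9,
           color = "grey50",
           label = "Each grey line\nrepresents one school")

annotated_plot
```

Setting hjust = 0 left-aligns the text at the x position, which keeps the label from overlapping the ends of the lines.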

Learn More

The article How to add annotations in ggplot: should you use geoms or annotations? by Albert Rapp is a good overview of how to choose between geom_text() and annotate().
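As a rough illustration of that choice (a sketch written for this page, not taken from the article): geom_text() reads its labels from a column in the data, while annotate() takes values typed directly into the call. The toy_scores data frame and its label column are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data, not the course dataset
toy_scores <- data.frame(
  year = rep(c("2018-2019", "2021-2022"), times = 3),
  school = rep(c("School A", "School B", "School C"), each = 2),
  percent_proficient = c(0.40, 0.45, 0.55, 0.50, 0.30, 0.62)
)

# Label only the final point of School C; all other rows get NA
toy_scores$label <- ifelse(
  toy_scores$school == "School C" & toy_scores$year == "2021-2022",
  "62%",
  NA_character_
)

comparison_plot <-
  ggplot(toy_scores,
         aes(x = year,
             y = percent_proficient,
             group = school)) +
  geom_line(color = "grey70") +
  # geom_text(): labels are mapped from a data column, one per row
  geom_text(aes(label = label),
            hjust = 0,
            nudge_x = 0.02,
            na.rm = TRUE) +
  # annotate(): a one-off label whose position and text are fixed values
  annotate(geom = "text",
           x = 1.5,
           y = 0.25,
           color = "grey50",
           label = "Each grey line represents one school")

comparison_plot
```

A reasonable rule of thumb under these assumptions: if the label text or position comes from the data, use a geom; if it is a single note you place by hand, annotate() is usually less work.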

Cara Thompson's blog post Level Up Your Labels: Tips and Tricks for Annotating Plots is a masterclass in using annotations to improve the quality of your plots.

This video, titled How to add annotations to ggplots in R, is a good walkthrough of using annotations in ggplot.

If you want to learn more about the importance of annotation in data visualization, check out this article from Elijah Meeks titled Making Annotations First-Class Citizens in Data Visualization. Also check out this article from Alberto Cairo discussing another example of work from the Financial Times that uses annotations well (folks at the FT are experts at annotations, in case you haven’t yet picked that up!).

Have any questions? Put them below and we will help you out!


Ally de Alcuaz

November 16, 2023

Late to the game here, but does the placement of the annotation code matter? It seems to run fine on my end when I add it to the end of the code from the previous lesson. I can see why that might not be best practice and could lead to confusion, though.

Gracielle Higino Coach

November 16, 2023

Hi Ally! Good question. I think the best practice is to keep building blocks of code together. For example, if you need to transform your data within a ggplot call, keep those transformations together; define the theme details around the same lines; and add your annotations in one block. So, yes, it should work if you add your annotation at the end, and that seems like good practice to me. It could be confusing depending on how you have organized the rest of your code, but if it's all well organized and commented, it should be fine! [=
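To make the placement question concrete, here is a small sketch using made-up data (toy, note, and the layer_geoms() helper are hypothetical names introduced for illustration). It builds the same plot twice, once with annotate() in the middle of the chain and once at the very end, and then compares the order of the plot layers.

```r
library(ggplot2)

# Hypothetical toy data
toy <- data.frame(x = c(1, 2), y = c(0.3, 0.6))

# Store the annotation layer so the same one can be added in two places
note <- annotate(geom = "text", x = 1.5, y = 0.5, label = "A note")

# annotate() placed in the middle of the chain, before the theme
note_in_middle <-
  ggplot(toy, aes(x = x, y = y)) +
  geom_line() +
  note +
  theme_minimal()

# annotate() placed at the very end, after the theme
note_at_end <-
  ggplot(toy, aes(x = x, y = y)) +
  geom_line() +
  theme_minimal() +
  note

# Helper: list the geom class of each layer, in drawing order
layer_geoms <- function(plot) {
  vapply(plot$layers, function(layer) class(layer$geom)[1], character(1))
}

layer_geoms(note_in_middle)
layer_geoms(note_at_end)
```

In this sketch both plots end up with the same layers in the same order, because themes and scales are not layers. What can change the result is the position of annotate() relative to other geoms: layers are drawn in the order they are added, so an annotation added before geom_line() would be drawn underneath the lines.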