Going Deeper with R

Use Color to Highlight Findings

View code shown in video
# Load Packages -----------------------------------------------------------

library(tidyverse)
library(fs)

# Create Directory --------------------------------------------------------

dir_create("data")

# Download Data -----------------------------------------------------------

# download.file("https://github.com/rfortherestofus/going-deeper-v2/raw/main/data/third_grade_math_proficiency.rds",
#               mode = "wb",
#               destfile = "data/third_grade_math_proficiency.rds")

# Import Data -------------------------------------------------------------

third_grade_math_proficiency <- 
  read_rds("data/third_grade_math_proficiency.rds") |> 
  select(academic_year, school, school_id, district, proficiency_level, number_of_students) |> 
  mutate(is_proficient = case_when(
    proficiency_level >= 3 ~ TRUE,
    .default = FALSE
  )) |> 
  group_by(academic_year, school, district, school_id, is_proficient) |> 
  summarize(number_of_students = sum(number_of_students, na.rm = TRUE)) |> 
  ungroup() |> 
  group_by(academic_year, school, district, school_id) |> 
  mutate(percent_proficient = number_of_students / sum(number_of_students, na.rm = TRUE)) |> 
  ungroup() |> 
  filter(is_proficient == TRUE) |> 
  select(academic_year, school, district, percent_proficient) |> 
  rename(year = academic_year) |> 
  mutate(percent_proficient = case_when(
    is.nan(percent_proficient) ~ NA,
    .default = percent_proficient
  ))

# Plot --------------------------------------------------------------------

top_growth_school <- 
  third_grade_math_proficiency |>
  filter(district == "Portland SD 1J") |> 
  group_by(school) |> 
  mutate(growth_from_previous_year = percent_proficient - lag(percent_proficient)) |> 
  ungroup() |> 
  drop_na(growth_from_previous_year) |>
  slice_max(order_by = growth_from_previous_year,
            n = 1) |> 
  pull(school)

third_grade_math_proficiency |>
  filter(district == "Portland SD 1J") |>
  mutate(highlight_school = case_when(
    school == top_growth_school ~ "Y",
    .default = "N"
  )) |> 
  mutate(school = fct_relevel(school, top_growth_school, after = Inf)) |>
  ggplot(aes(x = year,
             y = percent_proficient,
             group = school,
             color = highlight_school)) +
  geom_line() +
  scale_color_manual(values = c(
    "N" = "grey80",
    "Y" = "orange"
  ))
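
The plot code above relies on two separate objects: `top_growth_school`, a value pulled out of the data, and `highlight_school`, a column added to the data. A minimal sketch on made-up data may help show why both are needed. The `toy_scores` tibble and its school names are hypothetical and not part of the course data. Because `aes()` maps columns to aesthetics, the single pulled value has to be turned into a Y/N column before color can vary from row to row.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical toy data: three schools, two years each
toy_scores <- tibble(
  year = rep(c("2021-2022", "2022-2023"), times = 3),
  school = rep(c("School A", "School B", "School C"), each = 2),
  percent_proficient = c(0.40, 0.45, 0.30, 0.55, 0.60, 0.58)
)

# Step 1: pull out the NAME of the school with the largest growth.
# The result is a character vector, not a column in the data.
top_growth_school <-
  toy_scores |>
  group_by(school) |>
  mutate(growth = percent_proficient - lag(percent_proficient)) |>
  ungroup() |>
  drop_na(growth) |>
  slice_max(order_by = growth, n = 1) |>
  pull(school)

top_growth_school
# "School B"

# Step 2: turn that name into a column that exists on every row,
# so it can be mapped to the color aesthetic.
toy_highlighted <-
  toy_scores |>
  mutate(highlight_school = if_else(school %in% top_growth_school, "Y", "N"))

toy_plot <-
  ggplot(toy_highlighted,
         aes(x = year,
             y = percent_proficient,
             group = school,
             color = highlight_school)) +
  geom_line() +
  scale_color_manual(values = c("N" = "grey80", "Y" = "orange"))

toy_plot
```

This sketch uses `%in%` rather than `==` as a design choice: `slice_max()` keeps ties by default, so `top_growth_school` could in principle hold more than one name, and `%in%` handles that case without recycling surprises.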

Your Turn

Highlight the district in your line chart that had the largest increase in its Hispanic/Latino population between 2021-2022 and 2022-2023.

Learn More

Lisa Charlotte Muth has an excellent post on the Datawrapper blog about using color effectively in data viz.

The problem I ran into, where individual lines weren't visible because there were too many of them, is called overplotting. Claus Wilke discusses it in Chapter 18 of his book Fundamentals of Data Visualization (the chapter focuses on overlapping points, but the same concepts apply to lines).
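
Besides greying out the background lines, partial transparency is one common way to reduce overplotting in a line chart. Here is a minimal sketch with simulated data; the `sim_lines` data frame is made up for illustration and is not from the course.

```r
library(ggplot2)

set.seed(123)

# Simulated data: 100 hypothetical series observed at 5 time points
sim_lines <- data.frame(
  series = rep(seq_len(100), each = 5),
  time = rep(1:5, times = 100),
  value = runif(500)
)

# alpha makes each line mostly transparent, so regions where many
# lines overlap appear darker than regions with only a few lines
overplot_fix <-
  ggplot(sim_lines, aes(x = time, y = value, group = series)) +
  geom_line(alpha = 0.2)

overplot_fix
```

Transparency and highlighting can also be combined: keep the background lines faint and draw the one line you care about in a solid color.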

Have any questions? Put them below and we will help you out!

Matt Newman • May 6, 2024

Really interesting set of lessons here! But I kept making dumb mistakes in my code by confusing "top_growth_hispanic" with "highlight_district." Could you walk through why both steps were needed to make these customizations work? Why create the "highlight_district" variable versus just plotting with the "top_growth_hispanic" value? Thank you!

top_growth_hispanic <-
     enrollment_by_race_ethnicity |> 
     filter(race_ethnicity == "Hispanic/Latino") |> 
     group_by(district) |> 
     mutate(hispanic_growth = pct - lag(pct)) |> 
     ungroup() |> 
     drop_na(hispanic_growth) |> 
     slice_max(order_by = hispanic_growth,
               n=1) |> 
     pull(district)

enrollment_by_race_ethnicity |> 
     filter(race_ethnicity == "Hispanic/Latino") |> 
     mutate(highlight_district = case_when(
          district == top_growth_hispanic ~ "Y",
          .default = "N"
     )) |> 
