Skip to content
Coming soon: Ally. Your guide to the world of AI and R. Learn More →
R for the Rest of Us Logo

Going Deeper with R

Data Merging

Transcript

Click on the transcript to go to that point in the video. Please note that transcripts are auto generated and may contain minor inaccuracies.

View code shown in video
# Load Packages -----------------------------------------------------------

library(tidyverse)
library(fs)
library(readxl)
library(janitor)

# Create Directories ------------------------------------------------------

dir_create("data-raw")

# Download Data -----------------------------------------------------------

# https://www.oregon.gov/ode/educator-resources/assessment/Pages/Assessment-Group-Reports.aspx

# download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/pagr_schools_math_tot_raceethnicity_2122.xlsx",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/TestResults2019/pagr_schools_math_tot_raceethnicity_1819.xlsx",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_tot_raceethnicity_1819.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2018/pagr_schools_math_raceethnicity_1718.xlsx",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_raceethnicity_1718.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2017/pagr_schools_math_raceethnicity_1617.xlsx",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_raceethnicity_1617.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2016/pagr_schools_math_raceethnicity_1516.xlsx",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_raceethnicity_1516.xlsx")
#
# download.file(
#   "https://github.com/rfortherestofus/going-deeper-v2/raw/main/data-raw/oregon-districts-and-schools.xlsx",
#   mode = "wb",
#   destfile = "data-raw/oregon-districts-and-schools.xlsx"
# )

# Import, Tidy, and Clean Data -------------------------------------------

clean_math_proficiency_data <- function(raw_data) {
  read_excel(
    path = raw_data
  ) |>
    clean_names() |>
    filter(student_group == "Total Population (All Students)") |>
    filter(grade_level == "Grade 3") |>
    select(academic_year, school_id, contains("number_level")) |>
    pivot_longer(
      cols = starts_with("number_level"),
      names_to = "proficiency_level",
      values_to = "number_of_students"
    ) |>
    mutate(proficiency_level = parse_number(proficiency_level)) |>
    mutate(number_of_students = parse_number(number_of_students)) |>
    mutate(
      pct = number_of_students / sum(number_of_students, na.rm = TRUE),
      .by = school_id
    )
}


third_grade_math_proficiency_2021_2022 <-
  clean_math_proficiency_data(
    raw_data = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx"
  )

third_grade_math_proficiency_2018_2019 <-
  clean_math_proficiency_data(
    raw_data = "data-raw/pagr_schools_math_tot_raceethnicity_1819.xlsx"
  )

third_grade_math_proficiency <-
  bind_rows(
    third_grade_math_proficiency_2021_2022,
    third_grade_math_proficiency_2018_2019
  )

third_grade_math_proficiency

oregon_districts_and_schools <-
  read_excel("data-raw/oregon-districts-and-schools.xlsx") |>
  clean_names() |>
  rename(school_id = attending_school_institutional_id) |>
  glimpse()

left_join(
  third_grade_math_proficiency,
  oregon_districts_and_schools
) |>
  glimpse()

Your Turn

  1. Download the oregon-districts.xlsx file into the data-raw folder. You can download it from this URL: https://github.com/rfortherestofus/going-deeper-v2/raw/main/data-raw/oregon-districts.xlsx

  2. Import a new data frame called oregon_districts from oregon-districts.xlsx.

  3. Merge the oregon_districts data frame into the enrollment_by_race_ethnicity data frame so you can see the names of the districts.

Have any questions? Put them below and we will help you out!

You need to be signed-in to comment on this post. Login.

Eda Akpek

Eda Akpek • April 28, 2026

Which enrollment_by_race_ethnicity dataframe are we using? I have the ones from 2021-2022 and 2022-2023

Gracielle Higino

Gracielle Higino Coach • May 1, 2026

Hi Eda! I don't think it matters in this case, but in the last exercise both 2021-2022 and 2022-2023 were combined together. So maybe create a enrollment_by_race_ethnicity object with these combined datasets, and you should be able to complete the exercise!

Felipe Coelho

Felipe Coelho • May 6, 2026

I reckon that the video with the solution here is the same as the one from the previous class.

Gracielle Higino

Gracielle Higino Coach • May 7, 2026

Thanks for noticing, Felipe! We're looking into it.

Course Content

44 Lessons