Data Merging
This lesson is called Data Merging, part of the Going Deeper with R course.
View code shown in video
# Load Packages -----------------------------------------------------------
library(tidyverse)
library(fs)
library(readxl)
library(janitor)
# Create Directories ------------------------------------------------------
dir_create("data-raw")
# Download Data -----------------------------------------------------------
# https://www.oregon.gov/ode/educator-resources/assessment/Pages/Assessment-Group-Reports.aspx
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/pagr_schools_math_tot_raceethnicity_2122.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/TestResults2019/pagr_schools_math_tot_raceethnicity_1819.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_tot_raceethnicity_1819.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2018/pagr_schools_math_raceethnicity_1718.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_raceethnicity_1718.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2017/pagr_schools_math_raceethnicity_1617.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_raceethnicity_1617.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2016/pagr_schools_math_raceethnicity_1516.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_raceethnicity_1516.xlsx")
#
# download.file(
# "https://github.com/rfortherestofus/going-deeper-v2/raw/main/data-raw/oregon-districts-and-schools.xlsx",
# mode = "wb",
# destfile = "data-raw/oregon-districts-and-schools.xlsx"
# )
# Import, Tidy, and Clean Data -------------------------------------------
clean_math_proficiency_data <- function(raw_data) {
  read_excel(
    path = raw_data
  ) |>
    clean_names() |>
    filter(student_group == "Total Population (All Students)") |>
    filter(grade_level == "Grade 3") |>
    select(academic_year, school_id, contains("number_level")) |>
    pivot_longer(
      cols = starts_with("number_level"),
      names_to = "proficiency_level",
      values_to = "number_of_students"
    ) |>
    mutate(proficiency_level = parse_number(proficiency_level)) |>
    mutate(number_of_students = parse_number(number_of_students)) |>
    mutate(
      pct = number_of_students / sum(number_of_students, na.rm = TRUE),
      .by = school_id
    )
}
third_grade_math_proficiency_2021_2022 <-
  clean_math_proficiency_data(
    raw_data = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx"
  )
third_grade_math_proficiency_2018_2019 <-
  clean_math_proficiency_data(
    raw_data = "data-raw/pagr_schools_math_tot_raceethnicity_1819.xlsx"
  )
third_grade_math_proficiency <-
  bind_rows(
    third_grade_math_proficiency_2021_2022,
    third_grade_math_proficiency_2018_2019
  )
third_grade_math_proficiency
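# Rename the school identifier so it matches the school_id column in the proficiency data
oregon_districts_and_schools <-
  read_excel("data-raw/oregon-districts-and-schools.xlsx") |>
  clean_names() |>
  rename(school_id = attending_school_institutional_id) |>
  glimpse()
# No `by` argument is given, so left_join() matches on every column name
# the two data frames share and prints a message naming the columns it used
left_join(
  third_grade_math_proficiency,
  oregon_districts_and_schools
) |>
  glimpse()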
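To see what the join above does to rows whose key has no match, here is a minimal sketch on toy data. The data frames `proficiency` and `schools` and all of their values are made up for illustration; only `school_id` mirrors a column name from the lesson. It assumes dplyr 1.1.0 or later (for `join_by()`), which the lesson's use of `.by` already requires.

```r
library(dplyr)

# Toy stand-ins for the lesson's two data frames (all values are invented)
proficiency <- tibble(
  school_id = c(1, 1, 2, 3),
  proficiency_level = c(1, 2, 1, 1),
  number_of_students = c(10, 5, 8, 4)
)

schools <- tibble(
  school_id = c(1, 2),
  school = c("School A", "School B")
)

# Naming the key explicitly means the join does not depend on
# which columns happen to share a name
joined <- left_join(proficiency, schools, by = join_by(school_id))

# left_join() keeps every row of the left data frame;
# school_id 3 has no match, so its school value is NA
joined

# anti_join() returns the left-hand rows that found no match,
# which is a quick way to check for missing keys after a merge
unmatched <- anti_join(proficiency, schools, by = join_by(school_id))
unmatched
```

Running an `anti_join()` like this before relying on the merged data is one way to confirm that every school in the proficiency data appears in the lookup table.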
Your Turn
1. Download the oregon-districts.xlsx file into the data-raw folder. You can download it from this URL: https://github.com/rfortherestofus/going-deeper-v2/raw/main/data-raw/oregon-districts.xlsx
2. Import a new data frame called oregon_districts from oregon-districts.xlsx.
3. Merge the oregon_districts data frame into the enrollment_by_race_ethnicity data frame so you can see the names of the districts.
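One possible shape for this exercise, sketched on toy data rather than the real files. The column names `district_id`, `district_institution_id`, `district`, and `race_ethnicity`, and every value below, are hypothetical; check the actual column names with `glimpse()` after importing. The download and import steps are shown commented out, in the same style as the lesson code, because they need network access.

```r
library(dplyr)

# Steps 1 and 2, following the pattern used in the lesson
# (left commented out here; they require the file and the readxl/janitor packages)
# download.file(
#   "https://github.com/rfortherestofus/going-deeper-v2/raw/main/data-raw/oregon-districts.xlsx",
#   mode = "wb",
#   destfile = "data-raw/oregon-districts.xlsx"
# )
# oregon_districts <- read_excel("data-raw/oregon-districts.xlsx") |>
#   clean_names()

# Toy stand-in for the enrollment data (hypothetical columns and values)
enrollment_by_race_ethnicity <- tibble(
  district_id = c(10, 10, 20),
  race_ethnicity = c("Group A", "Group B", "Group A"),
  number_of_students = c(100, 50, 80)
)

# Toy stand-in for the districts lookup; the key has a different
# (hypothetical) name, so rename it to match before joining
oregon_districts <- tibble(
  district_institution_id = c(10, 20),
  district = c("District A", "District B")
) |>
  rename(district_id = district_institution_id)

# Step 3: merge the district names into the enrollment data
enrollment_with_districts <- left_join(
  enrollment_by_race_ethnicity,
  oregon_districts,
  by = join_by(district_id)
)

enrollment_with_districts
```

If the key columns already share a name in the real files, the `rename()` step is unnecessary.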
Have any questions? Put them below and we will help you out!
Eda Akpek • April 28, 2026
Which enrollment_by_race_ethnicity dataframe are we using? I have the ones from 2021-2022 and 2022-2023
Gracielle Higino Coach • May 1, 2026
Hi Eda! I don't think it matters in this case, but in the last exercise both 2021-2022 and 2022-2023 were combined. So maybe create an enrollment_by_race_ethnicity object with these combined datasets, and you should be able to complete the exercise!
Felipe Coelho • May 6, 2026
I reckon that the video with the solution here is the same as the one from the previous class.
Gracielle Higino Coach • May 7, 2026
Thanks for noticing, Felipe! We're looking into it.