Data Merging
This lesson is called Data Merging, part of the R in 3 Months (Fall 2025) course. This lesson is called Data Merging, part of the R in 3 Months (Fall 2025) course.
Transcript
Click on the transcript to go to that point in the video. Please note that transcripts are auto generated and may contain minor inaccuracies.
Loading transcript...
View code shown in video
# Load Packages -----------------------------------------------------------
library(tidyverse)
library(fs)
library(readxl)
library(janitor)
# Create Directories ------------------------------------------------------
dir_create("data-raw")
# Download Data -----------------------------------------------------------
# https://www.oregon.gov/ode/educator-resources/assessment/Pages/Assessment-Group-Reports.aspx
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/pagr_schools_math_tot_raceethnicity_2122.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/TestResults2019/pagr_schools_math_tot_raceethnicity_1819.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_tot_raceethnicity_1819.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2018/pagr_schools_math_raceethnicity_1718.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_raceethnicity_1718.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2017/pagr_schools_math_raceethnicity_1617.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_raceethnicity_1617.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2016/pagr_schools_math_raceethnicity_1516.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_raceethnicity_1516.xlsx")
#
# download.file("https://github.com/rfortherestofus/going-deeper-v2/raw/main/data-raw/oregon-districts-and-schools.xlsx",
# mode = "wb",
# destfile = "data-raw/oregon-districts-and-schools.xlsx")
# Import, Tidy, and Clean Data --------------------------------------------
clean_math_proficiency_data <- function(raw_data) {
read_excel(path = raw_data) |>
clean_names() |>
filter(student_group == "Total Population (All Students)") |>
filter(grade_level == "Grade 3") |>
select(academic_year, school_id, contains("number_level")) |>
pivot_longer(cols = starts_with("number_level"),
names_to = "proficiency_level",
values_to = "number_of_students") |>
mutate(proficiency_level = case_when(
proficiency_level == "number_level_4" ~ "4",
proficiency_level == "number_level_3" ~ "3",
proficiency_level == "number_level_2" ~ "2",
proficiency_level == "number_level_1" ~ "1"
)) |>
mutate(number_of_students = parse_number(number_of_students)) |>
group_by(school_id) |>
mutate(pct = number_of_students / sum(number_of_students, na.rm = TRUE)) |>
ungroup()
}
third_grade_math_proficiency_2021_2022 <-
clean_math_proficiency_data(raw_data = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx")
third_grade_math_proficiency_2018_2019 <-
clean_math_proficiency_data(raw_data = "data-raw/pagr_schools_math_tot_raceethnicity_1819.xlsx")
third_grade_math_proficiency <-
bind_rows(third_grade_math_proficiency_2018_2019,
third_grade_math_proficiency_2021_2022)
oregon_districts_and_schools <-
read_excel("data-raw/oregon-districts-and-schools.xlsx") |>
clean_names() |>
rename(school_id = attending_school_institutional_id)
left_join(third_grade_math_proficiency,
oregon_districts_and_schools,
join_by(school_id)) |>
view()
Your Turn
Download the
oregon-districts.xlsxfile into thedata-rawfolder. You can download it from this URL:https://github.com/rfortherestofus/going-deeper-v2/raw/main/data-raw/oregon-districts.xlsxImport a new data frame called
oregon_districtsfromoregon-districts.xlsx.Merge the
oregon_districtsdata frame into theenrollment_by_race_ethnicitydata frame so you can see the names of the districts.
Have any questions? Put them below and we will help you out!
Course Content
128 Lessons
You need to be signed-in to comment on this post. Login.
Marisel Strand • March 18, 2024
Hello, I was able to run this code, but now I get this message:
--Thanks!
David Keyes Founder • March 18, 2024
Are you sure you have created both objects prior to running the
left_join()? If so, can you please share your full code so I can review it?Marisel Strand • March 19, 2024
I was able to run the join_left() function. Then when I was working on the next section: "Exporting data", it didn't work. So, I tried running everything again, and that's when my code to merge them stopped working.
David Keyes Founder • March 19, 2024
Can you please post your full code so I can review that?
Matt Newman • April 29, 2024
I want to make sure I'm understanding "left" and "right" join: Do they just refer to the first and second dataframes listed?
For example, would
Produce the same result as right_join(german_speakers_2019, french_speakers_2019, join_by(state))
David Keyes Founder • April 29, 2024
Yes, that's right!
Matt Newman • April 30, 2024
Thank you!
Tosan Okome • December 5, 2024
I tried practicing with variables of different names and ran as shown in the lesson video:
But I got an error.
Just a reminder to use "==" instead. So to work with variables of different names, it's:
David Keyes Founder • December 9, 2024
Thanks, Tosan! Fix coming soon.