Skip to content
R in 3 Months Starts March 13. Learn More →
R for the Rest of Us Logo

Going Deeper with R

Data Merging

Transcript

Click on the transcript to go to that point in the video. Please note that transcripts are auto generated and may contain minor inaccuracies.

View code shown in video
# Load Packages -----------------------------------------------------------

library(tidyverse)
library(fs)
library(readxl)
library(janitor)

# Create Directories ------------------------------------------------------

dir_create("data-raw")

# Download Data -----------------------------------------------------------

# https://www.oregon.gov/ode/educator-resources/assessment/Pages/Assessment-Group-Reports.aspx

# download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/pagr_schools_math_tot_raceethnicity_2122.xlsx",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx")
# 
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/TestResults2019/pagr_schools_math_tot_raceethnicity_1819.xlsx",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_tot_raceethnicity_1819.xlsx")
# 
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2018/pagr_schools_math_raceethnicity_1718.xlsx",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_raceethnicity_1718.xlsx")
# 
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2017/pagr_schools_math_raceethnicity_1617.xlsx",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_raceethnicity_1617.xlsx")
# 
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2016/pagr_schools_math_raceethnicity_1516.xlsx",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_raceethnicity_1516.xlsx")
#
# download.file("https://github.com/rfortherestofus/going-deeper-v2/raw/main/data-raw/oregon-districts-and-schools.xlsx",
#               mode = "wb",
#               destfile = "data-raw/oregon-districts-and-schools.xlsx")

# Import, Tidy, and Clean Data --------------------------------------------

clean_math_proficiency_data <- function(raw_data) {
  
  read_excel(path = raw_data) |> 
    clean_names() |> 
    filter(student_group == "Total Population (All Students)") |> 
    filter(grade_level == "Grade 3") |> 
    select(academic_year, school_id, contains("number_level")) |> 
    pivot_longer(cols = starts_with("number_level"),
                 names_to = "proficiency_level",
                 values_to = "number_of_students") |> 
    mutate(proficiency_level = case_when(
      proficiency_level == "number_level_4" ~ "4",
      proficiency_level == "number_level_3" ~ "3",
      proficiency_level == "number_level_2" ~ "2",
      proficiency_level == "number_level_1" ~ "1"
    )) |> 
    mutate(number_of_students = parse_number(number_of_students)) |> 
    group_by(school_id) |> 
    mutate(pct = number_of_students / sum(number_of_students, na.rm = TRUE)) |> 
    ungroup()
  
}


third_grade_math_proficiency_2021_2022 <-
  clean_math_proficiency_data(raw_data = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx")

third_grade_math_proficiency_2018_2019 <-
  clean_math_proficiency_data(raw_data = "data-raw/pagr_schools_math_tot_raceethnicity_1819.xlsx")


third_grade_math_proficiency <- 
  bind_rows(third_grade_math_proficiency_2018_2019,
            third_grade_math_proficiency_2021_2022)

oregon_districts_and_schools <- 
  read_excel("data-raw/oregon-districts-and-schools.xlsx") |> 
  clean_names() |> 
  rename(school_id = attending_school_institutional_id)

left_join(third_grade_math_proficiency,
          oregon_districts_and_schools,
          join_by(school_id)) |> 
  view()

Your Turn

  1. Download the oregon-districts.xlsx file into the data-raw folder. You can download it from this URL: https://github.com/rfortherestofus/going-deeper-v2/raw/main/data-raw/oregon-districts.xlsx

  2. Import a new data frame called oregon_districts from oregon-districts.xlsx.

  3. Merge the oregon_districts data frame into the enrollment_by_race_ethnicity data frame so you can see the names of the districts.

Have any questions? Put them below and we will help you out!

You need to be signed-in to comment on this post. Login.

Marisel Strand

Marisel Strand • March 18, 2024

Hello, I was able to run this code, but now I get this message:

left_join(enrollment_by_race_ethnicity,
          oregon_districts,
          join_by(district_institution_id)) |>
  view()

Error in UseMethod("left_join") : 
  no applicable method for 'left_join' applied to an object of class "NULL"

--Thanks!

David Keyes

David Keyes Founder • March 18, 2024

Are you sure you have created both objects prior to running the left_join()? If so, can you please share your full code so I can review it?

Marisel Strand

Marisel Strand • March 19, 2024

I was able to run the join_left() function. Then when I was working on the next section: "Exporting data", it didn't work. So, I tried running everything again, and that's when my code to merge them stopped working.

David Keyes

David Keyes Founder • March 19, 2024

Can you please post your full code so I can review that?

Matt Newman

Matt Newman • April 29, 2024

I want to make sure I'm understanding "left" and "right" join: Do they just refer to the first and second dataframes listed?

For example, would

left_join(french_speakers_2019,
german_speakers_2019,
join_by(state))

Produce the same result as right_join(german_speakers_2019, french_speakers_2019, join_by(state))

?
David Keyes

David Keyes Founder • April 29, 2024

Yes, that's right!

Matt Newman

Matt Newman • April 30, 2024

Thank you!

Tosan Okome

Tosan Okome • December 5, 2024

I tried practicing with variables of different names and ran as shown in the lesson video:

join_by (id_variable_1 = id_variable_2) 

But I got an error.

Just a reminder to use "==" instead. So to work with variables of different names, it's:

join_by (id_variable_1 == id_variable_2) 
David Keyes

David Keyes Founder • December 9, 2024

Thanks, Tosan! Fix coming soon.

Course Content

44 Lessons