Data Merging
This lesson is called Data Merging, part of the Going Deeper with R course. This lesson is called Data Merging, part of the Going Deeper with R course.
Transcript
Click on the transcript to go to that point in the video. Please note that transcripts are auto generated and may contain minor inaccuracies.
View code shown in video
# Load Packages -----------------------------------------------------------
library(tidyverse)
library(fs)
library(readxl)
library(janitor)
# Create Directories ------------------------------------------------------
dir_create("data-raw")
# Download Data -----------------------------------------------------------
# https://www.oregon.gov/ode/educator-resources/assessment/Pages/Assessment-Group-Reports.aspx
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/pagr_schools_math_tot_raceethnicity_2122.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/TestResults2019/pagr_schools_math_tot_raceethnicity_1819.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_tot_raceethnicity_1819.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2018/pagr_schools_math_raceethnicity_1718.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_raceethnicity_1718.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2017/pagr_schools_math_raceethnicity_1617.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_raceethnicity_1617.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2016/pagr_schools_math_raceethnicity_1516.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_raceethnicity_1516.xlsx")
#
# download.file("https://github.com/rfortherestofus/going-deeper-v2/raw/main/data-raw/oregon-districts-and-schools.xlsx",
# mode = "wb",
# destfile = "data-raw/oregon-districts-and-schools.xlsx")
# Import, Tidy, and Clean Data --------------------------------------------
clean_math_proficiency_data <- function(raw_data) {
read_excel(path = raw_data) |>
clean_names() |>
filter(student_group == "Total Population (All Students)") |>
filter(grade_level == "Grade 3") |>
select(academic_year, school_id, contains("number_level")) |>
pivot_longer(cols = starts_with("number_level"),
names_to = "proficiency_level",
values_to = "number_of_students") |>
mutate(proficiency_level = case_when(
proficiency_level == "number_level_4" ~ "4",
proficiency_level == "number_level_3" ~ "3",
proficiency_level == "number_level_2" ~ "2",
proficiency_level == "number_level_1" ~ "1"
)) |>
mutate(number_of_students = parse_number(number_of_students)) |>
group_by(school_id) |>
mutate(pct = number_of_students / sum(number_of_students, na.rm = TRUE)) |>
ungroup()
}
third_grade_math_proficiency_2021_2022 <-
clean_math_proficiency_data(raw_data = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx")
third_grade_math_proficiency_2018_2019 <-
clean_math_proficiency_data(raw_data = "data-raw/pagr_schools_math_tot_raceethnicity_1819.xlsx")
third_grade_math_proficiency <-
bind_rows(third_grade_math_proficiency_2018_2019,
third_grade_math_proficiency_2021_2022)
oregon_districts_and_schools <-
read_excel("data-raw/oregon-districts-and-schools.xlsx") |>
clean_names() |>
rename(school_id = attending_school_institutional_id)
left_join(third_grade_math_proficiency,
oregon_districts_and_schools,
join_by(school_id)) |>
view()
Your Turn
Download the
oregon-districts.xlsx
file into thedata-raw
folder. You can download it from this URL:https://github.com/rfortherestofus/going-deeper-v2/raw/main/data-raw/oregon-districts.xlsx
Import a new data frame called
oregon_districts
fromoregon-districts.xlsx
.Merge the
oregon_districts
data frame into theenrollment_by_race_ethnicity
data frame so you can see the names of the districts.
Have any questions? Put them below and we will help you out!
Course Content
44 Lessons
1
Downloading and Importing Data
10:32
2
Overview of Tidy Data
05:50
3
Tidy Data Rule #1: Every Column is a Variable
07:43
4
Tidy Data Rule #3: Every Cell is a Single Value
10:04
5
Tidy Data Rule #2: Every Row is an Observation
04:42
6
Changing Variable Types
04:51
7
Dealing with Missing Data
04:55
8
Advanced Summarizing
06:25
9
Binding Data Frames
07:17
10
Functions
15:06
11
Data Merging
09:27
12
Exporting Data
04:38
13
Bring It All Together (Advanced Data Wrangling)
13:03
1
Best Practices in Data Visualization
03:44
2
Tidy Data
02:25
3
Pipe Data into ggplot
09:54
4
Reorder Plots to Highlight Findings
03:37
5
Line Charts
04:17
6
Use Color to Highlight Findings
09:16
7
Declutter
08:29
8
Add Descriptive Labels to Your Plots
09:10
9
Use Titles to Highlight Findings
08:14
10
Use Annotations to Explain
07:09
11
Tweak Spacing
05:11
12
Create a Custom Theme
03:47
13
Customize Your Fonts
08:32
14
Try New Plot Types
03:24
15
Bring it All Together (Advanced Data Visualization)
14:30
1
Advanced Markdown
06:43
2
Advanced YAML and Code Chunk Options
05:53
3
Tables
18:36
4
Inline R Code
04:42
5
Making Your Reports Shine: Word Edition
04:30
6
Making Your Reports Shine: PDF Edition
06:11
7
Making Your Reports Shine: HTML Edition
06:06
8
Presentations
10:12
9
Dashboards
05:38
10
Websites
06:43
11
Publishing Your Work
04:38
12
Quarto Extensions
05:50
13
Parameterized Reporting, Part 1
10:57
14
Parameterized Reporting, Part 2
05:11
15
Parameterized Reporting, Part 3
07:47
16
Wrapping up Going Deeper with R
You need to be signed-in to comment on this post. Login.
Marisel Strand • March 18, 2024
Hello, I was able to run this code, but now I get this message:
--Thanks!
David Keyes Founder • March 18, 2024
Are you sure you have created both objects prior to running the
left_join()
? If so, can you please share your full code so I can review it?Marisel Strand • March 19, 2024
I was able to run the join_left() function. Then when I was working on the next section: "Exporting data", it didn't work. So, I tried running everything again, and that's when my code to merge them stopped working.
David Keyes Founder • March 19, 2024
Can you please post your full code so I can review that?
Matt Newman • April 29, 2024
I want to make sure I'm understanding "left" and "right" join: Do they just refer to the first and second dataframes listed?
For example, would
Produce the same result as right_join(german_speakers_2019, french_speakers_2019, join_by(state))
David Keyes Founder • April 29, 2024
Yes, that's right!
Matt Newman • April 30, 2024
Thank you!
Tosan Okome • December 5, 2024
I tried practicing with variables of different names and ran as shown in the lesson video:
But I got an error.
Just a reminder to use "==" instead. So to work with variables of different names, it's:
David Keyes Founder • December 9, 2024
Thanks, Tosan! Fix coming soon.