Data Merging
This lesson is called Data Merging, part of the R in 3 Months (Spring 2025) course. This lesson is called Data Merging, part of the R in 3 Months (Spring 2025) course.
Transcript
Click on the transcript to go to that point in the video. Please note that transcripts are auto generated and may contain minor inaccuracies.
View code shown in video
# Load Packages -----------------------------------------------------------
library(tidyverse)
library(fs)
library(readxl)
library(janitor)
# Create Directories ------------------------------------------------------
dir_create("data-raw")
# Download Data -----------------------------------------------------------
# https://www.oregon.gov/ode/educator-resources/assessment/Pages/Assessment-Group-Reports.aspx
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/pagr_schools_math_tot_raceethnicity_2122.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/TestResults2019/pagr_schools_math_tot_raceethnicity_1819.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_tot_raceethnicity_1819.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2018/pagr_schools_math_raceethnicity_1718.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_raceethnicity_1718.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2017/pagr_schools_math_raceethnicity_1617.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_raceethnicity_1617.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2016/pagr_schools_math_raceethnicity_1516.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_raceethnicity_1516.xlsx")
#
# download.file("https://github.com/rfortherestofus/going-deeper-v2/raw/main/data-raw/oregon-districts-and-schools.xlsx",
# mode = "wb",
# destfile = "data-raw/oregon-districts-and-schools.xlsx")
# Import, Tidy, and Clean Data --------------------------------------------
clean_math_proficiency_data <- function(raw_data) {
read_excel(path = raw_data) |>
clean_names() |>
filter(student_group == "Total Population (All Students)") |>
filter(grade_level == "Grade 3") |>
select(academic_year, school_id, contains("number_level")) |>
pivot_longer(cols = starts_with("number_level"),
names_to = "proficiency_level",
values_to = "number_of_students") |>
mutate(proficiency_level = case_when(
proficiency_level == "number_level_4" ~ "4",
proficiency_level == "number_level_3" ~ "3",
proficiency_level == "number_level_2" ~ "2",
proficiency_level == "number_level_1" ~ "1"
)) |>
mutate(number_of_students = parse_number(number_of_students)) |>
group_by(school_id) |>
mutate(pct = number_of_students / sum(number_of_students, na.rm = TRUE)) |>
ungroup()
}
third_grade_math_proficiency_2021_2022 <-
clean_math_proficiency_data(raw_data = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx")
third_grade_math_proficiency_2018_2019 <-
clean_math_proficiency_data(raw_data = "data-raw/pagr_schools_math_tot_raceethnicity_1819.xlsx")
third_grade_math_proficiency <-
bind_rows(third_grade_math_proficiency_2018_2019,
third_grade_math_proficiency_2021_2022)
oregon_districts_and_schools <-
read_excel("data-raw/oregon-districts-and-schools.xlsx") |>
clean_names() |>
rename(school_id = attending_school_institutional_id)
left_join(third_grade_math_proficiency,
oregon_districts_and_schools,
join_by(school_id)) |>
view()
Your Turn
Download the
oregon-districts.xlsx
file into thedata-raw
folder. You can download it from this URL:https://github.com/rfortherestofus/going-deeper-v2/raw/main/data-raw/oregon-districts.xlsx
Import a new data frame called
oregon_districts
fromoregon-districts.xlsx
.Merge the
oregon_districts
data frame into theenrollment_by_race_ethnicity
data frame so you can see the names of the districts.
Have any questions? Put them below and we will help you out!
Course Content
127 Lessons
1
Welcome to Getting Started with R
00:57
2
Install R
02:05
3
Install RStudio
02:14
4
Files in R
04:33
5
Projects
07:54
6
Packages
02:38
7
Import Data
05:24
8
Objects and Functions
03:16
9
Examine our Data
12:50
10
Import Our Data Again
07:11
11
Getting Help
07:46
12
Week 1 Live Session (Spring 2025)
1:03:11
1
Welcome to Fundamentals of R
01:36
2
Update Everything
02:45
3
Start a New Project
02:16
4
The Tidyverse
03:34
5
Pipes
04:15
6
select()
07:25
7
mutate()
04:25
8
filter()
10:05
9
summarize()
05:59
10
group_by() and summarize()
05:54
11
arrange()
02:07
12
Create a New Data Frame
03:58
13
Bring it All Together (Data Wrangling)
07:29
14
Week 2 Project Assignment
09:39
15
Week 2 Coworking Session (Spring 2025)
16
Week 2 Live Session (Spring 2025)
1:03:24
1
The Grammar of Graphics
04:39
2
Scatterplots
03:46
3
Histograms
05:47
4
Bar Charts
06:37
5
Setting color and fill Aesthetic Properties
02:39
6
Setting color and fill Scales
05:40
7
Setting x and y Scales
03:09
8
Adding Text to Plots
07:32
9
Plot Labels
03:57
10
Themes
02:19
11
Facets
03:12
12
Save Plots
02:57
13
Bring it All Together (Data Visualization)
06:42
14
Week 3 Project Assignment
03:30
15
Week 3 Coworking Session (Spring 2025)
16
Week 3 Live Session (Spring 2025)
1:02:31
1
Downloading and Importing Data
10:32
2
Overview of Tidy Data
05:50
3
Tidy Data Rule #1: Every Column is a Variable
07:43
4
Tidy Data Rule #3: Every Cell is a Single Value
10:04
5
Tidy Data Rule #2: Every Row is an Observation
04:42
6
Week 6 Coworking Session (Spring 2025)
7
Week 6 Live Session (Spring 2025)
1:02:38
1
Best Practices in Data Visualization
03:44
2
Tidy Data
02:25
3
Pipe Data into ggplot
09:54
4
Reorder Plots to Highlight Findings
03:37
5
Line Charts
04:17
6
Use Color to Highlight Findings
09:16
7
Declutter
08:29
8
Add Descriptive Labels to Your Plots
09:10
9
Use Titles to Highlight Findings
08:14
10
Use Annotations to Explain
07:09
11
Week 9 Coworking Session (Spring 2025)
12
Week 9 Live Session (Spring 2025)
59:09
1
Advanced Markdown
06:43
2
Tables
18:36
3
Advanced YAML and Code Chunk Options
05:53
4
Inline R Code
04:42
5
Making Your Reports Shine: Word Edition
04:30
6
Making Your Reports Shine: PDF Edition
06:11
7
Making Your Reports Shine: HTML Edition
06:06
8
Presentations
10:12
9
Dashboards
05:38
10
Websites
06:43
11
Publishing Your Work
04:38
12
Quarto Extensions
05:50
13
Parameterized Reporting, Part 1
10:57
14
Parameterized Reporting, Part 2
05:11
15
Parameterized Reporting, Part 3
07:47
16
Week 12 Coworking Session (Spring 2025)
17
Week 12 Live Session (Spring 2025)
57:01
You need to be signed-in to comment on this post. Login.
Marisel Strand • March 18, 2024
Hello, I was able to run this code, but now I get this message:
--Thanks!
David Keyes Founder • March 18, 2024
Are you sure you have created both objects prior to running the
left_join()
? If so, can you please share your full code so I can review it?Marisel Strand • March 19, 2024
I was able to run the join_left() function. Then when I was working on the next section: "Exporting data", it didn't work. So, I tried running everything again, and that's when my code to merge them stopped working.
David Keyes Founder • March 19, 2024
Can you please post your full code so I can review that?
Matt Newman • April 29, 2024
I want to make sure I'm understanding "left" and "right" join: Do they just refer to the first and second dataframes listed?
For example, would
Produce the same result as right_join(german_speakers_2019, french_speakers_2019, join_by(state))
David Keyes Founder • April 29, 2024
Yes, that's right!
Matt Newman • April 30, 2024
Thank you!
Tosan Okome • December 5, 2024
I tried practicing with variables of different names and ran as shown in the lesson video:
But I got an error.
Just a reminder to use "==" instead. So to work with variables of different names, it's:
David Keyes Founder • December 9, 2024
Thanks, Tosan! Fix coming soon.