-
Advanced Data Wrangling
- Downloading and Importing Data
- Overview of Tidy Data
- Tidy Data Rule #1: Every Column is a Variable
- Tidy Data Rule #3: Every Cell is a Single Value
- Tidy Data Rule #2: Every Row is an Observation
- Changing Variable Types
- Dealing with Missing Data
- Advanced Summarizing
- Binding Data Frames
- Functions
- Data Merging
- Exporting Data
- Bring It All Together (Advanced Data Wrangling)
-
Advanced Data Visualization
- Best Practices in Data Visualization
- Tidy Data
- Pipe Data into ggplot
- Reorder Plots to Highlight Findings
- Line Charts
- Use Color to Highlight Findings
- Declutter
- Add Descriptive Labels to Your Plots
- Use Titles to Highlight Findings
- Use Annotations to Explain
- Tweak Spacing
- Create a Custom Theme
- Customize Your Fonts
- Try New Plot Types
- Bring it All Together (Advanced Data Visualization)
-
Quarto
- Advanced Markdown
- Advanced YAML and Code Chunk Options
- Tables
- Inline R Code
- Making Your Reports Shine: Word Edition
- Making Your Reports Shine: PDF Edition
- Making Your Reports Shine: HTML Edition
- Presentations
- Dashboards
- Websites
- Publishing Your Work
- Quarto Extensions
- Parameterized Reporting, Part 1
- Parameterized Reporting, Part 2
- Parameterized Reporting, Part 3
- Wrapping up Going Deeper with R
Going Deeper with R
Functions
This lesson is called Functions, part of the Going Deeper with R course. This lesson is called Functions, part of the Going Deeper with R course.
If the video is not playing correctly, you can watch it in a new window
Transcript
Click on the transcript to go to that point in the video. Please note that transcripts are auto generated and may contain minor inaccuracies.
View code shown in video
# Load Packages -----------------------------------------------------------
library(tidyverse)
library(fs)
library(readxl)
library(janitor)
# Create Directories ------------------------------------------------------
dir_create("data-raw")
# Download Data -----------------------------------------------------------
# https://www.oregon.gov/ode/educator-resources/assessment/Pages/Assessment-Group-Reports.aspx
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/pagr_schools_math_tot_raceethnicity_2122.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/TestResults2019/pagr_schools_math_tot_raceethnicity_1819.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_tot_raceethnicity_1819.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2018/pagr_schools_math_raceethnicity_1718.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_raceethnicity_1718.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2017/pagr_schools_math_raceethnicity_1617.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_raceethnicity_1617.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2016/pagr_schools_math_raceethnicity_1516.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_raceethnicity_1516.xlsx")
# Import, Tidy, and Clean Data --------------------------------------------
clean_math_proficiency_data <- function(raw_data) {
read_excel(path = raw_data) |>
clean_names() |>
filter(student_group == "Total Population (All Students)") |>
filter(grade_level == "Grade 3") |>
select(academic_year, school_id, contains("number_level")) |>
pivot_longer(cols = starts_with("number_level"),
names_to = "proficiency_level",
values_to = "number_of_students") |>
mutate(proficiency_level = case_when(
proficiency_level == "number_level_4" ~ "4",
proficiency_level == "number_level_3" ~ "3",
proficiency_level == "number_level_2" ~ "2",
proficiency_level == "number_level_1" ~ "1"
)) |>
mutate(number_of_students = parse_number(number_of_students)) |>
group_by(school_id) |>
mutate(pct = number_of_students / sum(number_of_students, na.rm = TRUE)) |>
ungroup()
}
third_grade_math_proficiency_2021_2022 <-
clean_math_proficiency_data(raw_data = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx")
third_grade_math_proficiency_2018_2019 <-
clean_math_proficiency_data(raw_data = "data-raw/pagr_schools_math_tot_raceethnicity_1819.xlsx")
third_grade_math_proficiency <-
bind_rows(third_grade_math_proficiency_2018_2019,
third_grade_math_proficiency_2021_2022)
Your Turn
Create a function to clean each year of enrollment data.
To check that your function works, create
enrollment_by_race_ethnicity_2021_2022
andenrollment_by_race_ethnicity_2022_2023
data frames and then bind them together withbind_rows()
.
This exercise is challenging! Use the starter code below to help you if you need to.
# Load Packages -----------------------------------------------------------
library(tidyverse)
library(fs)
library(readxl)
library(janitor)
# Create Directories ------------------------------------------------------
dir_create("data-raw")
# Download Data -----------------------------------------------------------
# https://www.oregon.gov/ode/reports-and-data/students/Pages/Student-Enrollment-Reports.aspx
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20222023.xlsx",
# mode = "wb",
# destfile = "data-raw/fallmembershipreport_20222023.xlsx")
#
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20212022.xlsx",
# mode = "wb",
# destfile = "data-raw/fallmembershipreport_20212022.xlsx")
#
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20202021.xlsx",
# mode = "wb",
# destfile = "data-raw/fallmembershipreport_20202021.xlsx")
#
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20192020.xlsx",
# mode = "wb",
# destfile = "data-raw/fallmembershipreport_20192020.xlsx")
#
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20182019.xlsx",
# mode = "wb",
# destfile = "data-raw/fallmembershipreport_20182019.xlsx")
# Import, Tidy, and Clean Data -----------------------------------------------------
clean_enrollment_data <- function(excel_file,
sheet_name) {
read_excel(path = YOURCODEHERE,
sheet = YOURCODEHERE) |>
clean_names() |>
# I've selected by column position rather than names
# because the column names vary in the data between years
# but they're always in the same positions
select(1, 3, 7:19) |>
select(-contains("percent")) |>
set_names("district_institution_id",
YOURCODEHERE,
YOURCODEHERE,
YOURCODEHERE,
YOURCODEHERE,
YOURCODEHERE,
YOURCODEHERE,
YOURCODEHERE,
YOURCODEHERE) |>
pivot_longer(cols = -c(district_institution_id, school_institution_id),
names_to = "race_ethnicity",
values_to = "number_of_students") |>
mutate(race_ethnicity = case_when(
race_ethnicity == "american_indian_alaska_native" ~ "American Indian Alaska Native",
race_ethnicity == "asian" ~ "Asian",
race_ethnicity == "black_african_american" ~ "Black/African American",
race_ethnicity == "hispanic_latino" ~ "Hispanic/Latino",
race_ethnicity == "multiracial" ~ "Multi-Racial",
race_ethnicity == "native_hawaiian_pacific_islander" ~ "Native Hawaiian Pacific Islander",
race_ethnicity == "white" ~ "White",
race_ethnicity == "multi_racial" ~ "Multiracial"
)) |>
mutate(number_of_students = parse_number(number_of_students)) |>
group_by(district_institution_id, race_ethnicity) |>
summarize(number_of_students = sum(number_of_students, na.rm = TRUE)) |>
ungroup() |>
group_by(district_institution_id) |>
mutate(pct = number_of_students / sum(number_of_students)) |>
ungroup() |>
mutate(year = sheet_name)
}
enrollment_by_race_ethnicity_2021_2022 <-
clean_enrollment_data(excel_file = YOURCODEHERE,
sheet_name = YOURCODEHERE)
enrollment_by_race_ethnicity_2022_2023 <-
clean_enrollment_data(excel_file = YOURCODEHERE,
sheet_name = YOURCODEHERE)
enrollment_by_race_ethnicity <-
bind_rows(enrollment_by_race_ethnicity_2021_2022,
enrollment_by_race_ethnicity_2022_2023)
Learn More
If you want to learn more about the global options I showed in this lesson, the video from another lesson is below.
Transcript
Click on the transcript to go to that point in the video. Please note that transcripts are auto generated and may contain minor inaccuracies.
You need to be signed-in to comment on this post. Login.
Danisha Lequet
December 17, 2023
enrollment_2022_2023 <- read_excel(path = "data-raw/fallmembershipreport_20222023.xlsx", sheet = "School 2022-23") |> clean_names() math_scores_2021_2022 <- read_excel ("data-raw/pagr schools math tot raceethnicity_2122.xlsx") |> clean_names()
math_scores_2021_2022 |> filter(student_group == "Total Population (All Students)") |> filter(grade_level == "Grade 3") glimpse select(academic_year, school_id, contains("number_level_")) |> view() pivot_longer(cols = contains("number_level_")), view() names_to = "proficiency_levels", values_to = "number_of_students")
third_grade_math_proficiency_2021_2022 |> mutate(proficiency_level = str_remove(proficiency_level, pattern = "number_level_")) third_grade_math_proficiency_2021_2022 |> mutate(proficiency_level = recode(proficiency_level, "number_level_4" = "4", "number_level_3" = "3", "number_level_2" = "2", "number_level_1" = "1"))
third_grade_math_proficiency_2021_2022 |> mutate(proficiency_level = case_when( proficiency_level == "number_level_4" ~"4" proficiency_level == "number_level_4" ~"3" proficiency_level == "number_level_4" ~"2" proficiency_level == "number_level_4" ~"1"
))
third_grade_math_proficiency_2021_2022 |> mutate(proficiency_level = parse_number(proficiency_level))
enrollment_by_race_ethnicity_2022_2023 <- enrollment_2022_2023 |> select(district_institution_id, x2022_23_american_indian_alaska_native:x2022_23_percent_multi_racial) |> select(-contains("percent")) |> pivot_longer(cols = -district_institution_id, names_to = "race_ethnicity", values_to = "number_of_students")
mutate(number_of_students = parse_number(number_of_students) bind_rows(third_grade_proficiency_2018_2019)
gss_cat |> view() third_grade_math_proficiency_2021_2022 |> group_by(school_id) mutate(pct = number_of_students / sum(number_of_students, na.rm =TRUE
third_grade_math_proficiency_2021_2022 |> enrollment_by_race_ethnicity_2022_2023 <- enrollment_2022_2023 |> select(district_institution_id, school_institution_id, x2022_23_american_indian_alaska_native:x2022_23_multi_racial) |> select(-contains("percent")) |> pivot_longer(cols = -c(district_institution_id, school_institution_id), names_to = "race_ethnicity", values_to = "number_of_students") |> mutate(race_ethnicity = str_remove(race_ethnicity, pattern = "x2022_23_")) |> mutate(race_ethnicity = case_when( race_ethnicity == "american_indian_alaska_native" ~ "American Indian Alaska Native", race_ethnicity == "asian" ~ "Asian", race_ethnicity == "black_african_american" ~ "Black/African American", race_ethnicity == "hispanic_latino" ~ "Hispanic/Latino", race_ethnicity == "multiracial" ~ "Multi-Racial", race_ethnicity == "native_hawaiian_pacific_islander" ~ "Native Hawaiian Pacific Islander", race_ethnicity == "white" ~ "White", race_ethnicity == "multi_racial" ~ "Multiracial" ))
Can you review my code and apply any corrective changes. There is something I am not incorporating correctly. The data export is not displaying as expected.
Gracielle Higino Coach
December 17, 2023
Hi Danisha! It's not clear to me what you are trying to assign to object
math_scores_2021_2022
. I suspect the main error is there. Also, I noticed you're doing afilter
operation that looks more like aselect
combined withmutate
. When in doubt, run your code line by line from scratch, starting a new R session, until an error appears. Try to copy-paste the code displayed in the "Your turn" section and fill in the blanks to see if this process works for you.Eric Juskewitz
January 24, 2024
A question for code readability:
ATM we use set_names() to get a column name which we later on change with mutate(race_ethnicity = case_when(...)) Would it be cleaner to immediately change them to the final out come or is the code more readable/understandable that way?
David Keyes Founder
January 25, 2024
To be honest, this is really just a personal preference thing. Whatever works best for you is great, as long as you understand what is going on (and, by your comment, I'm pretty sure you do!).