Tidy Data Rule #3: Every Cell is a Single Value
This lesson is called Tidy Data Rule #3: Every Cell is a Single Value, part of the R in 3 Months (Fall 2025) course. This lesson is called Tidy Data Rule #3: Every Cell is a Single Value, part of the R in 3 Months (Fall 2025) course.
Transcript
Click on the transcript to go to that point in the video. Please note that transcripts are auto generated and may contain minor inaccuracies.
Loading transcript...
View code shown in video
# Load Packages -----------------------------------------------------------
library(tidyverse)
library(janitor)
library(readxl)
library(fs)
# Create Directory --------------------------------------------------------
dir_create("data-raw")
# Download Data -----------------------------------------------------------
# download.file(url = "https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/pagr_schools_math_tot_raceethnicity_2122.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/TestResults2019/pagr_schools_math_tot_raceethnicity_1819.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_tot_raceethnicity_1819.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2018/pagr_schools_math_raceethnicity_1718.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_raceethnicity_1718.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2017/pagr_schools_math_raceethnicity_1617.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_raceethnicity_1617.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2016/pagr_schools_math_raceethnicity_1516.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_raceethnicity_1516.xlsx")
# Import Data -------------------------------------------------------------
math_scores_2021_2022 <-
read_excel("data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx") |>
clean_names()
third_grade_math_proficiency_2021_2022 <-
math_scores_2021_2022 |>
filter(student_group == "Total Population (All Students)") |>
filter(grade_level == "Grade 3") |>
select(academic_year, school_id, contains("number_level_")) |>
pivot_longer(cols = contains("number_level_"),
names_to = "proficiency_level",
values_to = "number_of_students")
third_grade_math_proficiency_2021_2022 |>
mutate(proficiency_level = str_remove(proficiency_level,
pattern = "number_level_"))
third_grade_math_proficiency_2021_2022 |>
mutate(proficiency_level = recode(proficiency_level,
"number_level_4" = "4",
"number_level_3" = "3",
"number_level_2" = "2",
"number_level_1" = "1"))
third_grade_math_proficiency_2021_2022 |>
mutate(proficiency_level = if_else(proficiency_level == "number_level_4",
true = "4",
false = proficiency_level)) |>
mutate(proficiency_level = if_else(proficiency_level == "number_level_3",
true = "3",
false = proficiency_level)) |>
mutate(proficiency_level = if_else(proficiency_level == "number_level_2",
true = "2",
false = proficiency_level)) |>
mutate(proficiency_level = if_else(proficiency_level == "number_level_1",
true = "1",
false = proficiency_level))
third_grade_math_proficiency_2021_2022 |>
mutate(proficiency_level = case_when(
proficiency_level == "number_level_4" ~ "4",
proficiency_level == "number_level_3" ~ "3",
proficiency_level == "number_level_2" ~ "2",
proficiency_level == "number_level_1" ~ "1"
))
third_grade_math_proficiency_2021_2022 |>
mutate(proficiency_level = parse_number(proficiency_level))
Your Turn
Remove the “x2022_23_” portion of the
race_ethnicityvariable usingstr_remove().Convert all instances of the
race_ethnicityvariable to more meaningful observations (e.g. turn “american_indian_alaska_native” into “American Indian/Alaska Native”) usingrecode(),if_else(), orcase_when().
Use the following starter code to help you if necessary.
# Load Packages -----------------------------------------------------------
library(tidyverse)
library(fs)
library(readxl)
library(janitor)
# Create Directories ------------------------------------------------------
dir_create("data-raw")
# Download Data -----------------------------------------------------------
# https://www.oregon.gov/ode/reports-and-data/students/Pages/Student-Enrollment-Reports.aspx
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20222023.xlsx",
# mode = "wb",
# destfile = "data-raw/fallmembershipreport_20222023.xlsx")
#
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20212022.xlsx",
# mode = "wb",
# destfile = "data-raw/fallmembershipreport_20212022.xlsx")
#
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20202021.xlsx",
# mode = "wb",
# destfile = "data-raw/fallmembershipreport_20202021.xlsx")
#
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20192020.xlsx",
# mode = "wb",
# destfile = "data-raw/fallmembershipreport_20192020.xlsx")
#
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20182019.xlsx",
# mode = "wb",
# destfile = "data-raw/fallmembershipreport_20182019.xlsx")
# Import Data -------------------------------------------------------------
enrollment_2022_2023 <- read_excel(path = "data-raw/fallmembershipreport_20222023.xlsx",
sheet = "School 2022-23") |>
clean_names()
# Tidy and Clean Data -----------------------------------------------------
enrollment_by_race_ethnicity_2022_2023 <-
enrollment_2022_2023 |>
select(district_institution_id, x2022_23_american_indian_alaska_native:x2022_23_percent_multi_racial) |>
select(-contains("percent")) |>
pivot_longer(cols = -district_institution_id,
names_to = "race_ethnicity",
values_to = "number_of_students")
Learn More
Bob Rudis has a semi-complex article on how he uses case_when() to work with data related to his work on internet security.
You might also find this video by Sharon Machlis on case_when() helpful.
And, in case you think I’m the only one who loves case_when() check out this love letter to the function by Matt Kerlogue.
Have any questions? Put them below and we will help you out!
Course Content
128 Lessons
You need to be signed-in to comment on this post. Login.
Joseph Lehman • April 23, 2024
We may get to this in the future or already covered it, but is there a way to drop characters either at the beginning or end of a cell? I am thinking something like the "RIGHT" and "LEFT" functions in Excel.
David Keyes Founder • April 24, 2024
Yup, check out this quick video! Code I used in it is below:
Hope this helps!
Raouf Kilada • November 1, 2024
Hey David, I am trying to do exactly wht you showed inthe video: third_grade_math_proficiency_2021_2022 |> group_by(school_id) |> mutate(pct = number_of_students / sum(number_of_students, na.rm = TRUE))
I got this error:
Error in
mutate(): In argument:pct = number_of_students/sum(number_of_students, na.rm = TRUE). In group 1:school_id = 2. Caused by error insum(): ! invalid 'type' (character) of argument Runrlang::last_trace()to see where the error occurred.Gracielle Higino Coach • November 4, 2024
Hi Raouf!
I'd need to take a look at your code to be able to debug it properly (feel free to send it to me by email), but it seems like R in interpreting the variable
number_of_studentsas a character. Try adding the following after yourgroup_by:So your code would become this:
Da'Shon Carr • April 17, 2025
I cleaned up my variables before doing the pivot longer and used rename_with str_remove. It seemed to work this way also.
enrollment_2022_2023 |> select(district_institution_id, school_institution_id, x2022_23_american_indian_alaska_native:x2022_23_multi_racial) |> select(-contains("percent")) |> rename_with(~str_remove(.,"x2022_23_")) |> pivot_longer(cols = -c(district_institution_id, school_institution_id), names_to = "2022_23_race_ethnicity", values_to = "number_of_students") ->enrollment_by_race_ethnicity_2022_2023
gene trevino • August 4, 2025
I keep getting NA in the race_ethnicity variable.
enrollment_by_race_ethnicity_2022_2023 %>% mutate(race_ethnicity = str_remove(race_ethnicity, pattern = "x2022_23")) %>% mutate(race_ethnicity = case_when( race_ethnicity == "american_indian_alaska_native" ~ "AIAN", race_ethnicity == "asian" ~ "A", race_ethnicity == "black_african_american" ~ "B", race_ethnicity == "hispanic_latino" ~ "HL", race_ethnicity == "multiracial" ~ "MR", race_ethnicity == "native_hawaiian_pacicif_islander" ~ "H", race_ethnicity == "white" ~ "WHITE", race_ethnicity == "multi_racial" ~ "M"))
HELP !
David Keyes Founder • August 5, 2025
Can you please post your full code all the way from data import so I can see what might be going on?
Annie Borland • October 23, 2025
Hi,
Could you please give an example of how to write the .default line when using case_when?
I.e. how do you leave the values that you haven't set a condition for unchanged?
Gracielle Higino Coach • October 22, 2025
Hi Annie! Sure! You can set the
.defaultargument to a single value (e.g.,.default = "unknown"), or to the corresponding value in a column (e.g.,.default = column_name). Try running the code below to see what happens to the dataset in each case:If you want to recode a column instead of creating a new one and keep the original values if no condition is fulfilled, then you can do this:
The only thing is that the columns must contain the same value type in all rows. So you can't mix characters and numbers, and you need to be careful with that when creating your conditions. For example, if you want to keep the original values and they are numeric, your conditions should only contain numeric values to replace the original ones, just like the last example above.
Lorenzo Dragani • December 28, 2025
I believe there is a small typo in the slide about case_when(): .default must be written with = (i.e., .default = "…") rather than with ~, because ~ is only used for condition–value pairs in case_when()
David Keyes Founder • January 6, 2026
Thanks! Will update it.