Advanced Variable Creation
This lesson is called Advanced Variable Creation, part of the R in 3 Months (Fall 2022) course.
Your Turn
Remove the “x2018_19” portion of the race_ethnicity variable using str_remove()
Convert all instances of the race_ethnicity variable to more meaningful observations (e.g. turn “american_indian_alaska_native” into “American Indian/Alaskan Native”) using any of the following:
recode()
if_else()
case_when()
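A minimal sketch of the two steps above, assuming the values look like "x2018_19_asian" (the toy tibble and the object names `enrollment` and `cleaned` are hypothetical stand-ins for the course data):

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for the course's enrollment data
enrollment <- tibble(
  race_ethnicity = c("x2018_19_asian", "x2018_19_white", "x2018_19_multiracial")
)

cleaned <- enrollment %>%
  # Step 1: strip the year prefix, including the trailing underscore
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_19_")) %>%
  # Step 2: map each raw value to a readable label
  mutate(race_ethnicity = case_when(
    race_ethnicity == "asian" ~ "Asian",
    race_ethnicity == "white" ~ "White",
    race_ethnicity == "multiracial" ~ "Multiracial"
  ))
```

The same mapping could be written with recode() or nested if_else() calls; case_when() is used here only because it keeps one condition per line.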
Learn More
Bob Rudis has a semi-complex article on how he uses case_when() to work with data related to his work on internet security.
You might also find this video by Sharon Machlis on case_when() helpful.
And, in case you think I’m the only one who loves case_when(), check out this love letter to the function by Matt Kerlogue.
Have any questions? Put them below and we will help you out!
Atlang Mompe • April 18, 2021
Hi David, when I run this code: mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_2019")), nothing happens?
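One possible explanation, sketched under the assumption that the values start with "x2018_19_" as in the exercise text: str_remove() returns a string unchanged when the pattern does not occur in it, and "x2018_2019" does not occur in "x2018_19_asian".

```r
library(stringr)

# Assumed shape of one raw value
x <- "x2018_19_asian"

# Pattern is not present, so the string comes back unchanged
no_match <- str_remove(x, "x2018_2019")

# Pattern is present, so the prefix is removed
match <- str_remove(x, "x2018_19_")
```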
Abby Isaacson • April 22, 2021
I'm stuck with an error that it can't find my race_ethnicity variable (I see it in my data frame):
Error: Problem with mutate() input race_ethnicity.
x object 'race_ethnicity' not found
ℹ Input race_ethnicity is str_remove(race_ethnicity, "x2018_19").
Run rlang::last_error() to see where the error occurred.
CODE:
enrollment_by_race_ethnicity_18_19 % select(-contains("grade")) %>%
  select(-contains("kindergarten")) %>%
  select(-contains("percent")) %>%
  pivot_longer(cols = -district_id, names_to = "race_ethnicity", values_to = "number_of_students") %>%
  mutate(number_of_students = na_if(number_of_students, "-")) %>%
  mutate(number_of_students = replace_na(number_of_students, 0)) %>%
  mutate(number_of_students = as.numeric(number_of_students)) %>%
  summarize(total = sum(number_of_students)) %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_19"))
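A likely cause, shown as a sketch on hypothetical toy data: an ungrouped summarize() returns a single row containing only the summary column, so race_ethnicity no longer exists by the time the final mutate() runs. Cleaning the variable first and grouping by it before summarizing keeps the column.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data; the real data has more columns and rows
enrollment <- tibble(
  race_ethnicity = c("x2018_19_asian", "x2018_19_white", "x2018_19_asian"),
  number_of_students = c(10, 20, 5)
)

# Ungrouped summarize() keeps only the summary column
collapsed <- enrollment %>%
  summarize(total = sum(number_of_students))
names(collapsed)

# Cleaning first, then grouping, keeps race_ethnicity in the result
by_group <- enrollment %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_19_")) %>%
  group_by(race_ethnicity) %>%
  summarize(total = sum(number_of_students))
```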
Harold Stanislaw • April 22, 2021
Is it just me or does the syntax for recode seem backwards to normal programming convention? I would have expected the new value to be left of the equals sign and the old value to be on the right side of the equals sign. Are there other instances of R where we would need to watch out for this? (Or maybe I'm the one who's backwards!)
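For comparison, a small sketch of the two argument orders in dplyr (the vector and column names are made up for illustration): recode() takes old = new, while rename() and mutate() put the new name on the left.

```r
library(dplyr)

# recode(): old value on the left, new value on the right
x <- c("asian", "white")
recoded <- recode(x, "asian" = "Asian", "white" = "White")

# rename(): new name on the left, old name on the right
df <- tibble(old_name = 1:2)
renamed <- rename(df, new_name = old_name)
```

In recent dplyr releases recode() is marked as superseded in favor of case_match(), which uses the old ~ new formula style familiar from case_when().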
Harold Stanislaw • April 22, 2021
Judging by the warning messages generated in the video, am I correct in concluding that R is smart enough to exclude the NA entries from being recoded inappropriately (e.g., when using the TRUE option in case_when)? I ask because SPSS isn't so smart, depending upon how the recoding is specified. Also, if one wanted to recode the NA entries, the missing = new value option could be used, correct? I'm just checking my understanding of the help page.
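A short sketch of how NA values behave, using a made-up two-element vector rather than the course data. In recode(), .default leaves NA alone and .missing targets it. In case_when(), a TRUE ~ fallback matches every remaining row, NA included, so the two functions differ here.

```r
library(dplyr)

x <- c("asian", NA)

# recode(): .default applies only to non-missing unmatched values
with_default <- recode(x, "asian" = "Asian", .default = "Other")

# recode(): .missing replaces NA explicitly
with_missing <- recode(x, "asian" = "Asian", .missing = "Unknown")

# case_when(): NA == "asian" is NA (treated as no match),
# so the TRUE fallback picks up the NA row as well
with_case_when <- case_when(
  x == "asian" ~ "Asian",
  TRUE ~ "Other"
)
```

To keep NA as NA in case_when(), one option is an earlier branch such as is.na(x) ~ NA_character_.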
Isaac Macha • April 22, 2021
Hello David. I have run the code successfully, but it only shows the changes in the printed tibble and not in the data view. How do I reflect the change in the data frame? What may be the issue?
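One common cause, sketched on hypothetical data: a pipeline prints a modified copy and leaves the original object untouched unless the result is assigned back with <-.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data
enrollment <- tibble(race_ethnicity = c("x2018_19_asian", "x2018_19_white"))

# Without assignment: the result is printed, but enrollment is unchanged
enrollment %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_19_"))
unchanged <- enrollment$race_ethnicity

# With assignment: the stored object now holds the cleaned values
enrollment <- enrollment %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_19_"))
```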
Eduardo Rodriguez • April 28, 2021
Hi David, thanks again for the great instruction. Out of curiosity, is there a function similar to Excel's substitute function? In Excel I would use the substitute function to substitute " " for every "_" and wrap a proper function around it to capitalize each word. Something like =proper(substitute(B2, "_", " ")). case_when() is great but is potentially time consuming if there are 50 different instances.
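A rough stringr equivalent of that Excel formula, as a sketch: str_replace_all() plays the role of substitute and str_to_title() the role of proper. The sample values are taken from the exercise; note that this approach cannot add the slashes used in labels like "Black/African American".

```r
library(stringr)

x <- c("american_indian_alaska_native", "black_african_american")

# Replace every underscore with a space, then capitalize each word
pretty <- str_to_title(str_replace_all(x, "_", " "))
```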
Carolyn Ford • April 23, 2022
This code:
enrolment_by_race_ethnicity % select(-contains("grade")) %>%
  select(-contains("kindergarten")) %>%
  select(-contains("percent")) %>%
  pivot_longer(cols = -district_id, names_to = "race_ethnicity", values_to = "number_of_students") %>%
  mutate(number_of_students = na_if(number_of_students, "-")) %>%
  mutate(number_of_students = as.numeric(number_of_students)) %>%
  mutate(number_of_students = replace_na(number_of_students, 0)) %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_19_")) %>%
  mutate(race_ethnicity = recode(race_ethnicity,
    "american_indian_alaska_native" = "American Indian/Alaska Native",
    "asian" = "Asian",
    "black_african_american" = "Black/African American",
    "hispanic_latino" = "Hispanic/Latino",
    "multiracial" = "Multiracial",
    "native_hawaiian_pacific_islander" = "Native Hawaiian/Pacific Islander",
    "white" = "White"
  ))
... produces this error - I don't understand why the arguments are "unused":
Error in mutate():
! Problem while computing race_ethnicity = recode(...).
Caused by error in recode():
! unused arguments (american_indian_alaska_native = "American Indian/Alaska Native", asian = "Asian", black_african_american = "Black/African American", hispanic_latino = "Hispanic/Latino", multiracial = "Multiracial", native_hawaiian_pacific_islander = "Native Hawaiian/Pacific Islander", white = "White")
Andrew Paquin • April 24, 2023
Hi David, I noticed that you left the following line (from the last lesson) in your code, and that you added the various methods of mutation after it in the pipeline:
summarize(number_proficient = sum(number_proficient, na.rm = TRUE))
I did the same thing and it didn't work. That summarize line resulted in a tiny table that showed only the total number of students in the districts, and it did not contain any instances of "x2018_19" to remove, so I got an error message. I'm not sure why it worked for you to leave it in but not for me.
Daved Fared • April 26, 2023
Hi David, when I followed the solution guide video and used the same exact code lines that you did, I got a table with race_ethnicity being all NA. Any idea what could have happened?
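One way this can happen, sketched under the assumption that the raw values look like "x2018_19_asian": removing "x2018_19" without the trailing underscore leaves "_asian", which matches none of the case_when() conditions, and case_when() returns NA for rows that match nothing.

```r
library(dplyr)
library(stringr)

# Assumed shape of one raw value
x <- "x2018_19_asian"

# The trailing underscore is not part of the pattern, so it is left behind
leftover <- str_remove(x, "x2018_19")

# "_asian" does not equal "asian", so no branch matches and the result is NA
result <- case_when(leftover == "asian" ~ "Asian")
```

Printing the distinct values with count(race_ethnicity) before the case_when() step is a quick way to check what the conditions need to match.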
Kiana Robinson • May 16, 2023
This is my code (error message follows):
enrollment_by_race_ethnicity_18_19 % select(-contains("grade")) %>%
  select(-contains("percent")) %>%
  select(-contains("kindergarten")) %>%
  pivot_longer(cols = -district_id, names_to = "race_ethnicity", values_to = "number_of_students") %>%
  mutate(number_of_students = na_if(number_of_students, "-")) %>%
  mutate(number_of_students = as.numeric(number_of_students)) %>%
  mutate(number_of_students = replace_na(number_of_students, 0)) %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_19_")) %>%
  mutate(race_ethnicity = case_when(
    race_ethnicity == "american_indian_alaska_native" ~ "American Indian/Alaska Native",
    race_ethnicity == "asian" ~ "Asian",
    race_ethnicity == "native_hawaiian_pacific_islander" ~ "Native Hawaiian/Pacific Islander",
    race_ethnicity == "black_african_american" ~ "Black",
    race_ethnicity == "hispanic_latino" ~ "Hispanic",
    race_ethnicity == "white" ~ "White",
    race_ethnicity == "multiracial" ~ "Multiracial"
  ))
This is the error message:
Error in case_when():
! Failed to evaluate the left-hand side of formula 1.
Caused by error:
! object 'race_ethnicity' not found
Run rlang::last_trace() to see where the error occurred.