Advanced Variable Creation

Your Turn

  1. Remove the “x2018_19” portion of the race_ethnicity variable using str_remove()

  2. Convert all instances of the race_ethnicity variable to more meaningful observations (e.g. turn “american_indian_alaska_native” into “American Indian/Alaskan Native”) using any of the following:

  • recode()

  • if_else()

  • case_when()
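Both steps can be sketched on a small made-up tibble. The `toy` data below is a hypothetical stand-in for the pivoted enrollment data, and the sketch assumes the prefix to strip includes the trailing underscore ("x2018_19_").

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the pivoted enrollment data
toy <- tibble(
  race_ethnicity = c("x2018_19_asian", "x2018_19_white", "x2018_19_multiracial"),
  number_of_students = c(10, 20, 5)
)

cleaned <- toy %>%
  # Step 1: strip the year prefix
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_19_")) %>%
  # Step 2: map each raw value to a readable label
  mutate(race_ethnicity = case_when(
    race_ethnicity == "asian" ~ "Asian",
    race_ethnicity == "white" ~ "White",
    race_ethnicity == "multiracial" ~ "Multiracial"
  ))
```

recode() or if_else() could replace case_when() in the second step; which one reads best is largely a matter of taste.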

Learn More

Bob Rudis has a semi-complex article on how he uses case_when() to work with data related to his work on internet security.

You might also find this video by Sharon Machlis on case_when() helpful.

And, in case you think I’m the only one who loves case_when(), check out this love letter to the function by Matt Kerlogue.

Have any questions? Put them below and we will help you out!

Atlang Mompe

April 18, 2021

Hi David, when I run this code: mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_2019")), nothing happens?
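A possible explanation, assuming the column values start with "x2018_19_" as in the exercise: str_remove() returns a string unchanged when the pattern does not occur in it, and it does so without an error or warning. The vector `x` below is made up for illustration.

```r
library(stringr)

# Hypothetical values shaped like the pivoted column names
x <- c("x2018_19_asian", "x2018_19_white")

# Pattern not present in the strings: returned unchanged, no error or warning
no_match <- str_remove(x, "x2018_2019")

# Pattern that does occur (including the trailing underscore)
match <- str_remove(x, "x2018_19_")
```

Here `no_match` is identical to `x`, which would look like "nothing happens".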

Abby Isaacson

April 22, 2021

I'm stuck with an error that it can't find my race_ethnicity variable (I see it in my data frame): Error: Problem with mutate() input race_ethnicity. x object 'race_ethnicity' not found ℹ Input race_ethnicity is str_remove(race_ethnicity, "x2018_19"). Run rlang::last_error() to see where the error occurred.

CODE: enrollment_by_race_ethnicity_18_19 % select(-contains("grade")) %>% select(-contains("kindergarten")) %>% select(-contains("percent")) %>% pivot_longer(cols = -district_id, names_to = "race_ethnicity", values_to = "number_of_students") %>% mutate(number_of_students = na_if(number_of_students, "-")) %>% mutate(number_of_students = replace_na(number_of_students, 0)) %>% mutate(number_of_students = as.numeric(number_of_students)) %>% summarize(total = sum(number_of_students)) %>% mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_19"))
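One likely cause is the order of the steps: summarize() without a group_by() collapses the data to a single row that contains only the new summary column, so race_ethnicity no longer exists when the final mutate() runs. A minimal sketch on made-up data (the `toy` tibble is hypothetical):

```r
library(dplyr)
library(stringr)

toy <- tibble(
  race_ethnicity = c("x2018_19_asian", "x2018_19_white"),
  number_of_students = c(10, 20)
)

# summarize() without group_by() returns one row holding only `total`
collapsed <- toy %>%
  summarize(total = sum(number_of_students))
names(collapsed)

# Cleaning the labels first, and grouping before summarizing, keeps the column
by_group <- toy %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_19_")) %>%
  group_by(race_ethnicity) %>%
  summarize(total = sum(number_of_students))
```

Dropping the summarize() line, or moving it after the mutate() and a group_by(), should avoid the error.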

Harold Stanislaw

April 22, 2021

Is it just me or does the syntax for recode seem backwards to normal programming convention? I would have expected the new value to be left of the equals sign and the old value to be on the right side of the equals sign. Are there other instances of R where we would need to watch out for this? (Or maybe I'm the one who's backwards!)
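For comparison, here is a small sketch of the two orderings on a made-up vector. recode() takes the old value on the left and the new value on the right, the reverse of mutate(new = old) and rename(new = old); the dplyr documentation itself flags this ordering as a known wart of recode().

```r
library(dplyr)

# Hypothetical raw values
x <- c("asian", "white", "asian")

# recode(): old value on the left, new value on the right
recoded <- recode(x, "asian" = "Asian", "white" = "White")

# case_when(): condition on the left of ~, new value on the right
relabelled <- case_when(
  x == "asian" ~ "Asian",
  x == "white" ~ "White"
)
```

Both calls give the same result here, so case_when() is an option if the recode() ordering feels error-prone.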

Harold Stanislaw

April 22, 2021

Judging by the warning messages generated in the video, am I correct in concluding that R is smart enough to exclude the NA entries from being recoded inappropriately (e.g., when using the TRUE option in case_when)? I ask because SPSS isn't so smart, depending upon how the recoding is specified. Also, if one wanted to recode the NA entries, the missing = new value option could be used, correct? I'm just checking my understanding of the help page.
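A small sketch can help check this, using a made-up vector rather than the course data. As far as dplyr's documented behavior goes, recode() leaves NA untouched unless .missing is supplied, whereas a TRUE catch-all in case_when() does pick up NA rows, because a comparison against NA counts as "no match" for the earlier conditions.

```r
library(dplyr)

# Hypothetical values, including one missing entry
x <- c("asian", "white", NA)

# recode() leaves NA alone unless .missing is supplied
kept_na <- recode(x, "asian" = "Asian", "white" = "White")
filled_na <- recode(x, "asian" = "Asian", "white" = "White", .missing = "Unknown")

# In case_when(), a TRUE catch-all also matches NA rows
caught <- case_when(
  x == "asian" ~ "Asian",
  TRUE ~ "Other"
)

# Handling NA explicitly first keeps it missing
guarded <- case_when(
  is.na(x) ~ NA_character_,
  x == "asian" ~ "Asian",
  TRUE ~ "Other"
)
```

So with case_when() it seems safer to add an explicit is.na() branch whenever a TRUE catch-all is used.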

Isaac Macha

April 22, 2021

Hello David. I have run the code successfully, but it only shows the changes in the tibble printed to the console and not in the data view. How do I reflect the change in the data frame? What might be the issue?
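A common reason, assuming the pipeline was run without assigning its result: a pipeline only prints the modified tibble, and the original object stays as it was until the result is assigned back with <-. A sketch on a hypothetical `toy` tibble:

```r
library(dplyr)
library(stringr)

toy <- tibble(race_ethnicity = c("x2018_19_asian", "x2018_19_white"))

# Prints the modified tibble to the console; `toy` itself is unchanged
toy %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_19_"))

# Assigning the result back updates the object shown in the viewer
toy <- toy %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_19_"))
```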

Eduardo Rodriguez

April 28, 2021

Hi David, thanks again for the great instruction. Out of curiosity, is there a function similar to Excel's substitute function? In Excel I would use the substitute function to substitute " " for every "_" and wrap a proper function around it to capitalize each word. Something like =proper(substitute(B2, "_", " ")). case_when() is great but is potentially time-consuming if there are 50 different instances.
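The stringr package has close equivalents: str_replace_all() plays the role of substitute and str_to_title() the role of proper. A minimal sketch on made-up values:

```r
library(stringr)

# Hypothetical raw values with underscores
x <- c("black_african_american", "hispanic_latino", "white")

# Swap underscores for spaces, then capitalize each word
labels <- str_to_title(str_replace_all(x, "_", " "))
```

This gives labels such as "Black African American". It will not insert slashes or other custom punctuation, so values that need those would still call for recode() or case_when().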

Carolyn Ford

April 23, 2022

This code:

enrolment_by_race_ethnicity % select(-contains("grade")) %>% select(-contains("kindergarten")) %>% select(-contains("percent")) %>% pivot_longer(cols = -district_id, names_to = "race_ethnicity", values_to = "number_of_students") %>% mutate(number_of_students = na_if(number_of_students,"-")) %>% mutate(number_of_students = as.numeric(number_of_students)) %>% mutate(number_of_students = replace_na(number_of_students, 0)) %>% mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_19_")) %>% mutate(race_ethnicity = recode(race_ethnicity, "american_indian_alaska_native" = "American Indian/Alaska Native", "asian" = "Asian", "black_african_american" = "Black/African American", "hispanic_latino" = "Hispanic/Latino", "multiracial" = "Multiracial", "native_hawaiian_pacific_islander" = "Native Hawaiian/Pacific Islander", "white" = "White" ))

... produces this error - I don't understand why the arguments are "unused":

Error in mutate(): ! Problem while computing race_ethnicity = recode(...). Caused by error in recode(): ! unused arguments (american_indian_alaska_native = "American Indian/Alaska Native", asian = "Asian", black_african_american = "Black/African American", hispanic_latino = "Hispanic/Latino", multiracial = "Multiracial", native_hawaiian_pacific_islander = "Native Hawaiian/Pacific Islander", white = "White")
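One possible cause, offered as a guess since the loaded packages are not shown: an "unused arguments" error from recode() often means a different package's recode() is masking dplyr's. The car package, for example, exports a recode() with a different signature. Calling the function through its namespace is a quick way to test that theory; the vector `x` is made up for illustration.

```r
library(dplyr)

# Hypothetical raw values
x <- c("asian", "white")

# Calling the function through its namespace rules out masking by another package
recoded <- dplyr::recode(x, "asian" = "Asian", "white" = "White")

# Shows which package's environment `recode` currently resolves to
environment(recode)
```

If the last line reports a namespace other than dplyr, writing dplyr::recode() in the pipeline should resolve the error.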

Andrew Paquin

April 24, 2023

Hi David, I noticed that you left the following line (from the last lesson) in your code, and that you added the various methods of mutation after it in the pipeline: summarize(number_proficient = sum(number_proficient, na.rm = TRUE)). I did the same thing and it didn't work. That summarize line resulted in a tiny table that showed only the total number of students in the districts, and it did not contain any instances of "x2018_19" to remove, so I got an error message. I'm not sure why it worked for you to leave it in but not for me.

Daved Fared

April 26, 2023

Hi David, when I followed the solution guide video and used the same exact code lines that you did, I got a table with race_ethnicity being all NA. Any idea what could have happened?

Kiana Robinson

May 16, 2023

This is my code (error message follows): enrollment_by_race_ethnicity_18_19 % select(-contains("grade")) %>% select(-contains("percent")) %>% select(-contains("kindergarten")) %>% pivot_longer(cols= -district_id, names_to = "race_ethnicity", values_to = "number_of_students") %>% mutate(number_of_students=na_if(number_of_students, "-")) %>% mutate(number_of_students=as.numeric(number_of_students)) %>% mutate(number_of_students=replace_na(number_of_students, 0)) %>% mutate(race_ethnicity=str_remove(race_ethnicity, "x2018_19_")) %>% mutate(race_ethnicity= case_when( race_ethnicity == "american_indian_alaska_native" ~ "American Indian/Alaska Native", race_ethnicity == "asian" ~"Asian", race_ethnicity == "native_hawaiian_pacific_islander" ~ "Native Hawaiian/Pacific Islander", race_ethnicity == "black_african_american" ~ "Black", race_ethnicity == "hispanic_latino" ~ "Hispanic", race_ethnicity == "white" ~ "White", race_ethnicity == "multiracial" ~ "Multiracial" ))

This is the error message: Error in case_when(): ! Failed to evaluate the left-hand side of formula 1. Caused by error: ! object 'race_ethnicity' not found. Run rlang::last_trace() to see where the error occurred.