Going Deeper with R
Advanced Variable Creation
This lesson is called Advanced Variable Creation, part of the Going Deeper with R course.
Your Turn
Remove the “x2018_19” portion of the race_ethnicity variable using str_remove()
Convert all instances of the race_ethnicity variable to more meaningful observations (e.g. turn “american_indian_alaska_native” into “American Indian/Alaskan Native”) using any of the following:
recode()
if_else()
case_when()
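As a minimal sketch of the two steps above, the following uses a small made-up tibble rather than the course's enrollment data. The object names toy and cleaned are hypothetical, and the pattern includes a trailing underscore ("x2018_19_") on the assumption that the values look like the ones quoted in the comments below.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for the course's enrollment data
toy <- tibble(
  race_ethnicity = c(
    "x2018_19_american_indian_alaska_native",
    "x2018_19_asian",
    "x2018_19_white"
  )
)

cleaned <- toy %>%
  # Step 1: strip the year prefix
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_19_")) %>%
  # Step 2: map each remaining value to a readable label
  mutate(race_ethnicity = case_when(
    race_ethnicity == "american_indian_alaska_native" ~ "American Indian/Alaskan Native",
    race_ethnicity == "asian" ~ "Asian",
    race_ethnicity == "white" ~ "White"
  ))

cleaned
```

The same relabeling could be written with recode() or if_else(); case_when() is used here only because it keeps each old/new pair on one line.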
Learn More
Bob Rudis has a semi-complex article on how he uses case_when() to work with data related to his work on internet security.
You might also find this video by Sharon Machlis on case_when() helpful.
And, in case you think I’m the only one who loves case_when(), check out this love letter to the function by Matt Kerlogue.
Atlang Mompe
April 18, 2021
Hi David, when I run this code: mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_2019")), nothing happens?
Lucilla Piccari
April 18, 2021
Hello, I have the same issue... As well as if I run:
enrollment_by_race_ethnicity_18_19 %>% mutate(race_ethnicity = recode(race_ethnicity, "x2018_2019_american_indian_alaska_native" = "American Indian/Alaska Native"))
enrollment_by_race_ethnicity_18_19 %>% mutate(race_ethnicity = if_else(race_ethnicity == "x2018_2019_american_indian_alaska_native", "American Indian/Alaska Native", race_ethnicity))
Erin Guthrie
April 20, 2021
I think you are doing what I am doing...it's "x2018_19"...I had "x2018_2019" and it could not match to it! Maybe this helps you all???
Erin Guthrie
April 20, 2021
Did you ever figure this out? I am having the same issue.
David Keyes
April 20, 2021
Any chance one of you could record a short video of yourself attempting to do this and getting an error? You can use Loom or something similar and post a link. I want to help folks debug but it's hard without seeing what's going on.
Erin Guthrie
April 20, 2021
I will record a video of this tonight...once I figure out how to record a video :-)...but it is odd because there are no errors, it just doesn't seem to want to change the information!
Erin Guthrie
April 20, 2021
I think my issue was super simple and me not paying attention...I was asking to remove the string "x2018_2019" instead of "x2018_19". I fixed this and now everything works as intended! Duh!
Jody Oconnor
April 20, 2021
Erin you figured it out! Human brain issue ("x2018_2019" processes exactly the same as "x2018_19"!). The example solution code uses "x2018_2019".
My case_when 'NA' issue is now solved as well (since there were no cases that didn't include "x2018-19"). Yay!
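The behavior described in this thread can be reproduced with a small sketch. The vectors below are made up for illustration: str_remove() raises no error when the pattern is absent and simply returns the input unchanged, and case_when() returns NA wherever none of its conditions is TRUE.

```r
library(dplyr)
library(stringr)

# Hypothetical values shaped like the ones discussed in the comments
values <- c("x2018_19_asian", "x2018_19_white")

# Wrong pattern: nothing matches, so the values come back untouched (no error)
unchanged <- str_remove(values, "x2018_2019_")

# Right pattern: the prefix is removed
stripped <- str_remove(values, "x2018_19_")

# With the prefix still attached, no condition is TRUE, so every result is NA
recoded_wrong <- case_when(
  unchanged == "asian" ~ "Asian",
  unchanged == "white" ~ "White"
)

# With the prefix removed, the conditions match
recoded_right <- case_when(
  stripped == "asian" ~ "Asian",
  stripped == "white" ~ "White"
)
```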
Lucilla Piccari
April 21, 2021
(Facepalm) You are right! I must have gone over the code a thousand times... I guess the machine is always right :D
Thank you!!
Jody Oconnor
April 20, 2021
I'm getting the same thing for str_remove. The script runs with no error message. But it just doesn't change the data frame.
Also, when I use case_when all the race_ethnicity values are changed to 'NA's. Here is the code I'm running:
race_ethnicity_enrollment_18_19 %
  select(-contains("percent")) %>%
  select(-contains("grade")) %>%
  select(-contains("kindergarten")) %>%
  pivot_longer(cols = -district_id, names_to = "race_ethnicity", values_to = "number_of_students") %>%
  mutate(number_of_students = na_if(number_of_students, "-")) %>%
  mutate(number_of_students = replace_na(number_of_students, "0")) %>%
  mutate(number_of_students = as.numeric(number_of_students)) %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_2019_")) %>%
  mutate(race_ethnicity = case_when(
    race_ethnicity == "american_indian_alaska_native" ~ "American Indian/Alaskan Native",
    race_ethnicity == "asian" ~ "Asian",
    race_ethnicity == "black_african_american" ~ "Black/African American",
    race_ethnicity == "hispanic_latino" ~ "Hispanic/Latino",
    race_ethnicity == "multiracial" ~ "Multi-Racial",
    race_ethnicity == "native_hawaiian_pacific_islander" ~ "Pacific Islander",
    race_ethnicity == "white" ~ "White"))
Jody Oconnor
April 20, 2021
Not sure why the first line of code I pasted before got truncated, but this is what is actually in the first line:
race_ethnicity_enrollment_18_19 %
Jody Oconnor
April 20, 2021
argh, it truncated the first line again, but anyway, that line works properly (it creates the new object from the enrollment_18_19 data frame). Everything works until the str_remove line.
David Keyes
April 20, 2021
Please correct me if I'm wrong, but it seems like the issue is that your original race_ethnicity_enrollment_18_19 data frame hasn't changed, correct? If so, the reason why is you need to assign it back to itself in order to change the data frame. For example:
race_ethnicity_enrollment_18_19 <- race_ethnicity_enrollment_18_19 %>% YOURCODEHERE
Does that help?
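A minimal sketch of the point about assignment, using a hypothetical one-row tibble named toy: running a pipeline only computes and prints a result, and the original object changes only when the result is assigned back to it.

```r
library(dplyr)
library(stringr)

# Hypothetical one-row tibble for illustration
toy <- tibble(race_ethnicity = "x2018_19_asian")

# No assignment: the modified tibble is printed, but toy itself is untouched
toy %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_19_"))

value_before <- toy$race_ethnicity

# Assigning the result back to toy is what actually updates it
toy <- toy %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_19_"))

value_after <- toy$race_ethnicity
```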
Jody Oconnor
April 21, 2021
Hi David, my problem was completely due to trying to remove the string “x2018_2019” instead of “x2018_19”. I think you will want to fix that in your solution for this lesson so it doesn't throw more people off. Thanks for your responsiveness :)
Atlang Mompe
April 21, 2021
Hi Erin, shew, great that you figured it out. I am only getting to this now; thanks for sharing your solution.
Abby Isaacson
April 21, 2021
I'm stuck with an error that it can't find my race_ethnicity variable (I see it in my data frame):
Error: Problem with mutate() input race_ethnicity.
x object 'race_ethnicity' not found
ℹ Input race_ethnicity is str_remove(race_ethnicity, "x2018_19").
Run rlang::last_error() to see where the error occurred.
CODE:
enrollment_by_race_ethnicity_18_19 %
  select(-contains("grade")) %>%
  select(-contains("kindergarten")) %>%
  select(-contains("percent")) %>%
  pivot_longer(cols = -district_id, names_to = "race_ethnicity", values_to = "number_of_students") %>%
  mutate(number_of_students = na_if(number_of_students, "-")) %>%
  mutate(number_of_students = replace_na(number_of_students, 0)) %>%
  mutate(number_of_students = as.numeric(number_of_students)) %>%
  summarize(total = sum(number_of_students)) %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_19"))
David Keyes
April 21, 2021
I think this was a typo on my part. Can you remove the summarize line and try again?
Abby Isaacson
April 21, 2021
yes solved, thanks! I had pounded it out on my own, but that didn't work either. Removal was key!
Abby Isaacson
April 22, 2021
I also see that you mentioned it in the solution video.
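A small sketch of why the summarize line causes the "object 'race_ethnicity' not found" error, using a made-up two-row tibble: an ungrouped summarize() returns only the columns it creates, so race_ethnicity no longer exists for any step that follows.

```r
library(dplyr)

# Hypothetical toy data for illustration
toy <- tibble(
  race_ethnicity = c("x2018_19_asian", "x2018_19_white"),
  number_of_students = c(10, 20)
)

# Ungrouped summarize() collapses the data to one row and keeps only `total`
summarized <- toy %>%
  summarize(total = sum(number_of_students))

# The race_ethnicity column is gone, so a later mutate() cannot refer to it
remaining_columns <- names(summarized)
remaining_columns
```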
Harold Stanislaw
April 22, 2021
Is it just me or does the syntax for recode seem backwards to normal programming convention? I would have expected the new value to be left of the equals sign and the old value to be on the right side of the equals sign. Are there other instances of R where we would need to watch out for this? (Or maybe I'm the one who's backwards!)
David Keyes
April 22, 2021
You're not the only one who thinks this, Harold. I always forget the order and have to either look it up or try it one way (usually unsuccessfully at first) and then the other.
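To make the ordering concrete, here is a short sketch on made-up data: recode() takes old = new pairs, whereas rename() (like mutate()) puts the new name on the left. The tibble and column names are hypothetical.

```r
library(dplyr)

# recode(): the OLD value goes on the left, the NEW value on the right
recoded <- recode(c("asian", "white"), "asian" = "Asian", "white" = "White")

# rename(): the NEW name goes on the left, the OLD name on the right
toy <- tibble(race = c("asian", "white"))
renamed <- toy %>%
  rename(race_ethnicity = race)
```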
Harold Stanislaw
April 22, 2021
Judging by the warning messages generated in the video, am I correct in concluding that R is smart enough to exclude the NA entries from being recoded inappropriately (e.g., when using the TRUE option in case_when)? I ask because SPSS isn't so smart, depending upon how the recoding is specified. Also, if one wanted to recode the NA entries, the missing = new value option could be used, correct? I'm just checking my understanding of the help page.
David Keyes
April 22, 2021
Yes, that's exactly right. And if you want to do something with the NA value, you can use the is.na() function to test for them.
Isaac Macha
April 22, 2021
Hello David. I have run the code successfully but it only shows the changes in the tibble and not in the data view. How do I reflect the change in the data frame? What may be the issue
David Keyes
April 22, 2021
Have you assigned the result of your pipeline to an object? Here's some pseudo code to show what I mean:
new_data_frame <- old_data_frame %>% code_to_make_changes()
Isaac Macha
April 22, 2021
Hello David. Thanks for the feedback. It works now.
Eduardo Rodriguez
April 28, 2021
Hi David, thanks again for the great instruction. Out of curiosity, is there a function similar to Excel's substitute function? In Excel I would use the substitute function to substitute " " for every "_" and wrap a proper function around it to capitalize each word. Something like =proper(substitute(B2, "_", " ")). case_when is great but is potentially time consuming if there are 50 different instances.
David Keyes
April 28, 2021
Yup, check out str_replace()!
Eduardo Rodriguez
April 29, 2021
Thanks David! I like that R has the case_when function - it makes it so much easier to deal with long if statements.
For tasks where I'm simply removing underscores and capitalizing words, is it okay to do something like this? Or does it go against programming convention? Or take away customizability?
enrollment_by_race_ethnicity_17_18 %
  select(-contains(c("grade","kindergarten","percent"))) %>%
  pivot_longer(cols = -district_id, names_to = "race_ethnicity", values_to = "number_of_students") %>%
  mutate(number_of_students = na_if(number_of_students,"-")) %>%
  mutate(number_of_students = replace_na(number_of_students, 0)) %>%
  mutate(number_of_students = as.numeric(number_of_students)) %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2017_18_")) %>%
  mutate(race_ethnicity = str_to_title( str_replace_all(race_ethnicity,"_", " ") ))
David Keyes
April 29, 2021
If you're talking about the last line in your code, that's totally fine. I generally separate out things into multiple lines so I'd have one line that removes underscores and then another to make things title case, but you can do it however works best for you!
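As a sketch of the one-step-per-line style mentioned above, the following applies the same cleanup to a made-up tibble (toy is hypothetical, not the course data): one line removes the prefix, one replaces underscores, and one converts to title case.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data for illustration
toy <- tibble(
  race_ethnicity = c("x2017_18_black_african_american", "x2017_18_hispanic_latino")
)

cleaned <- toy %>%
  # Remove the year prefix
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2017_18_")) %>%
  # Replace every underscore with a space
  mutate(race_ethnicity = str_replace_all(race_ethnicity, "_", " ")) %>%
  # Capitalize the first letter of each word
  mutate(race_ethnicity = str_to_title(race_ethnicity))

cleaned
```

One trade-off of this approach compared with case_when(): it cannot insert a slash or otherwise customize individual labels, so "black_african_american" becomes "Black African American" rather than "Black/African American".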
Carolyn Ford
April 23, 2022
This code:
enrolment_by_race_ethnicity %
  select(-contains("grade")) %>%
  select(-contains("kindergarten")) %>%
  select(-contains("percent")) %>%
  pivot_longer(cols = -district_id, names_to = "race_ethnicity", values_to = "number_of_students") %>%
  mutate(number_of_students = na_if(number_of_students,"-")) %>%
  mutate(number_of_students = as.numeric(number_of_students)) %>%
  mutate(number_of_students = replace_na(number_of_students, 0)) %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_19_")) %>%
  mutate(race_ethnicity = recode(race_ethnicity,
    "american_indian_alaska_native" = "American Indian/Alaska Native",
    "asian" = "Asian",
    "black_african_american" = "Black/African American",
    "hispanic_latino" = "Hispanic/Latino",
    "multiracial" = "Multiracial",
    "native_hawaiian_pacific_islander" = "Native Hawaiian/Pacific Islander",
    "white" = "White" ))
... produces this error - I don't understand why the arguments are "unused":
Error in mutate():
! Problem while computing race_ethnicity = recode(...).
Caused by error in recode():
! unused arguments (american_indian_alaska_native = "American Indian/Alaska Native", asian = "Asian", black_african_american = "Black/African American", hispanic_latino = "Hispanic/Latino", multiracial = "Multiracial", native_hawaiian_pacific_islander = "Native Hawaiian/Pacific Islander", white = "White")
David Keyes
April 25, 2022
Sorry, I can't reproduce your issue. Could you please post your full code from this section as a GitHub gist and post the link? Thanks!
Andrew Paquin
April 23, 2023
Hi David, I noticed that you left the following line (from the last lesson) in your code, and that you added the various methods of mutation after it in the pipeline: summarize(number_proficient = sum(number_proficient, na.rm = TRUE)). I did the same thing and it didn't work. That summarize line resulted in a tiny table that showed only the total number of students in the districts, and it did not contain any instances of "x2018_19" to remove, so I got an error message. I'm not sure why it worked for you to leave it in but not for me.
Andrew Paquin
April 23, 2023
Actually, disregard that. I just watched the solution video, and you dealt with it there. All good.
Daved Fared
April 26, 2023
Hi David, when I followed the solution guide video and used the same exact code lines that you did, I got a table with race_ethnicity being all NA. Any idea what could have happened?
David Keyes
April 26, 2023
Can you post your code as a gist and share the link so I can review it?
Kiana Robinson
May 15, 2023
This is my code (error message follows):
enrollment_by_race_ethnicity_18_19 %
  select(-contains("grade")) %>%
  select(-contains("percent")) %>%
  select(-contains("kindergarten")) %>%
  pivot_longer(cols= -district_id, names_to = "race_ethnicity", values_to = "number_of_students") %>%
  mutate(number_of_students=na_if(number_of_students, "-")) %>%
  mutate(number_of_students=as.numeric(number_of_students)) %>%
  mutate(number_of_students=replace_na(number_of_students, 0)) %>%
  mutate(race_ethnicity=str_remove(race_ethnicity, "x2018_19_")) %>%
  mutate(race_ethnicity= case_when(
    race_ethnicity == "american_indian_alaska_native" ~ "American Indian/Alaska Native",
    race_ethnicity == "asian" ~"Asian",
    race_ethnicity == "native_hawaiian_pacific_islander" ~ "Native Hawaiian/Pacific Islander",
    race_ethnicity == "black_african_american" ~ "Black",
    race_ethnicity == "hispanic_latino" ~ "Hispanic",
    race_ethnicity == "white" ~ "White",
    race_ethnicity == "multiracial" ~ "Multiracial" ))
This is the error message:
Error in case_when():
! Failed to evaluate the left-hand side of formula 1.
Caused by error:
! object 'race_ethnicity' not found
Run rlang::last_trace() to see where the error occurred.
David Keyes
May 15, 2023
Can you please post your entire R script file as a GitHub gist and post the link? I'm having trouble isolating the issue.