Going Deeper with R
Advanced Variable Creation
This lesson is called Advanced Variable Creation, part of the Going Deeper with R course.
Your Turn
Remove the “x2018_19” portion of the race_ethnicity variable using str_remove()
Convert all instances of the race_ethnicity variable to more meaningful observations (e.g. turn “american_indian_alaska_native” into “American Indian/Alaskan Native”) using any of the following:
- recode()
- if_else()
- case_when()
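As a rough illustration of the two steps in this exercise, here is a minimal sketch. The small `enrollment` tibble is a hypothetical stand-in for the pivoted enrollment data used in the lesson, and the trailing underscore in the pattern `"x2018_19_"` is an assumption about how the column names look after cleaning. The sketch shows `case_when()` and `recode()`; `if_else()` works too but needs one nested call per value.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the lesson's pivoted enrollment data
enrollment <- tibble(
  race_ethnicity = c(
    "x2018_19_american_indian_alaska_native",
    "x2018_19_asian",
    "x2018_19_white"
  ),
  number_of_students = c(12, 34, 560)
)

# Step 1: strip the school-year prefix (pattern is an assumption)
no_prefix <- enrollment %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_19_"))

# Step 2a: relabel with case_when(); TRUE ~ keeps any unmatched value as-is
cleaned_case_when <- no_prefix %>%
  mutate(race_ethnicity = case_when(
    race_ethnicity == "american_indian_alaska_native" ~ "American Indian/Alaskan Native",
    race_ethnicity == "asian" ~ "Asian",
    race_ethnicity == "white" ~ "White",
    TRUE ~ race_ethnicity
  ))

# Step 2b: the same relabeling with recode(), written as old = new
cleaned_recode <- no_prefix %>%
  mutate(race_ethnicity = dplyr::recode(
    race_ethnicity,
    "american_indian_alaska_native" = "American Indian/Alaskan Native",
    "asian" = "Asian",
    "white" = "White"
  ))
```

Both versions should leave number_of_students untouched and change only the labels. Printing `cleaned_case_when` or `cleaned_recode` is a quick way to confirm the prefix is gone before moving on.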
Learn More
Bob Rudis has a semi-complex article on how he uses case_when() to work with data related to his work on internet security.
You might also find this video by Sharon Machlis on case_when() helpful.
And, in case you think I’m the only one who loves case_when(), check out this love letter to the function by Matt Kerlogue.
Atlang Mompe
April 18, 2021
Hi David, when I run this code: mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_2019")), nothing happens?
Abby Isaacson
April 22, 2021
I'm stuck with an error that it can't find my race_ethnicity variable (I see it in my data frame):
Error: Problem with mutate() input race_ethnicity.
x object 'race_ethnicity' not found
ℹ Input race_ethnicity is str_remove(race_ethnicity, "x2018_19").
Run rlang::last_error() to see where the error occurred.
CODE:
enrollment_by_race_ethnicity_18_19 % select(-contains("grade")) %>%
  select(-contains("kindergarten")) %>%
  select(-contains("percent")) %>%
  pivot_longer(cols = -district_id, names_to = "race_ethnicity", values_to = "number_of_students") %>%
  mutate(number_of_students = na_if(number_of_students, "-")) %>%
  mutate(number_of_students = replace_na(number_of_students, 0)) %>%
  mutate(number_of_students = as.numeric(number_of_students)) %>%
  summarize(total = sum(number_of_students)) %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_19"))
Harold Stanislaw
April 22, 2021
Is it just me or does the syntax for recode seem backwards to normal programming convention? I would have expected the new value to be left of the equals sign and the old value to be on the right side of the equals sign. Are there other instances of R where we would need to watch out for this? (Or maybe I'm the one who's backwards!)
Harold Stanislaw
April 22, 2021
Judging by the warning messages generated in the video, am I correct in concluding that R is smart enough to exclude the NA entries from being recoded inappropriately (e.g., when using the TRUE option in case_when)? I ask because SPSS isn't so smart, depending upon how the recoding is specified. Also, if one wanted to recode the NA entries, the missing = new value option could be used, correct? I'm just checking my understanding of the help page.
Isaac Macha
April 22, 2021
Hello David. I have run the code successfully, but it only shows the changes in the tibble and not in the data view. How do I reflect the change in the data frame? What may be the issue?
Eduardo Rodriguez
April 28, 2021
Hi David, thanks again for the great instruction. Out of curiosity, is there a function similar to Excel's substitute function? In Excel I would use the substitute function to substitute " " for every "_" and wrap a proper function around it to capitalize each word. Something like =proper(substitute(B2, "_", " ")). case_when() is great but is potentially time consuming if there are 50 different instances.
Carolyn Ford
April 23, 2022
This code:
enrolment_by_race_ethnicity % select(-contains("grade")) %>%
  select(-contains("kindergarten")) %>%
  select(-contains("percent")) %>%
  pivot_longer(cols = -district_id, names_to = "race_ethnicity", values_to = "number_of_students") %>%
  mutate(number_of_students = na_if(number_of_students, "-")) %>%
  mutate(number_of_students = as.numeric(number_of_students)) %>%
  mutate(number_of_students = replace_na(number_of_students, 0)) %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_19_")) %>%
  mutate(race_ethnicity = recode(race_ethnicity,
    "american_indian_alaska_native" = "American Indian/Alaska Native",
    "asian" = "Asian",
    "black_african_american" = "Black/African American",
    "hispanic_latino" = "Hispanic/Latino",
    "multiracial" = "Multiracial",
    "native_hawaiian_pacific_islander" = "Native Hawaiian/Pacific Islander",
    "white" = "White"
  ))
... produces this error - I don't understand why the arguments are "unused":
Error in mutate():
! Problem while computing race_ethnicity = recode(...).
Caused by error in recode():
! unused arguments (american_indian_alaska_native = "American Indian/Alaska Native", asian = "Asian", black_african_american = "Black/African American", hispanic_latino = "Hispanic/Latino", multiracial = "Multiracial", native_hawaiian_pacific_islander = "Native Hawaiian/Pacific Islander", white = "White")
Andrew Paquin
April 24, 2023
Hi David, I noticed that you left the following line (from the last lesson) in your code, and that you added the various methods of mutation after it in the pipeline: summarize(number_proficient = sum(number_proficient, na.rm = TRUE)). I did the same thing and it didn't work. That summarize line resulted in a tiny table that showed only the total number of students in the districts, and it did not contain any instances of "x2018_19" to remove, so I got an error message. I'm not sure why it worked for you to leave it in but not for me.
Daved Fared
April 26, 2023
Hi David, when I followed the solution guide video and used the same exact code lines that you did, I got a table with race_ethnicity being all NA. Any idea what could have happened?
Kiana Robinson
May 16, 2023
This is my code (error message follows):
enrollment_by_race_ethnicity_18_19 % select(-contains("grade")) %>%
  select(-contains("percent")) %>%
  select(-contains("kindergarten")) %>%
  pivot_longer(cols = -district_id, names_to = "race_ethnicity", values_to = "number_of_students") %>%
  mutate(number_of_students = na_if(number_of_students, "-")) %>%
  mutate(number_of_students = as.numeric(number_of_students)) %>%
  mutate(number_of_students = replace_na(number_of_students, 0)) %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_19_")) %>%
  mutate(race_ethnicity = case_when(
    race_ethnicity == "american_indian_alaska_native" ~ "American Indian/Alaska Native",
    race_ethnicity == "asian" ~ "Asian",
    race_ethnicity == "native_hawaiian_pacific_islander" ~ "Native Hawaiian/Pacific Islander",
    race_ethnicity == "black_african_american" ~ "Black",
    race_ethnicity == "hispanic_latino" ~ "Hispanic",
    race_ethnicity == "white" ~ "White",
    race_ethnicity == "multiracial" ~ "Multiracial"
  ))
This is the error message:
Error in case_when():
! Failed to evaluate the left-hand side of formula 1.
Caused by error:
! object 'race_ethnicity' not found
Run rlang::last_trace() to see where the error occurred.