Advanced Summarizing
This lesson is called Advanced Summarizing, part of the R in 3 Months (Fall 2021) course.
Heads up! I’m using the slice_max() function in this lesson, which only exists in newer versions of the dplyr package. To update, just type install.packages("tidyverse"), which will update dplyr and all other tidyverse packages.
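If you are not sure whether your installed dplyr is new enough, a quick check along these lines may help. It is a minimal sketch that relies on slice_max() having been introduced in dplyr 1.0.0; the variable names are just for illustration.

```r
# slice_max() was introduced in dplyr 1.0.0
dplyr_version <- packageVersion("dplyr")
dplyr_version

# TRUE if the installed dplyr exports slice_max()
has_slice_max <- "slice_max" %in% getNamespaceExports("dplyr")
has_slice_max

# If has_slice_max is FALSE, update as described above:
# install.packages("tidyverse")
```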
Your Turn
Create a new variable called pct that shows each race/ethnicity as a percentage of all students in each district
You’ll need to use group_by() and mutate()
Don’t forget to ungroup() at the end!
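The steps above can be sketched on a small made-up data frame. The column names mirror those used in the comments below, but the values are invented for illustration and are not the course data.

```r
library(dplyr)

# Hypothetical toy data: two districts, two groups each
enrollment <- tibble(
  district_id = c(1, 1, 2, 2),
  race_ethnicity = c("Asian", "White", "Asian", "White"),
  number_of_students = c(10, 30, 5, 15)
)

enrollment_pct <- enrollment %>%
  # sum() is now computed separately within each district
  group_by(district_id) %>%
  mutate(pct = number_of_students / sum(number_of_students)) %>%
  # drop the grouping so later steps work on the whole data frame
  ungroup()

enrollment_pct
```

Here pct is a proportion between 0 and 1; multiply by 100 if you want it on a 0 to 100 scale.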
Learn More
Daniel Carter has a nice walkthrough of using group_by() and mutate().
If you forget to ungroup() every once in a while, you’re joining an illustrious group.
ugh foiled by a missing ungroup() once again #rstats
— Andrew Heiss (@andrewheiss) November 25, 2019
When in doubt, try ungroup() #rstats
— Ben Casselman (@bencasselman) October 4, 2019
To my #rstats friends: Practice safe stats. Remember to dplyr::ungroup() after you're done with your within-group operations. pic.twitter.com/r4JblvgSjd
— Hlynur Hallgríms (@hlynur) July 19, 2018
Need a cheery reminder to use ungroup()? Here you go!
Don't forget to bring dplyr::ungroup() to the party 🎁🥳 #rstats
Thanks to @apreshill for inspiring this one! pic.twitter.com/gsf66KXJ2d
— Allison Horst (@allison_horst) November 21, 2019
Have any questions? Put them below and we will help you out!
Atlang Mompe • April 18, 2021
Hi David, I still get that race_ethnicity not found?
I am also running your code, but for some reason I may be doing something wrong: everything works until I try to mutate race_ethnicity using str_remove(), even when I run your code. I also noticed that we may need to put the dates x2018_2019 in quotation marks; currently your code has only one quotation mark: mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_2019)) %>%
I tried to use your code (please see below), and I am not sure what is going on, as the race_ethnicity object is not found:
enrollment_by_race_ethnicity_18_19 % select(-contains("grade")) %>%
  select(-contains("kindergarten")) %>%
  select(-contains("percent")) %>%
  pivot_longer(cols = -district_id,
               names_to = "race_ethnicity",
               values_to = "number_of_students") %>%
  mutate(number_of_students = na_if(number_of_students, "-")) %>%
  mutate(number_of_students = replace_na(number_of_students, 0)) %>%
  mutate(number_of_students = as.numeric(number_of_students)) %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_2019"))
  mutate(race_ethnicity = case_when(
    race_ethnicity == "american_indian_alaska_native" ~ "American Indian Alaska Native",
    race_ethnicity == "asian" ~ "Asian",
    race_ethnicity == "black_african_american" ~ "Black/African American",
    race_ethnicity == "hispanic_latino" ~ "Hispanic/Latino",
    race_ethnicity == "multiracial" ~ "Multi-Racial",
    race_ethnicity == "native_hawaiian_pacific_islander" ~ "Pacific Islander",
    race_ethnicity == "white" ~ "White"
  )) %>%
  group_by(district_id) %>%
  mutate(pct = number_of_students / sum(number_of_students)) %>%
  ungroup()
Thanks, Atty
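One possible cause of an "object not found" error in a pipeline like the one above is a missing %>% between two mutate() calls: the pasted code has no pipe after the str_remove() step, although characters may also have been lost when the comment was posted. The sketch below reproduces that situation on a made-up two-row data frame; the column values and the trailing underscore in the pattern are assumptions for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data, not the course data
enrollment <- tibble(
  district_id = c(1, 1),
  race_ethnicity = c("x2018_2019_asian", "x2018_2019_white")
)

# Without a pipe in front of it, mutate() receives no data frame,
# so R cannot find a column called race_ethnicity and the call fails.
broken <- try(
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_2019_")),
  silent = TRUE
)
inherits(broken, "try-error")

# With every step connected by %>%, race_ethnicity is found inside the data frame
fixed <- enrollment %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_2019_")) %>%
  mutate(race_ethnicity = case_when(
    race_ethnicity == "asian" ~ "Asian",
    race_ethnicity == "white" ~ "White"
  ))

fixed
```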
Allison Brenner • April 21, 2021
I have a question about the "My turn" example. You say that you use summarize in the first part (vs. mutate) because you aren't actually adding a new variable. I'm still having trouble understanding this conceptually, and more broadly, the difference between mutate and summarize for functions that can be called in both. I know you aren't adding a new variable in your example, but you are over-writing/replacing. Could you do the same by using summarize to create a total in a new variable? Would that not add to the data frame?
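A small side-by-side comparison may help with the mutate() versus summarize() question. This is a sketch on invented data, not the lesson's own example: the same sum() call gives one row per group inside summarize() and one value per original row inside mutate().

```r
library(dplyr)

# Hypothetical toy data: two districts, two groups each
enrollment <- tibble(
  district_id = c(1, 1, 2, 2),
  race_ethnicity = c("Asian", "White", "Asian", "White"),
  number_of_students = c(10, 30, 5, 15)
)

# summarize() collapses each group to a single row and keeps only
# the grouping column plus the new summary column
totals <- enrollment %>%
  group_by(district_id) %>%
  summarize(total = sum(number_of_students), .groups = "drop")

# mutate() keeps every row and every column, repeating the group
# total alongside each original row
with_totals <- enrollment %>%
  group_by(district_id) %>%
  mutate(total = sum(number_of_students)) %>%
  ungroup()

nrow(totals)       # 2 rows, one per district
nrow(with_totals)  # 4 rows, same as the input
```

So summarize() can create a total in a new variable, but the result is a new, smaller data frame rather than an extra column on the original one.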
Abby Isaacson • April 22, 2021
I had several of these problems happen for me too, including console processing code in blue but nothing happening in the data frame to change names (several tries). At one point when I ran David's solution code, all race_ethnicity turned to NA in the entire data frame. I still don't know exactly what happened to make it work, but I do have one small question:
Do the underscores (_) matter in the removing/renaming? For example, with the variable "x2018-19_asian", if I remove x2018-19 (not x2018-19_), but then only mutate "asian" (not _asian), what happens to that underscore?
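The underscore question can be checked directly on a single string. In this sketch the column name "x2018_2019_asian" is an assumed example based on the pattern used earlier in the thread. str_remove() deletes only what the pattern matches, so a leftover underscore stays in the value, and a later case_when() comparison against "asian" then matches nothing and returns NA.

```r
library(dplyr)
library(stringr)

# Hypothetical column name, for illustration only
original <- "x2018_2019_asian"

without_underscore <- str_remove(original, "x2018_2019")   # "_asian"
with_underscore    <- str_remove(original, "x2018_2019_")  # "asian"

# "_asian" is not equal to "asian", so no condition matches and the result is NA
recoded_leftover <- case_when(without_underscore == "asian" ~ "Asian")

# With the underscore removed as part of the pattern, the recode works
recoded_clean <- case_when(with_underscore == "asian" ~ "Asian")
```

This may be one reason every value in a column could turn into NA after the recoding step.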
Vuk Sekicki • April 26, 2021
Is there a difference between slice_max() and top_n() ?
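As a rough comparison on invented data: both functions keep the rows with the largest values, and both keep ties by default. The dplyr documentation marks top_n() as superseded by slice_max(). One visible difference is that slice_max() returns its rows sorted from largest to smallest, while top_n() leaves them in their original order.

```r
library(dplyr)

# Hypothetical toy data
districts <- tibble(
  district = c("A", "B", "C", "D"),
  students = c(30, 10, 40, 20)
)

# Older interface: number of rows first, then the column to rank by
top_two_old <- districts %>%
  top_n(2, students)

# Newer interface: column to rank by first, then n as a named argument
top_two_new <- districts %>%
  slice_max(students, n = 2)

top_two_old$district  # "A" "C" (original row order)
top_two_new$district  # "C" "A" (sorted by students, largest first)
```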
David Keyes Founder • October 27, 2021
Huh, very strange. We can definitely discuss more this week!