Advanced Summarizing

Heads up! I’m using the slice_max() function in this lesson, which only exists in newer versions of the dplyr package. To update, just type install.packages("tidyverse"), which will update dplyr and all other tidyverse packages.
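For anyone who has not used slice_max() before, here is a minimal sketch of what it does on grouped data. The toy data frame is made up for illustration (it is not the course dataset), and the column names are borrowed from the code posted in the comments below. It assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data, not the course dataset
enrollment <- tibble(
  district_id = c(1, 1, 1, 2, 2, 2),
  race_ethnicity = c("Asian", "White", "Multi-Racial",
                     "Asian", "White", "Multi-Racial"),
  number_of_students = c(10, 30, 5, 20, 8, 12)
)

# Keep the row with the largest number_of_students within each district
largest_group <- enrollment %>%
  group_by(district_id) %>%
  slice_max(number_of_students, n = 1) %>%
  ungroup()

largest_group
```

With this toy data, one row per district should remain: the largest group in each.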

Your Turn

Create a new variable called pct that shows each race/ethnicity as a percentage of all students in each district

You’ll need to use group_by() and mutate()

Don’t forget to ungroup() at the end!
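The three steps above can be sketched as follows. This is one possible approach on made-up toy data, not the official solution; the column names district_id, race_ethnicity, and number_of_students are assumed from the code posted in the comments below.

```r
library(dplyr)

# Hypothetical toy data, not the course dataset
enrollment <- tibble(
  district_id = c(1, 1, 2, 2),
  race_ethnicity = c("Asian", "White", "Asian", "White"),
  number_of_students = c(25, 75, 40, 10)
)

enrollment_pct <- enrollment %>%
  # Group so that sum() is computed separately for each district
  group_by(district_id) %>%
  # mutate() keeps every row and adds pct alongside the existing columns
  mutate(pct = number_of_students / sum(number_of_students)) %>%
  # Drop the grouping so later steps are not silently done per district
  ungroup()

enrollment_pct
```

Because mutate() is used rather than summarize(), the result still has one row per district and race/ethnicity combination, and the pct values within each district add up to 1.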

Learn More

Daniel Carter has a nice walkthrough of using group_by() and mutate().

If you forget to ungroup() every once in a while, you’re joining an illustrious group.


Need a cheery reminder to use ungroup()? Here you go!

Have any questions? Put them below and we will help you out!

Atlang Mompe

April 18, 2021

Hi David, I still get the error that race_ethnicity is not found.

I am also running your code, but I may be doing something wrong: everything works until I try to mutate race_ethnicity using str_remove(), even when I run your code. I also noticed that the string x2018_2019 may need a closing quotation mark. Currently your code has only the opening one: mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_2019)) %>%

I tried to use your code (see below) and I am not sure what is going on, as the race_ethnicity object is not found:

enrollment_by_race_ethnicity_18_19 % select(-contains("grade")) %>%
  select(-contains("kindergarten")) %>%
  select(-contains("percent")) %>%
  pivot_longer(cols = -district_id,
               names_to = "race_ethnicity",
               values_to = "number_of_students") %>%
  mutate(number_of_students = na_if(number_of_students, "-")) %>%
  mutate(number_of_students = replace_na(number_of_students, 0)) %>%
  mutate(number_of_students = as.numeric(number_of_students)) %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_2019"))
  mutate(race_ethnicity = case_when(
    race_ethnicity == "american_indian_alaska_native" ~ "American Indian Alaska Native",
    race_ethnicity == "asian" ~ "Asian",
    race_ethnicity == "black_african_american" ~ "Black/African American",
    race_ethnicity == "hispanic_latino" ~ "Hispanic/Latino",
    race_ethnicity == "multiracial" ~ "Multi-Racial",
    race_ethnicity == "native_hawaiian_pacific_islander" ~ "Pacific Islander",
    race_ethnicity == "white" ~ "White"
  )) %>%
  group_by(district_id) %>%
  mutate(pct = number_of_students / sum(number_of_students)) %>%
  ungroup()

Thanks, Atty

Allison Brenner

April 21, 2021

I have a question about the "My turn" example. You say that you use summarize in the first part (vs. mutate) because you aren't actually adding a new variable. I'm still having trouble understanding this conceptually, and more broadly, the difference between mutate and summarize for functions that can be called in both. I know you aren't adding a new variable in your example, but you are over-writing/replacing. Could you do the same by using summarize to create a total in a new variable? Would that not add to the data frame?

Abby Isaacson

April 22, 2021

I had several of these problems too, including the console showing the code in blue but nothing changing in the data frame when I tried to change the names (several tries). At one point when I ran David's solution code, race_ethnicity turned to NA in the entire data frame. I still don't know exactly what made it work in the end, but I do have one small question:

Do the underscores (_) matter in the removing/renaming? For example, with the variable "x2018-19_asian", if I remove x2018-19 (not x2018-19_), but then only mutate "asian" (not _asian), what happens to that underscore?

Vuk Sekicki

April 26, 2021

Is there a difference between slice_max() and top_n() ?

David Keyes (Founder)

October 27, 2021

Huh, very strange. We can definitely discuss more this week!