Advanced Summarizing

Heads up! I’m using the slice_max() function in this lesson, which only exists in dplyr 1.0.0 and later. To update, run install.packages("dplyr"). Running tidyverse::tidyverse_update() will also tell you which other tidyverse packages are out of date.

Your Turn

Create a new variable called pct that shows each race/ethnicity as a percentage of all students in each district

You’ll need to use group_by() and mutate()

Don’t forget to ungroup() at the end!
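
As a rough illustration of the pattern described above, here is a minimal sketch. The enrollment_toy data frame and its values are made up for this example and are not the course data; only the column names follow the ones used in the lesson.

```r
library(dplyr)

# Hypothetical toy data standing in for the course's enrollment data
enrollment_toy <- tibble(
  district_id = c(1, 1, 1, 2, 2),
  race_ethnicity = c("Asian", "White", "Multi-Racial", "Asian", "White"),
  number_of_students = c(10, 30, 10, 5, 15)
)

enrollment_pct <- enrollment_toy %>%
  # Work within each district so sum() gives the district total
  group_by(district_id) %>%
  # Each row's share of its own district's students
  mutate(pct = number_of_students / sum(number_of_students)) %>%
  # Drop the grouping so later steps operate on the whole data frame
  ungroup()

enrollment_pct
```

Within each district the pct values should add up to 1, which is a quick way to check that the grouping did what you expected.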

Learn More

Daniel Carter has a nice walkthrough of using group_by() and mutate().

If you forget to ungroup() every once in a while, you’re joining an illustrious group.


Need a cheery reminder to use ungroup()? Here you go!

Have any questions? Put them below and we will help you out!

Atlang Mompe

April 18, 2021

Hi David, I still get that race_ethnicity not found?

I am also running your code, but I may be doing something wrong: everything works until I try to mutate race_ethnicity using str_remove(), even when I run your code. I also noticed that we may need to put the dates x2019_2019 in quotation marks; currently your code has only one quotation mark: mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_2019)) %>%

I tried to use your code (please see below) and I am not sure what is going on, as the race_ethnicity object is not found:

enrollment_by_race_ethnicity_18_19 % select(-contains("grade")) %>%
  select(-contains("kindergarten")) %>%
  select(-contains("percent")) %>%
  pivot_longer(cols = -district_id,
               names_to = "race_ethnicity",
               values_to = "number_of_students") %>%
  mutate(number_of_students = na_if(number_of_students, "-")) %>%
  mutate(number_of_students = replace_na(number_of_students, 0)) %>%
  mutate(number_of_students = as.numeric(number_of_students)) %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_2019"))
  mutate(race_ethnicity = case_when(
    race_ethnicity == "american_indian_alaska_native" ~ "American Indian Alaska Native",
    race_ethnicity == "asian" ~ "Asian",
    race_ethnicity == "black_african_american" ~ "Black/African American",
    race_ethnicity == "hispanic_latino" ~ "Hispanic/Latino",
    race_ethnicity == "multiracial" ~ "Multi-Racial",
    race_ethnicity == "native_hawaiian_pacific_islander" ~ "Pacific Islander",
    race_ethnicity == "white" ~ "White"
  )) %>%
  group_by(district_id) %>%
  mutate(pct = number_of_students / sum(number_of_students)) %>%
  ungroup()

Thanks, Atty

David Keyes

April 18, 2021

Can you please post your code all the way from the beginning? I want to make sure you created the enrollment_by_race_ethnicity_18_19 object correctly from the beginning.

Lucilla Piccari

April 19, 2021

Same here, it is not giving me an error message, just nothing happens... my entire code:

enrollment_by_race_ethnicity_18_19 % select(-contains("percent")) %>%
  select(-contains("kindergarten")) %>%
  select(-contains("grade")) %>%
  pivot_longer(cols = -district_id,
               names_to = "race_ethnicity",
               values_to = "number_of_students") %>%
  mutate(number_of_students = na_if(number_of_students, "-")) %>%
  mutate(number_of_students = replace_na(number_of_students, 0)) %>%
  mutate(number_of_students = as.numeric(number_of_students)) %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_2019_")) %>%
  mutate(race_ethnicity = recode(race_ethnicity, "american_indian_alaska_native" = "American Indian/Alaska Native")) %>%
  group_by(district_id) %>%
  mutate(pct = number_of_students / sum(number_of_students)) %>%
  ungroup()

Thank you!

David Keyes

April 20, 2021

You don't see anything in the console? Since you should be working in an R script file that's where your results will show up.

Lucilla Piccari

April 21, 2021

Sorry, that was not very clear: the code appears in the console in blue, no error message, just like when everything works... except there are no changes in the data frame. I will see if I can record a video of this lesson's problem and of the previous one. Thank you!

Erin Guthrie

April 21, 2021

I think this is the issue: x2018_2019_ this should be x2018_19_ (i.e., not 2019). I fixed mine and it worked...then, I looked at the solution and re-ran that with the fix and it worked too :-)

David Keyes

April 21, 2021

Good eye, Erin! I suspect this is likely to be the issue.

Lucilla Piccari

April 21, 2021

Thank you so much Erin! After fixing the issue with str_remove, now everything else works! Bizarre it didn't give an error message, no? Anyway, thanks again!
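
On why there was no error message: as far as I know, str_remove() simply returns its input unchanged when the pattern is not found, so a mistyped prefix fails silently. A minimal sketch, using two made-up column names in the style of the ones discussed above:

```r
library(stringr)

# Hypothetical column names in the style used in this thread
column_names <- c("x2018_19_asian", "x2018_19_white")

# Pattern that never occurs in the names: the input comes back
# unchanged, with no error and no warning
unchanged <- str_remove(column_names, "x2018_2019_")

# Pattern that does occur: the prefix is stripped
cleaned <- str_remove(column_names, "x2018_19_")

unchanged
cleaned
```

Because the prefix is still attached in the first case, later steps that compare against values such as "asian" find nothing to match: recode() leaves unmatched values as they are, and case_when() returns NA when no condition matches.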

Allison Brenner

April 21, 2021

I have a question about the "My turn" example. You say that you use summarize in the first part (vs. mutate) because you aren't actually adding a new variable. I'm still having trouble understanding this conceptually, and more broadly, the difference between mutate and summarize for functions that can be called in both. I know you aren't adding a new variable in your example, but you are over-writing/replacing. Could you do the same by using summarize to create a total in a new variable? Would that not add to the data frame?

David Keyes

April 22, 2021

This is a great question! I made a video to show you the difference between the two.

And then after I recorded that I realized I may not have answered your question so here's a second video which should help further! It walks through the example from the course, explaining each step in greater detail.
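
For readers who cannot watch the videos, here is a small sketch of the general difference, not a reproduction of what the videos show. The students data frame and its values are invented for illustration.

```r
library(dplyr)

# Hypothetical toy data, not the course data
students <- tibble(
  district_id = c(1, 1, 2, 2),
  race_ethnicity = c("Asian", "White", "Asian", "White"),
  number_of_students = c(10, 30, 5, 15)
)

# summarize() collapses each group to a single row:
# the result has one row per district and drops race_ethnicity
district_totals <- students %>%
  group_by(district_id) %>%
  summarize(total = sum(number_of_students))

# mutate() keeps every original row and repeats the
# district total on each row of that district
students_with_total <- students %>%
  group_by(district_id) %>%
  mutate(total = sum(number_of_students)) %>%
  ungroup()

district_totals
students_with_total
```

So summarize() can create a total in a new variable, but it returns a smaller data frame rather than adding a column to the existing one.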

Allison Brenner

April 22, 2021

Yes, thank you, this is very helpful! I'm not sure why I was so confused before, but it is clear now.

Abby Isaacson

April 22, 2021

I had several of these problems happen for me too, including console processing code in blue but nothing happening in the data frame to change names (several tries). At one point when I ran David's solution code, all race_ethnicity turned to NA in the entire data frame. I still don't know exactly what happened to make it work, but I do have one small question:

Do the underscores (_) matter in the removing/renaming? For example, with the variable "x2018-19_asian", if I remove x2018-19 (not x2018-19_), but then only mutate "asian" (not _asian), what happens to that underscore?

Abby Isaacson

April 22, 2021

I also see that you did take out the _ in the solution video to the previous lesson.

David Keyes

April 22, 2021

Yep :)

David Keyes

April 22, 2021

Underscores do matter (they're treated as text, just like letters). If you mutate the text "asian" (without the underscore) the underscore will remain.
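
A quick sketch of this, using a made-up column name in the style of the course data:

```r
library(stringr)

# Hypothetical column name in the style used in this thread
column_name <- "x2018_19_asian"

# Removing the prefix without its trailing underscore leaves "_asian"
with_underscore <- str_remove(column_name, "x2018_19")

# Removing the prefix including the underscore leaves "asian"
without_underscore <- str_remove(column_name, "x2018_19_")

# The leftover underscore means an exact comparison with "asian" fails
with_underscore == "asian"
without_underscore == "asian"
```

This is why a later recode() or case_when() that looks for "asian" will not match if the underscore was left behind.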

Vuk Sekicki

April 26, 2021

Is there a difference between slice_max() and top_n() ?

David Keyes

April 27, 2021

No, they do the same thing, but the tidyverse team writes this on the top_n() help page:

top_n() has been superseded in favour of slice_min()/slice_max(). While it will not be deprecated in the near future, retirement means that we will only perform critical bug fixes, so we recommend moving to the newer alternatives.
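
A minimal side-by-side sketch, assuming dplyr 1.0.0 or later. The district_sizes data frame is made up for this example.

```r
library(dplyr)

# Hypothetical toy data, not the course data
district_sizes <- tibble(
  district_id = c(1, 2, 3, 4),
  number_of_students = c(50, 20, 80, 35)
)

# Newer approach: the two districts with the most students
top_two_slice <- district_sizes %>%
  slice_max(number_of_students, n = 2)

# Superseded approach: same idea, with the arguments in a different order
top_two_topn <- district_sizes %>%
  top_n(2, number_of_students)

top_two_slice
top_two_topn
```

Both calls pick the same two districts here. One small difference to be aware of is that slice_max() returns the rows sorted from largest to smallest, while top_n() keeps them in their original order.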

David Keyes

October 27, 2021

Huh, very strange. We can definitely discuss more this week!