Advanced Summarizing

This lesson is called Advanced Summarizing, part of the Going Deeper with R course.

Heads up! I’m using the slice_max() function in this lesson, which only exists in newer versions of the dplyr package (1.0.0 and later). To update, type install.packages("dplyr"). Running install.packages("tidyverse") on its own may leave an already-installed dplyr at its old version.

Your Turn

Create a new variable called pct that shows each race/ethnicity as a percentage of all students in each district

You’ll need to use group_by() and mutate()

Don’t forget to ungroup() at the end!
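
As a rough sketch of the pattern this exercise asks for, the example below uses a small made-up data frame rather than the course data. The column names district_id, race_ethnicity, and number_of_students follow the code quoted in the comments further down; the values are invented for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: two districts, three groups each (not the course data)
enrollment <- tibble(
  district_id = c(1, 1, 1, 2, 2, 2),
  race_ethnicity = rep(c("Asian", "Hispanic/Latino", "White"), times = 2),
  number_of_students = c(10, 30, 60, 5, 5, 40)
)

enrollment_pct <- enrollment %>%
  group_by(district_id) %>%
  # Because the data is grouped, sum() is computed within each district
  mutate(pct = number_of_students / sum(number_of_students)) %>%
  # Drop the grouping so later steps operate on the whole data frame again
  ungroup()

enrollment_pct
```

Within each district the pct values add up to 1, and the result has the same number of rows as the input.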

Learn More

Daniel Carter has a nice walkthrough of using group_by() and mutate().

If you forget to ungroup() every once in a while, you’re joining an illustrious group.

ugh foiled by a missing ungroup() once again #rstats
— Andrew Heiss (@andrewheiss) November 25, 2019

When in doubt, try ungroup() #rstats
— Ben Casselman (@bencasselman) October 4, 2019

To my #rstats friends: Practice safe stats. Remember to dplyr::ungroup() after you're done with your within-group operations. pic.twitter.com/r4JblvgSjd
— Hlynur Hallgríms (@hlynur) July 19, 2018

Need a cheery reminder to use ungroup()? Here you go!

Don't forget to bring dplyr::ungroup() to the party 🎁🥳 #rstats

Thanks to @apreshill for inspiring this one! pic.twitter.com/gsf66KXJ2d
— Allison Horst (@allison_horst) November 21, 2019

Atlang Mompe

April 18, 2021

Hi David, I still get the error that race_ethnicity is not found.

I am also running your code, but I may be doing something wrong: everything works until I try to mutate race_ethnicity using str_remove(), even when I run your code. I also noticed that we may need to put the dates x2018_2019 in quotation marks. Currently your code has only one quotation mark: mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_2019)) %>%

I tried to use your code (please see below), and I am not sure what is going on, as the race_ethnicity object is not found:

enrollment_by_race_ethnicity_18_19 % select(-contains("grade")) %>%
  select(-contains("kindergarten")) %>%
  select(-contains("percent")) %>%
  pivot_longer(cols = -district_id,
               names_to = "race_ethnicity",
               values_to = "number_of_students") %>%
  mutate(number_of_students = na_if(number_of_students, "-")) %>%
  mutate(number_of_students = replace_na(number_of_students, 0)) %>%
  mutate(number_of_students = as.numeric(number_of_students)) %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_2019"))
  mutate(race_ethnicity = case_when(
    race_ethnicity == "american_indian_alaska_native" ~ "American Indian Alaska Native",
    race_ethnicity == "asian" ~ "Asian",
    race_ethnicity == "black_african_american" ~ "Black/African American",
    race_ethnicity == "hispanic_latino" ~ "Hispanic/Latino",
    race_ethnicity == "multiracial" ~ "Multi-Racial",
    race_ethnicity == "native_hawaiian_pacific_islander" ~ "Pacific Islander",
    race_ethnicity == "white" ~ "White"
  )) %>%
  group_by(district_id) %>%
  mutate(pct = number_of_students / sum(number_of_students)) %>%
  ungroup()

Thanks, Atty
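
One possible cause, judging only from the code as pasted above: no %>% appears between the str_remove() step and the case_when() step. Without that pipe, R treats the second mutate() as a separate statement with no data frame passed in, so it cannot find the column. The sketch below shows the two steps connected; the toy data and the "x2018_2019_" prefix are assumptions for illustration and may not match the real column names.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical toy data standing in for the pivoted enrollment data
long <- tibble(
  district_id = c(1, 1),
  race_ethnicity = c("x2018_2019_asian", "x2018_2019_white"),
  number_of_students = c(10, 30)
)

cleaned <- long %>%
  # The trailing %>% hands the result to the next mutate()
  mutate(race_ethnicity = str_remove(race_ethnicity, "x2018_2019_")) %>%
  mutate(race_ethnicity = case_when(
    race_ethnicity == "asian" ~ "Asian",
    race_ethnicity == "white" ~ "White"
  ))

cleaned
```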

Allison Brenner

April 21, 2021

I have a question about the "My turn" example. You say that you use summarize in the first part (vs. mutate) because you aren't actually adding a new variable. I'm still having trouble understanding this conceptually, and more broadly, the difference between mutate and summarize for functions that can be called in both. I know you aren't adding a new variable in your example, but you are over-writing/replacing. Could you do the same by using summarize to create a total in a new variable? Would that not add to the data frame?
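
A small sketch may help with the distinction raised here. It uses made-up data, not the lesson's, and computes the same group total both ways: summarize() collapses each group to a single row, while mutate() keeps every row and repeats the total alongside it.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data (not the course data)
enrollment <- tibble(
  district_id = c(1, 1, 2, 2),
  race_ethnicity = c("Asian", "White", "Asian", "White"),
  number_of_students = c(10, 30, 5, 45)
)

# summarize(): one row per district; race_ethnicity and
# number_of_students are dropped from the result
totals <- enrollment %>%
  group_by(district_id) %>%
  summarize(total = sum(number_of_students))

# mutate(): all four rows are kept; the district total is
# repeated on each row of that district
with_totals <- enrollment %>%
  group_by(district_id) %>%
  mutate(total = sum(number_of_students)) %>%
  ungroup()

nrow(totals)      # 2
nrow(with_totals) # 4
```

So a total created with summarize() is a new variable, but it lives in a smaller, collapsed data frame rather than being added to the original one.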

Abby Isaacson

April 22, 2021

I had several of these problems happen for me too, including console processing code in blue but nothing happening in the data frame to change names (several tries). At one point when I ran David's solution code, all race_ethnicity turned to NA in the entire data frame. I still don't know exactly what happened to make it work, but I do have one small question:

Do the underscores (_) matter in the removing/renaming? For example, with the variable "x2018-19_asian", if I remove x2018-19 (not x2018-19_), but then only mutate "asian" (not _asian), what happens to that underscore?
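
To illustrate the underscore question with a minimal sketch: str_remove() deletes only the exact pattern it is given, so a leftover underscore stays in the string, and case_when() then finds no match and returns NA. The names below are hypothetical; the real spreadsheet's column names may differ.

```r
library(dplyr)
library(stringr)

# Hypothetical names; the real column names may differ
race_ethnicity <- c("x2018_19_asian", "x2018_19_white")

without_underscore <- str_remove(race_ethnicity, "x2018_19")  # "_asian" "_white"
with_underscore <- str_remove(race_ethnicity, "x2018_19_")    # "asian" "white"

# Hypothetical helper wrapping the case_when() recoding
recode_group <- function(x) {
  case_when(
    x == "asian" ~ "Asian",
    x == "white" ~ "White"
  )
}

recode_group(without_underscore)  # NA NA, because "_asian" is not "asian"
recode_group(with_underscore)     # "Asian" "White"
```

This may also be what happened when the whole race_ethnicity column turned to NA: if the removed prefix and the values tested in case_when() do not line up exactly, nothing matches.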

Vuk Sekicki

April 26, 2021

Is there a difference between slice_max() and top_n() ?
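
For comparison, here is a short sketch of both functions on made-up data. In the dplyr documentation, top_n() is marked as superseded by slice_max() and slice_min(); the main practical difference is the interface, since slice_max() names the ordering column and n explicitly.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data (not the course data)
enrollment <- tibble(
  district_id = c(1, 2, 3, 4),
  number_of_students = c(40, 10, 30, 20)
)

# Newer interface: ordering column and n are named arguments
top_slice <- enrollment %>%
  slice_max(order_by = number_of_students, n = 2)

# Older interface: n first, then the weighting column
top_old <- enrollment %>%
  top_n(n = 2, wt = number_of_students)
```

Both calls pick districts 1 and 3 here. Both also keep tied rows by default, and slice_max() has a with_ties argument to turn that off.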

David Keyes Founder

October 27, 2021

Huh, very strange. We can definitely discuss more this week!
