group_by
This lesson is called group_by, part of the R in 3 Months (Fall 2021) course.
Your Turn
Complete the group_by sections of the data-wrangling-and-analysis-exercises.Rmd file.
Learn More
General Data Wrangling and Analysis Resources
Because most material that discusses data wrangling and analysis with the dplyr package covers all of the verbs discussed in this course, I have chosen not to separate the resources by lesson. Instead, here are some helpful resources for learning more about all of the tidyverse verbs discussed in this course:
Chapter 5 of R for Data Science
RStudio Cloud primer on working with data
Tidyverse for Beginners by Danielle Navarro
Learning Statistics with R by Danielle Navarro
Introduction to the Tidyverse by Alison Hill
A gRadual intRoduction to data wRangling by Chester Ismay and Ted Laderas
Have any questions? Put them below and we will help you out!
Daniel Sossa • March 14, 2021
Hello, I have a question. How could I get the subtotals by group in the same data frame that we obtain when we use group_by + summarise?
We get something like this, but I would like to see in the same table the subtotals (by gender) and the grand total (which should be 10,000):
female  Looking     6.940299   135
female  NotWorking  7.094077  1732
female  Working     6.909353  2086
female  NA          NaN       1067
male    Looking     7.147727   176
male    NotWorking  7.101619  1115
male    Working     6.736634  2527
male    NA          6.000000  1162
David Keyes Founder • March 16, 2021
I made a short video to show how you could do this. You can also find the code that I used here. Hope that helps! If you have other questions, let me know.
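For readers who cannot open the video, here is one possible way to do this. It is a minimal sketch and not necessarily the approach used in the video. The toy data frame `survey` and its columns (`gender`, `work`, `sleep`) are made up for illustration, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data standing in for the course data (not the real nhanes file)
survey <- tibble(
  gender = c("female", "female", "female", "male", "male", "male"),
  work   = c("Working", "Working", "NotWorking", "Working", "Looking", "Looking"),
  sleep  = c(7, 6, 8, 6, 7, 9)
)

# Detail rows: one row per gender/work combination
by_gender_work <- survey %>%
  group_by(gender, work) %>%
  summarise(mean_sleep = mean(sleep, na.rm = TRUE), n = n(), .groups = "drop")

# Subtotal rows: one row per gender, labelled in the work column
by_gender <- survey %>%
  group_by(gender) %>%
  summarise(mean_sleep = mean(sleep, na.rm = TRUE), n = n(), .groups = "drop") %>%
  mutate(work = "Subtotal")

# Grand total row: a single row computed on the ungrouped data
grand_total <- survey %>%
  summarise(mean_sleep = mean(sleep, na.rm = TRUE), n = n()) %>%
  mutate(gender = "All", work = "Total")

# Stack detail and subtotal rows, sort by gender, then append the grand total
summary_with_totals <- bind_rows(by_gender_work, by_gender) %>%
  arrange(gender) %>%
  bind_rows(grand_total)

print(summary_with_totals)
```

The idea is to summarise the same data at three levels of grouping and stack the results with bind_rows, so the subtotals and the grand total come from the raw rows and not from averaging the group means.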
David Keyes Founder • September 28, 2021
Good question! This is a newer feature that was added after I recorded this lesson. I made an explanation video to help you understand what's going on. If you have any questions, please let me know.
David Keyes Founder • September 28, 2021
Yeah, I'd say mostly just don't even worry about it. I never set anything, as I've been using R long enough that it wasn't an option when I started so I just don't think in that way. I'd say go with whatever works best for you. And yes, no need to worry about the message.
Alison Opoku Donyina • September 29, 2021
In this lesson, you opted not to use number_of_observations for one of the calculations but did use it for the other. Was there any particular reason behind that?
Kathleen Griesbach • September 30, 2021
Hello! I was doing the exercises without a problem, but all of a sudden at this stage (when I try to use group_by) I am getting a repeated error message, "Error: attempt to use zero-length variable name." I just closed and reopened, but now nothing seems to be working. This is what I ran: nhanes %>% group_by(gender) %>% and this is what I got: Error: attempt to use zero-length variable name
(I know that I'm a bit behind this week, so don't expect a quick answer!)
In my code, also, the variables and such use capital letters rather than underscores. I believe you mentioned this syntax difference in class but cannot remember whether it is an issue. Thank you!
Sara Cifuentes • March 31, 2022
Hi, I followed the instructions (below, in line 305). I added the function "filter" because you said "(whether or not respondents are working)"; however, in the solution, you don't use "filter". Maybe I didn't understand the assignment?
We can use group_by with multiple groups. Use group_by for gender and work (whether or not respondents are working) before calculating mean hours of sleep.
Thank you very much for your help.
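As a rough illustration of what the quoted exercise text seems to ask for, the sketch below groups by two variables and needs no filter. It is only a sketch under assumptions: the toy data frame `nhanes_toy` and the column name `hours_sleep` are invented here, and the actual solution file may use different names.

```r
library(dplyr)

# Hypothetical toy data; the real exercise uses the course's nhanes data
nhanes_toy <- tibble(
  gender      = c("female", "female", "male", "male", "male"),
  work        = c("Working", "NotWorking", "Working", "Working", "Looking"),
  hours_sleep = c(7, 8, 6, 8, 5)
)

# group_by with two variables keeps every row and returns
# one summary row per gender/work combination
sleep_by_gender_work <- nhanes_toy %>%
  group_by(gender, work) %>%
  summarise(mean_sleep = mean(hours_sleep, na.rm = TRUE), .groups = "drop")

print(sleep_by_gender_work)
```

A filter would drop respondents from the data before summarising, whereas grouping by work keeps all of them and reports each work category on its own row.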
G Mendez • April 9, 2022
Hi David -
Quick question. Let's say I had state level data with numerical values per state. Let's say I wanted to assign each state to relevant regions, like northeast, pacific, etc. How can I first assign the states to the object, then perform a group by, to then summarize the numerical values aggregated to the regions I assigned them to? Thanks
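One possible way to approach this, offered as a sketch and not as the course's official answer, is to keep the state-to-region assignments in a small lookup table, join it onto the state data, and then group by region. The data frames `state_data` and `region_lookup`, their column names, and the region assignments are all made up for illustration.

```r
library(dplyr)

# Hypothetical state-level data with one numeric value per state
state_data <- tibble(
  state = c("Maine", "Vermont", "Oregon", "Washington"),
  value = c(10, 20, 30, 40)
)

# Hypothetical lookup table assigning each state to a region
region_lookup <- tibble(
  state  = c("Maine", "Vermont", "Oregon", "Washington"),
  region = c("Northeast", "Northeast", "Pacific", "Pacific")
)

# Attach the region to each state, then aggregate the values by region
region_totals <- state_data %>%
  left_join(region_lookup, by = "state") %>%
  group_by(region) %>%
  summarise(total_value = sum(value, na.rm = TRUE), .groups = "drop")

print(region_totals)
```

A lookup table keeps the assignments in one place and is easy to extend to all states. For only a handful of states, a mutate with case_when would be another option.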