Your Turn

Complete the group_by sections of the data-wrangling-and-analysis-exercises.Rmd file.

Learn More

General Data Wrangling and Analysis Resources

Because most material on data wrangling and analysis with the dplyr package covers all of the verbs discussed in this course together, I have chosen not to separate the resources by lesson. Instead, here are some helpful resources for learning more about the tidyverse verbs discussed in this course:

Chapter 5 of R for Data Science

RStudio Cloud primer on working with data

Tidyverse for Beginners by Danielle Navarro

Learning Statistics with R by Danielle Navarro

Introduction to the Tidyverse by Alison Hill

A gRadual intRoduction to data wRangling by Chester Ismay and Ted Laderas

Working in the Tidyverse by Desi Quintans and Jeff Powell

Christine Monnier video tutorials on dplyr

Have any questions? Put them below and we will help you out!

Daniel Sossa

March 14, 2021

Hello, I have a question. How could I get the subtotals by group in the same data frame that we obtain when we use group_by + summarise?

We get something like this, but I would like to see in the same table the subtotals (by gender) and the grand total (which should be 10,000):

female  Looking     6.940299   135
female  NotWorking  7.094077  1732
female  Working     6.909353  2086
female  NA          NaN       1067
male    Looking     7.147727   176
male    NotWorking  7.101619  1115
male    Working     6.736634  2527
male    NA          6.000000  1162
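One way to get subtotals and a grand total into the same table is to summarize at each level separately and stack the results with bind_rows(). The sketch below is not necessarily the approach from the course: it uses a small made-up data frame (`survey`) in place of nhanes, the labels "Subtotal", "All", and "Grand total" are arbitrary choices, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Toy stand-in for the course's nhanes data (values are made up)
survey <- tibble(
  gender = c("female", "female", "female", "male", "male", "male"),
  work = c("Working", "NotWorking", "Working", "Working", "Looking", "NotWorking"),
  sleep_hrs_night = c(7, 8, 6, 5, 7, 9)
)

# Detail rows: one row per gender/work combination
by_gender_work <- survey %>%
  group_by(gender, work) %>%
  summarize(
    mean_sleep = mean(sleep_hrs_night, na.rm = TRUE),
    n = n(),
    .groups = "drop"
  )

# Subtotal rows: one row per gender, labelled in the work column
by_gender <- survey %>%
  group_by(gender) %>%
  summarize(mean_sleep = mean(sleep_hrs_night, na.rm = TRUE), n = n()) %>%
  mutate(work = "Subtotal")

# Grand total row: summarize without any grouping
overall <- survey %>%
  summarize(mean_sleep = mean(sleep_hrs_night, na.rm = TRUE), n = n()) %>%
  mutate(gender = "All", work = "Grand total")

# Stack detail and subtotal rows, sort by gender, then append the grand total
sleep_table <- bind_rows(by_gender_work, by_gender) %>%
  arrange(gender) %>%
  bind_rows(overall)
```

Each subtotal mean here is calculated from the underlying rows, not by averaging the group means, so groups with more respondents carry more weight.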

David Keyes Founder

March 16, 2021

I made a short video to show how you could do this. You can also find the code that I used here. Hope that helps! If you have other questions, let me know.

David Keyes Founder

September 28, 2021

Good question! This is a newer feature that was added after I recorded this lesson. I made an explanation video to help you understand what's going on. If you have any questions, please let me know.

David Keyes Founder

September 28, 2021

Yeah, I'd say mostly just don't even worry about it. I never set anything, as I've been using R long enough that it wasn't an option when I started so I just don't think in that way. I'd say go with whatever works best for you. And yes, no need to worry about the message.

Alison Opoku Donyina

September 29, 2021

In this lesson, you opted to not use the number_of_observations for one of the calculations but not the other - was there any particular reason behind that?

Kathleen Griesbach

September 30, 2021

Hello! I was doing the exercises without a problem, but all of a sudden at this stage (when I try to use "group by") I am getting a repeated error message, "Error: attempt to use zero-length variable name." I just closed and reopened, but now nothing seems to be working.

nhanes %>% 
  group_by(gender) %>% 
Error: attempt to use zero-length variable name

(I know that I'm a bit behind this week, so don't expect a quick answer!)

In my code, also, the variables and such use capital letters rather than underscores. I believe you mentioned this syntax difference in class but cannot remember whether it is an issue. Thank you!
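A note on the error above, offered as a likely cause and not a confirmed diagnosis: R typically reports "attempt to use zero-length variable name" when it reads a pair of empty backticks, which often happens when the chunk delimiter lines of an .Rmd file are run together with the code. A pipeline that ends in %>% keeps reading into the next line, so it can pull the closing delimiter in. The sketch below shows a complete pipeline that ends in a verb; `survey` is a small made-up data frame standing in for nhanes.

```r
library(dplyr)

# Toy stand-in for the course's nhanes data (values are made up)
survey <- tibble(
  gender = c("female", "female", "male", "male"),
  sleep_hrs_night = c(7, 8, 6, NA)
)

# A complete pipeline: the last line is a verb, not a dangling %>%
sleep_by_gender <- survey %>%
  group_by(gender) %>%
  summarize(mean_sleep = mean(sleep_hrs_night, na.rm = TRUE))
```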

Sara Cifuentes

March 31, 2022

Hi, I followed the instructions (below, in line 305). I added the function "filter" because you said "(whether or not respondents are working)"; however, in the solution, you don't use "filter". Maybe I didn't understand the assignment?

We can use group_by with multiple groups.

Use group_by for gender and work (whether or not respondents are working) before calculating mean hours of sleep.

nhanes %>% 
  group_by(gender, work) %>% 
  dplyr::filter(work %in% c("Working", "NotWorking")) %>% 
  summarize(mean_sleep = mean(sleep_hrs_night, na.rm = TRUE))

Thank you very much for your help.
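To illustrate the difference the question raises: group_by(gender, work) already produces one row per work category within each gender, so filter() is not needed to split the results by work status. It only changes which categories appear. The sketch below compares the two on a small made-up data frame (`survey`) standing in for nhanes; the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Toy stand-in for the course's nhanes data (values are made up)
survey <- tibble(
  gender = c("female", "female", "female", "male", "male", "male"),
  work = c("Working", "NotWorking", "Looking", "Working", "NotWorking", NA),
  sleep_hrs_night = c(7, 8, 6, 5, 9, 7)
)

# Grouping alone: every work category (including Looking and NA) gets a row
grouped_only <- survey %>%
  group_by(gender, work) %>%
  summarize(mean_sleep = mean(sleep_hrs_night, na.rm = TRUE), .groups = "drop")

# Filtering first: rows for Looking and NA are dropped before summarizing
grouped_and_filtered <- survey %>%
  filter(work %in% c("Working", "NotWorking")) %>%
  group_by(gender, work) %>%
  summarize(mean_sleep = mean(sleep_hrs_night, na.rm = TRUE), .groups = "drop")
```

Both versions are valid; which one fits depends on whether the Looking and NA categories should appear in the output.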

Hi David,

Quick question. Say I had state-level data with a numerical value per state, and I wanted to assign each state to a relevant region, like northeast, pacific, etc. How can I first assign the states to regions, then perform a group_by, and then summarize the numerical values aggregated to the regions I assigned them to? Thanks
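One possible approach is to keep the state-to-region assignments in a small lookup table, join it onto the data, and then group by region. This is a minimal sketch under assumptions: the data frames `state_values` and `region_lookup`, their column names, and all of the values are hypothetical.

```r
library(dplyr)

# Hypothetical state-level data (values are made up)
state_values <- tibble(
  state = c("Maine", "Vermont", "Oregon", "California", "Washington"),
  value = c(10, 20, 30, 40, 50)
)

# Hypothetical lookup table assigning each state to a region
region_lookup <- tibble(
  state = c("Maine", "Vermont", "Oregon", "California", "Washington"),
  region = c("Northeast", "Northeast", "Pacific", "Pacific", "Pacific")
)

# Attach the region to each state, then aggregate by region
region_totals <- state_values %>%
  left_join(region_lookup, by = "state") %>%
  group_by(region) %>%
  summarize(
    total_value = sum(value),
    mean_value = mean(value),
    n_states = n()
  )
```

For a handful of states, case_when() inside mutate() would also work; a lookup table tends to be easier to maintain as the list grows.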