Fundamentals of R
group_by
This lesson is called group_by, part of the Fundamentals of R course.
Your Turn
Complete the group_by sections of the data-wrangling-and-analysis-exercises.Rmd file.
Learn More
General Data Wrangling and Analysis Resources
Because most material on data wrangling and analysis with the dplyr package covers all of the verbs discussed in this course together, I have chosen not to separate the resources by lesson. Instead, here are some helpful resources for learning more about all of the tidyverse verbs discussed in this course:
Chapter 5 of R for Data Science
RStudio Cloud primer on working with data
Tidyverse for Beginners by Danielle Navarro
Learning Statistics with R by Danielle Navarro
Introduction to the Tidyverse by Alison Hill
A gRadual intRoduction to data wRangling by Chester Ismay and Ted Laderas
Daniel Sossa
March 14, 2021
Hello, I have a question. How could I get the subtotals by group in the same data frame that we obtain when we use group_by + summarise?
We get something like this, but I would like to see in the same table the subtotals (by gender) and the grand total (which should be 10,000):
female  Looking     6.940299   135
female  NotWorking  7.094077  1732
female  Working     6.909353  2086
female  NA          NaN       1067
male    Looking     7.147727   176
male    NotWorking  7.101619  1115
male    Working     6.736634  2527
male    NA          6.000000  1162
David Keyes
March 15, 2021
I made a short video to show how you could do this. You can also find the code that I used here. Hope that helps! If you have other questions, let me know.
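For readers without access to the video, here is a minimal sketch of one possible way to get subtotals and a grand total into the same table. It is not necessarily the approach used in the video. The `survey` data frame, its values, and the column name `sleep_hrs_night` are assumptions made up for illustration; the idea is simply to summarize at three levels and stack the results with `bind_rows()`.

```r
library(dplyr)

# Toy stand-in for the course data (hypothetical values and column names)
survey <- tibble(
  gender = c("female", "female", "female", "male", "male", "male"),
  work = c("Working", "Working", "Looking", "Working", "NotWorking", "NotWorking"),
  sleep_hrs_night = c(7, 8, 6, 6, 7, 9)
)

# One row per gender/work combination
by_gender_work <- survey %>%
  group_by(gender, work) %>%
  summarize(mean_sleep = mean(sleep_hrs_night, na.rm = TRUE),
            n = n()) %>%
  ungroup()

# One subtotal row per gender
subtotals <- survey %>%
  group_by(gender) %>%
  summarize(mean_sleep = mean(sleep_hrs_night, na.rm = TRUE),
            n = n()) %>%
  mutate(work = "Subtotal")

# A single grand total row
grand_total <- survey %>%
  summarize(mean_sleep = mean(sleep_hrs_night, na.rm = TRUE),
            n = n()) %>%
  mutate(gender = "All", work = "Total")

# Stack the three summaries; bind_rows() matches columns by name
with_totals <- bind_rows(by_gender_work, subtotals, grand_total) %>%
  arrange(gender)

with_totals
```

The subtotal and total rows are labelled in the `work` column so they can be told apart from the regular groups.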
David Keyes
September 28, 2021
Good question! This is a newer feature that was added after I recorded this lesson. I made an explanation video to help you understand what's going on. If you have any questions, please let me know.
David Keyes
September 28, 2021
Yeah, I'd say mostly just don't even worry about it. I never set anything, as I've been using R long enough that it wasn't an option when I started so I just don't think in that way. I'd say go with whatever works best for you. And yes, no need to worry about the message.
Charlie Hadley
September 29, 2021
Hey David and Blayne,
I wanted to chime in on the .groups argument of summarise(). It's currently labelled "lifecycle:experimental" which means theoretically how it works could change, and there's a small chance the feature would be removed. I'm grumpy about this feature and deliberately don't use it, but that's my personal opinion.
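As a rough illustration of what the `.groups` argument changes, the sketch below compares the default result with `.groups = "drop"`. The `survey` data and the column name `sleep_hrs_night` are made up for this example, and the message referred to in the comments is assumed to be the note `summarise()` prints about grouped output.

```r
library(dplyr)

# Toy data (hypothetical values and column names)
survey <- tibble(
  gender = c("female", "female", "female", "male", "male", "male"),
  work = c("Working", "Working", "Looking", "Working", "NotWorking", "NotWorking"),
  sleep_hrs_night = c(7, 8, 6, 6, 7, 9)
)

# Default: summarize() removes only the last grouping variable (work),
# so the result is still grouped by gender and a message is printed
default_result <- survey %>%
  group_by(gender, work) %>%
  summarize(mean_sleep = mean(sleep_hrs_night, na.rm = TRUE))

group_vars(default_result)

# With .groups = "drop" the result has no grouping left and no message appears
dropped <- survey %>%
  group_by(gender, work) %>%
  summarize(mean_sleep = mean(sleep_hrs_night, na.rm = TRUE),
            .groups = "drop")

group_vars(dropped)

# Calling ungroup() after summarize() gives the same ungrouped result
# without relying on the experimental argument
ungrouped <- default_result %>% ungroup()
```

The summary values are identical in all three cases; only the grouping carried by the result differs.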
Alison Opoku Donyina
September 29, 2021
In this lesson, you opted to use number_of_observations for one of the calculations but not the other. Was there any particular reason behind that?
David Keyes
September 29, 2021
No particular reason. I was just wondering when I did the analysis!
Kathleen Griesbach
September 30, 2021
Hello! I was doing the exercises without a problem, but all of a sudden at this stage (when I try to use "group by") I am getting a repeated error message: "Error: attempt to use zero-length variable name." I just closed and reopened, but now nothing seems to be working.
nhanes %>% group_by(gender) %>%
Error: attempt to use zero-length variable name
(I know that I'm a bit behind this week, so don't expect a quick answer!)
In my code, also, the variables and such use capital letters rather than underscores. I believe you mentioned this syntax difference in class but cannot remember whether it is an issue. Thank you!
David Keyes
October 1, 2021
It's hard to say without seeing your full code, but I'm guessing this is because you didn't use the clean_names() function when you read in your nhanes object and so the variable name is Gender with a capital G. If you want to post your full code as a GitHub Gist and then post the link in response, I can confirm if that's the issue.
Kathleen Griesbach
October 4, 2021
Thank you, David!
I think the problem might have been me just messing up the ```{r] by accident...I did rerun the clean_names() function and load all my packages again today. And things seem to be working now. Thanks so much.
Sara Cifuentes
March 30, 2022
Hi, I followed the instructions (below, in line 305). I added the function "filter" because you said "(whether or not respondents are working)"; however, in the solution, you don't use "filter". Maybe I didn't understand the assignment?
We can use group_by with multiple groups. Use group_by for gender and work (whether or not respondents are working) before calculating mean hours of sleep.
Thank you very much for your help.
Charlie Hadley
March 31, 2022
Hi Sara, In this video David uses group_by() to summarise the data by their working status, meaning that when summarise() is used we calculate the values for all different working statuses. We would need to use filter() if we were interested in only specific values in the work column. Does that help? Cheers, Charlie
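The difference described here can be sketched with a small made-up data frame. The `survey` object, its values, and the column name `sleep_hrs_night` are assumptions for illustration only. Grouping by `gender` and `work` keeps every working status (including missing ones), while `filter()` restricts the result to the statuses you ask for.

```r
library(dplyr)

# Toy data (hypothetical values and column names)
survey <- tibble(
  gender = c("female", "female", "female", "male", "male", "male"),
  work = c("Working", "Looking", NA, "Working", "NotWorking", NA),
  sleep_hrs_night = c(7, 6, 8, 6, 7, 9)
)

# group_by() with two variables: one row per gender/work combination,
# including the rows where work is NA
all_statuses <- survey %>%
  group_by(gender, work) %>%
  summarize(mean_sleep = mean(sleep_hrs_night, na.rm = TRUE)) %>%
  ungroup()

# filter() before group_by(): only respondents who are working are summarized
working_only <- survey %>%
  filter(work == "Working") %>%
  group_by(gender, work) %>%
  summarize(mean_sleep = mean(sleep_hrs_night, na.rm = TRUE)) %>%
  ungroup()

# Filtering the summary afterwards gives the same rows here,
# because work is still a column in the summarized table
working_only_after <- all_statuses %>%
  filter(work == "Working")
```

Note that the value passed to `filter()` has to match the data exactly, and that a comparison such as `work == "Working"` also drops rows where `work` is `NA`.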
Sara Cifuentes
April 5, 2022
Thank you Charlie
Tatiana Bustos
July 27, 2022
Can you explain this more with an example of the code where we would filter work? I am thinking this would be appropriate to add to the last solutions code -- filter (work == "Looking" | work == "Not Working" | work == "Working") Would this go before group by? Before summarize? Does the order matter?
David Keyes
July 28, 2022
That code should work. You can put it before the group_by() and/or summarize(). If you filter first, the result of the group_by() and/or summarize() will only include rows that are not filtered out.
G Mendez
April 8, 2022
Hi David -
Quick question. Let's say I had state level data with numerical values per state. Let's say I wanted to assign each state to relevant regions, like northeast, pacific, etc. How can I first assign the states to the object, then perform a group by, to then summarize the numerical values aggregated to the regions I assigned them to? Thanks
Charlie Hadley
April 11, 2022
Hello!
This is a great question. In order to do so you need to decide which states belong to which regions and then combine together left_join() and group_by(). In this video tutorial I made use of the {tigris} package for an authoritative decision on regions and divisions. The code I wrote can be found in this gist. Please let me know if you have any questions.
Cheers,
Charlie
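A minimal sketch of the `left_join()` plus `group_by()` idea follows. It does not reproduce the code from the gist and does not use {tigris}: the lookup table `state_regions`, the data frame `state_values`, and all values in them are hypothetical and written by hand for illustration.

```r
library(dplyr)

# Hypothetical state-level data with one numeric value per state
state_values <- tibble(
  state = c("Maine", "Vermont", "Oregon", "Washington", "California"),
  value = c(10, 20, 30, 40, 50)
)

# Hypothetical lookup table that assigns each state to a region
state_regions <- tibble(
  state = c("Maine", "Vermont", "Oregon", "Washington", "California"),
  region = c("Northeast", "Northeast", "Pacific", "Pacific", "Pacific")
)

# Attach the region to each state, then aggregate by region
region_totals <- state_values %>%
  left_join(state_regions, by = "state") %>%
  group_by(region) %>%
  summarize(total_value = sum(value),
            n_states = n())

region_totals
```

Any state missing from the lookup table would end up with `NA` as its region after the join, so it is worth checking for that before summarizing.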
G Mendez
April 12, 2022
Hi! Thanks, Charlie. I will be looking to apply this concept for a summer project.