R for the Rest of Us Community
This is a place to ask questions and get help along the way on your R journey.
In addition to discussions of general questions, you’ll see threads for office hours. These are twice-monthly sessions to help you get unstuck. Ask questions and get live answers from me as well as guest experts. All code for office hours can be found here.
Data frame error message
-
Hi everyone,
Here is my code for the “Create a New Data Frame” video:
mental_health_over_30 <- nhanes %>%
  filter(age >= 30) %>%
  group_by(gender, age_decade) %>%
  drop_na(age_decade) %>%
  summarise(mean_bad_mental_health_days = round(mean(days_ment_hlth_bad, na.rm = TRUE), 1)) %>%
  arrange(desc(mean_bad_mental_health_days))

I’m getting an error message that I’m not sure how to debug. Thanks for your help!

summarise() regrouping output by ‘gender’ (override with .groups argument)
-
This discussion was modified 3 months ago by
jordan.trachtenberg.
-
Hi Jordan! I haven’t watched the video, but it seems like the message you get is a warning, not an error. If you run this pipe, do you see mental_health_over_30 in your global environment?

summarise() is really useful, but it often drives me crazy because the grouping variables are carried over to other data frames/tibbles or to further steps in a pipe. When you want to override them (e.g., arrange rows by a different variable), you’ll get warnings/errors like this one. Just in case, you have control over keeping/dropping grouping variables using the .groups argument.
-
@diego-catalan-molina, I now see mental_health_over_30 in my global environment. I’m not sure I fully understand what you mean by the grouping variables being carried over by summarise(). Do you mean that summarise() has a default order, and the group_by() is attempting to override it, which causes the warning?
-
Hey Jordan, I made a short video to help you understand what’s going on here. This is something that I’m getting asked about A LOT right now (this warning message is new). Hope this helps!
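
For anyone who can’t watch the video, here is a minimal sketch of the behaviour under discussion. It is not the code from the video. The toy tibble is a hypothetical stand-in for nhanes, and the object and column names `toy`, `by_group`, and `mean_bad_days` are made up for illustration. It assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data standing in for nhanes (not the course dataset)
toy <- tibble(
  gender = c("female", "female", "male", "male", "female", "male"),
  age_decade = c("30-39", "40-49", "30-39", "40-49", "30-39", "40-49"),
  days_ment_hlth_bad = c(2, 4, 1, 3, 6, 5)
)

# Grouping by two variables and summarising prints the informational
# message about the grouping of the output; the result is still created
by_group <- toy %>%
  group_by(gender, age_decade) %>%
  summarise(mean_bad_days = mean(days_ment_hlth_bad))

# summarise() drops the last grouping variable by default,
# so the result is still grouped by gender
group_vars(by_group)
```

The message comes from summarise() itself whenever more than one grouping variable goes in, whether or not arrange() follows. As far as I know, arrange() ignores grouping unless you set .by_group = TRUE.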
-
Did you watch David’s video? If so, then you saw how the grouping variables can be “carried over” as hidden information about your df (data frame). This hidden information sorts your rows in a specific way (first by gender, then by age_decade). So when you then try to sort your data in a different way by using arrange(), you may be triggering the warning. Not 100% sure though.

In the end, it doesn’t really matter because you still get the df that you wanted. But David’s suggestion of using the .groups argument within summarise() is really useful, especially if you want to merge your new df with other data and you get errors related to your grouping variables.
-
Thank you, @dgkeyes. This is very helpful. So when I skim nhanes, it shows me that there are no group variables, but when I skim mental_health_over_30, it shows me that I have 2 group variables (gender, age_decade) based on what I initially set up in my data frame. I’ll have to play around with the .groups settings to see how they change the grouping.
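
One way to play around with the .groups settings is to run the same summary with different values and check which grouping variables remain. This is only a sketch: the toy tibble is a hypothetical stand-in for nhanes, and `summarise_with()` is a helper invented here for illustration, not part of dplyr.

```r
library(dplyr)

# Hypothetical toy data standing in for nhanes (not the course dataset)
toy <- tibble(
  gender = c("female", "female", "male", "male", "female", "male"),
  age_decade = c("30-39", "40-49", "30-39", "40-49", "30-39", "40-49"),
  days_ment_hlth_bad = c(2, 4, 1, 3, 6, 5)
)

# Made-up helper: same summary each time, only the .groups value changes
summarise_with <- function(groups_option) {
  toy %>%
    group_by(gender, age_decade) %>%
    summarise(
      mean_bad_days = mean(days_ment_hlth_bad),
      .groups = groups_option
    )
}

group_vars(summarise_with("drop_last"))  # keeps gender only
group_vars(summarise_with("keep"))       # keeps gender and age_decade
group_vars(summarise_with("drop"))       # no grouping left
```

Setting .groups explicitly also silences the message, since you have told summarise() what you want.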
-
You can play around with that argument. You can also use the ungroup() function to remove all grouping.
-
Hi David,
Hope you are well. I realize this thread is a few weeks old, but I wanted to confirm that I’m interpreting this message correctly after watching your video. If I receive the “regrouping output by” message, it’s essentially reminding me that I’ve grouped my data in a certain way, and that I can use the .groups argument in summarise() to override the grouping I’ve set in group_by(). Is this accurate? Thanks so much.
-
This reply was modified 2 months ago by
clint.thomson.
-
@dgkeyes, I also found it very important to call ungroup() before running a t-test or ANOVA on existing data frames; otherwise I get a weird error saying that I don’t have enough observations.
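
The sketch below does not reproduce the t-test error, which depends on the testing function used. It only illustrates a likely mechanism: while a data frame is grouped, later steps run once per group on fewer rows, and ungroup() makes them use every row again. The toy tibble and the column name `mean_days` are hypothetical.

```r
library(dplyr)

# Hypothetical toy data standing in for nhanes (not the course dataset)
toy <- tibble(
  gender = c("female", "female", "male", "male", "female", "male"),
  days_ment_hlth_bad = c(2, 4, 1, 3, 6, 5)
)

grouped <- toy %>% group_by(gender)

# While grouped, mutate() computes the mean separately within each gender
within_groups <- grouped %>%
  mutate(mean_days = mean(days_ment_hlth_bad))

# After ungroup(), the same call uses all six rows at once
across_rows <- grouped %>%
  ungroup() %>%
  mutate(mean_days = mean(days_ment_hlth_bad))

group_vars(across_rows)  # empty: no grouping left
```

If a function complains about too few observations, checking group_vars() on the input is a quick way to see whether leftover grouping might be the cause.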