I am trying to group my data by two variables and then get the sum within that grouping. However, when I run my code, I get the sum across all rows, not within groups.
Here is an example with the iris dataset. I would like an output table where sum_petal_length is the sum within each petal_width and species group, but I’m getting sum_petal_length = 563.7 for all groups.
For instance, for the group petal_width = 0.1 and species = setosa, I would like sum_petal_length = 1.5 + 1.4 + 1.1 = 4, but it’s showing up as 563.7. Similarly, for petal_width = 0.3 and species = setosa, I would like sum_petal_length = 1.4 + 1.7 + 1.5 + 1.3 = 5.9, but it’s also showing up as 563.7. I can try to record a video if this is still unclear.
I made a video to explain how my output is different from yours. I think something’s wrong with the packages I’m loading, and that is what gives me different behavior from your output. Let me know if you have trouble accessing the video.
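Oh, that totally makes sense! It looks like plyr is masking dplyr: when plyr is loaded after dplyr, a plain summarise() call resolves to plyr::summarise(), which ignores the grouping and returns the sum across all rows. Try using dplyr::group_by() and dplyr::summarise() to make sure you are using dplyr and not plyr. This is, unfortunately, a semi-common thing (see, for example, responses to this tweet). Moving forward, I really don’t think you need to use the plyr package at all if you’re using dplyr, so I would just remove it. Let me know if that works!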
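In case it helps, here is a minimal sketch of the namespace-qualified version. It assumes the built-in iris column names (Petal.Width, Species, Petal.Length) rather than the renamed petal_width and species columns from your example, and the name iris_sums is just a placeholder, so adjust both to match your data.

```r
library(dplyr)

# Check which package a bare summarise() call currently resolves to.
# If this prints "plyr", plyr was attached after dplyr and is masking it.
environmentName(environment(summarise))

# Namespace-qualified calls always use dplyr's versions,
# regardless of the order in which packages were loaded.
# (Column names assume the built-in iris data; iris_sums is a placeholder name.)
iris_sums <- iris %>%
  dplyr::group_by(Petal.Width, Species) %>%
  dplyr::summarise(sum_petal_length = sum(Petal.Length)) %>%
  dplyr::ungroup()

head(iris_sums)

# Sanity check: the per-group sums should add up to the overall total,
# and no single group should equal that total.
all.equal(sum(iris_sums$sum_petal_length), sum(iris$Petal.Length))
```

If the first line prints "plyr", that would confirm the masking. Qualifying the calls with dplyry:: is not needed once plyr is removed from your script, so dropping library(plyr) is the simpler long-term fix.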