How to merge data and calculate multilevel summaries

In one of the lessons in Fundamentals of R, I teach people how to use the group_by() and summarize() functions to calculate simple summaries (e.g., if I have population data on all states, what is the total population of the United States?).
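To make that concrete, here is a minimal sketch of a simple summary with dplyr. The data frame, its column names (including the `year` column used for grouping), and the population figures are all made up for illustration. They are not real census numbers and not the data used in the course.

```r
library(dplyr)

# Hypothetical data: made-up population figures, not real census numbers
state_population <- tibble(
  state = c("Oregon", "Oregon", "Vermont", "Vermont"),
  year = c(2010, 2020, 2010, 2020),
  population = c(380, 420, 62, 64)
)

# One overall total: summarize() collapses all rows into a single row
overall_total <- state_population %>%
  summarize(total_population = sum(population))

# One total per year: group_by() makes summarize() return one row per group
total_by_year <- state_population %>%
  group_by(year) %>%
  summarize(total_population = sum(population))
```

Here `year` already exists as a column, so grouping by it works right away.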

But what if you want to group by a variable that’s not in the original data frame? We got a question on this recently:

Let’s say I had state level data with numerical values per state. Let’s say I wanted to assign each state to relevant regions, like northeast, pacific, etc. How can I first assign the states to the object, then perform a group by, to then summarize the numerical values aggregated to the regions I assigned them to?

I asked Charlie Hadley to put together a response to this question. It was so thorough that I thought I would share it here as well.

As Charlie shows, you can join a data frame of state populations to another data frame that assigns each state to a region. Using this merged data frame, you can then group by region and calculate the total population of each one.
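The approach can be sketched roughly like this. This is not Charlie’s actual code: the two data frames, their column names, the region assignments, and the population figures are all hypothetical, and left_join() is just one reasonable choice of join.

```r
library(dplyr)

# Hypothetical state-level data: made-up population figures
state_population <- tibble(
  state = c("Oregon", "Washington", "Vermont", "Maine"),
  population = c(400, 700, 60, 130)
)

# Hypothetical lookup table that assigns each state to a region
state_regions <- tibble(
  state = c("Oregon", "Washington", "Vermont", "Maine"),
  region = c("Pacific", "Pacific", "Northeast", "Northeast")
)

population_by_region <- state_population %>%
  # Add the region column by matching rows on the shared state column
  left_join(state_regions, by = "state") %>%
  # Group by the newly added variable
  group_by(region) %>%
  # Collapse each region to a single row with its total population
  summarize(total_population = sum(population))
```

With left_join(), any state missing from the lookup table would end up with a region of NA rather than being dropped, which makes unassigned states easy to spot in the summary.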

It’s just one more example of how the tidyverse turns something that seems complicated into something fairly straightforward!
