How to merge data and calculate multilevel summaries

David Keyes
June 17th, 2022

In one of the lessons in Fundamentals of R, I teach people how to use the group_by() and summarize() functions to calculate simple summaries (e.g. if I have population data on all states, what is the total population of the United States?).
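As a minimal sketch of that kind of simple summary (the data frame, its column names, and the population figures below are made up for illustration, not real data):

```r
library(dplyr)

# Hypothetical data: made-up populations for three states
state_population <- tibble(
  state = c("Maine", "Oregon", "Texas"),
  population = c(100, 400, 2500)
)

# With no grouping, summarize() collapses the whole data frame to one row
total <- state_population %>%
  summarize(total_population = sum(population))
```

If the data frame already had a grouping column, adding group_by() on that column before summarize() would give one row per group instead of a single total.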

But what if you want to group by a variable that's not in the original data frame? We got a question on this recently:

Let's say I had state level data with numerical values per state. Let's say I wanted to assign each state to relevant regions, like northeast, pacific, etc. How can I first assign the states to the object, then perform a group by, to then summarize the numerical values aggregated to the regions I assigned them to?

I asked Charlie Hadley to put together a response to this question. It was so thorough that I thought I would share it here as well.

As Charlie shows, you can join a data frame of state populations to a second data frame that assigns each state to a region. With the merged data frame, you can then group by region and calculate the total population of each one.
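The join-then-summarize approach can be sketched roughly as follows. This is not Charlie's actual code: the data frame names, column names, region assignments, and population figures are all assumptions made up for illustration.

```r
library(dplyr)

# Hypothetical state-level data (populations are made-up round numbers)
state_population <- tibble(
  state = c("Maine", "Vermont", "Oregon", "Washington"),
  population = c(100, 50, 400, 700)
)

# Hypothetical lookup table assigning each state to a region
state_regions <- tibble(
  state = c("Maine", "Vermont", "Oregon", "Washington"),
  region = c("Northeast", "Northeast", "Pacific", "Pacific")
)

population_by_region <- state_population %>%
  # Add the region column by matching rows on the shared state column
  left_join(state_regions, by = "state") %>%
  # Group by the newly added region, then total the population per region
  group_by(region) %>%
  summarize(total_population = sum(population))
```

Because left_join() keeps every row of the state data, any state missing from the lookup table would end up with an NA region, so it is worth checking for unmatched states before summarizing.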

Just one more example of how the tidyverse turns something that seems complicated into something fairly straightforward!

Let us know what you think by adding a comment below.