Fundamentals of R

group_by() and summarize()

View code shown in video
# Load Packages -----------------------------------------------------------

library(tidyverse)

# Import Data -------------------------------------------------------------

penguins <- read_csv("penguins.csv")

# group_by() and summarize() ----------------------------------------------

# summarize() becomes truly powerful when paired with group_by(),
# which lets us run the same calculation separately for each group.

# Calculate the mean bill length for penguins on different islands.

penguins |> 
  group_by(island) |> 
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE))

# We can use group_by() with multiple grouping variables.

penguins |> 
  group_by(island, year) |> 
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE)) 

# Another option is to use the .by argument in summarize().

penguins |> 
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE),
            .by = c(island, year))

# You can count the number of penguins in each group using the n() summary function.

penguins |> 
  group_by(island) |> 
  summarize(number_of_penguins = n())

# But a simpler way to do this is with the count() function.

penguins |> 
  count(island)

# You can also use count() with multiple grouping variables.

penguins |> 
  count(island, year)
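
One detail about the grouped examples above is worth spelling out. When summarize() follows group_by() with more than one grouping variable, it drops only the last grouping level by default, so the result stays grouped by the first variable (dplyr prints a message about this). The .by argument, by contrast, always returns an ungrouped tibble. Here is a minimal sketch of the difference. It uses a small made-up tibble called toy_penguins (an assumption for illustration, not the course data) and the .groups = "drop" argument of summarize() as one way to get an ungrouped result.

```r
library(dplyr)

# Hypothetical toy data standing in for penguins.csv
toy_penguins <- tibble(
  island = c("Biscoe", "Biscoe", "Dream", "Dream"),
  year = c(2007, 2008, 2007, 2007),
  bill_length_mm = c(40, 50, 38, NA)
)

# Default behavior: the result is still grouped by island
grouped_result <- toy_penguins |>
  group_by(island, year) |>
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE))

# .groups = "drop" removes all grouping from the result
dropped_result <- toy_penguins |>
  group_by(island, year) |>
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE),
            .groups = "drop")

# .by groups only for this one call and returns an ungrouped tibble
by_result <- toy_penguins |>
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE),
            .by = c(island, year))

group_vars(grouped_result)  # "island"
group_vars(dropped_result)  # character(0)
group_vars(by_result)       # character(0)
```

Leftover grouping matters mainly when you keep piping: a later mutate() or summarize() on grouped_result would still operate within each island.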

Your Turn

# Load Packages -----------------------------------------------------------

# Load the tidyverse package

library(tidyverse)

# Import Data -------------------------------------------------------------

# Download data from https://rfor.us/penguins
# Copy the data into the RStudio project
# Create a new R script file and add code to import your data

penguins <- read_csv("penguins.csv")
			
# group_by() and summarize() ----------------------------------------------

# Calculate the weight of the heaviest penguin on each island.

# YOUR CODE HERE

# Calculate the weight of the heaviest penguin on each island for each year.

# YOUR CODE HERE

Learn More

To learn more about the group_by() and summarize() functions, check out Chapter 3 of R for Data Science.

Have any questions? Put them below and we will help you out!

gene trevino • January 30, 2025

When I run the following code:

penguins %>%
  group_by(island, year) %>%
  summarize(Heaviest_Penguins = max(body_mass_g, na.rm = TRUE))

I get the following output:

island year Heaviest_Penguins

Why do I get NA for Biscoe and Torgersen?

Thanks

David Keyes Founder • January 31, 2025

Hmm, that's strange. I see something different:

# A tibble: 9 × 3
# Groups:   island [3]
  island     year Heaviest_Penguins
  <fct>     <int>             <int>
1 Biscoe     2007              6300
2 Biscoe     2008              6000
3 Biscoe     2009              6000
4 Dream      2007              4650
5 Dream      2008              4800
6 Dream      2009              4475
7 Torgersen  2007              4675
8 Torgersen  2008              4700
9 Torgersen  2009              4300

Can you share the code you used to import the CSV file?

Pepper Phillips • February 11, 2025

Can you use drop_na instead of na.rm = TRUE?

David Keyes Founder • February 11, 2025

Yes, absolutely! I do that quite often, in fact.
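
For readers following this thread, here is a minimal sketch of the two approaches side by side. It assumes a small made-up tibble called toy_penguins (not the course data) and uses drop_na() from tidyr. One thing to watch: drop_na() with no column names removes rows that have a missing value in any column, which can change the answer, so naming the column is usually the safer choice.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with missing values in two different columns
toy_penguins <- tibble(
  island = c("Biscoe", "Biscoe", "Dream", "Dream"),
  sex = c("female", NA, "male", "male"),
  body_mass_g = c(5000, 6000, NA, 4000)
)

# Option 1: keep every row and ignore missing values inside max()
with_na_rm <- toy_penguins |>
  summarize(heaviest = max(body_mass_g, na.rm = TRUE), .by = island)

# Option 2: drop rows where body_mass_g is missing, then summarize
with_drop_na <- toy_penguins |>
  drop_na(body_mass_g) |>
  summarize(heaviest = max(body_mass_g), .by = island)

# Caution: with no columns named, drop_na() also removes the Biscoe row
# whose sex is missing, so the Biscoe maximum changes
with_drop_all <- toy_penguins |>
  drop_na() |>
  summarize(heaviest = max(body_mass_g), .by = island)

with_na_rm$heaviest     # 6000 4000
with_drop_na$heaviest   # 6000 4000
with_drop_all$heaviest  # 5000 4000
```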