group_by() and summarize()

Code shown in video

# Load Packages -----------------------------------------------------------

library(tidyverse)

# Import Data -------------------------------------------------------------

penguins <- read_csv("penguins.csv")

# group_by() and summarize() ----------------------------------------------

# summarize() becomes truly powerful when paired with group_by(),
# which lets us calculate a summary separately for each group.

# Calculate the mean bill length for penguins on each island.

penguins |> 
  group_by(island) |> 
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE))

# We can use group_by() with multiple grouping variables.

penguins |> 
  group_by(island, year) |> 
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE)) 

# Another option is to use the .by argument in summarize()
# (available in dplyr 1.1.0 and later).

penguins |> 
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE),
            .by = c(island, year))

# You can count the number of penguins in each group using the n() summary function.

penguins |> 
  group_by(island) |> 
  summarize(number_of_penguins = n())

# But a simpler way to do this is with the count() function.

penguins |> 
  count(island)

# You can also use count() with multiple grouping variables.

penguins |> 
  count(island, year)
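
To try the ideas above without needing penguins.csv, here is a minimal sketch on a small made-up tibble. The `toy` data and its values are assumptions for illustration only, not the lesson's data. It checks that count() gives the same numbers as group_by() plus summarize(n()), and it shows one difference between the two grouping styles: group_by() with two variables leaves the result grouped by the first one, while .by (dplyr 1.1.0 or later) returns an ungrouped result.

```r
library(dplyr) # also loaded by library(tidyverse)

# Made-up data for illustration only (not the penguins data)
toy <- tibble(
  island = c("A", "A", "A", "B", "B"),
  year = c(2007, 2007, 2008, 2007, 2008),
  bill_length_mm = c(40, NA, 42, 50, 46)
)

# group_by() + summarize(): one row per island
by_island <- toy |>
  group_by(island) |>
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE))

# count() is shorthand for group_by() + summarize(n = n())
counts_long <- toy |>
  group_by(island) |>
  summarize(n = n())

counts_short <- toy |>
  count(island)

# Two grouping variables: the result stays grouped by island
grouped_result <- toy |>
  group_by(island, year) |>
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE))

# .by groups only for this one call, so the result is ungrouped
by_result <- toy |>
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE),
            .by = c(island, year))

group_vars(grouped_result) # remaining grouping variables after group_by()
group_vars(by_result)      # no grouping variables remain with .by
```

If you prefer the group_by() style but want an ungrouped result, adding `.groups = "drop"` inside summarize() or piping into ungroup() should have the same effect.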

Your Turn

# Load Packages -----------------------------------------------------------

# Load the tidyverse package

library(tidyverse)

# Import Data -------------------------------------------------------------

# Download data from https://rfor.us/penguins
# Copy the data into the RStudio project
# Create a new R script file and add code to import your data

penguins <- read_csv("penguins.csv")

# group_by() and summarize() ----------------------------------------------

# Calculate the weight of the heaviest penguin on each island.

# YOUR CODE HERE

# Calculate the weight of the heaviest penguin on each island for each year.

# YOUR CODE HERE
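
A hint for the exercises above, sketched on made-up data rather than the penguins file: summary functions such as max() return NA for any group that contains a missing value unless you pass na.rm = TRUE. The `toy_weights` tibble and its column names are hypothetical and only illustrate the pattern.

```r
library(dplyr) # also loaded by library(tidyverse)

# Hypothetical data: one group contains a missing weight
toy_weights <- tibble(
  site = c("north", "north", "south", "south"),
  weight_g = c(3200, NA, 4100, 3900)
)

# Without na.rm = TRUE, the group with a missing value summarizes to NA
with_na <- toy_weights |>
  group_by(site) |>
  summarize(max_weight = max(weight_g))

# With na.rm = TRUE, missing values are dropped before taking the maximum
without_na <- toy_weights |>
  group_by(site) |>
  summarize(max_weight = max(weight_g, na.rm = TRUE))
```

If your results contain NA where you expected a number, a missing value in the summarized column is a likely cause.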

Learn More

To learn more about the group_by() and summarize() functions, check out Chapter 3 of R for Data Science.
