summarize()

This lesson is called summarize(), part of the Fundamentals of R course.

View code shown in video

# Load Packages -----------------------------------------------------------

library(tidyverse)

# Import Data -------------------------------------------------------------

penguins <-
  read_csv("penguins.csv")

# summarize() -------------------------------------------------------------

# With summarize(), we can go from a complete dataset down to a summary.

# We use any of the summary functions with summarize().
# Here's how we calculate the mean bill length.

penguins |>
  summarize(mean_bill_length = mean(bill_length_mm))

# This doesn't work! The result is NA because bill_length_mm contains missing values.

# We need to add na.rm = TRUE to tell R to drop NA values.

penguins |>
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE))

# Another option is to drop NA values before calling summarize().

penguins |>
  drop_na(bill_length_mm) |>
  summarize(mean_bill_length = mean(bill_length_mm))

# We can calculate several summary statistics in a single call to summarize().

penguins |>
  summarize(
    mean_bill_length = mean(bill_length_mm, na.rm = TRUE),
    max_bill_depth = max(bill_depth_mm, na.rm = TRUE)
  )

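# Chaining a second summarize() does not work here. After the first call, the
# only column left is mean_bill_length, so bill_depth_mm can no longer be found.

penguins |>
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE)) |>
  summarize(mean_bill_depth = mean(bill_depth_mm, na.rm = TRUE))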
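
The code above depends on penguins.csv. As a self-contained sketch of the same ideas, the following uses a small made-up tibble (the name toy_penguins and its values are hypothetical, not taken from the course data) to show why mean() returns NA when a column has missing values, and how na.rm = TRUE and drop_na() both avoid that.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for penguins.csv
toy_penguins <- tibble(
  bill_length_mm = c(40, NA, 50),
  bill_depth_mm = c(18, 17, NA)
)

# Without na.rm = TRUE, a single NA makes the mean NA
with_na <- toy_penguins |>
  summarize(mean_bill_length = mean(bill_length_mm))

# na.rm = TRUE drops the NA values before each calculation
without_na <- toy_penguins |>
  summarize(
    mean_bill_length = mean(bill_length_mm, na.rm = TRUE),
    max_bill_depth = max(bill_depth_mm, na.rm = TRUE)
  )

# drop_na() removes the incomplete rows first, giving the same mean
dropped <- toy_penguins |>
  drop_na(bill_length_mm) |>
  summarize(mean_bill_length = mean(bill_length_mm))
```

Both approaches give the same mean here. Note that drop_na() removes whole rows, so it would also discard that row's values in other columns, which matters when several columns are summarized at once.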

Your Turn

# Load Packages -----------------------------------------------------------

# Load the tidyverse package

library(tidyverse)

# Import Data -------------------------------------------------------------

penguins <- read_csv("penguins.csv")

# Calculate the weight of the heaviest penguin.
# Don't forget to drop NAs!

# YOUR CODE HERE

# Calculate the minimum and maximum weight of penguins in the dataset.

# YOUR CODE HERE

Learn More

To learn more about the summarize() function, check out Chapter 3 of R for Data Science.

Have any questions? Put them below and we will help you out!

Felipe Coelho • March 17, 2026

I was wondering about one thing:

What is the benefit of using arithmetic functions like sum(), mean(), average(), or count() inside summarize() instead of using these functions on their own?

Is it because summarize() allows you to compute multiple summary statistics at once and organize them more efficiently?

Gracielle Higino Coach • March 19, 2026

Hi Felipe! Yes, that's one of the advantages! With tidyverse syntax you also have more flexibility, for example to calculate these statistics for groups. In addition, summarize() returns its result as a separate tibble that you can reuse on its own. To use these functions outside of summarize(), you'd need base R notation, though.

# This returns a 1x1 tibble
penguins |>
  summarize(max_body_mass = max(body_mass_g, na.rm = TRUE))

# This returns a vector with one element
max(penguins$body_mass_g, na.rm = TRUE)

# This also returns a vector with one element
penguins$body_mass_g |>
  max(na.rm = TRUE)

# This doesn't work: outside of a tidyverse function like summarize(), R looks
# for body_mass_g as a standalone object, not as a column of the piped-in data
penguins |>
  max(body_mass_g, na.rm = TRUE)
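
To illustrate the point about groups, here is a minimal sketch that combines summarize() with group_by() from dplyr. It uses a small made-up tibble; the name toy_penguins, the species column, and all values are assumptions for illustration rather than the course data.

```r
library(dplyr)

# Hypothetical toy data; species names and body masses are made up
toy_penguins <- tibble(
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  body_mass_g = c(3500, NA, 5000, 5400)
)

# group_by() makes summarize() return one row per species
max_by_species <- toy_penguins |>
  group_by(species) |>
  summarize(max_body_mass = max(body_mass_g, na.rm = TRUE))
```

The result is a tibble with one row per species, which would take noticeably more code to produce with max() on its own.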

Felipe Coelho • March 17, 2026

Maybe this is happening only to me, but I see the solution to the exercise in the "Your Turn" section, while the exercise itself appears in the "See Solution" section.