
View code shown in video
# Load Packages -----------------------------------------------------------

library(tidyverse)

# Import Data -------------------------------------------------------------

penguins <- read_csv("penguins.csv")

# summarize() -------------------------------------------------------------

# With summarize(), we can go from a complete dataset down to a summary.

# We can use any summary function, such as mean(), max(), or min(), inside summarize().
# Here's how we calculate the mean bill length.

penguins |> 
  summarize(mean_bill_length = mean(bill_length_mm))

# This doesn't work! The result is NA because bill_length_mm contains
# missing values, and mean() returns NA whenever any of its inputs is NA.

# We need to add na.rm = TRUE to tell R to drop NA values.

penguins |> 
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE))

# Another option is to drop NA values before calling summarize().

penguins |> 
  drop_na(bill_length_mm) |> 
  summarize(mean_bill_length = mean(bill_length_mm))

# We can calculate several summaries in a single summarize() call,
# separating them with commas.

penguins |> 
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE),
            max_bill_depth = max(bill_depth_mm, na.rm = TRUE))
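
The two ways of handling NA values shown above are not always interchangeable once more than one column is summarized. The sketch below illustrates this with a small made-up tibble (`toy_penguins` and its values are assumptions for illustration, not the contents of penguins.csv): `na.rm = TRUE` skips missing values one column at a time, while `drop_na()` removes whole rows before any summary is calculated.

```r
library(tidyverse)

# Hypothetical toy data standing in for penguins.csv (values are made up).
toy_penguins <- tibble(
  bill_length_mm = c(39.1, NA, 46.5),
  bill_depth_mm  = c(18.7, 21.0, 15.2)
)

# mean() returns NA as soon as any value in the column is missing.
na_result <- toy_penguins |>
  summarize(mean_bill_length = mean(bill_length_mm))

# na.rm = TRUE ignores NAs only within the column being summarized,
# so the row with bill_depth_mm = 21.0 still counts toward the maximum.
with_na_rm <- toy_penguins |>
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE),
            max_bill_depth = max(bill_depth_mm, na.rm = TRUE))

# drop_na() removes the entire second row, so 21.0 is gone before max() runs.
with_drop_na <- toy_penguins |>
  drop_na(bill_length_mm) |>
  summarize(mean_bill_length = mean(bill_length_mm),
            max_bill_depth = max(bill_depth_mm))
```

Both approaches give the same mean bill length here, but a different maximum bill depth, so it is worth checking which rows each approach keeps before summarizing several columns at once.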

Your Turn

# Load Packages -----------------------------------------------------------

# Load the tidyverse package

library(tidyverse)

# Import Data -------------------------------------------------------------

# Download data from https://rfor.us/penguins
# Copy the data into the RStudio project
# Create a new R script file and add code to import your data

penguins <- read_csv("penguins.csv")
			
# Calculate the weight of the heaviest penguin.
# Don't forget to drop NAs!

# YOUR CODE HERE

# Calculate the minimum and maximum weight of penguins in the dataset.

# YOUR CODE HERE

Learn More

To learn more about the summarize() function, check out Chapter 3 of R for Data Science.

Have any questions? Put them below and we will help you out!


Brian Slattery

September 20, 2023

I'm just curious, so feel free to ignore this if it's covered later. You mentioned that piping sequential summarize() calls into each other doesn't produce a single table with multiple columns. Is there a way to do that? I didn't know if mutate() could take in a tibble from summarize(), and I was guessing there must be some other way to combine tibbles. For example, say you were getting the mean bill length from the penguins data but also wanted a mean bill length from some other bird dataset, with the two side by side in the same table. (I googled it and it looked like there's a merge() function, but I didn't know if that was the best way to go about it in this case.)

Gracielle Higino

September 20, 2023

Hi Brian! Thank you for your question, that's very interesting! We'll discuss this in our live session this week, stay tuned!
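
As one possible illustration of the question above (a sketch under assumptions, not necessarily what was covered in the live session): a second summarize() cannot see the original columns because the first one has already reduced the data to its summary columns. For two separate datasets, one option is to summarize each on its own and place the one-row results next to each other with dplyr's bind_cols(). The tibbles `toy_penguins` and `toy_finches` and their values are made up for the example.

```r
library(tidyverse)

# Hypothetical toy data; neither tibble comes from the lesson files.
toy_penguins <- tibble(bill_length_mm = c(39.1, NA, 46.5))
toy_finches  <- tibble(bill_length_mm = c(10.2, 11.8))

# After summarize(), only the summary column is left, which is why a
# second summarize() piped after it cannot find bill_length_mm anymore.
penguin_summary <- toy_penguins |>
  summarize(penguin_mean_bill_length = mean(bill_length_mm, na.rm = TRUE))

remaining_columns <- names(penguin_summary)

finch_summary <- toy_finches |>
  summarize(finch_mean_bill_length = mean(bill_length_mm, na.rm = TRUE))

# Place the two one-row summaries side by side: one row, two columns.
side_by_side <- bind_cols(penguin_summary, finch_summary)
```

bind_cols() simply pastes columns together by position, which is fine for one-row summaries like these. Joins such as left_join() (or base R's merge()) instead match rows on a shared key column, which these summaries do not have.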

Brian Slattery

September 20, 2023

Is there some way to change the default behavior of summarize() so that it ignores NAs without having to specify it each time? I didn't know if there was something like a global variable that you can set in the R script file, or a setting in the RStudio environment or an installed package.

Gracielle Higino

September 20, 2023

Hey! =D The short answer is: you shouldn't do that! There are some complicated workarounds, but by default, you should make it explicit in your code when NAs are being ignored/dropped. We'll discuss this on Thursday too!
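
One way to cut down on repetition while still keeping the NA handling visible in the code is a small named helper. This is only a sketch: `mean_ignoring_na()` is a hypothetical function written for this example (it is not part of dplyr or the lesson), and the toy data and the column name `body_mass_g` are assumptions.

```r
library(tidyverse)

# Hypothetical helper: the name states outright that missing values are
# ignored, so the behavior stays explicit wherever the function is used.
mean_ignoring_na <- function(x) {
  mean(x, na.rm = TRUE)
}

# Made-up data for illustration only.
toy_penguins <- tibble(body_mass_g = c(3750, NA, 5000))

result <- toy_penguins |>
  summarize(mean_body_mass = mean_ignoring_na(body_mass_g))
```

Unlike a global setting, the helper changes nothing about how mean() behaves elsewhere, and anyone reading the summarize() call can tell from the function name that NAs were dropped.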