group_by() and summarize()
This lesson is called group_by() and summarize(), part of the Fundamentals of R course.
Code shown in video
# Load Packages 
library(tidyverse)
# Import Data 
penguins <- read_csv("penguins.csv")
# group_by() and summarize() 
# summarize() becomes truly powerful when paired with group_by(),
# which lets us run the same calculation separately for each group.
# Calculate the mean bill length for penguins on each island.
penguins |>
  group_by(island) |>
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE))
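# A minimal sketch of what group_by() + summarize() returns, using a small
# made-up data frame (toy_penguins and its values are hypothetical, not the
# real penguins data). dplyr is the tidyverse package that provides these functions.
library(dplyr)
toy_penguins <- tibble(
  island = c("Biscoe", "Biscoe", "Dream", "Dream"),
  bill_length_mm = c(40, 44, 50, NA)
)
toy_means <- toy_penguins |>
  group_by(island) |>
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE))
# One row per island: Biscoe gets 42, Dream gets 50 (na.rm = TRUE drops the NA).
toy_means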
# We can also group by more than one variable.
penguins |>
  group_by(island, year) |>
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE))
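# With more than one grouping variable, summarize() removes only the last
# one, so a result like the one above is still grouped by island (dplyr
# prints a message saying so). A sketch with made-up data (toy_penguins,
# still_grouped, and ungrouped are hypothetical names) showing how the
# .groups argument controls this:
library(dplyr)
toy_penguins <- tibble(
  island = c("Biscoe", "Biscoe", "Dream", "Dream"),
  year = c(2007, 2008, 2007, 2008),
  bill_length_mm = c(40, 44, 50, 46)
)
still_grouped <- toy_penguins |>
  group_by(island, year) |>
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE))
group_vars(still_grouped)  # "island"
ungrouped <- toy_penguins |>
  group_by(island, year) |>
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE),
            .groups = "drop")
group_vars(ungrouped)  # character(0): no grouping left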
# Another option is to use the .by argument in summarize() (dplyr 1.1.0 or later).
penguins |>
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE),
            .by = c(island, year))
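# Unlike group_by(), .by groups for this one summarize() call only: the
# result comes back ungrouped, and groups appear in the order they first
# occur in the data instead of sorted. A sketch with made-up data
# (toy_penguins and by_result are hypothetical names):
library(dplyr)
toy_penguins <- tibble(
  island = c("Torgersen", "Biscoe", "Torgersen", "Biscoe"),
  bill_length_mm = c(39, 45, 41, 47)
)
by_result <- toy_penguins |>
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE),
            .by = island)
by_result$island          # "Torgersen" "Biscoe" (order of first appearance)
is_grouped_df(by_result)  # FALSE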
# You can count the number of penguins in each group with the n() summary function.
penguins |>
  group_by(island) |>
  summarize(number_of_penguins = n())
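# n() takes no arguments: it returns the number of rows in the current
# group, including rows with missing values in other columns. A sketch with
# made-up data (toy_penguins and island_summary are hypothetical names)
# that combines it with another summary in the same call:
library(dplyr)
toy_penguins <- tibble(
  island = c("Biscoe", "Biscoe", "Biscoe", "Dream"),
  bill_length_mm = c(40, NA, 44, 50)
)
island_summary <- toy_penguins |>
  group_by(island) |>
  summarize(number_of_penguins = n(),
            mean_bill_length = mean(bill_length_mm, na.rm = TRUE))
island_summary  # Biscoe: 3 penguins, mean 42; Dream: 1 penguin, mean 50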
# A simpler way to do this is with the count() function.
penguins |>
  count(island)
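# count(island) gives the same counts as
# group_by(island) |> summarize(n = n()). The count column is called n by
# default; the name argument renames it, and sort = TRUE puts the largest
# group first. A sketch with made-up data (toy_penguins and island_counts
# are hypothetical names):
library(dplyr)
toy_penguins <- tibble(
  island = c("Dream", "Biscoe", "Biscoe", "Torgersen", "Biscoe")
)
island_counts <- toy_penguins |>
  count(island, sort = TRUE, name = "number_of_penguins")
island_counts  # Biscoe 3 first, then Dream 1 and Torgersen 1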
# You can also use count() with multiple grouping variables.
penguins |>
  count(island, year)
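# With several variables, count() returns one row per combination that
# actually occurs in the data; combinations with zero penguins get no row.
# A sketch with made-up data (toy_penguins and island_year_counts are
# hypothetical names) where Dream has no penguins in 2008:
library(dplyr)
toy_penguins <- tibble(
  island = c("Biscoe", "Biscoe", "Dream", "Dream"),
  year = c(2007, 2008, 2007, 2007)
)
island_year_counts <- toy_penguins |>
  count(island, year)
island_year_counts  # Biscoe/2007: 1, Biscoe/2008: 1, Dream/2007: 2 (no Dream/2008 row)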
Your Turn
# Load Packages 
# Load the tidyverse package
library(tidyverse)
# Import Data 
# Download data from https://rfor.us/penguins
# Copy the data into the RStudio project
# Create a new R script file and add code to import your data
penguins <- read_csv("penguins.csv")
# group_by() and summarize() 
# Calculate the weight of the heaviest penguin on each island.
# YOUR CODE HERE
# Calculate the weight of the heaviest penguin on each island for each year.
# YOUR CODE HERE
Learn More
To learn more about the group_by() and summarize() functions, check out Chapter 3 of R for Data Science.
