group_by() and summarize()
This lesson is called group_by() and summarize(), part of the Fundamentals of R course.
View code shown in video
# Load Packages -----------------------------------------------------------
library(tidyverse)
# Import Data -------------------------------------------------------------
penguins <- read_csv("penguins.csv")
# group_by() and summarize() ----------------------------------------------
# summarize() becomes truly powerful when paired with group_by(),
# which enables us to perform calculations on multiple groups.
# Calculate the mean bill length for penguins on different islands.
penguins |>
  group_by(island) |>
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE))
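# Why na.rm = TRUE? A minimal sketch with made-up values (example_lengths is
# hypothetical, not taken from penguins.csv): mean() returns NA when any value
# is missing, unless the missing values are removed first.
example_lengths <- c(39.1, NA, 40.3)
mean(example_lengths)                # NA
mean(example_lengths, na.rm = TRUE)  # 39.7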
# We can use group_by() with multiple groups.
penguins |>
  group_by(island, year) |>
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE))
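# A note on the grouped result: after grouping by two variables, summarize()
# removes only the last grouping level by default, so the output is still
# grouped by island (dplyr typically prints a message saying so).
# A minimal sketch of returning an ungrouped result with .groups = "drop".
# toy_penguins is made-up data for illustration, not from penguins.csv.
toy_penguins <- tibble(
  island = c("Biscoe", "Biscoe", "Dream"),
  year = c(2007, 2008, 2007),
  bill_length_mm = c(40, 42, 38)
)
toy_means <- toy_penguins |>
  group_by(island, year) |>
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE),
            .groups = "drop")
is_grouped_df(toy_means) # FALSE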
# Another option is to use the .by argument in summarize() (dplyr 1.1.0 or later).
# It groups for this one operation only and returns an ungrouped result.
penguins |>
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE),
            .by = c(island, year))
# You can count the number of penguins in each group using the n() summary function.
penguins |>
  group_by(island) |>
  summarize(number_of_penguins = n())
# But a simpler way to do this is with the count() function.
penguins |>
  count(island)
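# count() names its result column n by default. A minimal sketch of how it
# matches group_by() plus summarize(). toy_islands is made-up data for
# illustration, not from penguins.csv.
toy_islands <- tibble(island = c("Biscoe", "Biscoe", "Dream"))
toy_counts <- toy_islands |>
  count(island)
toy_summary <- toy_islands |>
  group_by(island) |>
  summarize(n = n())
# Both results have one row per island, with the number of rows in column n.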
# You can also use count() with multiple groups.
penguins |>
  count(island, year)
Your Turn
# Load Packages -----------------------------------------------------------
# Load the tidyverse package
library(tidyverse)
# Import Data -------------------------------------------------------------
# Download data from https://rfor.us/penguins
# Copy the data into the RStudio project
# Create a new R script file and add code to import your data
penguins <- read_csv("penguins.csv")
# group_by() and summarize() ----------------------------------------------
# Calculate the weight of the heaviest penguin on each island.
# YOUR CODE HERE
# Calculate the weight of the heaviest penguin on each island for each year.
# YOUR CODE HERE
Learn More
To learn more about the group_by() and summarize() functions, check out Chapter 3 of R for Data Science.