filter()
This lesson is called filter(), part of the R in 3 Months (Spring 2025) course.
View code shown in video
# Load Packages -----------------------------------------------------------
library(tidyverse)
# Import Data -------------------------------------------------------------
penguins <- read_csv("penguins.csv")
# filter() ----------------------------------------------------------------
# We use filter() to choose a subset of observations.
# We use == to select all observations that meet the criteria.
penguins |>
  filter(species == "Adelie")
# We use != to select all observations that don't meet the criteria.
penguins |>
  filter(species != "Adelie")
# We can combine comparisons and logical operators.
penguins |>
  filter(species == "Adelie" | species == "Chinstrap")
# We can use %in% to collapse multiple comparisons into one.
penguins |>
  filter(species %in% c("Adelie", "Chinstrap"))
# We can chain together multiple filter functions.
# Doing it this way, we don't have to create complex logic in one line.
# Complicated version
penguins |>
  filter(species %in% c("Adelie", "Chinstrap") & island == "Torgersen")
# Simpler version
penguins |>
  filter(species %in% c("Adelie", "Chinstrap")) |>
  filter(island == "Torgersen")
# We can use <, >, <=, and >= for numeric data.
penguins |>
  filter(body_mass_g > 4000)
# We can drop NAs with !is.na().
penguins |>
  filter(!is.na(sex))
# But the double negative is confusing.
# We can also drop NAs with drop_na().
penguins |>
  drop_na(sex)
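One detail the comments above do not spell out: filter() keeps only the rows where the condition is TRUE, so rows where a comparison evaluates to NA are dropped too. The sketch below checks this, along with the equivalences described above, on a small made-up data frame. Note that toy_penguins is a hypothetical stand-in, not the course's penguins.csv, and that library(dplyr) is used here only to keep the example light; library(tidyverse) loads it as well.

```r
library(dplyr)

# Hypothetical toy data standing in for penguins.csv (an assumption, not course data)
toy_penguins <- tibble(
  species = c("Adelie", "Chinstrap", "Gentoo", "Adelie", NA),
  island = c("Torgersen", "Dream", "Biscoe", "Biscoe", "Dream"),
  body_mass_g = c(3750, 3800, 5000, NA, 4100)
)

# The | version and the %in% version keep the same 3 rows
with_or <- toy_penguins |>
  filter(species == "Adelie" | species == "Chinstrap")
with_in <- toy_penguins |>
  filter(species %in% c("Adelie", "Chinstrap"))

# One filter() with & and two chained filter() calls keep the same row
one_step <- toy_penguins |>
  filter(species %in% c("Adelie", "Chinstrap") & island == "Torgersen")
two_steps <- toy_penguins |>
  filter(species %in% c("Adelie", "Chinstrap")) |>
  filter(island == "Torgersen")

# Rows where the comparison is NA are dropped, not kept
heavy <- toy_penguins |>
  filter(body_mass_g > 4000)   # the row with a missing body mass is gone
not_adelie <- toy_penguins |>
  filter(species != "Adelie")  # the row with a missing species is gone

print(heavy)
print(not_adelie)
```

If you want to keep the rows with missing values, you have to ask for them explicitly, for example with filter(species != "Adelie" | is.na(species)).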
Your Turn
# Load Packages -----------------------------------------------------------
# Load the tidyverse package
library(tidyverse)
# Import Data -------------------------------------------------------------
# Download data from https://rfor.us/penguins
# Copy the data into the RStudio project
# Create a new R script file and add code to import your data
penguins <- read_csv("penguins.csv")
# filter() ----------------------------------------------------------------
# Use filter() to only keep female penguins
# YOUR CODE HERE
# Use filter() to only keep penguins NOT on Torgersen island
# YOUR CODE HERE
# Use filter() to only keep penguins on Torgersen island or Biscoe island
# Use the or logical operator (|) to do this
# YOUR CODE HERE
# Rewrite your filter() code above to keep the penguins from Torgersen island or Biscoe island
# This time, though, use the %in% operator
# YOUR CODE HERE
# Use a comparison operator to keep penguins with flipper lengths greater than or equal to 193 millimeters
# YOUR CODE HERE
# Drop any rows that have missing data in the flipper_length_mm variable
# Do this first with !is.na()
# YOUR CODE HERE
# Do this a second time with drop_na()
# YOUR CODE HERE
Learn More
To learn more about the filter() function, check out Chapter 3 of R for Data Science.
Have any questions? Put them below and we will help you out!
Rachel Udow • March 17, 2024
Hello! Two questions about this lesson:
Linda Thomson • March 24, 2024
Thanks for any clarification on this: How are you viewing the result of your filter in your R script window?
penguins |> filter(sex == "female") view()
Console: Use print(n = ...) to see more rows
Libby Heeren Coach • March 24, 2024
Hi, Linda! You'll need to put a pipe after your filter line in order for it to feed the results of your query to the view function.
Linda Thomson • March 24, 2024
many thanks!!
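The fix Libby describes can be sketched as follows. This is a minimal illustration on made-up data (toy_penguins is a hypothetical stand-in for the course data), not the exact code from the thread. Without a pipe between filter() and the next function, R cannot even parse the line; with the pipe, the filtered rows flow into whatever comes next, whether that is print(n = ...) or view().

```r
library(dplyr)

# Hypothetical toy data standing in for penguins.csv (an assumption)
toy_penguins <- tibble(
  sex = c("female", "male", "female", NA),
  flipper_length_mm = c(181, 186, 195, NA)
)

# The line from the question, kept as text so we can check how R reads it.
# parse() fails because two calls sit side by side with no pipe between them.
broken_code <- 'toy_penguins |> filter(sex == "female") view()'
parse_result <- tryCatch(
  parse(text = broken_code),
  error = function(e) "syntax error"
)
print(parse_result)

# With the pipe in place, the filtered rows are passed along
females <- toy_penguins |>
  filter(sex == "female")

# print(n = ...) controls how many rows show up in the console
females |>
  print(n = 2)

# View() opens a data viewer window, so only call it in an interactive session
if (interactive()) {
  toy_penguins |>
    filter(sex == "female") |>
    View()
}
```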
Derrick Watsala • March 25, 2024
Hi Coach, thanks for this interesting lesson on the tidyverse functions. I am learning a lot! However, I need to know how to save the output for reference, say after I run a filter code successfully.
David Keyes Founder • March 25, 2024
You'll learn how to do this in the Create a New Data Frame lesson! If you still have questions after reviewing that lesson, let me know.
Douglas Ndowo • April 2, 2024
Hi, is it possible to use !is.na() or drop_na() to drop the NAs from multiple variables? Let's say I wanted to drop the NAs from both the flipper_length_mm and sex variables. I've tried several codes but still can't figure it out lol
Douglas Ndowo • April 2, 2024
Figured out😀..the drop_na() does this so magically:
penguins |>
drop_na(flipper_length_mm, sex) |>
View () 🎉
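Since the question also asked about !is.na(), here is a hedged sketch of both routes side by side on made-up data (toy_penguins is hypothetical, not the course data). It relies on the fact that conditions separated by commas inside filter() are combined with "and", so a row has to pass every one of them.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for penguins.csv (an assumption)
toy_penguins <- tibble(
  species = c("Adelie", "Adelie", "Gentoo", "Chinstrap"),
  flipper_length_mm = c(181, NA, 210, NA),
  sex = c("female", "male", NA, NA)
)

# Route 1: drop_na() with several columns drops a row if any of them is NA
with_drop_na <- toy_penguins |>
  drop_na(flipper_length_mm, sex)

# Route 2: filter() with one !is.na() condition per column
with_filter <- toy_penguins |>
  filter(!is.na(flipper_length_mm), !is.na(sex))

print(with_drop_na)
print(with_filter)
```

Both routes should leave only the rows where neither column is missing; drop_na() is simply shorter to write.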
Grace Lau • September 26, 2024
Hello,
I have a question about %n%. It's not working for me. This is my code:
penguins |> filter(species %n% c("Adelie", "Chinstrap"))
I get an error message, like so:
Error in filter():
ℹ In argument: species %n% c("Adelie", "Chinstrap").
Caused by error in species %n% c("Adelie", "Chinstrap"):
! could not find function "%n%"
Run rlang::last_trace() to see where the error occurred.
Gracielle Higino Coach • September 27, 2024
Hey Grace! I know you got the answer in our live session just now, but just to keep it on record: it's a typo on your %in% operator =D you were missing the "i"
Raouf Kilada • October 8, 2024
Why do I get this error message when I use View()?
Error in is.data.frame(x) : argument "x" is missing, with no default
Gracielle Higino Coach • October 8, 2024
One possibility is that you're running the function without a mandatory argument. To use View(), you must designate which dataframe you want to see. So you can write the code like this: Or this:
Replace "dataset" by the name of your data object and it should work!
Raouf Kilada • October 8, 2024
THANK YOU....It worked
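The code from the reply above did not survive on this page. As a hedged sketch only (an assumption about what was meant, not the coach's original code), these are two common ways to hand View() the data frame it needs. The name dataset is a placeholder, as in the reply.

```r
library(dplyr)

# "dataset" is a placeholder name; swap in your own data object
dataset <- tibble(
  species = c("Adelie", "Gentoo"),
  body_mass_g = c(3750, 5000)
)

# View() opens a viewer window, so it is only called in an interactive session
if (interactive()) {
  # Option 1: pass the data frame as the argument
  View(dataset)

  # Option 2: pipe the data frame in, which supplies the same argument
  dataset |>
    View()
}
```

Calling View() with empty parentheses and nothing piped into it is the situation that produces the 'argument "x" is missing' error.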
Carlos Velez • February 17, 2025
Just a quick correction in the code shown in the video section:
We can use <, >, <=, and = > for numeric data.
Greater than or equal to should be written as >=
David Keyes Founder • February 24, 2025
You have a very keen eye, Carlos! I'm adding a note to the video for anyone else who notices this.