
Code shown in video
# Load Packages -----------------------------------------------------------

library(tidyverse)

# Import Data -------------------------------------------------------------

penguins <- read_csv("penguins.csv")

# filter() ----------------------------------------------------------------

# We use filter() to choose a subset of observations.

# We use == to select all observations that meet the criteria.

penguins |> 
  filter(species == "Adelie")

# We use != to select all observations that don't meet the criteria.

penguins |> 
  filter(species != "Adelie")

# We can combine comparisons and logical operators.

penguins |> 
  filter(species == "Adelie" | species == "Chinstrap")

# We can use %in% to collapse multiple comparisons into one.

penguins |> 
  filter(species %in% c("Adelie", "Chinstrap"))

# We can chain together multiple filter() calls.
# Doing it this way, we don't have to create complex logic in one line.

# Complicated version

penguins |> 
  filter(species %in% c("Adelie", "Chinstrap") & island == "Torgersen")

# Simpler version

penguins |> 
  filter(species %in% c("Adelie", "Chinstrap")) |> 
  filter(island == "Torgersen")

# We can use <, >, <=, and >= for numeric data.

penguins |> 
  filter(body_mass_g > 4000)

# We can drop NAs with !is.na(). 

penguins |> 
  filter(!is.na(sex))

# But the double negative is confusing.
# We can also drop NAs with drop_na().

penguins |> 
  drop_na(sex)
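
One detail the comparisons above do not spell out: filter() keeps only rows where the condition is TRUE, so rows where the comparison evaluates to NA are dropped as well. The sketch below illustrates this on a small made-up tibble. The name penguins_demo and its values are hypothetical stand-ins for the course data, chosen so the example runs without penguins.csv.

```r
library(tidyverse)

# Hypothetical toy data standing in for penguins.csv
penguins_demo <- tibble(
  species = c("Adelie", "Adelie", "Chinstrap", "Gentoo"),
  sex = c("female", NA, "male", "female"),
  body_mass_g = c(3800, 3500, NA, 5000)
)

# NA != "female" evaluates to NA, so the row with missing sex is dropped too
not_female <- penguins_demo |>
  filter(sex != "female")

# To keep the rows with missing sex, ask for them explicitly
not_female_or_missing <- penguins_demo |>
  filter(sex != "female" | is.na(sex))

# The same applies to numeric comparisons: a missing body mass never passes > 4000
heavy <- penguins_demo |>
  filter(body_mass_g > 4000)
```

If the number of rows after a != or > filter looks smaller than expected, missing values in the filtered column are a likely reason.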

Your Turn

# Load Packages -----------------------------------------------------------

# Load the tidyverse package

library(tidyverse)

# Import Data -------------------------------------------------------------

# Download data from https://rfor.us/penguins
# Copy the data into the RStudio project
# Create a new R script file and add code to import your data

penguins <- read_csv("penguins.csv")
			
# filter() ----------------------------------------------------------------

# Use filter() to only keep female penguins

# YOUR CODE HERE

# Use filter() to only keep penguins NOT on Torgersen island

# YOUR CODE HERE

# Use filter() to only keep penguins on Torgersen island or Biscoe island
# Use the or logical operator (|) to do this

# YOUR CODE HERE

# Rewrite your filter() code above to keep the penguins from Torgersen island or Biscoe island
# This time, though, use the %in% operator

# YOUR CODE HERE

# Use a comparison operator to keep penguins with flipper lengths greater than or equal to 193 millimeters

# YOUR CODE HERE

# Drop any rows that have missing data in the flipper_length_mm variable

# Do this first with !is.na()

# YOUR CODE HERE

# Do this a second time with drop_na()

# YOUR CODE HERE

Learn More

To learn more about the filter() function, check out Chapter 3 of R for Data Science.

Have any questions? Put them below and we will help you out!

Rachel Udow

March 17, 2024

Hello! Two questions about this lesson:

  1. Why is it required to use the summarize() function before using the more specific summary functions (e.g., mean())?
  2. Does the "rm" in "na.rm" stand for anything? Just asking as it might help me remember that argument if so. Thank you!

Linda Thomson

March 24, 2024

Thanks for any clarification on this: How are you viewing the result of your filter in your R script window?

penguins |> filter(sex == "female") view()

Console: Use print(n = ...) to see more rows

view() Error in view() : argument "x" is missing, with no default

Libby Heeren Coach

March 24, 2024

Hi, Linda! You'll need to put a pipe after your filter line in order for it to feed the results of your query to the view function.
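
As a sketch of the fix described above: adding |> after the filter() line passes the filtered rows to view() as its first argument. The tibble penguins_demo is a hypothetical stand-in for the course data, and view() is only called in an interactive session because it opens the data viewer.

```r
library(tidyverse)

# Hypothetical toy data standing in for penguins.csv
penguins_demo <- tibble(
  species = c("Adelie", "Chinstrap", "Gentoo"),
  sex = c("female", "male", "female")
)

# When view() sits on its own line with no pipe before it, it is called
# with no data, which produces the 'argument "x" is missing' error.
# With the pipe, the filtered rows become the first argument of view().
if (interactive()) {
  penguins_demo |>
    filter(sex == "female") |>
    view()
}

# Outside RStudio, printing shows the same rows in the console
female_penguins <- penguins_demo |>
  filter(sex == "female")

print(female_penguins)
```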

Linda Thomson

March 24, 2024

many thanks!!

Derrick Watsala

March 25, 2024

Hi Coach, thanks for this interesting lesson on the tidyverse functions. I am learning a lot! However, I need to know how to save the output for reference, say after I run filter code successfully.

David Keyes Founder

March 25, 2024

You'll learn how to do this in the Create a New Data Frame lesson! If you still have questions after reviewing that lesson, let me know.

Douglas Ndowo

April 2, 2024

Hi, is it possible to use !is.na() or drop_na() to drop the NAs from multiple variables? Let's say I wanted to drop the NAs from both the flipper_length_mm and sex variables. I've tried several versions of the code but still can't figure it out lol

Douglas Ndowo

April 2, 2024

Figured it out 😀. drop_na() does this so magically:

penguins |>
  drop_na(flipper_length_mm, sex) |>
  View()

🎉
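
For comparison, the same rows can be kept with !is.na() by giving filter() one condition per variable. This is a minimal sketch on a made-up tibble; penguins_demo and its values are hypothetical, not the course data.

```r
library(tidyverse)

# Hypothetical toy data with missing values in both columns
penguins_demo <- tibble(
  flipper_length_mm = c(181, NA, 195, 210),
  sex = c("female", "male", NA, "male")
)

# drop_na() removes rows with an NA in any of the listed columns
with_drop_na <- penguins_demo |>
  drop_na(flipper_length_mm, sex)

# The filter() version: comma-separated conditions must all be TRUE
with_filter <- penguins_demo |>
  filter(!is.na(flipper_length_mm), !is.na(sex))

# Both approaches keep only the rows that are complete in both columns
nrow(with_drop_na) == nrow(with_filter)
```

With more than a couple of variables, the drop_na() version stays shorter, which is one reason to prefer it.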