filter()
This lesson is called filter(), part of the Fundamentals of R course.
View code shown in video
# Load Packages -----------------------------------------------------------
library(tidyverse)
# Import Data -------------------------------------------------------------
penguins <- read_csv("penguins.csv")
# filter() ----------------------------------------------------------------
# We use filter() to choose a subset of observations.
# We use == to select all observations that meet the criteria.
penguins |>
  filter(species == "Adelie")
# We use != to select all observations that don't meet the criteria.
penguins |>
  filter(species != "Adelie")
# We can combine comparisons and logical operators.
penguins |>
  filter(species == "Adelie" | species == "Chinstrap")
# We can use %in% to collapse multiple comparisons into one.
penguins |>
  filter(species %in% c("Adelie", "Chinstrap"))
# We can chain together multiple filter functions.
# Doing it this way, we don't have to create complex logic in one line.
# Complicated version
penguins |>
  filter(species %in% c("Adelie", "Chinstrap") & island == "Torgersen")
# Simpler version
penguins |>
  filter(species %in% c("Adelie", "Chinstrap")) |>
  filter(island == "Torgersen")
# We can use <, >, <=, and >= for numeric data.
penguins |>
  filter(body_mass_g > 4000)
# We can drop NAs with !is.na().
penguins |>
  filter(!is.na(sex))
# But the double negative is confusing.
# We can also drop NAs with drop_na().
penguins |>
  drop_na(sex)
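One detail worth keeping in mind alongside the NA examples above: filter() keeps only rows where the condition is TRUE, so rows where a comparison evaluates to NA are dropped as well. The sketch below illustrates this with a small made-up tibble (penguins_small is a hypothetical stand-in, not the course data).

```r
library(tidyverse)

# Hypothetical four-row stand-in for the penguins data (values are made up)
penguins_small <- tibble(
  species = c("Adelie", "Adelie", "Gentoo", "Chinstrap"),
  sex = c("female", NA, "male", NA)
)

# NA != "female" evaluates to NA, so the two rows with a missing sex
# are dropped along with the female row
not_female <- penguins_small |>
  filter(sex != "female")

# To keep the rows with missing values, ask for them explicitly
not_female_or_missing <- penguins_small |>
  filter(sex != "female" | is.na(sex))
```

Here not_female keeps only the male row, while not_female_or_missing keeps the male row and both rows with a missing sex.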
Your Turn
# Load Packages -----------------------------------------------------------
# Load the tidyverse package
library(tidyverse)
# Import Data -------------------------------------------------------------
# Download data from https://rfor.us/penguins
# Copy the data into the RStudio project
# Create a new R script file and add code to import your data
penguins <- read_csv("penguins.csv")
# filter() ----------------------------------------------------------------
# Use filter() to only keep female penguins
# YOUR CODE HERE
# Use filter() to only keep penguins NOT on Torgersen island
# YOUR CODE HERE
# Use filter() to only keep penguins on Torgersen island or Biscoe island
# Use the or logical operator (|) to do this
# YOUR CODE HERE
# Rewrite your filter() code above to keep the penguins from Torgersen island or Biscoe island
# This time, though, use the %in% operator
# YOUR CODE HERE
# Use a comparison operator to keep penguins with flipper lengths greater than or equal to 193 millimeters
# YOUR CODE HERE
# Drop any rows that have missing data in the flipper_length_mm variable
# Do this first with !is.na()
# YOUR CODE HERE
# Do this a second time with drop_na()
# YOUR CODE HERE
Learn More
To learn more about the filter() function, check out Chapter 3 of R for Data Science.
Have any questions? Put them below and we will help you out!
Rachel Udow • March 17, 2024
Hello! Two questions about this lesson:
Linda Thomson • March 24, 2024
Thanks for any clarification on this: How are you viewing the result of your filter in your R script window?
penguins |> filter(sex == "female") view()
Console: Use print(n = ...) to see more rows
Libby Heeren Coach • March 24, 2024
Hi, Linda! You'll need to put a pipe after your filter line in order for it to feed the results of your query to the view function.
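As a sketch of the fix described above, the pipe goes between filter() and view(). The small tibble here is a made-up stand-in for the course data, and the view() call is wrapped in interactive() only so the snippet also runs outside an interactive session such as RStudio.

```r
library(tidyverse)

# Hypothetical stand-in for the course's penguins data (values are made up)
penguins <- tibble(
  species = c("Adelie", "Gentoo", "Chinstrap"),
  sex = c("female", "male", "female")
)

# The filter step on its own returns the female rows
female_penguins <- penguins |>
  filter(sex == "female")

# Each step is joined with a pipe, including the step into view()
if (interactive()) {
  penguins |>
    filter(sex == "female") |>
    view()
}
```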
Linda Thomson • March 24, 2024
many thanks!!
Derrick Watsala • March 25, 2024
Hi Coach, thanks for this interesting lesson on the tidyverse functions. I am learning a lot! However, I need to know how to save the output for reference, say after I run a filter code successfully.
David Keyes Founder • March 25, 2024
You'll learn how to do this in the Create a New Data Frame lesson! If you still have questions after reviewing that lesson, let me know.
Douglas Ndowo • April 2, 2024
Hi, is it possible to use !is.na() or drop_na() to drop the NAs from multiple variables? Let's say I wanted to drop the NAs from both the flipper_length_mm and sex variables. I've tried several codes but still can't figure it out lol
Douglas Ndowo • April 2, 2024
Figured out😀..the drop_na() does this so magically:
penguins |>
drop_na(flipper_length_mm, sex) |>
View () 🎉
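For comparison, the same result can be reached with !is.na() by giving filter() one condition per variable. This is a minimal sketch on a made-up tibble (penguins_small and its values are hypothetical, not the course data).

```r
library(tidyverse)

# Hypothetical stand-in with missing values in both columns
penguins_small <- tibble(
  flipper_length_mm = c(181, NA, 195, 210),
  sex = c("male", "female", NA, "female")
)

# drop_na() removes rows with an NA in any of the listed columns
with_drop_na <- penguins_small |>
  drop_na(flipper_length_mm, sex)

# Conditions separated by commas in filter() must all be TRUE,
# so this keeps only rows where neither column is missing
with_filter <- penguins_small |>
  filter(!is.na(flipper_length_mm), !is.na(sex))
```

Both versions keep the same two complete rows.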
Grace Lau • September 26, 2024
Hello,
I have a question about %n%. It's not working for me. This is my code:
penguins |> filter(species %n% c("Adelie", "Chinstrap"))
I get an error message, like so:
Error in filter():
ℹ In argument: species %n% c("Adelie", "Chinstrap").
Caused by error in species %n% c("Adelie", "Chinstrap"):
! could not find function "%n%"
Run rlang::last_trace() to see where the error occurred.
Gracielle Higino Coach • September 26, 2024
Hey Grace! I know you got the answer in our live session just now, but just to keep it on record: it's a typo on your %in% operator =D you were missing the "i"
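With the operator spelled %in%, the same filter runs without the error. The sketch below uses a small made-up tibble (penguins_small is a hypothetical stand-in for the course data).

```r
library(tidyverse)

# Hypothetical stand-in for the penguins data (values are made up)
penguins_small <- tibble(
  species = c("Adelie", "Gentoo", "Chinstrap", "Gentoo")
)

# %in% keeps rows whose species matches any value in the vector
adelie_or_chinstrap <- penguins_small |>
  filter(species %in% c("Adelie", "Chinstrap"))
```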
Raouf Kilada • October 8, 2024
Why do I get this error message when I use View()? Error in is.data.frame(x) : argument "x" is missing, with no default
Gracielle Higino Coach • October 8, 2024
One possibility is that you're running the function without a mandatory argument. To use View(), you must designate which dataframe you want to see. So you can write the code like this: Or this:
Replace "dataset" by the name of your data object and it should work!
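As an illustration only, and not necessarily the exact code from the reply, here are two forms that give View() the data frame it requires. The object dataset is a placeholder built from made-up values, and the calls are wrapped in interactive() so the snippet also runs outside an interactive session.

```r
library(tidyverse)

# "dataset" is a placeholder name; replace it with your own data object
dataset <- tibble(x = 1:3)

if (interactive()) {
  # Pass the data frame as the argument
  View(dataset)

  # Or pipe the data frame into View()
  dataset |>
    View()
}
```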
Raouf Kilada • October 8, 2024
THANK YOU....It worked