Your Turn

Complete the filter sections of the data-wrangling-and-analysis-exercises.Rmd file.

Learn More

General Data Wrangling and Analysis Resources

Because most material that discusses data wrangling and analysis with the dplyr package covers all of the verbs discussed in this course together, I have chosen not to separate the resources by lesson. Instead, here are some helpful resources for learning more about all of the tidyverse verbs discussed in this course:

Chapter 5 of R for Data Science

RStudio Cloud primer on working with data

Tidyverse for Beginners by Danielle Navarro

Learning Statistics with R by Danielle Navarro

Introduction to the Tidyverse by Alison Hill

A gRadual intRoduction to data wRangling by Chester Ismay and Ted Laderas

Working in the Tidyverse by Desi Quintans and Jeff Powell

Christine Monnier video tutorials on dplyr

Have any questions? Put them below and we will help you out!

Catherine Roller White

March 27, 2021

There is a line in the data-wrangling-and-analysis-exercises.Rmd file that says: "We can use Use , for numeric data." I think the placement of the equals sign needs to change for "greater than or equal to" (i.e., '=>' should be '>=').

David Keyes

March 29, 2021

Thanks! Totally a typo. Fixed now. :)

Vuk Sekicki

March 27, 2021

nhanes %>%
    filter(drop_na(days_phys_hlth_bad)) %>%
    select(phys_active_days, days_phys_hlth_bad)

returns:

Error: Problem with filter() input ..1.
x no applicable method for 'drop_na' applied to an object of class "c('double', 'numeric')"
ℹ Input ..1 is drop_na(days_phys_hlth_bad).
Run rlang::last_error() to see where the error occurred.

David Keyes

March 29, 2021

You've got a slight issue with your code. You don't need the filter() around the drop_na(). You can just do this:

nhanes %>%
    drop_na(days_phys_hlth_bad) %>%
    select(phys_active_days, days_phys_hlth_bad)
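
For anyone comparing the two approaches, here is a minimal sketch of why the fix works. It assumes a small made-up data frame (nhanes_toy is hypothetical, not the course data): drop_na() takes the data frame itself, while filter() needs a TRUE/FALSE test for each row, such as !is.na().

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the course's nhanes data
nhanes_toy <- tibble(
  phys_active_days   = c(3, NA, 5, 2),
  days_phys_hlth_bad = c(0, 4, NA, 10)
)

# drop_na() receives the data frame from the pipe, so it sits directly in it
with_drop_na <- nhanes_toy %>%
  drop_na(days_phys_hlth_bad) %>%
  select(phys_active_days, days_phys_hlth_bad)

# filter() expects a logical test per row; !is.na() provides one
with_filter <- nhanes_toy %>%
  filter(!is.na(days_phys_hlth_bad)) %>%
  select(phys_active_days, days_phys_hlth_bad)
```

Both pipelines keep the same three rows in this toy example.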

Vuk Sekicki

March 30, 2021

Silly mistake. Thank you!

Daniel Dunleavy

March 28, 2022

I made this mistake too and then realized what was wrong.

Will drop_na only filter out cells that are listed as NA? Or will it also drop empty cells in a given column?

And further, will R automatically consider NA to be empty/blank? Or do we need to specify this beforehand in some way?

Charlie Hadley

March 28, 2022

Hi Daniel!

NA in the R language explicitly means missing. If you have an Excel worksheet with empty cells, these will become NA when imported into R. By default, the drop_na() function drops every row that contains an NA in any column, but you can target specific columns by naming them, e.g. drop_na(shipment_date).

Let me know if this doesn't answer your question! Cheers, Charlie
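
As a rough illustration of the two behaviours described above, here is a sketch using a tiny inline CSV instead of an Excel file. The orders data and its column names other than shipment_date are made up, and it assumes readr 2.0 or later so that I() can mark literal data.

```r
library(readr)
library(tidyr)

# Hypothetical data: one blank shipment_date and one blank notes cell
orders <- read_csv(
  I("id,shipment_date,notes\n1,2022-03-01,ok\n2,,late\n3,2022-03-05,\n"),
  show_col_types = FALSE
)

# Blank cells were read in as NA
sum(is.na(orders$shipment_date))

# No columns named: drop rows with an NA in any column
complete_rows <- drop_na(orders)

# Column named: only NAs in shipment_date cause a row to be dropped
has_shipment_date <- drop_na(orders, shipment_date)
```

In this toy example, complete_rows keeps only the first order, while has_shipment_date keeps the first and third.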

Ellen Wilson

October 6, 2022

This was my question, too. What if Excel has a hyphen for a missing value? Does that also translate to NA in R?

Hana Hanfi

April 7, 2021

I'm getting an error that "drop_na" wasn't found even though I previously loaded "dplyr" into the library. Should I have loaded another function?

David Keyes

April 8, 2021

Did you load dplyr in the same session? If you restarted RStudio at any point you'll need to run library(dplyr) again.

Hana Hanfi

April 13, 2021

Yes that was the issue, thanks!

Zach Tilton

August 19, 2021

Received an error code on the last exercise:

Error in drop_na(., days_phys_hlth_bad) : could not find function "drop_na"

dplyr is loaded. Not sure what is happening here. Moving forward with !is.na for the time being. Thanks.

David Keyes

August 19, 2021

That's super weird and I'm not quite sure how to explain it. But glad you figured out a workaround!

jason thompson

May 6, 2022

I'm experiencing the exact same issue, so you aren't alone here.

Charlie Hadley

May 9, 2022

Hello Jason and Zach,

drop_na() is a function from {tidyr}, not {dplyr}, so loading dplyr alone is not enough. If you load the tidyverse package using

library(tidyverse)

this will load both packages (and several others).
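
If you want to check this yourself, here is a small sketch. It assumes both tidyr and dplyr are installed, and the variable names are just for illustration.

```r
# Ask each package whether it exports a function called drop_na
in_tidyr <- "drop_na" %in% getNamespaceExports("tidyr")
in_dplyr <- "drop_na" %in% getNamespaceExports("dplyr")

# With the package prefix, drop_na() works even without library(tidyr)
kept <- tidyr::drop_na(data.frame(x = c(1, NA)))
```

Here in_tidyr is TRUE and in_dplyr is FALSE, which is why library(dplyr) on its own leads to the "could not find function" error.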

Tatiana Bustos

July 28, 2022

I ran into this same issue and had to load the tidyverse library again. However, I never actually exited out of my exercise sheet. Is it common for R to require installations within the same session even though you haven't stopped? I loaded it in earlier sessions.

David Keyes

July 29, 2022

You shouldn't have to install a package each session, but you do need to load all packages each session using library(tidyverse) (or whatever package you're using).

Ekerette Udoh

September 30, 2021

nhanes %>% filter(Education == "College Grad") %>% select(Education, MaritalStatus)

When I run the code above, Education and MaritalStatus display just fine. But when I add the second filter function as below, I get an error message.

nhanes %>% filter(Education == "College Grad") %>% filter(MaritalStatus %n% c("Divorced", "Separated", "Widowed")) %>% select(Education, MaritalStatus)

David Keyes

I think you've got a typo: your code has %n% instead of %in%. Let me know if that fixes it!

Ekerette Udoh

September 30, 2021

Thanks David, I edited it as you suggested to correct second filter line using %in%.

Nathan Welsch

January 17, 2022

Is there a "rule of thumb" or useful way of remembering what needs quotes?

David Keyes

January 18, 2022

My basic rule of thumb is this:

If it's a variable name and we're using it with tidyverse functions like filter() or select(), then there's no need for quotes. For example, select(variable1, variable2). It gets way more complicated than that, though. If you really want to nerd out, check out this article on what's called non-standard evaluation.

If you're looking for text, then you do need quotes. So, in filter(state == "Oregon"), Oregon needs to be in quotes because we're looking for the text Oregon.

Hope that helps!
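
Here is a minimal sketch of that rule of thumb, using a made-up survey data frame (the data and object names are hypothetical). It also shows what happens when the quotes are left off the text.

```r
library(dplyr)

# Hypothetical data for illustration
survey <- tibble(
  state = c("Oregon", "Idaho", "Oregon"),
  age   = c(34, 51, 28)
)

# Column names (state, age): no quotes. Text to match ("Oregon"): quotes.
oregon_only <- survey %>%
  filter(state == "Oregon") %>%
  select(state, age)

# Without quotes, R looks for an object named Oregon and fails,
# so we catch the error here instead of stopping the script
unquoted <- tryCatch(
  survey %>% filter(state == Oregon),
  error = function(e) "error"
)
```

In this sketch oregon_only has two rows, and unquoted ends up holding "error" because no object called Oregon exists.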

Nathan Welsch

January 19, 2022

That does help! Thank you, David!

Daniel Dunleavy

March 27, 2022

That's helpful! I was having the same question while running through the code this week.

Josh Gutwill

October 4, 2022

I'm wondering if you know why one needs to concatenate multiple observations in the filter function. For example, here's a correct use of filter:

filter(marital_status %in% c("Divorced", "Separated", "Widowed"))

From what I understand of the concatenate function (from Excel), wouldn't that basically mean R is looking at each observation in marital_status and seeing if that observation can be found anywhere inside "DivorcedSeparatedWidowed"? But when I try it without concatenating by doing this:

filter(marital_status %in% ("DivorcedSeparatedWidowed"))

I get an error message. Maybe the answer is "That's just how R works," but I'm trying to build a solid mental model for these functions, so if there's anything more to it, I'd love to better understand. Thanks!

David Keyes

October 5, 2022

This is a bit complicated to explain in text form, but in your second example, R is only looking for the exact text "DivorcedSeparatedWidowed". If a cell doesn't contain exactly that text, its row is dropped, so you end up with no rows. With the correct use, R looks for "Divorced" or "Separated" or "Widowed" in the cells. If a cell matches any of those, its row is kept. Hope that helps!
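
A small base R sketch may help with the mental model. Unlike Excel's concatenate, c() does not paste text together; it combines values into a vector of separate items, and %in% checks each value on the left against every item on the right. The marital_status values below are made up for illustration.

```r
# c() builds a vector of three separate strings
statuses <- c("Divorced", "Separated", "Widowed")
length(statuses)  # 3

# Hypothetical column values
marital_status <- c("Married", "Divorced", "Widowed", "NeverMarried")

# For each value: is it exactly equal to any item in statuses?
matches_vector <- marital_status %in% statuses
# FALSE TRUE TRUE FALSE

# A single pasted-together string only matches that exact text
matches_single <- marital_status %in% "DivorcedSeparatedWidowed"
# FALSE FALSE FALSE FALSE
```

Because filter() keeps only the rows where the test is TRUE, the second version keeps no rows at all.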

Ellen Wilson

October 6, 2022

I'm puzzled because in your video, it looks like there are code chunks that say "Your code here" (or something like that), but when I go to the file, the code is already complete for all the code chunks. This has actually been the case for the past few sections, and looks like it is for the next one, too.

Ellen Wilson

October 6, 2022

I figured out that there are different RMD files, and I was in the examples one instead of the exercises one. File management is something I struggle with. It would help a lot if you could say something about how to see your different RMD files and how to navigate between them. I just went to my Finder and found the fundamentals-master folder, and saw that there was an examples file and an exercises file, but is there a way to do this within RStudio?

Charlie Hadley

October 6, 2022

Hi Ellen,

You can use the Files tab inside of RStudio to explore the files and folders inside of the RStudio project you're currently working with. You should open each RStudio project in its own RStudio window, so you should have one project for working through the exercises and a separate one for the completed code that you've downloaded.

Thanks, Charlie

Julieth Silao

November 6, 2022

nhanes %>%
  filter(education == "Collage Grad") %>%
  filter(marital_status %in% c("Divorced","Separated", "Widowed")) %>%
  select(education, marital_status)

# A tibble: 0 × 2

Hello David,

When I filter education and marital status, I don't get a table. Instead I get "A tibble: 0 × 2". Can you help?

David Keyes

November 6, 2022

Check your spelling please. It's "College" (not "Collage").

Julieth Silao

November 7, 2022

Doh, thank you

Chelsea Ruder

March 29, 2023

I can't seem to get the drop_na() or !is.na() functions to work. oregon_respondents % select_all() %>% filter(!is.na(ip_address))

When I run this code, there are still blanks for ip_address.

Chelsea Ruder

March 29, 2023

Sorry the code didn't copy completely. Here it is.

oregon_respondents % select_all() %>% filter(!is.na(ip_address))

David Keyes

March 29, 2023

I'd suggest trying the drop_na() function instead of !is.na(). Let me know if that helps!
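
One possible explanation, which is only a guess because the data isn't shown here: the blanks may be stored as empty text ("") rather than as NA. In that case neither drop_na() nor !is.na() would remove them, since both only look for NA. A minimal sketch with a made-up respondents_toy data frame, using na_if() from dplyr to convert empty text to NA first:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: one real value, one empty string, one true NA
respondents_toy <- tibble(
  respondent = c("a", "b", "c"),
  ip_address = c("10.0.0.1", "", NA)
)

# drop_na() removes only the true NA; the empty string stays
still_blank <- respondents_toy %>%
  drop_na(ip_address)

# Convert "" to NA first, then drop
cleaned <- respondents_toy %>%
  mutate(ip_address = na_if(ip_address, "")) %>%
  drop_na(ip_address)
```

In this toy example still_blank has two rows (one with a blank ip_address) and cleaned has one.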