This lesson is called filter, part of the R in 3 Months (Fall 2022) course.
Your Turn
Solution
# filter

We use `filter` to choose a subset of cases.

Use `filter` to keep only respondents who are divorced. Then, use `select` to show only the `marital_status` variable.

```{r}
nhanes %>%
  filter(marital_status == "Divorced") %>%
  select(marital_status)
```

Use `filter` to keep only respondents who are **not** divorced. Then, use `select` to show only the `marital_status` variable.

```{r}
nhanes %>%
  filter(marital_status != "Divorced") %>%
  select(marital_status)
```

Use `filter` to keep only respondents who are divorced or separated. Then, use `select` to show only the `marital_status` variable.

```{r}
nhanes %>%
  filter(marital_status == "Divorced" | marital_status == "Separated") %>%
  select(marital_status)
```

Use `%in%` within the `filter` function to keep only those who are divorced, separated, or widowed. Then, use `select` to show only the `marital_status` variable.

```{r}
nhanes %>%
  filter(marital_status %in% c("Divorced", "Separated", "Widowed")) %>%
  select(marital_status)
```

We can chain together multiple `filter` functions. Doing it this way, we don't have to create complex logic in one line.

Create a chain that first keeps only those who are college grads. Then, `filter` to keep only those who are divorced, separated, or widowed. Finally, use `select` to show only the `education` and `marital_status` variables.

```{r}
nhanes %>%
  filter(education == "College Grad") %>%
  filter(marital_status %in% c("Divorced", "Separated", "Widowed")) %>%
  select(education, marital_status)
```

We can use `<`, `>`, `<=`, and `>=` for numeric data.

Use `filter` to show only those who reported at least 5 days of physical activity in the last 30 days (this is the `phys_active_days` variable). Then, use `select` to keep only the `phys_active_days` and the `days_phys_hlth_bad` variables.

```{r}
nhanes %>%
  filter(phys_active_days >= 5) %>%
  select(phys_active_days, days_phys_hlth_bad)
```

We can drop `NAs` with `!is.na`.

Do the same thing as above, but drop responses that don't have a response for `days_phys_hlth_bad`. Then, use `select` to keep only the `phys_active_days` and the `days_phys_hlth_bad` variables.

```{r}
nhanes %>%
  filter(phys_active_days >= 5) %>%
  filter(!is.na(days_phys_hlth_bad)) %>%
  select(phys_active_days, days_phys_hlth_bad)
```

You can also drop `NAs` with `drop_na`.

Do the same thing as above, but use `drop_na` instead of `!is.na`. Make sure you get the same result!

```{r}
nhanes %>%
  filter(phys_active_days >= 5) %>%
  drop_na(days_phys_hlth_bad) %>%
  select(phys_active_days, days_phys_hlth_bad)
```
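The exercises above use the course's `nhanes` data. The sketch below is self-contained and does not need that data. It uses a small made-up tibble whose column names merely mirror the exercises, and it checks a few equivalences the lesson relies on:

- `|` and `%in%` select the same rows.
- Chained `filter` calls give the same result as one `filter` with comma-separated conditions.
- `!is.na` and `drop_na` give the same result.

It also shows that `filter` keeps only rows where the condition is `TRUE`, so a row whose comparison evaluates to `NA` is dropped as well.

```r
library(dplyr)  # tibble(), filter(), select(), %>%
library(tidyr)  # drop_na()

# Made-up data; only the column names mirror the nhanes exercises
toy <- tibble(
  education          = c("College Grad", "College Grad", "High School", "College Grad", "College Grad"),
  marital_status     = c("Divorced", "Married", "Widowed", "Separated", "Widowed"),
  phys_active_days   = c(7, 2, 5, NA, 10),
  days_phys_hlth_bad = c(0, 3, NA, 1, NA)
)

# `|` and `%in%` select the same rows
with_or <- toy %>%
  filter(marital_status == "Divorced" | marital_status == "Separated")
with_in <- toy %>%
  filter(marital_status %in% c("Divorced", "Separated"))

# Chained filter() calls act like one filter() whose comma-separated conditions are combined with AND
chained <- toy %>%
  filter(education == "College Grad") %>%
  filter(marital_status %in% c("Divorced", "Separated", "Widowed"))
combined <- toy %>%
  filter(education == "College Grad",
         marital_status %in% c("Divorced", "Separated", "Widowed"))

# filter() keeps only TRUE rows; NA >= 5 is NA, so the row with a missing value is dropped
active <- toy %>%
  filter(phys_active_days >= 5)

# !is.na() inside filter() and drop_na() as its own step give the same result
via_is_na <- active %>%
  filter(!is.na(days_phys_hlth_bad)) %>%
  select(phys_active_days, days_phys_hlth_bad)
via_drop_na <- active %>%
  drop_na(days_phys_hlth_bad) %>%
  select(phys_active_days, days_phys_hlth_bad)
```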
Complete the filter sections of the data-wrangling-and-analysis-exercises.Rmd file.
Learn More
General Data Wrangling and Analysis Resources
Because most material that discusses data wrangling and analysis with the dplyr package covers all of the verbs discussed in this course, I have chosen not to separate the resources by lesson. Instead, here are some helpful resources for learning more about all of the tidyverse verbs discussed in this course:
Catherine Roller White • March 27, 2021
There is a line in the data-wrangling-and-analysis-exercises.Rmd file that says: "We can use Use `<`, `>`, `<=`, and `=>` for numeric data." I think the placement of the equals sign may need to change for "greater than or equal to" (i.e., '=>' should be '>=').
Vuk Sekicki • March 27, 2021
returns:
Error: Problem with filter() input ..1.
x no applicable method for 'drop_na' applied to an object of class "c('double', 'numeric')"
ℹ Input ..1 is drop_na(days_phys_hlth_bad).
Run rlang::last_error() to see where the error occurred.
Hana Hanfi • April 7, 2021
I'm getting an error that "drop_na" wasn't found even though I previously loaded "dplyr" into the library. Should I have loaded another function?
Zach Tilton • August 19, 2021
Received an error code on the last exercise:
Error in drop_na(., days_phys_hlth_bad) : could not find function "drop_na"
dplyr is loaded. Not sure what is happening here. Moving forward with !is.na for the time being. Thanks.
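Two things likely explain the `drop_na` errors in the comments above.

First, `drop_na()` is part of the tidyr package, not dplyr. Loading only dplyr therefore gives `could not find function "drop_na"`. Loading tidyr (or the whole tidyverse) fixes that.

Second, `drop_na()` expects a data frame and is meant to be its own pipeline step. Calling it inside `filter()` hands it a single numeric column instead, which produces a "no applicable method for 'drop_na'" error like the one quoted above.

The sketch below illustrates both points on a made-up tibble:

```r
library(dplyr)  # tibble(), filter(), %>%
library(tidyr)  # drop_na() lives here, not in dplyr

# Made-up values; only the column names mirror the exercise
toy <- tibble(
  phys_active_days   = c(7, 5, 10),
  days_phys_hlth_bad = c(0, NA, 2)
)

# Works: drop_na() is its own step and receives the whole data frame from the pipe
kept <- toy %>%
  filter(phys_active_days >= 5) %>%
  drop_na(days_phys_hlth_bad)

# Fails: inside filter(), drop_na() is handed a numeric column, not a data frame
err <- tryCatch(
  toy %>% filter(drop_na(days_phys_hlth_bad)),
  error = function(e) conditionMessage(e)
)
```

If `drop_na` still cannot be found, `!is.na()` inside `filter()` is an equivalent workaround, as Zach did.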
Ekerette Udoh • September 30, 2021
nhanes %>% filter(Education == "College Grad") %>% select(Education, MaritalStatus)
When running the code above Education and MaritalStatus displays just fine. But when I add the second filter function as below I get an error message.
nhanes %>% filter(Education == "College Grad") %>% filter(MaritalStatus %n% c("Divorced", "Separated", "Widowed")) %>% select(Education, MaritalStatus)
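A likely culprit in the second snippet is `%n%`. It is not an operator in base R or dplyr, so R fails when it tries to find a function called `%n%`. The membership operator is `%in%`.

Here is a small sketch with made-up rows that reuse the column names from the comment:

```r
library(dplyr)

# Made-up rows using the column names from the comment above
people <- tibble(
  Education     = c("College Grad", "College Grad", "High School"),
  MaritalStatus = c("Divorced", "Married", "Widowed")
)

# `%n%` is not defined, so evaluating it fails (R cannot find a function named "%n%")
typo <- tryCatch(
  people %>% filter(MaritalStatus %n% c("Divorced", "Separated", "Widowed")),
  error = function(e) conditionMessage(e)
)

# `%in%` is the membership operator
fixed <- people %>%
  filter(Education == "College Grad") %>%
  filter(MaritalStatus %in% c("Divorced", "Separated", "Widowed")) %>%
  select(Education, MaritalStatus)
```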
Nathan Welsch • January 17, 2022
Is there a "rule of thumb" or useful way of remembering what needs quotes?
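One workable rule of thumb for these exercises:

- Inside dplyr verbs such as `filter()`, column names are usually written bare (no quotes), because they refer to variables in the data.
- Text values you compare against go in quotes, because they are character strings.
- Numbers take no quotes.

A sketch on a made-up tibble:

```r
library(dplyr)

# Made-up data; column names mirror the lesson
toy <- tibble(
  marital_status   = c("Divorced", "Married", "Widowed"),
  phys_active_days = c(7, 2, NA)
)

# Column name bare, text value in quotes
divorced <- toy %>% filter(marital_status == "Divorced")

# Column name bare, number bare (the NA row is dropped because NA >= 5 is NA)
active <- toy %>% filter(phys_active_days >= 5)

# Quoting the column name compares two fixed strings, which is simply FALSE here
quoted_col <- toy %>% filter("marital_status" == "Divorced")

# Leaving the value unquoted makes R look for an object named Divorced, which does not exist
unquoted_val <- tryCatch(
  toy %>% filter(marital_status == Divorced),
  error = function(e) conditionMessage(e)
)
```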
Josh Gutwill • October 4, 2022
I'm wondering if you know why one needs to concatenate multiple observations in the filter function. For example, here's a correct use of filter: filter(marital_status %in% c("Divorced", "Separated", "Widowed")) From what I understand of the concatenate function (from Excel), wouldn't that basically mean R is looking at each observation in marital_status and seeing if that observation can be found anywhere inside "DivorcedSeparatedWidowed"? But when I try it without concatenating by doing this: filter(marital_status %in% ("DivorcedSeparatedWidowed")) I get an error message. Maybe the answer is "That's just how R works," but I'm trying to build a solid mental model for these functions, so if there's anything more to it, I'd love to better understand. Thanks!
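Part of the mental model is that `c()` in R is not Excel's CONCATENATE. It combines values into a vector, so `c("Divorced", "Separated", "Widowed")` is three separate strings, not one long string. Gluing strings together is what `paste0()` does.

`%in%` then asks, for each value on its left, whether that value exactly equals any element on its right. It does not search for substrings.

A base-R sketch:

```r
# c() builds a vector of three separate strings
statuses <- c("Divorced", "Separated", "Widowed")
n_statuses <- length(statuses)   # 3

# paste0() is the closer analogue of Excel's CONCATENATE: one glued string
glued <- paste0("Divorced", "Separated", "Widowed")   # "DivorcedSeparatedWidowed"

# %in% checks each left-hand value for an exact match among the right-hand elements
values <- c("Divorced", "Married", "Widowed")
in_vector <- values %in% statuses   # TRUE FALSE TRUE
in_glued  <- values %in% glued      # FALSE FALSE FALSE: no substring matching
```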
Ellen Wilson • October 6, 2022
I'm puzzled because in your video, it looks like there are code chunks that say "Your code here" (or something like that), but when I go to the file, the code is already complete for all the code chunks. This has actually been the case for the past few sections, and looks like it is for the next one, too.
Julieth Silao • November 6, 2022
Hello David,
When I filter education and marital status, I don't get any table; instead I get "A tibble: 0 × 2". Can you help?
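A `A tibble: 0 × 2` result usually means the code ran but no rows satisfied every condition, since `filter()` keeps only rows where all conditions are `TRUE`. String comparisons are exact and case-sensitive, so a value typed as `"College grad"` will not match `"College Grad"`. Listing the values that actually occur, for example with `distinct()`, is one way to check.

A sketch on made-up data:

```r
library(dplyr)

# Made-up data; column names mirror the lesson
toy <- tibble(
  education      = c("College Grad", "High School", "College Grad"),
  marital_status = c("Divorced", "Married", "Widowed")
)

# Case mismatch: "College grad" is not equal to "College Grad", so no rows survive
none <- toy %>%
  filter(education == "College grad") %>%
  select(education, marital_status)

# List the values that really occur, then filter on an exact match
levels_seen <- toy %>% distinct(education)
some <- toy %>%
  filter(education == "College Grad") %>%
  select(education, marital_status)
```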
Chelsea Ruder • March 29, 2023
I can't seem to get the drop_na or !is.na functions to work. oregon_respondents % select_all() %>% filter(!is.na(ip_address))
When I run this code, there are still blanks for ip_address.
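The blank entries are probably empty strings (`""`) rather than missing values. `is.na("")` is `FALSE`, so both `!is.na()` and `drop_na()` keep them. Two options:

- Filter out empty strings directly.
- Convert them to `NA` first with dplyr's `na_if()`, then drop them.

The sketch below uses a made-up `ip_address` column, not the actual survey data:

```r
library(dplyr)  # tibble(), filter(), mutate(), na_if()
library(tidyr)  # drop_na()

# Made-up data: one real NA and one empty string
resp <- tibble(ip_address = c("10.0.0.1", NA, "", "10.0.0.2"))

# !is.na() drops the NA but keeps the empty string, because is.na("") is FALSE
kept_blank <- resp %>% filter(!is.na(ip_address))

# Option 1: drop both missing values and empty strings in one filter()
no_blanks <- resp %>% filter(!is.na(ip_address), ip_address != "")

# Option 2: turn empty strings into NA, then drop_na() removes them
no_blanks2 <- resp %>%
  mutate(ip_address = na_if(ip_address, "")) %>%
  drop_na(ip_address)
```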