select
This lesson is called select, part of the R in 3 Months (Fall 2021) course.
If you want to see the examples file used for this section of the course, you can take a look at the RMarkdown version as well as the knitted HTML version.
Your Turn
Complete the select sections of the data-wrangling-and-analysis-exercises.Rmd file.
Learn More
To learn more about select helper functions (e.g., contains()), check out the Tidyverse website. We covered only a few of them, and there are more!
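Here is a minimal sketch of a few select helpers, assuming dplyr is installed. The data frame `survey` and its columns are made up for illustration (modeled on column names mentioned on this page); it is not the course's nhanes data.

```r
library(dplyr)

# Hypothetical stand-in for the course data; column names are assumptions
survey <- data.frame(
  id = 1:3,
  age = c(34, 51, 27),
  health_gen = c("Good", "Fair", "Vgood"),
  days_phys_hlth_bad = c(0, 5, 2),
  days_ment_hlth_bad = c(1, 0, 3),
  smoke_now = c("No", "Yes", "No")
)

# contains(): columns whose name includes the given text
survey %>% select(contains("hlth_bad"))

# starts_with() / ends_with(): match the beginning or end of a name
survey %>% select(starts_with("days"))
survey %>% select(ends_with("_bad"))

# everything(): all remaining columns, handy for moving one column to the front
survey %>% select(smoke_now, everything())
```

The helpers can be combined with bare column names in the same select() call.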
General Data Wrangling and Analysis Resources
Because most material on data wrangling and analysis with the dplyr package covers all of the verbs discussed in this course together, I have chosen not to separate these resources by lesson. Instead, here are some helpful resources for learning more about all of the tidyverse verbs discussed in this course:
Chapter 5 of R for Data Science
RStudio Cloud primer on working with data
Tidyverse for Beginners by Danielle Navarro
Learning Statistics with R by Danielle Navarro
Introduction to the Tidyverse by Alison Hill
A gRadual intRoduction to data wRangling by Chester Ismay and Ted Laderas
Have any questions? Put them below and we will help you out!
Jyoni Shuler • March 24, 2021
Hi David, I'm trying to figure out the keyboard shortcut to run code. For Macs, it says up arrow + Command + another arrow symbol that I cannot figure out. What is that exactly? Thanks!
Lindsey Kenyon • March 24, 2021
Is it possible to 'select' from a row rather than a column? Or does wrangling data in R require data frames to be vertical?
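select() works on columns. dplyr verbs assume tidy data (one observation per row, one variable per column), and picking rows is a separate step: filter() keeps rows that meet a condition, and slice() keeps rows by position. A minimal sketch, assuming dplyr is installed and using a made-up `people` data frame:

```r
library(dplyr)

# Hypothetical example data (not the course's nhanes file)
people <- data.frame(
  id = 1:4,
  age = c(34, 51, 27, 62),
  smoke_now = c("No", "Yes", "No", "Yes")
)

# filter() keeps the ROWS that meet a condition
smokers <- people %>% filter(smoke_now == "Yes")

# slice() keeps rows by their position
first_two <- people %>% slice(1:2)

# Combining them: some rows, then some columns
people %>%
  filter(age > 30) %>%
  select(id, age)
```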
Lindsey Kenyon • March 24, 2021
How do you accommodate using the 'select' function if your table headers are merged?
Abby Isaacson • March 29, 2021
FYI to the group I was looking for the pipe shortcut reminder and came across this link: https://support.rstudio.com/hc/en-us/articles/200711853-Keyboard-Shortcuts
Abby Isaacson • March 30, 2021
Also, a note on what has worked for me in the 'select' section: when I use the suggested code format, such as "marital_status", I get an error; but when I match the variable names in the imported nhanes dataset exactly, "MaritalStatus" works. The same is true throughout the exercise.
Kathleen Carson • March 30, 2021
Is there an output that lists all of our columns? I see the parsing notice, but it doesn't show all the columns, so I am not sure how to do "select variables from health_gen to the end" without looking at the solutions.
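Two base-or-dplyr ways to see every column are names() and glimpse(). This is a minimal sketch, assuming dplyr is installed. `survey` is a made-up data frame; with the course data you would pass your own nhanes object instead.

```r
library(dplyr)

# Hypothetical data frame standing in for nhanes
survey <- data.frame(
  id = 1:2,
  marital_status = c("Married", "NeverMarried"),
  health_gen = c("Good", "Fair"),
  smoke_now = c("No", "Yes")
)

# names() returns every column name as a character vector
names(survey)

# glimpse() prints one line per column with its type and first few values
glimpse(survey)
```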
Harold Stanislaw • April 1, 2021
Comment rather than a question. When dropping a range of variables, I tried this code: select(-health_gen:education), which is missing a set of parentheses. One solution is to include the parentheses, so the code is select(-(health_gen:education)). However, I found that this also works: select(-health_gen:-education).
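A minimal sketch of dropping a range with the parenthesized form, plus the `!` complement operator that dplyr 1.0 and later also accept. It assumes dplyr is installed and uses a made-up `survey` data frame. The `-health_gen:-education` variant from the comment is not included here because its behavior may depend on the tidyselect version.

```r
library(dplyr)

# Hypothetical data frame; column order matters for ranges
survey <- data.frame(
  id = 1:2,
  health_gen = c("Good", "Fair"),
  days_phys_hlth_bad = c(0, 5),
  education = c("College Grad", "High School"),
  smoke_now = c("No", "Yes")
)

# Parentheses make the minus apply to the whole range health_gen:education
dropped <- survey %>% select(-(health_gen:education))

# ! takes the complement of the selection, giving the same result
dropped_bang <- survey %>% select(!(health_gen:education))

names(dropped)
```

Without the parentheses, the minus binds to health_gen alone before the range is built, which is why the first attempt in the comment misbehaved.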
Naomi Nichols • April 13, 2021
None of the exercises that are evident in your tutorial video are accessible to me in the data-wrangling-and-analysis-exercises RMD file. I just have the code you used.
Marcus Lee • May 16, 2021
Hi David,
Any quick way to select from a column to the last column? E.g., health_gen to the last column, instead of typing out select(health_gen:smoke_now)?
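The select helper last_col() refers to the final column, so it can serve as the end of a range. A minimal sketch, assuming dplyr is installed, with a made-up `survey` data frame:

```r
library(dplyr)

# Hypothetical data frame standing in for nhanes
survey <- data.frame(
  id = 1:2,
  marital_status = c("Married", "NeverMarried"),
  health_gen = c("Good", "Fair"),
  days_phys_hlth_bad = c(0, 5),
  smoke_now = c("No", "Yes")
)

# health_gen through whatever column happens to be last,
# without typing the last column's name
from_health_gen <- survey %>% select(health_gen:last_col())

names(from_health_gen)
```

Because last_col() looks up the position at run time, the code keeps working if columns are added to the end later.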
Matt M • September 27, 2021
Nothing major, but I've noticed that several of the variable names differ between your solutions video and my nhanes (e.g., Height vs height and HealthGen vs. health_gen).
It may be another case of the data changing slightly over time. But it has served as a good reminder to be careful about capitalization (and why to avoid it in variable names).
Matt M • October 6, 2021
Thanks for the help. But I don't think that's my issue.
nhanes %>% select(marital_status)
I get the error: "Error: Can't subset columns that don't exist. x Column `marital_status` doesn't exist. Run `rlang::last_error()` to see where the error occurred." But using the original variable name MaritalStatus, the code runs fine.
Chhavi Kotwani • March 18, 2022
Hi David!
I ran clean_names on nhanes and then displayed nhanes to see if it worked, and it did. However, when I move on to the select function, it refuses to recognize the cleaned version and still refers to the earlier version. Is there something I missed?
Thanks!
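One common cause of this symptom (an assumption, since the code in these comments isn't shown) is that the cleaned data was printed but never assigned back. Functions like janitor's clean_names() return a renamed copy and leave the original object unchanged. The sketch below uses dplyr's rename_with(tolower) as a plainly swapped-in stand-in for clean_names(), so it runs with dplyr alone; `survey` is made-up data.

```r
library(dplyr)

# Hypothetical data with CamelCase names like those in the raw nhanes data
survey <- data.frame(
  MaritalStatus = c("Married", "NeverMarried"),
  HealthGen = c("Good", "Fair")
)

# Stand-in for clean_names(): lowercase every column name.
# This prints a renamed copy but does NOT change `survey` itself.
survey %>% rename_with(tolower)
names(survey)   # still "MaritalStatus" "HealthGen"

# Assigning the result back keeps the new names
survey <- survey %>% rename_with(tolower)
names(survey)   # now "maritalstatus" "healthgen"

# With janitor installed, the analogous step would be:
# nhanes <- nhanes %>% clean_names()
```

In an RMarkdown file, the assignment chunk also needs to run before the chunks that call select().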
Tatiana Bustos • July 27, 2022
I'm getting an error "attempt to use zero-length variable name" when I use the following code:
Any idea what the error message means? It worked for the single select.
Tatiana Bustos • July 28, 2022
Just reflecting on the data wrangling: it looks like the data in the Excel (or CSV) sheet has to be set up just right to be able to use these coding exercises. Can you share more about data file preparation? What practices should we have in place regarding variables, types of input data, etc.? A lot of my time is spent on data cleaning before actually getting to the analyses. Sorry if I am getting ahead!
Elsa Bailey • October 4, 2022
Could you please go over the use of "quotes" around a term. When are quotes required, and when are they not necessary? For example, quotes are used here - select(contains("hlth_bad")). But no quotes are used here - select(marital_status, education). Thanks!
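A minimal sketch of the difference, assuming dplyr is installed and using a made-up `survey` data frame. Bare names inside select() are looked up as columns of the data. Helpers such as contains() take a piece of text to match, and text in R must be quoted. select() also accepts plain column names written as strings.

```r
library(dplyr)

# Hypothetical data frame
survey <- data.frame(
  marital_status = c("Married", "NeverMarried"),
  education = c("College Grad", "High School"),
  days_ment_hlth_bad = c(1, 0)
)

# Bare names: select() treats these as columns of survey
by_name <- survey %>% select(marital_status, education)

# contains() needs a text pattern, so it is quoted
by_pattern <- survey %>% select(contains("hlth_bad"))

# Quoted column names also work in select()
by_string <- survey %>% select("marital_status", "education")

# Without quotes, R would look for an object named hlth_bad and
# most likely stop with an "object not found" error:
# survey %>% select(contains(hlth_bad))
```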
Alyssa Carr • October 5, 2022
I believe I'm having the same issue as described by Matt M below. I have run the clean_names function and I get an output with the new correct names. However when I run the select functions if I don't put in the previous names I get an error message that says the columns don't exist. If I put in the non-cleaned names it works fine. I see that you have a video below, but I get a message that I don't have permission to view the video. Could you let me know what the solution was?