Fundamentals of R

33 lessons

Welcome to Fundamentals of R
Update Everything
Start a New Project
Data Wrangling and Analysis
The Tidyverse
Pipes
select()
mutate()
filter()
summarize()
group_by() and summarize()
arrange()
Create a New Data Frame
Bring it All Together (Data Wrangling)
Data Visualization
The Grammar of Graphics
Scatterplots
Histograms
Bar Charts
Setting color and fill Aesthetic Properties
Setting color and fill Scales
Setting x and y Scales
Adding Text to Plots
Plot Labels
Themes
Facets
Save Plots
Bring it All Together (Data Visualization)
Quarto
Quarto Overview
YAML
Text
Code Chunks
Tips for Working with Quarto
Bring It All Together (Quarto)
Wrapping Up
An Important Workflow Tip

select


This lesson is called select, part of the Fundamentals of R course.



If you want to see the examples file used for this section of the course, you can take a look at the RMarkdown version as well as the knitted HTML version.

Your Turn

Solution

# select

![](slides/images/select.png)

With `select` we can select variables from the larger data frame. 

Use `select` to show just the `marital_status` variable.

```{r}
nhanes %>% 
  select(marital_status)
```
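Several of the questions on this page come from `select()` not finding `marital_status`. One likely cause, assuming your data was cleaned with `janitor::clean_names()` (and that janitor is installed), is that the cleaned result was never saved. The sketch below uses a small hypothetical data frame (`toy_raw`, not the real NHANES data) to show that `clean_names()` returns a new data frame instead of changing the original in place.

```{r}
library(dplyr)
library(janitor)

# Hypothetical raw data with CamelCase names (not the real NHANES data)
toy_raw <- tibble(
  MaritalStatus = c("Married", "NeverMarried", "Divorced"),
  Education     = c("College Grad", "High School", "Some College"),
  HealthGen     = c("Good", "Vgood", "Fair")
)

# This prints cleaned names but does NOT change toy_raw
toy_raw %>% clean_names()
names(toy_raw)

# Save the cleaned version so later select() calls can use snake_case names
toy_clean <- toy_raw %>% clean_names()

toy_clean %>%
  select(marital_status)
```

With the course data, the equivalent step would presumably be `nhanes <- nhanes %>% clean_names()`, if that is how your copy was prepared.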



We can also use `select` for multiple variables. 

Use `select` to show `marital_status` and `education`.

```{r}
nhanes %>%
  select(marital_status, education)
```
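The order you list variables in matters: `select()` returns the columns in the order you name them, so it can also be used to reorder columns. A small sketch with a hypothetical data frame (`toy` and its columns are made up for illustration):

```{r}
library(dplyr)

# Small hypothetical data frame (not the course's nhanes data)
toy <- tibble(
  marital_status = c("Married", "Divorced"),
  education      = c("College Grad", "High School"),
  age            = c(34, 51)
)

# Columns come back in the order they are listed inside select()
toy %>%
  select(education, marital_status)
```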


Used within `select`, the `contains` function selects variables whose names contain certain text.

Use the `contains` function to select variables that ask how many days in the last 30 days the respondent had bad physical and mental health (you should be able to figure out which variables these are from the names). 


```{r}
nhanes %>%
  select(contains("hlth_bad"))
```
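This also explains when quotes are needed. Bare names like `marital_status` refer to columns directly, while helpers such as `contains()` take a character string to match against the column names, so that string goes in quotes. Matching ignores case by default (`ignore.case = TRUE`). The column names below are hypothetical, modeled loosely on the lesson's variables.

```{r}
library(dplyr)

# Hypothetical column names, not necessarily the exact nhanes names
toy <- tibble(
  days_phys_hlth_bad = c(0, 3),
  days_ment_hlth_bad = c(2, 0),
  health_gen         = c("Good", "Fair")
)

# The quoted string is text to search for inside the column names
toy %>%
  select(contains("hlth_bad"))

# Case is ignored by default, so this selects the same two columns
toy %>%
  select(contains("HLTH_BAD"))
```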

Used within `select`, the `starts_with` function selects variables whose names start with certain text.

Use the `starts_with` function to select variables that start with the letter h.

```{r}
nhanes %>% 
  select(starts_with("h"))
```
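Like `contains()`, `starts_with()` ignores case by default, which can matter when a data frame mixes capitalized and snake_case names. A sketch with a hypothetical data frame, showing how `ignore.case = FALSE` narrows the match:

```{r}
library(dplyr)

# Hypothetical data frame mixing capitalized and lowercase names
toy <- tibble(
  Height     = c(170, 182),
  health_gen = c("Good", "Fair"),
  age        = c(34, 51)
)

# Default: case is ignored, so "h" matches both Height and health_gen
toy %>% select(starts_with("h"))

# Set ignore.case = FALSE to match only names starting with lowercase "h"
toy %>% select(starts_with("h", ignore.case = FALSE))
```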



We can `select` a range of adjacent columns using the `var1:var2` pattern. `select` all the variables from `health_gen` to the end of the data frame (here, `smoke_now` is the last column).


```{r}
nhanes %>%
  select(health_gen:smoke_now)
```
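If you don't know the name of the last column, `names()` lists every column, and the tidyselect helper `last_col()` (re-exported by dplyr) can stand in for it. A sketch with a hypothetical data frame (`sleep_hrs` is a made-up column):

```{r}
library(dplyr)

# Hypothetical data frame standing in for nhanes
toy <- tibble(
  id         = 1:2,
  education  = c("College Grad", "High School"),
  health_gen = c("Good", "Fair"),
  sleep_hrs  = c(7, 6),
  smoke_now  = c("No", "Yes")
)

# List all column names to find the endpoints of a range
names(toy)

# last_col() means "whatever the final column is", so no need to type its name
toy %>%
  select(health_gen:last_col())
```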



We can drop variables using the `-var` format. Drop the `education` variable.


```{r}
nhanes %>%
  select(-education)
```
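To drop several columns that are not next to each other, one option is to negate a `c()` of their names. A sketch on a hypothetical data frame:

```{r}
library(dplyr)

# Small hypothetical data frame
toy <- tibble(
  marital_status = c("Married", "Divorced"),
  education      = c("College Grad", "High School"),
  age            = c(34, 51)
)

# Drop one column
toy %>% select(-education)

# Drop several columns at once by negating c()
toy %>% select(-c(education, marital_status))
```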



We can drop a set of adjacent variables using the `-(var1:var2)` format. Drop the variables from `health_gen` to the end.


```{r}
nhanes %>%
  select(-(health_gen:smoke_now))
```
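The parentheses make the minus sign apply to the whole range. In dplyr 1.0.0 and later, `!` can also negate a selection, so the two forms below should return the same columns. A sketch with a hypothetical data frame:

```{r}
library(dplyr)

# Hypothetical data frame standing in for nhanes
toy <- tibble(
  id         = 1:2,
  education  = c("College Grad", "High School"),
  health_gen = c("Good", "Fair"),
  smoke_now  = c("No", "Yes")
)

# Minus applied to the whole parenthesized range
toy %>% select(-(health_gen:smoke_now))

# The same selection using ! for negation
toy %>% select(!(health_gen:smoke_now))
```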

Complete the select sections of the data-wrangling-and-analysis-exercises.Rmd file.

Learn More

To learn more about select helper functions (e.g., `contains`), check out the Tidyverse website. We covered only a few of them; there are more!
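As a taste of those other helpers, here is a sketch of three that exist in current dplyr/tidyselect: `ends_with()`, `where()` (dplyr 1.0.0 and later), and `everything()`. The data frame and its columns are hypothetical.

```{r}
library(dplyr)

# Hypothetical data frame
toy <- tibble(
  id                 = 1:2,
  days_phys_hlth_bad = c(0, 3),
  days_ment_hlth_bad = c(2, 0),
  health_gen         = c("Good", "Fair")
)

# ends_with(): names ending in a given string
toy %>% select(ends_with("_bad"))

# where(): columns for which a function returns TRUE (here, numeric columns)
toy %>% select(where(is.numeric))

# everything(): all remaining columns, handy for moving one column to the front
toy %>% select(health_gen, everything())
```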

General Data Wrangling and Analysis Resources

Because most material on data wrangling and analysis with the dplyr package covers all of the verbs discussed in this course together, I have chosen not to separate resources by lesson. Instead, here are some helpful resources for learning more about all of the tidyverse verbs discussed in this course:

Chapter 5 of R for Data Science

RStudio Cloud primer on working with data

Tidyverse for Beginners by Danielle Navarro

Learning Statistics with R by Danielle Navarro

Introduction to the Tidyverse by Alison Hill

A gRadual intRoduction to data wRangling by Chester Ismay and Ted Laderas

Working in the Tidyverse by Desi Quintans and Jeff Powell

Christine Monnier video tutorials on dplyr

Have any questions? Put them below and we will help you out!


Jyoni Shuler

March 24, 2021

Hi David, I'm trying to figure out the keyboard shortcut to run code - for Macs, it says up arrow + Command + and another arrow I cannot figure out. What is that exactly? Thanks!

Lindsey Kenyon

March 24, 2021

Is it possible to 'select' from a row rather than a column? Or does wrangling data in R require data frames to be vertical?

Lindsey Kenyon

March 24, 2021

How do you accommodate using the 'select' function if your table headers are merged?

Abby Isaacson

March 29, 2021

FYI to the group I was looking for the pipe shortcut reminder and came across this link: https://support.rstudio.com/hc/en-us/articles/200711853-Keyboard-Shortcuts

Abby Isaacson

March 30, 2021

Also a note for what's worked for me on 'select' section: when I try the suggested code format such as "marital_status", I get an error; but when I format identically to the imported nhanes dataset, "MaritalStatus" works (matches variable name). Same throughout the exercise.

Kathleen Carson

March 30, 2021

Is there an output that tells us all of our columns? I have the parsing notice, but that doesn't show all the columns, so I am not sure how to do "select variables from health_gen to the end" without looking at the solutions.

Harold Stanislaw

April 1, 2021

Comment rather than a question. When dropping a range of variables, I tried this code: select(-health_gen:education), which leaves out a set of parentheses. One solution is to include the parentheses, so the code is select(-(health_gen:education)). However, I found that this also works: select(-health_gen:-education).

Naomi Nichols

April 13, 2021

None of the exercises that are evident in your tutorial video are accessible to me in the data-wrangling-and-analysis-exercises RMD file. I just have the code you used.

Marcus Lee

May 16, 2021

Hi David,

Any quick way to select from a column to the last column? E.g., health_gen to the last column, instead of typing out select(health_gen:smoke_now)?

Matt M

September 27, 2021

Nothing major, but I've noticed that several of the variable names differ between your solutions video and my nhanes (e.g., Height vs height and HealthGen vs. health_gen).

It may be another issue of things changing slightly in the data over time. But it has served as a good reminder to be careful about capitalization (and of why to avoid it in variable names).

Matt M

October 6, 2021

Thanks for the help. But I don't think that's my issue.

nhanes %>% select(marital_status)

I get the error: "Error: Can't subset columns that don't exist. x Column marital_status doesn't exist. Run rlang::last_error() to see where the error occurred." But using the original variable name MaritalStatus, the code runs fine.

Chhavi Kotwani

March 18, 2022

Hi David!

I ran clean_names on nhanes and then displayed nhanes to see if it worked, and it did. However, when I move on to the select function, it refuses to recognize the cleaned version and still refers to the earlier version. Is there something I missed?

Thanks!

Tatiana Bustos

July 27, 2022

I'm getting an error "attempt to use zero-length variable name" when I use the following code:

nhanes %>% 
  select(marital_status, education)

Any idea what the error message means? It worked for the single select.

Tatiana Bustos

July 28, 2022

Just reflecting on the data wrangling: it looks like the data in the Excel (or CSV) sheet has to be set up just right to be able to use these coding exercises. Can you share more about data file preparation? What practices should we have in place regarding variables, types of inputted data, etc.? A lot of my time is spent in data cleaning before actually getting to the analyses. Sorry if I am getting ahead!

Elsa Bailey

October 4, 2022

Could you please go over the use of "quotes" around a term. When are quotes required, and when are they not necessary? For example, quotes are used here - select(contains("hlth_bad")). But no quotes are used here - select(marital_status, education). Thanks!

Rachel Nicholson

October 5, 2022

I believe I'm having the same issue as described by Matt M below. I have run the clean_names function and I get an output with the new correct names. However when I run the select functions if I don't put in the previous names I get an error message that says the columns don't exist. If I put in the non-cleaned names it works fine. I see that you have a video below, but I get a message that I don't have permission to view the video. Could you let me know what the solution was?
