View code shown in video
# Load Packages -----------------------------------------------------------

library(tidyverse)

# Import Data -------------------------------------------------------------

penguins <- read_csv("penguins.csv")

# select() ----------------------------------------------------------------

penguins

# With select() we can select variables from the larger data frame.

penguins |> 
  select(bill_length_mm)

# We can also use select() for multiple variables:

penguins |>
  select(bill_length_mm, bill_depth_mm)

# select() has several helper functions for selecting variables.

# The contains() function finds any variable with certain text 
# in the variable name:

penguins |>
  select(contains("bill"))

# The starts_with() function allows us to select variables 
# that start with certain text:

penguins |> 
  select(starts_with("bill"))

# The ends_with() function allows us to select variables 
# that end with certain text:

penguins |> 
  select(ends_with("mm"))

# We can select a range of columns using the var1:var2 pattern:

penguins |> 
  select(species:bill_length_mm)

# We can drop variables using the -var format:

penguins |> 
  select(-bill_length_mm)

# We can drop a set of variables using the -(var1:var2) format:

penguins |> 
  select(-(bill_length_mm:flipper_length_mm))
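
The helpers above can return different columns even when given the same text. The sketch below is not from the video: the birds tibble and its column names are hypothetical, invented only to show where contains(), starts_with(), and ends_with() differ, and how the range and drop patterns depend on column order.

```r
library(tidyverse)

# Hypothetical data, made up for this illustration (not the penguins file)
birds <- tibble(
  id = 1:3,
  bill_length_mm = c(39.1, 46.5, 50.0),
  upper_bill_mm = c(18.7, 17.9, 19.5),
  wing_cm = c(18.1, 19.5, 21.0),
  weight_g = c(3750, 3800, 4400)
)

# contains() matches the text anywhere in the variable name
contains_bill <- birds |>
  select(contains("bill"))
names(contains_bill)
# "bill_length_mm" "upper_bill_mm"

# starts_with() matches only at the start of the variable name
starts_bill <- birds |>
  select(starts_with("bill"))
names(starts_bill)
# "bill_length_mm"

# ends_with() matches only at the end of the variable name
ends_mm <- birds |>
  select(ends_with("mm"))
names(ends_mm)
# "bill_length_mm" "upper_bill_mm"

# var1:var2 keeps every column from var1 to var2, in data frame order
range_cols <- birds |>
  select(bill_length_mm:wing_cm)
names(range_cols)
# "bill_length_mm" "upper_bill_mm" "wing_cm"

# -(var1:var2) drops that same range and keeps everything else
dropped <- birds |>
  select(-(bill_length_mm:wing_cm))
names(dropped)
# "id" "weight_g"
```

Calling names() on each result is just a compact way to check which columns survived without printing the whole tibble.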

Your Turn

Copy the code below into your R script file and complete the exercises.

# Load Packages -----------------------------------------------------------

# Load the tidyverse package

library(tidyverse)

# Import Data -------------------------------------------------------------

# Download data from https://rfor.us/penguins
# Copy the data into the RStudio project
# Create a new R script file and add code to import your data

penguins <- read_csv("penguins.csv")

# select() ----------------------------------------------------------------

# Use select() to keep only the sex variable

# YOUR CODE HERE

# Use select() to keep the island and sex variables

# YOUR CODE HERE

# Use one of the select() helper functions to keep all variables that have the letter s in their names

# YOUR CODE HERE

# Use one of the select() helper functions to keep all variables that start with the letter b

# YOUR CODE HERE

# Use select() to keep the variables from island to the end

# YOUR CODE HERE

# Use the dropping syntax with - to keep the same variables as above (island to the end)

# YOUR CODE HERE

# Drop all variables from bill_length_mm to body_mass_g

# YOUR CODE HERE

Learn More

To learn more about the select() function, check out Chapter 3 of R for Data Science.

Have any questions? Put them below and we will help you out!

This may be covered in a future lesson, but I'm wondering about toggling variables on/off in the data view instead of using the select() function. I understand that the select() function will allow me to view the data for selected variables in the console, though I don't find this very useful (or maybe am just not used to it?), as it's providing such a small snippet of the data (i.e. only the first 10 rows; I'm also not sure how this would look if I were applying it to a dataset that is 40 vars wide). Currently, when I'm reviewing my data in Stata, I open up the data viewer, which looks similar to the data tab in RStudio. I can then select any variables to show/hide directly in this panel, and then may highlight a particular observation of interest and toggle additional variables on or off. I'm not sure that the select() function in RStudio would allow me to explore my data as seamlessly -- but, like I said, maybe this will come in a future lesson, or I will get used to viewing my data in the console :P

As a response to my own question, I saw in the filter() lesson that I can use view() to open up the selected variables in a new data pane! I imagine this being less "click-and-pointy" than my current workflow in Stata, as I would need to return to my R script/console many times to toggle variables on and off as I explore my data -- but this already gets me much closer to what I'm trying to do!

Gracielle Higino

September 20, 2023

Hi Olivia! That's a great question. I also sometimes wish RStudio/Posit had a more intuitive and exploratory view function, but you totally get used to doing everything on the command line! I think this solution using view() is the most straightforward. You can use the following code to check only the first three columns, for example:

penguins |>
  select(1:3) |>
  view()