Data Cleaning with R

Regex in R

Your Turn

Match the following regular expressions against the test vector below using str_detect. Can you explain the matches?

Regular expressions

  1. ^dog

  2. ^[a-z]+$

  3. \\d

test_vector <- c("Those dogs are small.","dogs and cats",
                 "34","(34)","rat","watchdog","placemat",
                 "BABY","2011_April","mice")

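One way to set up the exercise is sketched below, assuming the stringr package is installed. The names `patterns` and `matches` are illustrative and not part of the lesson. The sketch only collects the matching elements for each pattern; explaining why they match is still up to you.

```r
library(stringr)

test_vector <- c("Those dogs are small.", "dogs and cats",
                 "34", "(34)", "rat", "watchdog", "placemat",
                 "BABY", "2011_April", "mice")

# Patterns from the exercise; "\\d" is the R string literal for the regex \d
patterns <- c("^dog", "^[a-z]+$", "\\d")

# For each pattern, keep the elements that str_detect() flags as TRUE
matches <- lapply(patterns, function(p) test_vector[str_detect(test_vector, p)])
names(matches) <- patterns

matches
```
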
Learn More

To learn more about the stringr package, check out the documentation website. There is also a stringr cheatsheet. You also might check out Chapter 14 of R for Data Science as well as this blog post by Hugo Toscano on working with strings in R.

Have any questions? Put them below and we will help you out!

Alberto Cabrera • January 6, 2024

Need your help figuring out how the following regex works: "\[.+\]|\?". It was developed by Albert Rapp as part of a mutate call to pull years from the following strings: "1973 [YR1973]" "1974 [YR1974]" "1975 [YR1975]" "1976 [YR1976]"

wd_data %>%
  mutate(year = year %>%
           str_remove("\[.+\]|\?") %>%
           str_to_title()) %>%
  pull(year)

It is not clear to me how this regular expression worked in extracting just the first 4 characters associated with year. Thanks

David Keyes Founder • January 7, 2024

I'm sorry but I'm not sure we can answer this for you. The code you provided doesn't work for me. You might consider reaching out to Albert directly.

Alberto Cabrera • January 7, 2024

Thanks!
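
For anyone with the same question, here is a hedged sketch of how the pattern from the thread behaves. It assumes the backslashes are doubled, since an R string literal needs `\\[` to pass `\[` to the regex engine (as posted, `"\[.+\]|\?"` raises an unrecognized-escape error). It also uses a hypothetical `years` vector in place of the `wd_data` column, which is not shown in the thread.

```r
library(stringr)

# Hypothetical stand-in for the year column discussed in the thread
years <- c("1973 [YR1973]", "1974 [YR1974]", "1975 [YR1975]", "1976 [YR1976]")

# \\[.+\\]  matches a literal "[", one or more characters, then a literal "]"
# |         means "or"
# \\?       matches a literal question mark
pattern <- "\\[.+\\]|\\?"

# str_remove() deletes the first match in each string, leaving the year
# followed by the space that preceded the bracket
removed <- str_remove(years, pattern)

# str_trim() drops that leftover trailing space
cleaned <- str_trim(removed)
cleaned
```

Under these assumptions the pattern does not extract the first four characters. It removes the bracketed part, and the year is what remains.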