Data Cleaning with R
Regex in R
This lesson is called Regex in R, part of the Data Cleaning with R course.
Your Turn
Match the following regular expressions against the test vector below using str_detect(). Can you explain the matches?
Regular expressions
^dog
^[a-z]+$
\\d
test_vector <- c("Those dogs are small.","dogs and cats",
"34","(34)","rat","watchdog","placemat",
"BABY","2011_April","mice")
Learn More
To learn more about the stringr package, check out the documentation website. There is also a stringr cheatsheet. You might also check out Chapter 14 of R for Data Science, as well as this blog post by Hugo Toscano on working with strings in R.
Alberto Cabrera
January 7, 2024
Need your help figuring out how the following regex works: "\[.+\]|\?". It was developed by Albert Rapp as part of a mutate program to pull years from the following strings: "1973 [YR1973]" "1974 [YR1974]" "1975 [YR1975]" "1976 [YR1976]"
wd_data %>% mutate( year = year %>% str_remove("\[.+\]|\?") %>% str_to_title()) %>% pull(year)
It is not clear to me how this regular expression worked in extracting just the first 4 characters associated with year. Thanks
David Keyes Founder
January 8, 2024
I'm sorry but I'm not sure we can answer this for you. The code you provided doesn't work for me. You might consider reaching out to Albert directly.
Alberto Cabrera
January 8, 2024
Thanks!
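For readers following this thread, here is a minimal sketch of how the pattern in the question would behave. It assumes the backslashes were doubled in the original R code and were lost when the comment was posted; the thread does not confirm this. The wd_data data frame is not shown, so the sketch uses a plain character vector built from the four strings quoted in the question. The str_trim() step is an addition for illustration and is not in the quoted code.

```r
library(stringr)

# The four example strings quoted in the question
year <- c("1973 [YR1973]", "1974 [YR1974]", "1975 [YR1975]", "1976 [YR1976]")

# Assumed form of the pattern: "\\[.+\\]" matches a literal "[", then one or
# more characters, then a literal "]"; "\\?" matches a literal question mark.
pattern <- "\\[.+\\]|\\?"

# str_remove() deletes the first match in each string, leaving the year
# plus the space that preceded the opening bracket
removed <- str_remove(year, pattern)

# str_trim() (not in the quoted code) drops that trailing space
cleaned <- str_trim(removed)

cleaned
```

Under that assumption, the pattern does not extract the first four characters. It removes the bracketed part of each string, and the year is what remains.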