Regex in R
This lesson is called Regex in R, part of the Data Cleaning with R course.
Your Turn
Match the following regular expressions against the test vector below using str_detect(). Can you explain the matches?
Regular expressions
^dog
^[a-z]+$
\\d
test_vector <- c("Those dogs are small.","dogs and cats",
"34","(34)","rat","watchdog","placemat",
"BABY","2011_April","mice")
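One way to check your answers is sketched below. It assumes the stringr package is installed. It also uses str_subset(), a stringr function not mentioned in the exercise, which returns only the elements for which str_detect() would give TRUE and makes the results easier to read. The comments describe which elements each pattern keeps and why. Note that the regex \d has to be written "\\d" inside an R string, because the backslash itself must be escaped.

```r
library(stringr)

test_vector <- c("Those dogs are small.", "dogs and cats",
                 "34", "(34)", "rat", "watchdog", "placemat",
                 "BABY", "2011_April", "mice")

# str_detect() returns one TRUE/FALSE per element of the vector
str_detect(test_vector, "^dog")
str_detect(test_vector, "^[a-z]+$")
str_detect(test_vector, "\\d")

# ^dog: "dog" must appear at the very start of the string.
# Keeps only "dogs and cats"; "watchdog" ends in dog, so it fails.
str_subset(test_vector, "^dog")

# ^[a-z]+$: the whole string must be lowercase letters only
# (no spaces, digits, or punctuation). Keeps "rat", "watchdog",
# "placemat", "mice"; "BABY" fails because matching is case sensitive.
str_subset(test_vector, "^[a-z]+$")

# \\d: at least one digit anywhere in the string.
# Keeps "34", "(34)", "2011_April".
str_subset(test_vector, "\\d")
```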
Learn More
To learn more about the stringr package, check out the documentation website. There is also a stringr cheatsheet. You might also check out Chapter 14 of R for Data Science, as well as this blog post by Hugo Toscano on working with strings in R.
Have any questions? Put them below and we will help you out!
Course Content
32 Lessons

1. What are Regular Expressions? (03:48)
2. Understanding and Testing Regular Expressions (03:51)
3. Literal Characters and Metacharacters (06:16)
4. Metacharacters: Quantifiers (01:33)
5. Metacharacters: Alternation, Special Sequences, and Escapes (02:53)
6. Combining Metacharacters (05:18)
7. Regex in R (02:58)
8. Regular Expressions and Data Cleaning, Part 1 (04:15)
9. Regular Expressions and Data Cleaning, Part 2 (12:00)

1. Common Issues in Data Cleaning (03:17)
2. Unusable Variable Names (10:11)
3. Whitespace (11:10)
4. Letter Case (06:52)
5. Missing, Implicit, or Misplaced Grouping Variables (11:19)
6. Compound Values (10:09)
7. Duplicated Values (08:49)
8. Broken Values (09:52)
9. Empty Rows and Columns (11:30)
10. Parsing Numbers (12:02)
11. Putting Everything Together (25:50)
Alberto Cabrera • January 6, 2024
I need your help figuring out how the following regex works: "\[.+\]|\?". It was developed by Albert Rapp as part of a mutate() call to pull years from the following strings: "1973 [YR1973]", "1974 [YR1974]", "1975 [YR1975]", "1976 [YR1976]".
wd_data %>% mutate( year = year %>% str_remove("\[.+\]|\?") %>% str_to_title()) %>% pull(year)
It is not clear to me how this regular expression manages to extract just the first four characters, i.e., the year. Thanks!
David Keyes Founder • January 7, 2024
I'm sorry but I'm not sure we can answer this for you. The code you provided doesn't work for me. You might consider reaching out to Albert directly.
Alberto Cabrera • January 7, 2024
Thanks!
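The regex asked about in the thread can be unpacked with a short sketch. It assumes the stringr package and types in the four example strings from the question directly, since wd_data is not available here. Two points explain the behavior. First, as it appears in the comment, "\[.+\]|\?" is not a valid R string: R rejects \[ as an unrecognized escape, so each backslash must be doubled ("\\[.+\\]|\\?"). Second, the pattern does not extract the year at all. It matches the bracketed code, str_remove() deletes that match, and what is left over is the year plus a trailing space. str_to_title() leaves digits unchanged, so it has no effect on these particular strings.

```r
library(stringr)

# The four example strings from the question (wd_data itself is not available)
years <- c("1973 [YR1973]", "1974 [YR1974]", "1975 [YR1975]", "1976 [YR1976]")

# Regex \[.+\]|\? written as an R string, with every backslash doubled:
#   \\[.+\\]  a literal "[", then one or more of any character, then a literal "]"
#   |         OR
#   \\?       a literal "?" (none of these four strings contains one)
pattern <- "\\[.+\\]|\\?"

# str_remove() deletes the first match, here the bracketed code such as
# [YR1973]. The space before "[" is not part of the match, so it stays.
removed <- str_remove(years, pattern)
removed  # "1973 " "1974 " "1975 " "1976 " (note the trailing spaces)

# str_trim() drops the leftover whitespace
cleaned <- str_trim(removed)
cleaned  # "1973" "1974" "1975" "1976"

# An alternative that states the intent directly: keep the leading four digits
direct <- str_extract(years, "^\\d{4}")
```

Because .+ is greedy, a string with two bracketed parts, such as "1977 [A] [B]", would lose everything from the first "[" to the last "]", which happens to give the same result here but is worth knowing when the data are messier.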