Duplicated Values
This lesson is called Duplicated Values, part of the Data Cleaning with R course.
Your Turn
Load the messy Age of Empires units dataset bundled with unheadr (AOEunits_raw) and keep only units of Type “Cavalry”.
Identify duplicated records across all variables.
Remove duplicated records across all variables.
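The three steps above can be sketched roughly as follows. This is a minimal illustration in base R on a small made-up data frame, not the actual AOEunits_raw data: the `units` object, its column names other than `Type`, and all of its values are invented here, so adapt the names to whatever the real dataset contains.

```r
# Toy stand-in for the units data; names and values are invented for illustration.
units <- data.frame(
  unit = c("Knight", "Knight", "Archer", "Camel", "Camel", "Scout"),
  Type = c("Cavalry", "Cavalry", "Archer", "Cavalry", "Cavalry", "Cavalry"),
  cost = c(135, 135, 70, 115, 115, 80),
  stringsAsFactors = FALSE
)

# Step 1: keep only rows whose Type is "Cavalry".
cavalry <- units[units$Type == "Cavalry", ]

# Step 2: identify records duplicated across all variables.
# duplicated() flags only the second and later copies of a row, so checking
# from both ends also flags the first copy of each duplicated group.
is_dupe <- duplicated(cavalry) | duplicated(cavalry, fromLast = TRUE)
dupes <- cavalry[is_dupe, ]

# Step 3: remove duplicated records, keeping the first copy of each row.
deduped <- cavalry[!duplicated(cavalry), ]
```

If dplyr is loaded, `distinct(cavalry)` should give the same rows as step 3, since calling it without column arguments compares every variable.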
Learn More
Kaggle ran a data cleaning challenge focused on deduplicating data. Their code has examples of ways to deduplicate using R.