Data Cleaning with R
Welcome to Data Cleaning with R
- What is Data Cleaning?
- Course Logistics and Materials
- Data Organization Best Practices
- Tidy Data
- Grouping and Indicator Variables
- NA and Empty Values
- Data Sharing Best Practices
- Tidyverse Refresher
- Working with Columns with across()
- Pivoting Data
- coalesce() and fill()
- What are Regular Expressions?
- Understanding and Testing Regular Expressions
- Literal Characters and Metacharacters
- Metacharacters: Quantifiers
- Metacharacters: Alternation, Special Sequences, and Escapes
- Combining Metacharacters
- Regex in R
- Regular Expressions and Data Cleaning, Part 1
- Regular Expressions and Data Cleaning, Part 2
- Common Issues in Data Cleaning
- Unusable Variable Names
- Letter Case
- Missing, Implicit, or Misplaced Grouping Variables
- Compound Values
- Duplicated Values
- Broken Values
- Empty Rows and Columns
- Parsing Numbers
- Putting Everything Together
What is Data Cleaning?
This lesson is called What is Data Cleaning?, part of the Data Cleaning with R course.
Reflect on how much data cleaning, if any, you perform in your day-to-day work.
The tweet thread below argues that data cleaning is part of the analysis process.
Data scientists often complain that the bulk of their work is data cleaning. (Data Science Fact, @DataSciFact, January 12, 2021)
But if you see data cleaning as the work, not just an obstacle to the work, it can be interesting.
You could think of it as data pathology, a kind of analysis before the intended analysis.
Randy Au's article "Data Cleaning is Analysis, Not Grunt Work" makes a similar point.