
Data Cleaning with R

What is Data Cleaning?

This lesson is called What is Data Cleaning?, part of the Data Cleaning with R course.


Your Turn

Read this short blog post by John D. Cook.
Reflect on the amount (if any) of data cleaning you perform in your day-to-day work.

Learn More

This article by Katie Rawson and Trevor Muñoz helps to recontextualize the entire process of data cleaning.

Randy Au's article, Data Cleaning is Analysis, Not Grunt Work, makes a similar point.

The tweet thread below argues that data cleaning is part of the analysis process.

Data scientists often complain that the bulk of their work is data cleaning.

But if you see data cleaning as the work, not just an obstacle to the work, it can be interesting.

You could think of it as data pathology, a kind of analysis before the intended analysis.
— Data Science Fact (@DataSciFact) January 12, 2021


Course Content

32 Lessons

What is Data Cleaning?

Course Logistics and Materials

Data Organization Best Practices

Grouping and Indicator Variables

NA and Empty Values

Data Sharing Best Practices

Tidyverse Refresher

Working with Columns with across()

coalesce() and fill()

What are Regular Expressions?

Understanding and Testing Regular Expressions

Literal Characters and Metacharacters

Metacharacters: Quantifiers

Metacharacters: Alternation, Special Sequences, and Escapes

Combining Metacharacters

Regular Expressions and Data Cleaning, Part 1

Regular Expressions and Data Cleaning, Part 2

Common Issues in Data Cleaning

Unusable Variable Names

Missing, Implicit, or Misplaced Grouping Variables

Compound Values

Duplicated Values

Empty Rows and Columns

Parsing Numbers

Putting Everything Together

Wrapping Up Data Cleaning with R