Data Cleaning Tips in R

I recently came across a set of data cleaning tips in Excel from EvaluATE, which provides support for people looking to improve their evaluation practice.

Screenshot of the Excel Data Cleaning Tips

As I looked through the tips, I realized that I could show how to do each of the five tips listed in the document in R. Many people come to R from Excel, so having a set of Excel-to-R equivalents (also see this post on a similar topic) is helpful.

The tips are not intended to be comprehensive, but they do show some common things that people do when cleaning messy data. I did a live stream recently where I took each tip listed in the document and showed its R equivalent.

As I mention at the end of the video, while you can certainly do data cleaning in Excel, switching to R enables you to make your work reproducible. Say you have some surveys that need cleaning today. You write your code and save it. Then, when you get 10 new surveys next week, you can simply rerun your code, saving you countless Excel points and clicks.

You can watch the full video at the very bottom or go to each tip using the videos immediately below. I hope it’s helpful in giving an overview of data cleaning in R!

Tip #1: Identify all cells that contain a specific word or (short) phrase in a column with open-ended text
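
For readers who prefer text to video, here is a minimal base R sketch of this tip. The `responses` data frame and its column names are made up for illustration, and the video may use different functions (for example, from the tidyverse) to do the same thing.

```r
# Hypothetical survey responses; data and column names are invented for this example
responses <- data.frame(
  id = 1:4,
  comment = c(
    "The workshop was great",
    "I wanted more time for questions",
    "Great facilitator, but the room was cold",
    "No comment"
  ),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE for each element that contains the pattern;
# ignore.case = TRUE matches both "great" and "Great"
responses$mentions_great <- grepl("great", responses$comment, ignore.case = TRUE)

# Keep only the rows whose comment mentions the word
great_rows <- responses[responses$mentions_great, ]
print(great_rows)
```

Swapping `"great"` for any other word or short phrase is all it takes to search for something else.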

Tip #2: Identify and remove duplicate data
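
One way to sketch this tip in base R is with `duplicated()`, which flags any row that is identical to an earlier row. The `surveys` data below is hypothetical, and this is only one of several ways to do it.

```r
# Hypothetical survey data in which respondent A02 was entered twice
surveys <- data.frame(
  respondent = c("A01", "A02", "A02", "A03"),
  score = c(4, 5, 5, 3),
  stringsAsFactors = FALSE
)

# Identify: TRUE marks a row that repeats an earlier row exactly
dup_flags <- duplicated(surveys)
print(surveys[dup_flags, ])

# Remove: keep only the first occurrence of each row
surveys_unique <- surveys[!dup_flags, ]
print(surveys_unique)
```

Note that this only catches rows that match on every column. To treat rows as duplicates when just the respondent ID repeats, you could pass `surveys$respondent` to `duplicated()` instead.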

Tip #3: Identify the outliers within a data set
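
There are many definitions of an outlier. As a rough sketch, the code below assumes the common rule of thumb that flags values more than 1.5 times the interquartile range below the first quartile or above the third quartile. The rule and the `scores` data are my assumptions for illustration, not necessarily what the Excel tips or the video use.

```r
# Hypothetical test scores with one suspicious value
scores <- c(12, 14, 15, 15, 16, 17, 18, 95)

# First and third quartiles (R's default quantile method)
q <- quantile(scores, probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Fences under the 1.5 * IQR rule of thumb
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr

# Flag values that fall outside the fences
is_outlier <- scores < lower_fence | scores > upper_fence
print(scores[is_outlier])
```

Whether a flagged value is an error or a genuine extreme is a judgment call, so it is usually worth looking at the flagged rows before dropping anything.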

Tip #4: Separate data from a single column into two or more columns
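
A minimal base R sketch of this tip uses `strsplit()` to break each value apart at a delimiter. The `people` data is invented, and the code assumes every name has exactly two parts separated by a single space; real data would need more care.

```r
# Hypothetical data with first and last names stored in one column
people <- data.frame(
  full_name = c("Ada Lovelace", "Grace Hopper", "Alan Turing"),
  stringsAsFactors = FALSE
)

# Split each name at the space; the result is a list of character vectors
parts <- strsplit(people$full_name, " ", fixed = TRUE)

# Pull the first and second piece of each name into their own columns
people$first_name <- vapply(parts, function(p) p[1], character(1))
people$last_name <- vapply(parts, function(p) p[2], character(1))
print(people)
```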

Tip #5: Categorize data in a column, such as class assignments or subject groups
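
For numeric data, one way to sketch this tip in base R is `cut()`, which assigns each value to a labeled range. The `students` data, the cutoffs, and the group labels below are all made up for illustration.

```r
# Hypothetical student scores to sort into groups
students <- data.frame(
  name = c("Ana", "Ben", "Cal", "Dee", "Eli"),
  score = c(55, 72, 88, 91, 64),
  stringsAsFactors = FALSE
)

# cut() bins each score; with the default right = TRUE the ranges are
# (0, 60], (60, 80], and (80, 100]
students$group <- cut(
  students$score,
  breaks = c(0, 60, 80, 100),
  labels = c("Low", "Medium", "High")
)

# Count how many students fall in each group
print(table(students$group))
```

For categories based on text rather than numbers, `ifelse()` or a lookup table merged onto the data would be a closer fit.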

Full Video
