R for the Rest of Us

A Chat About Tidy Data

David Keyes
January 3rd, 2022

Tidy data is one of the hardest concepts for participants in R in 3 Months to grasp. Even before they get to the actual coding involved in tidying data, many struggle to understand what tidy data is.

I introduce tidy data to R in 3 Months participants through this lesson (which comes from the Going Deeper with R course). In it, I try to give an overview of what tidy data is and why it's beneficial to use it.


But for many people, this isn't enough. Understanding tidy data can take several weeks, or even longer. In the Fall 2021 cohort of R in 3 Months, one participant reached out to say he was struggling with the concept. Matt Makel, an Associate Research Scientist at the Johns Hopkins School of Education, said in an email that he just couldn't wrap his mind around tidy data. To help him, I asked if he'd be willing to jump on Zoom with me. He agreed, and we talked about tidy data for an hour.

Matt was generous enough to allow me to record our conversation, which you can watch below. In the hour or so we talked, we covered a wide range of topics related to tidy data.

We discussed how there is nothing inherently superior about tidy data; it just makes it easier to do things in R, especially when using the tidyverse (which is where the "tidy" in tidyverse comes from).

We also talked about how the tidiness of your data depends on your unit of analysis. At 6:45 in the video below, I gave the example of a "select all that apply" question, where all of a respondent's answers come in a single cell (see below). If we care about counting respondents, this format is tidy; if we care about counting responses to the question, it's not.
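To make the unit-of-analysis point concrete, here is a minimal, language-neutral sketch written in Python (standard library only). The survey data, the `fruits` column, and the comma separator are hypothetical assumptions, not the data from the video. In the tidyverse, the splitting step would typically be handled by a tidyr function such as `separate_rows()`.

```python
from collections import Counter

# Hypothetical survey data: one row per respondent, with all of the
# "select all that apply" answers packed into a single cell.
respondents = [
    {"id": 1, "fruits": "apple, banana"},
    {"id": 2, "fruits": "banana"},
    {"id": 3, "fruits": "apple, cherry, banana"},
]

# Unit of analysis = respondent: this layout already has one row per
# respondent, so counting respondents is just counting rows.
n_respondents = len(respondents)
print(n_respondents)  # 3

# Unit of analysis = response: split each cell so that every selected
# option gets its own row (one row per respondent-answer pair).
responses = [
    {"id": r["id"], "fruit": fruit.strip()}
    for r in respondents
    for fruit in r["fruits"].split(",")
]

# With one response per row, counting responses is a simple tally.
counts = Counter(row["fruit"] for row in responses)
print(counts.most_common())  # [('banana', 3), ('apple', 2), ('cherry', 1)]
```

The original table answers the respondent question as is, but it answers the response question only after reshaping, which is the sense in which tidiness depends on what you are counting.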

Watch to the end of the video and you'll see me summarize the reasons I find tidy data useful: the tidy approach is more concise, requires less copying and pasting (and is thus less error-prone), and is more flexible.
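As a rough illustration of the copy-and-paste point, this sketch (again Python, with hypothetical school test-score data) first summarizes wide data one column at a time, then reshapes it to one row per school and year so that a single grouped summary covers every year. In R, the reshaping step is roughly what tidyr's `pivot_longer()` does; this is an analogy, not code from the video.

```python
from statistics import mean

# Hypothetical wide data: one column per year of test scores.
wide = [
    {"school": "A", "score_2019": 70, "score_2020": 72, "score_2021": 75},
    {"school": "B", "score_2019": 65, "score_2020": 68, "score_2021": 66},
]

# Untidy approach: one near-identical line per column (copy, paste, edit).
mean_2019 = mean(row["score_2019"] for row in wide)
mean_2020 = mean(row["score_2020"] for row in wide)
mean_2021 = mean(row["score_2021"] for row in wide)

# Tidy approach: reshape once so each row is one school-year observation.
prefix = "score_"
long = [
    {"school": row["school"], "year": int(key[len(prefix):]), "score": value}
    for row in wide
    for key, value in row.items()
    if key.startswith(prefix)
]

# A single grouped summary then covers every year at once.
by_year = {}
for row in long:
    by_year.setdefault(row["year"], []).append(row["score"])
means = {year: mean(scores) for year, scores in by_year.items()}
print(means)
```

If a `score_2022` column were added, the untidy version would need another copied line, while the grouped summary would pick the new year up without any changes.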

If you're learning about tidy data, I hope this conversation is helpful. Matt raises many of the questions that people learning about tidy data have, so maybe it will answer a few of yours!

Let us know what you think by adding a comment below.
