Going Deeper with R
Tidy Data
This lesson is called Tidy Data, part of the Going Deeper with R course.
Your Turn
Read the Tidy Data vignette
Take a look at your data and see which principles of tidy data it violates
Learn More
In the video, I only talk about two types of data tidying: each variable forming a column and each type of observational unit forming a table. If you want to see examples of the third type (each observation forming a row), check out the tidy data vignette from the tidyr package.
The workflow diagram I talked about is from Chapter 1 of R for Data Science.

One small note unrelated to the main content of this lesson: I recorded it before dplyr 1.0 was released. If you have dplyr 1.0 or later installed, you have access to the across() function, which applies the same operation to several columns at once, and to rowwise() with c_across(), which together let you summarize across the columns within each row. My example of finding it challenging to summarize German speakers data across rows would be much easier using these functions. However, I still think that in most cases, it is easier to tidy your data and work with it in that format.
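To make the comparison concrete, here is a minimal sketch of both approaches. The speakers data frame and its column names are made up for illustration; they are not the German speakers data from the video.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one column per year (not the data from the video).
speakers <- tibble(
  state = c("A", "B"),
  y2018 = c(100, 200),
  y2019 = c(110, 210)
)

# across() applies the same function to several columns:
# here, one total per year column.
column_totals <- speakers %>%
  summarize(across(y2018:y2019, sum))

# rowwise() + c_across() summarizes within each row:
# here, one total per state, with the data left wide.
wide_totals <- speakers %>%
  rowwise() %>%
  mutate(total = sum(c_across(y2018:y2019))) %>%
  ungroup()

# The tidy alternative: pivot to long format, then group and summarize.
tidy_totals <- speakers %>%
  pivot_longer(cols = -state, names_to = "year", values_to = "n") %>%
  group_by(state) %>%
  summarize(total = sum(n))

# Both approaches give a total of 210 for state A and 410 for state B.
```

The two approaches give the same totals here. The tidy version tends to pay off once you want to filter, group, or plot by year as well.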
Vuk Sekicki
April 19, 2021
Hello David,
Could you help me understand this: names_pattern = "(.)(.+)"
Specifically, what is "(.)(.+)"?
Thanks.
David Keyes
April 19, 2021
Oh man, this is an area I struggle with. With that caveat, this is a way to flexibly create variable names on the fly when you pivot. It involves using regular expressions, which, again, I'm not very good at. You can read a bit more about this here.
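For anyone following along, here is a rough sketch of what that pattern does. In a regular expression, a dot matches any single character, a plus sign means "one or more of the preceding item", and each pair of parentheses captures a group. So "(.)(.+)" splits a string into its first character and everything after it, and pivot_longer() puts each captured group into the corresponding column named in names_to. The data frame and column names below are hypothetical, chosen only to illustrate the idea.

```r
library(tidyr)

# What the pattern captures on a single column name (base R):
# the full match, then the first character, then the rest.
parts <- regmatches("m2019", regexec("(.)(.+)", "m2019"))[[1]]
# parts is c("m2019", "m", "2019")

# Hypothetical wide data: the first character of each column name is a
# group code (m or f) and the remaining characters are a year.
wide <- data.frame(
  id    = 1:2,
  m2019 = c(10, 20),
  f2019 = c(11, 21),
  m2020 = c(12, 22),
  f2020 = c(13, 23)
)

# names_pattern splits each column name into two pieces, one per
# capture group, and names_to says where each piece should go.
long <- pivot_longer(
  wide,
  cols = -id,
  names_to = c("group", "year"),
  names_pattern = "(.)(.+)",
  values_to = "n"
)
```

The result has one row per id, group, and year combination. Note that year comes out as a character column, so you may want to convert it with as.numeric() afterwards.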
Vuk Sekicki
April 20, 2021
https://towardsdatascience.com/a-gentle-introduction-to-regular-expressions-with-r-df5e897ca432
https://towardsdatascience.com/anchors-away-more-regex-concepts-in-r-f00fe7f07d52
DataCamp: String Manipulation with stringr in R
Just putting it out there if anyone needs it in the future. I see this is a science by itself that requires weeks to master. I will look into this in the future. Too much for now. TNX!!
David Keyes
April 20, 2021
Yes, it's very complex! Not to be too salesy but I actually have an in-depth data cleaning course coming out later this year. It has a whole section on regular expressions.
Vuk Sekicki
April 22, 2021
Not at all, looking forward!
Matt M
November 8, 2021
I see you re-worded the 3 rules of tidy data from the vignette. Although I think I understand conceptually what is being sought, I'm not sure I follow what each rule means (i.e., what I need to do to make sure that I'm complying with the rule) and what a violation of each rule looks like (the third rule in particular).
David Keyes
November 10, 2021
The wording of the 3 rules of tidy data has changed over time. I hope some of the additional resources I shared help to make sense of the rules. Here they are for others:
https://twitter.com/juliesquid/status/1315710359404113920
https://betterleftsaid.medium.com/intro-to-data-structure-by-way-of-a-calming-spring-scene-a43aa1664922
https://www.youtube.com/watch?v=QB8AdKO4RNc&t=360s
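As one possible illustration of the rule about each type of observational unit forming a table, here is a small sketch. The data is invented, loosely modeled on the kind of song-chart example used in the tidy data vignette. The combined table mixes two kinds of things, songs and weekly chart ranks, so song details repeat on every row. Splitting it into two tables linked by an id removes the repetition.

```r
library(dplyr)

# Hypothetical table that mixes two observational units:
# facts about a song (artist, title) and facts about a song in a
# given week (rank). Song details are repeated for every week.
combined <- tibble(
  song_id = c(1, 1, 2, 2),
  artist  = c("Artist A", "Artist A", "Artist B", "Artist B"),
  title   = c("Song One", "Song One", "Song Two", "Song Two"),
  week    = c(1, 2, 1, 2),
  rank    = c(5, 3, 9, 7)
)

# One table per observational unit.
# songs: one row per song.
songs <- combined %>%
  distinct(song_id, artist, title)

# ranks: one row per song per week, linked back to songs by song_id.
ranks <- combined %>%
  select(song_id, week, rank)

# The original table can be rebuilt with a join when it is needed.
rebuilt <- left_join(ranks, songs, by = "song_id")
```

A sign that this rule is being violated is a set of columns whose values always repeat together, as artist and title do here.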