Going Deeper with R
Your Turn
- Start with the enrollment_18_19 data frame
- select() the district_id variable as well as those about number of students by race/ethnicity and get rid of all others (hint: use the contains() helper function within select())
- Use pivot_longer() to convert all of the race/ethnicity variables into one variable
- Within pivot_longer(), use the names_to argument to call that variable race_ethnicity
- Within pivot_longer(), use the values_to argument to call that variable number_of_students
Solutions
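The steps in the exercise can be sketched as follows. This is a minimal sketch, not the course's official solution: the toy enrollment_18_19 data frame and its column names (students_white, percent_white, and so on) are hypothetical stand-ins, so the string passed to contains() will need to match whatever the real columns are called.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for enrollment_18_19; the real data has different columns
enrollment_18_19 <- tibble(
  district_id = c(1, 2),
  students_white = c(100, 250),
  students_asian = c(20, 40),
  percent_white = c(0.80, 0.85),
  percent_asian = c(0.20, 0.15)
)

enrollment_by_race_ethnicity_18_19 <- enrollment_18_19 %>%
  # Keep the id plus the student-count columns; everything else is dropped
  select(district_id, contains("students")) %>%
  # Pivot every column except district_id into name/value pairs
  pivot_longer(cols = -district_id,
               names_to = "race_ethnicity",
               values_to = "number_of_students")

enrollment_by_race_ethnicity_18_19
```

Each district now has one row per race/ethnicity group instead of one column per group.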
Learn More
The best place to learn more about pivot_longer() and pivot_wider() is the pivoting vignette from the tidyr package.
There’s also a nice article about pivoting by Gavin Simpson of University College London. That article includes animations, made by Garrick Aden-Buie and Mara Averick, that give a visual demonstration of pivoting.

RStudio has a nice primer on reshaping data, complete with a few exercises.
Finally, a heads up: if you ever see references to the functions gather() and spread(), these are the previous iterations of the pivot functions. They still work (as the tweet below from tidyverse developer Hadley Wickham indicates), but the pivot functions are, in my view (and the view of many others), much easier to use.
You may have heard a rumour that gather/spread are going away. This is simply not true (they’ll stay around forever) but I am working on better replacements which you can learn about at https://t.co/sU2GzWeBaf. Now is a great time for feedback! #rstats
— Hadley Wickham (@hadleywickham) March 19, 2019
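For anyone who runs into older code, here is a small sketch of how a gather() call lines up with pivot_longer(). The toy data frame and its column names are made up for illustration. The two functions return rows in a different order, so the comparison sorts both results first.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, not the course data
toy <- tibble(
  district_id = c(1, 2),
  white = c(100, 250),
  asian = c(20, 40)
)

# Older style: gather() takes the new column names as key and value
old_style <- toy %>%
  gather(key = "race_ethnicity", value = "number_of_students", -district_id)

# Newer style: pivot_longer() uses names_to and values_to instead
new_style <- toy %>%
  pivot_longer(cols = -district_id,
               names_to = "race_ethnicity",
               values_to = "number_of_students")

# Sort both the same way before comparing, since row order differs
all.equal(
  as.data.frame(arrange(old_style, district_id, race_ethnicity)),
  as.data.frame(arrange(new_style, district_id, race_ethnicity))
)
```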
Have any questions? Put them below and we will help you out!
Hi David, I typed the following code, but the new data frame still has the original structure. What could be the problem?
enrollment_by_race_ethnicity_18_19 <- enrollment_18_19 %>%
  select(-contains("grade")) %>%
  select(-contains("kindergarten")) %>%
  select(-contains("percent")) %>%
  pivot_longer(cols = "district_id",
               names_to = "race_ethnicity",
               values_to = "number_of_students")
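Hi IIbrah. Try changing
pivot_longer(cols = "district_id",
to
pivot_longer(cols = -district_id,
The code you entered was pivoting only the district ID column and leaving the race/ethnicity columns as they were. The minus sign tells pivot_longer() to pivot every column except district_id.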
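A small sketch of the difference between the two cols values, using a made-up toy data frame (the column names are hypothetical, not the ones in the course data):

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, not the course data
toy <- tibble(
  district_id = c(1, 2),
  white = c(100, 250),
  asian = c(20, 40)
)

# cols = "district_id" pivots ONLY district_id; white and asian stay as columns
pivots_id_only <- toy %>%
  pivot_longer(cols = "district_id",
               names_to = "race_ethnicity",
               values_to = "number_of_students")

# cols = -district_id pivots everything EXCEPT district_id
pivots_all_but_id <- toy %>%
  pivot_longer(cols = -district_id,
               names_to = "race_ethnicity",
               values_to = "number_of_students")

names(pivots_id_only)     # still contains white and asian
names(pivots_all_but_id)  # district_id plus the two new columns
```

In the first result the data keeps its wide shape, which is why it can look as though nothing happened.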
I used select(!contains("percent")) instead of select(-contains("percent")), mainly because the helper page listed the exclamation option rather than the minus sign. Are there any differences between the two?
I don’t know the answer to that! Does it give the same result?
Looks like it gives the same answer.
Ok, I guess it does do the same thing then. Thanks for teaching me something new!
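One way to check this yourself is to compare the two results with identical(). This is only a sketch on a made-up toy data frame, and it assumes a recent version of dplyr (1.0.0 or later), where ! is supported inside select(). It covers the simple case of dropping the matches of a single helper, and does not try to compare the two operators in more complex selections.

```r
library(dplyr)

# Hypothetical toy data, not the course data
toy <- tibble(
  district_id = c(1, 2),
  white = c(100, 250),
  percent_white = c(0.80, 0.85)
)

with_minus <- toy %>% select(-contains("percent"))
with_bang  <- toy %>% select(!contains("percent"))

# TRUE when both approaches keep exactly the same columns
identical(with_minus, with_bang)
```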
I only have the enrollment_17_18 and enrollment_18_19 files. Where did the math scores files that are shown at 5:40 of this lesson come from? My code matches what’s in the Solutions for the Importing Data lesson and I don’t think there was anything we downloaded in the Tidy Data lesson.
I now see that we don’t need them for (Y)our Turn. I was just trying to follow along.
Hi David, is there a way to enter the name of each column we want to remove within one select(-contains(" ")) argument rather than writing each one separately?
Do you mean like doing
select(-c(column1, column2))
?
Yes! Thank you, that worked.
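As a sketch of how this could look with helper functions, the toy example below drops several groups of columns in a single select() call, either by wrapping several contains() calls in c() or by using matches() with a regular expression. The data frame and column names are hypothetical.

```r
library(dplyr)

# Hypothetical toy data, not the course data
toy <- tibble(
  district_id = c(1, 2),
  white = c(100, 250),
  grade_one = c(30, 60),
  kindergarten = c(25, 55),
  percent_white = c(0.80, 0.85)
)

# Three separate select() calls, as in the question above
step_by_step <- toy %>%
  select(-contains("grade")) %>%
  select(-contains("kindergarten")) %>%
  select(-contains("percent"))

# One call: combine the helpers with c() and negate the whole set
combined <- toy %>%
  select(-c(contains("grade"), contains("kindergarten"), contains("percent")))

# One call: a regular expression that matches any of the three patterns
with_regex <- toy %>%
  select(-matches("grade|kindergarten|percent"))

names(combined)
```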
Hi David! Does using pivot_longer() mean that it will always arrange the data into 3 columns?
I’m not sure I quite understand your question. Can you explain a bit more?
What I understand from the pivot_longer() documentation page is that it rearranges the data by decreasing the number of columns, so my question is whether pivot_longer() will always tidy the data into 3 columns. In this case there is just one column for the values, one column for the names (the former column names, stored as characters), and the id column (which we kept by adding the "-").
It will be 3 columns if you only have 1 column that you are not pivoting (in this case, the id column). However, if there are multiple columns that you are not pivoting, then you will have more than 3: one for each column you leave alone, plus the names column and the values column. Does that make sense?
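A quick sketch of that rule, using made-up toy data frames with one and then two columns left out of the pivot (the school_year column is hypothetical):

```r
library(dplyr)
library(tidyr)

# One column left out of the pivot: district_id
one_id <- tibble(
  district_id = c(1, 2),
  white = c(100, 250),
  asian = c(20, 40)
)

# Two columns left out of the pivot: district_id and school_year
two_ids <- tibble(
  district_id = c(1, 2),
  school_year = c("2018-19", "2018-19"),
  white = c(100, 250),
  asian = c(20, 40)
)

long_one_id <- one_id %>%
  pivot_longer(cols = -district_id,
               names_to = "race_ethnicity",
               values_to = "number_of_students")

long_two_ids <- two_ids %>%
  pivot_longer(cols = -c(district_id, school_year),
               names_to = "race_ethnicity",
               values_to = "number_of_students")

ncol(long_one_id)   # 1 id column + names + values
ncol(long_two_ids)  # 2 id columns + names + values
```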