
Reshaping Data

Your Turn
  1. Start with the enrollment_18_19 data frame
  2. select() the district_id variable as well as those about number of students by race/ethnicity and get rid of all others (hint: use the contains() helper function within select())
  3. Use pivot_longer() to convert all of the race/ethnicity variables into one variable
  4. Within pivot_longer(), use the names_to argument to call that variable race_ethnicity
  5. Within pivot_longer(), use the values_to argument to call that variable number_of_students
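
The steps above can be sketched on a small made-up data frame. Everything about the data here is an assumption for illustration: enrollment_toy, its column names (number_of_students_asian and so on), and its values are hypothetical stand-ins, and the real enrollment_18_19 columns may be named differently. Only select(), contains(), and pivot_longer() come from the exercise itself.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for enrollment_18_19; the real column names may differ
enrollment_toy <- tibble(
  district_id = c(101, 102),
  number_of_students_asian = c(10, 20),
  number_of_students_white = c(30, 40),
  percent_asian = c(0.25, 0.33),
  grade_one = c(5, 6)
)

enrollment_long <- enrollment_toy %>%
  # Keep the ID column plus every column whose name contains the shared text
  select(district_id, contains("number_of_students")) %>%
  # Pivot every column except the ID column into name/value pairs
  pivot_longer(
    cols = -district_id,
    names_to = "race_ethnicity",
    values_to = "number_of_students"
  )

enrollment_long
```

Note that cols = -district_id tells pivot_longer() to reshape everything except the ID column, so each district ends up with one row per race/ethnicity column.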
Learn More

The best place to learn more about pivot_longer() and pivot_wider() is the pivoting vignette from the tidyr package.

There’s also a nice article by Gavin Simpson of University College London about pivoting. That article includes the animations below, made by Garrick Aden-Buie and Mara Averick, that give a visual demonstration of pivoting.

RStudio has a nice primer on reshaping data, complete with a few exercises.

Finally, a heads up: if you ever see references to the functions gather() and spread(), these are the previous iterations of the pivot functions. They still work (as the tweet below from tidyverse developer Hadley Wickham indicates), but the pivot functions are, in my view (and the view of many others), much easier to use.
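
To make the comparison concrete, here is a minimal sketch of the older and newer functions side by side. The toy data frame and its column names are made up for illustration and are not from the course data.

```r
library(dplyr)
library(tidyr)

# Made-up data for illustration only
toy <- tibble(
  district_id = c(101, 102),
  asian = c(10, 20),
  white = c(30, 40)
)

# Older style: gather() names the new columns with key and value
long_old <- toy %>%
  gather(key = "race_ethnicity", value = "number_of_students", -district_id)

# Current style: pivot_longer() uses names_to and values_to instead
long_new <- toy %>%
  pivot_longer(
    cols = -district_id,
    names_to = "race_ethnicity",
    values_to = "number_of_students"
  )

# Going back to wide: spread() is the older counterpart of pivot_wider()
wide_old <- long_old %>%
  spread(key = race_ethnicity, value = number_of_students)

wide_new <- long_new %>%
  pivot_wider(names_from = race_ethnicity, values_from = number_of_students)
```

The two long results hold the same values, though the rows may come back in a different order, so sort both before comparing them.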

Have any questions? Put them below.

  1. Hi David, I typed the following code, but the new data frame still has the original structure. What could be the problem?
    enrollment_by_race_ethnicity_18_19 <- enrollment_18_19 %>%
    select(-contains("grade")) %>%
    select(-contains("kindergarten")) %>%
    select(-contains("percent")) %>%
    pivot_longer(cols = "district_id",
    names_to = "race_ethnicity",
    values_to = "number_of_students")

  2. I used select(!contains("percent")) instead of select(-contains("percent")), mainly because the helper page listed the exclamation option rather than the minus sign. Are there any differences between the two?