R for the Rest of Us

Going Deeper with R

Bring It All Together (Advanced Data Wrangling)


Code shown in video
# Load Packages -----------------------------------------------------------

library(tidyverse)
library(janitor)

# Import Data -------------------------------------------------------------

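# Read the tab-separated survey, standardize column names, and give each
# respondent an id so the tables created below can be joined back together
survey_data_raw <- read_tsv("data-raw/2020-combined-survey-final.tsv") |> 
  clean_names() |> 
  mutate(id = row_number())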


# Exploration -------------------------------------------------------------

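# Quick look at every column: its type and first few values
survey_data_raw |> 
  glimpse()

# Frequency of each qr_learning_path answer, most common first
survey_data_raw |> 
  count(qr_learning_path) |> 
  arrange(desc(n))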


# Tidying -----------------------------------------------------------------

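# Multi-answer column: one row per respondent per coding language
other_coding_languages <- 
  survey_data_raw |> 
  select(id, qcoding_languages) |> 
  separate_longer_delim(qcoding_languages,
                        delim = ", ")

# Demographic columns (qyear_born through qcountry): one row per respondent
demographics <- survey_data_raw |> 
  select(id, qyear_born:qcountry)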


# Export ------------------------------------------------------------------

other_coding_languages |> 
  write_rds("data/other_coding_languages.rds")

demographics |> 
  write_rds("data/demographics.rds")
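The lesson's code stops at exporting the two tables. As a sketch of how they might be brought back together (not necessarily what the video does), the example below runs the same split, export, and re-import steps on a tiny made-up table. The names `survey_toy`, `out_dir`, and `languages_by_country` and all data values are hypothetical; `qcountry` stands in for the demographic columns, and a temporary folder replaces `data/`. It assumes tidyr 1.3.0 or later, where `separate_longer_delim()` was introduced.

```r
library(dplyr)
library(tidyr)
library(readr)

# Hypothetical stand-in for survey_data_raw (the real survey file is not included)
survey_toy <- tibble(
  id = 1:3,
  qcountry = c("Brazil", "Canada", "Brazil"),
  qcoding_languages = c("R, Python", "SQL", "R, SQL")
)

# Tidy each piece separately, mirroring the Tidying section above
other_coding_languages <- survey_toy |>
  select(id, qcoding_languages) |>
  separate_longer_delim(qcoding_languages, delim = ", ")  # one row per language

demographics <- survey_toy |>
  select(id, qcountry)                                     # one row per respondent

# Export each table (to a temporary folder here, instead of data/)
out_dir <- tempdir()
write_rds(other_coding_languages, file.path(out_dir, "other_coding_languages.rds"))
write_rds(demographics, file.path(out_dir, "demographics.rds"))

# Later, e.g. in an analysis script: read the pieces back and join them on id
languages_by_country <- read_rds(file.path(out_dir, "other_coding_languages.rds")) |>
  left_join(read_rds(file.path(out_dir, "demographics.rds")), by = "id") |>
  count(qcountry, qcoding_languages)

languages_by_country
```

Because `id` was added right after import, any tidied piece can be re-attached to any other without carrying all the raw columns around.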


Marina Gimenez • April 4, 2025

I don't understand why, instead of working with the whole file, you first make two files out of it and then bring them back together. Was it just for the sake of practice? Or was the file so big that you only picked what you needed? Even then, couldn't you have put everything you picked into a single file?

Gracielle Higino Coach • April 4, 2025

Hi Marina! All of these are good reasons to split your dataset! Very often, splitting a dataset makes it clearer and easier to reshape the parts when you need to pivot longer or wider. You can then use joins to combine the pivoted datasets, keeping only the variables you want.

That's what David is demonstrating here: notice how the raw data in this example contains 53 variables, while the selected and cleaned datasets have only 6 and 2 variables. He selected and cleaned only the data he needed, worked with each piece separately, and joined them when he needed to.
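To make the split, reshape, and join idea concrete, here is a minimal sketch on a made-up table. The names `survey_toy`, `languages`, `combined`, and `uses` and all data values are hypothetical; the column names `qcountry` and `qcoding_languages` are borrowed from the lesson. Pivoting the languages into one TRUE/FALSE column each is just one possible reshape, chosen here to show why it is easier to pivot a small split-off table than the full 53-column dataset.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide survey: demographic columns mixed with a packed multi-answer column
survey_toy <- tibble(
  id = 1:3,
  qcountry = c("Brazil", "Canada", "Brazil"),
  qcoding_languages = c("R, Python", "SQL", "R, SQL")
)

# Piece 1: demographics, already one row per respondent
demographics <- survey_toy |>
  select(id, qcountry)

# Piece 2: languages, first made long (one row per respondent and language) ...
languages <- survey_toy |>
  select(id, qcoding_languages) |>
  separate_longer_delim(qcoding_languages, delim = ", ") |>
  mutate(uses = TRUE) |>
  # ... then pivoted wide into one TRUE/FALSE column per language
  pivot_wider(
    names_from = qcoding_languages,
    values_from = uses,
    values_fill = FALSE  # languages a respondent did not list become FALSE
  )

# Bring the reshaped pieces back together on the shared id
combined <- demographics |>
  left_join(languages, by = "id")

combined
```

Each piece is reshaped on its own terms, and the shared `id` column is all the join needs to line the rows back up.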
