Working with labelled data
This lesson is called Working with labelled data, part of the R in 3 Months (Spring 2025) course.
View code shown in video
# Load Packages -----------------------------------------------------------
library(tidyverse)
library(zip)
library(haven)
library(labelled)

# Download Data -----------------------------------------------------------
# Uncomment these lines to download and unzip the GSS 2022 Stata file
# dir.create("data-raw", showWarnings = FALSE)
#
# download.file(
#   "https://gss.norc.org/content/dam/gss/get-the-data/documents/stata/2022_stata.zip",
#   destfile = "data-raw/gss-2022.zip"
# )
#
# # junkpaths = TRUE extracts the files without the archive's folder structure
# unzip(
#   zipfile = "data-raw/gss-2022.zip",
#   exdir = "data-raw",
#   junkpaths = TRUE
# )
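
# Work with Labelled Data -------------------------------------------------
# read_dta() keeps the Stata labels, so marital is a labelled numeric column
gss_marital_status <-
  read_dta("data-raw/GSS2022.dta") |>
  select(marital)

gss_marital_status

# Plotting the labelled column directly treats marital as numeric codes,
# so the marital status labels do not appear on the plot
gss_marital_status |>
  count(marital) |>
  ggplot(
    aes(
      x = n,
      y = marital
    )
  ) +
  geom_col()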
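
# Convert to Factor -------------------------------------------------------
# generate_dictionary() lists each variable with its variable and value labels
gss_marital_status |>
  generate_dictionary()

# as_factor() replaces the numeric codes with their value labels
gss_marital_status |>
  as_factor()

gss_marital_status_factor <-
  gss_marital_status |>
  as_factor()

# Now the y axis shows the marital status labels instead of codes
gss_marital_status_factor |>
  count(marital) |>
  ggplot(
    aes(
      x = n,
      y = marital
    )
  ) +
  geom_col()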
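Running the code above requires downloading the GSS file first. As a self-contained sketch, the block below builds a tiny labelled column by hand with haven's labelled() so you can see what read_dta() hands you and what as_factor() changes. The codes, labels, and object names (marital_example, marital_plot) are made up for illustration; they are not read from the GSS 2022 file. It also shows one way to reuse the stored variable label as an axis title with var_label() from labelled.

```r
library(dplyr)
library(ggplot2)
library(haven)
library(labelled)

# Hypothetical stand-in for a column imported with read_dta():
# numeric codes plus value labels and a variable label (not real GSS values)
marital_example <- tibble(
  marital = labelled(
    c(1, 3, 1, 2, 3, 1),
    labels = c(Married = 1, Widowed = 2, Divorced = 3),
    label = "Marital status"
  )
)

# Inspect the labels stored on the column
var_label(marital_example$marital)
val_labels(marital_example$marital)

# as_factor() replaces each code with its value label
marital_example_factor <- marital_example |>
  as_factor()

# Reuse the stored variable label as the axis title instead of typing it
marital_plot <- marital_example_factor |>
  count(marital) |>
  ggplot(
    aes(
      x = n,
      y = marital
    )
  ) +
  geom_col() +
  labs(y = var_label(marital_example$marital))

marital_plot
```

The variable label is read from the original labelled column, before conversion, so the sketch does not depend on whether as_factor() carries that attribute over to the factor.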
Have any questions? Put them below and we will help you out!
Odile DOREUS • October 3, 2024
Hi David, I was trying to use the "labelled" package with a multiple-choice survey question and encountered an issue. I realized that I needed to tidy my question first before using the function; otherwise, it would not work. Is there a difference when dealing with a single-choice question versus a multiple-choice one?
David Keyes Founder • October 3, 2024
I'm not really sure because I haven't worked with labelled data much. However, I can imagine that having multiple-choice data in a single variable could make the labelled package not work right. Happy to look at an example if you have some code you can share.
Odile DOREUS • October 3, 2024
Here you go David. Thanks.
David Keyes Founder • October 3, 2024
Can you make this fully reproducible so I can run your code? I'd need to have the data importing step work so I can see what your data looks like.
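A minimal sketch of the "tidy first" idea from this thread, assuming (hypothetically) that the multiple-choice answers were exported as one cell per respondent holding every ticked code separated by semicolons. A cell like "1;3" matches no single value label, so the sketch splits it into one row per respondent and option with tidyr's separate_longer_delim() (tidyr 1.3.0 or later) before attaching labels and converting with as_factor(). The data, the column name sources, and the option labels are all invented for illustration.

```r
library(dplyr)
library(tidyr)
library(haven)

# Hypothetical multiple-choice export: each cell holds every option a
# respondent ticked, separated by semicolons (e.g., "1;3")
responses <- tibble(
  respondent = 1:3,
  sources = c("1;3", "2", "1;2;3")
)

# Tidy first: one row per respondent-option pair, so each cell is one code
responses_long <- responses |>
  separate_longer_delim(sources, delim = ";") |>
  mutate(
    # Attach hypothetical value labels to the now single codes
    sources = labelled(
      as.numeric(sources),
      labels = c(Television = 1, Radio = 2, Internet = 3)
    ),
    # Replace each code with its label
    sources = as_factor(sources)
  )

# Each option can now be counted (and plotted) like a single-choice question
responses_long |>
  count(sources)
```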
Hilde Karlsen • October 10, 2024
Thank you for taking the time to create this video, David! I have used both the labelled and haven packages before, basically doing what you are showing us here. However, what I would love is to do it the way you do it, not the way I have done it (earlier, using Stata). Do you think you could create a similar video showing the process you would use, working with the same data and the same variable and then creating a meaningful plot?
David Keyes Founder • October 11, 2024
I'm happy to do this, but I'm not quite sure what you mean. Do you want me to make a different plot than the one I show in the video?
Hilde Karlsen • October 11, 2024
I am sorry for being vague. What I meant is: can you use data in the form you would normally use it (i.e., not labelled data from SPSS, Stata, etc.) and then create a plot similar to the one in the video (where you used the labelled data by transforming the variable into a factor)? What I would love to do is create plots and graphs that contain meaningful text, so that they are intuitive for the reader of the report. To do that, don't you often need some type of text string that explains what we are looking at? Even if we don't call it a "variable label" or "value labels", it is often more meaningful to have variables show up as text. In the penguins data, for example, islands and species have character string names; the three penguin species are not called 1, 2, 3. I really want to understand the best practice for dealing with data in R, and it seems a bit time-consuming to always have to specify in a plot what the x and y labels/values are.
Gracielle Higino Coach • October 11, 2024
If I might chime in here, I think what you'll need is a combination of a tailored function and the use of the fct() function. You can have a vector containing your "labels" and assign them to another variable's values when mutating them into a factor using fct(). We have an extra lesson on factors, and we'll put it up so you can take a look! Also, we'll discuss some of that a bit later, after we go through advanced data wrangling and data viz! =D
David Keyes Founder • October 11, 2024
Gracielle's suggestion is probably your best bet. Since she knows your data too, maybe she can put something together for you based on your data.
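One way Gracielle's suggestion could look in code, as a sketch rather than her exact approach: keep a lookup vector whose names are the codes and whose values are the text readers should see, and wrap the lookup in a small function that returns a factor built with forcats' fct(). The helper label_codes(), the species_labels vector, and the coded survey data are hypothetical; the species names simply echo the penguins example above.

```r
library(dplyr)
library(forcats)

# Hypothetical data with numeric codes and no labels attached
survey <- tibble(
  species_code = c(1, 3, 2, 1, 3)
)

# Lookup vector: names are the codes, values are the text readers should see
species_labels <- c("1" = "Adelie", "2" = "Chinstrap", "3" = "Gentoo")

# Hypothetical helper: turn a vector of codes into a factor of labels
label_codes <- function(codes, labels) {
  fct(
    # Look up each code's label by name
    unname(labels[as.character(codes)]),
    # Set the level order explicitly from the lookup vector
    levels = unname(labels)
  )
}

survey <- survey |>
  mutate(species = label_codes(species_code, species_labels))
```

Because species is now a factor of text labels, a later ggplot() call that maps it to an axis will show "Adelie", "Chinstrap", and "Gentoo" without any extra labelling step. Note that a code missing from the lookup vector becomes NA rather than raising an error.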
Sara Parisi • October 11, 2024
Hi David,
Thanks for recording this extra lesson! Can you add a "View code shown in video" section here so we can copy and paste your code?
David Keyes Founder • October 11, 2024
Oh yes, I meant to do this! Will add it now.
Hilde Karlsen • October 11, 2024
Gracielle, what you are mentioning about combining a tailored function with the fct() function, so that I can have a vector containing my "labels" and assign them to another variable's values when mutating them into a factor using fct(), sounds very much like what I would like to do. I am really looking forward to understanding how to do it, because I am able to do something similar, but I don't really understand what is happening under the hood. If you have an extra lesson on factors, I'll jump right into it and take a look. It will be sooooo cool to be able to do this in a more programmatic fashion, rather than labelling the data in advance (which is what I want to un-learn) :-)
Gracielle Higino Coach • October 12, 2024
Awesome! The extra lesson is already up here: https://rfortherestofus.com/courses/r-in-3-months-fall-2024/lessons/factors
I'll try a couple of things with your data and then we can book a 1:1 to go through it =D