Going Deeper with R
Dealing with Missing Data
This lesson is called Dealing with Missing Data, part of the Going Deeper with R course.
Heads up! The code that I demonstrate in the video worked when I wrote it, but no longer does. The reason why is a bit complicated, so I recommend watching both videos below to understand why this happened (info on the change to tidyr that I mention in the video is here).
Your Turn
- Convert all of the missing values in the number_of_students variable to NA using na_if().
- Convert all of the NA values you just made to 0 using replace_na().
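Here is a minimal sketch of the two steps on a small made-up data frame. The enrollment tibble and its values are hypothetical stand-ins for the course data, and the sketch assumes the column is stored as character with "-" marking missing counts, as the comments below describe. Because the column is character, the replacement value is written as "0" rather than 0.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the course data; "-" marks a missing count
enrollment <- tibble(
  district_id = c(1, 2, 3),
  number_of_students = c("25", "-", "40")
)

enrollment_clean <- enrollment %>%
  # Step 1: turn the "-" placeholder into a real NA
  mutate(number_of_students = na_if(number_of_students, "-")) %>%
  # Step 2: turn NA into "0" (a string, because the column is character)
  mutate(number_of_students = replace_na(number_of_students, "0"))

enrollment_clean
```

Converting the column to numeric afterwards is a separate step, covered in the next lesson.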
Learn More
The best place to learn about replace_na() is on the tidyr website, which has an excellent documentation page about the function. na_if() comes from the dplyr package, so check it out there.
I referenced the read_csv() function having an na argument early on in the video. The read_excel() function has an na argument as well. The na arguments in these two functions can help you deal with missing data on import. However, replace_na() and na_if() are good to know because you don’t always have this option!
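As a rough illustration of the na argument, the sketch below writes a tiny made-up CSV to a temporary file and reads it back with "-" treated as missing. The file contents and the csv_path name are hypothetical and exist only for this example.

```r
library(readr)

# Hypothetical file written to a temporary path for illustration
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("district_id,number_of_students", "1,25", "2,-", "3,40"),
  csv_path
)

# Treat empty strings and "-" as missing while reading
enrollment <- read_csv(csv_path, na = c("", "-"), show_col_types = FALSE)

enrollment
```

With the dashes handled on import, number_of_students should come in as a numeric column with NA in place of "-", so no na_if() step is needed.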
If you want to go deep on exploring missing data in your datasets in R, there is a package called naniar that will help. Allison Horst also has a really nice tutorial on using it here.
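To give a flavor of naniar, here is a small sketch using two of its summary functions on a made-up data frame. The data is hypothetical, and this is only a starting point rather than a tour of the package.

```r
library(naniar)

# Hypothetical data with one missing value
enrollment <- data.frame(
  district_id = c(1, 2, 3),
  number_of_students = c(25, NA, 40)
)

# Total number of NA values in the data frame
total_missing <- n_miss(enrollment)

# One row per variable, with the count and percent of missing values
miss_var_summary(enrollment)
```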
Erin Guthrie
April 20, 2021
Anyone have a solution for the following error message? I know I am likely doing something very stupid... Error in length_x %in% c(1L, n) : object 'number_of_students' not found
David Keyes
April 20, 2021
Can you identify where in your pipeline you're getting this error? My go-to way to debug is to run it line by line until I figure out where the issue is.
Erin Guthrie
April 20, 2021
Interestingly, when I do as you say and run it line by line, I do not get the error?! Is that a thing? It was coming after the na_if line:
mutate(number_of_students = na_if(number_of_students, "-"))
Error in length_x %in% c(1L, n) : object 'number_of_students' not found
Abby Isaacson
April 21, 2021
Why is the last mutate line in your code added in the solution code (not the video)? What would that do? Does this have to do with making sure the variable is numeric?
Abby Isaacson
April 21, 2021
Ah never mind this is addressed in the next lesson!
Oindrila Bhattacharyya
April 22, 2021
I received the same error message as Erin. However, when I ran the codes from beginning till the end selecting all together, it worked fine!
Louise Blight
May 6, 2021
Looks like the last line of code here belongs in the next lesson instead: mutate(number_of_students = as.numeric(number_of_students))
David Keyes
May 7, 2021
Fixed, thanks!
Elan Sykes
December 27, 2021
is there a data maintenance reason not to replace all "-" with 0s directly in 1 step instead of 2? Or is it code legibility?
David Keyes
December 28, 2021
Good question! I did it this way to demonstrate both the na_if() and replace_na() functions, but you could totally do it in 1 step.
Carolyn Ford
April 22, 2022
For line 9 - when I put in the number 0 I get error message:
Error in mutate():
! Problem while computing number_of_students = replace_na(number_of_students, 0).
Caused by error in vec_assign():
! Can't convert replace to match type of data.
Run rlang::last_error() to see where the error occurred.
But when I offset the 0 as a character, everything works:
enrolment_by_race_ethnicity % select(-contains("grade")) %>%
  select(-contains("kindergarten")) %>%
  select(-contains("percent")) %>%
  pivot_longer(cols = -district_id, names_to = "race_ethnicity", values_to = "number_of_students") %>%
  mutate(number_of_students = na_if(number_of_students,"-")) %>%
  mutate(number_of_students = replace_na(number_of_students, "0"))
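The error above looks like a type mismatch: the column holds character values, so a numeric 0 is not accepted as the replacement. The sketch below reproduces this on a made-up character vector (counts is a hypothetical name). It assumes a tidyr version that enforces this check, which appears to be the case for the versions discussed in this thread.

```r
library(tidyr)

# Hypothetical character vector with one missing value
counts <- c("25", NA, "40")

# A numeric replacement does not match the character data;
# tryCatch() captures the error message instead of stopping the script
attempt <- tryCatch(
  replace_na(counts, 0),
  error = function(e) conditionMessage(e)
)

# A character replacement works; convert to numeric afterwards if needed
fixed <- as.numeric(replace_na(counts, "0"))

fixed
```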
JULIO VERA DE LEON
April 22, 2022
I'm getting the same problem.
Error in mutate():
! Problem while computing number_of_students = replace_na(number_of_students, 0).
Caused by error in vec_assign():
! Can't convert replace to match type of data.
David Keyes
April 22, 2022
Sorry about that! Just added this note to this lesson:
JULIO VERA DE LEON
April 22, 2022
Thanks David.
David Keyes
April 22, 2022
Sorry about that! Just added this note to this lesson:
Josh Gutwill
November 3, 2022
Why replace the dashes with NAs and then replace the NAs with zeros? Why not skip a step and replace the dashes with zeros? Is there no way to find/replace in the values of a variable?
Josh Gutwill
November 3, 2022
Now that I've watched Advanced Variable Creation, I see that there are lots of ways to do this! So no need to answer my question above.
David Keyes
November 3, 2022
No worries! You're right that what I did is sort of duplicative, but I did it for teaching purposes. And yes, you're right: lots of ways to do things in R!
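For anyone curious about the one-step version raised a couple of times in this thread, here is one possible sketch using if_else() from dplyr. The enrollment tibble is made up for illustration, and this is just one of several ways to do it, not the approach shown in the video.

```r
library(dplyr)

# Hypothetical stand-in for the course data; "-" marks a missing count
enrollment <- tibble(number_of_students = c("25", "-", "40"))

one_step <- enrollment %>%
  # Replace "-" with "0" directly, leaving every other value untouched
  mutate(
    number_of_students = if_else(number_of_students == "-", "0", number_of_students)
  )

one_step
```

Both the replacement and the original values are character here, which keeps if_else() happy about types.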