Dealing with Missing Data
This lesson is called Dealing with Missing Data, part of the Going Deeper with R course.
Heads up! The code that I demonstrate in the video worked when I wrote it, but no longer does. The reason why is a bit complicated, so I recommend watching both videos below to understand why this happened (info on the change to tidyr that I mention in the video is here).
Your Turn
Convert all of the missing values in the number_of_students variable to NA using na_if()
Convert all of the NA values you just made to 0 using replace_na()
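A minimal sketch of these two steps, using a small made-up tibble (the enrollment data frame and its values are hypothetical, not the course data). It assumes number_of_students was read in as a character column because of the "-" placeholders. That is why the replacement value here is the string "0" rather than the number 0: as one of the comments below reports, replace_na() refuses a replacement whose type does not match the column.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the course data: "-" marks a missing count,
# so the whole column is stored as character.
enrollment <- tibble(
  district_id = c(1, 1, 2),
  number_of_students = c("12", "-", "30")
)

enrollment_clean <- enrollment %>%
  # Step 1: turn the "-" placeholder into a real NA
  mutate(number_of_students = na_if(number_of_students, "-")) %>%
  # Step 2: fill the NAs; the replacement must match the column type (character)
  mutate(number_of_students = replace_na(number_of_students, "0")) %>%
  # Convert to numeric once no non-numeric text remains
  mutate(number_of_students = as.numeric(number_of_students))
```

An alternative ordering would be to convert the column to numeric right after na_if() and then call replace_na() with a numeric 0. Either way, the column and the replacement need to share a type.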
Learn More
The best place to learn about replace_na() is on the tidyr website, which has an excellent documentation page about the function. na_if() comes from the dplyr package so check it out there.
I referenced the read_csv() function having an na argument early on in the video. The read_excel() function has an na argument as well. The na arguments in these two functions can help you deal with missing data on import. However, replace_na() and na_if() are good to know because you don’t always have this option!
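To illustrate the import-time option, here is a small sketch that writes a hypothetical CSV to a temporary file and reads it back with the na argument. The file contents and the enrollment name are made up for the example, and "-" is assumed to be the missing-value marker.

```r
library(readr)
library(dplyr)
library(tidyr)

# Write a tiny hypothetical CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("district_id,number_of_students", "1,12", "1,-", "2,30"),
  csv_path
)

# Treat "-" (plus the usual "" and "NA") as missing while reading.
enrollment <- read_csv(csv_path, na = c("", "NA", "-"))

# number_of_students now holds a real NA and parses as numeric,
# so a numeric 0 is an acceptable replacement.
enrollment_filled <- enrollment %>%
  mutate(number_of_students = replace_na(number_of_students, 0))
```

Because the dashes become NA at read time, the column never ends up as character, which sidesteps the type mismatch altogether. read_excel() from readxl takes an na argument in the same spirit.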
If you want to go deep on exploring missing data in your datasets in R, there is a package called naniar that will help. Allison Horst also has a really nice tutorial on using it here.
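As a quick taste of naniar, the sketch below counts missing values in a small made-up tibble. The data are hypothetical, and this shows only two of the package's summary helpers, n_miss() and miss_var_summary(), not a full workflow.

```r
library(dplyr)
library(naniar)

# Hypothetical data with one missing count
enrollment <- tibble(
  district_id = c(1, 1, 2),
  number_of_students = c(12, NA, 30)
)

# Number of missing values in a single column
n_miss(enrollment$number_of_students)

# One row per variable, with the count and percent of missing values
miss_var_summary(enrollment)
```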
Have any questions? Put them below and we will help you out!
Erin Guthrie • April 20, 2021
Anyone have a solution for the following error message? I know I am likely doing something very stupid... Error in length_x %in% c(1L, n) : object 'number_of_students' not found
Abby Isaacson • April 21, 2021
Why is the last mutate line in your code added in the solution code (not the video)? What would that do? Does this have to do with making sure the variable is numeric?
Oindrila Bhattacharyya • April 22, 2021
I received the same error message as Erin. However, when I selected all of the code and ran it from beginning to end, it worked fine!
Louise Blight • May 7, 2021
Looks like the last line of code here belongs in the next lesson instead: mutate(number_of_students = as.numeric(number_of_students))
Elan Sykes • December 28, 2021
Is there a data maintenance reason not to replace all "-" with 0s directly in one step instead of two? Or is it code legibility?
Carolyn Ford • April 22, 2022
For line 9 - when I put in the number 0 I get this error message:
Error in mutate(): ! Problem while computing number_of_students = replace_na(number_of_students, 0). Caused by error in vec_assign(): ! Can't convert replace to match type of data. Run rlang::last_error() to see where the error occurred.
But when I offset the 0 as a character, everything works:
enrolment_by_race_ethnicity <- [...] %>%
  select(-contains("grade")) %>%
  select(-contains("kindergarten")) %>%
  select(-contains("percent")) %>%
  pivot_longer(cols = -district_id, names_to = "race_ethnicity", values_to = "number_of_students") %>%
  mutate(number_of_students = na_if(number_of_students, "-")) %>%
  mutate(number_of_students = replace_na(number_of_students, "0"))
Josh Gutwill • November 4, 2022
Why replace the dashes with NAs and then replace the NAs with zeros? Why not skip a step and replace the dashes with zeros? Is there no way to find/replace in the values of a variable?