Dealing with Missing Data

Heads up! The code that I demonstrate in the video worked when I wrote it, but it no longer does. The reason is a bit complicated, so I recommend watching both videos below to understand what changed (info on the change to tidyr that I mention in the video is here).

Your Turn

  1. Convert all of the missing values in the number_of_students variable to NA using na_if().

  2. Convert all of the NA values you just made to 0 using replace_na().
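
If you get stuck, the two steps above can be sketched roughly like this. The small tibble is a made-up stand-in for the lesson's data (its values are assumptions, not the real file), and the sketch assumes missing values are recorded as "-" in a character column. Because the column is still text at this point, the replacement is written as "0" rather than 0.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the lesson's data: counts stored as text,
# with "-" marking a missing value
enrollment <- tibble(
  district_id = c(1, 1, 2),
  number_of_students = c("12", "-", "40")
)

enrollment_clean <- enrollment %>%
  # Step 1: turn every "-" into NA
  mutate(number_of_students = na_if(number_of_students, "-")) %>%
  # Step 2: turn every NA into "0" (text, because the column is still character)
  mutate(number_of_students = replace_na(number_of_students, "0"))
```

This is only one way to do it; your own pipeline will start from whatever data frame you built in the earlier lessons.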

Learn More

The best place to learn about replace_na() is the tidyr website, which has an excellent documentation page about the function. na_if() comes from the dplyr package, so check out its documentation there.

I referenced the read_csv() function having an na argument early on in the video. The read_excel() function has an na argument as well. The na arguments in these two functions can help you deal with missing data on import. However, replace_na() and na_if() are good to know because you don’t always have this option!
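
As a rough illustration of handling missing values on import, here is a minimal sketch using the na argument of read_csv(). The CSV text is invented for the example, and wrapping it in I() to pass literal data assumes readr 2.0 or later. With a real file you would pass the file path instead.

```r
library(readr)

# Hypothetical CSV content; I() tells read_csv() to treat the string as literal data
csv_text <- "district_id,number_of_students\n1,12\n1,-\n2,40\n"

enrollment <- read_csv(
  I(csv_text),
  na = c("", "NA", "-"),   # treat "-" as missing, alongside the usual defaults
  show_col_types = FALSE   # suppress the column specification message
)
```

Because "-" is treated as NA while the file is read, readr can guess that number_of_students is numeric, so no separate na_if() step is needed in this case.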

If you want to go deep on exploring missing data in your datasets in R, there is a package called naniar that will help. Allison Horst also has a really nice tutorial on using it here.

Have any questions? Put them below and we will help you out!

Erin Guthrie

April 20, 2021

Anyone have a solution for the following error message? I know I am likely doing something very stupid... Error in length_x %in% c(1L, n) : object 'number_of_students' not found

Abby Isaacson

April 21, 2021

Why is the last mutate line in your code added in the solution code (not the video)? What would that do? Does this have to do with making sure the variable is numeric?

Oindrila Bhattacharyya

April 22, 2021

I received the same error message as Erin. However, when I selected all of the code and ran it from beginning to end, it worked fine!

Louise Blight

May 7, 2021

Looks like the last line of code here belongs in the next lesson instead: mutate(number_of_students = as.numeric(number_of_students))

Is there a data maintenance reason not to replace all "-" with 0s directly in one step instead of two? Or is it code legibility?

Carolyn Ford

April 22, 2022

For line 9, when I put in the number 0 I get this error message: Error in mutate(): ! Problem while computing number_of_students = replace_na(number_of_students, 0). Caused by error in vec_assign(): ! Can't convert `replace` <double> to match type of `data` <character>. Run rlang::last_error() to see where the error occurred.

But when I offset the 0 as a character, everything works: enrolment_by_race_ethnicity % select(-contains("grade")) %>% select(-contains("kindergarten")) %>% select(-contains("percent")) %>% pivot_longer(cols = -district_id, names_to = "race_ethnicity", values_to = "number_of_students") %>% mutate(number_of_students = na_if(number_of_students,"-")) %>% mutate(number_of_students = replace_na(number_of_students, "0"))
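
The behavior described above can be reproduced on a small vector. This is a sketch under assumptions: the values are made up, and the error only appears in recent tidyr versions, where replace_na() requires the replacement to match the type of the data. It also shows two ways to end up with a numeric column, which is what the extra as.numeric() line mentioned in earlier comments is for.

```r
library(tidyr)

# Hypothetical character column as it looks after na_if() has run
students <- c("12", NA, "40")

# A numeric replacement does not match the character data, so recent
# tidyr versions raise an error here; tryCatch() captures the message
mismatch <- tryCatch(
  replace_na(students, 0),
  error = function(e) conditionMessage(e)
)

# Option A: replace with text first, then convert to numeric
option_a <- as.numeric(replace_na(students, "0"))

# Option B: convert to numeric first, then replace with a number
option_b <- replace_na(as.numeric(students), 0)
```

Both options give the same numeric result, so which one you pick is mostly a matter of where in the pipeline you want the type conversion to happen.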

Josh Gutwill

November 4, 2022

Why replace the dashes with NAs and then replace the NAs with zeros? Why not skip a step and replace the dashes with zeros? Is there no way to find/replace in the values of a variable?
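
For what it's worth, a direct one-step replacement is possible. Here is a minimal sketch using if_else() from dplyr on made-up data; it is not the instructor's answer, just one way it could be written.

```r
library(dplyr)

# Hypothetical data with "-" marking missing counts
enrollment <- tibble(number_of_students = c("12", "-", "40"))

enrollment_one_step <- enrollment %>%
  # Replace "-" with "0" directly, leaving every other value untouched
  mutate(
    number_of_students = if_else(number_of_students == "-", "0", number_of_students)
  )
```

One possible reason to keep the two steps separate is that an NA records "we don't know" while a 0 records "none", so going through NA first makes the decision to treat missing values as zero explicit.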