Dealing with Missing Data

Heads up! The code that I demonstrate in the video worked when I wrote it but no longer does. The reason is a bit complicated, so I recommend watching both videos below to understand what happened (info on the change to tidyr that I mention in the video is here).

Your Turn

  1. Convert all of the missing values in the number_of_students variable to NA using na_if()

  2. Convert all of the NA values you just made to 0 using replace_na().
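
As a rough sketch of how the two steps fit together, here is the pattern on a small made-up tibble. The data values and the `enrollment` name are hypothetical stand-ins for the lesson's data; only the "-" placeholder and the `number_of_students` column name come from the lesson. The column is assumed to be character, which is why the replacement is the string "0" rather than the number 0.

```r
library(dplyr)
library(tidyr)

# Toy data standing in for the lesson's enrollment data (hypothetical values)
enrollment <- tibble(
  district_id = c(1, 2, 3),
  number_of_students = c("25", "-", "40")
)

enrollment_clean <- enrollment %>%
  # Step 1: turn the "-" placeholder into a real NA
  mutate(number_of_students = na_if(number_of_students, "-")) %>%
  # Step 2: turn each NA into "0" (a string, because the column is still character)
  mutate(number_of_students = replace_na(number_of_students, "0"))

enrollment_clean
```

Keeping the two steps separate makes it easy to inspect the intermediate NA values before deciding how to fill them.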

Learn More

The best place to learn about replace_na() is the tidyr website, which has an excellent documentation page about the function. na_if() comes from the dplyr package, so check out its documentation there.

Early in the video I mentioned that the read_csv() function has an na argument. The read_excel() function has an na argument as well. The na arguments in these two functions can help you deal with missing data on import. However, replace_na() and na_if() are good to know because you don’t always have this option!
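
Here is a minimal sketch of handling the placeholder at import time with the na argument of read_csv(). The inline CSV text is hypothetical and stands in for a real file, and wrapping it in I() to pass literal data assumes readr 2.0 or later.

```r
library(readr)

# Inline CSV text standing in for a real file (hypothetical data)
csv_text <- I("district_id,number_of_students\n1,25\n2,-\n3,40\n")

# Treat empty strings, "NA", and "-" as missing while reading
enrollment <- read_csv(
  csv_text,
  na = c("", "NA", "-"),
  show_col_types = FALSE
)

enrollment
```

Because "-" is already treated as missing during import, readr can guess that number_of_students is numeric, so no later type conversion is needed.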

If you want to go deep on exploring missing data in your datasets in R, there is a package called naniar that will help. Allison Horst also has a really nice tutorial on using it here.

Have any questions? Put them below and we will help you out!

Erin Guthrie

April 20, 2021

Anyone have a solution for the following error message? I know I am likely doing something very stupid... Error in length_x %in% c(1L, n) : object 'number_of_students' not found

David Keyes

April 20, 2021

Can you identify where in your pipeline you're getting this error? My go-to way to debug is to run it line by line until I figure out where the issue is.

Erin Guthrie

April 20, 2021

Interestingly, when I do as you say and run it line by line, I do not get the error?! Is that a thing? It was coming after the na_if line:
mutate(number_of_students = na_if(number_of_students, "-"))
Error in length_x %in% c(1L, n) : object 'number_of_students' not found

Abby Isaacson

April 21, 2021

Why is the last mutate line in your code added in the solution code (not the video)? What would that do? Does this have to do with making sure the variable is numeric?

Abby Isaacson

April 21, 2021

Ah never mind this is addressed in the next lesson!

Oindrila Bhattacharyya

April 22, 2021

I received the same error message as Erin. However, when I selected all of the code and ran it together from beginning to end, it worked fine!

Louise Blight

May 6, 2021

Looks like the last line of code here belongs in the next lesson instead: mutate(number_of_students = as.numeric(number_of_students))

David Keyes

May 7, 2021

Fixed, thanks!

Is there a data maintenance reason not to replace all "-" with 0s directly in 1 step instead of 2? Or is it code legibility?

Good question! I did it this way to demonstrate both the na_if() and replace_na() functions, but you could totally do it in 1 step.
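
One possible one-step version, sketched on made-up data, uses if_else() from dplyr to swap the "-" placeholder for "0" directly. This is just one of several ways to do it, not necessarily the approach used in the course; the tibble and its values are hypothetical.

```r
library(dplyr)

# Toy character column with "-" marking missing counts (hypothetical values)
enrollment <- tibble(number_of_students = c("25", "-", "40"))

one_step <- enrollment %>%
  # Replace "-" with "0" in a single mutate(); other values pass through unchanged
  mutate(
    number_of_students = if_else(number_of_students == "-", "0", number_of_students)
  )

one_step
```

Note that values that are already NA stay NA with this approach, because the comparison with "-" returns NA for them.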

Carolyn Ford

April 22, 2022

For line 9, when I put in the number 0 I get this error message:
Error in mutate(): ! Problem while computing number_of_students = replace_na(number_of_students, 0). Caused by error in vec_assign(): ! Can't convert replace <double> to match type of data <character>. Run rlang::last_error() to see where the error occurred.

But when I put the 0 in quotes as a character, everything works:
enrolment_by_race_ethnicity <- … %>%
  select(-contains("grade")) %>%
  select(-contains("kindergarten")) %>%
  select(-contains("percent")) %>%
  pivot_longer(cols = -district_id, names_to = "race_ethnicity", values_to = "number_of_students") %>%
  mutate(number_of_students = na_if(number_of_students, "-")) %>%
  mutate(number_of_students = replace_na(number_of_students, "0"))

JULIO VERA DE LEON

April 22, 2022

I'm getting the same problem.

Error in mutate(): ! Problem while computing number_of_students = replace_na(number_of_students, 0). Caused by error in vec_assign(): ! Can't convert replace <double> to match type of data <character>.

David Keyes

April 22, 2022

Sorry about that! Just added this note to this lesson:

Heads up! The code that I demonstrate in the video apparently worked when I wrote it, but no longer does. I'm not sure why. 🤷🏽 In any case, you'll need to put the 0 in the last line in quotes to get your code to work. The gist below has been updated.
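
The error above comes down to a type mismatch: the column is character, while a bare 0 is numeric, and current versions of tidyr require the replacement to match the type of the data. The sketch below shows two ways around it on a hypothetical character vector: quote the 0, or convert to numeric first and then use a numeric 0. The vector and variable names are made up for illustration.

```r
library(tidyr)

# Character vector as it might look after na_if() (hypothetical values)
students <- c("25", NA, "40")

# replace_na(students, 0) is the failing call: 0 is a double, students is character

# Option 1: keep the data as character and replace with the string "0"
as_text <- replace_na(students, "0")

# Option 2: convert to numeric first, after which a numeric 0 is a valid replacement
as_number <- replace_na(as.numeric(students), 0)

as_text
as_number
```

Option 2 is handy when the counts are needed as numbers for later calculations.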

JULIO VERA DE LEON

April 22, 2022

Thanks David.

Josh Gutwill

November 3, 2022

Why replace the dashes with NAs and then replace the NAs with zeros? Why not skip a step and replace the dashes with zeros? Is there no way to find/replace in the values of a variable?

Josh Gutwill

November 3, 2022

Now that I've watched Advanced Variable Creation, I see that there are lots of ways to do this! So no need to answer my question above.

David Keyes

November 3, 2022

No worries! You're right that what I did is sort of duplicative, but I did it for teaching purposes. And yes, you're right: lots of ways to do things in R!