Dealing with Missing Data
This lesson is called Dealing with Missing Data, part of the Going Deeper with R course.
Heads up! The code that I demonstrate in the video worked when I wrote it, but it no longer does. The reason is a bit complicated, so I recommend watching both videos below to understand what happened (info on the change to tidyr that I mention in the video is here).
FYI, I’ve updated the solutions code to reflect what I demonstrated in the update video above. However, I’m not going to update all of the solution code in subsequent lessons. If you get an error about data types, follow the approach laid out in the update video (that is, convert your number_of_students variable to numeric before attempting to use the replace_na() function). If you have any questions, please ask them below the lesson!
enrollment_by_race_ethnicity_18_19 <- enrollment_18_19 %>%
  # Keep only the race/ethnicity count columns
  select(-contains("grade")) %>%
  select(-contains("kindergarten")) %>%
  select(-contains("percent")) %>%
  # Reshape to one row per district and race/ethnicity group
  pivot_longer(cols = -district_id,
               names_to = "race_ethnicity",
               values_to = "number_of_students") %>%
  # Turn the "-" placeholder into NA
  mutate(number_of_students = na_if(number_of_students, "-")) %>%
  # Convert to numeric before replacing NA values
  mutate(number_of_students = as.numeric(number_of_students)) %>%
  # Replace NA with 0
  mutate(number_of_students = replace_na(number_of_students, 0))
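To see why the conversion to numeric comes before replace_na(), here is a minimal sketch on a toy vector. The values are hypothetical and only stand in for the number_of_students column, where "-" marks a missing count. The sketch assumes that the data type error mentioned above comes from asking replace_na() to put the number 0 into a character column.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy values: "-" marks a missing count, as in the lesson's data
number_of_students <- c("12", "-", "7")

cleaned <- number_of_students %>%
  na_if("-") %>%     # "-" becomes NA; the vector is still character
  as.numeric() %>%   # convert to numeric so that 0 is a matching replacement
  replace_na(0)      # NA becomes 0

cleaned
```

Keeping the three steps in this order means the replacement value has the same type as the data it goes into.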
Convert all of the missing values in the number_of_students variable to NA using na_if().
Convert all of the NA values you just made to 0 using replace_na().
Early on in the video, I mentioned that the read_csv() function has an na argument. The read_excel() function has an na argument as well. The na arguments in these two functions can help you deal with missing data on import. However, na_if() and replace_na() are good to know because you don’t always have this option!
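As a rough illustration of handling missing data on import, the sketch below writes a small CSV to a temporary file and reads it back with the na argument of read_csv(). The file contents and the enrollment_toy name are hypothetical and are not the course data. The na vector keeps readr's usual empty-string and "NA" markers and adds "-".

```r
library(readr)

# Hypothetical toy file standing in for the enrollment data
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("district_id,number_of_students",
    "1,12",
    "2,-",
    "3,7"),
  csv_path
)

# Treat "-" (plus the usual "" and "NA") as missing while importing
enrollment_toy <- read_csv(
  csv_path,
  na = c("", "NA", "-"),
  show_col_types = FALSE
)

enrollment_toy
```

Because "-" is treated as missing during import, number_of_students is read as a numeric column with an NA in the second row, so no na_if() step is needed afterwards.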