R for the Rest of Us

Quick Interlude to Reorganize our Code

Your Turn

Reorganize your code so that you only create the enrollment_by_race_ethnicity data frame in one place.

Learn More

I haven’t found many resources that give recommendations for organizing code. I think it’s a) idiosyncratic to individuals, and b) the kind of thing that people who have used R for a while do without even thinking about it. The one resource I’ve found is called R Best Practices by Krista DeStasio.

My general practice is this:

Load packages at the top of my files. This ensures that all of their functions are available throughout the file.

Only create objects once. This avoids the issue we encountered where you don’t know what state your object is in.

Create as few objects as possible. I’ve found that doing all of my data cleaning and tidying before beginning analysis lets me create just a few objects, which I can then easily manipulate with a few lines of code to show a wide range of results.
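A minimal sketch of this layout, using made-up data: raw_17_18, raw_18_19, and the helper tidy_year() are hypothetical stand-ins for the course's real files and cleaning function. Packages are loaded once at the top, the combined enrollment_by_race_ethnicity data frame is created in exactly one place, and later results are derived from it rather than by modifying it.

```r
# Packages are loaded once, at the top of the file
library(dplyr)
library(tidyr)

# Made-up raw data standing in for the two years of enrollment files
raw_17_18 <- tibble(
  district_id = c(1, 2),
  asian = c(10, 5),
  white = c(30, 15)
)
raw_18_19 <- tibble(
  district_id = c(1, 2),
  asian = c(12, 6),
  white = c(28, 14)
)

# Hypothetical helper that tidies one year of raw data
tidy_year <- function(raw_data, data_year) {
  raw_data %>%
    pivot_longer(cols = -district_id,
                 names_to = "race_ethnicity",
                 values_to = "number_of_students") %>%
    mutate(year = data_year)
}

# The combined data frame is created in exactly one place
enrollment_by_race_ethnicity <- bind_rows(
  tidy_year(raw_17_18, "2017-2018"),
  tidy_year(raw_18_19, "2018-2019")
)

# Later results are derived from that object without modifying it
totals_by_year <- enrollment_by_race_ethnicity %>%
  group_by(year, race_ethnicity) %>%
  summarize(total = sum(number_of_students), .groups = "drop")

totals_by_year
```

Because nothing reassigns enrollment_by_race_ethnicity after it is created, there is never any doubt about which state it is in.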

Have any questions? Put them below and we will help you out!

You said that you don't "need" to include enrollment_by_race_ethnicity as the x in left_join(), but when I try to include it, the code does not run. As soon as I delete it, it runs. Is it that we can't include it here because it hasn't been created yet?
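A sketch of the pipe behavior behind this, not an official answer from the course: whatever sits on the left of %>% is passed to left_join() as x, so inside a pipe you only write y. The counts and districts tables below are made up for illustration.

```r
library(dplyr)

# Made-up tables for illustration
counts <- tibble(district_id = c(1L, 1L, 2L), number_of_students = c(5, 3, 9))
districts <- tibble(district_id = 1:2, district_name = c("A", "B"))

# With the pipe, counts is supplied as x, so only y is written
joined <- counts %>%
  left_join(districts, by = "district_id")

# The same call without the pipe, where x is written explicitly
joined_explicit <- left_join(counts, districts, by = "district_id")

identical(joined, joined_explicit)
```

Writing counts a second time inside the piped left_join() would pass it twice and push the remaining arguments out of place. Separately, if the object you name is the one being created on the left of <- by that same pipeline, it doesn't exist yet while the pipeline runs, so R reports that the object is not found.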

Jordan Helms

May 10, 2022

I ran into this issue during the Functions lesson. When I try to run the code, even when I copy the solution code, I get this error message:

"Error in mutate(): ! Problem while computing number_of_students = replace_na(number_of_students, 0). Caused by error in vec_assign(): ! Can't convert replace to match type of data ."

It happens with the 2018-2019 data. This section: "enrollment_by_race_ethnicity_18_19 <- clean_enrollment_data(raw_data = enrollment_18_19, data_year = "2018-2019", race_ethnicity_remove_text = "x2018_19_")"

I can run the code with no problems with the 2017-2018 data. When I don't use the function and do two separate code chunks, I don't get the error message. Unsure what's going on.

Charlie Hadley

May 10, 2022

Hello Jordan,

This is due to a change in how replace_na() works, and David will be adding an update to the course about this soon.

To fix this issue, you need to ensure that replace_na() inserts the same type of data as is already in the column. Because the column is a character column at the point this code runs, you'll need to wrap the 0 in quotation marks, e.g.:

replace_na(number_of_students, "0")

In case you’re interested, you can see the documentation for this change in the NEWS.md file for the tidyr package. But please note that the language used there is quite technical.

Cheers,

Charlie
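A minimal sketch of the type rule described above, assuming tidyr 1.2.0 or later (the release whose NEWS.md describes the stricter replace_na() behavior). students_chr is a made-up vector standing in for a character number_of_students column.

```r
library(tidyr)

# Made-up values standing in for a character number_of_students column
students_chr <- c("12", NA, "7")

# A character replacement matches the column's type, so this works
replace_na(students_chr, "0")

# A numeric 0 does not match a character column; in recent tidyr this is an
# error rather than a silent conversion. tryCatch() returns the error message
# instead of stopping the script.
tryCatch(
  replace_na(students_chr, 0),
  error = function(e) conditionMessage(e)
)
```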

Jordan Helms

May 11, 2022

Hi Charlie, I'm still getting the same error message even with the brackets. Error in mutate(): ! Problem while computing number_of_students = replace_na(number_of_students, "0"). Caused by error in vec_assign(): ! Can't convert replace to match type of data .

link to code on github: https://github.com/jhelms345/jdh_rin3_spring2022/blob/master/R%20in%203%20Week%207%20example.R

Jordan Helms

May 11, 2022

I meant quotation marks.

Charlie Hadley

May 11, 2022

Oh - boo! It's because the column is parsed as character in one of the datasets and as numeric in the other.

The solution that is most consistent with the goal of this lesson is to convert the column into a character immediately beforehand like so:

mutate(number_of_students = as.character(number_of_students),
           number_of_students = replace_na(number_of_students, "0"))

Cheers,

Charlie
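A sketch of why converting first lets one set of steps handle both files. year_a, year_b, and clean_counts() are hypothetical: year_a mimics a file read as numeric, and year_b one read as character because of "-" entries. This version also runs na_if() after as.character(), an ordering choice not taken from the lesson: recent dplyr versions (1.1.0 onwards) check types in na_if() too, so comparing a numeric column with "-" can fail there as well.

```r
library(dplyr)
library(tidyr)

# Made-up data: one year parsed as numeric, the other as character
year_a <- tibble(district_id = 1:3, number_of_students = c(10, NA, 4))
year_b <- tibble(district_id = 1:3, number_of_students = c("8", "-", NA))

# Hypothetical helper: the same steps work for both because the column is
# made character before any type-sensitive step runs
clean_counts <- function(df) {
  df %>%
    mutate(number_of_students = as.character(number_of_students),
           number_of_students = na_if(number_of_students, "-"),
           number_of_students = replace_na(number_of_students, "0"),
           number_of_students = as.numeric(number_of_students))
}

clean_counts(year_a)
clean_counts(year_b)
```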

Niger Sultana

May 17, 2022

Hi, I know David sent us the solution for debugging replace_na(), but I can't find the solution for the code below (sorry, I've probably lost the email). Could you please help me see how to fix this code?

clean_enrollment_data %>% 
+   select(-contains("grade")) %>% 
+   select(-contains("kindergarten")) %>% 
+   select(-contains("percent")) %>% 
+   pivot_longer(cols = -district_id,
+                names_to = "race_ethnicity",
+                values_to = "number_of_students") %>% 
+   mutate(number_of_students = na_if(number_of_students, "-")) %>% 
+   mutate(number_of_students = as.character(number_of_students),
+   number_of_students = replace_na(number_of_students, "0"))
+   mutate(number_of_students = as.numeric(number_of_students)) %>% 
+   mutate(race_ethnicity = str_remove(race_ethnicity, race_ethnicity_remove_text)) %>% 
+   mutate(race_ethnicity = case_when(
+     race_ethnicity == "american_indian_alaska_native" ~ "American Indian Alaska Native",
+     race_ethnicity == "asian" ~ "Asian",
+     race_ethnicity == "black_african_american" ~ "Black/African American",
+     race_ethnicity == "hispanic_latino" ~ "Hispanic/Latino",
+     race_ethnicity == "multiracial" ~ "Multi-Racial",
+     race_ethnicity == "native_hawaiian_pacific_islander" ~ "Pacific Islander",
+     race_ethnicity == "white" ~ "White"
+   )) %>% 
+   group_by(district_id) %>% 
+   mutate(pct = number_of_students / sum(number_of_students)) %>% 
+   ungroup() %>% 
+   mutate(year = data_year)
+ }
> View(clean_enrollment_data)
> enrollment_by_race_ethnicity_18_19 <- clean_enrollment_data(raw_data = enrollment_18_19,
+                                                           data_year = "2018-2019",
+                                                           race_ethnicity_remove_text = "x2018_19_")
    

Error in mutate(number_of_students = as.numeric(number_of_students)) : object 'number_of_students' not found

Charlie Hadley

May 18, 2022

Hello Niger,

When sharing code, please try not to copy it from the console: it makes it difficult to understand which part of the code you're having trouble with, and it adds all those + characters you can see.

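It looks like you've made a mistake when you start to create your function. You are piping "clean_enrollment_data" into select(), but that's the name of the function you're trying to create rather than the name of an object you've created; inside the function, the data to pipe is raw_data. There's also a missing %>% at the end of the line containing replace_na(), so the mutate() on the following line receives no data, which is what triggers the "object 'number_of_students' not found" error. This code will work: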

library(tidyverse)
library(readxl)
library(janitor)


enrollment_18_19 <- read_excel(path = "data-raw/enrollment-18-19.xlsx")

enrollment_17_18 <- read_excel(path = "data-raw/enrollment-17-18.xlsx")

clean_enrollment_data <- function(raw_data, data_year, race_ethnicity_remove_text) {
  raw_data %>% 
    # Drop columns whose names mention grade, kindergarten, or percent
    select(-contains("grade")) %>% 
    select(-contains("kindergarten")) %>% 
    select(-contains("percent")) %>% 
    # Reshape to one row per district and race/ethnicity
    pivot_longer(cols = -district_id,
                 names_to = "race_ethnicity",
                 values_to = "number_of_students") %>% 
    # Convert to character first so these steps work whether read_excel()
    # parsed the column as numeric or as character
    mutate(number_of_students = as.character(number_of_students)) %>% 
    mutate(number_of_students = na_if(number_of_students, "-")) %>% 
    mutate(number_of_students = replace_na(number_of_students, "0")) %>% 
    mutate(number_of_students = as.numeric(number_of_students)) %>% 
    # Remove the year prefix, then relabel categories for display
    mutate(race_ethnicity = str_remove(race_ethnicity, race_ethnicity_remove_text)) %>% 
    mutate(race_ethnicity = case_when(
      race_ethnicity == "american_indian_alaska_native" ~ "American Indian Alaska Native",
      race_ethnicity == "asian" ~ "Asian",
      race_ethnicity == "black_african_american" ~ "Black/African American",
      race_ethnicity == "hispanic_latino" ~ "Hispanic/Latino",
      race_ethnicity == "multiracial" ~ "Multi-Racial",
      race_ethnicity == "native_hawaiian_pacific_islander" ~ "Pacific Islander",
      race_ethnicity == "white" ~ "White"
    )) %>% 
    # Each group's share of its district's total enrollment
    group_by(district_id) %>% 
    mutate(pct = number_of_students / sum(number_of_students)) %>% 
    ungroup() %>% 
    # Record which school year these rows come from
    mutate(year = data_year)
}


clean_enrollment_data(
  raw_data = enrollment_18_19,
  data_year = "2018-2019",
  race_ethnicity_remove_text = "x2018_19_"
)
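The corrected function has a %>% at the end of every step. A minimal sketch of why a missing pipe produces the "object 'number_of_students' not found" error from the console output above: without %>%, the next mutate() starts a new expression and is called with no data frame. student_counts, broken_steps(), and fixed_steps() are made-up names for illustration.

```r
library(dplyr)

# Made-up data for illustration
student_counts <- tibble(number_of_students = c("1", "2"))

# Missing %>% after the first mutate(): the second mutate() becomes a separate
# expression with no data frame, so number_of_students cannot be found
broken_steps <- function(df) {
  df %>%
    mutate(number_of_students = as.character(number_of_students))
    mutate(number_of_students = as.numeric(number_of_students))
}

# With the pipe restored, the second mutate() receives the data frame
fixed_steps <- function(df) {
  df %>%
    mutate(number_of_students = as.character(number_of_students)) %>%
    mutate(number_of_students = as.numeric(number_of_students))
}

# tryCatch() shows the error message instead of stopping the script
tryCatch(broken_steps(student_counts), error = function(e) conditionMessage(e))
fixed_steps(student_counts)
```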