Functions
This lesson is called Functions, part of the R in 3 Months (Fall 2022) course.
Heads up! There were some changes to R after I made this lesson. If you're having trouble getting it to work, check out the solutions section for a video that explains what might be going on for you.
Your Turn
Create a function to clean each year of enrollment data, then use bind_rows() to bind them together.
Arguments you'll need to use:
Data year
Text to remove in the str_remove() line
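As a rough illustration of the pattern (not the exercise solution), here is a minimal sketch that uses made-up toy data. The names toy_17_18, toy_18_19, clean_toy_data, text_to_remove, and the group column are all hypothetical and do not come from the lesson. Only bind_rows() and str_remove() are taken from the instructions above.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for two years of enrollment data
toy_17_18 <- tibble(
  district_id = c(1, 2),
  group = c("x2017_18_asian", "x2017_18_white"),
  number_of_students = c(10, 20)
)

toy_18_19 <- tibble(
  district_id = c(1, 2),
  group = c("x2018_19_asian", "x2018_19_white"),
  number_of_students = c(12, 18)
)

# Hypothetical helper: everything that differs between years
# (the data, the prefix to strip, the year label) becomes an argument
clean_toy_data <- function(data, text_to_remove, data_year) {
  data %>%
    mutate(group = str_remove(group, text_to_remove)) %>%
    mutate(year = data_year)
}

# Clean each year with the same function, then stack the results
toy_all_years <- bind_rows(
  clean_toy_data(toy_17_18, text_to_remove = "x2017_18_", data_year = "2017-2018"),
  clean_toy_data(toy_18_19, text_to_remove = "x2018_19_", data_year = "2018-2019")
)
```

The text passed to str_remove() has to match the column-name prefix exactly, including underscores, or the values will not be cleaned as expected.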
Learn More
The best place to start learning more about creating your own functions is Chapter 19 of R for Data Science. The materials for the Stat 545 course also have a nice section on writing functions, as does this lesson from Kelly Bodwin.
Have any questions? Put them below and we will help you out!
Abby Isaacson • April 25, 2021
Darn, my race and ethnicity column/variable now lists only NAs, whereas the non-function code worked fine:
clean_enrollment_data <- function(data_raw, data_year, r_e_text_remove) {
  data_raw %>%
    select(-contains("percent")) %>%
    select(-contains("grade")) %>%
    select(-contains("kindergarten")) %>%
    pivot_longer(cols = -district_id,
                 names_to = "race_ethnicity",
                 values_to = "number_of_students") %>%
    mutate(number_of_students = na_if(number_of_students, "-")) %>%
    mutate(number_of_students = replace_na(number_of_students, 0)) %>%
    mutate(number_of_students = as.numeric(number_of_students)) %>%
    mutate(race_ethnicity = str_remove(race_ethnicity, r_e_text_remove)) %>%
    mutate(race_ethnicity = case_when(
      race_ethnicity == "_american_indian_alaska_native" ~ "AI/AN",
      race_ethnicity == "_asian" ~ "Asian",
      race_ethnicity == "_native_hawaiian_pacific_islander" ~ "Native Hawaiian/Pacific Islander",
      race_ethnicity == "_black_african_american" ~ "Black/African American",
      race_ethnicity == "_hispanic_latino" ~ "Hispanic/Latino",
      race_ethnicity == "_white" ~ "White",
      race_ethnicity == "_multiracial" ~ "Multi-racial")) %>%
    group_by(district_id) %>%
    mutate(pct = number_of_students / sum(number_of_students)) %>%
    ungroup() %>%
    mutate(data_year)
}
enrollment_by_race_ethnicity_17_18 <- clean_enrollment_data(data_raw = enrollment_17_18, data_year = "2017-18", r_e_text_remove = "x2017-18")
enrollment_by_race_ethnicity_18_19 <- clean_enrollment_data(data_raw = enrollment_18_19, data_year = "2018-19", r_e_text_remove = "x2018-19")
Abby Isaacson • April 29, 2021
Yes, the solution was literally an underscore where I had a dash! Check your syntax:)
Megan Ruxton • April 30, 2021
Having the same issue with NAs, and my syntax seems to be right. I'll post in a couple of comment chunks since there seems to be a limit.
clean_enrollment_data%
  select(-contains("grade")) %>%
  select(-contains("kindergarten")) %>%
  select(-contains("percent")) %>%
  pivot_longer(cols = -district_id,
               names_to = "race_ethnicity",
               values_to = "number_of_students") %>%
  mutate(number_of_students = na_if(number_of_students, "-")) %>%
  mutate(number_of_students = replace_na(number_of_students, 0)) %>%
  mutate(number_of_students = as.numeric((number_of_students))) %>%
  mutate(race_ethnicity = str_remove(race_ethnicity, remove_text)) %>%
  mutate(race_ethnicity = case_when(
    race_ethnicity == "american_indian_alaska_native" ~ "American Indian/Alaska Native",
    race_ethnicity == "asian" ~ "Asian",
    race_ethnicity == "native_hawaiian_pacific_islander" ~ "Native Hawaiian/Pacific Islander",
    race_ethnicity == "black_african_american" ~ "Black/African American",
    race_ethnicity == "hispanic_latino" ~ "Hispanic/Latino",
    race_ethnicity == "white" ~ "White",
    race_ethnicity == "multiracial" ~ "Multiracial")) %>%
Matt M • December 2, 2021
Extremely basic question that I'm sure you've explained before but I've forgotten. What keystrokes are you using when you are running code in these videos? Example: in the Solutions video, you say "let me run that" and then run only a few lines of code. How are you doing that (you aren't selecting lines and then clicking Run)?
Related: when David is doing this, he is seeing output (e.g., a tibble) in the Console. What setting needs to be changed for this?
Sara Kidd • February 15, 2022
Let's say you want to run the function for 20 years' worth of data, so you have 20 input tables and have to run the function 20 times. Can you create a function of functions? How would you pass the arguments? I know that I want to replace the year argument with every year between 2000 and 2019. How does that work? I'm sure it can be done somehow.
JULIO VERA DE LEON • April 30, 2022
Unfortunately I'm getting the same error with the replace_na() function:
Error in mutate():
! Problem while computing number_of_students = replace_na(number_of_students, 0).
Caused by error in vec_assign():
! Can't convert replace to match type of data.
I'm guessing it has to be related to the readxl package and how it interprets the values of some variables.
The value for 2017-2018 has to be 0, and for the 2018-2019 dataset has to be "0".
Niger Sultana • May 6, 2022
Hi, I'm having trouble practicing with the code posted at this link (https://rfortherestofus.com/2018/09/making-small-multiples-in-r/). I do not know how to download the data and code from GitHub. This might be a very silly question, but a video showing how to download data and code posted on GitHub would help me understand how the code works.
Cheers, Niger
Delia Ayled Serna Guerrero • May 14, 2022
Hello! When I try to convert it to a function, it tells me that there is an error in mutate(): it can't convert 'replace' to match type of 'data'.
But this worked without problems when not in function form.
Julia Nee • November 4, 2022
I have two questions (which I'm happy to bring up in an OH or live session, but am writing here so I don't forget):
(1) is it right that when we make a function, we'd never have a character string as an argument? For instance, example_function <- function(raw_data, name_of_column), but not example_function <- function("raw_data", "name_of_column"). If we want to have arguments that run as strings in the code, we'd use "" when we actually introduced those arguments while using the function, right? As in, example_function(raw_data_filename, "This is a column name and it's a string.")?
(2) I tried to add a fourth argument to my function that assigned the function's output to a new named dataframe, but it didn't work. Can you assign something within the function, or is that not possible? Here's what I had tried:
import_enrollment_by_year <- function(data_to_clean, xyear_yr_, year_year, dataframe_name) {
  dataframe_name <- data_to_clean %>%
    select(-contains(c("percent", "grade", "kindergarten"))) %>%
    pivot_longer(cols = -district_id,
                 names_to = "race_ethnicity",
                 values_to = "number_of_students") %>%
    mutate(number_of_students = na_if(number_of_students, "-")) %>%
    mutate(number_of_students = as.numeric(number_of_students)) %>%
    mutate(number_of_students = replace_na(number_of_students, 0)) %>%
    mutate(race_ethnicity = str_remove(race_ethnicity, xyear_yr_)) %>%
    mutate(race_ethnicity = case_when(
      race_ethnicity == "american_indian_alaska_native" ~ "American Indian/Alaskan Native",
      race_ethnicity == "asian" ~ "Asian",
      race_ethnicity == "white" ~ "White",
      race_ethnicity == "hispanic_latino" ~ "Hispanic/Latino",
      race_ethnicity == "multiracial" ~ "Multiracial",
      race_ethnicity == "black_african_american" ~ "Black/African American",
      race_ethnicity == "native_hawaiian_pacific_islander" ~ "Native Hawaiian or Pacific Islander",
      TRUE ~ race_ethnicity)) %>%
    group_by(district_id) %>%
    mutate(pct = number_of_students / sum(number_of_students)) %>%
    ungroup() %>%
    mutate(year = year_year) %>%
    arrange(district_id, race_ethnicity)
}
Josh Gutwill • November 8, 2022
I keep getting this error: Error in FUN(left) : invalid argument to unary operator
Here's my code:
clean_enrollment_data <- function(raw_data, string_to_remove, data_year) {
  raw_data %>%
    select(!contains("grade")) %>%
    select(!contains("percent")) %>%
    select(!contains("kindergarten")) %>%
    pivot_longer(cols = -district_id,
                 names_to = "race_ethnicity",
                 values_to = "number_of_students") %>%
    mutate(number_of_students = na_if(number_of_students, "-")) %>%
    mutate(number_of_students = as.numeric(number_of_students)) %>%
    mutate(number_of_students = replace_na(number_of_students, 0)) %>%
    mutate(race_ethnicity = str_remove(race_ethnicity, string_to_remove)) %>%
    mutate(race_ethnicity = case_when(
      race_ethnicity == "american_indian_alaska_native" ~ "Native American or Alaskan",
      race_ethnicity == "asian" ~ "Asian",
      race_ethnicity == "native_hawaiian_pacific_islander" ~ "Hawaiian or Pacific Islander",
      race_ethnicity == "black_african_american" ~ "Black or African American",
      race_ethnicity == "hispanic_latino" ~ "Hispanic/Latinx",
      race_ethnicity == "white" ~ "White",
      TRUE ~ "Multiracial")) %>%
    group_by(district_id) %>%
    mutate(pct = number_of_students / sum(number_of_students)) %>%
    ungroup() %>%
    mutate(year = data_year)
}
enrollment_by_race_ethnicity_18_19 < - clean_enrollment_data(raw_data = enrollment_18_19, string_to_remove = "x2018_19_", data_year = "2018-2019")
When I copy and paste David's code, I also get an error, though it's a different one:
Error in mutate():
! Problem while computing number_of_students = replace_na(number_of_students, 0).
Caused by error in vec_assign():
! Can't convert replace to match type of data.
Run rlang::last_error() to see where the error occurred.
Kirstin O'Dell • November 15, 2022
I thought in a prior video we were told to tidy our data in script and only use markdown for the output/reporting we want to do. I'm seeing that for this we're using markdown to tidy the data inside of the function. I'm confused as to which to be using. Can functions created in script be used in markdown files?
Andrew Paquin • April 30, 2023
Hi David, I watched the "Heads Up" video above. If I understand it correctly, the function we created for the 18-19 data can't be used for the 17-18 data because of 1) the changes to the na_if command, and 2) the fact that the columns are in different formats (dbl and chr) in the two datasets. Is that what's happening?
Kiana Robinson • May 17, 2023
What does this line of code do?
race_ethnicity_remove_text = "x2018_19_"
Why was it included in the function statement? This is confusing.
Zain Asaf • May 22, 2023
Hi Charlie and Dan, I am having the same issues as a couple of people with the replace_na line:
I get the following message:
Error in mutate():
ℹ In argument: number_of_students = replace_na(number_of_students, "0").
Caused by error in vec_assign():
! Can't convert replace to match type of data.
However, I have used the following code for that line: mutate(number_of_students = replace_na(number_of_students, "0")) i.e. I have put the "0" in quotation marks, as suggested. Here is the code I have used. Note, I just labeled the column "ethnicity" not "race_ethnicity".
clean_enrollment_data%
  select(-contains("grade")) %>%
  select(-contains("kindergarten")) %>%
  select(-contains("percent")) %>%
  pivot_longer(cols = -district_id,
               names_to = "ethnicity",
               values_to = "number_of_students") %>%
  mutate(number_of_students = na_if(number_of_students, "-")) %>%
  mutate(number_of_students = as.numeric(number_of_students)) %>%
  mutate(number_of_students = replace_na(number_of_students, "0")) %>%
  mutate(ethnicity = str_remove(ethnicity, ethnicity_remove_text)) %>%
  mutate(ethnicity = case_when(
    ethnicity == "american_indian_alaska_native" ~ "native american",
    ethnicity == "native_hawaiian_pacific_islander" ~ "pacific islander",
    ethnicity == "hispanic_latino" ~ "latino",
    ethnicity == "black_african_american" ~ "african american",
    ethnicity == "white" ~ "white",
    ethnicity == "asian" ~ "Asian",
    ethnicity == "multiracial" ~ "Multiracial")) %>%
  group_by(district_id) %>%
  mutate(pct = number_of_students / sum(number_of_students) * 100) %>%
  ungroup() %>%
  mutate(year = "data_year")