Functions

Heads up! There were some changes to R after I made this lesson. If you're having trouble with getting it to work, check out the solutions section for a video that explains what might be going on for you.

Your Turn

Create a function to clean each year of enrollment data, then use `bind_rows()` to bind them together

Arguments you’ll need to use:

• Data year

• Text to remove in the `str_remove()` line
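As a rough sketch of the shape this exercise is asking for (not the course's solution): the toy tibbles, the argument names `data_raw`, `data_year`, and `remove_text`, and the column prefixes below are all made up for illustration, and the real enrollment files have more columns to drop first.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)

# Hypothetical toy data standing in for two years of raw enrollment files
enrollment_17_18 <- tibble(
  district_id = c(1, 2),
  x2017_18_asian = c("10", "-"),
  x2017_18_white = c("30", "20")
)
enrollment_18_19 <- tibble(
  district_id = c(1, 2),
  x2018_19_asian = c("12", "5"),
  x2018_19_white = c("-", "15")
)

# The two things that differ between years become arguments:
# the year label and the column-name prefix to strip
clean_enrollment_data <- function(data_raw, data_year, remove_text) {
  data_raw %>%
    pivot_longer(cols = -district_id,
                 names_to = "race_ethnicity",
                 values_to = "number_of_students") %>%
    mutate(number_of_students = na_if(number_of_students, "-")) %>%
    mutate(number_of_students = as.numeric(number_of_students)) %>%
    mutate(number_of_students = replace_na(number_of_students, 0)) %>%
    mutate(race_ethnicity = str_remove(race_ethnicity, remove_text)) %>%
    mutate(year = data_year)
}

# Clean each year with the same function, then stack the results
enrollment_by_race_ethnicity <- bind_rows(
  clean_enrollment_data(enrollment_17_18, "2017-18", "x2017_18_"),
  clean_enrollment_data(enrollment_18_19, "2018-19", "x2018_19_")
)
```

Because both calls return the same columns, `bind_rows()` can stack them without any extra work.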

Learn More

The best place to start learning more about creating your own functions is Chapter 19 of R for Data Science. The materials for the Stat 545 course also have a nice section on writing functions, as does this lesson from Kelly Bodwin.

Have any questions? Put them below and we will help you out!


Abby Isaacson

Darn, my race and ethnicity column now lists only NAs, whereas the non-function code worked fine:

clean_enrollment_data <- function(data_raw, data_year, r_e_text_remove) {
  data_raw %>%
    select(-contains("percent")) %>%
    select(-contains("grade")) %>%
    select(-contains("kindergarten")) %>%
    pivot_longer(cols = -district_id,
                 names_to = "race_ethnicity",
                 values_to = "number_of_students") %>%
    mutate(number_of_students = na_if(number_of_students, "-")) %>%
    mutate(number_of_students = replace_na(number_of_students, 0)) %>%
    mutate(number_of_students = as.numeric(number_of_students)) %>%
    mutate(race_ethnicity = str_remove(race_ethnicity, r_e_text_remove)) %>%
    mutate(race_ethnicity = case_when(
      race_ethnicity == "_american_indian_alaska_native" ~ "AI/AN",
      race_ethnicity == "_asian" ~ "Asian",
      race_ethnicity == "_native_hawaiian_pacific_islander" ~ "Native Hawaiian/Pacific Islander",
      race_ethnicity == "_black_african_american" ~ "Black/African American",
      race_ethnicity == "_hispanic_latino" ~ "Hispanic/Latino",
      race_ethnicity == "_white" ~ "White",
      race_ethnicity == "_multiracial" ~ "Multi-racial")) %>%
    group_by(district_id) %>%
    mutate(pct = number_of_students / sum(number_of_students)) %>%
    ungroup() %>%
    mutate(data_year)
}

enrollment_by_race_ethnicity_17_18 <- clean_enrollment_data(data_raw = enrollment_17_18, data_year = "2017-18", r_e_text_remove = "x2017-18")

enrollment_by_race_ethnicity_18_19 <- clean_enrollment_data(data_raw = enrollment_18_19, data_year = "2018-19", r_e_text_remove = "x2018-19")

Abby Isaacson

Yes, the solution was literally an underscore where I had a dash! Check your syntax:)

Megan Ruxton

Having the same issue with NAs, and my syntax seems to be right. I'll post in a couple of comment chunks since there seems to be a limit.

clean_enrollment_data% select(-contains("grade")) %>%
    select(-contains("kindergarten")) %>%
    select(-contains("percent")) %>%
    pivot_longer(cols = -district_id,
                 names_to = "race_ethnicity",
                 values_to = "number_of_students") %>%
    mutate(number_of_students = na_if(number_of_students, "-")) %>%
    mutate(number_of_students = replace_na(number_of_students, 0)) %>%
    mutate(number_of_students = as.numeric((number_of_students))) %>%
    mutate(race_ethnicity = str_remove(race_ethnicity, remove_text)) %>%
    mutate(race_ethnicity = case_when(
      race_ethnicity == "american_indian_alaska_native" ~ "American Indian/Alaska Native",
      race_ethnicity == "asian" ~ "Asian",
      race_ethnicity == "native_hawaiian_pacific_islander" ~ "Native Hawaiian/Pacific Islander",
      race_ethnicity == "black_african_american" ~ "Black/African American",
      race_ethnicity == "hispanic_latino" ~ "Hispanic/Latino",
      race_ethnicity == "white" ~ "White",
      race_ethnicity == "multiracial" ~ "Multiracial")) %>%

Matt M

Extremely basic question that I'm sure you've explained before but I've forgotten. What keystrokes are you using when you are running code in these videos? Example: in the Solutions video, you say "let me run that" and then run only a few lines of code. How are you doing that (you aren't selecting lines and then clicking Run)?

Related: when David is doing this, he is seeing output (e.g., a tibble) in the Console. What setting needs to be changed for this?

Sara Kidd

Let's say you want to run the function for 20 years' worth of data, so you have 20 input tables and have to run the function 20 times. Can you create a function of functions? How would you pass the arguments? I know that I want to replace the year argument with every year between 2000 and 2019. How does that work? I'm sure it can be done somehow.
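One possible approach to the question above, sketched with `purrr::map2()`: keep the raw tables in a list, keep the years in a vector of the same length, and let `map2()` call the function once per pair. Here `add_year()` is a hypothetical stand-in for the lesson's cleaning function, and the three small tables are made up.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-in for the cleaning function written in this lesson
add_year <- function(data_raw, data_year) {
  data_raw %>%
    mutate(year = data_year)
}

# Made-up raw tables, one per year, stored together in a list
raw_tables <- list(
  data.frame(district_id = 1:2, n = c(10, 20)),
  data.frame(district_id = 1:2, n = c(11, 21)),
  data.frame(district_id = 1:2, n = c(12, 22))
)

# One year label per table, in the same order as the list
years <- c("2000", "2001", "2002")

# map2() calls add_year(raw_tables[[i]], years[i]) for each i,
# and bind_rows() stacks the resulting list into one data frame
all_years <- map2(raw_tables, years, add_year) %>%
  bind_rows()
```

For 20 years, the vector of labels could be built with `as.character(2000:2019)` instead of being typed out.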

JULIO VERA DE LEON

Unfortunately I'm getting the same error with the replace_na() function:

Error in `mutate()`: ! Problem while computing `number_of_students = replace_na(number_of_students, 0)`. Caused by error in `vec_assign()`: ! Can't convert `replace` to match type of `data` .

I'm guessing it is related to the readxl package and how it interprets the values of some variables.

The replacement value has to be 0 for the 2017-2018 dataset and "0" for the 2018-2019 dataset.
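A small sketch of the type mismatch described above, assuming one year's counts were read in as text and the other's as numbers. The two vectors and the helper name `clean_counts` are invented for illustration. Converting to character before `na_if()` and to numeric before `replace_na()` means the replacement value always matches the type of the data, so the same code can handle both years.

```r
library(dplyr)
library(tidyr)

# Hypothetical columns: one year read in as text, the other as numbers
as_text <- c("10", "-", NA)
as_number <- c(10, NA, 3)

# A helper (the name is made up) that works for either type
clean_counts <- function(x) {
  x <- as.character(x)  # make both years the same type first
  x <- na_if(x, "-")    # "-" marks missing values in the text version
  x <- as.numeric(x)    # now safe to convert to numbers
  replace_na(x, 0)      # numeric 0 matches a numeric vector
}

clean_counts(as_text)    # 10 0 0
clean_counts(as_number)  # 10 0 3
```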

Niger Sultana

Hi, I am having trouble practicing with the code posted at this link (https://rfortherestofus.com/2018/09/making-small-multiples-in-r/). I do not know how to download the data and code from GitHub. It might be a very silly question, but if you could please make a video showing how to download data and code posted on GitHub, it would help me understand how the code works.

Cheers, Niger

Delia Ayled Serna Guerrero

Hello! When I try to convert it to a function, it tells me that there is an error in mutate(): it can't convert `replace` to match type of `data`.

But this worked without problems when not in function form.

Julia Nee

I have two questions (which I'm happy to bring up in an OH or live session, but am writing here so I don't forget):

(1) is it right that when we make a function, we'd never have a character string as an argument? For instance, example_function <- function(raw_data, name_of_column), but not example_function <- function("raw_data", "name_of_column"). If we want to have arguments that run as strings in the code, we'd use "" when we actually introduced those arguments while using the function, right? As in, example_function(raw_data_filename, "This is a column name and it's a string.")?

(2) I tried to add a fourth argument to my function that assigned the function's output to a new named dataframe, but it didn't work. Can you assign something within the function, or is that not possible? Here's what I had tried:

import_enrollment_by_year <- function(data_to_clean, xyear_yr_, year_year, dataframe_name) {
  dataframe_name <- data_to_clean %>%
    select(-contains(c("percent","grade","kindergarten"))) %>%
    pivot_longer(cols = -district_id,
                 names_to = "race_ethnicity",
                 values_to = "number_of_students") %>%
    mutate(number_of_students = na_if(number_of_students, "-")) %>%
    mutate(number_of_students = as.numeric(number_of_students)) %>%
    mutate(number_of_students = replace_na(number_of_students,0)) %>%
    mutate(race_ethnicity = str_remove(race_ethnicity, xyear_yr_)) %>%
    mutate(race_ethnicity = case_when(
      race_ethnicity == "american_indian_alaska_native" ~ "American Indian/Alaskan Native",
      race_ethnicity == "asian" ~ "Asian",
      race_ethnicity == "white" ~ "White",
      race_ethnicity == "hispanic_latino" ~"Hispanic/Latino",
      race_ethnicity == "multiracial" ~"Multiracial",
      race_ethnicity == "black_african_american" ~ "Black/African American",
      race_ethnicity == "native_hawaiian_pacific_islander" ~ "Native Hawaiian or Pacific Islander",
      TRUE ~ race_ethnicity )) %>%
    group_by(district_id) %>%
    mutate(pct = number_of_students / sum(number_of_students)) %>%
    ungroup() %>%
    mutate(year = year_year) %>%
    arrange(district_id, race_ethnicity)
}

Josh Gutwill

I keep getting this error: Error in FUN(left) : invalid argument to unary operator

Here's my code:

clean_enrollment_data <- function(raw_data, string_to_remove, data_year) {
  raw_data %>%
    select(!contains("grade")) %>%
    select(!contains("percent")) %>%
    select(!contains("kindergarten")) %>%
    pivot_longer(cols = -district_id,
                 names_to = "race_ethnicity",
                 values_to = "number_of_students") %>%
    mutate(number_of_students = na_if(number_of_students, "-")) %>%
    mutate(number_of_students = as.numeric(number_of_students)) %>%
    mutate(number_of_students = replace_na(number_of_students, 0)) %>%
    mutate(race_ethnicity = str_remove(race_ethnicity, string_to_remove)) %>%
    mutate(race_ethnicity = case_when(
      race_ethnicity == "american_indian_alaska_native" ~ "Native American or Alaskan",
      race_ethnicity == "asian" ~ "Asian",
      race_ethnicity == "native_hawaiian_pacific_islander" ~ "Hawaiian or Pacific Islander",
      race_ethnicity == "black_african_american" ~ "Black or African American",
      race_ethnicity == "hispanic_latino" ~ "Hispanic/Latinx",
      race_ethnicity == "white" ~ "White",
      TRUE ~ "Multiracial")) %>%
    group_by(district_id) %>%
    mutate(pct = number_of_students / sum(number_of_students)) %>%
    ungroup() %>%
    mutate(year = data_year)
}

enrollment_by_race_ethnicity_18_19 < - clean_enrollment_data(raw_data = enrollment_18_19, string_to_remove = "x2018_19_", data_year = "2018-2019")

When I copy and paste David's code, I also get an error, though it's a different one: Error in `mutate()`: ! Problem while computing `number_of_students = replace_na(number_of_students, 0)`. Caused by error in `vec_assign()`: ! Can't convert `replace` to match type of `data` . Run `rlang::last_error()` to see where the error occurred.

Kirstin O'Dell

I thought in a prior video we were told to tidy our data in script and only use markdown for the output/reporting we want to do. I'm seeing that for this we're using markdown to tidy the data inside of the function. I'm confused as to which to be using. Can functions created in script be used in markdown files?

Andrew Paquin

Hi David, I watched the "Heads Up" video above. If I understand it correctly, the function we created for the 18-19 data can't be used for the 17-18 data because of 1) the changes to the na_if command, and 2) the fact that the columns are in different formats (dbl and chr) in the two datasets. Is that what's happening?

Kiana Robinson

What does this line of code do?

race_ethnicity_remove_text = "x2018_19_"

Why was it included in the function statement? This is confusing.

Zain Asaf

Hi Charlie and Dan, I am having the same issues as a couple of people with the replace_na line:

I get the following message:

Error in `mutate()`: ℹ In argument: `number_of_students = replace_na(number_of_students, "0")`. Caused by error in `vec_assign()`: ! Can't convert `replace` to match type of `data` .

However, I have used the following code for that line:

    mutate(number_of_students = replace_na(number_of_students, "0"))

i.e. I have put the "0" in quotation marks, as suggested. Here is the code I have used. Note that I labeled the column "ethnicity", not "race_ethnicity".

clean_enrollment_data%
    select(-contains("grade")) %>%
    select(-contains("kindergarten")) %>%
    select(-contains("percent")) %>%
    pivot_longer(cols = -district_id,
                 names_to = "ethnicity",
                 values_to = "number_of_students") %>%
    mutate(number_of_students = na_if(number_of_students, "-" )) %>%
    mutate(number_of_students = as.numeric(number_of_students)) %>%
    mutate(number_of_students = replace_na(number_of_students, "0")) %>%
    mutate(ethnicity = str_remove(ethnicity, ethnicity_remove_text)) %>%
    mutate(ethnicity = case_when(
      ethnicity == "american_indian_alaska_native" ~ "native american",
      ethnicity == "native_hawaiian_pacific_islander" ~"pacific islander",
      ethnicity == "hispanic_latino" ~ "latino",
      ethnicity == "black_african_american" ~ "african american",
      ethnicity == "white" ~ "white",
      ethnicity == "asian" ~ "Asian",
      ethnicity == "multiracial" ~ "Multiracial" )) %>%
    group_by(district_id)%>%
    mutate(pct = number_of_students / sum(number_of_students)*100) %>%
    ungroup() %>%
    mutate(year = "data_year")