Heads up! There were some changes to R after I made this lesson. If you're having trouble with getting it to work, check out the solutions section for a video that explains what might be going on for you.

Your Turn

Create a function to clean each year of enrollment data, then use bind_rows() to bind them together

Arguments you’ll need to use:

  • Data year

  • Text to remove in the str_remove() line
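A minimal sketch of the shape such a function might take, using toy data rather than the course's enrollment files. The toy tables, their column names, and the names `clean_toy_data` and `text_to_remove` are assumptions made for illustration; the lesson's own solution may differ.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy stand-ins for two years of raw data (hypothetical values)
toy_17_18 <- tibble(district_id = c(1, 2),
                    x2017_18_asian = c(10, 20),
                    x2017_18_white = c(30, 40))
toy_18_19 <- tibble(district_id = c(1, 2),
                    x2018_19_asian = c(11, 21),
                    x2018_19_white = c(31, 41))

# Two things change per year: the year label and the prefix to strip
clean_toy_data <- function(data_raw, data_year, text_to_remove) {
  data_raw %>%
    pivot_longer(cols = -district_id,
                 names_to = "race_ethnicity",
                 values_to = "number_of_students") %>%
    mutate(race_ethnicity = str_remove(race_ethnicity, text_to_remove)) %>%
    mutate(year = data_year)
}

# Clean each year with the same function, then stack the results
all_years <- bind_rows(
  clean_toy_data(toy_17_18, data_year = "2017-18", text_to_remove = "x2017_18_"),
  clean_toy_data(toy_18_19, data_year = "2018-19", text_to_remove = "x2018_19_")
)
```

Because the prefix is stripped before binding, both years end up with the same `race_ethnicity` values, which is what lets the rows stack cleanly.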

Learn More

The best place to start learning more about creating your own functions is Chapter 19 of R for Data Science. The materials for the Stat 545 course also have a nice section on writing functions, as does this lesson from Kelly Bodwin.

Have any questions? Put them below and we will help you out!

Abby Isaacson

April 25, 2021

Darn, my race and ethnicity column/variable now lists only NAs, whereas the non-function code worked fine:

clean_enrollment_data <- function(data_raw, data_year, r_e_text_remove) {
  data_raw %>%
    select(-contains("percent")) %>%
    select(-contains("grade")) %>%
    select(-contains("kindergarten")) %>%
    pivot_longer(cols = -district_id,
                 names_to = "race_ethnicity",
                 values_to = "number_of_students") %>%
    mutate(number_of_students = na_if(number_of_students, "-")) %>%
    mutate(number_of_students = replace_na(number_of_students, 0)) %>%
    mutate(number_of_students = as.numeric(number_of_students)) %>%
    mutate(race_ethnicity = str_remove(race_ethnicity, r_e_text_remove)) %>%
    mutate(race_ethnicity = case_when(
      race_ethnicity == "_american_indian_alaska_native" ~ "AI/AN",
      race_ethnicity == "_asian" ~ "Asian",
      race_ethnicity == "_native_hawaiian_pacific_islander" ~ "Native Hawaiian/Pacific Islander",
      race_ethnicity == "_black_african_american" ~ "Black/African American",
      race_ethnicity == "_hispanic_latino" ~ "Hispanic/Latino",
      race_ethnicity == "_white" ~ "White",
      race_ethnicity == "_multiracial" ~ "Multi-racial")) %>%
    group_by(district_id) %>%
    mutate(pct = number_of_students / sum(number_of_students)) %>%
    ungroup() %>%
    mutate(data_year)
}

enrollment_by_race_ethnicity_17_18 <- clean_enrollment_data(data_raw = enrollment_17_18, data_year = "2017-18", r_e_text_remove = "x2017-18")

enrollment_by_race_ethnicity_18_19 <- clean_enrollment_data(data_raw = enrollment_18_19, data_year = "2018-19", r_e_text_remove = "x2018-19")

Abby Isaacson

April 29, 2021

Yes, the solution was literally an underscore where I had a dash! Check your syntax:)
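The failure mode described here can be reproduced in a few lines. When the pattern passed to str_remove() does not match, the string comes back unchanged, so none of the case_when() conditions are TRUE and every row becomes NA. The toy column value and the helper name `recode_toy` are assumptions for illustration.

```r
library(dplyr)
library(stringr)

# A single toy value shaped like the pivoted column names
toy <- tibble(race_ethnicity = "x2017_18_asian")

# Hypothetical helper: strip a prefix, then recode what is left
recode_toy <- function(data, text_to_remove) {
  data %>%
    mutate(race_ethnicity = str_remove(race_ethnicity, text_to_remove)) %>%
    mutate(race_ethnicity = case_when(race_ethnicity == "_asian" ~ "Asian"))
}

with_dash <- recode_toy(toy, "x2017-18")        # pattern never matches, result is NA
with_underscore <- recode_toy(toy, "x2017_18")  # pattern matches, result is "Asian"
```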

Megan Ruxton

April 30, 2021

Having the same issue with NAs, and my syntax seems to be right. I'll post in a couple of comment chunks since there seems to be a limit.

clean_enrollment_data% select(-contains("grade")) %>% select(-contains("kindergarten")) %>% select(-contains("percent")) %>% pivot_longer(cols = -district_id, names_to = "race_ethnicity", values_to = "number_of_students")%>% mutate(number_of_students = na_if(number_of_students, "-")) %>% mutate(number_of_students = replace_na(number_of_students, 0)) %>% mutate(number_of_students = as.numeric((number_of_students))) %>% mutate(race_ethnicity = str_remove(race_ethnicity, remove_text)) %>% mutate(race_ethnicity = case_when(race_ethnicity == "american_indian_alaska_native" ~ "American Indian/Alaska Native", race_ethnicity == "asian" ~ "Asian", race_ethnicity == "native_hawaiian_pacific_islander" ~ "Native Hawaiian/Pacific Islander", race_ethnicity == "black_african_american" ~ "Black/African American", race_ethnicity == "hispanic_latino" ~ "Hispanic/Latino", race_ethnicity == "white" ~ "White", race_ethnicity == "multiracial" ~ "Multiracial")) %>%

Extremely basic question that I'm sure you've explained before but I've forgotten. What keystrokes are you using when you are running code in these videos? Example: in the Solutions video, you say "let me run that" and then run only a few lines of code. How are you doing that (you aren't selecting lines and then clicking Run)?

Related: when David is doing this, he is seeing output (e.g., a tibble) in the Console. What setting needs to be changed for this?

Let's say you want to run the function for 20 years' worth of data, so you have 20 input tables and have to run the function 20 times. Can you create a function of functions? How would you pass the arguments? I know that I want to replace the year argument with every year between 2000 and 2019. How does that work? I'm sure it can be done somehow.
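One common way to do this, sketched here with the purrr package rather than a "function of functions": keep the raw tables in a list and let map2() call the cleaning function once per table and year label. Everything below (the stand-in function `clean_one_year`, the toy tables, and the label format) is hypothetical.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for a per-year cleaning function
clean_one_year <- function(data_raw, data_year) {
  data_raw %>% mutate(year = data_year)
}

# Build labels such as "2000-01" through "2019-20"
start_years <- 2000:2019
year_labels <- paste0(start_years, "-", substr(start_years + 1, 3, 4))

# Hypothetical inputs: one small raw table per year, kept in a list
raw_tables <- map(start_years, function(y) tibble(district_id = 1:2, students = y))

# map2() pairs each table with its label; bind_rows() stacks the 20 results
all_years <- map2(raw_tables, year_labels, clean_one_year) %>%
  bind_rows()
```

A third per-year argument, such as the text to remove, could be passed the same way with pmap() and a list of three vectors.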

JULIO VERA DE LEON

April 30, 2022

Unfortunately I'm getting the same error with the replace_na() function:

Error in mutate(): ! Problem while computing number_of_students = replace_na(number_of_students, 0). Caused by error in vec_assign(): ! Can't convert replace to match type of data .

I'm guessing it has to be related with the readxl package and how it interprets the values of some variables.

The value for 2017-2018 has to be 0, and for the 2018-2019 dataset has to be "0".
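One way to avoid choosing between 0 and "0" for each dataset, sketched with toy data: convert the column to numeric before calling replace_na(), so the replacement can always be the number 0. The as.character() step is an assumption added so the same code runs whether the column was read as text or as numbers; the toy column is hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy column: counts stored as text, with "-" marking missing values
toy <- tibble(number_of_students = c("12", "-", "7"))

cleaned <- toy %>%
  # Force one consistent type first, whatever the import step produced
  mutate(number_of_students = as.character(number_of_students)) %>%
  # Text compared with text: "-" becomes NA
  mutate(number_of_students = na_if(number_of_students, "-")) %>%
  # Now convert to numbers
  mutate(number_of_students = as.numeric(number_of_students)) %>%
  # Number replaced with number: NA becomes 0
  mutate(number_of_students = replace_na(number_of_students, 0))
```

The order matters: each step compares or replaces values of the same type as the column at that point.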

Niger Sultana

May 6, 2022

Hi, I'm having trouble practicing with the code posted at this link (https://rfortherestofus.com/2018/09/making-small-multiples-in-r/). I don't know how to download the data and code from GitHub. It might be a very silly question, but a video showing how to download data and code posted on GitHub would help me understand how the code works.

Cheers Niger

Delia Ayled Serna Guerrero

May 14, 2022

Hello! When I try to convert it to a function, I get an error in mutate() saying that it can't convert 'replace' to match type of "data".

But this worked without problems when not in function form.

I have two questions (which I'm happy to bring up in an OH or live session, but am writing here so I don't forget):

(1) is it right that when we make a function, we'd never have a character string as an argument? For instance, example_function <- function(raw_data, name_of_column), but not example_function <- function("raw_data", "name_of_column"). If we want to have arguments that run as strings in the code, we'd use "" when we actually introduced those arguments while using the function, right? As in, example_function(raw_data_filename, "This is a column name and it's a string.")?

(2) I tried to add a fourth argument to my function that assigned the function's output to a new named dataframe, but it didn't work. Can you assign something within the function, or is that not possible? Here's what I had tried:

import_enrollment_by_year <- function(data_to_clean, xyear_yr_, year_year, dataframe_name) {
  dataframe_name <- data_to_clean %>%
    select(-contains(c("percent","grade","kindergarten"))) %>%
    pivot_longer(cols = -district_id,
                 names_to = "race_ethnicity",
                 values_to = "number_of_students") %>%
    mutate(number_of_students = na_if(number_of_students, "-")) %>%
    mutate(number_of_students = as.numeric(number_of_students)) %>%
    mutate(number_of_students = replace_na(number_of_students,0)) %>%
    mutate(race_ethnicity = str_remove(race_ethnicity, xyear_yr_)) %>%
    mutate(race_ethnicity = case_when(
      race_ethnicity == "american_indian_alaska_native" ~ "American Indian/Alaskan Native",
      race_ethnicity == "asian" ~ "Asian",
      race_ethnicity == "white" ~ "White",
      race_ethnicity == "hispanic_latino" ~ "Hispanic/Latino",
      race_ethnicity == "multiracial" ~ "Multiracial",
      race_ethnicity == "black_african_american" ~ "Black/African American",
      race_ethnicity == "native_hawaiian_pacific_islander" ~ "Native Hawaiian or Pacific Islander",
      TRUE ~ race_ethnicity )) %>%
    group_by(district_id) %>%
    mutate(pct = number_of_students / sum(number_of_students)) %>%
    ungroup() %>%
    mutate(year = year_year) %>%
    arrange(district_id, race_ethnicity)
}
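On both questions, a small sketch may help. Parameter names in a function definition are bare names, while the values supplied in a call can be strings. An assignment made inside a function creates a local variable that is discarded when the function returns, so the usual pattern is to return the result and name it at the call site. The function and column names below are hypothetical.

```r
library(dplyr)

# Parameters are written as bare names in the definition
add_year <- function(data_to_clean, year_year) {
  cleaned <- data_to_clean %>%   # `cleaned` exists only inside the function
    mutate(year = year_year)
  cleaned                        # the last value is what the function returns
}

toy <- tibble(district_id = 1:2)

# Strings are supplied when the function is called,
# and the returned data frame is given its name here
toy_18_19 <- add_year(toy, year_year = "2018-19")
```

Naming the result at the call site also keeps the function reusable: the same function can produce `toy_17_18`, `toy_18_19`, and so on without a fourth argument.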

Josh Gutwill

November 8, 2022

I keep getting this error: Error in FUN(left) : invalid argument to unary operator

Here's my code:
clean_enrollment_data <- function(raw_data, string_to_remove, data_year) {
  raw_data %>%
    select(!contains("grade")) %>%
    select(!contains("percent")) %>%
    select(!contains("kindergarten")) %>%
    pivot_longer(cols = -district_id,
                 names_to = "race_ethnicity",
                 values_to = "number_of_students") %>%
    mutate(number_of_students = na_if(number_of_students, "-")) %>%
    mutate(number_of_students = as.numeric(number_of_students)) %>%
    mutate(number_of_students = replace_na(number_of_students, 0)) %>%
    mutate(race_ethnicity = str_remove(race_ethnicity, string_to_remove)) %>%
    mutate(race_ethnicity = case_when(
      race_ethnicity == "american_indian_alaska_native" ~ "Native American or Alaskan",
      race_ethnicity == "asian" ~ "Asian",
      race_ethnicity == "native_hawaiian_pacific_islander" ~ "Hawaiian or Pacific Islander",
      race_ethnicity == "black_african_american" ~ "Black or African American",
      race_ethnicity == "hispanic_latino" ~ "Hispanic/Latinx",
      race_ethnicity == "white" ~ "White",
      TRUE ~ "Multiracial")) %>%
    group_by(district_id) %>%
    mutate(pct = number_of_students / sum(number_of_students)) %>%
    ungroup() %>%
    mutate(year = data_year)
}

enrollment_by_race_ethnicity_18_19 < - clean_enrollment_data(raw_data = enrollment_18_19, string_to_remove = "x2018_19_", data_year = "2018-2019")

When I copy and paste David's code, I also get an error, though it's a different one: Error in mutate(): ! Problem while computing number_of_students = replace_na(number_of_students, 0). Caused by error in vec_assign(): ! Can't convert replace to match type of data . Run rlang::last_error() to see where the error occurred.
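One thing worth checking for the first error: the call above is written with a space inside the assignment arrow (`< -`). R reads that as a less-than comparison followed by a unary minus, not as an assignment, and negating a data frame that contains text columns produces this kind of message. A small base R sketch with toy values:

```r
x <- 10
comparison <- x < - 5   # parsed as x < (-5): a comparison, so x is not reassigned

# Toy data frame with a text column (hypothetical)
toy <- data.frame(label = "text", stringsAsFactors = FALSE)

# Capture the error message instead of stopping the script
msg <- tryCatch(
  {
    -toy          # unary minus applied to a data frame with a character column
    "no error"
  },
  error = function(e) conditionMessage(e)
)
```

If this is the cause, writing the arrow as `<-` with no space should make that particular error go away.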

Kirstin O'Dell

November 15, 2022

I thought in a prior video we were told to tidy our data in script and only use markdown for the output/reporting we want to do. I'm seeing that for this we're using markdown to tidy the data inside of the function. I'm confused as to which to be using. Can functions created in script be used in markdown files?
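On the last question: functions defined in a script can generally be made available in an R Markdown document by calling source() on that script in a code chunk. A sketch, using a temporary file as a stand-in for a hypothetical functions script:

```r
# Stand-in for a script file that holds function definitions
script_path <- tempfile(fileext = ".R")
writeLines("add_one <- function(x) { x + 1 }", script_path)

# In an R Markdown chunk, source() runs the script and defines its functions
source(script_path)

result <- add_one(2)
```

In a real project the path would point to the script that holds the cleaning function instead of a temporary file.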

Andrew Paquin

April 30, 2023

Hi David, I watched the "Heads Up" video above. If I understand it correctly, the function we created for the 18-19 data can't be used for the 17-18 data because of 1) the changes to the na_if command, and 2) the fact that the columns are in different formats (dbl and chr) in the two datasets. Is that what's happening?

Kiana Robinson

May 17, 2023

What does this line of code do?

race_ethnicity_remove_text = "x2018_19_"

Why was it included in the function statement? This is confusing.

Zain Asaf

May 22, 2023

Hi Charlie and Dan, I am having the same issues as a couple of people with the replace_na line:

I get the following message Error in mutate(): ℹ In argument: number_of_students = replace_na(number_of_students, "0"). Caused by error in vec_assign(): ! Can't convert replace to match type of data .

However, I have used the following code for that line: mutate(number_of_students = replace_na(number_of_students, "0")) i.e. I have put the "0" in quotation marks, as suggested. Here is the code I have used. Note, I just labeled the column "ethnicity" not "race_ethnicity" clean_enrollment_data%
select(-contains("grade")) %>% select(-contains("kindergarten")) %>% select(-contains("percent")) %>% pivot_longer(cols = -district_id, names_to = "ethnicity", values_to = "number_of_students") %>% mutate(number_of_students = na_if(number_of_students, "-" )) %>% mutate(number_of_students = as.numeric(number_of_students)) %>% mutate(number_of_students = replace_na(number_of_students, "0")) %>% mutate(ethnicity = str_remove(ethnicity, ethnicity_remove_text)) %>% mutate(ethnicity = case_when( ethnicity == "american_indian_alaska_native" ~ "native american", ethnicity == "native_hawaiian_pacific_islander" ~"pacific islander", ethnicity == "hispanic_latino" ~ "latino", ethnicity == "black_african_american" ~ "african american", ethnicity == "white" ~ "white", ethnicity == "asian" ~ "Asian", ethnicity == "multiracial" ~ "Multiracial" )) %>% group_by(district_id)%>% mutate(pct = number_of_students / sum(number_of_students)*100) %>% ungroup() %>% mutate(year = "data_year")