R for the Rest of Us

Changing Variable Types

Code shown in video
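# Load Packages -----------------------------------------------------------

library(tidyverse) # filter(), select(), pivot_longer(), mutate(), parse_number(), ...
library(readxl)    # read_excel()
library(janitor)   # clean_names()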

# Create Directories ------------------------------------------------------

dir.create("data-raw", showWarnings = FALSE)

# Download Data -----------------------------------------------------------


# download.file("",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx")
# download.file("",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_tot_raceethnicity_1819.xlsx")
# download.file("",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_raceethnicity_1718.xlsx")
# download.file("",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_raceethnicity_1617.xlsx")
# download.file("",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_raceethnicity_1516.xlsx")

# Import Data -------------------------------------------------------------

math_scores_2021_2022 <-
  read_excel(path = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx") |> 
  clean_names() # snake_case column names: student_group, grade_level, number_level_4, ...

# Tidy and Clean Data -----------------------------------------------------

third_grade_math_proficiency_2021_2022 <-
  math_scores_2021_2022 |> 
  filter(student_group == "Total Population (All Students)") |> 
  filter(grade_level == "Grade 3") |> 
  select(academic_year, school_id, contains("number_level")) |> 
  pivot_longer(cols = starts_with("number_level"),
               names_to = "proficiency_level",
               values_to = "number_of_students") |> 
  # Keep only the level number, e.g. "number_level_4" becomes "4"
  mutate(proficiency_level = case_when(
    proficiency_level == "number_level_4" ~ "4",
    proficiency_level == "number_level_3" ~ "3",
    proficiency_level == "number_level_2" ~ "2",
    proficiency_level == "number_level_1" ~ "1"
  ))

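# Option 1: as.numeric() converts the character column to numbers;
# any value that isn't a plain number becomes NA (with a warning)
third_grade_math_proficiency_2021_2022 |> 
  mutate(number_of_students = as.numeric(number_of_students)) |> 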
  group_by(proficiency_level) |> 
  summarize(total_students = sum(number_of_students, na.rm = TRUE))

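# Option 2: parse_number() drops non-numeric characters before and after
# the first number in each value; values with no number become NA
third_grade_math_proficiency_2021_2022 |> 
  mutate(number_of_students = parse_number(number_of_students)) |> 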
  group_by(proficiency_level) |> 
  summarize(total_students = sum(number_of_students, na.rm = TRUE))
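To see how the two approaches differ on a small scale, here is a minimal sketch using a hypothetical character vector (`raw_counts` is made up for illustration, not taken from the Oregon data). It needs only readr, which library(tidyverse) loads.

```r
library(readr)

# Hypothetical raw values stored as text, as Excel imports often are
raw_counts <- c("42", "1,234", "*", NA)

# as.numeric() only understands plain numbers, so "1,234" and "*" become NA
# (suppressWarnings() hides the "NAs introduced by coercion" warning)
as_numeric_result <- suppressWarnings(as.numeric(raw_counts))
as_numeric_result
#> [1] 42 NA NA NA

# parse_number() drops the comma grouping mark, so "1,234" becomes 1234;
# "*" contains no number, so it becomes NA and readr records a parsing problem
parse_number_result <- suppressWarnings(parse_number(raw_counts))
as.vector(parse_number_result) # as.vector() drops readr's "problems" attribute
```

Because parse_number() is more forgiving, values such as "1,234" survive it but not as.numeric(), so totals computed after each conversion can differ.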

Your Turn

  1. Convert the number_of_students variable to numeric using both as.numeric() and parse_number().

  2. Use your converted number_of_students variable to calculate the total number of students in Oregon.

Use the following starter code to help you:

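# Load Packages -----------------------------------------------------------

library(tidyverse) # select(), pivot_longer(), mutate(), str_remove(), parse_number(), ...
library(readxl)    # read_excel()
library(janitor)   # clean_names()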

# Create Directories ------------------------------------------------------

dir.create("data-raw", showWarnings = FALSE)

# Download Data -----------------------------------------------------------


# download.file("",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20222023.xlsx")
# download.file("",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20212022.xlsx")
# download.file("",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20202021.xlsx")
# download.file("",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20192020.xlsx")
# download.file("",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20182019.xlsx")

# Import Data -------------------------------------------------------------

enrollment_2022_2023 <- read_excel(path = "data-raw/fallmembershipreport_20222023.xlsx",
                                   sheet = "School 2022-23") |> 
  clean_names() # snake_case names; names starting with a digit get an "x" prefix (x2022_23_...)

# Tidy and Clean Data -----------------------------------------------------

enrollment_by_race_ethnicity_2022_2023 <-
  enrollment_2022_2023 |> 
  select(district_institution_id, school_institution_id,
         x2022_23_american_indian_alaska_native:x2022_23_multi_racial) |> 
  select(-contains("percent")) |> 
  pivot_longer(cols = -c(district_institution_id, school_institution_id),
               names_to = "race_ethnicity",
               values_to = "number_of_students") |> 
  mutate(race_ethnicity = str_remove(race_ethnicity, pattern = "x2022_23_")) |> 
  mutate(race_ethnicity = case_when(
    race_ethnicity == "american_indian_alaska_native" ~ "American Indian Alaska Native",
    race_ethnicity == "asian" ~ "Asian",
    race_ethnicity == "black_african_american" ~ "Black/African American",
    race_ethnicity == "hispanic_latino" ~ "Hispanic/Latino",
    race_ethnicity == "multiracial" ~ "Multi-Racial", # leftover line: never matches (see comments below)
    race_ethnicity == "native_hawaiian_pacific_islander" ~ "Native Hawaiian Pacific Islander",
    race_ethnicity == "white" ~ "White",
    race_ethnicity == "multi_racial" ~ "Multiracial"
  ))
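Once the starter code runs, the two Your Turn steps follow the same pattern as the video code. The sketch below applies them to a small hypothetical tibble, `toy_enrollment`, shaped like `enrollment_by_race_ethnicity_2022_2023`; its values are invented to show the mechanics and are not real enrollment counts. Counting the remaining NA values alongside the total shows how many rows the sum skips.

```r
library(dplyr)
library(readr)

# Hypothetical stand-in for enrollment_by_race_ethnicity_2022_2023
toy_enrollment <- tibble(
  school_institution_id = c(1, 1, 2, 2),
  race_ethnicity = c("Asian", "White", "Asian", "White"),
  number_of_students = c("10", "1,020", "*", "35")
)

toy_totals <- toy_enrollment |>
  # Step 1: convert the text column to numbers ("*" becomes NA)
  mutate(number_of_students = suppressWarnings(parse_number(number_of_students))) |>
  # Step 2: add everything up, skipping NA, and count how many rows were skipped
  summarize(
    total_students = sum(number_of_students, na.rm = TRUE),
    rows_with_na = sum(is.na(number_of_students))
  )

toy_totals
```

Using as.numeric() in Step 1 instead would also turn "1,020" into NA in this toy data, so comparing both totals is a quick way to spot values that one method keeps and the other drops.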

Have any questions? Put them below and we will help you out!


I noticed this a couple of videos back but wanted to ask whether it was correct and intentional or a mistake. I see that in your case_when statement you have two lines for multiracial:

race_ethnicity == "multiracial" ~ "Multi-Racial",
race_ethnicity == "multi_racial" ~ "Multiracial"

I haven't noticed any instances of race_ethnicity == "multiracial" in the original dataset, but perhaps I missed them?

David Keyes, Founder

November 2, 2023

That's totally just a mistake on my end!
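One way to answer the "perhaps I missed them?" question yourself is to compare the keys in a case_when() against the values the column actually contains. The sketch below uses a hypothetical vector, `race_values`, standing in for the distinct race_ethnicity values left after str_remove(); with the real data you would build it with unique() on that column.

```r
# Hypothetical distinct values of race_ethnicity after str_remove()
race_values <- c("american_indian_alaska_native", "asian",
                 "black_african_american", "hispanic_latino",
                 "multi_racial", "native_hawaiian_pacific_islander", "white")

# Left-hand-side values used in the starter code's case_when()
case_when_keys <- c("american_indian_alaska_native", "asian",
                    "black_african_american", "hispanic_latino",
                    "multiracial", "native_hawaiian_pacific_islander",
                    "white", "multi_racial")

# Keys that match no value: their case_when() lines never fire
unused_keys <- setdiff(case_when_keys, race_values)
unused_keys
#> [1] "multiracial"
```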