Changing Variable Types

This lesson is called Changing Variable Types, part of the Going Deeper with R course. This lesson is called Changing Variable Types, part of the Going Deeper with R course.

Transcript

Click on the transcript to go to that point in the video. Please note that transcripts are auto generated and may contain minor inaccuracies.

View code shown in video

# Load Packages -----------------------------------------------------------

library(tidyverse)
library(fs)
library(readxl)
library(janitor)

# Create Directories ------------------------------------------------------

dir_create("data-raw")

# Download Data -----------------------------------------------------------

# https://www.oregon.gov/ode/educator-resources/assessment/Pages/Assessment-Group-Reports.aspx

# download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/pagr_schools_math_tot_raceethnicity_2122.xlsx",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx")
# 
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/TestResults2019/pagr_schools_math_tot_raceethnicity_1819.xlsx",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_tot_raceethnicity_1819.xlsx")
# 
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2018/pagr_schools_math_raceethnicity_1718.xlsx",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_raceethnicity_1718.xlsx")
# 
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2017/pagr_schools_math_raceethnicity_1617.xlsx",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_raceethnicity_1617.xlsx")
# 
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2016/pagr_schools_math_raceethnicity_1516.xlsx",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_raceethnicity_1516.xlsx")


# Import Data -------------------------------------------------------------

math_scores_2021_2022 <-
  read_excel(path = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx") |> 
  clean_names()


# Tidy and Clean Data -----------------------------------------------------

third_grade_math_proficiency_2021_2022 <-
  math_scores_2021_2022 |> 
  filter(student_group == "Total Population (All Students)") |> 
  filter(grade_level == "Grade 3") |> 
  select(academic_year, school_id, contains("number_level")) |> 
  pivot_longer(cols = starts_with("number_level"),
               names_to = "proficiency_level",
               values_to = "number_of_students") |> 
  mutate(proficiency_level = case_when(
    proficiency_level == "number_level_4" ~ "4",
    proficiency_level == "number_level_3" ~ "3",
    proficiency_level == "number_level_2" ~ "2",
    proficiency_level == "number_level_1" ~ "1"
  )) 

third_grade_math_proficiency_2021_2022 |> 
  mutate(number_of_students = as.numeric(number_of_students)) |> 
  group_by(proficiency_level) |> 
  summarize(total_students = sum(number_of_students, na.rm = TRUE))

third_grade_math_proficiency_2021_2022 |> 
  mutate(number_of_students = parse_number(number_of_students)) |> 
  group_by(proficiency_level) |> 
  summarize(total_students = sum(number_of_students, na.rm = TRUE))

Your Turn

Solution

# Load Packages -----------------------------------------------------------

library(tidyverse)
library(fs)
library(readxl)
library(janitor)

# Create Directories ------------------------------------------------------

dir_create("data-raw")

# Download Data -----------------------------------------------------------

# https://www.oregon.gov/ode/reports-and-data/students/Pages/Student-Enrollment-Reports.aspx

# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20222023.xlsx",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20222023.xlsx")
# 
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20212022.xlsx",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20212022.xlsx")
# 
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20202021.xlsx",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20202021.xlsx")
# 
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20192020.xlsx",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20192020.xlsx")
# 
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20182019.xlsx",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20182019.xlsx")

# Import Data -------------------------------------------------------------

enrollment_2022_2023 <- read_excel(path = "data-raw/fallmembershipreport_20222023.xlsx",
                                   sheet = "School 2022-23") |> 
  clean_names()

# Tidy and Clean Data -----------------------------------------------------

enrollment_by_race_ethnicity_2022_2023 <-
  enrollment_2022_2023 |> 
  select(district_institution_id, x2022_23_american_indian_alaska_native:x2022_23_percent_multi_racial) |>  
  select(-contains("percent")) |> 
  pivot_longer(cols = -district_institution_id,
               names_to = "race_ethnicity",
               values_to = "number_of_students") |> 
  mutate(race_ethnicity = str_remove(race_ethnicity, pattern = "x2022_23_")) |> 
  mutate(race_ethnicity = case_when(
    race_ethnicity == "american_indian_alaska_native" ~ "American Indian Alaska Native",
    race_ethnicity == "asian" ~ "Asian",
    race_ethnicity == "black_african_american" ~ "Black/African American",
    race_ethnicity == "hispanic_latino" ~ "Hispanic/Latino",
    race_ethnicity == "multi_racial" ~ "Multiracial",
    race_ethnicity == "native_hawaiian_pacific_islander" ~ "Native Hawaiian Pacific Islander",
    race_ethnicity == "white" ~ "White"
  ))

enrollment_by_race_ethnicity_2022_2023 |> 
  mutate(number_of_students = as.numeric(number_of_students)) |> 
  summarize(total_students = sum(number_of_students, na.rm = TRUE))

enrollment_by_race_ethnicity_2022_2023 |> 
  mutate(number_of_students = parse_number(number_of_students)) |> 
  summarize(total_students = sum(number_of_students, na.rm = TRUE))

Convert the number_of_students variable to numeric by using as.numeric() and parse_number().
Make sure you can use your number_of_students variable to count the total number of students in Oregon.

Use the following starter code to help you:

# Load Packages -----------------------------------------------------------

library(tidyverse)
library(fs)
library(readxl)
library(janitor)

# Create Directories ------------------------------------------------------

dir_create("data-raw")

# Download Data -----------------------------------------------------------

# https://www.oregon.gov/ode/reports-and-data/students/Pages/Student-Enrollment-Reports.aspx

# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20222023.xlsx",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20222023.xlsx")
# 
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20212022.xlsx",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20212022.xlsx")
# 
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20202021.xlsx",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20202021.xlsx")
# 
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20192020.xlsx",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20192020.xlsx")
# 
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20182019.xlsx",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20182019.xlsx")

# Import Data -------------------------------------------------------------

enrollment_2022_2023 <- read_excel(path = "data-raw/fallmembershipreport_20222023.xlsx",
                                   sheet = "School 2022-23") |> 
  clean_names()

# Tidy and Clean Data -----------------------------------------------------

enrollment_by_race_ethnicity_2022_2023 <-
  enrollment_2022_2023 |> 
  select(district_institution_id, school_institution_id,
         x2022_23_american_indian_alaska_native:x2022_23_multi_racial) |> 
  select(-contains("percent")) |> 
  pivot_longer(cols = -c(district_institution_id, school_institution_id),
               names_to = "race_ethnicity",
               values_to = "number_of_students") |> 
  mutate(race_ethnicity = str_remove(race_ethnicity, pattern = "x2022_23_")) |> 
  mutate(race_ethnicity = case_when(
    race_ethnicity == "american_indian_alaska_native" ~ "American Indian Alaska Native",
    race_ethnicity == "asian" ~ "Asian",
    race_ethnicity == "black_african_american" ~ "Black/African American",
    race_ethnicity == "hispanic_latino" ~ "Hispanic/Latino",
    race_ethnicity == "multiracial" ~ "Multi-Racial",
    race_ethnicity == "native_hawaiian_pacific_islander" ~ "Native Hawaiian Pacific Islander",
    race_ethnicity == "white" ~ "White",
    race_ethnicity == "multi_racial" ~ "Multiracial"
  ))

Have any questions? Put them below and we will help you out!

You need to be signed-in to comment on this post. Login.

Pa Thao • November 2, 2023

I noticed this a couple videos back but I wanted to ask if it was correct and intentional or is it was a mistake - I see that in your case_when statement, you have two lines for multiracial:

race_ethnicity == "multiracial" ~ "Multi-Racial",

and

race_ethnicity == "multi_racial" ~ "Multiracial"

I haven't noticed any instances in the original dataset of race_ethnicity == "multiracial" but perhaps I missed them?

David Keyes Founder • November 2, 2023

That's totally just a mistake on my end!

Mike LeVan • April 21, 2025

Hi. Quick question. I have setup the beginning of my script as you recommend above, adjusting for the years of the data set. When I run

head(enrollment_by_race_ethnicity_2024_2025)

I see the "number_of_students" variable is char.

When I run

enrollment_by_race_ethnicity_2024_2025 |> mutate(number_of_students = as.numeric(number_of_students))

the output says the column is now a double. However, when I run the head command again, the variable "number_of_students" is listed as a character.

To make the change persistent do I need to save the changes back to the variable after the mutate command? You don't do this in the video so I'm a tad confused. Thanks,

Gracielle Higino Coach • April 21, 2025

Hi Mike! To store any transformation or new information added to your dataframe, you need to assign the changed dataframe to an object (either new or pre-existing). In this case, you'd need to assign the changed dataframe back to the object enrollment_by_race_ethnicity_2024_2025, like this:

enrollment_by_race_ethnicity_2024_2025 <-
	enrollment_by_race_ethnicity_2024_2025 |> 
		mutate(number_of_students = as.numeric(number_of_students))

Without the assignment operator, no operation/transformation is permanent.

Course Content

44 Lessons

Downloading and Importing Data