Changing Variable Types
This lesson is called Changing Variable Types, part of the Going Deeper with R course. This lesson is called Changing Variable Types, part of the Going Deeper with R course.
Transcript
Click on the transcript to go to that point in the video. Please note that transcripts are auto generated and may contain minor inaccuracies.
View code shown in video
# Load Packages -----------------------------------------------------------
library(tidyverse)
library(fs)
library(readxl)
library(janitor)
# Create Directories ------------------------------------------------------
dir_create("data-raw")
# Download Data -----------------------------------------------------------
# https://www.oregon.gov/ode/educator-resources/assessment/Pages/Assessment-Group-Reports.aspx
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/pagr_schools_math_tot_raceethnicity_2122.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/TestResults2019/pagr_schools_math_tot_raceethnicity_1819.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_tot_raceethnicity_1819.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2018/pagr_schools_math_raceethnicity_1718.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_raceethnicity_1718.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2017/pagr_schools_math_raceethnicity_1617.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_raceethnicity_1617.xlsx")
#
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2016/pagr_schools_math_raceethnicity_1516.xlsx",
# mode = "wb",
# destfile = "data-raw/pagr_schools_math_raceethnicity_1516.xlsx")
# Import Data -------------------------------------------------------------
math_scores_2021_2022 <-
read_excel(path = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx") |>
clean_names()
# Tidy and Clean Data -----------------------------------------------------
third_grade_math_proficiency_2021_2022 <-
math_scores_2021_2022 |>
filter(student_group == "Total Population (All Students)") |>
filter(grade_level == "Grade 3") |>
select(academic_year, school_id, contains("number_level")) |>
pivot_longer(cols = starts_with("number_level"),
names_to = "proficiency_level",
values_to = "number_of_students") |>
mutate(proficiency_level = case_when(
proficiency_level == "number_level_4" ~ "4",
proficiency_level == "number_level_3" ~ "3",
proficiency_level == "number_level_2" ~ "2",
proficiency_level == "number_level_1" ~ "1"
))
third_grade_math_proficiency_2021_2022 |>
mutate(number_of_students = as.numeric(number_of_students)) |>
group_by(proficiency_level) |>
summarize(total_students = sum(number_of_students, na.rm = TRUE))
third_grade_math_proficiency_2021_2022 |>
mutate(number_of_students = parse_number(number_of_students)) |>
group_by(proficiency_level) |>
summarize(total_students = sum(number_of_students, na.rm = TRUE))
Your Turn
Convert the
number_of_students
variable to numeric by usingas.numeric()
andparse_number().
Make sure you can use your
number_of_students
variable to count the total number of students in Oregon.
Use the following starter code to help you:
# Load Packages -----------------------------------------------------------
library(tidyverse)
library(fs)
library(readxl)
library(janitor)
# Create Directories ------------------------------------------------------
dir_create("data-raw")
# Download Data -----------------------------------------------------------
# https://www.oregon.gov/ode/reports-and-data/students/Pages/Student-Enrollment-Reports.aspx
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20222023.xlsx",
# mode = "wb",
# destfile = "data-raw/fallmembershipreport_20222023.xlsx")
#
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20212022.xlsx",
# mode = "wb",
# destfile = "data-raw/fallmembershipreport_20212022.xlsx")
#
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20202021.xlsx",
# mode = "wb",
# destfile = "data-raw/fallmembershipreport_20202021.xlsx")
#
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20192020.xlsx",
# mode = "wb",
# destfile = "data-raw/fallmembershipreport_20192020.xlsx")
#
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20182019.xlsx",
# mode = "wb",
# destfile = "data-raw/fallmembershipreport_20182019.xlsx")
# Import Data -------------------------------------------------------------
enrollment_2022_2023 <- read_excel(path = "data-raw/fallmembershipreport_20222023.xlsx",
sheet = "School 2022-23") |>
clean_names()
# Tidy and Clean Data -----------------------------------------------------
enrollment_by_race_ethnicity_2022_2023 <-
enrollment_2022_2023 |>
select(district_institution_id, school_institution_id,
x2022_23_american_indian_alaska_native:x2022_23_multi_racial) |>
select(-contains("percent")) |>
pivot_longer(cols = -c(district_institution_id, school_institution_id),
names_to = "race_ethnicity",
values_to = "number_of_students") |>
mutate(race_ethnicity = str_remove(race_ethnicity, pattern = "x2022_23_")) |>
mutate(race_ethnicity = case_when(
race_ethnicity == "american_indian_alaska_native" ~ "American Indian Alaska Native",
race_ethnicity == "asian" ~ "Asian",
race_ethnicity == "black_african_american" ~ "Black/African American",
race_ethnicity == "hispanic_latino" ~ "Hispanic/Latino",
race_ethnicity == "multiracial" ~ "Multi-Racial",
race_ethnicity == "native_hawaiian_pacific_islander" ~ "Native Hawaiian Pacific Islander",
race_ethnicity == "white" ~ "White",
race_ethnicity == "multi_racial" ~ "Multiracial"
))
Have any questions? Put them below and we will help you out!
Course Content
44 Lessons
1
Downloading and Importing Data
10:32
2
Overview of Tidy Data
05:50
3
Tidy Data Rule #1: Every Column is a Variable
07:43
4
Tidy Data Rule #3: Every Cell is a Single Value
10:04
5
Tidy Data Rule #2: Every Row is an Observation
04:42
6
Changing Variable Types
04:51
7
Dealing with Missing Data
04:55
8
Advanced Summarizing
06:25
9
Binding Data Frames
07:17
10
Functions
15:06
11
Data Merging
09:27
12
Exporting Data
04:38
13
Bring It All Together (Advanced Data Wrangling)
13:03
1
Best Practices in Data Visualization
03:44
2
Tidy Data
02:25
3
Pipe Data into ggplot
09:54
4
Reorder Plots to Highlight Findings
03:37
5
Line Charts
04:17
6
Use Color to Highlight Findings
09:16
7
Declutter
08:29
8
Add Descriptive Labels to Your Plots
09:10
9
Use Titles to Highlight Findings
08:14
10
Use Annotations to Explain
07:09
11
Tweak Spacing
05:11
12
Create a Custom Theme
03:47
13
Customize Your Fonts
08:32
14
Try New Plot Types
03:24
15
Bring it All Together (Advanced Data Visualization)
14:30
1
Advanced Markdown
06:43
2
Advanced YAML and Code Chunk Options
05:53
3
Tables
18:36
4
Inline R Code
04:42
5
Making Your Reports Shine: Word Edition
04:30
6
Making Your Reports Shine: PDF Edition
06:11
7
Making Your Reports Shine: HTML Edition
06:06
8
Presentations
10:12
9
Dashboards
05:38
10
Websites
06:43
11
Publishing Your Work
04:38
12
Quarto Extensions
05:50
13
Parameterized Reporting, Part 1
10:57
14
Parameterized Reporting, Part 2
05:11
15
Parameterized Reporting, Part 3
07:47
16
Wrapping up Going Deeper with R
You need to be signed-in to comment on this post. Login.
Pa Thao • November 2, 2023
I noticed this a couple videos back but I wanted to ask if it was correct and intentional or is it was a mistake - I see that in your case_when statement, you have two lines for multiracial:
and
I haven't noticed any instances in the original dataset of race_ethnicity == "multiracial" but perhaps I missed them?
David Keyes Founder • November 2, 2023
That's totally just a mistake on my end!