You’ll be working with data on Oregon school enrollment by race/ethnicity.
Create a new project. Make sure you put it somewhere you’ll be able to find it again later!
Use the download.file() function to download the following two files into a data-raw folder (which you’ll need to create first):
- 2018-2019 Enrollment by Race/Ethnicity Data File: https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-18-19.xlsx
- 2017-2018 Enrollment by Race/Ethnicity Data File: https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-17-18.xlsx
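The download step might look something like the sketch below. It assumes your working directory is the project folder. The variable names (base_url, file_names) are only illustrative, and the destination file names simply reuse the names from the URLs.

```r
# Create the data-raw folder; showWarnings = FALSE keeps this quiet
# if the folder already exists
dir.create("data-raw", showWarnings = FALSE)

# Both files live under the same base URL (taken from the list above)
base_url <- "https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/"
file_names <- c("enrollment-18-19.xlsx", "enrollment-17-18.xlsx")

# Download each file into data-raw, keeping its original name
for (file_name in file_names) {
  download.file(
    url = paste0(base_url, file_name),
    destfile = file.path("data-raw", file_name),
    mode = "wb" # write in binary mode so the Excel files are not corrupted on Windows
  )
}
```

Calling download.file() once per file, without the loop, works just as well if that reads more clearly to you.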
Create a new R script file where you’ll do all of your data cleaning work.
Import the two spreadsheets into two data frames.
You can read about all of the arguments for the download.file() function in its help page, which you can open by running ?download.file in the R console.
To learn more about importing Excel files, check out the readxl package documentation. You’ll see, for example, ways to import only certain ranges of cells, which can be helpful when you have messy Excel data!
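Here is a rough sketch of what the import could look like with readxl. It assumes the two files are already saved in the data-raw folder and that the data you want is on the first sheet of each workbook, which is what read_excel() reads by default. The data frame names and the cell range "A1:D10" are made up for illustration, so adjust them to fit the actual files.

```r
library(readxl)

# Import each spreadsheet into its own data frame
# (object names are illustrative, not required by the exercise)
enrollment_18_19 <- read_excel("data-raw/enrollment-18-19.xlsx")
enrollment_17_18 <- read_excel("data-raw/enrollment-17-18.xlsx")

# Import only a rectangle of cells with the range argument;
# "A1:D10" is an arbitrary example range, not one taken from the data
enrollment_sample <- read_excel(
  "data-raw/enrollment-18-19.xlsx",
  range = "A1:D10"
)
```

If the data turns out to be on a different sheet, the sheet argument of read_excel() lets you pick it by name or position.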
I’ve also written an article about cleaning messy data in R. There are many packages to deal with messy data (which often comes in the form of Excel spreadsheets), and I go through several in the post.