Downloading and Importing Data
This lesson is called Downloading and Importing Data, part of the Going Deeper with R course. This lesson is called Downloading and Importing Data, part of the Going Deeper with R course.
Transcript
Click on the transcript to go to that point in the video. Please note that transcripts are auto generated and may contain minor inaccuracies.
View code shown in video
# Load Packages -----------------------------------------------------------
library(tidyverse)
library(fs)
library(readxl)
library(janitor)
# Create Directories ------------------------------------------------------
dir_create("data-raw")
# Download Data -----------------------------------------------------------
# https://www.oregon.gov/ode/educator-resources/assessment/Pages/Assessment-Group-Reports.aspx
download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/pagr_schools_math_tot_raceethnicity_2122.xlsx",
mode = "wb",
destfile = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx")
download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/TestResults2019/pagr_schools_math_tot_raceethnicity_1819.xlsx",
mode = "wb",
destfile = "data-raw/pagr_schools_math_tot_raceethnicity_1819.xlsx")
download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2018/pagr_schools_math_raceethnicity_1718.xlsx",
mode = "wb",
destfile = "data-raw/pagr_schools_math_raceethnicity_1718.xlsx")
download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2017/pagr_schools_math_raceethnicity_1617.xlsx",
mode = "wb",
destfile = "data-raw/pagr_schools_math_raceethnicity_1617.xlsx")
download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2016/pagr_schools_math_raceethnicity_1516.xlsx",
mode = "wb",
destfile = "data-raw/pagr_schools_math_raceethnicity_1516.xlsx")
# Import Data -------------------------------------------------------------
math_scores_2021_2022 <-
read_excel(path = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx") |>
clean_names()
Your Turn
You’ll be working with data on Oregon school enrollment by race/ethnicity.
Create a new project. Make sure you put it somewhere you’ll be able to find it again later!
Create a new R script file where you’ll do all of your data downloading, cleaning, and importing work.
Download the five most recent Fall Membership Report files using the
download.file()
function into adata-raw
folder (which you’ll need to create).Import the 2022-2023 spreadsheet into a data frame called
enrollment_2022_2023
, using theclean_names()
function from thejanitor
package to make our variable names easy to work with.
Learn More
You can read about all of the arguments for the download.file()
function here. If you're very interested in reading in data from the internet, June Choe has put together a very comprehensive article called Read files on the web into R.
To learn about the fs
package, check out its documentation.
To learn more about importing Excel files, check out the readxl
package documentation. You’ll see, for example, ways to download only certain ranges of cells, which can be helpful when you have messy Excel data!
Other packages I mention in the video are:
I’ve also written an article about cleaning messy data in R. There are many packages to deal with messy data (which often comes in the form of Excel spreadsheets), and I go through several in the post.
And, finally, if you want a really deep dive on data cleaning, check out the course Data Cleaning with R.
Have any questions? Put them below and we will help you out!
Course Content
44 Lessons
You need to be signed-in to comment on this post. Login.
Brian Slattery • October 25, 2023
Is there a way that you can give read_excel an index for the sheet, rather than a name? Or is that not possible? (I tried looking on the help page and on Stack Overflow and couldn't find the answer)
David Keyes Founder • October 25, 2023
Yes, you can give it the number of the sheet like so:
That would read in the first sheet. I actually show this in the Functions lesson solutions video.
The only danger with doing this is that, if you change the order of sheets, your code will fail. Something to keep in mind!
Eric Juskewitz • January 17, 2024
Hello there!
It seems like the provided URL/link for the data is timed out ("server not found" for oregon.gov). Could I get the raw files in another way?
David Keyes Founder • January 18, 2024
Do you mind trying again? I checked right now and it is working for me. I'm wondering if it was maybe just down earlier today?
Eric Juskewitz • January 18, 2024
It worked now :)
Edith Namuleme • April 17, 2024
hi David, On following the instructions of this lesson and after downloading the data (fall membership report) and creating an object, it is only showing the first sheet of the excel document which has no data of interest. how can i create an object that gets all the sheets needed or that of my interest?
David Keyes Founder • April 17, 2024
Take a look at the solution code. :) You need to use the
sheet
argument to ensure you get the right data.Edith Namuleme • April 18, 2024
Got it, thanks
Gabby Bachhuber • April 27, 2024
Do you recommend downloading data one file at a time, or alternatively, write code to download all files and then run once to add them into the directory?
I downloaded all at once, but I had a few errors in my code, so a few didn't download until I fixed the code. Perhaps I could have avoided that by downloading one at a time?
David Keyes Founder • April 29, 2024
If it were me, I'd write code to download them all. This might involve testing along the way and running code file by file to make sure it works, but in the end, I'd much rather have code that I can run to download everything.
David Keyes • September 3, 2024
Just testing
Vivian KAWANAMI • October 20, 2024
Hi! Hmm it was funny that when I added sheet = "School 2022-23" as in the solution, I received the message below:
Error: Sheet 'School 2022-23' not found
But it's ok since sheet=4 worked when loading the data ;)
Gracielle Higino Coach • October 22, 2024
Hi Vivian! That odd... I couldn't reproduce your error, the code is working here on my end. Did this happen when importing the
fallmembershipreport_20222023.xlsx
file?Raouf Kilada • October 24, 2024
The oregon site does not look like the one that you are using in the video..... I have hard time to find the files that you downloded
Gracielle Higino Coach • October 25, 2024
Hi Raouf! The link in the "Your turn" section is different from the one in the lesson demonstration. You're supposed to use a different dataset for the assignment. In the "View code shown in video" section above, you can get the link to the dataset that David is using in the video.
Tosan Okome • November 7, 2024
Hi. I need some help, please.
I ran this code for the final part of the Your Turn instructions:
And I keep getting this error message:
Any idea why?
Gracielle Higino Coach • November 8, 2024
Hi Tosan! This could be many things... Sometimes you get this error if the Excel file is open, but it can also be a path problem. Do you want to book a time to debug together?
Tosan Okome • November 11, 2024
Sure. I'll be in your mailbox. Thank you!
gene trevino • January 9, 2025
when I run the code I get download had nonzero exit status
what is the problem ?
David Keyes Founder • January 14, 2025
Can you post the exact code you ran and the exact error message you received?