Skip to content
R for the Rest of Us Logo

Going Deeper with R

Downloading and Importing Data

Transcript

Click on the transcript to go to that point in the video. Please note that transcripts are auto generated and may contain minor inaccuracies.

View code shown in video
# Load Packages -----------------------------------------------------------

library(tidyverse)
library(fs)
library(readxl)
library(janitor)

# Create Directories ------------------------------------------------------

dir_create("data-raw")

# Download Data -----------------------------------------------------------

# https://www.oregon.gov/ode/educator-resources/assessment/Pages/Assessment-Group-Reports.aspx

download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/pagr_schools_math_tot_raceethnicity_2122.xlsx",
              mode = "wb",
              destfile = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx")

download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/TestResults2019/pagr_schools_math_tot_raceethnicity_1819.xlsx",
              mode = "wb",
              destfile = "data-raw/pagr_schools_math_tot_raceethnicity_1819.xlsx")

download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2018/pagr_schools_math_raceethnicity_1718.xlsx",
              mode = "wb",
              destfile = "data-raw/pagr_schools_math_raceethnicity_1718.xlsx")

download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2017/pagr_schools_math_raceethnicity_1617.xlsx",
              mode = "wb",
              destfile = "data-raw/pagr_schools_math_raceethnicity_1617.xlsx")

download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2016/pagr_schools_math_raceethnicity_1516.xlsx",
              mode = "wb",
              destfile = "data-raw/pagr_schools_math_raceethnicity_1516.xlsx")


# Import Data -------------------------------------------------------------

math_scores_2021_2022 <-
  read_excel(path = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx") |> 
  clean_names()

Your Turn

You’ll be working with data on Oregon school enrollment by race/ethnicity.

  1. Create a new project. Make sure you put it somewhere you’ll be able to find it again later!

  2. Create a new R script file where you’ll do all of your data downloading, cleaning, and importing work.

  3. Download the five most recent Fall Membership Report files using the download.file() function into a data-raw folder (which you’ll need to create).

  4. Import the 2022-2023 spreadsheet into a data frame called enrollment_2022_2023, using the clean_names() function from the janitor package to make our variable names easy to work with.

Learn More

You can read about all of the arguments for the download.file() function here. If you're very interested in reading in data from the internet, June Choe has put together a very comprehensive article called Read files on the web into R.

To learn about the fs package, check out its documentation.

To learn more about importing Excel files, check out the readxl package documentation. You’ll see, for example, ways to download only certain ranges of cells, which can be helpful when you have messy Excel data!

Other packages I mention in the video are:

I’ve also written an article about cleaning messy data in R. There are many packages to deal with messy data (which often comes in the form of Excel spreadsheets), and I go through several in the post.

And, finally, if you want a really deep dive on data cleaning, check out the course Data Cleaning with R.

Have any questions? Put them below and we will help you out!

You need to be signed-in to comment on this post. Login.

Brian Slattery

Brian Slattery • October 25, 2023

Is there a way that you can give read_excel an index for the sheet, rather than a name? Or is that not possible? (I tried looking on the help page and on Stack Overflow and couldn't find the answer)

David Keyes

David Keyes Founder • October 25, 2023

Yes, you can give it the number of the sheet like so:

read_excel(path = "data.xlsx", sheet = 1)

That would read in the first sheet. I actually show this in the Functions lesson solutions video.

The only danger with doing this is that, if you change the order of sheets, your code will fail. Something to keep in mind!

Eric Juskewitz

Eric Juskewitz • January 17, 2024

Hello there!

It seems like the provided URL/link for the data is timed out ("server not found" for oregon.gov). Could I get the raw files in another way?

David Keyes

David Keyes Founder • January 18, 2024

Do you mind trying again? I checked right now and it is working for me. I'm wondering if it was maybe just down earlier today?

Eric Juskewitz

Eric Juskewitz • January 18, 2024

It worked now :)

Edith Namuleme

Edith Namuleme • April 17, 2024

hi David, On following the instructions of this lesson and after downloading the data (fall membership report) and creating an object, it is only showing the first sheet of the excel document which has no data of interest. how can i create an object that gets all the sheets needed or that of my interest?

David Keyes

David Keyes Founder • April 17, 2024

Take a look at the solution code. :) You need to use the sheet argument to ensure you get the right data.

Edith Namuleme

Edith Namuleme • April 18, 2024

Got it, thanks

Gabby Bachhuber

Gabby Bachhuber • April 27, 2024

Do you recommend downloading data one file at a time, or alternatively, write code to download all files and then run once to add them into the directory?

I downloaded all at once, but I had a few errors in my code, so a few didn't download until I fixed the code. Perhaps I could have avoided that by downloading one at a time?

David Keyes

David Keyes Founder • April 29, 2024

If it were me, I'd write code to download them all. This might involve testing along the way and running code file by file to make sure it works, but in the end, I'd much rather have code that I can run to download everything.

David Keyes

David Keyes • September 3, 2024

Pending approval

Just testing

Vivian KAWANAMI

Vivian KAWANAMI • October 20, 2024

Hi! Hmm it was funny that when I added sheet = "School 2022-23" as in the solution, I received the message below:

Error: Sheet 'School 2022-23' not found

But it's ok since sheet=4 worked when loading the data ;)

Gracielle Higino

Gracielle Higino Coach • October 22, 2024

Hi Vivian! That odd... I couldn't reproduce your error, the code is working here on my end. Did this happen when importing the fallmembershipreport_20222023.xlsx file?

Raouf Kilada

Raouf Kilada • October 24, 2024

The oregon site does not look like the one that you are using in the video..... I have hard time to find the files that you downloded

Gracielle Higino

Gracielle Higino Coach • October 25, 2024

Hi Raouf! The link in the "Your turn" section is different from the one in the lesson demonstration. You're supposed to use a different dataset for the assignment. In the "View code shown in video" section above, you can get the link to the dataset that David is using in the video.

Tosan Okome

Tosan Okome • November 7, 2024

Hi. I need some help, please.

I ran this code for the final part of the Your Turn instructions:

enrollment_2022_2023 <- read_excel(path = "data-raw/fallmembershipreport_20222023.xlsx") |> 
  clean_names()

And I keep getting this error message:

Error in utils::unzip(zip_path, list = TRUE) : 
  error -103 with zipfile in unzGetCurrentFileInfo

Any idea why?

Gracielle Higino

Gracielle Higino Coach • November 8, 2024

Hi Tosan! This could be many things... Sometimes you get this error if the Excel file is open, but it can also be a path problem. Do you want to book a time to debug together?

Tosan Okome

Tosan Okome • November 11, 2024

Sure. I'll be in your mailbox. Thank you!

gene trevino

gene trevino • January 9, 2025

when I run the code I get download had nonzero exit status

what is the problem ?

David Keyes

David Keyes Founder • January 14, 2025

Can you post the exact code you ran and the exact error message you received?

Course Content

44 Lessons