Skip to content
R for the Rest of Us Logo

Downloading and Importing Data

This lesson is locked

Get access to all lessons in this course.

If the video is not playing correctly, you can watch it in a new window

Transcript

Click on the transcript to go to that point in the video. Please note that transcripts are auto generated and may contain minor inaccuracies.

View code shown in video
# Load Packages -----------------------------------------------------------

library(tidyverse)
library(fs)
library(readxl)
library(janitor)

# Create Directories ------------------------------------------------------

dir_create("data-raw")

# Download Data -----------------------------------------------------------

# https://www.oregon.gov/ode/educator-resources/assessment/Pages/Assessment-Group-Reports.aspx

download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/pagr_schools_math_tot_raceethnicity_2122.xlsx",
              mode = "wb",
              destfile = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx")

download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/TestResults2019/pagr_schools_math_tot_raceethnicity_1819.xlsx",
              mode = "wb",
              destfile = "data-raw/pagr_schools_math_tot_raceethnicity_1819.xlsx")

download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2018/pagr_schools_math_raceethnicity_1718.xlsx",
              mode = "wb",
              destfile = "data-raw/pagr_schools_math_raceethnicity_1718.xlsx")

download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2017/pagr_schools_math_raceethnicity_1617.xlsx",
              mode = "wb",
              destfile = "data-raw/pagr_schools_math_raceethnicity_1617.xlsx")

download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2016/pagr_schools_math_raceethnicity_1516.xlsx",
              mode = "wb",
              destfile = "data-raw/pagr_schools_math_raceethnicity_1516.xlsx")


# Import Data -------------------------------------------------------------

math_scores_2021_2022 <-
  read_excel(path = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx") |> 
  clean_names()

Your Turn

You’ll be working with data on Oregon school enrollment by race/ethnicity.

  1. Create a new project. Make sure you put it somewhere you’ll be able to find it again later!

  2. Create a new R script file where you’ll do all of your data downloading, cleaning, and importing work.

  3. Download the five most recent Fall Membership Report files using the download.file() function into a data-raw folder (which you’ll need to create).

  4. Import the 2022-2023 spreadsheet into a data frame called enrollment_2022_2023, using the clean_names() function from the janitor package to make our variable names easy to work with.

Learn More

You can read about all of the arguments for the download.file() function here.

To learn about the fs package, check out its documentation.

To learn more about importing Excel files, check out the readxl package documentation. You’ll see, for example, ways to download only certain ranges of cells, which can be helpful when you have messy Excel data!

Other packages I mention in the video are:

I’ve also written an article about cleaning messy data in R. There are many packages to deal with messy data (which often comes in the form of Excel spreadsheets), and I go through several in the post.

And, finally, if you want a really deep dive on data cleaning, check out the course Data Cleaning with R.

Have any questions? Put them below and we will help you out!

You need to be signed-in to comment on this post. Login.

Brian Slattery

Brian Slattery

October 25, 2023

Is there a way that you can give read_excel an index for the sheet, rather than a name? Or is that not possible? (I tried looking on the help page and on Stack Overflow and couldn't find the answer)

David Keyes

David Keyes Founder

October 25, 2023

Yes, you can give it the number of the sheet like so:

read_excel(path = "data.xlsx", sheet = 1)

That would read in the first sheet. I actually show this in the Functions lesson solutions video.

The only danger with doing this is that, if you change the order of sheets, your code will fail. Something to keep in mind!

Eric Juskewitz

Eric Juskewitz

January 17, 2024

Hello there!

It seems like the provided URL/link for the data is timed out ("server not found" for oregon.gov). Could I get the raw files in another way?

David Keyes

David Keyes Founder

January 18, 2024

Do you mind trying again? I checked right now and it is working for me. I'm wondering if it was maybe just down earlier today?

Eric Juskewitz

Eric Juskewitz

January 18, 2024

It worked now :)

Edith Namuleme

Edith Namuleme

April 17, 2024

hi David, On following the instructions of this lesson and after downloading the data (fall membership report) and creating an object, it is only showing the first sheet of the excel document which has no data of interest. how can i create an object that gets all the sheets needed or that of my interest?

David Keyes

David Keyes Founder

April 17, 2024

Take a look at the solution code. :) You need to use the sheet argument to ensure you get the right data.

Edith Namuleme

Edith Namuleme

April 18, 2024

Got it, thanks

Gabby Bachhuber

Gabby Bachhuber

April 27, 2024

Do you recommend downloading data one file at a time, or alternatively, write code to download all files and then run once to add them into the directory?

I downloaded all at once, but I had a few errors in my code, so a few didn't download until I fixed the code. Perhaps I could have avoided that by downloading one at a time?