Lesson 3 of 43

Importing Data

Your Turn

You’ll be working with data on Oregon school enrollment by race/ethnicity.

Create a new project. Make sure you put it somewhere you’ll be able to find it again later!

Download the two files using the download.file() function into a data-raw folder (which you’ll need to create).

Create a new R script file where you’ll do all of your data cleaning work.

Import the two spreadsheets into two data frames (enrollment_17_18 and enrollment_18_19).
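Put together, the steps above might look something like the following sketch. It assumes the readxl package is installed and that the two files are still available at the GitHub URLs used in the comment thread below. If those links have moved, use the download links provided for this lesson instead. It passes mode = "wb" so both files are written in binary mode, which avoids the corrupted-file problem described in the Heads Up note.

```r
library(readxl)

# Create the data-raw folder (no warning if it already exists)
dir.create("data-raw", showWarnings = FALSE)

# Download both spreadsheets in binary mode ("wb") so Windows does not corrupt them
download.file(
  url = "https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-17-18.xlsx",
  destfile = "data-raw/enrollment-17-18.xlsx",
  mode = "wb"
)
download.file(
  url = "https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-18-19.xlsx",
  destfile = "data-raw/enrollment-18-19.xlsx",
  mode = "wb"
)

# Import each spreadsheet into its own data frame
enrollment_17_18 <- read_excel("data-raw/enrollment-17-18.xlsx")
enrollment_18_19 <- read_excel("data-raw/enrollment-18-19.xlsx")
```

Running this from inside your RStudio project keeps the relative data-raw/ paths pointing at the project's own folder.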

Heads Up!

If you have trouble opening the downloaded spreadsheet, you may need to add the argument mode = "wb" to your download.file() call, for example download.file(url, destfile, mode = "wb"). This writes the file in binary mode, which matters on Windows (read more about why here). You can see the full code in the solutions section below if necessary.
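Because an .xlsx file is really a ZIP archive, one quick check after downloading is whether R can list the archive's contents. The "cannot be opened" error quoted in the comments below is the message utils::unzip() gives when a file isn't a readable ZIP archive. The helper is_valid_xlsx() is hypothetical (it is not part of readxl or any other package), and passing the check only means the archive opens, not that every sheet will import cleanly.

```r
# Hypothetical helper (not from readxl): TRUE if `path` exists and can be
# opened as a ZIP archive, which every valid .xlsx file is; FALSE otherwise.
is_valid_xlsx <- function(path) {
  if (!file.exists(path)) {
    return(FALSE)
  }
  # list = TRUE only reads the archive's table of contents; the internal
  # unzip method throws an error if the file is not a readable ZIP archive.
  contents <- try(utils::unzip(path, list = TRUE, unzip = "internal"), silent = TRUE)
  !inherits(contents, "try-error")
}

# Check both downloads before importing them
is_valid_xlsx("data-raw/enrollment-17-18.xlsx")
is_valid_xlsx("data-raw/enrollment-18-19.xlsx")
```

If it returns FALSE, delete the file and download it again with mode = "wb".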

Learn More

You can read about all of the arguments for the download.file() function here.

To learn more about importing Excel files, check out the readxl package documentation. You’ll see, for example, ways to read only certain ranges of cells, which can be helpful when you have messy Excel data!
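As an illustration of reading a cell range, here is a minimal sketch that uses the example workbook bundled with readxl (readxl_example("datasets.xlsx")) rather than the enrollment files, whose layout isn't shown here. The range argument accepts an Excel-style reference, optionally prefixed with a sheet name, and the cell_cols() helper selects whole columns.

```r
library(readxl)

# Path to an example workbook that ships with readxl
datasets <- readxl_example("datasets.xlsx")

# Read only cells A1:C6 of the "mtcars" sheet. Row 1 holds the column
# names, so the result has 5 rows and 3 columns.
mtcars_corner <- read_excel(datasets, range = "mtcars!A1:C6")

# Read columns B through D (all rows) of the first sheet
iris_middle <- read_excel(datasets, range = cell_cols("B:D"))
```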

I’ve also written an article about cleaning messy data in R. There are many packages to deal with messy data (which often comes in the form of Excel spreadsheets), and I go through several in the post.

Have any questions? Put them below.

  1. FYI: I was able to download both .xlsx files, but when creating an object from the 17-18 data I got this error message:
    > download.file(url = "https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-17-18.xlsx",
    + destfile = "data-raw/enrollment-17-18.xlsx")
    trying URL ‘https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-17-18.xlsx’
    Content type ‘application/octet-stream’ length 47088 bytes (45 KB)
    downloaded 45 KB

    > enrollment_17_18 %
    + clean_names()
    Error: Evaluation error: zip file ‘data-raw/enrollment-17-18.xlsx’ cannot be opened.

    I was able to download the file to my computer, copy it into the ‘data-raw’ folder manually, and then read it into an object with my R script. So I’m moving on with the assignment but thought you would want to know.

        1. Looking back at my script, I did use mode = "wb" for the 18-19 data download but not for the 17-18 dataset, so that is probably why one worked and the other didn’t.

  2. I am having a problem just downloading the files. My code (with or without mode = "wb"):

    download.file(url = "https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-18-19.xlsx",
    destfile = "data-raw-gd/going-deeper/raw/master/data-raw/enrollment-18-19.xlsx")

    and I get these two warning messages:

    Warning messages:
    1: In download.file(url = "https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-18-19.xlsx", :
    URL https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-18-19.xlsx: cannot open destfile ‘data-raw/enrollment-18-19.xlsx’, reason ‘No such file or directory’
    2: In download.file(url = "https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-18-19.xlsx", :
    download had nonzero exit status

    This also happens with my own project file links (from Tidy Tuesday).

    1. Do you have a folder called data-raw-gd? That’s where the download.file() function is trying to put the file it downloads, but if that folder doesn’t exist, it won’t work.

        1. Actually, take a look back at your destfile argument:

          destfile = "data-raw-gd/going-deeper/raw/master/data-raw/enrollment-18-19.xlsx"

          It’s going to try to put the data in the data-raw-gd/going-deeper/raw/master/data-raw/ folder. Try this instead and let me know if it works:

          download.file(url = "https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-18-19.xlsx",
          destfile = "enrollment-18-19.xlsx")

          That will put the file in the root of the project you’re working in. You did create a completely new project for this course, correct?

          1. Ah! I was copying too much of the URL, including forward slashes that didn’t belong. Simple but frustrating, thanks!

          2. This is what worked:
            download.file(url = "https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-18-19.xlsx",
            destfile = "data-raw-gd/enrollment-18-19.xlsx")

          3. And I went through the steps to create a new project in a new directory, but I am still saving everything in an “R in 3 mo” folder. I think I have some troubleshooting yet to do about keeping projects truly separate.

          4. Yeah, I would recommend one RStudio project for each distinct project. It’s a bit artificial now because you’re working on “projects” that are actually just materials for each course. We can discuss this more in our live session this week!

  3. The Excel file I am downloading won’t open in Excel. The following dialog box appears: “file format or file extension not valid. Verify the file has not been corrupted and that file extension matches the format”.
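One way to sidestep the destfile mix-up in the second thread above, where part of the URL path ended up in destfile, is to build the destination path from the file name at the end of the URL with basename() and file.path(). This is a sketch: download_path_for() is a hypothetical helper, and "data-raw" is simply the folder name this lesson uses.

```r
# Hypothetical helper: build a destination path inside `folder` from the
# file name at the end of `url`, creating `folder` first if it is missing.
download_path_for <- function(url, folder = "data-raw") {
  if (!dir.exists(folder)) {
    dir.create(folder, recursive = TRUE)
  }
  file.path(folder, basename(url))
}

url_18_19 <- "https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-18-19.xlsx"

destfile_18_19 <- download_path_for(url_18_19)
destfile_18_19
# [1] "data-raw/enrollment-18-19.xlsx"

# Then download in binary mode (requires an internet connection):
# download.file(url = url_18_19, destfile = destfile_18_19, mode = "wb")
```

Because the folder is created before download.file() runs, the "cannot open destfile ... No such file or directory" warning from the second thread should not come up.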