Going Deeper with R
Importing Data
This lesson is called Importing Data, part of the Going Deeper with R course.
Your Turn
You’ll be working with data on Oregon school enrollment by race/ethnicity.
Create a new project. Make sure you put it somewhere you’ll be able to find it again later!
Download the two files using the download.file() function into a data-raw folder (which you’ll need to create).
2018-2019 Enrollment by Race/Ethnicity Data File: https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-18-19.xlsx
2017-2018 Enrollment by Race/Ethnicity Data File: https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-17-18.xlsx
Create a new R script file where you’ll do all of your data cleaning work
Import the two spreadsheets into two data frames (enrollment_17_18 and enrollment_18_19)
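The steps above can be sketched roughly as follows. This is a minimal sketch, not the lesson's official solution: it assumes the readxl package is installed, that your working directory is the project folder, and that the data you need is on the first sheet of each workbook (the lesson may use a different sheet). The mode = "wb" argument is the fix that several commenters below report needing, since without it the downloaded .xlsx file can end up corrupted on Windows.

```r
# Assumes readxl is installed: install.packages("readxl")
library(readxl)

# Create the data-raw folder if it does not already exist.
# download.file() fails with "No such file or directory" if the folder is missing.
dir.create("data-raw", showWarnings = FALSE)

# mode = "wb" writes the file in binary mode, which keeps .xlsx files
# from being corrupted when they are downloaded on Windows
download.file(
  url = "https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-18-19.xlsx",
  destfile = "data-raw/enrollment-18-19.xlsx",
  mode = "wb"
)

download.file(
  url = "https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-17-18.xlsx",
  destfile = "data-raw/enrollment-17-18.xlsx",
  mode = "wb"
)

# With no sheet argument, read_excel() imports the first sheet (an assumption here)
enrollment_18_19 <- read_excel(path = "data-raw/enrollment-18-19.xlsx")
enrollment_17_18 <- read_excel(path = "data-raw/enrollment-17-18.xlsx")
```

Note that destfile must point inside a folder that already exists, and that mode = "wb" belongs in download.file(), not in read_excel().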
Heads Up!
Learn More
You can read about all of the arguments for the download.file() function on its help page (run ?download.file in the console).
To learn more about importing Excel files, check out the readxl package documentation. You’ll see, for example, ways to import only certain ranges of cells, which can be helpful when you have messy Excel data!
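As a small illustration of importing only a range of cells, here is a sketch that uses the sample workbook bundled with readxl rather than the enrollment files, so the cell ranges and sheet name shown are specific to that sample file and are not taken from the lesson.

```r
# Assumes readxl is installed: install.packages("readxl")
library(readxl)

# readxl_example() returns the path to a sample workbook that ships with readxl
example_path <- readxl_example("datasets.xlsx")

# List the sheet names in the workbook
excel_sheets(example_path)

# Import only cells C1 through E4 of the first sheet
# (row 1 becomes the column names, rows 2 to 4 become the data)
iris_subset <- read_excel(path = example_path, range = "C1:E4")

# Pick a sheet by name with sheet, and a block of cells with range
mtcars_subset <- read_excel(path = example_path, sheet = "mtcars", range = "A1:C6")
```

Sheet names go in the sheet argument and cell references such as "A1:C6" go in range. Passing a sheet name to range produces a "Can't guess format of this cell reference" error.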
I’ve also written an article about cleaning messy data in R. There are many packages to deal with messy data (which often comes in the form of Excel spreadsheets), and I go through several in the post.
Jody Oconnor
April 20, 2021
FYI: I was able to download both .xlsx files, but when creating objects with the 17-18 data I got this error message:
> download.file(url = "https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-17-18.xlsx",
trying URL 'https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-17-18.xlsx'
Content type 'application/octet-stream' length 47088 bytes (45 KB)
downloaded 45 KB
> enrollment_17_18 %
I was able to download the file to my computer, then copy it into the 'data-raw' folder manually and turn it into an object with R script successfully. So I'm moving on with the assignment but thought you would want to know.
Abby Isaacson
April 20, 2021
I am having a problem just downloading the files. My code (with or without the mode = "wb",):
download.file(url = "https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-18-19.xlsx", destfile = "data-raw-gd/going-deeper/raw/master/data-raw/enrollment-18-19.xlsx")
and I get these two errors:
Warning messages:
1: In download.file(url = "https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-18-19.xlsx", :
  URL https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-18-19.xlsx: cannot open destfile 'data-raw/enrollment-18-19.xlsx', reason 'No such file or directory'
2: In download.file(url = "https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-18-19.xlsx", :
  download had nonzero exit status
This also happens with my own project file links (from Tidy Tuesday).
Nithin Pradeep
October 8, 2021
The Excel file I downloaded will not open in Excel. The following dialog box appears: "file format or file extension not valid. Verify the file has not been corrupted and that file extension matches the format".
Matt M
November 1, 2021
You recommended data cleaning in R script but reporting in RMarkdown. What if I want my report to be 100% reproducible by someone else? Should I then do all my data cleaning within RMarkdown?
Matt M
November 1, 2021
At about 9:38 you said “let me highlight all of these” because keyboard shortcuts are faster. Did you use CTRL A to select all or something else to select just that chunk/those lines?
Niger Sultana
April 28, 2022
Hi David, what key did you press at 12:38 that made a hashtag show up before the two lines of R code?
Niger Sultana
April 29, 2022
Hi, I was trying to import data from two Excel files (field_data_AA and predictor season13July), each with four sheets (named Autumn, Spring, Winter, and Summer). I got the following messages. I first ran the code with Spring and the data imported, then ran the code with all seasons, and it did not work. Can you please suggest how I can fix the cell reference in the Excel files?
> field_data_2018
> field_data_2019 <- read_excel(path = "Data-raw/predictor season13July.xlsx",
Error: Can't guess format of this cell reference: Spring
In addition: Warning message:
Cell reference follows neither the A1 nor R1C1 format. Example: Spring
NAs generated.
Cheers Niger
Esther Okoye
May 5, 2022
Hello, whenever I try to run my code it keeps giving me an error message:
source("~/.active-rstudio-document", echo=TRUE)
Error in source("~/.active-rstudio-document", echo = TRUE) :
  ~/.active-rstudio-document:6:107: unexpected symbol
5:
6: download.file(url="https://github.com/rfortherestofus/going-deeper/raw/master/data-raw/enrollment-mode = "wb
   ^
Julia Nee
October 31, 2022
What's the difference between the "destfile" argument and "dest" in read_excel, which seems to have also put my downloads into the folder I directed them to? Are they two ways of doing the same thing?
Kirstin O'Dell
November 2, 2022
I'm having trouble reading in the files. This is the error message:
> enrollment_18_19 <- read_excel(path = "data-raw/enrollment-18-19.xlsx",
Error in utils::unzip(zip_path, list = TRUE) : error -103 with zipfile in unzGetCurrentFileInfo
Amanda Krantz
November 2, 2022
I was able to download the files, but when I go to view them before import, I receive an error that "the file format or extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file."
Amanda Krantz
November 2, 2022
Disregard! I had noticed the wb code addition in the comments, but I realized I needed to add it to the download not the import.
Michelle Gichuru
November 22, 2022
Would we use the same process to download a Google Sheet file?
Hatem Kotb
January 12, 2023
Thanks for the note on Markdown vs Script, I thought it was only me 😁. Agree with your suggestion, makes sense 🙏🏼