Get access to all lessons in this course.
-
Advanced Data Wrangling
- Downloading and Importing Data
- Overview of Tidy Data
- Tidy Data Rule #1: Every Column is a Variable
- Tidy Data Rule #3: Every Cell is a Single Value
- Tidy Data Rule #2: Every Row is an Observation
- Changing Variable Types
- Dealing with Missing Data
- Advanced Summarizing
- Binding Data Frames
- Functions
- Data Merging
- Exporting Data
- Bring It All Together (Advanced Data Wrangling)
-
Advanced Data Visualization
- Best Practices in Data Visualization
- Tidy Data
- Pipe Data into ggplot
- Reorder Plots to Highlight Findings
- Line Charts
- Use Color to Highlight Findings
- Declutter
- Add Descriptive Labels to Your Plots
- Use Titles to Highlight Findings
- Use Annotations to Explain
- Tweak Spacing
- Create a Custom Theme
- Customize Your Fonts
- Try New Plot Types
- Bring it All Together (Advanced Data Visualization)
-
Quarto
- Advanced Markdown
- Advanced YAML and Code Chunk Options
- Tables
- Inline R Code
- Making Your Reports Shine: Word Edition
- Making Your Reports Shine: PDF Edition
- Making Your Reports Shine: HTML Edition
- Presentations
- Dashboards
- Websites
- Publishing Your Work
- Quarto Extensions
- Parameterized Reporting, Part 1
- Parameterized Reporting, Part 2
- Parameterized Reporting, Part 3
- Wrapping up Going Deeper with R
Going Deeper with R
Downloading and Importing Data
This lesson is locked
This lesson is called Downloading and Importing Data, part of the Going Deeper with R course. This lesson is called Downloading and Importing Data, part of the Going Deeper with R course.
If the video is not playing correctly, you can watch it in a new window
Transcript
Click on the transcript to go to that point in the video. Please note that transcripts are auto generated and may contain minor inaccuracies.
View code shown in video
# Load Packages -----------------------------------------------------------
library(tidyverse)
library(fs)
library(readxl)
library(janitor)
# Create Directories ------------------------------------------------------
dir_create("data-raw")
# Download Data -----------------------------------------------------------
# https://www.oregon.gov/ode/educator-resources/assessment/Pages/Assessment-Group-Reports.aspx
download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/pagr_schools_math_tot_raceethnicity_2122.xlsx",
mode = "wb",
destfile = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx")
download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/TestResults2019/pagr_schools_math_tot_raceethnicity_1819.xlsx",
mode = "wb",
destfile = "data-raw/pagr_schools_math_tot_raceethnicity_1819.xlsx")
download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2018/pagr_schools_math_raceethnicity_1718.xlsx",
mode = "wb",
destfile = "data-raw/pagr_schools_math_raceethnicity_1718.xlsx")
download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2017/pagr_schools_math_raceethnicity_1617.xlsx",
mode = "wb",
destfile = "data-raw/pagr_schools_math_raceethnicity_1617.xlsx")
download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2016/pagr_schools_math_raceethnicity_1516.xlsx",
mode = "wb",
destfile = "data-raw/pagr_schools_math_raceethnicity_1516.xlsx")
# Import Data -------------------------------------------------------------
math_scores_2021_2022 <-
read_excel(path = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx") |>
clean_names()
Your Turn
You’ll be working with data on Oregon school enrollment by race/ethnicity.
Create a new project. Make sure you put it somewhere you’ll be able to find it again later!
Create a new R script file where you’ll do all of your data downloading, cleaning, and importing work.
Download the five most recent Fall Membership Report files using the
download.file()
function into adata-raw
folder (which you’ll need to create).Import the 2022-2023 spreadsheet into a data frame called
enrollment_2022_2023
, using theclean_names()
function from thejanitor
package to make our variable names easy to work with.
Learn More
You can read about all of the arguments for the download.file()
function here.
To learn about the fs
package, check out its documentation.
To learn more about importing Excel files, check out the readxl
package documentation. You’ll see, for example, ways to download only certain ranges of cells, which can be helpful when you have messy Excel data!
Other packages I mention in the video are:
I’ve also written an article about cleaning messy data in R. There are many packages to deal with messy data (which often comes in the form of Excel spreadsheets), and I go through several in the post.
And, finally, if you want a really deep dive on data cleaning, check out the course Data Cleaning with R.
You need to be signed-in to comment on this post. Login.
Brian Slattery
October 25, 2023
Is there a way that you can give read_excel an index for the sheet, rather than a name? Or is that not possible? (I tried looking on the help page and on Stack Overflow and couldn't find the answer)
David Keyes Founder
October 25, 2023
Yes, you can give it the number of the sheet like so:
That would read in the first sheet. I actually show this in the Functions lesson solutions video.
The only danger with doing this is that, if you change the order of sheets, your code will fail. Something to keep in mind!
Eric Juskewitz
January 17, 2024
Hello there!
It seems like the provided URL/link for the data is timed out ("server not found" for oregon.gov). Could I get the raw files in another way?
David Keyes Founder
January 18, 2024
Do you mind trying again? I checked right now and it is working for me. I'm wondering if it was maybe just down earlier today?
Eric Juskewitz
January 18, 2024
It worked now :)
Edith Namuleme
April 17, 2024
hi David, On following the instructions of this lesson and after downloading the data (fall membership report) and creating an object, it is only showing the first sheet of the excel document which has no data of interest. how can i create an object that gets all the sheets needed or that of my interest?
David Keyes Founder
April 17, 2024
Take a look at the solution code. :) You need to use the
sheet
argument to ensure you get the right data.Edith Namuleme
April 18, 2024
Got it, thanks
Gabby Bachhuber
April 27, 2024
Do you recommend downloading data one file at a time, or alternatively, write code to download all files and then run once to add them into the directory?
I downloaded all at once, but I had a few errors in my code, so a few didn't download until I fixed the code. Perhaps I could have avoided that by downloading one at a time?