Getting Started with R
Import Data
This lesson is called Import Data, part of the Getting Started with R course.
Your Turn
Create a new R script file and save it as import.R.
Add the line
library(tidyverse)
at the top of your R script file and run it to load the tidyverse package.
Use the read_csv() function (not read.csv()) to import the penguins_data.csv file.
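A minimal sketch of what the finished import.R might look like. It assumes penguins_data.csv sits directly in the project folder (later comments in this thread read it as "penguins_data.csv" with no subfolder) and uses penguins as the object name, which is our choice rather than the lesson's. So that the sketch also runs outside the course project, the setup lines write a two-row hypothetical stand-in file into a temporary folder and make that the working directory; in the real project you would keep only the import.R part.

```r
# --- Setup for this sketch only: a tiny hypothetical stand-in for penguins_data.csv ---
demo_dir <- file.path(tempdir(), "getting-started-demo")
dir.create(demo_dir, showWarnings = FALSE)
old_wd <- setwd(demo_dir)  # stands in for opening the course project
writeLines(
  c("species,island,bill_length_mm",
    "Adelie,Torgersen,39.1",
    "Gentoo,Biscoe,46.1"),
  "penguins_data.csv"
)

# --- import.R, following the steps in this lesson ---
library(tidyverse)                         # attaches readr, which provides read_csv()
penguins <- read_csv("penguins_data.csv")  # assign the result so it appears in the Environment
penguins                                   # print the tibble to check the import worked

setwd(old_wd)  # undo the sketch's working-directory change
```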
S. Revi Sterling
March 17, 2021
> faketucky <- read.csv(data/faketucky.csv)
Error in read.table(file = file, header = header, sep = sep, quote = quote, : object 'faketucky.csv' not found
S. Revi Sterling
March 17, 2021
I reloaded the packages... so confused!
David Keyes
March 17, 2021
Two things here:
faketucky <- read.csv("data/faketucky.csv")
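The error in the original comment comes from the missing quotes. A small base-R illustration (ours, not from the lesson) of how R reads an unquoted path as code rather than as text:

```r
# Unquoted, R treats data/faketucky.csv as code: the object `data`
# divided by an object called `faketucky.csv`, not as a file path.
unquoted <- quote(data/faketucky.csv)
class(unquoted)  # "call"

# Evaluating that code fails because no object named faketucky.csv exists,
# the same kind of error as in the original comment
msg <- tryCatch(eval(unquoted), error = function(e) conditionMessage(e))
msg

# Quoted, the path is just a character string that read_csv() or read.csv() can open
quoted <- "data/faketucky.csv"
is.character(quoted)  # TRUE
```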
Atlang Mompe
March 29, 2021
Hi David,
In your example, you have double quotes around your syntax, but it won't work on my computer (using Windows) unless I use single quotes. Is that normal? This is the code that works for me: faketucky <- read_csv('data/faketucky.csv')
David Keyes
March 29, 2021
My guess is that when you use double quotes they are being converted into "smart quotes". Do double quotes work in other places for you? If you try to install the tidyverse, for example, does this work?
install.packages("tidyverse")
Faythe Aiken
March 30, 2021
Hi David - I'm unable to load the read_csv function from the tidyverse. When I try to install the tidyverse package, I get the following failure to download either the binary or the source files. What's puzzling is that I can download them directly in my browser, but not in RStudio.
> install.packages("vctrs", type="binary")
Installing package into ‘\pdcnt19/AikenF$/My Documents/R-local’
(as ‘lib’ is unspecified)
There is a binary version available (and will be installed) but the source version is later:
      binary source
vctrs  0.3.6  0.3.7
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/vctrs_0.3.6.zip'
Warning in install.packages :
InternetOpenUrl failed: 'The operation timed out'
Error in download.file(url, destfile, method, mode = "wb", ...) :
cannot open URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/vctrs_0.3.6.zip'
Warning in install.packages :
download of package ‘vctrs’ failed
> install.packages("vctrs", type="source")
Installing package into ‘\pdcnt19/AikenF$/My Documents/R-local’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/src/contrib/vctrs_0.3.7.tar.gz'
Warning in install.packages :
InternetOpenUrl failed: 'The operation timed out'
Error in download.file(url, destfile, method, mode = "wb", ...) :
cannot open URL 'https://cran.rstudio.com/src/contrib/vctrs_0.3.7.tar.gz'
Warning in install.packages :
download of package ‘vctrs’ failed
David Keyes
March 30, 2021
This is almost certainly an issue with where it's trying to install packages. I made a video to help you fix it. Let me know if that works!
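David's video isn't reproduced here, so the following is not his fix; it's a minimal base-R sketch of how to see where R tries to install packages. When lib is unspecified, install.packages() uses the first folder returned by .libPaths(), which is what the "(as ‘lib’ is unspecified)" line in Faythe's log refers to; in her case that folder was on a network share. The folder in the commented-out line is hypothetical.

```r
# Library folders R searches, in order; install.packages() uses the first
# one by default when `lib` is not given
lib_paths <- .libPaths()
lib_paths

# file.access(mode = 2) returns 0 for each folder you can write to
writable <- file.access(lib_paths, mode = 2) == 0
writable

# To put a local folder first (hypothetical path; the folder must already exist):
# .libPaths(c("C:/R/library", .libPaths()))
```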
Faythe Aiken
March 31, 2021
Thanks, David. Unfortunately, this didn't fix it. I think I'll just give up on my work PC and switch to Mac for this course. I'm stumped and it may be from some network complications.
Lisa Janz
March 31, 2021
I can't figure out why, but I keep getting the following error: Error: object 'faketucky' not found
Here is what I have done:
> library(tidyverse)
-- Attaching packages ---------------- tidyverse 1.3.0 --
v ggplot2 3.3.3     v purrr   0.3.4
v tibble  3.1.0     v dplyr   1.0.5
v tidyr   1.1.3     v stringr 1.4.0
v readr   1.4.0     v forcats 0.5.1
-- Conflicts ------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
> library(skimr)
> faketucky->read_csv("faketucky.csv")
Error: object 'faketucky' not found
> setwd("C:/Users/ArcticFox/Desktop/getting-started-master/data")
> faketucky->read_csv("faketucky.csv")
Error: object 'faketucky' not found
Lisa Janz
March 31, 2021
And it doesn't work if I put the arrow going in the right direction either. I have used R pretty regularly and tried several things with this, but for some reason, I really can't get it to open the file.
David Keyes
March 31, 2021
Did you open the Getting Started project? If you do that, you shouldn't need to use the setwd() line.
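Separately from the project question, Lisa's first attempt wrote the arrow as faketucky->read_csv("faketucky.csv"). A base-R sketch of why that gives object 'faketucky' not found: R reads x -> y as y <- x, so it tries to look up faketucky before anything has been assigned to it.

```r
# R rewrites `x -> y` as `y <- x` when it parses your code
deparse(quote(faketucky -> read_csv("faketucky.csv")))
# [1] "read_csv(\"faketucky.csv\") <- faketucky"

# So R evaluates `faketucky` first, which doesn't exist yet.
# The usual form puts the new name on the left:
#   faketucky <- read_csv("faketucky.csv")

# Right-assignment is legal, but only in "value -> name" order:
value <- 42
value -> copy
identical(copy, value)  # TRUE
```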
Lisa Janz
March 31, 2021
That was just something I was trying, to see if it would work. I have tried various permutations and nothing has worked. Sometimes R can be glitchy, so I tried it again this morning after the computer had been shut off and restarted. Now I am getting this:
> faketucky<-read_csv("faketucky.csv")
Error in read_csv("faketucky.csv") : could not find function "read_csv"
David Keyes
March 31, 2021
Have you loaded the tidyverse before running that line? You need to run:
library(tidyverse)
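A minimal sketch of what that line changes, using readr directly (library(tidyverse) attaches readr along with the other core packages, so the effect on read_csv() is the same). The FALSE results assume a fresh R session in which nothing has been loaded yet.

```r
# In a fresh R session, readr isn't attached, so R can't find read_csv()
"package:readr" %in% search()  # FALSE in a fresh session
exists("read_csv")             # FALSE in a fresh session

# Attaching readr (as library(tidyverse) does) puts read_csv() on the search path
library(readr)
"package:readr" %in% search()  # TRUE
exists("read_csv")             # TRUE

# Without attaching, readr::read_csv() also works, as long as readr is installed
```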
If you don't do that in this session of R, you won't have access to the read_csv() function.
Lisa Janz
March 31, 2021
I did a couple of times, but I can try running it again. I moved on to the Fundamentals course and am finding that the console is not responding to (read.docx) either. Is this also something that the tidyverse should take care of? I don't typically work with those programs, so maybe this is what is throwing me off?
Lisa Janz
March 31, 2021
> library(tidyverse)
-- Attaching packages ---------------- tidyverse 1.3.0 --
v ggplot2 3.3.3     v purrr   0.3.4
v tibble  3.1.0     v dplyr   1.0.5
v tidyr   1.1.3     v stringr 1.4.0
v readr   1.4.0     v forcats 0.5.1
-- Conflicts ------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
> faketucky<-read_csv("faketucky.csv")
Error: 'faketucky.csv' does not exist in current working directory ('C:/Users/ArcticFox/Desktop/getting-started-master').
Lisa Janz
March 31, 2021
After some investigation, I realized that what I needed to do was move the "getting-started" material out of my desktop (where the system automatically dumped it). R seems to have been locked into my "Documents" folder and was not able to read higher in my computer indexes. I then had to set my working directory to the data folder containing faketucky.csv. That seems to have fixed the problem.
David Keyes
March 31, 2021
Ok, glad you got it figured out! FYI, you'll have more success if you open the project, which sets your working directory to the root of the folder where it's located. See this article from Jenny Bryan for an explanation.
Lisa Janz
April 1, 2021
Thanks for this. That was how I started doing it, by opening the project, but my system couldn't access it when it was on my desktop. I think there is something going on with how RStudio is accessing my directories. When I wrote "read.docx" earlier, it was in relation to the Fundamentals program. It was not just this set of exercises... R was not reading any of the projects properly. I now realize that this was a directory problem.
Josh Rodriguez
May 14, 2021
Hey David, it appears I am getting the common issue noted in the comments here: "faketucky does not exist in the current working directory." I looked at your response as a way to resolve the matter, but it doesn't appear that faketucky is in my Rproj by default. This is where my R session is attempting to pull the data from by default.
Josh Rodriguez
May 14, 2021
Never mind! I had a second R console open from downloading the course project, and that isn't where I needed to be to import the faketucky dataset.
David Keyes
May 14, 2021
Glad you got it figured out! Yes, each RStudio project sets the working directory, which is why it only worked on the one where you downloaded the course project.
Scott Clark
July 18, 2021
Hi David. Tidyverse was installed and loaded. I could see and use read.csv, but not read_csv. I noticed readr wasn't listed in the packages:
> library(tidyverse)
-- Attaching packages ---------------------------------------------------- tidyverse 1.3.1 --
v ggplot2 3.3.5     v dplyr   1.0.7
v tibble  3.1.2     v stringr 1.4.0
v tidyr   1.1.3     v forcats 0.5.1
v purrr   0.3.4
I was able to install and load readr separately to get around this, but is there a reason why it might not have installed with the rest of tidyverse? Could I be missing any other packages that I might need later?
David Keyes
July 19, 2021
That's super odd! I've never seen this before, and I don't know exactly why it's happening. If it were me, I'd just try reinstalling the tidyverse using
install.packages("tidyverse")
Let me know if that helps!
Christine Mahoney
August 21, 2021
I'm having difficulties. I keep receiving Error: 'faketucky.csv' does not exist in current working directory ('/Users/christinemahoney/Desktop/getting-started-master').
Christine Mahoney
August 21, 2021
I also tried ("data/faketucky.csv").
David Keyes
August 23, 2021
Are you sure you're working in a project? For example, here's a screenshot that shows I'm working in a project. Do you have something in the upper right corner of RStudio?
Prince Baawuah
October 11, 2021
I mostly work with very, very large datasets. Are there any packages and/or tips on how to efficiently import and work with very large datasets quickly (e.g., with parallel processing) on the desktop?
Charlie Hadley
October 12, 2021
Hi Prince! Thanks for asking this. As always when working with large data, the answer depends on two things:
The readr::read_csv() function was NOT built for speed. The vroom::vroom() function is designed for speed and is significantly faster than {data.table}'s own import functions; see the table at the top of https://vroom.r-lib.org/. See here for more thorough benchmarking.
This assumes that you can fit the entire dataset in memory, which I choose to translate into datasets smaller than 10 GB. If your data is larger than this, it likely makes sense to convert your data into an SQLite database and operate on that. This blog post takes you through the entire process, including how to create a database in R and how to use dplyr to operate on the database.
With files of this size, there are going to be computational bottlenecks that parallel processing can help with. The tidyverse supports a functional programming style through the {purrr} package, and purrr's functions can be parallelised with the excellent {furrr} package. I'm assuming some familiarity with programming terminology, as you mentioned parallel processing.
For truly large datasets (100 GB+), I don't have real-world experience. If your data is that large, it would help to know more about the structure of your datasets and the types of analysis you're planning to do.
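A minimal sketch of the vroom suggestion, assuming the {vroom} package is installed. The 100,000-row file is hypothetical, generated only so the code runs; col_select is vroom's argument for reading a subset of columns, which can matter as much as raw speed once files get large. As far as we know, readr 2.0 moved read_csv() onto vroom's parsing engine, so newer readr versions may be closer to vroom than older benchmarks suggest.

```r
library(vroom)

# Hypothetical data: 100,000 rows written to a temporary CSV
csv_path <- tempfile(fileext = ".csv")
big <- data.frame(
  id    = seq_len(1e5),
  group = sample(letters, 1e5, replace = TRUE),
  value = runif(1e5)
)
write.csv(big, csv_path, row.names = FALSE)

# Read the whole file
all_cols <- vroom(csv_path, delim = ",")

# Read only the columns you need, which reduces memory use on wide files
some_cols <- vroom(csv_path, delim = ",", col_select = c(id, value))
dim(some_cols)
```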
Lukas Harringer
March 10, 2022
Hi, when I run the read_csv function, the data appears in my Console, not in the Environment section.
Charlie Hadley
March 11, 2022
Hello Lukas,
I suspect that you've missed off the assignment, which is a very common mistake for both new and experienced R users. Here's a short video demonstrating what I think is happening for you. Let me know if the issue is something else. Cheers, Charlotte
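The video isn't reproduced here, but the difference Charlie describes can be sketched as follows. The two-row file is a hypothetical stand-in so the code runs anywhere, and scores is an arbitrary object name.

```r
library(readr)

# Hypothetical stand-in file so the sketch runs anywhere
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,10", "2,20"), csv_path)

# Without assignment: the tibble is printed to the Console and then discarded
read_csv(csv_path)

# With assignment: the tibble is stored in `scores`, which shows up in the Environment pane
scores <- read_csv(csv_path)
scores
```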
Michael Steinhoff
March 17, 2022
I get could not find function "read_csv". Looking back at the error from loading the tidyverse, I have this: ** Error: package or namespace load failed for ‘tidyverse’ in loadNamespace(j = 0.7.6 is required ** It seems like something is not up to date, but I'm not sure what.
Michael Steinhoff
March 17, 2022
Self-solved! Well, with a little help from Stack Overflow...
Charlie Hadley
March 17, 2022
Awesome to hear! Good googling.
Jessica Brewer
October 5, 2022
What is meant by "the working directory"? The main folder in the Files environment?
Charlie Hadley
October 5, 2022
The working directory is the folder on your machine that R currently looks inside when it resolves file paths. It's useful because it allows us to use a relative file path, e.g. data/my-data.csv, instead of an absolute file path like C:/Users/charliejhadley/documents/r/2022/analysis/data/my-data.csv.
These relative file paths are convenient for us when writing code and for making our code more transportable and reproducible. RStudio projects work by setting the working directory to the folder with the .Rproj file inside of it.
Cheers, Charlie
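A base-R sketch of the relative-versus-absolute distinction Charlie describes. The my-project folder and its file are hypothetical and created in a temporary location; setwd() stands in for what opening an RStudio project does for you, and when you work in projects you shouldn't normally need to call it yourself.

```r
# Where R is currently looking for relative paths
getwd()

# A hypothetical project folder with a data/ subfolder, created in a temp location
project_dir <- file.path(tempdir(), "my-project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("x", "1"), file.path(project_dir, "data", "my-data.csv"))

# Opening an RStudio project does roughly this for you
old_wd <- setwd(project_dir)
found_relative <- file.exists("data/my-data.csv")  # relative path, resolved against getwd()
found_relative  # TRUE

# An absolute path works whatever the working directory is
absolute_path <- normalizePath(file.path(project_dir, "data", "my-data.csv"))
file.exists(absolute_path)  # TRUE

setwd(old_wd)  # back to the original working directory
```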
Amy Williams
October 10, 2022
Hi, I'm trying to import the data, but I get a message saying the file is not in my current working directory:
library(tidyverse)
> library(skimr)
> #open up data file use code below
> faketucky <-read_csv("data/faketucky.csv")
Error: 'data/faketucky.csv' does not exist in current working directory
I'm not sure how to change this.
Thank you
Charlie Hadley
October 10, 2022
Hello Amy,
This is a common error, so don't worry. It's likely because you're in the wrong project. Can you check that you're in the Getting Started project? The top-right corner of RStudio tells you which project you're in.
Thanks - Charlie
Amy Williams
October 10, 2022
Hi Charlie, in the top-right corner it says I'm not currently in any project. I'm not sure how I managed this; I was following along and opened up the Getting Started package etc., but I'm not in any project. Sorry, I'm not sure what I've done, but thanks for the help!
Amy
Charlie Hadley
October 11, 2022
Hi Amy,
Thanks for replying. In the earlier video in the course David talks about RStudio projects as being useful for setting working directories. Please download the project for this exercise and ensure it's open by looking for "getting-started-with-r" in the top-right hand corner of RStudio.
Thanks, Charlie
Hani Alnakhli
January 18, 2023
Hi David, I got this message: "Enter an item from the menu, or 0 to exit". I'm not quite sure what my mistake was.
David Keyes
January 18, 2023
Were you trying to install a package? This sounds like a message that happens when you're trying to install a package.
Hani Alnakhli
January 19, 2023
Yes, I was using the install code and the library code, but neither of them worked.
David Keyes
January 19, 2023
Ok, I'm a bit unclear at this point what the problem you're having is. Are you able to complete the lesson? If not, can you please clarify what exactly is holding you back?
Mike Horton
March 9, 2023
Hi, I'm not sure what is going wrong here, but I am getting this error message in response to my syntax. Please note that I am typing a < and then a - in the syntax, but it converts this into an arrow when I type them in this question box:
faketucky <- read_csv(“data/faketucky.csv”)
Error: unexpected input in "faketucky <- read_csv(“"
Any ideas? Thanks!
Mike Horton
March 9, 2023
OK, ignore the bit about the arrows. The published text here is different from how it appears in the question box at the top of the page. Repeating the original error message:
> faketucky <- read_csv(“data/faketucky.csv”) Error: unexpected input in "faketucky <- read_csv(“"
Mike Horton
March 9, 2023
It's OK, I got it to work, but I am putting the solution here as it might be useful to someone else. Here is the syntax that didn't work: faketucky <- read_csv(“data/faketucky.csv”)
And here is the syntax that did work: faketucky <- read_csv("data/faketucky.csv")
Notice any difference? No, me neither! At least not initially. The top line (which didn't work) I copied and pasted from Word (as I was making notes in Word). The bottom line (which did work) I just typed directly into RStudio. It seems that there is a difference in the speech marks that are used when text is pasted in from Word compared to direct typing: “these” Vs "these"
David Keyes
March 9, 2023
You're right that it makes a big difference whether you use regular quotes versus smart quotes. And, unfortunately, the commenting system here tries to convert the former to the latter (we're making changes to the website soon that will eliminate this problem). Great job figuring out the issue!
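A base-R sketch of the difference Mike spotted. The \u201c and \u201d escapes are the curly quote characters that Word inserts, and the pasted line is a reconstruction of his example. Checking for and replacing those characters is one way to rescue code copied from a word processor, though retyping the quotes, as Mike did, works just as well.

```r
# Curly ("smart") quotes and straight quotes are different characters
utf8ToInt("\u201c")  # 8220: left double quotation mark, as inserted by Word
utf8ToInt('"')       # 34: the straight double quote R expects around strings

# A line pasted from Word, with curly quotes around the path
pasted <- "faketucky <- read_csv(\u201cdata/faketucky.csv\u201d)"

# Spot curly quotes before running the line
grepl("\u201c", pasted, fixed = TRUE) || grepl("\u201d", pasted, fixed = TRUE)  # TRUE

# Swap them for straight quotes
cleaned <- gsub("\u201c", "\"", pasted, fixed = TRUE)
cleaned <- gsub("\u201d", "\"", cleaned, fixed = TRUE)
cat(cleaned, "\n")
```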
Mercy Abarike
March 26, 2023
I get this error any time I try to import the faketucky data:
faketucky <-read_csv("data/faketucky.csv")
Error: 'data/faketucky.csv' does not exist in current working directory ('C:/Users/Mrs.Mercy/OneDrive/Desktop/Nat 1').
ashwath gadapa
April 26, 2023
Hi David ,
I'm unable to load the read_csv function. I've included the log below for your reference.
> install.packages("skimr") WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:
https://cran.rstudio.com/bin/windows/Rtools/ Installing package into ‘C:/Users/Admin/AppData/Local/R/win-library/4.3’ (as ‘lib’ is unspecified) trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.3/skimr_2.1.5.zip' Content type 'application/zip' length 1236705 bytes (1.2 MB) downloaded 1.2 MB
package ‘skimr’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in C:\Users\Admin\AppData\Local\Temp\Rtmpm8G3tz\downloaded_packages > library(skimr) > library(skimr) > faketucky faketucky <- read_csv("data/faektucky.csv") Error in read_csv("data/faektucky.csv") : could not find function "read_csv"
David Keyes
April 26, 2023
You need to add the line
library(tidyverse)
and run it in order to use the read_csv() function. Try that and let me know if it fixes things!
Tuhin CHATURVEDI
April 29, 2023
For Posit Cloud users: Posit Cloud lets us navigate to the file "faketucky.csv". When we left-click on the file, it gives us an option to "Import Dataset". When we choose "Import Dataset", it loads the {readr} package [via library(readr)] and then automatically imports faketucky.csv using self-generated code [faketucky <- read_csv("~/getting-started-master/data/faketucky.csv")]. Very neat!
David Keyes
April 30, 2023
Yes, that works on RStudio Desktop as well. Moving forward, though, I would work on installing packages yourself using
install.packages()
because you won't always get this kind of help from RStudio Desktop or Posit Cloud.
Gabriela Elizondo
June 30, 2023
Hi David, I cannot get it to work. I have tried restarting R and loading the packages, and it keeps giving me warnings and errors.
Restarting R session...
> install.packages("tidyverse") Installing package into ‘/cloud/lib/x86_64-pc-linux-gnu-library/4.3’ (as ‘lib’ is unspecified) trying URL 'http://rspm/default/linux/focal/latest/src/contrib/tidyverse_2.0.0.tar.gz' Content type 'application/x-gzip' length 425237 bytes (415 KB)
downloaded 415 KB
The downloaded source packages are in ‘/tmp/RtmpdYRVTq/downloaded_packages’ > faketucky load("/cloud/home/r2101164/getting-started-master/data/faketucky.csv") Error in load("/cloud/home/r2101164/getting-started-master/data/faketucky.csv") : bad restore file magic number (file may be corrupted) -- no data loaded In addition: Warning message: file ‘faketucky.csv’ has magic number 'stude' Use of save versions prior to 2 is deprecated > install.packages(“tidyverse”)
Gabriela Elizondo
June 30, 2023
Tuhin CHATURVEDI's comment worked! Thank you!
David Keyes
June 30, 2023
Glad it worked!
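For anyone who hits Gabriela's earlier error: load() reads .RData/.rda files created by save(), not CSVs, which is likely why it reported a "bad restore file magic number" on faketucky.csv. A minimal sketch of the two routes, using a small hypothetical data frame written to a temporary folder:

```r
library(readr)
tmp <- tempdir()

# Hypothetical data frame
scores <- data.frame(student = c("a", "b"), score = c(90, 85))

# A .csv file is plain text, so read it with read_csv()
csv_path <- file.path(tmp, "scores.csv")
write_csv(scores, csv_path)
from_csv <- read_csv(csv_path)

# load() only understands files written by save() (.RData / .rda)
rdata_path <- file.path(tmp, "scores.RData")
save(scores, file = rdata_path)
rm(scores)
restored <- load(rdata_path)  # restores `scores`; returns the names of restored objects
restored                      # "scores"
```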
Maia Volk
August 25, 2023
Hi David,
I'm having an issue loading the tidyverse package. I keep getting this message:
Error: package or namespace load failed for ‘tidyverse’: .onAttach failed in attachNamespace() for 'tidyverse', details: call: NULL error: package or namespace load failed for ‘ggplot2’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]): there is no package called ‘fansi’
Can you help me? Thank you!
David Keyes
August 25, 2023
What happened is that, when you tried to install the tidyverse package, one of its dependencies (packages that the tidyverse needs to run) did not install correctly. I'd try manually installing that package using this code:
install.packages("fansi")
Try that and let me know if it helps.
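A minimal sketch for checking which packages are missing before retrying library(tidyverse). The package list is illustrative: fansi comes from Maia's error message, and the others are only examples. The install line is commented out so that running the sketch doesn't download anything.

```r
# Packages to check; fansi is the one named in the error above
pkgs <- c("fansi", "ggplot2", "readr")

# requireNamespace() returns FALSE, quietly, when a package can't be loaded
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!available]
missing_pkgs

# Install whatever is missing, then run library(tidyverse) again:
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```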
Valerie Kaster
September 10, 2023
I am getting an error. I went back and started over to make sure I did all the steps, and I get the same response.
Help, please!
David Keyes
September 11, 2023
When did you start the course? I made some changes to it recently that may be confusing you because you may have watched old lessons previously.
Archana Joshi
September 13, 2023
The current working directory that RStudio shows is C:\users\username\Documents.
When I created a new R script file (import.R) and followed the steps above to read the penguins file, it gave me an error: 'penguins_data.csv' does not exist in current working directory ('C:/Users/Rajeev Joshi/Documents'). I saved import.R in the getting-started-main folder.
How do I change the current working directory to getting-started-main?
Please help
Libby Heeren
September 15, 2023
Hi, Archana! As long as you're inside an R Project, your working directory will be the project folder, so make sure you're inside the getting-started-main project before typing the library() and read_csv() code into your import.R file (which is saved in the project folder).
I made a short video to demonstrate what it should look like.
Bhumika Bhattacharya
September 18, 2023
I have installed the tidyverse package, but when I run the code read.csv("penguins_data.csv"), it shows this in the console:
Bhumika Bhattacharya
September 18, 2023
It is working for the tibble only.
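Bhumika's console output isn't shown, so this is only a guess at the confusion: read.csv() (base R) and read_csv() (readr) return different kinds of objects. A minimal sketch with a hypothetical two-row stand-in for penguins_data.csv:

```r
library(readr)

# Hypothetical stand-in for penguins_data.csv
csv_path <- tempfile(fileext = ".csv")
writeLines(c("species,bill_length_mm", "Adelie,39.1", "Gentoo,46.1"), csv_path)

base_version  <- read.csv(csv_path)  # base R: returns a plain data.frame
readr_version <- read_csv(csv_path)  # readr (loaded by the tidyverse): returns a tibble

class(base_version)   # "data.frame"
class(readr_version)  # includes "tbl_df" and "data.frame"
```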