Your Turn

  1. Create a new R script file and save it as import.R

  2. Add the line library(tidyverse) at the top of your R script file and run it to load the tidyverse package.

  3. Use the read_csv() function (not read.csv()) to import the penguins_data.csv file

Have any questions? Put them below and we will help you out!

S. Revi Sterling

March 17, 2021

> faketucky <- read.csv(data/faketucky.csv)
Error in read.table(file = file, header = header, sep = sep, quote = quote, : object 'faketucky.csv' not found

S. Revi Sterling

March 17, 2021

I reloaded the packages... so confused!

David Keyes

March 17, 2021

Two things here:

  1. the data/faketucky.csv needs to be in quotes. Your code should look like this:

faketucky <- read.csv("data/faketucky.csv")

  2. Make sure to use the read_csv() function, not read.csv() (note the _ between read and csv). read.csv() won't work with some things you'll do in a few lessons.
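
A minimal sketch of both points. It writes a tiny throwaway CSV to a temporary folder so it runs anywhere; in the course you would pass "data/faketucky.csv" instead. The file name example.csv and the object name example_data are made up for illustration, and the base R fallback is only there in case readr is not installed.

```r
# Write a small CSV to a temporary file so the example is self-contained.
csv_path <- file.path(tempdir(), "example.csv")
writeLines(c("id,score", "1,90", "2,85"), csv_path)

# The path is a character string, so it must be quoted (or stored in a
# variable, as here). Unquoted, R would look for objects with those names.
if (requireNamespace("readr", quietly = TRUE)) {
  # read_csv() comes from readr, which library(tidyverse) attaches.
  example_data <- readr::read_csv(csv_path, col_types = readr::cols())
} else {
  # Fallback for machines without readr: base R's read.csv().
  example_data <- read.csv(csv_path)
}

print(example_data)
```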

Atlang Mompe

March 29, 2021

Hi David,

In your example you have double quotes around your syntax, but it won't work on my computer (using Windows) unless I use single quotes. Is that normal? This is the code that works for me: faketucky <- read_csv('data/faketucky.csv')

David Keyes

March 29, 2021

My guess is that when you use double quotes they are being converted into "smart quotes". Do double quotes work in other places for you? If you try to install the tidyverse, for example, does this work?

install.packages("tidyverse")

Faythe Aiken

March 30, 2021

Hi David - I'm unable to load the read_csv function from tidyverse. When trying to install the tidyverse package, I get the following failure to download either the binary or source files. What's puzzling is I can download them directly in my browser but not in RStudio.

> install.packages("vctrs", type="binary")
Installing package into ‘\pdcnt19/AikenF$/My Documents/R-local’ (as ‘lib’ is unspecified)

There is a binary version available (and will be installed) but the source version is later:
      binary source
vctrs  0.3.6  0.3.7

trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/vctrs_0.3.6.zip'
Warning in install.packages : InternetOpenUrl failed: 'The operation timed out'
Error in download.file(url, destfile, method, mode = "wb", ...) : cannot open URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/vctrs_0.3.6.zip'
Warning in install.packages : download of package ‘vctrs’ failed
> install.packages("vctrs", type="source")
Installing package into ‘\pdcnt19/AikenF$/My Documents/R-local’ (as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/src/contrib/vctrs_0.3.7.tar.gz'
Warning in install.packages : InternetOpenUrl failed: 'The operation timed out'
Error in download.file(url, destfile, method, mode = "wb", ...) : cannot open URL 'https://cran.rstudio.com/src/contrib/vctrs_0.3.7.tar.gz'
Warning in install.packages : download of package ‘vctrs’ failed

David Keyes

March 30, 2021

This is almost certainly an issue with where it's trying to install packages. I made a video to help you fix it. Let me know if that works!

Faythe Aiken

March 31, 2021

Thanks, David. Unfortunately, this didn't fix it. I think I'll just give up on my work PC and switch to Mac for this course. I'm stumped and it may be from some network complications.

I can't figure out why, but R keeps throwing the following error: Error: object 'faketucky' not found. Here is what I have done:
> library(tidyverse)
-- Attaching packages ---------------- tidyverse 1.3.0 --
v ggplot2 3.3.3     v purrr   0.3.4
v tibble  3.1.0     v dplyr   1.0.5
v tidyr   1.1.3     v stringr 1.4.0
v readr   1.4.0     v forcats 0.5.1
-- Conflicts ------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
> library(skimr)
> faketucky->read_csv("faketucky.csv")
Error: object 'faketucky' not found
> setwd("C:/Users/ArcticFox/Desktop/getting-started-master/data")
> faketucky->read_csv("faketucky.csv")
Error: object 'faketucky' not found

And it doesn't work if I put the arrow going in the right direction either. I have used R pretty regularly and tried several things with this, but for some reason, I really can't get it to open the file.

David Keyes

March 31, 2021

Did you open the Getting Started project? If you do that, you shouldn't need to use the setwd() line.

That was just something I was trying to see if it was working. I have tried various permutations and nothing has worked. Sometimes R can be glitchy, so I tried it again this morning after the computer had been shut off and restarted. Now I am getting this:
> faketucky<-read_csv("faketucky.csv")
Error in read_csv("faketucky.csv") : could not find function "read_csv"

David Keyes

March 31, 2021

Have you loaded the tidyverse before running that line? You need to run:

library(tidyverse)

If you don't do that in this session of R, you won't have access to the read_csv() function.
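
One rough way to check this yourself, using only base R. The helper name is_attached() is made up for this sketch; search() and exists() are standard base R functions.

```r
# Hypothetical helper: TRUE if a package is attached in this R session.
is_attached <- function(pkg) {
  paste0("package:", pkg) %in% search()
}

is_attached("base")   # always attached
is_attached("readr")  # TRUE only after library(tidyverse) or library(readr)

# exists() asks the narrower question: can R currently see a function
# called read_csv? If this is FALSE, run library(tidyverse) first.
exists("read_csv", mode = "function")
```

If the last line returns FALSE right after library(tidyverse) appeared to run, scroll up in the console: the library() call most likely printed an error rather than the usual "Attaching packages" message.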

I did a couple times, but I can try running it again. I moved on to the Fundamentals course and am finding that the console is not responding to (read.docx) either. Is this also something that tidyverse should take care of? I don't typically work using those programs, so maybe this is what is throwing me off?

> library(tidyverse)
-- Attaching packages ---------------- tidyverse 1.3.0 --
v ggplot2 3.3.3     v purrr   0.3.4
v tibble  3.1.0     v dplyr   1.0.5
v tidyr   1.1.3     v stringr 1.4.0
v readr   1.4.0     v forcats 0.5.1
-- Conflicts ------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
> faketucky<-read_csv("faketucky.csv")
Error: 'faketucky.csv' does not exist in current working directory ('C:/Users/ArcticFox/Desktop/getting-started-master').

After some investigation, I realized that what I needed to do was move the "getting-started" material out of my desktop (where the system automatically dumped it). R seems to have been locked into my "Documents" folder and was not able to read higher in my computer indexes. I then had to set my working directory to the data folder containing faketucky.csv. That seems to have fixed the problem.

David Keyes

March 31, 2021

Ok, glad you got it figured out! FYI, you'll have more success if you open the project, which sets your working directory to the root of the folder where it's located. See this article from Jenny Bryan for an explanation.
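
When you hit the "does not exist in current working directory" error, a quick diagnostic along these lines can help. This is a sketch in base R; the helper check_relative_path() and the demo-project folder are invented for the example, and the folder is created in a temporary location so nothing on your machine is touched.

```r
# Hypothetical helper: report where R is looking and whether the file is there.
check_relative_path <- function(path) {
  list(
    working_directory = getwd(),
    looked_for = file.path(getwd(), path),
    found = file.exists(path)
  )
}

# Build a stand-in "project" with a data/ subfolder in a temporary location.
project_dir <- file.path(tempdir(), "demo-project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("id,score", "1,90"), file.path(project_dir, "data", "example.csv"))

# Opening an RStudio project does the equivalent of this setwd() for you.
old_wd <- setwd(project_dir)
inside <- check_relative_path("data/example.csv")
setwd(old_wd)  # restore the previous working directory

print(inside)
```

If found is FALSE for your own file, compare looked_for with where the file really lives; the mismatch usually shows whether the project is not open or the file sits in a different folder.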

Lisa Janz

April 1, 2021

Thanks for this. That was how I started doing it - opening the project, but my system couldn't access it when it was on my desktop. I think there is something going on with how RStudio is accessing my directories. When I wrote "read.docx" earlier, it was in relation to the Fundamentals program. It was not just with this set of exercises... R was not reading any of the projects properly. I now realize that this was a directory problem.

Josh Rodriguez

May 14, 2021

Hey David, it appears I am getting the common issue noted in the comments here: "faketucky does not exist in the current working directory." I looked at your response as a way to resolve the matter, but it doesn't appear that faketucky is in my Rproj by default. This is where my R session is attempting to pull the data from by default.

Josh Rodriguez

May 14, 2021

Never mind! I have a second R console open from downloading the course project and that isn't where I needed to be for uploading the faketucky dataset

David Keyes

May 14, 2021

Glad you got it figured out! Yes, each RStudio project sets the working directory, which is why it only worked on the one where you downloaded the course project.

Scott Clark

July 18, 2021

Hi David. Tidyverse was installed and loaded. I could see and use read.csv, but not read_csv. I noticed readr wasn't listed in the packages:

> library(tidyverse)
-- Attaching packages ---------------------------------------------------- tidyverse 1.3.1 --
v ggplot2 3.3.5     v dplyr   1.0.7
v tibble  3.1.2     v stringr 1.4.0
v tidyr   1.1.3     v forcats 0.5.1
v purrr   0.3.4

I was able to install and load readr separately to get around this, but is there a reason why it might not have installed with the rest of tidyverse? Could I be missing any other packages that I might need later?

David Keyes

July 19, 2021

That's super odd! I've never seen this before. I don't know exactly why this is happening. If it were me, I'd just try reinstalling tidyverse using install.packages("tidyverse"). Let me know if that helps!

Christine Mahoney

August 21, 2021

Having difficulty here. I keep receiving this error:
Error: 'faketucky.csv' does not exist in current working directory ('/Users/christinemahoney/Desktop/getting-started-master').

Christine Mahoney

August 21, 2021

Also tried ("data/faketucky.csv")

David Keyes

August 23, 2021

Are you sure you're working in a project? For example, here's a screenshot that shows I'm working in a project. Do you have something in the upper right corner of RStudio?

Prince Baawuah

October 11, 2021

I mostly work with very, very large datasets. Are there any packages and/or tips on how to efficiently import and work with very large datasets quickly (e.g. with parallel processing) on the desktop?

Charlie Hadley

October 12, 2021

Hi Prince! Thanks for asking this. As always when working with large data, the answer depends on two things:

  • How large are your data? 1 GB? 10 GB? 100+ GB?
  • What type of analysis are you doing? Do you need to operate on the entire dataset, or can you iteratively apply a stat across a file that you're streaming in?

The readr::read_csv() function was NOT built for speed. The vroom::vroom() function is designed for speed and is significantly faster than the native {data.table} import functions. See the table at the top of this page: https://vroom.r-lib.org/. See here for more thorough benchmarking.

This is assuming that you can fit the entire dataset in memory, which I choose to translate into datasets smaller than 10 GB. If your data is larger than this, it likely makes sense to convert your data into an SQLite database and operate on that. This blog post takes you through this entire process, including how to create a database in R and how to use dplyr to operate on the database.

With files of this size there are going to be computation bottlenecks that parallel processing can help with. The tidyverse includes a functional programming paradigm through the {purrr} package, which can be parallelised through the excellent {furrr} package. I'm assuming some familiarity with programming terminology as you mentioned parallel processing.

For truly large datasets (100 GB+) I don't have real-world experience. If your data is that large, it is probably useful to know more about the structure of your datasets and the types of analysis you're planning on doing.
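
To make the "iteratively apply a stat across a file that you're streaming in" idea concrete, here is a rough base R sketch that computes a column mean without ever holding the whole file in memory. It is not how vroom or furrr work internally; it only illustrates the chunked pattern. The helper name chunked_column_mean() and the file big-example.csv are made up, and the sketch assumes a simple CSV with an unquoted header row and no commas inside fields.

```r
# Hypothetical helper: mean of one numeric column, read chunk_size lines at a time.
chunked_column_mean <- function(path, column, chunk_size = 1000) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # First line holds the column names (assumed unquoted, comma-separated).
  col_names <- strsplit(readLines(con, n = 1), ",", fixed = TRUE)[[1]]

  total <- 0
  n <- 0
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break  # reached the end of the file
    chunk <- read.csv(text = lines, header = FALSE, col.names = col_names)
    total <- total + sum(chunk[[column]])
    n <- n + nrow(chunk)
  }
  total / n
}

# Small stand-in file; a real use case would point at a much larger CSV.
csv_path <- file.path(tempdir(), "big-example.csv")
write.csv(
  data.frame(id = 1:10, score = seq(10, 100, by = 10)),
  csv_path,
  row.names = FALSE,
  quote = FALSE
)

score_mean <- chunked_column_mean(csv_path, "score", chunk_size = 3)
print(score_mean)
```

Only running totals are kept between chunks, so memory use depends on chunk_size rather than on file size. This works for statistics that can be accumulated (sums, counts, means); something like a median needs a different approach.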

Lukas Harringer

March 10, 2022

Hi, when I run the read_csv function, the data appears in my Console not in the Environment section.

Charlie Hadley

March 11, 2022

Hello Lukas,

I suspect that you've missed off the assignment, which is a very common mistake for both new and experienced R users. Here's a short video demonstrating what I think is happening for you. Let me know if the issue is something else. Cheers, Charlotte
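
The difference looks roughly like this. The sketch uses base R's read.csv() on a throwaway file so it runs without any packages; the same applies to read_csv(). The names example.csv and example_data are made up.

```r
# Write a small CSV to a temporary file so the example is self-contained.
csv_path <- file.path(tempdir(), "example.csv")
writeLines(c("id,score", "1,90", "2,85"), csv_path)

# Without assignment: the data is printed to the Console and then discarded.
read.csv(csv_path)

# With assignment: nothing is printed, but example_data now shows up in the
# Environment pane and can be used in later lines.
example_data <- read.csv(csv_path)

exists("example_data")
```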

Michael Steinhoff

March 17, 2022

I get: could not find function "read_csv". I looked back at the error from loading tidyverse and have this: ** Error: package or namespace load failed for ‘tidyverse’ in loadNamespace(j = 0.7.6 is required ** It seems like something is not up to date, but I'm not sure what.

Michael Steinhoff

March 17, 2022

self-solved! well, with a little help from stack overflow...

Charlie Hadley

March 17, 2022

Awesome to hear! Good googling.

Jessica Brewer

October 5, 2022

What is meant by "the working directory"? The main folder in the Files environment?

Charlie Hadley

October 5, 2022

The working directory is the folder on your machine that R currently looks in when it resolves file paths. It's beneficial as it allows us to use a relative file path, e.g. data/my-data.csv, instead of an absolute file path like C:/Users/charliejhadley/documents/r/2022/analysis/data/my-data.csv.

These relative file paths are convenient for us when writing code and for making our code more transportable and reproducible. RStudio projects work by setting the working directory to the folder with the .Rproj file inside of it.

Cheers, Charlie
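
As a small illustration of the above: the same relative path points at different files depending on the working directory. The helper resolve_path() and the two folder names are invented for this sketch; getwd() and file.path() are base R.

```r
# Hypothetical helper: show the absolute location a relative path refers to
# for a given working directory (defaults to the current one).
resolve_path <- function(path, wd = getwd()) {
  file.path(wd, path)
}

getwd()                            # where R is looking right now
resolve_path("data/my-data.csv")   # what the relative path means today

# The same relative path under two made-up project folders:
in_project <- resolve_path("data/my-data.csv", wd = "/home/me/getting-started")
elsewhere  <- resolve_path("data/my-data.csv", wd = "/home/me/Documents")

print(in_project)
print(elsewhere)
```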

Amy Williams

October 10, 2022

Hi, I'm trying to import the data but I have a message saying the file is not in my current working directory:
> library(tidyverse)
> library(skimr)
> #open up data file use code below
> faketucky <-read_csv("data/faketucky.csv")
Error: 'data/faketucky.csv' does not exist in current working directory

not sure how to change this?

Thank you

Charlie Hadley

October 10, 2022

Hello Amy,

This is a common error - don't worry. It's likely due to you being in the wrong project. Can I check that you're in the Getting Started project? The top right-hand corner of RStudio tells you which project you're in.

Thanks - Charlie

Amy Williams

October 10, 2022

Hi Charlie, in the right-hand corner it says I'm not currently in any project. I'm not sure how I've managed this. I was following along and opened up the getting started package etc., but I'm not in any project? Sorry, I'm not sure what I've done, but thanks for the help!

Amy

Charlie Hadley

October 11, 2022

Hi Amy,

Thanks for replying. In the earlier video in the course David talks about RStudio projects as being useful for setting working directories. Please download the project for this exercise and ensure it's open by looking for "getting-started-with-r" in the top-right hand corner of RStudio.

Thanks, Charlie

Hani Alnakhli

January 18, 2023

Hi David, I got this text: "Enter an item from the menu, or 0 to exit". I'm not sure what my mistake was.

David Keyes

January 18, 2023

Were you trying to install a package? This sounds like a message that happens when you're trying to install a package.

Hani Alnakhli

January 19, 2023

Yes, I was using the install code and the library code, but neither of them worked.

David Keyes

January 19, 2023

Ok, I'm a bit unclear at this point what the problem you're having is. Are you able to complete the lesson? If not, can you please clarify what exactly is holding you back?

Mike Horton

March 9, 2023

Hi, I'm not sure what is going wrong here, but I am getting this error message in response to my syntax. Please note that I am putting a < and then a - in the syntax, but it converts this into an arrow when I type them within this question box.
faketucky <- read_csv(“data/faketucky.csv”)
Error: unexpected input in "faketucky <- read_csv(“"

Any ideas? Thanks!

Mike Horton

March 9, 2023

OK, ignore the bit about the arrows. The published text here is different from how it appears in the question box at the top of the page. Repeating the original error message:

> faketucky <- read_csv(“data/faketucky.csv”)
Error: unexpected input in "faketucky <- read_csv(“"

Mike Horton

March 9, 2023

It's OK, I got it to work, but I am putting the solution here as it might be useful to someone else. Here is the syntax that didn't work: faketucky <- read_csv(“data/faketucky.csv”)

And here is the syntax that did work: faketucky <- read_csv("data/faketucky.csv")

Notice any difference? No, me neither! At least not initially. The top line (which didn't work) I copied and pasted from Word (as I was making notes in Word). The bottom line (which did work) I just typed directly into RStudio. It seems that there is a difference in the speech marks that are used when text is pasted in from Word compared to direct typing: “these” vs. "these"

David Keyes

March 9, 2023

You're right that it makes a big difference whether you use regular quotes versus smart quotes. And, unfortunately, the commenting system here tries to convert the former to the latter (we're making changes to the website soon that will eliminate this problem). Great job figuring out the issue!
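
If you do paste code from Word or a web page, something like the following can swap curly quotes back to straight ones before you run it. This is a rough base R sketch; the helper name fix_smart_quotes() is made up, and the \u escapes stand for the curly double quotes (U+201C, U+201D) and curly single quotes (U+2018, U+2019).

```r
# Hypothetical helper: replace curly ("smart") quotes with straight quotes.
fix_smart_quotes <- function(code) {
  for (curly in c("\u201C", "\u201D")) {
    code <- gsub(curly, "\"", code, fixed = TRUE)  # curly double quotes
  }
  for (curly in c("\u2018", "\u2019")) {
    code <- gsub(curly, "'", code, fixed = TRUE)   # curly single quotes
  }
  code
}

# The line as pasted from Word, with curly quotes around the path.
pasted <- "faketucky <- read_csv(\u201Cdata/faketucky.csv\u201D)"
fixed <- fix_smart_quotes(pasted)

cat(fixed, "\n")
```

In practice it is usually simpler to retype the quotes in RStudio, but the sketch shows that the two kinds of quote really are different characters as far as R is concerned.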

Mercy Abarike

March 26, 2023

I get this feedback anytime I try importing the faketucky data:
faketucky <-read_csv("data/faketucky.csv")
Error: 'data/faketucky.csv' does not exist in current working directory ('C:/Users/Mrs.Mercy/OneDrive/Desktop/Nat 1').

ashwath gadapa

April 26, 2023

Hi David,

I'm unable to load the read_csv function. I have the below log for your reference:

> install.packages("skimr")
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:

https://cran.rstudio.com/bin/windows/Rtools/
Installing package into ‘C:/Users/Admin/AppData/Local/R/win-library/4.3’ (as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.3/skimr_2.1.5.zip'
Content type 'application/zip' length 1236705 bytes (1.2 MB)
downloaded 1.2 MB

package ‘skimr’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in C:\Users\Admin\AppData\Local\Temp\Rtmpm8G3tz\downloaded_packages
> library(skimr)
> library(skimr)
> faketucky faketucky <- read_csv("data/faektucky.csv")
Error in read_csv("data/faektucky.csv") : could not find function "read_csv"

David Keyes

April 26, 2023

You need to add the line library(tidyverse) and run it in order to use the read_csv() function. Try that and let me know if it fixes things!

Tuhin CHATURVEDI

April 29, 2023

For Posit Cloud users: Posit Cloud allows us to go to the file "faketucky.csv". When we left-click on the file, it gives us an option to "Import Dataset". When we choose "Import Dataset", it loads the readr package (via library(readr)) and then automatically imports faketucky.csv using the self-generated code faketucky <- read_csv("~/getting-started-master/data/faketucky.csv"). Very neat!

David Keyes

April 30, 2023

Yes, that works on RStudio Desktop as well. Moving forward, though, I would work on installing packages yourself using install.packages() because you won't always get this kind of help from RStudio Desktop or Posit Cloud.

Gabriela Elizondo

June 30, 2023

Hi David, I cannot get it to work. I have tried restarting R and loading the packages, and it keeps giving me warnings and errors.

Restarting R session...

> install.packages("tidyverse")
Installing package into ‘/cloud/lib/x86_64-pc-linux-gnu-library/4.3’ (as ‘lib’ is unspecified)
trying URL 'http://rspm/default/linux/focal/latest/src/contrib/tidyverse_2.0.0.tar.gz'
Content type 'application/x-gzip' length 425237 bytes (415 KB)
downloaded 415 KB

* installing *binary* package ‘tidyverse’ ...
* DONE (tidyverse)

The downloaded source packages are in ‘/tmp/RtmpdYRVTq/downloaded_packages’
> faketucky load("/cloud/home/r2101164/getting-started-master/data/faketucky.csv")
Error in load("/cloud/home/r2101164/getting-started-master/data/faketucky.csv") : bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file ‘faketucky.csv’ has magic number 'stude'
  Use of save versions prior to 2 is deprecated
> install.packages(“tidyverse”)

Gabriela Elizondo

June 30, 2023

Tuhin CHATURVEDI's comment worked! Thank you!

David Keyes

June 30, 2023

Glad it worked!

Hi David,

I'm having an issue loading the tidyverse package. I keep getting this message:

Error: package or namespace load failed for ‘tidyverse’: .onAttach failed in attachNamespace() for 'tidyverse', details: call: NULL error: package or namespace load failed for ‘ggplot2’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]): there is no package called ‘fansi’

Can you help me? Thank you!

David Keyes

August 25, 2023

What happened is that, when you tried to install the tidyverse package, one of its dependency packages (packages that the tidyverse needs to run) did not install correctly. I'd manually try to install that package using this code:

install.packages("fansi")

Try that and let me know if it helps.
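
If you suspect more than one dependency is missing, a rough way to check several at once is sketched below. The helper name missing_packages() is made up, and the package list is only an example, not the full set of tidyverse dependencies; installed.packages() and setdiff() are base R.

```r
# Hypothetical helper: which of these packages are not installed?
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Example list only; extend it with whatever the error message names.
needed <- c("fansi", "ggplot2", "readr")
to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```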

Valerie Kaster

September 10, 2023

I am getting an error. I went back and started over to make sure I did all the steps and same response.

read.csv("penquins_data.csv")
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file 'penquins_data.csv': No such file or directory

help please

When did you start the course? I made some changes to it recently that may be confusing you because you may have watched old lessons previously.

Archana Joshi

September 13, 2023

My current working directory that R Studio shows is C:\users\username\Documents

When I created a new R script file - import and followed the above steps to read the penguins file, it gives me an error - 'penguins_data.csv' does not exist in current working directory ('C:/Users/Rajeev Joshi/Documents'). I saved the import.R in getting-started-main folder.

How do I change the current working directory to getting-started-main?

Please help

Hi, Archana! As long as you're inside an R Project, your working directory will be the project, so make sure you're inside the getting-started-main project before typing the library and read_csv code into your import.R file (which is saved in the project folder).

I made a short video to demonstrate what it should look like.

Bhumika Bhattacharya

September 18, 2023

I have installed the tidyverse package, but when I run the code read.csv("penguins_data.csv") it shows this on the console:

read.csv("penguins_data.csv")
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file 'penguins_data.csv': No such file or directory

Bhumika Bhattacharya

September 18, 2023

it is working for the tibble only