Import Our Data Again

Your Turn

  1. Adjust your read_csv() code so that you import the data again

  2. Use the na argument to tell read_csv() what data should be treated as missing

  3. Use the col_types argument to make sure that sex_v2 gets imported as character data
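The three steps above can be sketched in a single call. This is a minimal sketch rather than the course's official solution: the inline CSV text and its column values are hypothetical stand-ins for penguins_data.csv so the example runs on its own, and wrapping the text in I() to pass literal data assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical stand-in for penguins_data.csv (made-up rows, for illustration only)
csv_text <- "species,bill_length_mm,sex,sex_v2
Adelie,39.1,male,1
Adelie,-999,-999,-999
Gentoo,46.5,female,0
"

penguins_data <- read_csv(
  I(csv_text),                    # with the real file, use "penguins_data.csv" here
  na = c("-999", "NA"),           # treat both -999 and NA as missing
  col_types = cols(sex_v2 = "c")  # import sex_v2 as character; other columns are guessed
)

# Check the result: sex_v2 should be character, and the -999 cells should be NA
class(penguins_data$sex_v2)
colSums(is.na(penguins_data))
```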

Have any questions? Put them below and we will help you out!

Hm, when I try to run penguins_data <- read_csv("penguins_data.csv", na = "-999")

I keep getting the following error:

Error in exists(cacheKey, where = .rs.WorkingDataEnv, inherits = FALSE) : 
  invalid first argument
Error in assign(cacheKey, frame, .rs.CachedDataEnv) : 
  attempt to use zero-length variable name

David Keyes Founder

September 15, 2023

This appears to be a bug in RStudio (others have seen the same thing, as have I). You can safely ignore it. If you update RStudio in a few weeks, my guess is it will be fixed.

Thanks!

Alberto Cabrera

November 5, 2023

penguins <- read_csv("https://raw.githubusercontent.com/rfortherestofus/rin3-fall-2023/main/data-raw/penguins.csv")

This retrieves a CSV file from the data-raw subfolder of a GitHub repository and creates a data frame named penguins.

Kiara Sanchez

September 17, 2023

I keep running my code, penguins_data <- read_csv("penguins_data.csv", na = c("-999", "NA"), col_types = cols(sex_v2 = "c")), with the skim function, but the only column that changes to NA is the sex column.

Kiara Sanchez

September 17, 2023

Never mind, all fixed.

Shubhra Murarka

March 1, 2024

Error when running:

penguins_data <- read_csv("penguins_data.csv",
                          na = "-999")

Error in base::nchar(wide_chars$test, type = "width") :
  lazy-load database '/Users/shubhra/Library/R/x86_64/4.1/library/cli/R/sysdata.rdb' is corrupt
In addition: Warning messages:
1: In base::nchar(wide_chars$test, type = "width") :
  restarting interrupted promise evaluation
2: In base::nchar(wide_chars$test, type = "width") :
  internal error -3 in R_decompress1

David Keyes Founder

March 1, 2024

Hmm, I'm not quite sure what's going on. One quick question, though: did you just install R/RStudio for this course or had you installed it previously?

Alyssa Jeffers

March 14, 2024

Hi there. When I ran the code to import sex_v2 as character data, col_types showed up as a new data item in my Data Environment. On your demo screen, no additional item appeared after running the code; you only have penguins_data. Should this not have happened? This was my code: penguins_data <- read_csv("penguins_data.csv", na = c("-999", "NA"), col_types = cols(sex_v2 = "c"))

David Keyes Founder

March 14, 2024

Very strange! I've not seen that. Can you record a quick video using this and show me what you're seeing? Please email me after you upload the video so I know to look for it.

David Keyes Founder

March 19, 2024

Ok, I've watched your video and I've got a solution for you! Check out this video. Let me know if this helps!

Alyssa Jeffers

March 20, 2024

Got it, thanks for explaining that! I bet that's what I did.

Sandra Virgo

March 20, 2024

When I run the read_csv() code again to deal with the -999 values, it does not completely work.

Looking at the data using view(penguins_data), I still have some -999 values in case 4, as well as some -999.0 values in case 4.

I have adapted the code to read_csv("penguins_data.csv", na = "-999, -999.0") to deal with the values that have the decimal point and the zero, but even after that the same issues remain, including in sex_v2.

I can see that in sex there are some NA values now, which means the code has partially worked, I guess.

Apologies, I don't seem to be able to get a screenshot in here.

Libby Heeren Coach

March 20, 2024

Hi, Sandra! I made a short video going over the process of replacing values with NA in the data. Please take a look at it and let me know if you have any questions! https://muse.ai/v/4i7KhQx

Recap: -999 and -999.0 are the same value, just displayed differently. When using the na argument, you can either pass one value in quotation marks, like "-999", or pass several values inside the c() function, each in its own set of quotation marks, like this: c("-999", "NA")
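To illustrate the recap, here is a small sketch comparing the two ways of writing the na argument. The inline CSV and its column names are made up for the example (they are not the course data), and passing literal text with I() assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical three-row file with one -999 placeholder
csv_text <- "id,body_mass_g\n1,3750\n2,-999\n3,4200\n"

# One string: readr looks for cells whose text is exactly "-999, -999.0",
# which never occurs, so the -999 stays in the data
one_string <- read_csv(I(csv_text), na = "-999, -999.0", show_col_types = FALSE)

# A character vector: each element is its own missing-value marker
two_values <- read_csv(I(csv_text), na = c("-999", "NA"), show_col_types = FALSE)

sum(is.na(one_string$body_mass_g))  # number of missing values with the single string
sum(is.na(two_values$body_mass_g))  # number of missing values with the vector

# As numbers, -999 and -999.0 are the same value
-999 == -999.0
```

Comparing the two counts shows whether the placeholder was actually converted to NA.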

Sandra Virgo

March 20, 2024

Thanks so much, Libby. The moment you said that the -999.0 values were the same as the -999 values it got me back on the right track again. I thought I had tried every combination of solutions but I clearly hadn't tried the easiest one. The video was very helpful - thanks so much!

Libby Heeren Coach

March 20, 2024

Woo! That's a win! So glad it helped, and thanks for asking questions! They help everyone :)