Import Our Data Again

This lesson is called Import Our Data Again, part of the R in 3 Months (Spring 2025) course.

Your Turn

Adjust your read_csv() code so that you import the data again
Use the na argument to tell read_csv() what data should be treated as missing
Use the col_types argument to make sure that sex_v2 gets imported as character data
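
The steps above can be sketched roughly as follows. This is only an illustration, not the official solution: the inline `csv_text` and its values are made up so the example runs without the course file (it assumes readr 2.0 or later, which reads `I()` text as literal data). In the exercise you would pass "penguins_data.csv" instead.

```r
library(readr)

# Hypothetical stand-in for penguins_data.csv; the values are invented
csv_text <- I("species,bill_length_mm,sex_v2\nAdelie,39.1,male\nAdelie,-999,female\nGentoo,46.5,-999\n")

penguins_data <- read_csv(
  csv_text,                       # in the exercise: "penguins_data.csv"
  na = c("-999", "NA"),           # treat these strings as missing
  col_types = cols(sex_v2 = "c")  # force sex_v2 to character; other columns are guessed
)

penguins_data
```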

Have any questions? Put them below and we will help you out!

Betsy Dalton • September 15, 2023

Hm, when I try to run penguins_data <- read_csv("penguins_data.csv", na = "-999")

I keep getting the following error:

Error in exists(cacheKey, where = .rs.WorkingDataEnv, inherits = FALSE) : 
  invalid first argument
Error in assign(cacheKey, frame, .rs.CachedDataEnv) : 
  attempt to use zero-length variable name

David Keyes Founder • September 15, 2023

This appears to be a bug in RStudio (others have seen the same thing, as have I). You can safely ignore it. If you update RStudio in a few weeks, my guess is it will be fixed.

Betsy Dalton • September 15, 2023

Thanks!

Alberto Cabrera • November 5, 2023

penguins <- read_csv("https://raw.githubusercontent.com/rfortherestofus/rin3-fall-2023/main/data-raw/penguins.csv")

This retrieves a CSV file from the data-raw subfolder of a GitHub repository and creates a data frame named penguins.

Kiara Sanchez • September 17, 2023

I keep running my code: penguins_data <- read_csv("penguins_data.csv", na = c("-999", "NA"), col_types = cols(sex_v2 = "c")) with the skim function, but the only column that changes to NA is the sex column.

Kiara Sanchez • September 17, 2023

Never mind, all fixed.

Shubhra Murarka • March 1, 2024

Error when running:

```
penguins_data <- read_csv("penguins_data.csv",
                          na = "-999")
```

Error in base::nchar(wide_chars$test, type = "width") : 
  lazy-load database '/Users/shubhra/Library/R/x86_64/4.1/library/cli/R/sysdata.rdb' is corrupt
In addition: Warning messages:
1: In base::nchar(wide_chars$test, type = "width") :
  restarting interrupted promise evaluation
2: In base::nchar(wide_chars$test, type = "width") :
  internal error -3 in R_decompress1

David Keyes Founder • March 1, 2024

Hmm, I'm not quite sure what's going on. One quick question, though: did you just install R/RStudio for this course or had you installed it previously?

Alyssa Jeffers • March 14, 2024

Hi there, when I ran the argument for changing sex_v2 to character data, I noticed that col_types showed up as a new data item in my Data Environment, but I noticed on your demo screen, it didn't show up as an additional item after running the code, you only have the penguins_data. Should this not have happened? This was my code: penguins_data <- read_csv("penguins_data.csv", na = c("-999", "NA"), col_types = cols(sex_v2 = "c"))

David Keyes Founder • March 14, 2024

Very strange! I've not seen that. Can you record a quick video using this and show me what you're seeing? Please email me after you upload the video so I know to look for it.

David Keyes Founder • March 19, 2024

Ok, I've watched your video and I've got a solution for you! Check out this video. Let me know if this helps!

Alyssa Jeffers • March 20, 2024

Got it, thanks for explaining that! I bet that's what I did.

Sandra Virgo • March 20, 2024

When I run the read_csv() code again to deal with the -999 values, it does not completely work.

Looking at the data using view(penguins_data), I still have some -999 values in case 4, as well as some -999.0 values in case 4.

I have adapted the code to read read_csv("penguins_data.csv", na = "-999, -999.0") to deal with the ones with the decimal point and the zero, but even after that there are still these issues, including in sex_v2

I can see that in sex there are some NA values now, which means the code has partially worked, I guess.

Apologies, I don't seem to be able to get a screenshot in here.

Libby Heeren Coach • March 20, 2024

Hi, Sandra! I made a short video going over the process of replacing values with NA in the data. Please take a look at it and let me know if you have any questions! https://muse.ai/v/4i7KhQx

Recap: -999 and -999.0 are the same value, just displayed differently. When using the na argument, you can either use one value inside quotations, like "-999", or two values inside the c() function, each in their own set of quotation marks, like this: c("-999", "NA")
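
A minimal sketch of the recap above, assuming a small made-up data set in place of penguins_data.csv (the inline `csv_text` is hypothetical, and reading it through `I()` assumes readr 2.0 or later). It compares a single string containing a comma with a vector of separate strings:

```r
library(readr)

# Made-up stand-in for penguins_data.csv
csv_text <- I("bill_length_mm,sex\n39.1,male\n-999,-999\n")

# One string: readr looks for the literal text "-999, -999.0", which never appears
one_string <- read_csv(csv_text, na = "-999, -999.0", show_col_types = FALSE)
sum(is.na(one_string))   # 0 missing values

# A vector of separate strings: each one is treated as missing
two_strings <- read_csv(csv_text, na = c("-999", "NA"), show_col_types = FALSE)
sum(is.na(two_strings))  # 2 missing values
```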

Sandra Virgo • March 20, 2024

Thanks so much, Libby. The moment you said that the -999.0 values were the same as the -999 values it got me back on the right track again. I thought I had tried every combination of solutions but I clearly hadn't tried the easiest one. The video was very helpful - thanks so much!

Libby Heeren Coach • March 20, 2024

Woo! That's a win! So glad it helped, and thanks for asking questions! They help everyone :)

Puspanjali Gurung • July 31, 2024

When I try to run the following command, it doesn't omit the -999 from the table. Why is this the case?

penguins_data <- read_csv("penguins_data.csv", na = "-999")

David Keyes Founder • August 1, 2024

It's a bit hard to say. If you want to record a video to show me what's happening I can take a look. Please do that here.

Caroline Kypson • September 4, 2024

The -999 in my table did not disappear after I ran penguins_data <- read_csv("penguins_data.csv", na = "-999"). It did disappear in the skim data though.

David Keyes Founder • September 4, 2024

Can you clarify what you mean by your "table"?

Vivian KAWANAMI • September 19, 2024

Hi David and Gracielle! Quick question: when writing the code for read_csv and all its arguments, the line becomes too long to read, and I've noticed that David uses a shortcut to put the new arguments (e.g., na and then col_types) on the lines below. Should I just press Enter, or is there another way of breaking up the line? Thanks!

Gracielle Higino Coach • September 20, 2024

Hi Vivian! Yes, hitting Enter is all you need to add a new line within your function. Note that this only works if you're working in a text file like "import.R", as shown in the video (i.e., your script). If you're working in the console, you might need to use Shift+Enter to add a new line, and the console will show a plus sign instead of a > at the beginning of the new line. The plus sign means that R is still waiting for you to finish the command!
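
As a small illustration of the point about line breaks, here is the same call written on one line and then split after the opening parenthesis and each comma. The inline `csv_text` is a made-up stand-in for penguins_data.csv and assumes readr 2.0 or later; R reads both calls the same way.

```r
library(readr)

# Hypothetical stand-in for penguins_data.csv
csv_text <- I("species,sex_v2\nAdelie,male\nGentoo,-999\n")

# Everything on one line
one_line <- read_csv(csv_text, na = c("-999", "NA"), col_types = cols(sex_v2 = "c"))

# The same call with one argument per line
multi_line <- read_csv(
  csv_text,
  na = c("-999", "NA"),
  col_types = cols(sex_v2 = "c")
)
```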