Import Our Data Again

This lesson is called Import Our Data Again, part of the Getting Started With R course.

Your Turn

Adjust your read_csv() code so that you import the data again
Use the na argument to tell read_csv() what data should be treated as missing
Use the col_types argument to make sure that sex_v2 gets imported as character data
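
The steps above can be sketched roughly as follows. This is a minimal example, not the course's official solution: it assumes `-999` is the missing-value code (as in the comments below), and it writes a tiny CSV with invented rows to a temporary file so the code runs without the real `penguins_data.csv`. In your own script you would pass `"penguins_data.csv"` instead of `csv_path`.

```r
library(readr)

# Hypothetical stand-in for penguins_data.csv: a few invented rows
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "species,bill_length_mm,sex,sex_v2",
    "Adelie,39.1,male,1",
    "Adelie,-999,-999,-999",
    "Gentoo,46.5,female,2"
  ),
  csv_path
)

penguins_data <- read_csv(
  csv_path,
  na = c("-999", "NA"),            # treat these strings as missing
  col_types = cols(sex_v2 = "c")   # "c" is the abbreviation for character
)

penguins_data
```

Columns not named inside `cols()` are still guessed by `read_csv()`, so only `sex_v2` is forced to character here.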

Have any questions? Put them below and we will help you out!

Betsy Dalton • September 15, 2023

Hm, when I try to run penguins_data <- read_csv("penguins_data.csv", na = "-999")

I keep getting the following error:

Error in exists(cacheKey, where = .rs.WorkingDataEnv, inherits = FALSE) : 
  invalid first argument
Error in assign(cacheKey, frame, .rs.CachedDataEnv) : 
  attempt to use zero-length variable name

David Keyes Founder • September 15, 2023

This appears to be a bug in RStudio (others have seen the same thing, as have I). You can safely ignore it. If you update RStudio in a few weeks, my guess is it will be fixed.

Betsy Dalton • September 15, 2023

Thanks!

Alberto Cabrera • November 5, 2023

penguins <- read_csv("https://raw.githubusercontent.com/rfortherestofus/rin3-fall-2023/main/data-raw/penguins.csv")

This retrieves a CSV file from the data-raw subfolder of a GitHub repository and creates a data frame named penguins.

Kiara Sanchez • September 17, 2023

I keep running my code: penguins_data <- read_csv("penguins_data.csv", na = c("-999", "NA"), col_types = cols(sex_v2 = "c")) with the skim function, but the only column that changes to NA is the sex column.

Kiara Sanchez • September 17, 2023

Never mind, all fixed.

Shubhra Murarka • March 1, 2024

Error when running:

```
penguins_data <- read_csv("penguins_data.csv",
                          na = "-999")
```

Error in base::nchar(wide_chars$test, type = "width") :
  lazy-load database '/Users/shubhra/Library/R/x86_64/4.1/library/cli/R/sysdata.rdb' is corrupt
In addition: Warning messages:
1: In base::nchar(wide_chars$test, type = "width") : restarting interrupted promise evaluation
2: In base::nchar(wide_chars$test, type = "width") : internal error -3 in R_decompress1

David Keyes Founder • March 1, 2024

Hmm, I'm not quite sure what's going on. One quick question, though: did you just install R/RStudio for this course or had you installed it previously?

Alyssa Jeffers • March 14, 2024

Hi there, when I ran the code to change sex_v2 to character data, col_types showed up as a new data item in my Environment. On your demo screen, it didn't show up as an additional item after running the code; you only have penguins_data. Should this not have happened? This was my code: penguins_data <- read_csv("penguins_data.csv", na = c("-999", "NA"), col_types = cols(sex_v2 = "c"))

David Keyes Founder • March 14, 2024

Very strange! I've not seen that. Can you record a quick video using this and show me what you're seeing? Please email me after you upload the video so I know to look for it.

David Keyes Founder • March 19, 2024

Ok, I've watched your video and I've got a solution for you! Check out this video. Let me know if this helps!

Alyssa Jeffers • March 20, 2024

Got it, thanks for explaining that! I bet that's what I did.

Sandra Virgo • March 20, 2024

When I run the read_csv() code again to deal with the -999 values, it does not completely work.

Looking at the data using view(penguins_data), I still have some -999 values in case 4, as well as some -999.0 values in case 4.

I have adapted the code to read read_csv("penguins_data.csv", na = "-999, -999.0") to deal with the ones with the decimal point and the zero, but even after that there are still these issues, including in sex_v2

I can see that in sex there are some NA values now, which means the code has partially worked, I guess.

Apologies, I don't seem to be able to get a screenshot in here.

Libby Heeren Coach • March 20, 2024

Hi, Sandra! I made a short video going over the process of replacing values with NA in the data. Please take a look at it and let me know if you have any questions! https://muse.ai/v/4i7KhQx

Recap: -999 and -999.0 are the same value, just displayed differently. When using the na argument, you can either use one value inside quotations, like "-999", or two values inside the c() function, each in their own set of quotation marks, like this: c("-999", "NA")
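
The difference between the forms of the na argument can be sketched like this. The CSV text is invented for illustration and passed as literal data with `I()`; the column names are assumptions, not the real penguins columns.

```r
library(readr)

# Invented two-column CSV, passed as literal text with I()
csv_text <- I("id,body_mass_g\n1,3750\n2,-999\n3,NA\n")

# One string: only fields that are exactly "-999" become NA
one_code <- read_csv(csv_text, na = "-999", show_col_types = FALSE)

# Several strings: wrap them in c(), each in its own quotation marks
two_codes <- read_csv(csv_text, na = c("-999", "NA"), show_col_types = FALSE)

# A single string "-999, -999.0" matches no field, so nothing is replaced
no_match <- read_csv(csv_text, na = "-999, -999.0", show_col_types = FALSE)

# Count the missing values each version produced
sum(is.na(one_code$body_mass_g))
sum(is.na(two_codes$body_mass_g))
sum(is.na(no_match$body_mass_g))
```

Each string in na is compared against a whole field, which is why putting two codes inside one pair of quotation marks does not work.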

Sandra Virgo • March 20, 2024

Thanks so much, Libby. The moment you said that the -999.0 values were the same as the -999 values it got me back on the right track again. I thought I had tried every combination of solutions but I clearly hadn't tried the easiest one. The video was very helpful - thanks so much!

Libby Heeren Coach • March 20, 2024

Woo! That's a win! So glad it helped, and thanks for asking questions! They help everyone :)

Puspanjali Gurung • July 31, 2024

When I try to run the following command, it doesn't omit the -999 from the table. Why is this the case?

penguins_data <- read_csv("penguins_data.csv", na = "-999")

David Keyes Founder • August 1, 2024

It's a bit hard to say. If you want to record a video to show me what's happening I can take a look. Please do that here.

Caroline Kypson • September 4, 2024

The -999 in my table did not disappear after I ran penguins_data <- read_csv("penguins_data.csv", na = "-999"). It did disappear in the skim data though.

David Keyes Founder • September 4, 2024

Can you clarify what you mean by your "table"?

Vivian KAWANAMI • September 19, 2024

hi David and Gracielle! Quick Q: when writing the code for read_csv and all the embedded functions, the line becomes too long to read and I've noticed that David uses a shortcut to add the new functions (e.g., na and then col_types) to lines further below. Should I just press enter or is there another way of parsing the line? Thanks!

Gracielle Higino Coach • September 20, 2024

Olá Vivian! Yes, hitting enter is all you need to add a new line within your function call. Note that this only works if you're working in a text file like "import.R", as shown in the video (i.e., your script). If you're working in the console, you might need to use shift+enter to add a new line, and the console will show a plus sign instead of a > at the beginning of the new line. The plus sign means that you still need to add something to finish the command!
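
As a small illustration of the point about line breaks, the sketch below writes the same call on one line and then with one argument per line. The CSV text is invented and passed as literal data with `I()`, so nothing here depends on the course files.

```r
library(readr)

# Invented data, passed as literal text with I()
csv_text <- I("species,sex_v2\nAdelie,-999\nGentoo,female\n")

# Everything on one line
one_line <- read_csv(csv_text, na = "-999", col_types = cols(sex_v2 = "c"))

# Same call, one argument per line; R keeps reading
# until it reaches the closing parenthesis
multi_line <- read_csv(
  csv_text,
  na = "-999",
  col_types = cols(sex_v2 = "c")
)

# Both versions import the same values
identical(one_line$sex_v2, multi_line$sex_v2)
```

Breaking after a comma is a common convention because the open parenthesis makes it clear to R that the call is not finished yet.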