Import Our Data Again
This lesson is called Import Our Data Again, part of the Getting Started With R course.
Your Turn
Adjust your read_csv() code so that you import the data again:
Use the na argument to tell read_csv() what data should be treated as missing
Use the col_types argument to make sure that sex_v2 gets imported as character data
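A minimal sketch of what the adjusted import could look like. To keep it runnable on its own, it writes a tiny stand-in CSV to a temporary file; the column names other than sex_v2 and all of the values are invented for illustration. In the lesson you would point read_csv() at penguins_data.csv instead.

```r
library(readr)

# Hypothetical stand-in for penguins_data.csv (values invented for illustration)
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "species,bill_length_mm,sex_v2",
    "Adelie,39.1,1",
    "Adelie,-999,2",
    "Gentoo,46.5,-999"
  ),
  csv_path
)

penguins_data <- read_csv(
  csv_path,
  na = c("-999", "NA"),           # treat -999 and NA as missing
  col_types = cols(sex_v2 = "c")  # import sex_v2 as character; other columns are guessed
)

penguins_data
```

Without the col_types argument, a column that holds only numeric codes like this one would most likely be guessed as numeric.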
Have any questions? Put them below and we will help you out!
Betsy Dalton • September 15, 2023
Hm, when I try to run
penguins_data <- read_csv("penguins_data.csv", na = "-999")
I keep getting the following error:
David Keyes Founder • September 15, 2023
This appears to be a bug in RStudio (others have seen the same thing, as have I). You can safely ignore it. If you update RStudio in a few weeks, my guess is it will be fixed.
Betsy Dalton • September 15, 2023
Thanks!
Alberto Cabrera • November 4, 2023
penguins <- read_csv("https://raw.githubusercontent.com/rfortherestofus/rin3-fall-2023/main/data-raw/penguins.csv")
This retrieves a CSV file from the data-raw subfolder of a GitHub repository and creates a data frame named penguins.
Kiara Sanchez • September 17, 2023
I keep running my code: penguins_data <- read_csv("penguins_data.csv", na = c("-999", "NA"), col_types = cols(sex_v2 = "c")) with the skim function, but the only column that changes to NA is the sex column.
Kiara Sanchez • September 17, 2023
Never mind, all fixed.
Shubhra Murarka • March 1, 2024
Error when running: penguins_data <- read_csv("penguins_data.csv",
Error in base::nchar(wide_chars$test, type = "width") :
lazy-load database '/Users/shubhra/Library/R/x86_64/4.1/library/cli/R/sysdata.rdb' is corrupt
In addition: Warning messages:
1: In base::nchar(wide_chars$test, type = "width") : restarting interrupted promise evaluation
2: In base::nchar(wide_chars$test, type = "width") : internal error -3 in R_decompress1
David Keyes Founder • March 1, 2024
Hmm, I'm not quite sure what's going on. One quick question, though: did you just install R/RStudio for this course or had you installed it previously?
Alyssa Jeffers • March 14, 2024
Hi there, when I ran the code to change sex_v2 to character data, I noticed that col_types showed up as a new data item in my Data Environment. On your demo screen, it didn't show up as an additional item after running the code; you only have penguins_data. Should this not have happened? This was my code: penguins_data <- read_csv("penguins_data.csv", na = c("-999", "NA"), col_types = cols(sex_v2 = "c"))
David Keyes Founder • March 14, 2024
Very strange! I've not seen that. Can you record a quick video using this and show me what you're seeing? Please email me after you upload the video so I know to look for it.
David Keyes Founder • March 19, 2024
Ok, I've watched your video and I've got a solution for you! Check out this video. Let me know if this helps!
Alyssa Jeffers • March 20, 2024
Got it, thanks for explaining that! I bet that's what I did.
Sandra Virgo • March 20, 2024
When I run the read_csv() code again to deal with the -999 values, it does not completely work.
Looking at the data using view(penguins_data), I still have some -999 values in case 4, as well as some -999.0 values in case 4.
I have adapted the code to read read_csv("penguins_data.csv", na = "-999, -999.0") to deal with the ones with the decimal point and the zero, but even after that there are still these issues, including in sex_v2
I can see that in sex there are some NA values now, which means the code has partially worked, I guess.
Apologies, I don't seem to be able to get a screenshot in here.
Libby Heeren Coach • March 20, 2024
Hi, Sandra! I made a short video going over the process of replacing values with NA in the data. Please take a look at it and let me know if you have any questions! https://muse.ai/v/4i7KhQx
Recap: -999 and -999.0 are the same value, just displayed differently. When using the na argument, you can either use one value inside quotation marks, like "-999", or two values inside the c() function, each in their own set of quotation marks, like this: c("-999", "NA")
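As a small illustration of the recap, the sketch below compares a single comma-separated string with a proper c() vector. The file contents and the column name body_mass_g are made up for the example.

```r
library(readr)

# Hypothetical three-row file written to a temporary location
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,body_mass_g", "1,3750", "2,-999", "3,NA"), csv_path)

# One string: only a cell containing the exact text "-999, NA" would count as missing
one_string <- read_csv(csv_path, na = "-999, NA", show_col_types = FALSE)

# A character vector: each element is matched on its own
two_strings <- read_csv(csv_path, na = c("-999", "NA"), show_col_types = FALSE)

# Count the missing values each approach produced
sum(is.na(one_string$body_mass_g))
sum(is.na(two_strings$body_mass_g))
```

The first count should come out as zero and the second as two, which is why each missing-value code needs its own set of quotation marks inside c().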
Sandra Virgo • March 20, 2024
Thanks so much, Libby. The moment you said that the -999.0 values were the same as the -999 values it got me back on the right track again. I thought I had tried every combination of solutions but I clearly hadn't tried the easiest one. The video was very helpful - thanks so much!
Libby Heeren Coach • March 20, 2024
Woo! That's a win! So glad it helped, and thanks for asking questions! They help everyone :)
Puspanjali Gurung • July 31, 2024
When I try to run the following command, it doesn't omit the -999 from the table. Why is this the case?
David Keyes Founder • August 1, 2024
It's a bit hard to say. If you want to record a video to show me what's happening I can take a look. Please do that here.
Caroline Kypson • September 3, 2024
The -999 in my table did not disappear after I ran penguins_data <- read_csv("penguins_data.csv", na = "-999"). It did disappear in the skim data though.
David Keyes Founder • September 3, 2024
Can you clarify what you mean by your "table"?
Vivian KAWANAMI • September 19, 2024
hi David and Gracielle! Quick Q: when writing the code for read_csv and all the embedded functions, the line becomes too long to read and I've noticed that David uses a shortcut to add the new functions (e.g., na and then col_types) to lines further below. Should I just press enter or is there another way of parsing the line? Thanks!
Gracielle Higino Coach • September 19, 2024
Hello, Vivian! Yes, hitting enter is all you need to add a new line within your function. Notice that this only works if you're working on a text file like "import.R", as shown in the video (i.e., your script). If you're working on the console, to add a new line you might need to use shift+enter, and the console will show a plus sign instead of a > at the beginning of the new line. This means that you still need to add something to finish the command!
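To illustrate the line-break point, here is a small sketch using base R's paste() as a stand-in for a longer call such as read_csv(). The same call is written once on a single line and once with one argument per line.

```r
# Written on one line
one_line <- paste("penguins", "data", sep = "_")

# The same call with one argument per line; R keeps reading
# until it reaches the closing parenthesis
multi_line <- paste(
  "penguins",
  "data",
  sep = "_"
)

# Both versions produce the same result
identical(one_line, multi_line)
```

Line breaks inside the parentheses do not change what the code does, so the layout is purely a readability choice.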