Getting Started with R
Import Our Data Again
This lesson is called Import Our Data Again, part of the Getting Started with R course.
Your Turn
Adjust your read_csv() code so that you import the data again.
Use the na argument to tell read_csv() what data should be treated as missing.
Use the col_types argument to make sure that sex_v2 gets imported as character data.
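A minimal sketch of what the adjusted call might look like, assuming a recent readr 2.x (which reads literal text wrapped in I()) and assuming "999" is the missing-value code. The inline data and the columns id and score are hypothetical stand-ins for the lesson's file; only sex_v2 comes from the exercise.

```r
library(readr)

# Hypothetical inline CSV standing in for the lesson's data file.
# "999" is assumed to be the code used for missing values.
raw <- I("id,sex_v2,score\n1,1,88\n2,2,999\n3,999,75")

dat <- read_csv(
  raw,
  na = "999",                                 # cells whose text is "999" become NA
  col_types = list(sex_v2 = col_character())  # force sex_v2 to character; other columns are guessed
)

dat
```

Only the columns named in the list are forced; readr still guesses the type of every other column.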
Christine Farrugia
March 15, 2021
I received an error related to the col_types function and I can't figure out how to correct it.
Import the faketucky data into a data frame called faketucky.
> faketucky
library(tidyverse)
-- Attaching packages ------------------------------------------ tidyverse 1.3.0 --
v ggplot2 3.3.3     v purrr   0.3.4
v tibble  3.1.0     v dplyr   1.0.5
v tidyr   1.1.3     v stringr 1.4.0
v readr   1.4.0     v forcats 0.5.1
-- Conflicts --------------------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
David Keyes
March 15, 2021
Howdy! You're not actually seeing an error. What you're seeing is just a couple of messages that show up when you load the tidyverse. I made a short video to explain it to you. Hope it helps!
Christine Farrugia
March 16, 2021
Thank you for your response! The video does not have any sound. The bigger issue is how to fix the error when I try to run col_character(). This is the message I get:
> faketucky<-read.csv("data/faketucky.csv",
Error in col_character() : could not find function "col_character"
How do I fix this?
Christine Farrugia
March 16, 2021
I just realized there was a typo in the last code block I sent (coltypes should be col_types), but the error still occurs when I use col_types.
David Keyes
March 16, 2021
Oh jeez, sorry about the video with no sound! In terms of your issue, the reason it is occurring is that you're using the read.csv() function, not read_csv() (note the _ in place of the .). Try it again with that and it should work!
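A small sketch of why the dot versus underscore matters, assuming readr is installed; the temporary file and its columns id and group are hypothetical. Base R's read.csv() forwards unknown arguments to read.table(), which has no col_types argument.

```r
library(readr)

# Write a tiny hypothetical CSV to a temporary file.
path <- tempfile(fileext = ".csv")
writeLines(c("id,group", "1,a", "2,b"), path)

# Base R's read.csv() (with a dot) passes extra arguments on to read.table(),
# which has no col_types argument, so R stops with an "unused argument" error.
base_attempt <- tryCatch(
  read.csv(path, col_types = list(id = col_character())),
  error = function(e) conditionMessage(e)
)
print(base_attempt)

# readr's read_csv() (with an underscore) does have a col_types argument.
tidy_version <- read_csv(path, col_types = list(id = col_character()))
class(tidy_version$id)
```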
Christine Farrugia
March 16, 2021
Thank you!! I have spent hours googling this issue. I greatly appreciate your fast help!
Jyoni Shuler
March 16, 2021
Hi David, I noticed that the code is indented in specific ways depending on the command. Do we have to format our text with indents to have it aligned as you showed in your demonstration? If so, do we just use "tab" or are there other, more efficient ways to go about spacing our code?
Jyoni Shuler
March 16, 2021
Oh, never mind - I see it formats automatically!
David Keyes
March 16, 2021
No worries. You can also indent automatically using command+I (Mac) or control+I (Windows). See this demonstration.
Peleise Smith
March 18, 2021
Hi there, I ran the skim function and I get the following error message:
Error: attempt to use zero-length variable name
Not sure which variable it's referring to since there are 12 and R seems to have read all 12 variables in the csv.
Thank you for sharing your knowledge! :)
David Keyes
March 18, 2021
Could you post your code so I can see exactly what the issue might be?
Christian Marin
April 3, 2021
Hi David, I'm doing a refresher and noticed only some of my code is working for the character variables. An "unexpected ')'" error pops up for school_district and received_high_school_diploma. Here is my code: col_types = list(first_high_school_attended = col_character(), race_ethnicity = col_character(), male = col_character(), enrolled_in_college = col_character(), received_high_school_diploma = col_character(), school_district = col_character()))
My output is only showing character variables for first_high_school_attended, school_district, and race_ethnicity, leaving the rest as numeric.
David Keyes
April 5, 2021
You've got one too many ) at the end of your code :)
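A sketch of a call laid out so that every opening parenthesis lines up with its closing one, assuming readr is installed; the file and the columns a, b, and c are hypothetical. The last lines use base R's parse() to show that a single extra ) is rejected before anything runs.

```r
library(readr)

path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3"), path)

# Each opening "(" needs exactly one matching ")".
dat <- read_csv(
  path,
  col_types = list(
    a = col_character(),  # col_character() opens and closes on the same line
    b = col_character()
  )                       # closes list()
)                         # closes read_csv()

# One extra ")" is a syntax error; parse() shows that without running any code.
extra_paren <- tryCatch(
  parse(text = "list(a = 1))"),
  error = function(e) "syntax error"
)
extra_paren
```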
Adwoa Odoom
August 17, 2021
Hello David, for some reason when I try to change the column types for the variables you mentioned, none of them turn into character. I copy-pasted the exact code you used, and I also kept getting this error: Error in read_csv("data/faketucky.csv", na = "999", col_types = list(enrolled_in_college = col_character(), : could not find function "read_csv". Could you assist?
David Keyes
August 18, 2021
Are you sure you loaded the tidyverse package (i.e., have the code library(tidyverse) somewhere in your R script file)? If that function can't be found, it almost certainly means you haven't loaded the package. Let me know if that helps!
Camille Antinori
January 7, 2022
Hi, Why do I get a different output for the read_csv function than you get at 2:54? I get:
> faketucky<-read_csv("data/faketucky.csv")
Rows: 57855 Columns: 12
-- Column specification -------------------------------------
Delimiter: ","
chr (3): first_high_school_attended, school_district, rac...
dbl (9): student_id, male, free_and_reduced_lunch, percen...

i Use spec() to retrieve the full column specification for this data.
i Specify the column types or set show_col_types = FALSE to quiet this message.
David Keyes
January 7, 2022
Are you sure you used read_csv(), not read.csv()? Note the _ (not .) between read and csv.
Camille Antinori
January 7, 2022
Hi David, The code is just as I pasted above. Maybe it is the version I am using: R version 4.0.2 (2020-06-22) -- "Taking Off Again"
David Keyes
January 8, 2022
I just ran it and I get the same thing as you now too. I don't think it's the version of R, but rather the version of the readr package. It's been updated since I recorded this video and it looks like the output messages are different now. Hope that answers it for you!
Camille Antinori
January 8, 2022
Thanks for checking!
John Franjione
January 16, 2022
A minor question... My skimr output is different from what is shown in the video. I've got the variables right (i.e., there are now 7 character variables), but instead of showing integer counts of {missing, complete, n, min, max, empty, n_unique}, I've got: {n_missing, complete_rate, min, max, empty, n_unique, whitespace}
FWIW, the n_missing values are the same (14788 for enrolled_in_college, 860 for free_and_reduced_lunch, 14 for male).
Another difference is that your list of character variables is in alphabetical order (enrolled_in_college, first_high_school_attended, etc.), whereas mine is not (first_high_school_attended, school_district, ... enrolled_in_college). I don't see any relationship between the order and any of the other column entries.
David Keyes
January 18, 2022
This is almost certainly a difference in how these packages work today versus how they worked a couple of years ago when I made this course. Nothing different is happening under the hood; it's just that the output they print looks slightly different. Let me know if you have other questions!
LILIANA CUBAS GAONA
January 18, 2022
Hello David, I am using the read.csv() function (because I could not find the read_csv() function), so I put "." instead of "_", but it is still not working. I have also tried "_", but I get the same error. Could you tell me where my error is, please? Many thanks
> # Import the faketucky data into a data frame called faketucky.
> faketucky
> # Import the faketucky data into a data frame called faketucky.
> faketucky <- read.csv("data/faketucky.csv", na = "999",col_types = list(enrolled_in_college = col_character(),
Error in col_character() : could not find function "col_character"
David Keyes
January 18, 2022
Have you installed and loaded the tidyverse package? read_csv() won't work without that.
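A sketch of the install-once, load-every-session pattern. The requireNamespace() guard is a common idiom added here for illustration, not something the course requires; it simply skips the download when tidyverse is already installed.

```r
# install.packages() downloads a package; it only needs to run once per computer.
# The check below skips it when tidyverse is already installed.
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# library() makes the package's functions available in the current R session.
# It has to run again in every new session, e.g. each time RStudio opens.
library(tidyverse)

# read_csv() and col_character() can now be found.
exists("read_csv")
exists("col_character")
```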
LILIANA CUBAS GAONA
January 19, 2022
Many thanks for your answer, David. I had to install the tidyverse package again. It seems that I have to do it each time I open RStudio. It happens with skimr too. Any suggestions? Thanks in advance.
David Keyes
January 19, 2022
You should only have to install packages once per computer. By that I mean running install.packages("tidyverse"). You do need to load any package you want to use each time you open RStudio. That means running library(tidyverse) at the top of your code each time. Does that clarify things?
Aditi Shah
March 19, 2022
When we add the na = "999" argument to the read_csv function, are we passing the number as a string? If so, how does the function remove the values that are displayed as 999.000000 in the gpa column?
Additionally, when renaming the columns, why does passing the column name in brackets of the col_character() function not work? I get an "unused argument" error.
Charlie Hadley
March 21, 2022
Hi Aditi! When using the read_csv(..., na = "999") we need to give the na value as a string because when read_csv() first parses the data file it has not yet determined what type of data each column contains. The column type is decided afterwards by the parse_guess() function. For a demonstration, let's create a fake dataset that contains 999 in both a character and numeric column:
See how 999 is converted to NA in both columns.
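A sketch of that kind of demonstration, assuming a recent readr 2.x; the fake dataset and its columns name and score are made up for illustration.

```r
library(readr)

# Hypothetical fake data: "999" appears in a text column and a numeric column.
fake <- read_csv(
  I("name,score\nalice,10\n999,999\nbob,20"),
  na = "999"   # matched against the raw text before column types are guessed
)

fake
```

Because the comparison is made on the raw text of each cell, a value stored in the file as, say, 999.0 would not match na = "999" and would need its own entry in na.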
I'm not entirely sure about your renaming column question. The col_names argument can be given a vector of new column names, but if so, the first row of the file (the original header) is read as data.
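A minimal sketch of the col_names behaviour, assuming a recent readr 2.x and hypothetical two-column data. The skip = 1 variant is one common way to rename while dropping the original header, not the only one.

```r
library(readr)

csv_text <- I("a,b\n1,2\n3,4")

# Default col_names = TRUE: the first row ("a,b") becomes the header.
with_header <- read_csv(csv_text, show_col_types = FALSE)

# A vector of names: the old header row "a,b" is now read as data,
# so both columns are guessed as character.
renamed <- read_csv(csv_text, col_names = c("x", "y"), show_col_types = FALSE)

# Skipping the original header line keeps the new names and the numeric types.
renamed_skip <- read_csv(csv_text, col_names = c("x", "y"), skip = 1,
                         show_col_types = FALSE)

renamed
renamed_skip
```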
Could you provide some more detail about your col_names argument question? Thanks, Charlotte
Aditi Shah
March 23, 2022
Thank you for your response!
About the second question, I'm wondering why this doesn't work:
Charlie Hadley
March 23, 2022
The col_character() function and the other col_*() functions don't take a column name as an argument; when they are given without names, their positional order dictates their behaviour. So if we wanted to force the first 4 columns to be treated as character columns we would write:
It is possible to target columns by their name through the use of a named list, e.g.:
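A sketch contrasting the two styles, assuming a recent readr 2.x and a hypothetical four-column file. readr expects an unnamed list to supply one specification per column, which is why the positional version lists all four.

```r
library(readr)

# Hypothetical four-column data.
csv_text <- I("id,male,lunch,diploma\n1,0,1,1\n2,1,0,1")

# Unnamed list: specs are matched to columns by position,
# so there is one entry per column in the file.
by_position <- read_csv(
  csv_text,
  col_types = list(col_character(), col_character(), col_character(), col_character())
)

# Named list: only the named columns are forced; the others are guessed.
by_name <- read_csv(
  csv_text,
  col_types = list(male = col_character(), diploma = col_character())
)

sapply(by_position, class)
sapply(by_name, class)
```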
Aditi Shah
March 23, 2022
I see. This is a tangential question, but I tried to run the following code to avoid repeating the col_character() function in your example:
The second one worked but not the first. If it's not too much of a hassle, could you please explain why that is?
Danielle Lowry
March 21, 2022
Strange question... I noticed that David's summary stats in the console when he runs the skim command are all nicely left justified. On my end, decimals line up and it looks so messy. Is there a way to correct this in R so my output looks more clean like David's?
Charlie Hadley
March 22, 2022
Hi Danielle! That's not a strange question. The {skimr} package does as good a job as it can of fitting its output neatly into the console, depending on the console's current width. The package is designed primarily to give a quick overview of data, and customising the appearance of its output would require unpicking the package. As you get into using {ggplot2} for data visualisations, we'll show you how to make those look pixel perfect (and you can ask us questions about it).
Laura Estrela Muriel
March 23, 2022
Hi David, I get this error:
Error in read.table(file = file, header = header, sep = sep, quote = quote, : unused argument (col_types = list(list(), list(), list(), list()))
This is what I wrote:
faketucky <- read.csv("data/faketucky.csv", na = "999", col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), recieved_high_school_diploma = col_character()))
Thank you!
Charlie Hadley
March 23, 2022
Hi Laura! This is a very common error: instead of using the read_csv() function from the tidyverse, you've written read.csv(). The error is generated because the read.csv() function does not have a col_types argument. If you replace read.csv() with read_csv(), you should find it works. Cheers, Charlotte
Laura Estrela Muriel
March 23, 2022
Oh, right. Thank you Charlotte!!
Isidora Murillo
May 16, 2023
Hi Charlotte! I have the same problem as Laura, so I changed the function to read_csv(), but it did not work. Here is what I wrote (it does not recognize col_types):
> read_csv("/Users/isidoramurillo/Desktop/R/getting-started-master/data/faketucky.csv",
Thanks!
David Keyes
May 16, 2023
Hi there! Did you write col_types or col_t? In your comment I see the latter. Please let me know and we can take it from there!
Isidora Murillo
May 18, 2023
Hi David, I wrote col_types () (might have deleted the letters when I copied the text)
jeph mathias
March 23, 2022
Hi Charlie/David, I'm just trying the re-import data bit. I have this code in my script:
faketucky col_types = list(enrolled_in_college = col_character(),
any ideas? Thanks
jeph mathias
March 23, 2022
The second half of what I was trying to ask was about the error message that I get, but ignore that for now. I am going to try with the cloud version.
Gloria Li
May 17, 2022
Error in col_character() : could not find function "col_character"
Charlie Hadley
May 18, 2022
Hello Gloria,
Could I see the code you're running? It's likely that this is due to one of two common mistakes:
Cheers,
Charlie
Ravindra Mehta
July 3, 2022
I get an error message with the following code. Not sure what is wrong. Thanks for any guidance.
> faketucky<-read_csv("data/faketucky.csv", na="999", col_types() = list(male = col_character(),free_and_reduced_lunch = col_character(),received_high_school_diploma = col_character(), enrolled_in_college = col_character()))
Error: unexpected '=' in "faketucky
Charlie Hadley
July 12, 2022
Hello Ravindra,
This error is caused by the round brackets you added to the col_types argument name: it should be written col_types = rather than col_types() =. Why that matters is quite technical to explain, so here's a short video.
Thanks,
Charlie
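A small base-R sketch of the difference, using parse() so that nothing is actually read; the file name data.csv is a placeholder.

```r
# parse() checks syntax without running anything, so no data file is needed.
bad <- tryCatch(
  parse(text = 'read_csv("data.csv", col_types() = list(male = col_character()))'),
  error = function(e) "syntax error"
)

# The argument name is written without brackets: col_types = ...
good <- parse(text = 'read_csv("data.csv", na = "999", col_types = list(male = col_character()))')

bad            # "syntax error" (R reports unexpected '=')
class(good)    # "expression"
```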
Julieth Silao
September 21, 2022
Thanks, but the problem is faketucky
Julieth Silao
September 21, 2022
Hello David. I got an error saying could not find function "col_character"
faketucky
David Keyes
September 21, 2022
Have you loaded the tidyverse package using library(tidyverse) in your code prior to running this?
Amanda Krantz
September 21, 2022
It seems like the new import went through fine, but my data doesn't match what I see of yours in the video (my result below).
── Variable type: character ──────────────────────────────────────────────
  skim_variable                n_missing complete_rate min max empty n_unique whitespace
1 first_high_school_attended           0     1           4  14     0      393          0
2 school_district                      0     1           4  13     0      171          0
3 male                                14     1.00        1   1     0        2          0
4 race_ethnicity                       0     1           0  24   794        6          0
5 free_and_reduced_lunch             860     0.985       1   1     0        2          0
6 received_high_school_diploma         0     1           1   1     0        2          0
7 enrolled_in_college              14788     0.744       1   1     0        2          0

── Variable type: numeric ────────────────────────────────────────────────
  skim_variable     n_missing complete_rate    mean     sd p0    p25   p50    p75   p100 hist
1 student_id                0     1          55922. 32333.  1 27910. 56070 83872. 111990 ▇▇▇▇▇
2 percent_absent          111     0.998        8.78   16.0  0   3.26  6.27   11.3   3153 ▇▁▁▁▁
3 gpa                    2185     0.962        2.59  0.874  0   2.00  2.66   3.28      4 ▁▂▆▇▇
4 act_reading_score     14121     0.756        19.8   5.80  2     15    19     23     36 ▁▆▇▅▂
5 act_math_score        14101     0.756        19.0   4.65  1     16    17     22     36 ▁▃▇▃▁
David Keyes
September 21, 2022
So I think what happened is that the default behavior of the read_csv() function has changed slightly since I recorded this video. As a result, it may import data with slightly different column types. Nothing to worry about, as you now know how to change column types.
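To see which types a given readr version actually chose, a sketch like the following may help, assuming a recent readr 2.x (where spec() and the show_col_types argument mentioned in readr's import message are available). The data is hypothetical.

```r
library(readr)

# Hypothetical data; show_col_types = FALSE silences the column-specification
# message that newer readr versions print.
dat <- read_csv(I("id,male,gpa\n1,0,3.5\n2,1,2.9"), show_col_types = FALSE)

# spec() shows the column types readr used for this import.
spec(dat)

# A quick base-R summary of the resulting classes.
sapply(dat, class)
```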
Ellen Wilson
October 3, 2022
I am getting an error message: Error in read.table(file = file, header = header, sep = sep, quote = quote, : unused argument (col_types = list(list(), list(), list(), list()))
This is the code I entered: faketucky<- read.csv("data/faketucky.csv", na=999, col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))
Ellen Wilson
October 3, 2022
Ah. I just realized I had read.csv instead of read_csv! But, I'm still getting an error. Now it says: Error in enc2utf8(na) : argument is not a character vector
My code is now: faketucky<- read_csv("data/faketucky.csv", na=999, col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))
David Keyes
October 3, 2022
Try putting the 999 in quotes ("999").
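A sketch of the quoted form, assuming a recent readr 2.x and hypothetical data. It also shows a related detail: the na argument replaces readr's default of c("", "NA") rather than adding to it.

```r
library(readr)

# Hypothetical data with a blank cell and a "999" code in a text column.
csv_text <- I("id,race\n1,\n2,999\n3,white")

# na must be a character vector: "999", not 999.
only_999 <- read_csv(csv_text, na = "999", show_col_types = FALSE)

# Supplying na replaces the default c("", "NA"), so list those too
# if blank cells should also be treated as missing.
all_codes <- read_csv(csv_text, na = c("", "NA", "999"), show_col_types = FALSE)

only_999$race
all_codes$race
```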
Ellen Wilson
October 4, 2022
Thanks--that worked!
Julieth Silao
October 29, 2022
faketucky <- read_csv("data/faketucky.csv", na = "999", col_type = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))
and I get an error saying col_character not found
Julieth Silao
October 29, 2022
#editing column in the table
faketucky <- read.csv("data/faketucky.csv", na = "999", col_type = list(enrolled_in_college =col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))
I also tried this but still get an error.
David Keyes
October 31, 2022
You need to use read_csv(), not read.csv().
Dimeji Olawuyi
December 22, 2022
Hi David, I run this col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))
But received Error: unexpected ')' in "col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))"
Could you point out what's wrong?
Pegah Maleki
January 20, 2023
Hello! Trying to run this code and getting error message: col_types = list(enrolled_in_college)= col_character(),free_and_reduced_lunc h= col_character(), male = col_character(), received_high_school_diploma = col_character()))
Error message:
> col_types = list(enrolled_in_college)= col_character(),free_and_reduced_lunc h= col_character(), Error: unexpected ',' in "col_types = list(enrolled_in_college)= col_character(),"
David Keyes
January 20, 2023
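It looks like you have a space between the c and h in lunch. There's also an extra ) right after enrolled_in_college, which is what the "unexpected ','" error is pointing at. Can you check both of those and see if it works once you fix them?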
Sarah Sexton
January 24, 2023
I am running the code faketucky <- read_csv("data/faketucky.csv"), na = "999", col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character())
And I keep getting the error Error: unexpected ',' in "faketucky <- read_csv("data/faketucky.csv")," but if I take out the commas it doesn't make the changes to the file?
David Keyes
January 24, 2023
Try removing the extra parenthesis right after "data/faketucky.csv": https://show.rfor.us/61Q7v9f9
Sarah Sexton
January 25, 2023
That worked, thank you!
The Evaluation Center
March 15, 2023
Can you explain why in the first couple of exercises where we read the CSV into R, we are able to use read.csv(), but when we tell R to read the CSV with the col_types() function, we have to update read.csv() to read_csv()?
David Keyes
March 15, 2023
It's a good question! The reason is that when we don't pass any extra arguments, both functions work. However, col_types is an argument of read_csv() only; read.csv() doesn't have it. Hope that helps!
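For comparison only (the course itself uses read_csv()), a sketch of a roughly equivalent base-R call: read.csv() does accept similar options, but under different argument names. The temporary file and its columns are hypothetical.

```r
# Hypothetical file written to a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c("id,male,gpa", "1,0,3.5", "2,999,2.9"), path)

# Base R's read.csv() uses different argument names for similar ideas:
#   na.strings instead of na, and colClasses instead of col_types.
base_dat <- read.csv(
  path,
  na.strings = "999",
  colClasses = c(male = "character")  # named entries: other columns are guessed
)

str(base_dat)
```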
Mark Adrian Salvador
March 22, 2023
I didn't get an error though. Probably it's fixed already.
David Vasquez
March 23, 2023
Hi! I am coding as indicated and the results are as expected.
My code: faketucky <- read_csv("data/faketucky.csv", na = "999", col_types = list (male = col_character(), free_and_reduced_lunch = col_character(), received_high_school_diploma = col_character(), enrolled_in_college = col_character()))
skim(faketucky)
The only issue I have is that all of my histograms look like this, and do not show bars:
I tried googling and looking in StackOverflow but the explanations are a bit too technical. Any instruction on how to solve this is very well appreciated! Thanks!
David Vasquez
March 23, 2023
This is how histograms look
David Vasquez
March 23, 2023
The platform does not let me reply with the code I get instead of my histograms (it gets erased). It is a long string of several of these: U+2587. Sorry for any confusion.
David Keyes
March 23, 2023
I think this is a common issue on Windows, unfortunately. You could try running the function fix_windows_histograms() first and then running skim() again to see if that helps.
Aviv Nur
June 11, 2023
Hi, when I run the code to import the dataset, I get this error:
Error in exists(cacheKey, where = .rs.WorkingDataEnv, inherits = FALSE) : invalid first argument
Error in assign(cacheKey, frame, .rs.CachedDataEnv) : attempt to use zero-length variable name
What does it mean? Thank you
David Keyes
June 12, 2023
This actually seems to be a bug in the most recent version of RStudio (detailed and very technical discussion here). For now, I wouldn't worry about it. If you want to get rid of the message, you can download an earlier version of RStudio from here. Or, just wait until you are prompted to download a new version of RStudio and upgrade, which, I assume, will get rid of the message.
Nicole Sanchez
June 13, 2023
Hello again, when I run the data types code I keep getting:
> faketucky <- read_csv( "data/faketucky.csv",
Error in exists(cacheKey, where = .rs.WorkingDataEnv, inherits = FALSE) : invalid first argument
Error in assign(cacheKey, frame, .rs.CachedDataEnv) : attempt to use zero-length variable name
Thank you, Nicole
David Keyes
June 13, 2023
Please take a look at my response to Aviv, which deals with the same issue.
Nicole Sanchez
June 13, 2023
Sorry David, but that link to Aviv did not work. Thx
David Keyes
June 14, 2023
Scroll down and you should see it.