Getting Started With R
Import Our Data Again
This lesson is called Import Our Data Again, part of the Getting Started With R course.
Your Turn
Adjust your read_csv() code so that you import the data again.
Use the na argument to tell read_csv() what data should be treated as missing.
Use the col_types argument to make sure that sex_v2 gets imported as character data.
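The steps above can be sketched roughly as follows. This is a minimal illustration, not the lesson's exact solution: the tiny CSV written to a temporary file is a hypothetical stand-in for the course data, and the `male` column and the "999" missing-value code are borrowed from the comments below rather than from the lesson text.

```r
library(readr)  # read_csv() and col_character() come from readr (part of the tidyverse)

# Hypothetical three-row stand-in for the course CSV so the sketch runs on its own.
# With the real file, pass "data/faketucky.csv" instead of `path`.
path <- tempfile(fileext = ".csv")
writeLines(
  c("student_id,male,gpa",
    "1,1,3.2",
    "2,0,999",
    "3,999,2.5"),
  path
)

faketucky <- read_csv(
  path,
  na = "999",                               # treat the value 999 as missing
  col_types = list(male = col_character())  # import male as character; other columns are guessed
)
```

Note that `na` and `col_types` are arguments of readr's `read_csv()`; base R's `read.csv()` does not accept `col_types`, and `col_character()` is only available once readr (or the tidyverse) has been loaded.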
Christine Farrugia
March 16, 2021
I received an error related to the col_types function and I can't figure out how to correct it.
Import the faketucky data into a data frame called faketucky.
> faketucky library(tidyverse)
-- Attaching packages ---------------------------------------- tidyverse 1.3.0 --
v ggplot2 3.3.3     v purrr   0.3.4
v tibble  3.1.0     v dplyr   1.0.5
v tidyr   1.1.3     v stringr 1.4.0
v readr   1.4.0     v forcats 0.5.1
-- Conflicts ------------------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
David Keyes Founder
March 16, 2021
Howdy! So you're not actually seeing an error. What you're seeing is just a couple of messages that show up when you load the tidyverse. I made a short video to explain it to you. Hope it helps!
Jyoni Shuler
March 17, 2021
Hi David, I noticed that the code is indented in specific ways depending on the command. Do we have to format our text with indents to have it aligned as you showed in your demonstration? If so, do we just use "tab" or are there other, more efficient ways to go about spacing our code?
Peleise Smith
March 19, 2021
Hi there, I ran the skim function and I get the following error message:
Error: attempt to use zero-length variable name
Not sure which variable it's referring to since there are 12 and R seems to have read all 12 variables in the csv.
Thank you for sharing your knowledge! :)
Christian Marin
April 4, 2021
Hi David, I'm doing a refresher and noticed only some of my code is working for the character variables. An "unexpected ')'" error pops up for school district and received high school diploma. Here is my code: col_types = list(first_high_school_attended = col_character(), race_ethnicity = col_character(), male = col_character(), enrolled_in_college = col_character(), received_high_school_diploma = col_character(), school_district = col_character()))
My output is only showing character variables for first_high_school_attended, school_district, and race_ethnicity, leaving the rest as numeric.
Adwoa Odoom
August 17, 2021
Hello David, For some reason when I try to change the character types for the variables you mentioned, none of them turn into a character. I copy-pasted the same exact code you used and I also kept getting this error: Error in read_csv("data/faketucky.csv", na = "999", col_types = list(enrolled_in_college = col_character(), : could not find function "read_csv". Could you assist?
Camille Antinori
January 7, 2022
Hi, Why do I get a different output for the read_csv function than you get at 2:54? I get:
> faketucky<-read_csv("data/faketucky.csv")
Rows: 57855 Columns: 12
-- Column specification -------------------------------------
Delimiter: ","
chr (3): first_high_school_attended, school_district, rac...
dbl (9): student_id, male, free_and_reduced_lunch, percen...
i Use spec() to retrieve the full column specification for this data.
i Specify the column types or set show_col_types = FALSE to quiet this message.
John Franjione
January 17, 2022
A minor question... My skimr output is different than what it is in the video. I've got the variables right (i.e. there are now 7 character variables), but instead of showing integer counts of {missing, complete, n, min, max, empty, n_unique}, I've got: {n_missing, complete_rate, min, max, empty, n_unique, whitespace}
FWIW, the n_missing values are the same (14788 for enrolled_in_college, 860 for free_and_reduced_lunch, 14 for male).
Another difference is that your list of character variables is in alphabetical order (enrolled_in_college, first_high_school_attended, etc.), whereas mine is not (first_high_school_attended, school_district,... enrolled_in_college). I don't see any relationship between the order and any of the other column entries.
LILIANA CUBAS GAONA
January 18, 2022
Hello David, I am using the read.csv() function (because I did not find the read_csv() function), so I put "." instead of "_", but it is still not working. I have also tried "_", but I get the same error. Could you tell me where my error is, please? Many thanks
> # Import the faketucky data into a data frame called faketucky.
> faketucky <- read.csv("data/faketucky.csv", na = "999",col_types = list(enrolled_in_college = col_character(),
Error in col_character() : could not find function "col_character"
Aditi Shah
March 19, 2022
When we add the na = "999" argument to the read_csv function, are we passing the number as a string? If so, how does the function remove the values that are displayed as 999.000000 in the gpa column?
Additionally, when renaming the columns, why does passing the column name in brackets of the col_character() function not work? I get an "unused argument" error.
Danielle Lowry
March 22, 2022
Strange question... I noticed that David's summary stats in the console when he runs the skim command are all nicely left justified. On my end, the decimals line up and it looks so messy. Is there a way to correct this in R so my output looks cleaner, like David's?
Laura Estrela Muriel
March 24, 2022
Hi David, I get this error:
Error in read.table(file = file, header = header, sep = sep, quote = quote, : unused argument (col_types = list(list(), list(), list(), list()))
This is what I wrote:
faketucky <- read.csv("data/faketucky.csv", na = "999", col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), recieved_high_school_diploma = col_character()))
Thank you!
jeph mathias
March 24, 2022
Hi Charlie/David, just trying the re-import data bit. I have this code in my script: faketucky col_types = list(enrolled_in_college = col_character(),
Any ideas? Thanks
jeph mathias
March 24, 2022
The second half of what I was trying to ask was about the error message that I get but ignore for now. I am going to try with the cloud version.
Gloria Li
May 17, 2022
Error in col_character() : could not find function "col_character"
Ravindra Mehta
July 3, 2022
I get an error message with the following code. Not sure what is wrong. Thanks for any guidance.
> faketucky<-read_csv("data/faketucky.csv", na="999", col_types() = list(male = col_character(),free_and_reduced_lunch = col_character(),received_high_school_diploma = col_character(), enrolled_in_college = col_character()))
Error: unexpected '=' in "faketucky
Julieth Silao
September 21, 2022
Hello David. I got an error saying: could not find function "col_character"
faketucky
Amanda Krantz
September 21, 2022
It seems like the new import went through fine, but my data doesn't match what I see of yours in the video (my result below).
── Variable type: character ──────────────────────────────────────────────────
  skim_variable                n_missing complete_rate min max empty n_unique whitespace
1 first_high_school_attended           0         1       4  14     0      393          0
2 school_district                      0         1       4  13     0      171          0
3 male                                14         1.00    1   1     0        2          0
4 race_ethnicity                       0         1       0  24   794        6          0
5 free_and_reduced_lunch             860         0.985   1   1     0        2          0
6 received_high_school_diploma         0         1       1   1     0        2          0
7 enrolled_in_college              14788         0.744   1   1     0        2          0
── Variable type: numeric ────────────────────────────────────────────────────
  skim_variable     n_missing complete_rate     mean        sd p0      p25      p50      p75   p100 hist
1 student_id                0         1     55922.   32333.     1 27910.   56070    83872.   111990 ▇▇▇▇▇
2 percent_absent          111         0.998     8.78    16.0    0     3.26     6.27    11.3    3153 ▇▁▁▁▁
3 gpa                    2185         0.962     2.59     0.874  0     2.00     2.66     3.28      4 ▁▂▆▇▇
4 act_reading_score     14121         0.756    19.8      5.80   2    15       19       23        36 ▁▆▇▅▂
5 act_math_score        14101         0.756    19.0      4.65   1    16       17       22        36 ▁▃▇▃▁
Ellen Wilson
October 4, 2022
I am getting an error message: Error in read.table(file = file, header = header, sep = sep, quote = quote, : unused argument (col_types = list(list(), list(), list(), list()))
This is the code I entered: faketucky<- read.csv("data/faketucky.csv", na=999, col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))
Julieth Silao
October 29, 2022
faketucky <- read_csv("data/faketucky.csv", na = "999", col_type = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))
and I get an error saying col_character not found
Julieth Silao
October 29, 2022
#editing column in the table
faketucky <- read.csv("data/faketucky.csv", na = "999", col_type = list(enrolled_in_college =col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))
I also tried this but still get an error
Dimeji Olawuyi
December 23, 2022
Hi David, I run this col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))
But received Error: unexpected ')' in "col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))"
Could you point out what's wrong?
Pegah Maleki
January 21, 2023
Hello! Trying to run this code and getting error message: col_types = list(enrolled_in_college)= col_character(),free_and_reduced_lunc h= col_character(), male = col_character(), received_high_school_diploma = col_character()))
Error message:
> col_types = list(enrolled_in_college)= col_character(),free_and_reduced_lunc h= col_character(), Error: unexpected ',' in "col_types = list(enrolled_in_college)= col_character(),"
Sarah Sexton
January 25, 2023
I am running the code faketucky <- read_csv("data/faketucky.csv"), na = "999", col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character())
And I keep getting the error Error: unexpected ',' in "faketucky <- read_csv("data/faketucky.csv")," but if I take out the commas it doesn't make the changes to the file?
The Evaluation Center
March 15, 2023
Can you explain why in the first couple of exercises where we read the CSV into R, we are able to use read.csv(), but when we tell R to read the CSV with the col_types() function, we have to update read.csv() to read_csv()?
Mark Adrian Salvador
March 22, 2023
I didn't get an error though. Probably it's fixed already.
David Vasquez
March 23, 2023
Hi! I am coding as indicated and the results are as expected.
My code: faketucky <- read_csv("data/faketucky.csv", na = "999", col_types = list (male = col_character(), free_and_reduced_lunch = col_character(), received_high_school_diploma = col_character(), enrolled_in_college = col_character()))
skim(faketucky)
The only issue I have is that all of my histograms look like this, and do not show bars:
I tried googling and looking on Stack Overflow, but the explanations are a bit too technical. Any instruction on how to solve this is much appreciated! Thanks!
Aviv Nur
June 11, 2023
Hi, when I run the code to import the dataset, I get this error:
Error in exists(cacheKey, where = .rs.WorkingDataEnv, inherits = FALSE) : invalid first argument
Error in assign(cacheKey, frame, .rs.CachedDataEnv) : attempt to use zero-length variable name
What does it mean? Thank you
Nicole Sanchez
June 14, 2023
Hello again, when I ran the data types code I keep getting:
> faketucky <- read_csv( "data/faketucky.csv",
Error in exists(cacheKey, where = .rs.WorkingDataEnv, inherits = FALSE) : invalid first argument
Error in assign(cacheKey, frame, .rs.CachedDataEnv) : attempt to use zero-length variable name
Thank you, Nicole