Import Our Data Again

This lesson is called Import Our Data Again, part of the R in 3 Months (Fall 2022) course.

Your Turn

Adjust your read_csv() code so that you import the data again
Use the na argument to tell read_csv() what data should be treated as missing
Use the col_types argument to make sure that sex_v2 gets imported as character data
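
The three steps above might be combined roughly as follows. This is only a sketch: the small CSV written to a temporary file is made up for illustration (the actual exercise reads data/faketucky.csv), and only the sex_v2 column name comes from the task above.

```r
library(readr)  # read_csv() and col_character() come from readr (part of the tidyverse)

# Hypothetical stand-in for the course data, written to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("student_id,sex_v2,gpa",
    "1,1,3.2",
    "2,0,999",
    "3,999,2.7"),
  csv_path
)

demo <- read_csv(
  csv_path,
  na = "999",                                  # treat 999 as missing in every column
  col_types = list(sex_v2 = col_character())   # force sex_v2 to be character
)

str(demo)
```

Columns that are not named in col_types are still guessed by read_csv(), so only the columns you want to override need to be listed.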

Have any questions? Put them below and we will help you out!

Christine Farrugia • March 16, 2021

I received an error related to the col_types function and I can't figure out how to correct it.

Import the faketucky data into a data frame called faketucky.

> faketucky library(tidyverse)
-- Attaching packages ------------------------------------------------ tidyverse 1.3.0 --
v ggplot2 3.3.3     v purrr   0.3.4
v tibble  3.1.0     v dplyr   1.0.5
v tidyr   1.1.3     v stringr 1.4.0
v readr   1.4.0     v forcats 0.5.1
-- Conflicts --------------------------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()

David Keyes Founder • March 16, 2021

Howdy! So you're not actually seeing an error. What you're seeing is just a couple of messages that show up when you load the tidyverse. I made a short video to explain it to you. Hope it helps!

Jyoni Shuler • March 17, 2021

Hi David, I noticed that the code is indented in specific ways depending on the command. Do we have to format our text with indents to have it aligned as you showed in your demonstration? If so, do we just use "tab" or are there other, more efficient ways to go about spacing our code?

Peleise Smith • March 19, 2021

Hi there, I ran the skim function and I get the following error message:

Error: attempt to use zero-length variable name

Not sure which variable it's referring to since there are 12 and R seems to have read all 12 variables in the csv.

Thank you for sharing your knowledge! :)

Christian Marin • April 4, 2021

Hi David, I'm doing a refresher and noticed only some of my code is working for the character variables. An error pops up with an "unexpected ')'" for school district and received high school diploma. Here is my code: col_types = list(first_high_school_attended = col_character(), race_ethnicity = col_character(), male = col_character(), enrolled_in_college = col_character(), received_high_school_diploma = col_character(), school_district = col_character()))

My output is only showing character variables for first_high_school_attended, school_district, and race_ethnicity, leaving the rest as numeric.

Adwoa Odoom • August 17, 2021

Hello David, For some reason when I try to change the character types for the variables you mentioned, none of them turn into a character. I copy-pasted the same exact code you used and I also kept getting this error: Error in read_csv("data/faketucky.csv", na = "999", col_types = list(enrolled_in_college = col_character(), : could not find function "read_csv". Could you assist?

Camille Antinori • January 7, 2022

Hi, why do I get a different output for the read_csv function than you get at 2:54? I get:
> faketucky<-read_csv("data/faketucky.csv")
Rows: 57855 Columns: 12
-- Column specification -------------------------------------
Delimiter: ","
chr (3): first_high_school_attended, school_district, rac...
dbl (9): student_id, male, free_and_reduced_lunch, percen...

i Use spec() to retrieve the full column specification for this data.
i Specify the column types or set show_col_types = FALSE to quiet this message.

John Franjione • January 17, 2022

A minor question... My skimr output is different than what it is in the video. I've got the variables right (i.e. there are now 7 character variables), but instead of showing integer counts of {missing, complete, n, min, max, empty, n_unique}, I've got: {n_missing, complete_rate, min, max, empty, n_unique, whitespace}

FWIW, the n_missing values are the same (14788 for enrolled_in_college, 860 for free_and_reduced_lunch, 14 for male).

Another difference is that your list of character variables is in alphabetical order (enrolled_in_college, first_high_school_attended, etc.), whereas mine is not (first_high_school_attended, school_district,... enrolled_in_college). I don't see any relationship between the order and any of the other column entries.

LILIANA CUBAS GAONA • January 18, 2022

Hello David, I am using the read.csv() function (because I did not find the read_csv() function), so I put "." instead of "", but it is still not working. I have also tried to put "", but I get the same error. Could you tell me where my error is, please? Many thanks
> # Import the faketucky data into a data frame called faketucky.
> faketucky
> # Import the faketucky data into a data frame called faketucky.
> faketucky <- read.csv("data/faketucky.csv", na = "999",col_types = list(enrolled_in_college = col_character(),
                                                                  free_and_reduced_lunch = col_character(),
                                                                  male = col_character(), recieved_high_school_diploma = col_character()))

Error in col_character() : could not find function "col_character"

Aditi Shah • March 19, 2022

When we add the na = "999" argument to the read_csv function, are we passing the number as a string? If so, how does the function remove the values that are displayed as 999.000000 in the gpa column?

Additionally, when renaming the columns, why does passing the column name in brackets of the col_character() function not work? I get an "unused argument" error.

Danielle Lowry • March 22, 2022

Strange question... I noticed that David's summary stats in the console when he runs the skim command are all nicely left justified. On my end, decimals line up and it looks so messy. Is there a way to correct this in R so my output looks more clean like David's?

Laura Estrela Muriel • March 24, 2022

Hi David, I get this error:
Error in read.table(file = file, header = header, sep = sep, quote = quote, : unused argument (col_types = list(list(), list(), list(), list()))
This is what I wrote:
faketucky <- read.csv("data/faketucky.csv", na = "999", col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), recieved_high_school_diploma = col_character()))

Thank you!

jeph mathias • March 24, 2022

Hi Charlie/David, Just trying the re-import data bit. I have this code in my script: faketucky col_types = list(enrolled_in_college = col_character(),
                  free_and_reduced_lunch = col_character(),
                  male = col_character(),
                  received_high_school_diploma = col_cha

any ideas? Thanks

jeph mathias • March 24, 2022

The second half of what I was trying to ask was about the error message that I get but ignore for now. I am going to try with the cloud version.

Gloria Li • May 17, 2022

Error in col_character() : could not find function "col_character"

Ravindra Mehta • July 3, 2022

I get an error message with the following code. Not sure what is wrong. Thanks for any guidance
> faketucky<-read_csv("data/faketucky.csv", na="999", col_types() = list(male = col_character(),free_and_reduced_lunch = col_character(),received_high_school_diploma = col_character(), enrolled_in_college = col_character()))
Error: unexpected '=' in "faketucky

Julieth Silao • September 21, 2022

Hello David. I got an error saying could not find function "col_character"

faketucky

Amanda Krantz • September 21, 2022

It seems like the new import went through fine, but my data doesn't match what I see of yours in the video (my result below).

── Variable type: character ──────────────────────────────────────────────────
  skim_variable                n_missing complete_rate min max empty n_unique
1 first_high_school_attended           0         1       4  14     0      393
2 school_district                      0         1       4  13     0      171
3 male                                14         1.00    1   1     0        2
4 race_ethnicity                       0         1       0  24   794        6
5 free_and_reduced_lunch             860         0.985   1   1     0        2
6 received_high_school_diploma         0         1       1   1     0        2
7 enrolled_in_college              14788         0.744   1   1     0        2
  whitespace
1          0
2          0
3          0
4          0
5          0
6          0
7          0

── Variable type: numeric ────────────────────────────────────────────────────
  skim_variable     n_missing complete_rate     mean        sd p0      p25
1 student_id                0         1     55922.   32333.     1 27910.
2 percent_absent          111         0.998     8.78    16.0    0     3.26
3 gpa                    2185         0.962     2.59     0.874  0     2.00
4 act_reading_score     14121         0.756    19.8      5.80   2    15
5 act_math_score        14101         0.756    19.0      4.65   1    16
       p50      p75   p100 hist
1 56070    83872.   111990 ▇▇▇▇▇
2     6.27    11.3    3153 ▇▁▁▁▁
3     2.66     3.28      4 ▁▂▆▇▇
4    19       23        36 ▁▆▇▅▂
5    17       22        36 ▁▃▇▃▁

Ellen Wilson • October 4, 2022

I am getting an error message: Error in read.table(file = file, header = header, sep = sep, quote = quote, : unused argument (col_types = list(list(), list(), list(), list()))

This is the code I entered: faketucky<- read.csv("data/faketucky.csv", na=999, col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))

Julieth Silao • October 29, 2022

faketucky <- read_csv("data/faketucky.csv", na = "999", col_type = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))

and I get an error saying col_character not found

Julieth Silao • October 29, 2022

#editing column in the table
faketucky <- read.csv("data/faketucky.csv", na = "999", col_type = list(enrolled_in_college =col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))
I also tried this but still get an error.

Dimeji Olawuyi • December 23, 2022

Hi David, I ran this: col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))

But I received this error: Error: unexpected ')' in "col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))"

Could you point out what's wrong?

Pegah Maleki • January 21, 2023

Hello! Trying to run this code and getting error message: col_types = list(enrolled_in_college)= col_character(),free_and_reduced_lunc h= col_character(), male = col_character(), received_high_school_diploma = col_character()))

Error message:

> col_types = list(enrolled_in_college)= col_character(),free_and_reduced_lunc h= col_character(), Error: unexpected ',' in "col_types = list(enrolled_in_college)= col_character(),"

Sarah Sexton • January 25, 2023

I am running the code faketucky <- read_csv("data/faketucky.csv"), na = "999", col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character())

And I keep getting the error Error: unexpected ',' in "faketucky <- read_csv("data/faketucky.csv")," but if I take out the commas it doesn't make the changes to the file?

The Evaluation Center • March 15, 2023

Can you explain why in the first couple of exercises where we read the CSV into R, we are able to use read.csv(), but when we tell R to read the CSV with the col_types() function, we have to update read.csv() to read_csv()?

Mark Adrian Salvador • March 22, 2023

I didn't get an error though. Probably it's fixed already.

David Vasquez • March 23, 2023

Hi! I am coding as indicated and the results are as expected.

My code: faketucky <- read_csv("data/faketucky.csv", na = "999", col_types = list (male = col_character(), free_and_reduced_lunch = col_character(), received_high_school_diploma = col_character(), enrolled_in_college = col_character()))

skim(faketucky)

The only issue I have is that all of my histograms look like this, and do not show bars:

I tried googling and looking in StackOverflow but the explanations are a bit too technical. Any instruction on how to solve this is very well appreciated! Thanks!

Aviv Nur • June 11, 2023

Hi, when I run the code to import the dataset, I get this error:
Error in exists(cacheKey, where = .rs.WorkingDataEnv, inherits = FALSE) : invalid first argument
Error in assign(cacheKey, frame, .rs.CachedDataEnv) : attempt to use zero-length variable name

What does it mean? Thank you

Nicole Sanchez • June 14, 2023

Hello again, when I did the data types code I keep getting,
> faketucky <- read_csv( "data/faketucky.csv",
                  na = "999",
                  col_types = list(enrolled_in_college = col_character(),
                                   free_and_reduced_lunch = col_character(),
                                   male = col_character(),
                                   received_high_school_diploma = col_character()))
Error in exists(cacheKey, where = .rs.WorkingDataEnv, inherits = FALSE) : invalid first argument
Error in assign(cacheKey, frame, .rs.CachedDataEnv) : attempt to use zero-length variable name
Thank you, Nicole