Import Our Data Again

Your Turn

  1. Adjust your read_csv() code so that you import the data again

  2. Use the na argument to tell read_csv() what data should be treated as missing

  3. Use the col_types argument to make sure that sex_v2 gets imported as character data

Have any questions? Put them below and we will help you out!

Christine Farrugia

March 15, 2021

I received an error related to the col_types function and I can't figure out how to correct it.

Import the faketucky data into a data frame called faketucky.

> faketucky
library(tidyverse)
-- Attaching packages ------------------------------------- tidyverse 1.3.0 --
v ggplot2 3.3.3     v purrr   0.3.4
v tibble  3.1.0     v dplyr   1.0.5
v tidyr   1.1.3     v stringr 1.4.0
v readr   1.4.0     v forcats 0.5.1
-- Conflicts ---------------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()

David Keyes

March 15, 2021

Howdy! So you're not actually seeing an error. What you're seeing is just a couple messages that show up when you load the tidyverse. I made a short video to explain it to you. Hope it helps!

Christine Farrugia

March 16, 2021

Thank you for your response! The video does not have any sound. The bigger issue is how to fix the error when I try to run col_character. This is the message I get:

> faketucky <- read.csv("data/faketucky.csv",
+                       coltypes = list(enrolled_in_college = col_character(),
+                                       free_and_reduced_lunch = col_character(),
+                                       male = col_character(),
+                                       received_high_school_diploma = col_character()))
Error in col_character() : could not find function "col_character"

How do I fix this?

Christine Farrugia

March 16, 2021

I just realized there was a typo in the last code block I sent (coltypes should be col_types), but the error still occurs when I use col_types.

David Keyes

March 16, 2021

Oh jeez, sorry about the video with no sound! In terms of your issue, the reason it is occurring is that you're using the read.csv() function, not read_csv() (note the _ in place of the .). Try it again with that and it should work!
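
To make the difference concrete, here is a minimal sketch. It assumes the readr package (part of the tidyverse) is installed. Instead of data/faketucky.csv it writes a tiny made-up CSV to a temporary file, so the rows and the demo_path and demo names are invented for illustration only.

```r
library(readr)  # read_csv() and col_character() both come from readr

# Hypothetical stand-in for data/faketucky.csv (made-up rows)
demo_path <- tempfile(fileext = ".csv")
writeLines(c("student_id,male,enrolled_in_college",
             "1,1,0",
             "2,0,1",
             "3,999,1"),
           demo_path)

# read_csv() (underscore) understands the na and col_types arguments
demo <- read_csv(demo_path,
                 na = "999",
                 col_types = list(male = col_character(),
                                  enrolled_in_college = col_character()))

# Inspect the resulting column types
str(demo)
```

read.csv() (with a dot) is a base R function with different arguments, and col_character() only exists once readr or the tidyverse has been loaded, which is why the original call failed.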

Christine Farrugia

March 16, 2021

Thank you!! I have spent hours googling this issue. I greatly appreciate your fast help!

Jyoni Shuler

March 16, 2021

Hi David, I noticed that the code is indented in specific ways depending on the command. Do we have to format our text with indents to have it aligned as you showed in your demonstration? If so, do we just use "tab" or are there other, more efficient ways to go about spacing our code?

Jyoni Shuler

March 16, 2021

Oh, never mind - I see it formats automatically!

David Keyes

March 16, 2021

No worries. You can also indent automatically using command+I (Mac) or control+I (Windows). See this demonstration.

Peleise Smith

March 18, 2021

Hi there, I ran the skim function and I get the following error message:

Error: attempt to use zero-length variable name

Not sure which variable it's referring to since there are 12 and R seems to have read all 12 variables in the csv.

Thank you for sharing your knowledge! :)

David Keyes

March 18, 2021

Could you post your code so I can see exactly what the issue might be?

Christian Marin

April 3, 2021

Hi David, I'm doing a refresher and noticed only some of my code is working for the character variables. An error pops up with an "unexpected ')'" for school district and received high school diploma. Here is my code:

col_types = list(first_high_school_attended = col_character(), race_ethnicity = col_character(), male = col_character(), enrolled_in_college = col_character(), received_high_school_diploma = col_character(), school_district = col_character()))

My output is only showing character variables for first_high_school_attended, school_district, and race_ethnicity, leaving the rest as numeric.

David Keyes

April 5, 2021

You've got one too many ) at the end of your code :)

Adwoa Odoom

August 17, 2021

Hello David, For some reason when I try to change the character types for the variables you mentioned, none of them turn into a character. I copy-pasted the same exact code you used and I also kept getting this error: Error in read_csv("data/faketucky.csv", na = "999", col_types = list(enrolled_in_college = col_character(), : could not find function "read_csv". Could you assist?

David Keyes

August 18, 2021

Are you sure you loaded the tidyverse package (i.e. have the code library(tidyverse) somewhere in your R script file)? If that function can't be found, it almost certainly means you haven't loaded the package. Let me know if that helps!

Camille Antinori

January 7, 2022

Hi, why do I get a different output for the read_csv function than you get at 2:54? I get:

> faketucky <- read_csv("data/faketucky.csv")
Rows: 57855 Columns: 12
-- Column specification -------------------------------------
Delimiter: ","
chr (3): first_high_school_attended, school_district, rac...
dbl (9): student_id, male, free_and_reduced_lunch, percen...

i Use spec() to retrieve the full column specification for this data.
i Specify the column types or set show_col_types = FALSE to quiet this message.

David Keyes

January 7, 2022

Are you sure you used read_csv(), not read.csv()? Note the _ (not .) between read and csv.

Camille Antinori

January 7, 2022

Hi David, The code is just as I pasted above. Maybe it is the version I am using: R version 4.0.2 (2020-06-22) -- "Taking Off Again"

David Keyes

January 8, 2022

I just ran it and I get the same thing as you now too. I don't think it's the version of R, but rather the version of the readr package. It's been updated since I recorded this video and it looks like the output messages are different now. Hope that answers it for you!
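
For anyone who wants to quiet or inspect the newer message, here is a small sketch based on the hints printed in that message. It assumes readr 2.0 or later (where show_col_types is available) and uses a made-up temporary CSV instead of data/faketucky.csv; demo_path and demo are illustrative names.

```r
library(readr)

# Hypothetical stand-in for data/faketucky.csv (made-up rows)
demo_path <- tempfile(fileext = ".csv")
writeLines(c("student_id,school_district,gpa",
             "1,District A,3.2",
             "2,District B,2.7"),
           demo_path)

# Same import, but without printing the "Column specification" message
demo <- read_csv(demo_path, show_col_types = FALSE)

# Retrieve the full column specification that read_csv() guessed
spec(demo)

# Check which version of readr is installed
packageVersion("readr")
```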

Camille Antinori

January 8, 2022

Thanks for checking!

John Franjione

January 16, 2022

A minor question... My skimr output is different than what it is in the video. I've got the variables right (i.e. there are now 7 character variables), but instead of showing integer counts of {missing, complete, n, min, max, empty, n_unique}, I've got: {n_missing, complete_rate, min, max, empty, n_unique, whitespace}

FWIW, the n_missing values are the same (14788 for enrolled_in_college, 860 for free_and_reduced_lunch, 14 for male).

Another difference is that your list of character variables is in alphabetical order (enrolled_in_college, first_high_school_attended, etc.), whereas mine is not (first_high_school_attended, school_district,... enrolled_in_college). I don't see any relationship between the order and any of the other column entries.

David Keyes

January 18, 2022

This is almost certainly a difference in how these packages work today versus how they worked a couple years ago when I made this course. Nothing different is happening under the hood, it's just the message they give appears to be slightly different. Let me know if you have other questions!

LILIANA CUBAS GAONA

January 18, 2022

Hello David, I am using the read.csv() function (because I did not find the read_csv() function), so I put "." instead of "_", but it is still not working. I have also tried to put "_", but I get the same error. Could you tell me where my error is, please? Many thanks

> # Import the faketucky data into a data frame called faketucky.
> faketucky
> # Import the faketucky data into a data frame called faketucky.
> faketucky <- read.csv("data/faketucky.csv", na = "999",col_types = list(enrolled_in_college = col_character(),
+                                                                   free_and_reduced_lunch = col_character(),
+                                                                   male = col_character(), recieved_high_school_diploma = col_character()))
Error in col_character() : could not find function "col_character"

David Keyes

January 18, 2022

Have you installed and loaded the tidyverse package? read_csv() won't work without that.

LILIANA CUBAS GAONA

January 19, 2022

Many thanks for your answer, David. I had to install the tidyverse package again. It seems that I have to do it each time I open RStudio. It also happens to me with skimr. Any suggestions about that? Thanks in advance.

David Keyes

January 19, 2022

You should only have to install packages once per computer. By that I mean running install.packages("tidyverse"). You do need to load any package you want to use each time you open RStudio. That means running library(tidyverse) at the top of your code each time. Does that clarify things?
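
As a rough sketch of what that looks like at the top of a script (assuming the tidyverse has already been installed on your computer):

```r
# Run once per computer. Left commented out so it is not re-run every session.
# install.packages("tidyverse")
# install.packages("skimr")

# Run at the top of your script in every new R session
library(tidyverse)

# After loading, functions such as read_csv() and col_character() can be found
exists("read_csv")
exists("col_character")
```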

Aditi Shah

March 19, 2022

When we add the na = "999" argument to the read_csv function, are we passing the number as a string? If so, how does the function remove the values that are displayed as 999.000000 in the gpa column?

Additionally, when renaming the columns, why does passing the column name in brackets of the col_character() function not work? I get an "unused argument" error.

Charlie Hadley

March 21, 2022

Hi Aditi! When using the read_csv(..., na = "999") we need to give the na value as a string because when read_csv() first parses the data file it has not yet determined what type of data each column contains. The column type is decided afterwards by the parse_guess() function. For a demonstration, let's create a fake dataset that contains 999 in both a character and numeric column:

library(tidyverse)

tribble(
  ~name, ~age,
  "Charlie", 34,
  "999", 30,
  "David", 999
) %>% 
  write_csv("data/fake-data.csv")

read_csv("data/fake-data.csv", na = "999")

See how 999 is converted to NA in both columns.

I'm not entirely sure about your renaming column question. The col_names argument can be given a vector of new column names, but if so the first row of the file will be assumed to be data, as you'll see here:

read_csv("data/faketucky.csv",
         col_names = c("student_id", "first_high_school_attended", "school_district" = "education_district", 
                       "male", "race_ethnicity", "free_and_reduced_lunch", "percent_absent", 
                       "gpa", "act_reading_score", "act_math_score", "received_high_school_diploma", 
                       "enrolled_in_college"))  

Could you provide some more detail about your col_names argument question? Thanks, Charlotte

Aditi Shah

March 23, 2022

Thank you for your response!

About the second question, I'm wondering why this doesn't work:

read_csv("data/faketucky.csv", na = "999",
                      col_types = list(col_character(enrolled_in_college),
                                        col_character(free_and_reduced_lunch),
                                        col_character(male),
                                        col_character(received_high_school_diploma)))

Charlie Hadley

March 23, 2022

col_character() and the other col_*() functions do not take a column name as an argument. In an unnamed list, it is their position that determines which column each one applies to. So if we wanted to force the first 4 columns to be treated as character columns we would write

read_csv("data/faketucky.csv", na = "999",
         col_types = list(col_character(),
                          col_character(),
                          col_character(),
                          col_character()))

It is possible to target columns by their name through the use of a named list, eg

read_csv("data/faketucky.csv", na = "999",
         col_types = list(enrolled_in_college = col_character()))

Aditi Shah

March 23, 2022

I see. This is a tangential question, but I tried to run the following code to avoid repeating the col_character() function in your example:

faketucky <- read_csv("data/faketucky.csv", na = "999",
                      col_types = list(rep(col_character(),4)))

faketucky <- read_csv("data/faketucky.csv", na = "999",
                      col_types = rep(list(col_character()),4))

The second one worked but not the first. If it's not too much of a hassle, could you please explain why that is?
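
One likely explanation, offered as a sketch rather than an authoritative answer: a collector such as col_character() appears to be stored as an empty list with a class attached (the read.csv() error messages quoted elsewhere in this thread print these collectors as list()). If so, rep() on the bare collector has no elements to repeat, whereas wrapping the collector in list() first gives rep() one element to repeat four times. This assumes readr is installed; bare, first_attempt and second_attempt are illustrative names.

```r
library(readr)

# rep() on the bare collector: there are no elements inside it to repeat
bare <- rep(col_character(), 4)
length(bare)

# Wrapping that result in list() gives a list with a single element,
# and that element is no longer a collector
first_attempt <- list(bare)
length(first_attempt)
inherits(first_attempt[[1]], "collector")

# Wrapping the collector in list() first gives rep() one element to repeat
second_attempt <- rep(list(col_character()), 4)
length(second_attempt)
inherits(second_attempt[[1]], "collector")
```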

Danielle Lowry

March 21, 2022

Strange question... I noticed that David's summary stats in the console when he runs the skim command are all nicely left justified. On my end, decimals line up and it looks so messy. Is there a way to correct this in R so my output looks more clean like David's?

Charlie Hadley

March 22, 2022

Hi Danielle! That's not a strange question. The {skimr} package does as good a job as possible to neatly fit its output into the terminal window dependent on its current width. The package is almost entirely designed to give a quick overview of data and it would require unpicking the package to customise the appearance of the output. As you get into using {ggplot2} for data visualisations we'll show you (and you can ask us questions) about making these look pixel perfect.

Laura Estrela Muriel

March 23, 2022

Hi David, I get this error:

Error in read.table(file = file, header = header, sep = sep, quote = quote, : unused argument (col_types = list(list(), list(), list(), list()))

This is what I wrote:

faketucky <- read.csv("data/faketucky.csv", na = "999", col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), recieved_high_school_diploma = col_character()))

Thank you!

Charlie Hadley

March 23, 2022

Hi Laura! This is a very common error: instead of using the read_csv() function from the tidyverse, you've written read.csv(). The error is generated because read.csv() does not have a col_types argument. If you replace read.csv() with read_csv(), you should find it works. Cheers, Charlotte

Laura Estrela Muriel

March 23, 2022

Oh, right. Thank you Charlotte!!

Isidora Murillo

May 16, 2023

Hi Charlotte! I have the same problem as Laura, so I changed the function to read_csv(), but it did not work. Here is what I wrote (it does not recognize col_types).

> read_csv("/Users/isidoramurillo/Desktop/R/getting-started-master/data/faketucky.csv",
+      na= "999", (col_t = list(enrolled_in_college = col_character(),
+                       free_and_reduced_lunch = col_character(),
+                       male = col_character(),
+                       received_high_school_diploma = col_character()))
+ skim(faketucky)
Error: unexpected symbol in:
" received_high_school_diploma = col_character()))
skim"

Thanks!

David Keyes

May 16, 2023

Hi there! Did you use col_types or col_t? In your comment I see the latter. Please let me know and we can take it from there!

Isidora Murillo

May 18, 2023

Hi David, I wrote col_types () (might have deleted the letters when I copied the text)

jeph mathias

March 23, 2022

Hi Charlie/David, just trying the re-import data bit. I have this code in my script:

faketucky col_types = list(enrolled_in_college = col_character(),
+                   free_and_reduced_lunch = col_character(),
+                   male = col_character(),
+                   received_high_school_diploma = col_cha

Any ideas? Thanks

jeph mathias

March 23, 2022

The second half of what I was trying to ask was about the error message that I get, but ignore that for now. I am going to try with the cloud version.

Gloria Li

May 17, 2022

Error in col_character() : could not find function "col_character"

Charlie Hadley

May 18, 2022

Hello Gloria,

Could I see the code you're running? It's likely that this is due to one of two common mistakes:

  • Not loading the tidyverse, i.e. library("tidyverse")
  • Accidentally using the read.csv() function instead of read_csv()

Cheers,

Charlie

Ravindra Mehta

July 3, 2022

I get an error message with the following code. Not sure what is wrong. Thanks for any guidance.

> faketucky<-read_csv("data/faketucky.csv", na="999", col_types() = list(male = col_character(),free_and_reduced_lunch = col_character(),received_high_school_diploma = col_character(), enrolled_in_college = col_character()))
Error: unexpected '=' in "faketucky

Charlie Hadley

July 12, 2022

Hello Ravindra,

This error is caused by the round brackets you added after the col_types argument name. It is quite technical to explain, so here's a short video.

Thanks,

Charlie
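
For readers who cannot watch the video, here is a small sketch of the same point. It only asks R to parse two pieces of code as text, without running them; f is a made-up placeholder function name, and ok and bad are illustrative variable names.

```r
# Argument names are written without parentheses: col_types = ...
ok <- tryCatch(parse(text = "f(col_types = list())"),
               error = function(e) e)

# With () after the name, R reads col_types() as a function call, and a
# function call followed by "=" is not valid inside an argument list
bad <- tryCatch(parse(text = "f(col_types() = list())"),
                error = function(e) e)

inherits(ok, "error")   # did the first version fail to parse?
inherits(bad, "error")  # did the second version fail to parse?
```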

Julieth Silao

September 21, 2022

Thanks, but the problem is faketucky

Julieth Silao

September 21, 2022

Hello David. I got an error saying could not find function "col_character"

faketucky

Have you loaded the tidyverse package using library(tidyverse) in your code prior to running this?

Amanda Krantz

September 21, 2022

It seems like the new import went through fine, but my data doesn't match what I see of yours in the video (my result below).

── Variable type: character ──────────────────────────────────────────────────
  skim_variable                n_missing complete_rate min max empty n_unique whitespace
1 first_high_school_attended           0         1       4  14     0      393          0
2 school_district                      0         1       4  13     0      171          0
3 male                                14         1.00    1   1     0        2          0
4 race_ethnicity                       0         1       0  24   794        6          0
5 free_and_reduced_lunch             860         0.985   1   1     0        2          0
6 received_high_school_diploma         0         1       1   1     0        2          0
7 enrolled_in_college              14788         0.744   1   1     0        2          0

── Variable type: numeric ────────────────────────────────────────────────────
  skim_variable     n_missing complete_rate     mean       sd p0     p25      p50     p75   p100 hist
1 student_id                0         1     55922.   32333.    1 27910.   56070   83872.  111990 ▇▇▇▇▇
2 percent_absent          111         0.998     8.78    16.0   0     3.26     6.27    11.3   3153 ▇▁▁▁▁
3 gpa                    2185         0.962     2.59     0.874 0     2.00     2.66     3.28     4 ▁▂▆▇▇
4 act_reading_score     14121         0.756    19.8      5.80  2    15       19       23       36 ▁▆▇▅▂
5 act_math_score        14101         0.756    19.0      4.65  1    16       17       22       36 ▁▃▇▃▁

So I think what happened is that the default behavior of the read_csv() function has changed slightly since I recorded this video. As a result, it may import data with slightly different column types. Nothing to worry about, as you also know now how to change column types.

Ellen Wilson

October 3, 2022

I am getting an error message: Error in read.table(file = file, header = header, sep = sep, quote = quote, : unused argument (col_types = list(list(), list(), list(), list()))

This is the code I entered: faketucky<- read.csv("data/faketucky.csv", na=999, col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))

Ellen Wilson

October 3, 2022

Ah. I just realized I had read.csv instead of read_csv! But, I'm still getting an error. Now it says: Error in enc2utf8(na) : argument is not a character vector

My code is now: faketucky<- read_csv("data/faketucky.csv", na=999, col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))

David Keyes

October 3, 2022

Try putting the 999 in quotes ("999").
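
Here is a minimal sketch of why the quotes matter, assuming readr is installed and using a made-up temporary CSV (demo_path and demo are illustrative names). The na argument lists the strings in the raw file that should be read as missing, so it has to be a character vector.

```r
library(readr)

# Hypothetical stand-in for data/faketucky.csv (made-up rows)
demo_path <- tempfile(fileext = ".csv")
writeLines(c("student_id,gpa",
             "1,3.5",
             "2,999"),
           demo_path)

# na = "999" (a string) marks every cell whose text is exactly 999 as missing.
# col_types = cols() keeps the default guessing and quiets the import message.
demo <- read_csv(demo_path, na = "999", col_types = cols())

demo$gpa
```

One thing to be aware of: setting na = "999" replaces readr's default of c("", "NA"), so if empty cells should also count as missing you could pass na = c("", "NA", "999") instead.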

Ellen Wilson

October 4, 2022

Thanks--that worked!

Julieth Silao

October 29, 2022

faketucky <- read_csv("data/faketucky.csv", na = "999", col_type = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))

and i get error saying col_character not found

Julieth Silao

October 29, 2022

#editing column in the table
faketucky <- read.csv("data/faketucky.csv", na = "999", col_type = list(enrolled_in_college =col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))

I also tried this but still get an error.

David Keyes

October 31, 2022

You need to use read_csv(), not read.csv().

Dimeji Olawuyi

December 22, 2022

Hi David, I run this col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))

But received Error: unexpected ')' in "col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))"

Could you point out what's wrong?

Pegah Maleki

January 20, 2023

Hello! Trying to run this code and getting error message: col_types = list(enrolled_in_college)= col_character(),free_and_reduced_lunc h= col_character(), male = col_character(), received_high_school_diploma = col_character()))

Error message:

> col_types = list(enrolled_in_college)= col_character(),free_and_reduced_lunc h= col_character(), Error: unexpected ',' in "col_types = list(enrolled_in_college)= col_character(),"

David Keyes

January 20, 2023

It looks like you have a space between the c and h in lunch. Can you check that and see if it works if you fix it?

Sarah Sexton

January 24, 2023

I am running the code faketucky <- read_csv("data/faketucky.csv"), na = "999", col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character())

And I keep getting the error Error: unexpected ',' in "faketucky <- read_csv("data/faketucky.csv")," but if I take out the commas it doesn't make the changes to the file?

David Keyes

January 24, 2023

Try removing this parenthesis: https://show.rfor.us/61Q7v9f9

Sarah Sexton

January 25, 2023

That worked, thank you!

The Evaluation Center

March 15, 2023

Can you explain why in the first couple of exercises where we read the CSV into R, we are able to use read.csv(), but when we tell R to read the CSV with the col_types() function, we have to update read.csv() to read_csv()?

David Keyes

March 15, 2023

It's a good question! The reason is that, when we didn't use any arguments, both functions worked. However, the col_types argument exists in read_csv() but not in read.csv(). Hope that helps!
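
For comparison, base R's read.csv() has its own arguments for the same two jobs: na.strings for missing values and colClasses for column types. The sketch below is not part of the lesson. It uses a made-up temporary CSV, and demo_path and base_demo are illustrative names.

```r
# Hypothetical stand-in for data/faketucky.csv (made-up rows)
demo_path <- tempfile(fileext = ".csv")
writeLines(c("student_id,male,gpa",
             "1,1,3.5",
             "2,0,999"),
           demo_path)

# Base R equivalents of read_csv()'s na and col_types arguments
base_demo <- read.csv(demo_path,
                      na.strings = "999",                  # like na = "999"
                      colClasses = c(male = "character"))  # like col_types

str(base_demo)
```

The course sticks with read_csv(), so this is only meant to show why an argument written for one function is not recognized by the other.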

Mark Adrian Salvador

March 22, 2023

I didn't get an error though. Probably it's fixed already.

David Vasquez

March 23, 2023

Hi! I am coding as indicated and the results are as expected.

My code: faketucky <- read_csv("data/faketucky.csv", na = "999", col_types = list (male = col_character(), free_and_reduced_lunch = col_character(), received_high_school_diploma = col_character(), enrolled_in_college = col_character()))

skim(faketucky)

The only issue I have is that all of my histograms look like this, and do not show bars:

I tried googling and looking in StackOverflow but the explanations are a bit too technical. Any instruction on how to solve this is very well appreciated! Thanks!

David Vasquez

March 23, 2023

This is how histograms look

David Vasquez

March 23, 2023

The platform does not let me reply with the code I get instead of my histograms (it gets erased). It is a long string of several of these: U+2587. Sorry for any confusion.

David Keyes

March 23, 2023

I think this is a common issue on Windows, unfortunately. You could try running the function fix_windows_histograms() first and then running skim() again to see if that helps.

Hi, when I run the code to import the dataset, I get this error:

Error in exists(cacheKey, where = .rs.WorkingDataEnv, inherits = FALSE) : invalid first argument
Error in assign(cacheKey, frame, .rs.CachedDataEnv) : attempt to use zero-length variable name

What does it mean? Thank you

David Keyes

June 12, 2023

This actually seems to be a bug in the most recent version of RStudio (detailed and very technical discussion here). For now, I wouldn't worry about it. If you want to get rid of the message, you can download an earlier version of RStudio from here. Or, just wait until you are prompted to download a new version of RStudio and upgrade, which, I assume, will get rid of the message.

Nicole Sanchez

June 13, 2023

Hello again, when I ran the data types code I keep getting:

> faketucky <- read_csv( "data/faketucky.csv",
+                   na = "999",
+                   col_types = list(enrolled_in_college = col_character(),
+                                    free_and_reduced_lunch = col_character(),
+                                    male = col_character(),
+                                    received_high_school_diploma = col_character()))
Error in exists(cacheKey, where = .rs.WorkingDataEnv, inherits = FALSE) : invalid first argument
Error in assign(cacheKey, frame, .rs.CachedDataEnv) : attempt to use zero-length variable name

Thank you, Nicole

David Keyes

June 13, 2023

Please take a look at my response to Aviv, which deals with the same issue.

Nicole Sanchez

June 13, 2023

Sorry David, but that link to Aviv did not work. Thx

David Keyes

June 14, 2023

Scroll down and you should see it.