Import Our Data Again
This lesson is called Import Our Data Again, part of the R in 3 Months (Fall 2022) course.
Your Turn
Adjust your read_csv() code so that you import the data again:
1. Use the na argument to tell read_csv() what data should be treated as missing.
2. Use the col_types argument to make sure that sex_v2 gets imported as character data.
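As a rough illustration of the two steps above, here is a minimal sketch. It assumes the readr package is installed and uses a small made-up CSV written to a temporary file; the file contents and the object name demo are hypothetical stand-ins, not the course data. Only read_csv(), na, col_types, col_character() and sex_v2 come from the exercise itself.

```r
library(readr)

# Hypothetical stand-in data written to a temporary file so the sketch runs
# anywhere; in the exercise you would point read_csv() at your own CSV file.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "student_id,sex_v2,gpa",
    "1,1,3.2",
    "2,0,999",
    "3,999,2.7"
  ),
  csv_path
)

demo <- read_csv(
  csv_path,
  na = "999",                                 # treat the value 999 as missing
  col_types = list(sex_v2 = col_character())  # import sex_v2 as character data
)

demo
```

Columns not named in col_types are still guessed by read_csv(). Note that read_csv() and col_character() both come from readr (loaded as part of the tidyverse); base R's read.csv() has no col_types argument, so the two functions are not interchangeable here.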
Have any questions? Put them below and we will help you out!
Christine Farrugia • March 16, 2021
I received an error related to the col_types function and I can't figure out how to correct it.
Import the faketucky data into a data frame called faketucky.
> faketucky library(tidyverse)
-- Attaching packages -------------------------------------- tidyverse 1.3.0 --
v ggplot2 3.3.3     v purrr   0.3.4
v tibble  3.1.0     v dplyr   1.0.5
v tidyr   1.1.3     v stringr 1.4.0
v readr   1.4.0     v forcats 0.5.1
-- Conflicts ----------------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
David Keyes Founder • March 16, 2021
Howdy! So you're not actually seeing an error. What you're seeing is just a couple messages that show up when you load the tidyverse. I made a short video to explain it to you. Hope it helps!
Jyoni Shuler • March 17, 2021
Hi David, I noticed that the code is indented in specific ways depending on the command. Do we have to format our text with indents to have it aligned as you showed in your demonstration? If so, do we just use "tab" or are there other, more efficient ways to go about spacing our code?
Peleise Smith • March 19, 2021
Hi there, I ran the skim function and I get the following error message:
Error: attempt to use zero-length variable name
Not sure which variable it's referring to since there are 12 and R seems to have read all 12 variables in the csv.
Thank you for sharing your knowledge! :)
Christian Marin • April 4, 2021
Hi David, I'm doing a refresher and noticed only some of my code is working for the character variables. An error pops up with an 'unexpected ')' for school district and received high school diploma.. Here is my code: col_types = list(first_high_school_attended = col_character(), race_ethnicity = col_character(), male = col_character(), enrolled_in_college = col_character(), received_high_school_diploma = col_character(), school_district = col_character()))
My output is only showing character variables for first_high_school attended, and school district, and race_ethnicity, leaving the rest in numeric.
Adwoa Odoom • August 17, 2021
Hello David, For some reason when I try to change the character types for the variables you mentioned, none of them turn into a character. I copy-pasted the same exact code you used and I also kept getting this error: Error in read_csv("data/faketucky.csv", na = "999", col_types = list(enrolled_in_college = col_character(), : could not find function "read_csv". Could you assist?
Camille Antinori • January 7, 2022
Hi, Why do I get a different output for the read_csv function than you get at 2:54? I get:
> faketucky<-read_csv("data/faketucky.csv")
Rows: 57855 Columns: 12
-- Column specification -------------------------------------
Delimiter: ","
chr (3): first_high_school_attended, school_district, rac...
dbl (9): student_id, male, free_and_reduced_lunch, percen...
i Use spec() to retrieve the full column specification for this data.
i Specify the column types or set show_col_types = FALSE to quiet this message.
John Franjione • January 17, 2022
A minor question... My skimr output is different than what it is in the video. I've got the variables right (i.e. there are now 7 character variables), but instead of showing integer counts of {missing, complete, n, min, max, empty, n_unique}, I've got: {n_missing, complete_rate, min, max, empty, n_unique, whitespace}
FWIW, the n_missing values are the same (14788 for enrolled_in_college, 860 for free_and_reduced_lunch, 14 for male).
Another difference is that your list of character variables is in alphabetical order (enrolled_in_college, first_high_school_attended, etc.), whereas mine is not (first_high_school_attended, school_district,... enrolled_in_college). I don't see any relationship between the order and any of the other column entries.
LILIANA CUBAS GAONA • January 18, 2022
Hello David, I am using the read.csv() function (because I did not find the read_csv() function), so I put "." instead of "_", but it is still not working. I have also tried to put "_", but I get the same error. Could you tell me where my error is, please? Many thanks
> # Import the faketucky data into a data frame called faketucky.
> faketucky
> # Import the faketucky data into a data frame called faketucky.
> faketucky <- read.csv("data/faketucky.csv", na = "999",col_types = list(enrolled_in_college = col_character(),
Error in col_character() : could not find function "col_character"
Aditi Shah • March 19, 2022
When we add the na = "999" argument to the read_csv function, are we passing the number as a string? If so, how does the function remove the values that are displayed as 999.000000 in the gpa column?
Additionally, when renaming the columns, why does passing the column name in brackets of the col_character() function not work? I get an "unused argument" error.
Danielle Lowry • March 22, 2022
Strange question... I noticed that David's summary stats in the console when he runs the skim command are all nicely left justified. On my end, decimals line up and it looks so messy. Is there a way to correct this in R so my output looks more clean like David's?
Laura Estrela Muriel • March 24, 2022
Hi David, I get this error:
Error in read.table(file = file, header = header, sep = sep, quote = quote, : unused argument (col_types = list(list(), list(), list(), list()))
This is what I wrote:
faketucky <- read.csv("data/faketucky.csv", na = "999", col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), recieved_high_school_diploma = col_character()))
Thank you!
jeph mathias • March 24, 2022
Hi Charlie/David, just trying the re-import data bit. I have this code in my script: faketucky col_types = list(enrolled_in_college = col_character(),
any ideas? Thanks
jeph mathias • March 24, 2022
The second half of what I was trying to ask was about the error message that I get, but ignore that for now. I am going to try with the cloud version.
Gloria Li • May 17, 2022
Error in col_character() : could not find function "col_character"
Ravindra Mehta • July 3, 2022
I get an error message with the following code. Not sure what is wrong. Thanks for any guidance
> faketucky<-read_csv("data/faketucky.csv", na="999", col_types() = list(male = col_character(),free_and_reduced_lunch = col_character(),received_high_school_diploma = col_character(), enrolled_in_college = col_character()))
Error: unexpected '=' in "faketucky
Julieth Silao • September 21, 2022
Hello David. I got an error saying could not find function "col_character"
faketucky
Amanda Krantz • September 21, 2022
It seems like the new import went through fine, but my data doesn't match what I see of yours in the video (my result below).
── Variable type: character ──────────────────────────────────────────────────
  skim_variable                n_missing complete_rate min max empty n_unique whitespace
1 first_high_school_attended           0             1   4  14     0      393          0
2 school_district                      0             1   4  13     0      171          0
3 male                                14          1.00   1   1     0        2          0
4 race_ethnicity                       0             1   0  24   794        6          0
5 free_and_reduced_lunch             860         0.985   1   1     0        2          0
6 received_high_school_diploma         0             1   1   1     0        2          0
7 enrolled_in_college              14788         0.744   1   1     0        2          0
── Variable type: numeric ────────────────────────────────────────────────────
  skim_variable     n_missing complete_rate   mean     sd p0    p25   p50    p75   p100 hist
1 student_id                0             1 55922. 32333.  1 27910. 56070 83872. 111990 ▇▇▇▇▇
2 percent_absent          111         0.998   8.78   16.0  0   3.26  6.27   11.3   3153 ▇▁▁▁▁
3 gpa                    2185         0.962   2.59  0.874  0   2.00  2.66   3.28      4 ▁▂▆▇▇
4 act_reading_score     14121         0.756   19.8   5.80  2     15    19     23     36 ▁▆▇▅▂
5 act_math_score        14101         0.756   19.0   4.65  1     16    17     22     36 ▁▃▇▃▁
Ellen Wilson • October 4, 2022
I am getting an error message: Error in read.table(file = file, header = header, sep = sep, quote = quote, : unused argument (col_types = list(list(), list(), list(), list()))
This is the code I entered: faketucky<- read.csv("data/faketucky.csv", na=999, col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))
Julieth Silao • October 29, 2022
faketucky <- read_csv("data/faketucky.csv", na = "999", col_type = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))
and i get error saying col_character not found
Julieth Silao • October 29, 2022
#editing column in the table
faketucky <- read.csv("data/faketucky.csv", na = "999", col_type = list(enrolled_in_college =col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))
I also tried this but still get an error
Dimeji Olawuyi • December 23, 2022
Hi David, I run this col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))
But received Error: unexpected ')' in "col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character()))"
Could you point out what's wrong?
Pegah Maleki • January 21, 2023
Hello! Trying to run this code and getting error message: col_types = list(enrolled_in_college)= col_character(),free_and_reduced_lunc h= col_character(), male = col_character(), received_high_school_diploma = col_character()))
Error message:
> col_types = list(enrolled_in_college)= col_character(),free_and_reduced_lunc h= col_character(), Error: unexpected ',' in "col_types = list(enrolled_in_college)= col_character(),"
Sarah Sexton • January 25, 2023
I am running the code faketucky <- read_csv("data/faketucky.csv"), na = "999", col_types = list(enrolled_in_college = col_character(), free_and_reduced_lunch = col_character(), male = col_character(), received_high_school_diploma = col_character())
And I keep getting the error Error: unexpected ',' in "faketucky <- read_csv("data/faketucky.csv")," but if I take out the commas it doesn't make the changes to the file?
The Evaluation Center • March 15, 2023
Can you explain why in the first couple of exercises where we read the CSV into R, we are able to use read.csv(), but when we tell R to read the CSV with the col_types() function, we have to update read.csv() to read_csv()?
Mark Adrian Salvador • March 22, 2023
I didn't get an error though. Probably it's fixed already.
David Vasquez • March 23, 2023
Hi! I am coding as indicated and the results are as expected.
My code: faketucky <- read_csv("data/faketucky.csv", na = "999", col_types = list (male = col_character(), free_and_reduced_lunch = col_character(), received_high_school_diploma = col_character(), enrolled_in_college = col_character()))
skim(faketucky)
The only issue I have is that all of my histograms look like this, and do not show bars:
I tried googling and looking in StackOverflow but the explanations are a bit too technical. Any instruction on how to solve this is very well appreciated! Thanks!
Aviv Nur • June 11, 2023
Hi, when I run the code to import the dataset, I got this error Error in exists(cacheKey, where = .rs.WorkingDataEnv, inherits = FALSE) : invalid first argument Error in assign(cacheKey, frame, .rs.CachedDataEnv) : attempt to use zero-length variable name
What does it mean? Thank you
Nicole Sanchez • June 14, 2023
Hello again, when I did the data types code I keep getting, > faketucky <- read_csv( "data/faketucky.csv",
Error in exists(cacheKey, where = .rs.WorkingDataEnv, inherits = FALSE) : invalid first argument Error in assign(cacheKey, frame, .rs.CachedDataEnv) : attempt to use zero-length variable name Thank you, Nicole