Import Our Data Again
This lesson is called Import Our Data Again, part of the R in 3 Months (Spring 2025) course.
Your Turn
Adjust your read_csv() code so that you import the data again:
Use the na argument to tell read_csv() what data should be treated as missing.
Use the col_types argument to make sure that sex_v2 gets imported as character data.
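A minimal sketch of what the adjusted import might look like, based on the code that participants share in the comments. The inline csv_text and its values are hypothetical stand-ins so the example runs without penguins_data.csv, and wrapping literal data in I() assumes readr 2.0 or later. In the lesson you would pass "penguins_data.csv" as the first argument instead.

```r
library(readr)

# Hypothetical inline data standing in for penguins_data.csv
csv_text <- I("species,sex,sex_v2\nAdelie,male,1\nGentoo,-999,2\nChinstrap,female,-999\n")

penguins_data <- read_csv(
  csv_text,
  na = c("-999", "NA"),           # treat both strings as missing
  col_types = cols(sex_v2 = "c")  # "c" forces sex_v2 to character
)

# Without col_types, the remaining sex_v2 values ("1", "2") would be guessed as numbers
class(penguins_data$sex_v2)
sum(is.na(penguins_data$sex))
```

Columns that are not named inside cols() are still guessed automatically, so only sex_v2 is affected by the col_types argument.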
Have any questions? Put them below and we will help you out!
Betsy Dalton • September 15, 2023
Hm, when I try to run
penguins_data <- read_csv("penguins_data.csv", na = "-999")
I keep getting the following error:
David Keyes Founder • September 15, 2023
This appears to be a bug in RStudio (others have seen the same thing, as have I). You can safely ignore it. If you update RStudio in a few weeks, my guess is it will be fixed.
Betsy Dalton • September 15, 2023
Thanks!
Alberto Cabrera • November 5, 2023
penguins <- read_csv("https://raw.githubusercontent.com/rfortherestofus/rin3-fall-2023/main/data-raw/penguins.csv")
This retrieves a CSV file from the data-raw subfolder of a GitHub repository and creates a data frame named penguins.
Kiara Sanchez • September 17, 2023
I keep running my code, penguins_data <- read_csv("penguins_data.csv", na = c("-999", "NA"), col_types = cols(sex_v2 = "c")), with the skim function, but the only column that changes to NA is the sex column.
Kiara Sanchez • September 17, 2023
Never mind, all fixed.
Shubhra Murarka • March 1, 2024
Error when running: penguins_data <- read_csv("penguins_data.csv",
Error in base::nchar(wide_chars$test, type = "width") :
lazy-load database '/Users/shubhra/Library/R/x86_64/4.1/library/cli/R/sysdata.rdb' is corrupt
In addition: Warning messages:
1: In base::nchar(wide_chars$test, type = "width") : restarting interrupted promise evaluation
2: In base::nchar(wide_chars$test, type = "width") : internal error -3 in R_decompress1
David Keyes Founder • March 1, 2024
Hmm, I'm not quite sure what's going on. One quick question, though: did you just install R/RStudio for this course or had you installed it previously?
Alyssa Jeffers • March 14, 2024
Hi there, when I ran the argument for changing sex_v2 to character data, col_types showed up as a new data item in my Data Environment. On your demo screen it didn't show up as an additional item after running the code; you only have penguins_data. Should this not have happened? This was my code: penguins_data <- read_csv("penguins_data.csv", na = c("-999", "NA"), col_types = cols(sex_v2 = "c"))
David Keyes Founder • March 14, 2024
Very strange! I've not seen that. Can you record a quick video using this and show me what you're seeing? Please email me after you upload the video so I know to look for it.
David Keyes Founder • March 19, 2024
Ok, I've watched your video and I've got a solution for you! Check out this video. Let me know if this helps!
Alyssa Jeffers • March 20, 2024
Got it, thanks for explaining that! I bet that's what I did.
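The linked video is not reproduced here, so the following is only one plausible explanation and not necessarily the one David gave. In R, col_types = cols(...) is an argument only while it sits inside the parentheses of read_csv(). If that line is run on its own (for example, by executing just part of a multi-line call), R treats it as an assignment and creates an object called col_types. A minimal sketch, using hypothetical inline data and assuming readr 2.0 or later for I():

```r
library(readr)

csv_text <- I("species,sex_v2\nAdelie,-999\nGentoo,1\n")  # hypothetical data

# Inside read_csv(), col_types is an argument: no new object is created
penguins_data <- read_csv(csv_text, na = c("-999", "NA"), col_types = cols(sex_v2 = "c"))
before <- exists("col_types", inherits = FALSE)

# Run on its own, the same text is an assignment and creates an object
col_types = cols(sex_v2 = "c")
after <- exists("col_types", inherits = FALSE)

c(before = before, after = after)
```

If a stray object like this appears, rm(col_types) removes it without affecting penguins_data.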
Sandra Virgo • March 20, 2024
When I run the read_csv() code again to deal with the -999 values, it does not completely work.
Looking at the data using view(penguins_data), I still have some -999 values in case 4, as well as some -999.0 values in case 4.
I have adapted the code to read read_csv("penguins_data.csv", na = "-999, -999.0") to deal with the ones with the decimal point and the zero, but even after that there are still these issues, including in sex_v2
I can see that in sex there are some NA values now, which means the code has partially worked, I guess.
Apologies, I don't seem to be able to get a screenshot in here.
Libby Heeren Coach • March 20, 2024
Hi, Sandra! I made a short video going over the process of replacing values with NA in the data. Please take a look at it and let me know if you have any questions! https://muse.ai/v/4i7KhQx
Recap: -999 and -999.0 are the same value, just displayed differently. When using the na argument, you can either use one value inside quotation marks, like "-999", or two values inside the c() function, each in their own set of quotation marks, like this: c("-999", "NA")
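To make the recap concrete, here is a small sketch comparing three ways of writing the na argument, including the single string "-999, -999.0" from the question above. The inline data is hypothetical, and I() for literal data assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical inline data standing in for penguins_data.csv
csv_text <- I("species,bill_length_mm,sex_v2\nAdelie,39.1,male\nGentoo,-999,-999\nChinstrap,46.5,NA\n")

# One value: only the text -999 becomes missing; the text NA stays as a string
one_value <- read_csv(csv_text, na = "-999", show_col_types = FALSE)

# Two values inside c(): both -999 and NA become missing
two_values <- read_csv(csv_text, na = c("-999", "NA"), show_col_types = FALSE)

# One string containing a comma is a single pattern that matches no cell
one_string <- read_csv(csv_text, na = "-999, -999.0", show_col_types = FALSE)

sum(is.na(one_value$sex_v2))
sum(is.na(two_values$sex_v2))
sum(is.na(one_string$bill_length_mm))
```

Supplying na replaces readr's default missing-value strings, which is why "NA" is listed explicitly alongside "-999".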
Sandra Virgo • March 20, 2024
Thanks so much, Libby. The moment you said that the -999.0 values were the same as the -999 values it got me back on the right track again. I thought I had tried every combination of solutions but I clearly hadn't tried the easiest one. The video was very helpful - thanks so much!
Libby Heeren Coach • March 20, 2024
Woo! That's a win! So glad it helped, and thanks for asking questions! They help everyone :)
Puspanjali Gurung • July 31, 2024
When I try to run the following command, it doesn't omit the -999 from the table. Why is this the case?
David Keyes Founder • August 1, 2024
It's a bit hard to say. If you want to record a video to show me what's happening I can take a look. Please do that here.
Caroline Kypson • September 4, 2024
The -999 in my table did not disappear after I ran penguins_data <- read_csv("penguins_data.csv", na = "-999"). It did disappear in the skim data though.
David Keyes Founder • September 4, 2024
Can you clarify what you mean by your "table"?
Vivian KAWANAMI • September 19, 2024
hi David and Gracielle! Quick Q: when writing the code for read_csv and all the embedded functions, the line becomes too long to read and I've noticed that David uses a shortcut to add the new functions (e.g., na and then col_types) to lines further below. Should I just press enter or is there another way of parsing the line? Thanks!
Gracielle Higino Coach • September 20, 2024
Olá Vivian! Yes, hitting enter is all you need to add a new line within your function. Notice that this only works if you're working in a text file like "import.R", as shown in the video (i.e., your script). If you're working in the console, to add a new line you might need to use shift+enter, and the console will show a plus sign instead of a > at the beginning of the new line. This means that you still need to add something to finish the command!
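As a small illustration of the point about line breaks, the sketch below writes the same call on one line and across several lines. The inline data is hypothetical and I() assumes readr 2.0 or later. R keeps reading until the closing parenthesis, so both forms should import the same values.

```r
library(readr)

csv_text <- "species,sex_v2\nAdelie,-999\nGentoo,1\n"  # hypothetical data

# Everything on one line
one_line <- read_csv(I(csv_text), na = c("-999", "NA"), col_types = cols(sex_v2 = "c"))

# One argument per line: easier to read, same call
many_lines <- read_csv(
  I(csv_text),
  na = c("-999", "NA"),
  col_types = cols(sex_v2 = "c")
)

identical(one_line$sex_v2, many_lines$sex_v2)
```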