Your Turn

Complete the summarize sections of the data-wrangling-and-analysis-exercises.Rmd file.

Learn More

General Data Wrangling and Analysis Resources

Because most material that discusses data wrangling and analysis with the dplyr package covers all of the verbs discussed in this course together, I have chosen not to separate the resources by lesson. Instead, here are some helpful resources for learning more about all of the tidyverse verbs discussed in this course:

Chapter 5 of R for Data Science

RStudio Cloud primer on working with data

Tidyverse for Beginners by Danielle Navarro

Learning Statistics with R by Danielle Navarro

Introduction to the Tidyverse by Alison Hill

A gRadual intRoduction to data wRangling by Chester Ismay and Ted Laderas

Working in the Tidyverse by Desi Quintans and Jeff Powell

Christine Monnier video tutorials on dplyr

Have any questions? Put them below and we will help you out!

I got the count function to count the number of rows as assigned, but I also wanted to count only the rows that have an answer (not NA) for hours of sleep per night. I tried adding the argument "na.rm = TRUE" to the n() function in a few places in the code chunk, but it didn't work.

David Keyes

October 3, 2021

You'd want to do this in two steps. I showed how to do it in this short video. The code I used is here. Hope that helps!
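
The video and the linked code are not reproduced on this page. As a rough sketch of one possible two-step approach (not necessarily the one shown in the video): n() takes no arguments, so na.rm = TRUE has nowhere to go. Instead, drop the NA rows first and then count what is left. The toy_nhanes data frame below is a made-up stand-in for the course's nhanes data; only the sleep_hrs_night column name comes from the lesson.

```r
library(dplyr)

# Hypothetical stand-in for the course's nhanes data (values are invented)
toy_nhanes <- tibble(
  id = 1:6,
  sleep_hrs_night = c(7, NA, 6, 8, NA, 5)
)

rows_with_answer <- toy_nhanes %>%
  # Step 1: keep only the rows that have an answer for hours of sleep
  filter(!is.na(sleep_hrs_night)) %>%
  # Step 2: count the rows that remain (result is stored in a column named n)
  count()

rows_with_answer
```

Filtering first keeps the counting step itself unchanged, which is why no na.rm argument is needed.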

Hi David. I had the same question as above but cannot access the short video. Any way you could repost the link? Thanks.

Sorry, that video got deleted, but here's a new one! If you have other questions, please let me know.

Laura Hickerson

January 17, 2022

Hi David - with the summarize function, it looks like you did not use a select statement afterward, but it still outputs the result. I have to put in a select statement afterwards to get any output for mean_hours_sleep. Any thoughts?

Laura Hickerson

January 17, 2022

oops - sorry, now my code is displaying the output without select. Any tips about when select is required for output?

Kim Cataldo

April 3, 2022

After we run the summarize function to create ‘mean_hours_sleep’ is this considered a new variable? Or only if we assign it? If we don’t assign it, do we have to repeat the summarize line of code whenever we want to reference it again, like when we’re using group_by?

nhanes %>% group_by(gender, work) %>% summarize(mean_hours_sleep = mean(sleep_hrs_night, na.rm = TRUE))

Charlie Hadley

April 5, 2022

Hi Kim! This is another form of the "have you printed to the console or made an assignment" conundrum.

Your code as currently provided is spitting out to the console and that's it.

nhanes %>%
    group_by(gender, work) %>%
    summarize(mean_hours_sleep = mean(sleep_hrs_night, na.rm = TRUE))

But if we include an assignment,

nhanes_hours_sleep <- nhanes %>%
    group_by(gender, work) %>%
    summarize(mean_hours_sleep = mean(sleep_hrs_night, na.rm = TRUE))

Then mean_hours_sleep will live inside of nhanes_hours_sleep, and you can refer to it later without rerunning the summarize code.
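
As a small illustration of reusing the assigned result, here is a sketch that assumes a made-up toy_nhanes data frame in place of the course data (the values are invented; the column names follow the code above). The .groups = "drop" argument is an addition of mine that returns an ungrouped result; it is not part of the original code.

```r
library(dplyr)

# Hypothetical stand-in for the course's nhanes data (values are invented)
toy_nhanes <- tibble(
  gender = c("female", "female", "male", "male", "male"),
  work = c("Working", "Working", "Working", "Working", "NotWorking"),
  sleep_hrs_night = c(6, 8, 7, NA, 9)
)

# Assign the summary so it is stored as an object in the environment
nhanes_hours_sleep <- toy_nhanes %>%
  group_by(gender, work) %>%
  summarize(
    mean_hours_sleep = mean(sleep_hrs_night, na.rm = TRUE),
    .groups = "drop"
  )

# Later on, reuse the stored column without rerunning summarize()
long_sleepers <- nhanes_hours_sleep %>%
  filter(mean_hours_sleep > 7)

long_sleepers
```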

In terms of language, I would usually describe mean_hours_sleep as a column inside the dataset (or object) named nhanes_hours_sleep. That's because "variable" already has a few other meanings: a single-value object in your environment, or a variable within a formula created with y ~ x. This isn't to "correct you". But our hope is that you eventually end up teaching someone else R as well, and when they have questions about "variables" you'll need to understand which context they mean.
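
To make the different senses of "variable" concrete, here is a minimal base R sketch. All of the names and values (hours_cutoff, toy_data, sleep_model) are hypothetical and exist only for illustration.

```r
# 1. A single-value object in your environment
hours_cutoff <- 7

# Invented data for the remaining two examples
toy_data <- data.frame(
  age = c(20, 30, 40, 50),
  sleep_hrs_night = c(8, 7, 7, 6)
)

# 2. Variables within a formula created with y ~ x (here, inside lm())
sleep_model <- lm(sleep_hrs_night ~ age, data = toy_data)

# 3. A column inside a dataset
toy_data$sleep_hrs_night
```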

Cheers,

Charlie

Kim Cataldo

April 10, 2022

This is helpful, thank you!