Fundamentals of R
mutate
This lesson is called mutate, part of the Fundamentals of R course.
Your Turn
Complete the mutate sections of the data-wrangling-and-analysis-exercises.Rmd file.
Learn More
General Data Wrangling and Analysis Resources
Because most material that discusses data wrangling and analysis with the dplyr package covers all of the verbs discussed in this course, I have chosen not to separate the resources by lesson. Instead, here are some helpful resources for learning more about all of the tidyverse verbs discussed in this course:
Chapter 5 of R for Data Science
RStudio Cloud primer on working with data
Tidyverse for Beginners by Danielle Navarro
Learning Statistics with R by Danielle Navarro
Introduction to the Tidyverse by Alison Hill
A gRadual intRoduction to data wRangling by Chester Ismay and Ted Laderas
Bohdanna Kinasevych
March 27, 2021
How do I add multiple columns together to create a new variable without having to write out each existing variable in the mutate function? For example, if I have 10 questions on a survey and I want to create a composite score based on the sum of these 10 questions, is there a way to do this without having to write out: var1 + var2 + var3 .... + var10?
David Keyes
March 29, 2021
Great question, Bo! The best way to handle this is to tidy your data. Take a look at the tidy data and reshaping data lessons, which deal with this exact issue. If you still have questions, let me know!
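For anyone who wants a quick row-wise total without reshaping first, here is a minimal sketch. It assumes dplyr 1.0 or later, and the data frame and the column names q1 to q3 are hypothetical stand-ins for the ten survey questions. It is one possible option, not necessarily the approach taught in the tidy data lessons.

```r
library(dplyr)

# Hypothetical survey data: one row per respondent, one column per question
survey <- tibble(
  id = 1:3,
  q1 = c(1, 2, 3),
  q2 = c(4, 5, 6),
  q3 = c(7, 8, 9)
)

# across() picks out the range of columns q1 to q3,
# and rowSums() adds them up within each row
survey_scored <- survey %>%
  mutate(composite = rowSums(across(q1:q3)))
```

With ten questions, the selection would become something like q1:q10, so the individual variables never need to be typed out.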
Eduardo Rodriguez
March 30, 2021
Hi David, how do I change a value format? For example, in the course assignment we divide by 30 and get a long decimal value. How would I format it so that it only shows two decimal places or even, out of curiosity, as a percentage?
Eduardo Rodriguez
March 30, 2021
Scratch the first part of the question. I could just wrap it in the round function. But how would I show a decimal as a percentage?
David Keyes
March 30, 2021
Yup, here's a video for you explaining this!
Eduardo Rodriguez
March 31, 2021
Whoa. More than helpful! The scales package is great and the round_half_up call is good to know. In Canada we also round up when it is a 5. Thanks again, David.
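For readers who cannot view the video, here is a rough sketch of the functions mentioned in this thread. It assumes the scales and janitor packages are installed, and the value 17 / 30 is a made-up example rather than a number from the course assignment.

```r
library(scales)
library(janitor)

proportion <- 17 / 30   # a long decimal, roughly 0.5667

# Two decimal places
rounded <- round(proportion, 2)

# Formatted as a percentage with one decimal place (returns a character string)
as_percent <- percent(proportion, accuracy = 0.1)

# Base R rounds halves to the nearest even number; round_half_up() rounds them up
base_half <- round(2.5)
janitor_half <- round_half_up(2.5)
```

Note that percent() returns text, so the result is for display and can no longer be used in calculations.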
Harold Stanislaw
March 31, 2021
Question regarding overwriting a variable with itself. I'm always nervous about doing this, in case I mess up. On the other hand, I can see where creating new variables and keeping the old ones can easily get out of hand and/or require using the drop pipe a lot. If I want to overwrite a variable onto itself is there an easy way to undo the mutate if I make a mistake?
David Keyes
March 31, 2021
Yup! The easiest way is to just remove the line where you overwrite the variable and then rerun your code. That will bring your data frame back to the state where it was previously. Let me know if that makes sense!
Harold Stanislaw
March 31, 2021
Yes, it makes sense -- thanks!
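A small sketch of why rerunning works, using hypothetical object and column names: as long as the overwrite happens in a derived object, the imported data stays as it was.

```r
library(dplyr)

# Hypothetical raw data; in practice this would come from an import step
survey_raw <- tibble(minutes_active = c(45, 90, 120))

# Overwrite the variable in a derived object, leaving survey_raw untouched
survey <- survey_raw %>%
  mutate(minutes_active = minutes_active / 30)

# Undoing the overwrite amounts to deleting the mutate() line and rerunning
survey_restored <- survey_raw
```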
Lina Khan
September 27, 2021
When the values are changed, like rounding values, how come it doesn't show in the dataset tab? Or, have I forgotten something? For example, the output still shows the rounded values when I select the variable. But other than clean_names, the changes don't appear in the dataset. Thanks!
David Keyes
September 28, 2021
Good question! This is something that confused me when I was learning R. Here's a quick explanation for you! Let me know if this is clear.
Ellen Wilson
October 6, 2022
This link no longer works.
David Keyes
October 6, 2022
Sorry about that! We switched video hosts and that video didn't seem to get transferred over. We just talked about something similar yesterday in a training program we're doing for the Visitor Studies Association (VSA). You can see our discussion here where we talk about assigning objects vs displaying them. Let me know if that helps!
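Since the original video is gone, here is a minimal sketch of the difference between displaying and assigning. The data frame and column names are hypothetical.

```r
library(dplyr)

scores <- tibble(score = c(1.234, 5.678))

# Displayed only: the rounded values are computed, but nothing is saved,
# so scores itself is unchanged
scores %>%
  mutate(score = round(score, 1))

# Keep a copy to compare against after the assignment below
scores_before <- scores

# Assigned: the result is saved back to scores, so the data viewer
# now shows the rounded values
scores <- scores %>%
  mutate(score = round(score, 1))
```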
David Keyes
September 28, 2021
I typically keep both a numeric and a character variable. Here's a quick explanation of why.
Lindsay Quarles
October 21, 2021
Can you help me better understand the round function? I don't understand the digits part. Does it only apply to decimals? What would you put if you wanted to round to the nearest whole number? Or to the nearest 10?
Lindsay Quarles
October 21, 2021
I figured it out. 0 for nearest whole number, -1 for nearest 10, and so on.
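A quick illustration of that pattern in base R, using an arbitrary example number:

```r
x <- 1234.567

two_decimals  <- round(x, digits = 2)   # 1234.57
whole_number  <- round(x, digits = 0)   # 1235
nearest_ten   <- round(x, digits = -1)  # 1230
nearest_hundred <- round(x, digits = -2)  # 1200
```

Positive values of digits count places to the right of the decimal point, and negative values count places to the left.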
Yenn Lee
July 19, 2022
Hi, I have a column that I would like to split into three. The data in the existing column looks like the following, and I would like it to feed into three new columns: id, bio, and username.
{"id":"1234567","bio":"John Smith, first-year undergraduate in Physics","username":"JSAstronaut"}
All rows follow this exact format (although the bio part may be longer), so I feel there must be a simple way to do this, and I was just wondering if anyone has suggestions. Or is this going to be covered at a later point in the courses? So far I have tried the separate function in tidyr, but it hasn't quite turned out the way I hoped.
I'd appreciate any pointers. Thank you!
Charlie Hadley
July 21, 2022
Hi Yenn!
This is somewhat beyond a basic problem. Because it was a fun problem to solve, I wrote a bit of custom code for you. There are two versions:
Unfortunately, there's a limit to how much custom code I can write for folks on this course - and I've hit that limit. We have additional courses that might be useful to you, and the R in 3 Months program gives you 12 weeks of custom feedback videos for your data. However, of course there's no requirement for you to make any purchases from us.
I also need to tell you to be careful about quotation marks in programming. Most strings are contained within double quotation marks, but these must be straight quotes; curly quotes will break your code. Because your strings themselves contain double quotes, there are two approaches:
Hopefully this does help you. R is an awesome tool, but sometimes problem solving requires a few more steps than we can introduce in this course.
Cheers,
Charlie
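Charlie's code is not reproduced on this page. As a stand-in, here is a minimal sketch of one way to approach the problem. It assumes the jsonlite package is installed and that every row is valid JSON in the format shown in the question. The data frame, the column name raw, and the second example row are hypothetical.

```r
library(jsonlite)

# Hypothetical data frame with one JSON string per row. The strings are
# wrapped in single quotes because they contain double quotes themselves.
profiles <- data.frame(
  raw = c(
    '{"id":"1234567","bio":"John Smith, first-year undergraduate in Physics","username":"JSAstronaut"}',
    '{"id":"7654321","bio":"Jane Doe, second-year undergraduate in Chemistry","username":"JDChem"}'
  )
)

# Parse each string into a one-row data frame, then stack the rows
parsed <- do.call(
  rbind,
  lapply(profiles$raw, function(json) as.data.frame(fromJSON(json)))
)
```

Because the JSON parser handles the commas inside the bio text, this avoids the trouble that splitting on a separator character tends to cause.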
Tatiana Bustos
July 27, 2022
Hi! What if I wanted to create a new variable for one subset of groups with one type, and then for another set of groups with another? For example, what if I wanted to create a new variable only for "Females" with country = United States, but then select only "Males" with country = Canada? I tried the following as a best guess: nhanes %>% mutate(completed_survey = "Yes") select(contains("female"))
David Keyes
July 28, 2022
Your best bet would be to use the case_when() function. I cover this in the Going Deeper course. Here's a video from the course showing how it works.
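As a rough sketch of how case_when() could apply to this question: the data frame below is a hypothetical stand-in for nhanes, and the gender and country columns and the group labels are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the nhanes data
respondents <- tibble(
  gender  = c("Female", "Male", "Female", "Male"),
  country = c("United States", "Canada", "Canada", "United States")
)

# Each condition is checked in order; the first one that is TRUE sets the value,
# and the final TRUE line catches every row that matched nothing above it
respondents <- respondents %>%
  mutate(
    group = case_when(
      gender == "Female" & country == "United States" ~ "US female",
      gender == "Male" & country == "Canada" ~ "Canadian male",
      TRUE ~ "Other"
    )
  )
```

A filter() on the new group variable could then keep only the rows of interest.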