Going Deeper with R
Use Color to Highlight Findings
This lesson is called Use Color to Highlight Findings, part of the Going Deeper with R course.
Your Turn
- Identify one school district that has had a lot of growth in its Hispanic/Latino population from 2017-2018 to 2018-2019
- Create a new data frame called highlight_district and only include this district in it
- Use the highlight_district data frame to create a new geom_line() layer on top of the other data
- Make sure this new layer is a bright color and all other layers are some type of light gray
Just FYI, to have R not use scientific notation, run this code: options(scipen = 999).
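The steps above can be sketched roughly as follows. This is a minimal illustration on made-up data, not the course solution: the district names and values are invented, the column names are taken from the code shared in the comments, and the colors "lightgray" and "orange" are just one possible choice.

```r
library(dplyr)
library(ggplot2)

# Turn off scientific notation, as suggested in the exercise
options(scipen = 999)

# Toy stand-in for the course data; districts and values are made up
enrollment_by_race_ethnicity <- tibble(
  district = rep(c("District A", "District B", "District C"), each = 2),
  year = rep(c("2017-2018", "2018-2019"), times = 3),
  race_ethnicity = "Hispanic/Latino",
  percent_of_total_in_school = c(0.10, 0.11, 0.20, 0.19, 0.15, 0.25)
)

# Keep only the district to highlight (here the one with the largest jump)
highlight_district <- enrollment_by_race_ethnicity %>%
  filter(district == "District C")

highlight_plot <- ggplot(
  enrollment_by_race_ethnicity,
  aes(x = year, y = percent_of_total_in_school, group = district)
) +
  # All districts in light gray
  geom_line(color = "lightgray") +
  # The highlighted district drawn on top in a bright color
  geom_line(data = highlight_district, color = "orange")

highlight_plot
```

Because the highlighted layer is added last, it is drawn over the gray lines rather than hidden beneath them.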
Learn More
If you want to see the list of all named colors in R, Gina Reynolds has put one together.
The issue that I ran into where the lines weren't visible because there were too many is called overplotting. Claus Wilke discusses overplotting in Chapter 18 of his book Fundamentals of Data Visualization (the chapter is about overplotting with points, but the concepts are the same).
Catherine Roller White
May 8, 2021
For some reason, R seems to be reading data as non-numeric (I think). I've been running this code:
enrollment_by_race_ethnicity %>% filter(race_ethnicity == "Hispanic or Latino") %>% pivot_wider(id_cols = c(district, district_id), names_from = year, values_from = percent) %>% mutate(growth = "2018-2019" - "2017-2018")
I receive the error message: Error: Problem with mutate() input growth. x non-numeric argument to binary operator i Input growth is "2018-2019" - "2017-2018".
Both the 2017-2018 and 2018-2019 variables show up as <dbl>, so I'm not sure why they are being interpreted as non-numeric. In case it's relevant, 2017-2018 and 2018-2019 both show up in green in my code, but it looks like they showed up in black in your code.
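A likely cause, sketched on made-up data: in R, double quotes create character strings, so "2018-2019" - "2017-2018" tries to subtract one piece of text from another. Backticks instead refer to the columns with those names. The toy data frame and its values below are assumptions for illustration only.

```r
library(dplyr)
library(tidyr)

# Toy data in long format; districts and percents are made up
toy <- tibble(
  district = rep(c("District A", "District B"), each = 2),
  year = rep(c("2017-2018", "2018-2019"), times = 2),
  percent = c(0.10, 0.12, 0.20, 0.35)
)

growth_by_district <- toy %>%
  pivot_wider(id_cols = district, names_from = year, values_from = percent) %>%
  # Backticks refer to the columns; quotes here would produce
  # "non-numeric argument to binary operator"
  mutate(growth = `2018-2019` - `2017-2018`)

growth_by_district
```

Backticks are needed because these column names start with a digit and contain a hyphen, so R cannot read them as ordinary names.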
Vuk Sekicki
May 12, 2021
By the way any good and practical statistics course to recommend?
Lucilla Piccari
May 12, 2021
Hello! I get two straight vertical red lines at each side of the plot, in correspondence with the beginning and end of the period of time plotted. What can I do differently to not have that?
Here's my code:
enrollment_by_race_ethnicity %>% filter(race_ethnicity == "Hispanic/Latino") %>% ggplot(aes(x = year, y = percent_of_total_in_school, group = district)) + geom_line(color = "gray67") + geom_line(data = highlight_district, inherit.aes = TRUE, color = "firebrick")
Atlang Mompe
June 24, 2021
Hi David, for some weird reason when I create the highlight district, I get 2 observations and 6 variables? Is there an error that I am doing? I notice the Douglas ESD has 14 observations when I filter the enrollment data.
Juan Clavijo
November 28, 2021
Hello! How can I use filter to select not just one district, but say the top 3 or top 5 districts with the largest growth so I can highlight those instead of just one?
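One possible approach, as a sketch on made-up data: compute growth per district, keep the top few with slice_max(), and then filter the full data with %in%. The data frame names enrollment, growth_by_district, top_districts, and highlight_districts are hypothetical, as are all values.

```r
library(dplyr)

# Toy data; districts and percents are made up
enrollment <- tibble(
  district = rep(paste("District", LETTERS[1:6]), each = 2),
  year = rep(c("2017-2018", "2018-2019"), times = 6),
  percent = c(0.10, 0.12, 0.20, 0.35, 0.30, 0.29,
              0.15, 0.22, 0.40, 0.44, 0.05, 0.14)
)

# Growth per district between the two school years
growth_by_district <- enrollment %>%
  group_by(district) %>%
  summarize(growth = percent[year == "2018-2019"] - percent[year == "2017-2018"])

# Keep the three districts with the largest growth
top_districts <- growth_by_district %>%
  slice_max(growth, n = 3)

# Filter the full data down to those districts for the highlight layer
highlight_districts <- enrollment %>%
  filter(district %in% top_districts$district)
```

The highlight_districts data frame can then be passed to a second geom_line() layer in the same way as a single highlighted district.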
Matt M
December 5, 2021
The code that shows up under the Solutions video is only a partial portion of the code needed to get to the solution. It omits the pivot_wider() and mutate() portions of the code that are shared in the Solutions video.
Elan Sykes
December 30, 2021
Could you nest a function in the data argument for ggplot, like a "highlight_district" function with an argument for district_id to allow you to recreate this chart many times? I tried to do this with highlight_district % filter(race_ethnicity == "race_ethnicity") %>% filter(district_id == "district_id") }
highlight_district("1980", "Hispanic/Latino")
But it didn't work on its own, let alone within a ggplot argument.
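A sketch of one way such a helper could work, on made-up data. Two details matter: the filter has to compare each column to the function argument, not to a quoted string such as "district_id", and the arguments need names that differ from the column names, because inside filter() the column takes precedence. The function name filter_highlight_district and its argument names are hypothetical.

```r
library(dplyr)

# Toy data; IDs and percents are made up
enrollment_by_race_ethnicity <- tibble(
  district_id = c("1980", "1980", "2000", "2000"),
  race_ethnicity = c("Hispanic/Latino", "White", "Hispanic/Latino", "White"),
  percent = c(0.25, 0.60, 0.10, 0.80)
)

# Hypothetical helper: argument names (id, group) differ from the column
# names so that filter() compares each column to the supplied value
filter_highlight_district <- function(data, id, group) {
  data %>%
    filter(district_id == id, race_ethnicity == group)
}

highlighted <- filter_highlight_district(
  enrollment_by_race_ethnicity, "1980", "Hispanic/Latino"
)
```

The result is an ordinary data frame, so the function call can also be placed directly in the data argument of a geom_line() layer.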
JULIO VERA DE LEON
May 4, 2022
Hi!
Could you please explain a little further what changes when you add the "`" symbol?
Thanks!
Rachel Nicholson
December 2, 2022
When I pivot_wider I am not getting a percent for the school year. I get for 2018-2019 and for 2017-2018. What am I doing wrong?
Rachel Nicholson
December 6, 2022
Here's my code. I'm getting a response that I have one dbl for 18-19 and two dbl for 17-18. I seem to have an extra row of 17-18 data but don't know how to get rid of it or how it got added. https://gist.github.com/AlyssaCarr/0ecf6ee0b5f91586213ae5a1710a5471
Hatem Kotb
January 14, 2023
Thanks for the note on statistical interpretation. Will tackle 'Inferential Statistics using R' after this. Also the list of colors in R link is useful! 😊🙏🏼
Zain Asaf
June 22, 2023
I am having issues with creating a variable for growth in Hispanic population from 2017-2018 to 2018-2019. I use the backticks, but still get an error message that the object 2018-2019 is not found. Here is the link to my code: https://github.com/zainasaf/zain_rin3 project/commit/7211a8f62f3ce24b0622a67334e196befee85898
Zain Asaf
June 22, 2023
I am having issues with creating a variable for growth in Hispanic population from 2017-2018 to 2018-2019. I use the backticks, but still get an error message that the object 2018-2019 is not found. Here is the link to my code: https://gist.github.com/zainasaf/df9e2a8709660366e88e2f07ab6d1642