Use Color to Highlight Findings
This lesson is called Use Color to Highlight Findings, part of the R in 3 Months (Fall 2022) course.
Your Turn
1. Identify one school district that has had a lot of growth in its Hispanic/Latino population from 2017-2018 to 2018-2019.
2. Create a new data frame called highlight_district and only include this district in it.
3. Use the highlight_district data frame to create a new geom_line() layer on top of the other data.
4. Make sure this new layer is a bright color and all other layers are some type of light gray.

Just FYI, to have R not use scientific notation, run this code: options(scipen = 999).
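The layering idea in the steps above can be sketched roughly as follows. This is a minimal illustration rather than the course solution: the toy data frame enrollment, its district names, and its values are all made up, and only the column names (district, year, percent) echo those used in the comments further down.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for the course data set
enrollment <- data.frame(
  district = rep(c("District A", "District B", "District C"), each = 2),
  year = rep(c("2017-2018", "2018-2019"), times = 3),
  percent = c(0.10, 0.11, 0.20, 0.29, 0.15, 0.14),
  stringsAsFactors = FALSE
)

# Keep only the district to highlight (chosen arbitrarily here)
highlight_district <- enrollment %>%
  filter(district == "District B")

# First layer: every district in light gray.
# Second layer: the highlighted district drawn on top in a bright color.
p <- ggplot(enrollment, aes(x = year, y = percent, group = district)) +
  geom_line(color = "lightgray") +
  geom_line(data = highlight_district, color = "orange")

p
```

Because the second geom_line() receives its own data argument but inherits the aesthetics from ggplot(), it redraws only the highlighted district over the gray lines.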
Learn More
If you want to see the list of all named colors in R, Gina Reynolds has put one together.
The issue that I ran into where the lines weren't visible because there were too many is called overplotting. Claus Wilke discusses overplotting in Chapter 18 of his book Fundamentals of Data Visualization (the chapter is about overplotting with points, but the concepts are the same).
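Besides highlighting, one common way to soften overplotting is to make each line partly transparent. The sketch below assumes made-up data (many_lines, with 200 hypothetical districts and random values) and is only meant to show the alpha argument of geom_line(); it is not taken from the lesson.

```r
library(ggplot2)

# Hypothetical data: 200 districts with two school years each
set.seed(1)
many_lines <- data.frame(
  district = rep(paste("District", 1:200), each = 2),
  year = rep(c("2017-2018", "2018-2019"), times = 200),
  percent = runif(400),
  stringsAsFactors = FALSE
)

# alpha below 1 makes lines semi-transparent, so dense regions
# appear darker than sparse ones instead of forming a solid block
p_alpha <- ggplot(many_lines, aes(x = year, y = percent, group = district)) +
  geom_line(color = "gray40", alpha = 0.2)

p_alpha
```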
Have any questions? Put them below and we will help you out!
Catherine Roller White • May 8, 2021
For some reason, R seems to be reading data as non-numeric (I think). I've been running this code:
enrollment_by_race_ethnicity %>% filter(race_ethnicity == "Hispanic or Latino") %>% pivot_wider(id_cols = c(district, district_id), names_from = year, values_from = percent) %>% mutate(growth = "2018-2019" - "2017-2018")
I receive the error message:
Error: Problem with `mutate()` input `growth`.
x non-numeric argument to binary operator
i Input `growth` is `"2018-2019" - "2017-2018"`.
Both the 2017-2018 and 2018-2019 variables show up as <dbl>, so I'm not sure why they are being interpreted as non-numeric. In case it's relevant, 2017-2018 and 2018-2019 both show up in green in my code, but it looks like they showed up in black in your code.
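A likely cause of this error is the quoting: "2018-2019" in double quotes is a character string, and R cannot subtract one string from another. Column names that are not syntactically valid (for example, ones that start with a digit or contain a hyphen) need backticks instead. The sketch below uses a small made-up data frame, enrollment, to show the difference; the values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in long format
enrollment <- data.frame(
  district = rep(c("District A", "District B"), each = 2),
  year = rep(c("2017-2018", "2018-2019"), times = 2),
  percent = c(0.10, 0.11, 0.20, 0.29),
  stringsAsFactors = FALSE
)

growth_by_district <- enrollment %>%
  # One row per district, one column per school year
  pivot_wider(id_cols = district, names_from = year, values_from = percent) %>%
  # Backticks refer to the columns; double quotes would create strings
  mutate(growth = `2018-2019` - `2017-2018`)

growth_by_district
```

This would also explain the color difference in the editor: RStudio typically colors quoted strings differently from bare or backticked names.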
Vuk Sekicki • May 12, 2021
By the way any good and practical statistics course to recommend?
Lucilla Piccari • May 12, 2021
Hello! I get two straight vertical red lines at each side of the plot, in correspondence with the beginning and end of the period of time plotted. What can I do differently to not have that?
Here's my code:
enrollment_by_race_ethnicity %>% filter(race_ethnicity == "Hispanic/Latino") %>% ggplot(aes(x = year, y = percent_of_total_in_school, group = district)) + geom_line(color = "gray67") + geom_line(data = highlight_district, inherit.aes = TRUE, color = "firebrick")
Atlang Mompe • June 24, 2021
Hi David, for some weird reason when I create the highlight district, I get 2 observations and 6 variables? Is there an error that I am doing? I notice the Douglas ESD has 14 observations when I filter the enrollment data.
Juan Clavijo • November 28, 2021
Hello! How can I use filter to select not just one district, but say the top 3 or top 5 districts with the largest growth so I can highlight those instead of just one?
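One possible way to highlight several districts is to pick the top rows by growth with slice_max() and then filter with %in%. This is only a sketch: growth_by_district and enrollment below are hypothetical toy data frames, and the growth values are invented.

```r
library(dplyr)

# Hypothetical growth per district (for example, the result of
# pivot_wider() followed by mutate())
growth_by_district <- data.frame(
  district = c("A", "B", "C", "D", "E"),
  growth = c(0.01, 0.09, -0.02, 0.05, 0.03),
  stringsAsFactors = FALSE
)

# Names of the three districts with the largest growth
top_districts <- growth_by_district %>%
  slice_max(growth, n = 3) %>%
  pull(district)

# Hypothetical long-format data to be plotted
enrollment <- data.frame(
  district = rep(c("A", "B", "C", "D", "E"), each = 2),
  year = rep(c("2017-2018", "2018-2019"), times = 5),
  stringsAsFactors = FALSE
)

# %in% keeps every row whose district is one of the top three
highlight_districts <- enrollment %>%
  filter(district %in% top_districts)
```

The resulting highlight_districts data frame can then be passed to the data argument of a second geom_line() layer, just as with a single district.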
Matt M • December 5, 2021
The code that shows up under the Solutions video is only a partial portion of the code needed to get to the solution. It omits the pivot_wider() and mutate() portions of the code that are shared in the Solutions video.
Elan Sykes • December 30, 2021
Could you nest a function in the data argument for ggplot, like a "highlight_district" function with an argument for district_id to allow you to recreate this chart many times? I tried to do this with highlight_district % filter(race_ethnicity == "race_ethnicity") %>% filter(district_id == "district_id") }
highlight_district("1980", "Hispanic/Latino")
But it didn't work on its own, let alone within a ggplot argument.
JULIO VERA DE LEON • May 4, 2022
Hi!
Could you please explain a little further what changes when you add the "`" symbol?
Thanks!
Alyssa Carr • December 2, 2022
When I pivot_wider I am not getting a percent for the school year. I get for 2018-2019 and for 2017-2018. What am I doing wrong?
Alyssa Carr • December 6, 2022
Here's my code. I'm getting a response that I have one dbl for 18-19 and two dbl for 17-18. I seem to have an extra row of 17-18 data but don't know how to get rid of it or how it got added. https://gist.github.com/AlyssaCarr/0ecf6ee0b5f91586213ae5a1710a5471
Hatem Kotb • January 14, 2023
Thanks for the note on statistical interpretation. Will tackle 'Inferential Statistics using R' after this. Also the list of colors in R link is useful! 😊🙏🏼
Zain Asaf • June 22, 2023
I am having issues with creating a variable for growth in Hispanic population from 2017-2018 to 2018-2019. I use the backticks, but still get an error message that the object 2018-2019 is not found. Here is the link to my code: https://github.com/zainasaf/zain_rin3 project/commit/7211a8f62f3ce24b0622a67334e196befee85898
Zain Asaf • June 22, 2023
I am having issues with creating a variable for growth in Hispanic population from 2017-2018 to 2018-2019. I use the backticks, but still get an error message that the object 2018-2019 is not found. Here is the link to my code: https://gist.github.com/zainasaf/df9e2a8709660366e88e2f07ab6d1642