Use Color to Highlight Findings

Your Turn

  • Identify one school district that has had a lot of growth in its Hispanic/Latino population from 2017-2018 to 2018-2019

  • Create a new data frame called highlight_district and only include this district in it

  • Use the highlight_district data frame to create a new geom_line() layer on top of the other data

  • Make sure this new layer is a bright color and all other layers are some type of light gray
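The steps above can be sketched roughly as follows. This is a minimal sketch, not the course solution: the toy data, the district names, and the column names (district, year, race_ethnicity, percent) are assumptions standing in for the real enrollment_by_race_ethnicity data frame.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy stand-in for the course data (all values are made up for illustration)
enrollment_by_race_ethnicity <- tibble(
  district = rep(c("District A", "District B", "District C"), each = 2),
  year = rep(c("2017-2018", "2018-2019"), times = 3),
  race_ethnicity = "Hispanic/Latino",
  percent = c(0.10, 0.11, 0.20, 0.27, 0.15, 0.14)
)

hispanic_latino <- enrollment_by_race_ethnicity %>%
  filter(race_ethnicity == "Hispanic/Latino")

# Step 1: find the district with the largest growth between the two years
growth_by_district <- hispanic_latino %>%
  pivot_wider(id_cols = district, names_from = year, values_from = percent) %>%
  mutate(growth = `2018-2019` - `2017-2018`) %>%
  arrange(desc(growth))

top_district <- growth_by_district$district[1]

# Step 2: keep only that district
highlight_district <- hispanic_latino %>%
  filter(district == top_district)

# Steps 3 and 4: light gray lines for everything, one bright line on top
highlight_plot <- ggplot(hispanic_latino,
                         aes(x = year, y = percent, group = district)) +
  geom_line(color = "gray80") +
  geom_line(data = highlight_district, color = "orange")
```

Because the highlighted layer is added last, it is drawn on top of the gray lines.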

Just FYI, to have R not use scientific notation, run this code: options(scipen = 999).
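As a small illustration of what that option changes (the number below is arbitrary), format() shows how R would print a value under each setting:

```r
big_number <- 100000000

# Default penalty: R prefers scientific notation for this value
options(scipen = 0)
default_text <- format(big_number)   # "1e+08"

# A large penalty makes R prefer fixed notation
options(scipen = 999)
fixed_text <- format(big_number)     # "100000000"
```

The setting lasts for the current R session, so it is usually run once near the top of a script.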

Learn More

If you want to see the list of all named colors in R, Gina Reynolds has put one together.

The issue I ran into, where individual lines weren't visible because there were too many of them, is called overplotting. Claus Wilke discusses overplotting in Chapter 18 of his book Fundamentals of Data Visualization (the chapter is about overplotting with points, but the concepts are the same).

Have any questions? Put them below and we will help you out!

Catherine Roller White

May 8, 2021

For some reason, R seems to be reading data as non-numeric (I think). I've been running this code:

enrollment_by_race_ethnicity %>%
  filter(race_ethnicity == "Hispanic or Latino") %>%
  pivot_wider(id_cols = c(district, district_id), names_from = year, values_from = percent) %>%
  mutate(growth = "2018-2019" - "2017-2018")

I receive the error message:

Error: Problem with mutate() input growth.
x non-numeric argument to binary operator
i Input growth is "2018-2019" - "2017-2018".

Both the 2017-2018 and 2018-2019 variables show up as numeric, so I'm not sure why they are being interpreted as non-numeric. In case it's relevant, 2017-2018 and 2018-2019 both show up in green in my code, but it looks like they showed up in black in your code.
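A likely cause, offered as a sketch rather than a confirmed diagnosis: in R, double quotes create character strings, so "2018-2019" - "2017-2018" asks R to subtract one piece of text from another, which gives the "non-numeric argument to binary operator" error. Column names that start with a digit or contain a hyphen need backticks instead. Many editor themes color strings differently from names, which would fit the green versus black difference described above. The toy tibble below is made up; only the column names follow the comment.

```r
library(dplyr)

# Made-up wide data with the same non-syntactic column names
enrollment_wide <- tibble(
  district = c("District A", "District B"),
  `2017-2018` = c(0.10, 0.20),
  `2018-2019` = c(0.12, 0.27)
)

# Double quotes: R tries to subtract two strings and fails
quoted_result <- tryCatch(
  enrollment_wide %>% mutate(growth = "2018-2019" - "2017-2018"),
  error = function(e) "error"
)

# Backticks: R looks up the two columns and subtracts their values
enrollment_growth <- enrollment_wide %>%
  mutate(growth = `2018-2019` - `2017-2018`)
```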

Vuk Sekicki

May 12, 2021

By the way, is there a good, practical statistics course you would recommend?

Lucilla Piccari

May 12, 2021

Hello! I get two straight vertical red lines, one at each side of the plot, at the beginning and end of the period plotted. What can I do differently to avoid that?

Here's my code:

enrollment_by_race_ethnicity %>%
  filter(race_ethnicity == "Hispanic/Latino") %>%
  ggplot(aes(x = year, y = percent_of_total_in_school, group = district)) +
  geom_line(color = "gray67") +
  geom_line(data = highlight_district, inherit.aes = TRUE, color = "firebrick")
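One possible cause, which is an assumption because the code that builds highlight_district is not shown: if highlight_district was filtered by district only, it still holds a row for every race/ethnicity group in each year. geom_line() then joins several y values at the same x position, which draws a vertical segment at each year. The sketch below uses made-up data and placeholder district names to show the difference.

```r
library(dplyr)
library(ggplot2)

# Made-up data: two districts, two years, two race/ethnicity groups
enrollment_by_race_ethnicity <- tibble(
  district = rep(c("District A", "District B"), each = 4),
  year = rep(rep(c("2017-2018", "2018-2019"), each = 2), times = 2),
  race_ethnicity = rep(c("Hispanic/Latino", "White"), times = 4),
  percent_of_total_in_school = c(0.10, 0.80, 0.12, 0.78, 0.20, 0.70, 0.27, 0.63)
)

# Filtering on district only keeps every group, so each year has several
# y values that geom_line() would connect with a vertical segment
highlight_all_groups <- enrollment_by_race_ethnicity %>%
  filter(district == "District B")

# Filtering on both columns leaves exactly one value per year
highlight_district <- enrollment_by_race_ethnicity %>%
  filter(district == "District B", race_ethnicity == "Hispanic/Latino")

fixed_plot <- enrollment_by_race_ethnicity %>%
  filter(race_ethnicity == "Hispanic/Latino") %>%
  ggplot(aes(x = year, y = percent_of_total_in_school, group = district)) +
  geom_line(color = "gray67") +
  geom_line(data = highlight_district, color = "firebrick")
```

inherit.aes = TRUE is the default for geom_line(), so leaving it out does not change the plot.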

Atlang Mompe

June 24, 2021

Hi David, for some reason, when I create the highlight_district data frame I get 2 observations and 6 variables. Am I making an error? I notice that Douglas ESD has 14 observations when I filter the enrollment data.

Juan Clavijo

November 28, 2021

Hello! How can I use filter to select not just one district, but say the top 3 or top 5 districts with the largest growth so I can highlight those instead of just one?

The code that shows up under the Solutions video is only a partial portion of the code needed to get to the solution. It omits the pivot_wider() and mutate() portions of the code that are shared in the Solutions video.

Could you nest a function in the data argument for ggplot, like a "highlight_district" function with an argument for district_id to allow you to recreate this chart many times? I tried to do this with highlight_district % filter(race_ethnicity == "race_ethnicity") %>% filter(district_id == "district_id") }

highlight_district("1980", "Hispanic/Latino")

But it didn't work on its own, let alone within a ggplot argument.
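Both ideas above, highlighting the top few districts and wrapping the filter in a function, can be sketched as below. This is one possible approach under stated assumptions: the data and district IDs are made up, get_highlight_districts() is a hypothetical helper name, and its arguments are deliberately named differently from the columns so that filter() does not compare a column with itself.

```r
library(dplyr)
library(tidyr)

# Made-up data with placeholder district IDs
enrollment_by_race_ethnicity <- tibble(
  district_id = rep(c("1001", "1002", "1003", "1004"), each = 2),
  year = rep(c("2017-2018", "2018-2019"), times = 4),
  race_ethnicity = "Hispanic/Latino",
  percent = c(0.10, 0.11, 0.20, 0.27, 0.15, 0.14, 0.30, 0.35)
)

# Hypothetical helper: rows for one or more districts and one group.
# The arguments are used unquoted; writing == "district_id" would compare
# against the literal text "district_id" instead of the value passed in.
get_highlight_districts <- function(data, ids, group) {
  data %>%
    filter(district_id %in% ids, race_ethnicity == group)
}

# Top 3 districts by growth, using slice_max()
top_growth <- enrollment_by_race_ethnicity %>%
  filter(race_ethnicity == "Hispanic/Latino") %>%
  pivot_wider(id_cols = district_id, names_from = year, values_from = percent) %>%
  mutate(growth = `2018-2019` - `2017-2018`) %>%
  slice_max(growth, n = 3)

highlight_districts <- get_highlight_districts(
  enrollment_by_race_ethnicity,
  ids = top_growth$district_id,
  group = "Hispanic/Latino"
)
```

The result can be passed to a second layer, for example geom_line(data = highlight_districts, color = "orange"), or the function call can be placed directly in the data argument.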

JULIO VERA DE LEON

May 4, 2022

Hi!

Could you please explain a little further what changes when you add the "`" symbol?

Thanks!
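As a short illustration of the difference (the data frame here is made up), the same characters mean three different things to R depending on how they are written: bare, in double quotes, or in backticks.

```r
# check.names = FALSE keeps the non-syntactic column names as written
enrollment_wide <- data.frame(
  `2017-2018` = c(0.10, 0.20),
  `2018-2019` = c(0.12, 0.27),
  check.names = FALSE
)

# Bare: plain arithmetic on two numbers
bare <- 2018 - 2019            # -1

# Double quotes: a character string, not a column
quoted <- "2018-2019"

# Backticks: the name of a column, so this subtracts column values
with_backticks <- with(enrollment_wide, `2018-2019` - `2017-2018`)
```

Backticks are only needed for names that R would otherwise misread, such as names that start with a digit or contain a hyphen or a space.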

Rachel Nicholson

December 2, 2022

When I pivot_wider I am not getting a percent for the school year. I get for 2018-2019 and for 2017-2018. What am I doing wrong?

Rachel Nicholson

December 6, 2022

Here's my code. I'm getting a response that I have one dbl for 18-19 and two dbl for 17-18. I seem to have an extra row of 17-18 data but don't know how to get rid of it or how it got added. https://gist.github.com/AlyssaCarr/0ecf6ee0b5f91586213ae5a1710a5471
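One possible explanation, offered without having run the linked code: when pivot_wider() finds more than one value for the same combination of ID columns and new column name, it returns list-columns instead of plain numbers. The sketch below uses made-up data to show one way to find the repeated rows and drop exact duplicates before pivoting.

```r
library(dplyr)
library(tidyr)

# Made-up data with one exact duplicate row for 2017-2018
enrollment <- tibble(
  district = c("District A", "District A", "District A"),
  year = c("2017-2018", "2017-2018", "2018-2019"),
  percent = c(0.10, 0.10, 0.12)
)

# Find district/year combinations that appear more than once
duplicates <- enrollment %>%
  count(district, year) %>%
  filter(n > 1)

# Drop exact duplicate rows, then pivot
enrollment_wide <- enrollment %>%
  distinct() %>%
  pivot_wider(id_cols = district, names_from = year, values_from = percent)
```

distinct() only removes rows that are identical in every column. If the repeated rows hold different values, the extra row usually comes from an earlier step, such as a filter that keeps more than one group.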

Thanks for the note on statistical interpretation. Will tackle 'Inferential Statistics using R' after this. Also the list of colors in R link is useful! 😊🙏🏼

Zain Asaf

June 22, 2023

I am having issues with creating a variable for growth in Hispanic population from 2017-2018 to 2018-2019. I use the backticks, but still get an error message that the object 2018-2019 is not found. Here is the link to my code: https://github.com/zainasaf/zain_rin3 project/commit/7211a8f62f3ce24b0622a67334e196befee85898

Zain Asaf

June 22, 2023

I am having issues with creating a variable for growth in Hispanic population from 2017-2018 to 2018-2019. I use the backticks, but still get an error message that the object 2018-2019 is not found. Here is the link to my code: https://gist.github.com/zainasaf/df9e2a8709660366e88e2f07ab6d1642