Use Color to Highlight Findings

Your Turn

  • Identify one school district that has had a lot of growth in its Hispanic/Latino population from 2017-2018 to 2018-2019

  • Create a new data frame called highlight_district and only include this district in it

  • Use the highlight_district data frame to create a new geom_line() layer on top of the other data

  • Make sure this new layer is a bright color and all other layers are some type of light gray

Just FYI, to have R not use scientific notation, run this code: options(scipen = 999).
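The steps above can be sketched roughly as follows. This is a minimal illustration, not the course solution: the `enrollment` data frame, its column names, and the district names are made-up stand-ins for the course data.

```r
library(dplyr)
library(ggplot2)

# Avoid scientific notation in printed numbers and axis labels
options(scipen = 999)

# Toy stand-in for the course data (hypothetical values, not real enrollment figures)
enrollment <- tibble(
  district = rep(c("District A", "District B", "District C"), each = 2),
  year = rep(c("2017-2018", "2018-2019"), times = 3),
  percent = c(0.10, 0.11, 0.20, 0.29, 0.15, 0.15)
)

# Keep only the district to emphasize
highlight_district <- enrollment %>%
  filter(district == "District B")

# Light gray lines for every district, then one bright line drawn on top
p <- ggplot(enrollment, aes(x = year, y = percent, group = district)) +
  geom_line(color = "gray80") +
  geom_line(data = highlight_district, color = "firebrick")

p
```

Because the highlighted layer is added last, it is drawn over the gray lines rather than hidden beneath them.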

Learn More

If you want to see the list of all named colors in R, Gina Reynolds has put one together.

The issue that I ran into where the lines weren't visible because there were too many is called overplotting. Claus Wilke discusses overplotting in Chapter 18 of his book Fundamentals of Data Visualization (the chapter is about overplotting with points, but the concepts are the same).

Have any questions? Put them below and we will help you out!

Catherine Roller White

May 8, 2021

For some reason, R seems to be reading data as non-numeric (I think). I've been running this code:

enrollment_by_race_ethnicity %>% filter(race_ethnicity == "Hispanic or Latino") %>% pivot_wider(id_cols = c(district, district_id), names_from = year, values_from = percent) %>% mutate(growth = "2018-2019" - "2017-2018")

I receive the error message: Error: Problem with mutate() input growth. x non-numeric argument to binary operator i Input growth is "2018-2019" - "2017-2018".

Both the 2017-2018 and 2018-2019 variables show up as numeric, so I'm not sure why they are being interpreted as non-numeric. In case it's relevant, 2017-2018 and 2018-2019 both show up in green in my code, but it looks like they showed up in black in your code.

David Keyes

May 8, 2021

Try replacing the " around 2018-2019 and 2017-2018 with `. The last line should become:

mutate(growth = `2018-2019` - `2017-2018`)

The ` is used for variable names that are not syntactically valid (e.g. ones that start with a number). The " makes R treat the text within it as a string. So R was complaining that it didn't know how to subtract the text "2017-2018" from the text "2018-2019". Does that make sense?
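A small sketch of the difference between backticks and double quotes, using a made-up `wide` data frame (the values are hypothetical; only the column names mirror the thread):

```r
library(dplyr)

# Toy wide data with non-syntactic column names (hypothetical values)
wide <- tibble(
  district = c("District A", "District B"),
  `2017-2018` = c(0.10, 0.20),
  `2018-2019` = c(0.12, 0.29)
)

# Backticks refer to the columns, so the subtraction is numeric
with_growth <- wide %>%
  mutate(growth = `2018-2019` - `2017-2018`)

# Double quotes create character strings, which cannot be subtracted;
# tryCatch() captures the error message instead of stopping the script
quoted_result <- tryCatch(
  "2018-2019" - "2017-2018",
  error = function(e) conditionMessage(e)
)

with_growth
quoted_result
```

The syntax highlighting mentioned above is a useful cue: quoted strings are usually colored differently from column names.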

Catherine Roller White

May 8, 2021

Excellent! I was trying single quotes and double quotes, but I didn't think to use ` (I don't even know what to call that symbol.) Thanks so much!

What key on the keyboard is ` ? I've tried:

  1. the apostrophe ' (which changes the font to green)

  2. the ` which is on the same key as the tilde, next to the 1 at the top

  3. pasting in your suggested code

But I get errors each time:

  4. With the apostrophe: mutate(growth = '2018-2019' - '2017-2018') gives "Error: Problem with mutate() column growth. i growth = "2018-2019" - "2017-2018". x non-numeric argument to binary operator"

  5. With the key by the tilde: mutate(growth = `2018-2019` - `2017-2018`) gives "Error: Problem with mutate() column growth. i growth = `2018-2019` - `2017-2018`. x object '2018-2019' not found"

  6. With your pasted code: mutate(growth = `2018-2019` – `2017-2018`) gives "Error: unexpected input in: " values_from = 'Percent of Total')%>% mutate(growth = `2018-2019` –""

Full code chunk: enrollmment_by_race_ethnicity_wide % filter((race_ethnicity == "Hispanice/Latino")) %>% pivot_wider(id_cols = c(district, district_id, race_ethnicity), names_from = year, values_from = 'Percent of Total')%>% mutate(growth = 2018-20192017-2018)

David Keyes

December 6, 2021

It should be the ` on the key that is also for the tilde. Does that not work for you? If not, are you sure you have variables called 2018-2019 and 2017-2018 in your code?

In seeing my comment above, I see now that the errors were that:

  1. I had misspelled Hispanic in the filter and
  2. I was not using the minus from the number pad, but the - from the hyphen. It runs now. Thanks.
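An "unexpected input" error next to what looks like a minus sign can mean the character is a typographic dash rather than the plain hyphen-minus that R parses as subtraction. One way to check is to compare code points; `fix_dashes()` below is a hypothetical helper written for this illustration, not part of any package.

```r
# Code points of a plain hyphen-minus and an en dash
hyphen_minus <- utf8ToInt("-")   # the character R parses as subtraction
en_dash <- utf8ToInt("\u2013")   # may appear when code is copied from a web page

hyphen_minus
en_dash

# Hypothetical helper: replace en dashes in a pasted line with hyphen-minus
fix_dashes <- function(code_text) {
  gsub("\u2013", "-", code_text, fixed = TRUE)
}

fix_dashes("mutate(growth = `2018-2019` \u2013 `2017-2018`)")
```

Retyping the operator by hand in the editor has the same effect as the helper.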

Vuk Sekicki

May 12, 2021

By the way any good and practical statistics course to recommend?

Lucilla Piccari

May 12, 2021

Hello! I get two straight vertical red lines at each side of the plot, in correspondence with the beginning and end of the period of time plotted. What can I do differently to not have that?

Here's my code:

enrollment_by_race_ethnicity %>% filter(race_ethnicity == "Hispanic/Latino") %>% ggplot(aes(x = year, y = percent_of_total_in_school, group = district)) + geom_line(color = "gray67") + geom_line(data = highlight_district, inherit.aes = TRUE, color = "firebrick")

David Keyes

May 13, 2021

I'm not getting that when I run your code. Can you create a screenshot of your issue? Feel free to upload to imgur.com and post a link.

Timothy Ewers

May 13, 2021

I had this same problem but then went back and changed the highlight_district object. At first I filtered only for the district. But, when I added the filter for Hispanic/Latino, the vertical lines did not appear.

highlight_district % filter(district == "Douglas ESD") %>% filter(race_ethnicity == "Hispanic/Latino")
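A likely explanation for the vertical lines: if highlight_district keeps every race/ethnicity group, each year has several values for the same district, and geom_line() connects them along the same x position. The sketch below uses a made-up `enrollment` data frame (hypothetical values and column names) to show the row counts before and after the second filter.

```r
library(dplyr)

# Toy stand-in for the course data (hypothetical values)
enrollment <- tibble(
  district = "Douglas ESD",
  race_ethnicity = rep(c("Hispanic/Latino", "White"), each = 2),
  year = rep(c("2017-2018", "2018-2019"), times = 2),
  percent = c(0.10, 0.14, 0.70, 0.66)
)

# Filtering on district alone leaves two rows per year,
# so a line grouped by district has two y values at each x
district_only <- enrollment %>%
  filter(district == "Douglas ESD")

# Filtering on district and race/ethnicity leaves one row per year
highlight_district <- enrollment %>%
  filter(district == "Douglas ESD") %>%
  filter(race_ethnicity == "Hispanic/Latino")

count(district_only, year)
count(highlight_district, year)
```

Counting rows per year like this is a quick check that the highlight data has exactly one value per x position.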

Lucilla Piccari

May 13, 2021

Thank you Timothy, this worked perfectly!

Atlang Mompe

June 24, 2021

Hi David, for some weird reason when I create the highlight district, I get 2 observations and 6 variables? Is there an error that I am doing? I notice the Douglas ESD has 14 observations when I filter the enrollment data.

Atlang Mompe

June 24, 2021

Ignore, I just saw the rest of the video haha sorry for the false alarms

Juan Clavijo

November 28, 2021

Hello! How can I use filter to select not just one district, but say the top 3 or top 5 districts with the largest growth so I can highlight those instead of just one?

A couple of options: if you want to select the top 3 districts, say, I'd use the slice_max() function and set the n argument to 3. If you want to specify the districts by name, I'd use %in% within filter() (e.g. filter(district %in% c("Portland", "Beaverton", "Gresham"))). Let me know if that answers your question!
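Both options might look something like this. The `enrollment` data frame, its values, and the district names are invented for illustration, and growth is computed with summarize() here only to keep the sketch short.

```r
library(dplyr)

# Toy stand-in for the course data (hypothetical values)
enrollment <- tibble(
  district = rep(paste("District", c("A", "B", "C", "D", "E")), each = 2),
  year = rep(c("2017-2018", "2018-2019"), times = 5),
  percent = c(0.10, 0.11, 0.20, 0.29, 0.15, 0.18, 0.30, 0.35, 0.12, 0.10)
)

# Growth per district between the two school years
growth_by_district <- enrollment %>%
  group_by(district) %>%
  summarize(growth = percent[year == "2018-2019"] - percent[year == "2017-2018"])

# Option 1: the three districts with the largest growth
top_3 <- growth_by_district %>%
  slice_max(growth, n = 3)

# Keep all rows for those districts, ready for a highlighted geom_line() layer
highlight_districts <- enrollment %>%
  filter(district %in% top_3$district)

# Option 2: pick districts by name
named_districts <- enrollment %>%
  filter(district %in% c("District A", "District E"))

top_3
highlight_districts
```

When highlighting several districts in one layer, the `group` aesthetic still needs to be the district so that each one gets its own line.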

The code that shows up under the Solutions video is only part of the code needed to get to the solution. It omits the pivot_wider() and mutate() portions of the code that are shared in the Solutions video.

Could you nest a function in the data argument for ggplot, like a "highlight_district" function with an argument for district_id to allow you to recreate this chart many times? I tried to do this with highlight_district % filter(race_ethnicity == "race_ethnicity") %>% filter(district_id == "district_id") }

highlight_district("1980", "Hispanic/Latino")

But it didn't work on its own, let alone within a ggplot argument.

Definitely possible!

Here's a video walkthrough: https://show.rfor.us/xyZgAX And here's a link to the code I created: https://gist.github.com/dgkeyes/eec7fcbd13a24cd19e46bcb320bbcb42

Hope that helps!
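For readers who cannot open the links, here is one possible shape for such a function. It is a sketch under assumptions, not the code from the gist: `get_highlight_district()`, its argument names, and the toy `enrollment` data are all hypothetical. The arguments are deliberately named differently from the columns and are used without quotes, so filter() compares each column against the value passed in.

```r
library(dplyr)

# Toy stand-in for the course data (hypothetical values)
enrollment <- tibble(
  district_id = rep(c("1980", "2050"), each = 4),
  race_ethnicity = rep(rep(c("Hispanic/Latino", "White"), each = 2), times = 2),
  year = rep(c("2017-2018", "2018-2019"), times = 4),
  percent = c(0.10, 0.14, 0.70, 0.66, 0.20, 0.21, 0.60, 0.59)
)

# Hypothetical helper: returns the rows for one district and one group
get_highlight_district <- function(data, id_value, group_value) {
  data %>%
    filter(district_id == id_value) %>%
    filter(race_ethnicity == group_value)
}

one_district <- get_highlight_district(enrollment, "1980", "Hispanic/Latino")
one_district
```

Since the function returns a data frame, its result could be passed directly to a layer, for example geom_line(data = get_highlight_district(enrollment, "1980", "Hispanic/Latino"), color = "firebrick").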

JULIO VERA DE LEON

May 4, 2022

Hi!

Could you please explain a little further what changes when you add the "`" symbol?

Thanks!

Charlie Hadley

May 5, 2022

In programming languages we need to ensure that things are "syntactically valid" otherwise code won't work. For instance variable names and column names can't contain spaces in R.

Unless! We wrap the column name in backticks. That allows us to have column names that aren't syntactically valid otherwise. This includes column names that begin with numbers. That's why we write `2018-2019`.

Rachel Nicholson

December 2, 2022

When I pivot_wider I am not getting a percent for the school year. I get for 2018-2019 and for 2017-2018. What am I doing wrong?

David Keyes

December 2, 2022

Can you post your code to a gist and share the link so I can see it?

Rachel Nicholson

December 6, 2022

I tried a few different things and eventually used "distinct" to eliminate the duplicates and now it's working great. Don't know if that's ideal but it worked.

Rachel Nicholson

December 6, 2022

Here's my code. I'm getting a response that I have one dbl for 18-19 and two dbl for 17-18. I seem to have an extra row of 17-18 data but don't know how to get rid of it or how it got added. https://gist.github.com/AlyssaCarr/0ecf6ee0b5f91586213ae5a1710a5471
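This symptom is consistent with duplicate rows: when more than one value maps to the same cell, pivot_wider() warns and returns list-columns instead of plain numbers. The sketch below uses made-up data (hypothetical column names and values) to find the duplicated combination and then remove exact duplicates with distinct() before pivoting.

```r
library(dplyr)
library(tidyr)

# Toy data with one accidentally duplicated row (hypothetical values)
enrollment <- tibble(
  district = c("District A", "District A", "District A"),
  year = c("2017-2018", "2017-2018", "2018-2019"),
  percent = c(0.10, 0.10, 0.12)
)

# Find district/year combinations that appear more than once
duplicates <- enrollment %>%
  count(district, year) %>%
  filter(n > 1)

# Drop exact duplicate rows, then pivot: each cell now holds a single number
wide <- enrollment %>%
  distinct() %>%
  pivot_wider(id_cols = district, names_from = year, values_from = percent)

duplicates
wide
```

distinct() only removes rows that are identical in every column, so if the repeated rows differ in some value it is worth inspecting `duplicates` to decide which row to keep.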

Thanks for the note on statistical interpretation. Will tackle 'Inferential Statistics using R' after this. Also the list of colors in R link is useful! 😊🙏🏼

Zain Asaf

June 22, 2023

I am having issues with creating a variable for growth in Hispanic population from 2017-2018 to 2018-2019. I use the backticks, but still get an error message that the object 2018-2019 is not found. Here is the link to my code: https://github.com/zainasaf/zain_rin3 project/commit/7211a8f62f3ce24b0622a67334e196befee85898

Zain Asaf

June 22, 2023

I am having issues with creating a variable for growth in Hispanic population from 2017-2018 to 2018-2019. I use the backticks, but still get an error message that the object 2018-2019 is not found. Here is the link to my code: https://gist.github.com/zainasaf/df9e2a8709660366e88e2f07ab6d1642

David Keyes

June 28, 2023

It looks like you have a typo in your code. Check out the last line and replace 2018-19 with 2018-2019 and 2017-18 with 2017-2018. That should fix it!