# Use Color to Highlight Findings

• Identify one school district that has had a lot of growth in its Hispanic/Latino population from 2017-2018 to 2018-2019

• Create a new data frame called `highlight_district` and only include this district in it

• Use the `highlight_district` data frame to create a new `geom_line()` layer on top of the other data

• Make sure this new layer is a bright color and all other layers are some type of light gray
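The four steps above can be sketched roughly as follows. This is a minimal sketch on made-up stand-in data, not the solution from the video: the district names, the `percent` values, and the helper objects `growth_by_district`, `top_district`, and `highlight_plot` are all assumptions for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up stand-in data; the real course data has different districts and values
enrollment_by_race_ethnicity <- tibble(
  district = rep(c("District A", "District B", "District C"), each = 2),
  year = rep(c("2017-2018", "2018-2019"), times = 3),
  race_ethnicity = "Hispanic/Latino",
  percent = c(0.10, 0.11, 0.20, 0.26, 0.15, 0.14)
)

# Step 1: put each year in its own column and compute growth per district
growth_by_district <- enrollment_by_race_ethnicity %>%
  filter(race_ethnicity == "Hispanic/Latino") %>%
  pivot_wider(id_cols = district, names_from = year, values_from = percent) %>%
  mutate(growth = `2018-2019` - `2017-2018`) %>%
  arrange(desc(growth))

top_district <- growth_by_district$district[1]

# Step 2: keep only that district (and only the group being plotted)
highlight_district <- enrollment_by_race_ethnicity %>%
  filter(race_ethnicity == "Hispanic/Latino", district == top_district)

# Steps 3 and 4: light gray lines for every district, a bright line on top
highlight_plot <- enrollment_by_race_ethnicity %>%
  filter(race_ethnicity == "Hispanic/Latino") %>%
  ggplot(aes(x = year, y = percent, group = district)) +
  geom_line(color = "gray80") +
  geom_line(data = highlight_district, color = "firebrick")
```

Printing `highlight_plot` draws the chart. The second `geom_line()` inherits the aesthetics from `ggplot()`, so it only needs its own `data` and `color`.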

Just FYI, to have R not use scientific notation, run this code: `options(scipen = 999)`.
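As a small illustration of what that option changes, here is a sketch using base R's `format()` on an arbitrary large number (the number itself is just an example):

```r
big_number <- 10000000000

# With the default setting, R prefers the shorter scientific form
options(scipen = 0)
default_format <- format(big_number)   # "1e+10"

# A large scipen value makes R strongly prefer fixed notation
options(scipen = 999)
plain_format <- format(big_number)     # "10000000000"
```

The same preference applies to axis labels and printed data frames for the rest of the session, which is why it is usually set once near the top of a script.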

If you want to see the list of all named colors in R, Gina Reynolds has put one together.
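Separately from that list, base R ships a `colors()` function that returns every built-in color name as a character vector. A quick sketch of using it to check a name or find gray shades (the object names here are just for illustration):

```r
# All built-in color names as a character vector
all_colors <- colors()

# Check whether a name is valid before using it in a plot
"firebrick" %in% all_colors

# Find every shade whose name starts with "gray" or "grey"
gray_shades <- grep("^gr[ae]y", all_colors, value = TRUE)
head(gray_shades)
```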

The issue that I ran into where the lines weren't visible because there were too many is called overplotting. Claus Wilke discusses overplotting in Chapter 18 of his book Fundamentals of Data Visualization (the chapter is about overplotting with points, but the concepts are the same).

## Have any questions? Put them below and we will help you out!

#### Catherine Roller White

For some reason, R seems to be reading data as non-numeric (I think). I've been running this code:

enrollment_by_race_ethnicity %>% filter(race_ethnicity == "Hispanic or Latino") %>% pivot_wider(id_cols = c(district, district_id), names_from = year, values_from = percent) %>% mutate(growth = "2018-2019" - "2017-2018")

I receive the error message: Error: Problem with `mutate()` input `growth`. x non-numeric argument to binary operator i Input `growth` is `"2018-2019" - "2017-2018"`.

Both the 2017-2018 and 2018-2019 variables show up as `<dbl>` so I'm not sure why they are being interpreted as non-numeric. In case it's relevant, 2017-2018 and 2018-2019 both show up in green in my code, but it looks like they showed up in black in your code.

#### David Keyes

Try replacing the " around 2018-2019 and 2017-2018 with `. The last line should become:

mutate(growth = `2018-2019` - `2017-2018`)

The ` is used for variable names that are not syntactically valid (e.g. ones that start with a number). The " makes R treat the text within it as a string. So R was complaining that it didn't know how to subtract the text 2017-2018 from the text 2018-2019. Does that make sense?
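To see the difference side by side, here is a small sketch on a made-up wide table (the `wide` data frame and its values are assumptions; only the column names match the thread):

```r
library(dplyr)

# Made-up wide data with year columns like those produced by pivot_wider()
wide <- tibble(
  district = c("District A", "District B"),
  `2017-2018` = c(0.10, 0.20),
  `2018-2019` = c(0.11, 0.26)
)

# Double quotes create strings, so this tries to subtract text from text
quoted_result <- tryCatch(
  wide %>% mutate(growth = "2018-2019" - "2017-2018"),
  error = function(e) "error"
)

# Backticks refer to the columns, so this subtracts numbers
with_growth <- wide %>%
  mutate(growth = `2018-2019` - `2017-2018`)
```

`quoted_result` ends up holding the placeholder string `"error"` because the quoted version fails, while `with_growth` gains a numeric `growth` column.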

#### Catherine Roller White

Excellent! I was trying single quotes and double quotes, but I didn't think to use ` (I don't even know what to call that symbol.) Thanks so much!

#### Matt M

what key on the keyboard is ` ? I've tried

1. the apostrophe ' (which changes the font to green)

2. the ` which is on the same key as the tilde, next to the 1 at the top.

3. pasting in your suggested code. But I get errors on both.

4. Error with the apostrophe ('): mutate(growth = '2018-2019' - '2017-2018') "Error: Problem with `mutate()` column `growth`. i `growth = "2018-2019" - "2017-2018"`. x non-numeric argument to binary operator"

5. Error with the ` by the tilde: mutate(growth = `2018-2019` - `2017-2018`) "Error: Problem with `mutate()` column `growth`. i `growth = `2018-2019` - `2017-2018``. x object '2018-2019' not found"

6. Error with your pasted code: mutate(growth = `2018-2019` – `2017-2018`) "Error: unexpected input in: " values_from = 'Percent of Total')%>% mutate(growth = `2018-2019` –""

Full code chunk: enrollmment_by_race_ethnicity_wide <- enrollment_by_race_ethnicity %>% filter((race_ethnicity == "Hispanice/Latino")) %>% pivot_wider(id_cols = c(district, district_id, race_ethnicity), names_from = year, values_from = 'Percent of Total') %>% mutate(growth = `2018-2019` – `2017-2018`)

#### David Keyes

It should be the ` on the key that is also for the tilde. Does that not work for you? If not, are you sure you have variables called 2018-2019 and 2017-2018 in your code?

#### Matt M

In seeing my comment above, I see now that the errors were that:

1. I had misspelled Hispanic in the filter and
2. I was not using the minus from the number pad, but the - from the hyphen. It runs now. Thanks.
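The `object '2018-2019' not found` error is what a misspelled filter value tends to produce: the filter matches no rows, so `pivot_wider()` has no years to turn into columns, and the later `mutate()` has nothing to refer to. A minimal sketch on made-up data (the `enrollment` data frame and its values are assumptions):

```r
library(dplyr)
library(tidyr)

# Made-up stand-in data
enrollment <- tibble(
  district = c("District A", "District A"),
  year = c("2017-2018", "2018-2019"),
  race_ethnicity = "Hispanic/Latino",
  percent = c(0.10, 0.11)
)

# The typo "Hispanice" matches nothing, so zero rows reach pivot_wider()
misspelled <- enrollment %>%
  filter(race_ethnicity == "Hispanice/Latino") %>%
  pivot_wider(id_cols = district, names_from = year, values_from = percent)

nrow(misspelled)    # no rows survived the filter
names(misspelled)   # the year columns were never created
```

Checking `nrow()` and `names()` right after the pivot is a quick way to catch this before `mutate()` fails.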

#### Vuk Sekicki

By the way any good and practical statistics course to recommend?

#### Lucilla Piccari

Hello! I get two straight vertical red lines at each side of the plot, in correspondence with the beginning and end of the period of time plotted. What can I do differently to not have that?

Here's my code:

enrollment_by_race_ethnicity %>% filter(race_ethnicity == "Hispanic/Latino") %>% ggplot(aes(x = year, y = percent_of_total_in_school, group = district)) + geom_line(color = "gray67") + geom_line(data = highlight_district, inherit.aes = TRUE, color = "firebrick")

#### David Keyes

I'm not getting that when I run your code. Can you create a screenshot of your issue? Feel free to upload to imgur.com and post a link.

#### Timothy Ewers

I had this same problem but then went back and changed the highlight_district object. At first I filtered only for the district. But, when I added the filter for Hispanic/Latino, the vertical lines did not appear.

highlight_district <- enrollment_by_race_ethnicity %>% filter(district == "Douglas ESD") %>% filter(race_ethnicity == "Hispanic/Latino")
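A likely explanation for the vertical lines: when `highlight_district` is filtered by district only, it still holds one row per race/ethnicity group for each year, and `geom_line()` connects all of those points at the same x position. A minimal sketch on made-up data (the values and the `district_only` object are assumptions) showing the row counts before and after the second filter:

```r
library(dplyr)

# Made-up stand-in data with two groups per year
enrollment_by_race_ethnicity <- tibble(
  district = "Douglas ESD",
  year = rep(c("2017-2018", "2018-2019"), each = 2),
  race_ethnicity = rep(c("Hispanic/Latino", "White"), times = 2),
  percent_of_total_in_school = c(0.10, 0.80, 0.12, 0.78)
)

# Filtering by district alone leaves several rows per year
district_only <- enrollment_by_race_ethnicity %>%
  filter(district == "Douglas ESD")

rows_per_year_before <- district_only %>% count(year)

# Adding the race/ethnicity filter leaves one row per year
highlight_district <- enrollment_by_race_ethnicity %>%
  filter(district == "Douglas ESD") %>%
  filter(race_ethnicity == "Hispanic/Latino")

rows_per_year_after <- highlight_district %>% count(year)
```

With one row per year, the highlighted layer draws a single line from the first year to the second.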

#### Lucilla Piccari

Thank you Timothy, this worked perfectly!

#### Atlang Mompe

Hi David, for some weird reason when I create the highlight district, I get 2 observations and 6 variables? Is there an error that I am doing? I notice the Douglas ESD has 14 observations when I filter the enrollment data.

#### Atlang Mompe

Ignore, I just saw the rest of the video haha sorry for the false alarms

#### Juan Clavijo

Hello! How can I use filter to select not just one district, but say the top 3 or top 5 districts with the largest growth so I can highlight those instead of just one?

#### David Keyes

A couple options: if you want to select the top 3 districts, say, I'd use the `slice_max()` function and set the n argument to 3. If you want to specify the districts by name I'd use the %in% within a `filter()` (e.g. `filter(district %in% c("Portland", "Beaverton", "Gresham"))`). Let me know if that answers your question!
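Both options might look something like this. It is a sketch on a made-up `growth_by_district` table (the district names and growth values are assumptions), standing in for the result of the `pivot_wider()` and `mutate()` steps:

```r
library(dplyr)

# Made-up growth values, one row per district
growth_by_district <- tibble(
  district = c("District A", "District B", "District C",
               "District D", "District E"),
  growth = c(0.01, 0.06, -0.01, 0.03, 0.02)
)

# Option 1: the three districts with the largest growth
top_3 <- growth_by_district %>%
  slice_max(growth, n = 3)

# Option 2: districts picked by name
named_districts <- growth_by_district %>%
  filter(district %in% c("District A", "District E"))
```

Either result can then feed the highlight layer, for example by filtering the long data with `district %in% top_3$district`.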

#### Matt M

The code that shows up under the Solutions video is only a partial portion of the code needed to get to the solution. It omits the pivot_wider() and mutate() portions of the code that are shared in the Solutions video.

#### Elan Sykes

Could you nest a function in the data argument for ggplot, like a "highlight_district" function with an argument for district_id to allow you to recreate this chart many times? I tried to do this with highlight_district <- function(district_id, race_ethnicity) { enrollment_by_race_ethnicity %>% filter(race_ethnicity == "race_ethnicity") %>% filter(district_id == "district_id") }

highlight_district("1980", "Hispanic/Latino")

But it didn't work on its own, let alone within a ggplot argument.

#### David Keyes

Definitely possible!

Here's a video walkthrough: https://show.rfor.us/xyZgAX And here's a link to the code I created: https://gist.github.com/dgkeyes/eec7fcbd13a24cd19e46bcb320bbcb42

Hope that helps!
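For readers who can't open the links, here is one way such a function could look. This is a sketch under stated assumptions, not the code from the gist: the stand-in data, the argument names `id` and `group`, and the `plot_highlight()` wrapper are all hypothetical. One thing it works around from the attempt above: writing `"district_id"` in quotes compares the column against that literal text, so the function's arguments need to be used unquoted and given names that differ from the column names.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in data: two districts, two years, two groups
enrollment_by_race_ethnicity <- tibble(
  district_id = rep(c("1980", "2050"), each = 4),
  year = rep(rep(c("2017-2018", "2018-2019"), each = 2), times = 2),
  race_ethnicity = rep(c("Hispanic/Latino", "White"), times = 4),
  percent = c(0.10, 0.80, 0.12, 0.78, 0.20, 0.70, 0.21, 0.69)
)

# Return only the rows for one district and one group
highlight_district <- function(data, id, group) {
  data %>%
    filter(district_id == id, race_ethnicity == group)
}

# Draw every district in gray and the chosen one in a bright color
plot_highlight <- function(data, id, group) {
  data %>%
    filter(race_ethnicity == group) %>%
    ggplot(aes(x = year, y = percent, group = district_id)) +
    geom_line(color = "gray80") +
    geom_line(data = highlight_district(data, id, group), color = "firebrick")
}

highlighted <- highlight_district(
  enrollment_by_race_ethnicity, "1980", "Hispanic/Latino"
)
```

Calling `plot_highlight(enrollment_by_race_ethnicity, "1980", "Hispanic/Latino")` then recreates the chart for any district ID.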

#### JULIO VERA DE LEON

Hi!

Could you please explain a little further what changes when you add the "`" symbol?

Thanks!

In programming languages we need to ensure that things are "syntactically valid"; otherwise code won't work. For instance, variable names and column names can't contain spaces in R.

Unless! We wrap the column name in backticks. That allows us to have column names that aren't syntactically valid otherwise. This includes column names that begin with numbers. That's why we write `2018-2019`.
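One rough way to check which column names need backticks is base R's `make.names()`, which rewrites a name into a syntactically valid one; if the result differs from the original, the original needs backticks. A small sketch on a made-up data frame (the `enrollment` object and its values are assumptions):

```r
library(dplyr)

# Made-up data with two names that are not syntactically valid
enrollment <- tibble(
  district = c("District A", "District B"),
  `Percent of Total` = c(0.10, 0.20),
  `2018-2019` = c(1, 2)
)

column_names <- names(enrollment)

# TRUE where the name can be typed as-is, FALSE where backticks are needed
is_syntactic <- make.names(column_names) == column_names

# Names with spaces work the same way as names that start with a number
doubled <- enrollment %>%
  mutate(doubled = `Percent of Total` * 2)
```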

#### Rachel Nicholson

When I pivot_wider I am not getting a percent for the school year. I get `<dbl [1]>` for 2018-2019 and `<dbl [2]>` for 2017-2018. What am I doing wrong?

#### David Keyes

Can you post your code to a gist and share the link so I can see it?

#### Rachel Nicholson

I tried a few different things and eventually used "distinct" to eliminate the duplicates and now it's working great. Don't know if that's ideal but it worked.

#### Rachel Nicholson

Here's my code. I'm getting a response that I have one dbl for 18-19 and two dbl for 17-18. I seem to have an extra row of 17-18 data but don't know how to get rid of it or how it got added. https://gist.github.com/AlyssaCarr/0ecf6ee0b5f91586213ae5a1710a5471
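This pattern usually means the data has more than one row for the same district and year, so `pivot_wider()` has several values to fit into one cell. A minimal sketch on made-up data (the `enrollment` data frame and its values are assumptions) that finds the duplicates and then removes them with `distinct()`, as described above:

```r
library(dplyr)
library(tidyr)

# Made-up data with the 2017-2018 row accidentally repeated
enrollment <- tibble(
  district = c("District A", "District A", "District A"),
  year = c("2017-2018", "2017-2018", "2018-2019"),
  percent = c(0.10, 0.10, 0.11)
)

# Any district and year combination that appears more than once
duplicates <- enrollment %>%
  count(district, year) %>%
  filter(n > 1)

# Drop fully identical rows, then pivot as usual
wide <- enrollment %>%
  distinct() %>%
  pivot_wider(id_cols = district, names_from = year, values_from = percent)
```

Note that `distinct()` only removes rows that are identical in every column. If the repeated rows differ somewhere, it is worth working out where the extra row came from before pivoting.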

#### Hatem Kotb

Thanks for the note on statistical interpretation. Will tackle 'Inferential Statistics using R' after this. Also the list of colors in R link is useful! 😊🙏🏼

#### Zain Asaf

I am having issues with creating a variable for growth in Hispanic population from 2017-2018 to 2018-2019. I use the backticks, but still get an error message that the object `2018-2019` is not found. Here is the link to my code: https://github.com/zainasaf/zain_rin3 project/commit/7211a8f62f3ce24b0622a67334e196befee85898

#### Zain Asaf

I am having issues with creating a variable for growth in Hispanic population from 2017-2018 to 2018-2019. I use the backticks, but still get an error message that the object `2018-2019` is not found. Here is the link to my code: https://gist.github.com/zainasaf/df9e2a8709660366e88e2f07ab6d1642

#### David Keyes

It looks like you have a typo in your code. Check out the last line and replace 2018-19 with 2018-2019 and 2017-18 with 2017-2018. That should fix it!