
Scatterplots

Your Turn

Complete the scatterplot sections of the data-visualization-exercises.Rmd file.

Solutions

Learn More

Scatterplot Resources

Claus Wilke talks about scatterplots in Chapter 12 of his book Fundamentals of Data Visualization. Michael Toth also has a long blog post about all of the ins and outs of making scatterplots in ggplot.

You can also find examples of code to make scatterplots on the From Data to Viz website, the R Graph Gallery website, and in Chapter 5 of the R Graphics Cookbook.

General ggplot2 Resources

Start with Chapter 3 of R for Data Science by Hadley Wickham, which shows the basics of ggplot2. The RStudio primers on visualizing data provide another great place to get started.

To see ggplot2 in action, check out the ggplot2 flipbook by Gina Reynolds, which shows each step in building various plots. Both the R Graph Gallery and From Data to Viz show examples of plots and provide code to make them.

Two great books on the fundamentals of data visualization that include ggplot2 code are Fundamentals of Data Visualization by Claus Wilke and Data Visualization: A Practical Introduction by Kieran Healy.

Another good reference book is the R Graphics Cookbook by Winston Chang, which has “more than 150 recipes to help you generate high-quality graphs quickly, without having to comb through all the details of R’s graphing systems.”

Finally, I recommend you print out and keep the ggplot cheatsheet handy. I have one by my desk and use it all the time!

Have any questions? Put them below and we will help you out!

    1. That’s super weird. It seems that the farver package, which I believe is a dependency of ggplot (i.e. a package that ggplot relies on), didn’t get installed. Try installing it with install.packages('farver') and let me know if it works after that!

  1. Hi David,

    Here are 2 versions of code that both produce the scatterplot of height vs weight from the nhanes dataset. The first is from your solutions, and the second follows the R4DS text. Can you briefly comment on why they both “work”? Is one better than the other?

    ggplot(data = nhanes,
           mapping = aes(x = weight,
                         y = height)) +
      geom_point()

    ggplot(data = nhanes) +
      geom_point(mapping = aes(x = weight, y = height))
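
On the question of why both versions work: a mapping passed to ggplot() acts as the default for every layer, while a mapping passed to geom_point() applies to that layer only. With a single layer, the two are equivalent. Here is a minimal sketch, assuming a small made-up data frame in place of the course's nhanes data (the values and the names p_global, p_local, and same_points are illustrative only):

```r
library(ggplot2)

# Hypothetical stand-in for the course's nhanes data (made-up values)
nhanes <- data.frame(
  weight = c(55.2, 70.1, 82.5, 64.0),
  height = c(158.0, 171.5, 180.2, 165.3)
)

# Aesthetics set in ggplot(): inherited by every layer added afterwards
p_global <- ggplot(data = nhanes,
                   mapping = aes(x = weight,
                                 y = height)) +
  geom_point()

# Aesthetics set in geom_point(): apply to that layer only
p_local <- ggplot(data = nhanes) +
  geom_point(mapping = aes(x = weight, y = height))

# Compare the x/y positions that each plot's point layer will draw
same_points <- isTRUE(all.equal(
  layer_data(p_global)[, c("x", "y")],
  layer_data(p_local)[, c("x", "y")]
))
same_points
```

The difference only shows up once a second layer is added: a geom_smooth() added to the first plot would reuse the x and y mappings, while in the second plot it would need its own.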

  2. You mentioned that ggplot will automatically remove observations with missing data. If I’m plotting average test scores for mid-term and final exams, for example, and one student took the final but did not take the mid-term, will ggplot remove that student’s data from the graph completely, or will it just plot the final exam and omit the mid-term score that does not exist?
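
One way to reason about the missing-data question: geom_point() drops any row that lacks a value for x or y, so the answer depends on how the data is laid out. Here is a minimal sketch, assuming a made-up wide data frame with one row per student (the scores data frame, its columns, and the name plottable are hypothetical):

```r
library(ggplot2)

# Hypothetical scores: student "C" took the final but not the mid-term
scores <- data.frame(
  student = c("A", "B", "C"),
  midterm = c(78, 85, NA),
  final   = c(82, 88, 91)
)

# One point per student needs both an x and a y value.
# Printing p draws students A and B and warns that one row was removed.
p <- ggplot(data = scores,
            mapping = aes(x = midterm, y = final)) +
  geom_point()

# Rows that have both coordinates, i.e. the points that can be drawn
plottable <- scores[complete.cases(scores[, c("midterm", "final")]), ]
plottable$student
```

If the data were instead in long format, with one row per student per exam, only the row for the missing mid-term would be dropped and the final score would still be plotted.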

  3. It seems like the clean_names function didn’t work for me–when I start typing the code for the scatterplot, it isn’t suggesting the variable names. This is what I put for clean_names
    ```{r}
    nhanes %
    clean_names()
    ```
    And then I got this message (which looks different from what you got)

    Rows: 10000 Columns: 22
    ── Column specification ──────────────────────────────────────────────────────────────────────────────
    Delimiter: ","
    chr (13): SurveyYr, Gender, AgeDecade, Race1, Education, MaritalStatus, HHIncome, HomeOwn, Work, H…
    dbl (9): ID, Age, Weight, Height, BMI, DaysPhysHlthBad, DaysMentHlthBad, SleepHrsNight, PhysActiv…
    ℹ Use `spec()` to retrieve the full column specification for this data.
    ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

    1. Hello Ellen,

      It looks like you’ve mistyped the pipe. In the code you provided, you have:

      nhanes %
      clean_names()

      The pipe should be written as %>%, i.e.:

      nhanes %>%
        clean_names()

      Please also note that you will need to make an assignment if you want the effect of clean_names() to be applied to the object. When we run code like this, the only outcome is that the output is printed to the console. Assignments are how we change objects.

      Thanks,

      Charlie
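
Charlie's point about assignment can be sketched as follows. This is only an illustration, assuming a tiny made-up data frame in place of the real nhanes data (the three columns, their values, and the name names_before are hypothetical); clean_names() comes from the janitor package and %>% from magrittr.

```r
library(janitor)   # provides clean_names()
library(magrittr)  # provides the %>% pipe

# Hypothetical stand-in for the course's nhanes data (three columns only)
nhanes <- data.frame(
  SurveyYr = c("2009_10", "2011_12"),
  Weight   = c(87.4, 17.0),
  Height   = c(164.7, 105.4)
)

# Without assignment: the cleaned data frame is printed to the console,
# but nhanes itself still has the original column names
nhanes %>%
  clean_names()
names_before <- names(nhanes)
names_before

# With assignment: nhanes is overwritten with the cleaned version
nhanes <- nhanes %>%
  clean_names()
names(nhanes)
```

After the assignment, the snake_case names (for example weight and height) are the ones to use inside aes().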