Using AI with R

How to Use AI Effectively

Code shown in video
library(tidyverse)
library(ellmer)


# {ellmer} ----------------------------------------------------------------

chat <- chat_openai()

chat$chat("What is the easiest way to make a histogram in R?")

# System prompts with {ellmer} --------------------------------------------

chat <- chat_openai(
  system_prompt = "You are an expert R programmer who prefers the tidyverse."
)

chat$chat("What is the easiest way to make a histogram?")

# {pal} -------------------------------------------------------------------

library(pal)
library(ggpal2)

options(
  .pal_fn = "chat_openai"
)

ggplot(data, aes(x = variable)) +
  geom_histogram(binwidth = width, fill = "lightblue", color = "black") +
  labs(title = "Histogram Title", 
       x = "X-axis Label", 
       y = "Frequency") +
  scale_fill_viridis_c()

tibble(data = c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5)) |>
  ggplot(aes(x = data)) +
  geom_histogram(binwidth = 1, 
                 fill = "lightblue", 
                 color = "black") +
  labs(title = "Histogram of Sample Data",
       x = "Values",
       y = "Frequency")
# This code is a small R script that creates a histogram using the `ggplot2` package. Let's go through it step by step:
# 
# 1. **Creating a tibble:**
#    - `tibble(data = c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5))`: This line creates a `tibble`, which is a modern version of a data frame provided by the `tibble` package. It is essentially used for storing data in a tabular format.
#    - The `tibble` here contains a single column named `data` with the specified sequence of numbers.
# 
# 2. **Pipe operator (|>):**
#    - The pipe operator `|>` (introduced in R 4.1.0) passes the `tibble` created in the first step as input to the following function. This helps streamline the operations and makes the code more readable.
# 
# 3. **Creating a ggplot2 plot:**
#    - `ggplot(aes(x = data))`: This initializes a `ggplot` object and specifies that the `x`-axis aesthetic will use the `data` column from the `tibble`.
# 
# 4. **Adding a histogram layer:**
#    - `geom_histogram(...)`: This adds a histogram to the plot with the following specifications:
#      - `binwidth = 1`: This sets the width of each bin (interval) to 1. It will group all the values within each unit into a single bin.
#      - `fill = "lightblue"`: This specifies that the bars of the histogram should be filled with a light blue color.
#      - `color = "black"`: This outlines the bars with a black color.
# 
# 5. **Adding labels:**
#    - `labs(...)`: This adds labels to the plot to enhance readability.
#      - `title = "Histogram of Sample Data"`: Sets the main title of the plot.
#      - `x = "Values"`: Labels the x-axis with the name "Values".
#      - `y = "Frequency"`: Labels the y-axis with the name "Frequency", indicating that the y-axis represents the frequency of each value.
# 
# In summary, the code creates a histogram that visualizes the frequency of each number in the given `data` vector. The plot will have bars colored light blue


# Make my own pal ---------------------------------------------------------

prompt_new(
  "explaincode",
  interface = "suffix"
)

directory_load()

prompt_new(
  "debugcode",
  interface = "suffix"
)

tibble(data = c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5)) |>
  ggplot(aes(x = data)) +
  geom_histogam(binwidth = 1, 
                 fill = "lightblue", 
                 color = "black") +
  labs(title = "Histogram of Sample Data",
       x = "Values",
       y = "Frequency")



# Caveat ------------------------------------------------------------------

# Write code in tidycensus to bring in the percentage of households below 200% poverty level in each census tract in Michigan

library(tidycensus)

mi_poverty <-
  get_acs(
    geography = "tract",
    table = c(
      below_200 = "B17002_002", # Count below 200% poverty
      total = "B17002_001" # Total population for poverty status
    ),
    state = "MI",
    year = 2022,
    geometry = FALSE
  ) |>
  # Pivot wider to get columns side by side
  pivot_wider(
    names_from = "variable",
    values_from = "estimate"
  ) |>
  # Calculate percentage
  mutate(
    pct_below_200 = (below_200 / total) * 100
  ) |>
  # Clean up the data
  select(GEOID, NAME, pct_below_200)
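
One thing worth checking in code like this, whoever wrote it, is the shape of the data going into pivot_wider(). A minimal sketch with made-up numbers (not from the video), assuming the long output of get_acs() carries an moe column alongside estimate. The tract IDs, names, and counts below are hypothetical:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy data shaped like long-format get_acs() output.
# The tract IDs and counts are made up for illustration only.
acs_long <- tribble(
  ~GEOID,        ~NAME,     ~variable,   ~estimate, ~moe,
  "26000000001", "Tract A", "below_200", 400,       50,
  "26000000001", "Tract A", "total",     1000,      80,
  "26000000002", "Tract B", "below_200", 150,       30,
  "26000000002", "Tract B", "total",     600,       60
)

# Pivoting while moe is still present: moe is treated as an identifying
# column, so each tract is split across two rows padded with NA.
naive <- acs_long |>
  pivot_wider(names_from = variable, values_from = estimate)

# Dropping moe first gives one row per tract, so the division works.
tidy <- acs_long |>
  select(-moe) |>
  pivot_wider(names_from = variable, values_from = estimate) |>
  mutate(pct_below_200 = below_200 / total * 100)

nrow(naive) # 4
nrow(tidy)  # 2
```

Asking get_acs() for output = "wide" is another way to avoid the manual pivot. Checking the variable codes themselves against the Census documentation is a separate step that this sketch does not cover.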

Learn More

To learn more about writing good AI prompts, the article Getting started with AI: Good enough prompting by Ethan Mollick is really useful, as is the {ellmer} vignette on prompt design.

The {pal} package has great documentation on how to use it. The vignette on custom pals is useful if you want to go that route.

And if you want to explore the {ggpal2} package, it is available on GitHub.

Have any questions? Put them below and we will help you out!


Alberto Cabrera • March 9, 2025

I am following your lesson 2 on using ellmer. I was able to get a key, which I pasted into the .Renviron file as OPEN_AI_KEY, as shown below (I have included only the first few characters of the key).

OPEN_AI_KEY = "sk-proj-E8WPD-...."

Next I restarted the session and followed your instructions to load the packages and create the object with chat <- chat_openai()

I received the following error message:

chat <- chat_openai()
Using model = "gpt-4o".
Error in openai_key():
! Can't find env var OPENAI_API_KEY.
Run rlang::last_trace() to see where the error occurred.

After that I was unable to run Chat. Subsequently I learned that Chores requires configuring an ellmer chat and even modifying the ~/.Rprofile. Needless to say, I am not a programmer and could not figure out what the error messages meant or how to act on them.
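
The error message names the environment variable that chat_openai() looks for, OPENAI_API_KEY, which is spelled differently from the OPEN_AI_KEY entry quoted above. A minimal sketch for checking which name is set in the current session, using only base R (the helper name key_is_set is hypothetical):

```r
# Hypothetical helper: TRUE if the named environment variable is non-empty.
key_is_set <- function(name) {
  nzchar(Sys.getenv(name))
}

key_is_set("OPENAI_API_KEY") # the name in the error message
key_is_set("OPEN_AI_KEY")    # the name used in the .Renviron entry above

# .Renviron is read when R starts, so restart R after editing it and
# then check again. The entry would be a line of the form:
# OPENAI_API_KEY=<your key>
```

If the first call returns FALSE and the second returns TRUE, renaming the entry in .Renviron and restarting R is a reasonable first thing to try.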