R for the Rest of Us Community
Public / Community
This is a place to ask questions and get help along the way on your R journey.
In addition to discussions of general questions, you’ll see threads for office hours. These are twice-monthly sessions to help you get unstuck. Ask questions and get live answers from me as well as guest experts. All code for office hours can be found here.
Not yet a member? If you have an account on the R for the Rest of Us website and are logged in, just click Join Group on the top right of this page. If you need to create an account, you can sign up here.
Office Hours Megathread
-
I’ve decided to centralize all office hours discussions here. If you have topics you’d like to discuss during office hours, add them below!
-
-
Hi David,
Happy New Year! Hope you had a wonderful holiday. Quick question regarding iterating to Word from a Markdown template. If I leave a parameter set in my Markdown document (e.g., State = “Washington”) and then run the iteration R script to create reports for all states, will this create an issue? For example, might data unique to Washington find its way into the reports for all other states? It doesn’t seem to, but I’d welcome your response.
Thanks!
-
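Great question, Clint! No, that won’t cause a problem. The parameter value in the YAML header is only a default: it’s used when you knit by hitting the Knit button, but when you render from an external R script and pass a value through the params argument of render(), that value overrides the YAML default. Hope that answers your question!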
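A minimal sketch of rendering one report per state from an external script, assuming a hypothetical template called state_report.Rmd whose YAML header declares a state parameter with a default of "Washington". The params list passed to rmarkdown::render() overrides that default, and envir = new.env() is a common safeguard that gives each report a fresh environment, so objects created while rendering one state can't carry over into the next.

```r
library(rmarkdown)

# Render one Word report for a single state.
# "state_report.Rmd" is a hypothetical template whose YAML header includes:
#   params:
#     state: "Washington"
render_state_report <- function(state, template = "state_report.Rmd") {
  render(
    input = template,
    params = list(state = state),                # overrides the YAML default
    output_file = paste0(state, "_report.docx"),
    envir = new.env()                            # fresh environment for each report
  )
}

# Usage (needs the template file and pandoc):
# for (s in c("Washington", "Oregon", "Idaho")) render_state_report(s)
```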
-
Great! Thanks, David. The reason I asked is that it appears I need to run my Markdown file first to obtain a list of “states” (or programs, in my case) before using them as the parameters in the render function in my iteration R script. Is this a recommended workflow? Thanks again.
-
I would separate it out so that you get the list of states/programs in both your RMarkdown file and your separate render.R file. Just copy the code that reads in the data and pulls out the list of programs into both places. Does that make sense?
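A sketch of the render.R side, assuming the program list comes from a hypothetical program column. The same few lines that read the data and pull out the unique programs would also sit in the .Rmd, so both files derive the list the same way. The data here is made up and stands in for the real import.

```r
library(dplyr)

# Stand-in for the real import, e.g. readr::read_csv("program_data.csv")
program_data <- tibble(
  program = c("Reading", "Math", "Reading", "Science"),
  score   = c(80, 72, 91, 65)
)

# Same code as in the .Rmd: the unique programs to loop over
programs <- program_data %>%
  distinct(program) %>%
  arrange(program) %>%
  pull(program)

# One row per report to render: the parameter value and its output file name
reports <- tibble(
  program     = programs,
  output_file = paste0(programs, "_report.docx")
)
```

Each row of reports can then be passed to rmarkdown::render() as params = list(program = ...) with the matching output_file.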
-
-
On October 23, we discussed a range of topics:
- Ordering in plots vs ordering in tables (4:27)
- How to set the column names when making a table (32:00)
- Custom pagedown templates (see an example here) (37:28)
- Making parameterized reports (see also this blog post) (40:55)
The recording of the session can be found below.
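A minimal sketch of the first two topics on made-up data: a table’s order follows its row order, while ggplot2 orders a discrete axis by factor levels, so the two are set differently. set_header_labels() changes how flextable displays column names without renaming the underlying data.

```r
library(dplyr)
library(forcats)
library(ggplot2)
library(flextable)

# Made-up counts by state
responses <- tibble(
  state = c("Oregon", "Washington", "Idaho"),
  n     = c(20, 35, 10)
)

# Tables: rows print in the order they appear, so sort with arrange()
table_data <- responses %>% arrange(desc(n))

# Plots: ggplot2 orders a discrete axis by factor levels, not row order
plot_data <- responses %>% mutate(state = fct_reorder(state, n))
p <- ggplot(plot_data, aes(x = n, y = state)) +
  geom_col()

# Table column names: relabel the header only
ft <- flextable(table_data) %>%
  set_header_labels(state = "State", n = "Number of responses")
```

With y = state, the first factor level sits at the bottom of the axis, so the largest bar (Washington) ends up on top.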
-
On November 6, we’ll discuss creating a custom pagedown template. I’ll show one that I’ve developed and talk about how you can customize it for your organization.
Other topics to discuss? Comment below!
-
Hi David, Quick update on the flextable formatting. I ran with your suggestions and produced the code below.
library(dplyr)
library(stringr)
library(scales)
library(flextable)

data <- tibble(
  survey_item = c("Item #1 has to do with food people like to eat",
                  "Item #2 is all about travel and places people like to go",
                  "Item #3 is about movies people like to watch at home and used to see in theatres"),
  percent_program_agree = c(0.78, 0.6, 0.42),
  n_program_agree = c(45, 34, 46)
) %>%
  # e.g. "78% (n=45)"
  mutate(agree_text = str_c(percent(percent_program_agree),
                            " (n=",
                            n_program_agree,
                            ")"))
data

# "code" is not a column in data, so flextable adds it as a blank column
ft1 <- data %>%
  flextable(col_keys = c("survey_item", "code", "agree_text")) %>%
  bg(j = "code", i = ~ percent_program_agree >= 0.6, bg = "blue") %>%
  bg(j = "code", i = ~ percent_program_agree < 0.6, bg = "red") %>%
  width(j = "code", width = .05) %>%
  width(j = "agree_text", width = 1) %>%
  width(j = "survey_item", width = 4) %>%
  align(j = c("survey_item", "agree_text"), align = "left", part = "all") %>%
  padding(j = "code", padding.top = 1, part = "body")
ft1
What I’m hoping to do involves the newly created “code” column. Is there any way to put space between the color codes in adjacent rows? I tried using padding, but to no avail. If there might be some additional time to discuss this, I’d really appreciate it!
-
-
Hi folks, I enjoyed today’s session. You can find the recording below.
We talked about creating a custom pagedown template. You can see the code that I created here (just look at the pagedown-template.Rmd file and the associated CSS files). Feel free to copy those and adapt.
We also talked about conditional formatting of columns using the flextable package. I hope that was helpful!
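On the earlier question about spacing between the color codes, here is one possible approach, sketched as an assumption rather than what was shown in the session. Padding doesn’t separate colored cells, because a cell’s background color fills its padding too. Drawing a thick white bottom border on the code cells with flextable’s hline() and officer’s fp_border() makes the colored blocks look separated. The data is a made-up stand-in for Clint’s table.

```r
library(flextable)
library(officer)

# Made-up stand-in: "code" is not in the data, so it becomes a blank column
ft <- flextable(
  data.frame(item = c("Item #1", "Item #2", "Item #3"),
             pct  = c(0.78, 0.6, 0.42)),
  col_keys = c("item", "code")
)
ft <- bg(ft, j = "code", i = ~ pct >= 0.6, bg = "blue")
ft <- bg(ft, j = "code", i = ~ pct < 0.6, bg = "red")

# A thick white bottom border on each code cell reads as a gap between colors
ft <- hline(ft, j = "code",
            border = fp_border(color = "white", width = 4),
            part = "body")
ft
```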
-
The next R for the Rest of Us office hours session will be this Friday, November 20 at 10:00am Pacific time. I’ve got two things I want to go over (but please also submit ideas below and I’ll do my best to cover them):
Heather Lewis, a primary school teacher, has asked how she could use R to produce regular reports on her students’ progress. I’ll take some sample data she has sent me and generate an RMarkdown report, which she will then be able to rerun at any point when she has new data. In addition to providing a refresher on RMarkdown (or an intro if you’re new to it!), I’ll go over a couple concepts that should be helpful even if you don’t work in education: pivoting data from wide to long and using the newish across() function in the dplyr package.
The second thing I’ll go over is recreating one of the most interesting visualizations I saw in the wake of the US presidential election. This New York Times visualization shows the swing in county-level votes for president from 2016 to 2020.
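For a quick preview of across(): it applies the same function to several columns inside summarise() or mutate(). The scores below are made-up sample data, not Heather’s.

```r
library(dplyr)

# Made-up scores: one row per student, one column per subject
scores <- tibble(
  student = c("Ana", "Ben", "Cy"),
  reading = c(78, 85, 92),
  math    = c(88, 70, 95),
  writing = c(81, 77, 90)
)

# Mean of every subject column in one call instead of three
subject_means <- scores %>%
  summarise(across(c(reading, math, writing), mean))

# Rescale the same columns to proportions, leaving student untouched
scores_prop <- scores %>%
  mutate(across(reading:writing, ~ .x / 100))
```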
-
Hi David, this got my attention, as I’ve been wondering how to make the transition from (sometimes data-intensive) Crystal Reports to Markdown in my organisation. There could be anything from dozens to tens of thousands of lines in the output when the Crystal report is opened in Excel.
Is it possible to use Markdown for this volume of data, and is there any way to filter or play around with Markdown output, as is easily possible in Excel?
It would be great to be able to set up reproducible Markdown reports instead of doing new Crystal Reports development for each new report.
I will try to join the call, but I’m 8 hours ahead in Ireland, so 6pm will be giddy-kids time and I may have to catch up on it afterward.
Thanks a million,
Paul
-
Hope you can join, Paul! I totally understand giddy kids, though (I’ve got 4-year-old twins). The session will be recorded so you can watch it later.
I’ll do some future sessions at earlier times to be more accessible for you and others!
-
To your question, I’ll show you how you can export the data from RMarkdown to Excel files for people who want to see those.
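One way to do that, assuming the writexl package: build the data in R as usual, then write it to an .xlsx file that people can filter and sort in Excel. A named list of data frames becomes one sheet per element. The data here is made up.

```r
library(dplyr)
library(writexl)

# Made-up detail data standing in for a large report extract
report_data <- tibble(
  region = rep(c("North", "South"), each = 3),
  month  = rep(c("Jan", "Feb", "Mar"), times = 2),
  sales  = c(120, 135, 150, 90, 110, 95)
)

region_summary <- report_data %>%
  group_by(region) %>%
  summarise(total_sales = sum(sales))

# Full detail on one sheet, summary on another
out_path <- file.path(tempdir(), "report_data.xlsx")
write_xlsx(list(detail = report_data, summary = region_summary), path = out_path)
```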
-
-
Here’s our session from today!
I went over a question from Heather Lewis about how R can help her make her workflow more efficient. I showed how she could use RMarkdown to automatically generate reports on her students’ progress. Along the way, I demonstrated pivoting data from wide to long and showed how RMarkdown works. The code I generated for this report is here.
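A minimal sketch of pivoting from wide to long with tidyr’s pivot_longer(), on made-up data rather than Heather’s: one column per term becomes one row per student per term.

```r
library(dplyr)
library(tidyr)

# Made-up wide data: one column per term
progress_wide <- tibble(
  student = c("Ana", "Ben"),
  term_1  = c(72, 65),
  term_2  = c(80, 70),
  term_3  = c(85, 78)
)

progress_long <- progress_wide %>%
  pivot_longer(
    cols = starts_with("term_"),
    names_to = "term",
    names_prefix = "term_",                   # "term_1" -> "1"
    names_transform = list(term = as.integer),
    values_to = "score"
  )
```

The long version has one row per student and term, which is the shape ggplot2 and group_by() work with most easily.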
In addition, I demonstrated how to recreate this New York Times visualization that shows the swing in county-level votes for president from 2016 to 2020. It was a bit more involved than I expected! You can watch this starting at 56:30, and you can see my code here.
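A simplified sketch of the core idea, not the code from the session: compute each county’s swing as the change in Democratic margin (Democratic share minus Republican share) and draw an arrow from the county’s location, tilted right for a Democratic swing and left for a Republican one. The county data and the arrow scale factor of 10 are made up, and there is no basemap.

```r
library(dplyr)
library(ggplot2)

# Made-up county vote shares and coordinates
counties <- tibble(
  county   = c("A", "B", "C"),
  lon      = c(-122.7, -121.3, -119.8),
  lat      = c(45.5, 44.1, 45.7),
  dem_2016 = c(0.55, 0.40, 0.30),
  rep_2016 = c(0.38, 0.52, 0.64),
  dem_2020 = c(0.60, 0.42, 0.28),
  rep_2020 = c(0.37, 0.55, 0.68)
)

# Swing = change in Democratic margin from 2016 to 2020
counties <- counties %>%
  mutate(
    margin_2016 = dem_2016 - rep_2016,
    margin_2020 = dem_2020 - rep_2020,
    swing       = margin_2020 - margin_2016,
    direction   = if_else(swing > 0, "Toward Democrats", "Toward Republicans")
  )

# Arrow length is proportional to the size of the swing (scale factor is arbitrary)
p <- ggplot(counties) +
  geom_segment(
    aes(x = lon, y = lat,
        xend = lon + swing * 10, yend = lat + abs(swing) * 10,
        color = direction),
    arrow = arrow(length = unit(0.1, "cm"))
  ) +
  scale_color_manual(values = c("Toward Democrats" = "blue",
                                "Toward Republicans" = "red")) +
  coord_quickmap()
```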
-
It was great talking with everyone today.
We started off talking about a few tips based on the data that @jordan-trachtenberg shared, including:
- Separating filter() functions into multiple lines to make debugging easier
- The %!in% function from hrbrmisc (and a great tip from @francis-barton and @milan-sherman on using ! combined with %in% to filter all items not in a group)
- Using the .keep_all argument from the distinct() function
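A small sketch of those three tips on made-up data. Defining %!in% with Negate() is a common base R idiom, shown here as an alternative to loading hrbrmisc.

```r
library(dplyr)

# Made-up data with a duplicated id
responses <- tibble(
  id    = c(1, 1, 2, 3, 4),
  state = c("OR", "OR", "WA", "CA", "ID"),
  age   = c(34, 34, 51, 45, 42)
)

# A "not in" operator built from %in%
`%!in%` <- Negate(`%in%`)

# One condition per filter() call, so each step can be run and checked on its own
kept <- responses %>%
  filter(age >= 30) %>%
  filter(state %!in% c("CA", "NV"))   # same as filter(!state %in% c("CA", "NV"))

# distinct(id) alone would return only the id column; .keep_all = TRUE keeps the rest
unique_ids <- kept %>%
  distinct(id, .keep_all = TRUE)
```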
We also discussed making functions (at around 22:00). We did this using an example from this project I’ve been working on related to wildfires here in Oregon (the rendered version of this is here). The RMarkdown file I showed is here (cc: @meena-patil ). I also have a blog post about making your own functions, which is a good place to start if you’re new to the concept.
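If functions are new to you, the basic shape is: name the inputs as arguments, do the work once in the body, and the last value is returned. A tiny hypothetical helper that you might otherwise copy and paste across a report:

```r
# Hypothetical helper: turn proportions into percentage labels
format_percent <- function(x, digits = 0) {
  paste0(round(x * 100, digits), "%")
}

format_percent(0.783)                       # "78%"
format_percent(c(0.5, 0.125), digits = 1)   # "50%" "12.5%"
```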
Finally, we worked on helping @brandi.collins work with her data (around 41:00). We reshaped her data to make it tidy, which makes it simpler to analyze. The code we created is here. If you want to read more about tidy data in general, check out this article.
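As a hypothetical illustration of making data tidy (not Brandi’s actual data): when column names pack two pieces of information, such as the timepoint and the measure in pre_score or post_confidence, pivot_longer() can split them so that each row is one participant at one timepoint.

```r
library(dplyr)
library(tidyr)

# Made-up wide data: each column name combines a timepoint and a measure
wide <- tibble(
  participant     = c("P1", "P2"),
  pre_score       = c(3, 4),
  pre_confidence  = c(2, 3),
  post_score      = c(5, 4),
  post_confidence = c(4, 4)
)

# ".value" keeps the measure part of each name as its own column
tidy <- wide %>%
  pivot_longer(
    cols = -participant,
    names_to = c("time", ".value"),
    names_sep = "_"
  )
```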
https://vimeo.com/487375375/2f13c0fea1
-
Hi David!
Hope this finds you well. I am looking forward to attending your office hours tomorrow. I do have one question which sounds like it might fit with your topic:
Lately, I’ve been generating lots of summary tables. One thing that varies from table to table is the set of variables I pass to the group_by function. For example, if you were working with Census data, you might want:
- Table 1 reporting average age and % unemployment by State
- Table 2 reporting average age and % unemployment by State and County
- Table 3 reporting average age and % unemployment by State, County, and City
If the group_by variables differ for each table but the statistics reported are otherwise identical, could you use a single function to generate the three tables mentioned above?
I’ve tried to create some basic code to conceptually illustrate what I’m hoping to do:
data %>%
  group_by(state, county, city)

myfunction <- function(group_by_vars) {
  data %>%
    group_by(group_by_vars)
}

myfunction(group_by_vars = state)
myfunction(group_by_vars = c(state, county))
myfunction(group_by_vars = c(state, county, city))
If you have any existing resources that touch upon this, please do direct me to them. I look forward to exploring some solutions to this question soon!
Take care,
Clint
-
Yes, this is a great question! We can definitely discuss this tomorrow.
If you want to read up on this before tomorrow, check this out.
-
Hi folks, here’s a recap of our session today.
We began by making a function to gather race/ethnicity data from the American Community Survey. We wrapped the get_acs() function from the tidycensus package so that it automatically brings in this data whenever I need it. We discussed using ... (the dots) to pass arguments from a user-written function to a function in another package. Code for this is here.
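A sketch of the dots pattern, assuming the tidycensus package; it is not the code from the session. The wrapper fixes the variables once and forwards everything else (state, year, geometry, and so on) to get_acs(). The B03002 variable codes are my assumption about the race/ethnicity table, so check them with load_variables() before relying on them.

```r
library(tidycensus)

# Assumed codes from ACS table B03002 (Hispanic or Latino origin by race);
# verify with load_variables(2019, "acs5") before use
race_ethnicity_vars <- c(
  white_non_hispanic = "B03002_003",
  black_non_hispanic = "B03002_004",
  asian_non_hispanic = "B03002_006",
  hispanic           = "B03002_012"
)

# Hypothetical wrapper: anything passed in ... goes straight to get_acs()
get_race_ethnicity <- function(geography, ...) {
  get_acs(
    geography   = geography,
    variables   = race_ethnicity_vars,
    summary_var = "B03002_001",   # total population, for computing percentages
    ...
  )
}

# Usage (needs a Census API key and an internet connection):
# get_race_ethnicity("county", state = "OR", year = 2019)
```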
We then talked about using variable names as arguments in functions. We created a function that can take one or more variables to use in a group_by(). Code here.
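One way to answer Clint’s question, sketched on made-up data: pass the grouping columns through ... and forward them to group_by(), so a single function produces all three tables.

```r
library(dplyr)

# Made-up Census-style data
census <- tibble(
  state      = c("OR", "OR", "OR", "WA"),
  county     = c("Lane", "Lane", "Multnomah", "King"),
  city       = c("Eugene", "Springfield", "Portland", "Seattle"),
  age        = c(30, 40, 50, 60),
  unemployed = c(0, 1, 0, 1)
)

# Any grouping columns passed in ... are forwarded to group_by()
summarize_by <- function(data, ...) {
  data %>%
    group_by(...) %>%
    summarise(
      avg_age        = mean(age),
      pct_unemployed = 100 * mean(unemployed),
      .groups = "drop"
    )
}

table_1 <- summarize_by(census, state)
table_2 <- summarize_by(census, state, county)
table_3 <- summarize_by(census, state, county, city)
```

If you prefer a single argument like c(state, county), dplyr 1.1.0 and later provide pick(), which should let you write group_by(pick({{ group_by_vars }})) inside the function instead.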
We then talked about using the flextable package and used this as an example of how to learn new packages.
The full video is below for your viewing pleasure!
-
Hi David,
Quick question. If I have a dataset of 40+ variables and several rows that are entirely NA, how can I create a variable that will easily identify these rows so that I can remove them? I suppose this would be similar to using the COUNTA function in Excel and removing all rows with a COUNTA of 0.
Thanks!
-
Great question that I can go over on Friday! If you want to get a head start, check out the get_dupes() and remove_empty() functions from the janitor package.
-
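A minimal sketch on made-up data: rowSums(!is.na(...)) counts the non-missing values in each row, which is the closest analogue to Excel’s COUNTA, and janitor’s remove_empty() drops fully empty rows in one step.

```r
library(dplyr)
library(janitor)

# Made-up survey data; rows 2 and 4 are entirely NA
survey <- tibble(
  q1 = c(5, NA, 3, NA),
  q2 = c("a", NA, NA, NA),
  q3 = c(2, NA, 1, NA)
)

# COUNTA-style helper column: number of non-missing values per row
survey_counted <- survey %>%
  mutate(n_answered = unname(rowSums(!is.na(survey))))

kept_manual <- survey_counted %>%
  filter(n_answered > 0)

# janitor does the removal in one step
kept_janitor <- remove_empty(survey, which = "rows")
```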
Hi David,
Thanks! I’ll give these a try. Unfortunately, I may not be able to attend the session on Friday. Will the recording be posted afterward? If you could go over how to remove entirely blank rows, and rows that are blank except perhaps for an ID code, that would be great. This would be a useful trick to learn.
Take care,
Clint
-
Yup, I’ll post it in this thread!
Do you have a sample dataset you could share with me so I could use it tomorrow?
-
Hi David, Great! I just sent you an email with a mock dataset.
-
Great session today, folks! I just posted the video, which you can find here.
We walked through setting up Git/GitHub from scratch. It was easier than expected!
The Git for Humans slidedeck I shared can be found here.
The Happy Git with R book is here. It’s the best reference for everything related to R + Git.
And the article on generating credentials to connect RStudio + GitHub is here.
One thing to note is that, while it worked for me to connect using just a username and password in today’s session, GitHub will not allow this in the future. So, if you’re planning to use GitHub, you definitely need to follow the instructions to set up and use a personal access token (PAT).
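A sketch of the PAT setup steps, assuming the usethis and gitcreds packages that Happy Git with R recommends. It is wrapped in a hypothetical helper function because each step is interactive: run it once from the R console, not inside a script or RMarkdown file.

```r
# Hypothetical one-time setup helper; run interactively in the console
setup_github_pat <- function(name, email) {
  # Tell Git who you are (writes to your global .gitconfig)
  usethis::use_git_config(user.name = name, user.email = email)
  # Opens GitHub in the browser to create a token with suggested scopes
  usethis::create_github_token()
  # Prompts you to paste the new token and stores it in the Git credential store
  gitcreds::gitcreds_set()
}

# setup_github_pat("Jane Doe", "jane@example.com")
# usethis::git_sitrep()   # reports on your Git/GitHub configuration
```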
One resource we didn’t discuss today, but that is really good, is from the 2020 R for Excel Users workshop by Julie Lowndes and Allison Horst. They covered Git/GitHub in that workshop and have a really nice overview of how it works.
The other thing that we discussed, following up on a question from @clint-thomson, is removing rows with NAs. In addition to the examples we gave in the session, another idea I came up with afterward is to use the naniar package. You’ll see in the code I wrote that I added a few lines using naniar to show how you might identify rows that are entirely blank apart from the first ID column.
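A sketch of that idea on made-up data, not the code from the session: naniar’s prop_miss_row() gives the proportion of missing values in each row, so computing it on every column except the ID and keeping rows below 1 drops rows that are blank apart from their ID. A dplyr-only equivalent with if_any() is included for comparison.

```r
library(dplyr)
library(naniar)

# Made-up mock data: an ID column plus survey items; A02 is blank apart from its ID
mock <- tibble(
  id = c("A01", "A02", "A03", "A04"),
  q1 = c(4, NA, 2, NA),
  q2 = c("yes", NA, "no", "yes"),
  q3 = c(1, NA, NA, NA)
)

# Proportion missing in each row, ignoring the ID column
prop_missing <- prop_miss_row(select(mock, -id))

# Keep rows with at least one non-missing answer
mock_clean <- mock %>%
  filter(prop_missing < 1)

# dplyr-only version: keep rows where any non-ID column has a value
mock_clean_dplyr <- mock %>%
  filter(if_any(-id, ~ !is.na(.x)))
```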