R’s Killer Feature: RMarkdown

Recently I’ve been talking about RMarkdown with people who are considering learning R. In my experience, when people think about switching to R, they often think of it as a direct replacement for the tool they currently use (e.g. Excel, SPSS, SAS, or Stata). R can do everything that these tools do, of course, but RMarkdown is one of the features most likely to fundamentally transform their work.

When I try to explain RMarkdown to people, they often struggle a) to understand what it is, and b) to understand why it is valuable.

What is RMarkdown?

In the past, I’ve described RMarkdown as “the tool that you never knew you needed, but once you learn it, you’ll wonder how you ever lived without it.” Here’s a short video to show you what it looks like.

Why is RMarkdown Valuable?

Why is RMarkdown valuable? I’ve come to see its benefits on a continuum, from simple (doing your current work better) to more complex (doing your work in new ways).

Use One Tool from Start to Finish

In my experience, the most convincing reason for newcomers to consider RMarkdown is that you avoid switching between multiple tools. No longer do you do your data wrangling and analysis in SPSS, your data visualization in Excel, and your report writing in Word. Now you do it all in RMarkdown. This lowers the likelihood of errors introduced when moving results between these tools (something we may be loath to admit we’ve done, but, really, who hasn’t?).

Focus on Content, Not Formatting

Moving higher up the list of reasons to use RMarkdown, we find the value of avoiding thinking about formatting too early in the process. If you care at all about how things look, you’ve likely spent time while writing adjusting fonts, colors, etc. This slows down the writing process and is best done at the end of writing. Writing in RMarkdown, which is a very simple, text-based process, forces you to focus exclusively on your writing (i.e. the content of what you’re saying, not how it looks).

Ensure Consistent Branding

I was talking recently with a potential client who complained that she struggled to get her staff to correctly use their organization’s Word template for reports. In situations like this, there is a really nice solution, namely having RMarkdown use a reference document so that any document you knit to Word takes on your organization’s style.
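Here is a minimal sketch of what that can look like, using the reference_docx argument of rmarkdown::word_document(). The file names report.Rmd and org-template.docx are placeholders I made up for illustration, so swap in your own.

```r
# A sketch, not a definitive setup. Assumes the rmarkdown package is installed.
# "report.Rmd" and "org-template.docx" are hypothetical file names.
knit_branded_report <- function(input = "report.Rmd",
                                template = "org-template.docx") {
  rmarkdown::render(
    input,
    # The Word output picks up its styles (fonts, headings, colors)
    # from the reference document.
    output_format = rmarkdown::word_document(reference_docx = template)
  )
}

# Usage (uncomment once both files exist in your working directory):
# knit_branded_report()
```

The same option can also be set once in the YAML header of the document, so staff don’t have to remember it each time they knit.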

Embrace Reproducibility to Save Yourself Time

Going one step further, working with RMarkdown enables reproducibility. This is something I’ve written about previously, as have others. In a recent conversation, Dana Wanzer told me how reproducibility in R helps her:

Maybe I get a few more consent forms for students so now I can add a few more people into my data. Maybe another school gets their data to me at a later date. Before, that would mean re-running everything manually. Now, I just update my data file, maybe adjust my filter code a little bit at the beginning, and then re-run everything.

– Dana Wanzer

Looking for a video to share with others that demonstrates the value of reproducibility? Here you go (hat tip to Jenny Bryan for bringing this to my attention).

Produce Many Reports from One RMarkdown Document

When you start to go deep with R and RMarkdown, you realize that the possibilities are huge. One thing that blew my mind when I first learned about it was parameterized reports. Say you have data on 10 different programs and you need to produce a single report for each one. With RMarkdown, you can write code that automatically does this, creating 10 reports as quickly as you can create one. The Urban Institute had a really nice write-up last year of how they do this in creating fact sheets for all 50 US states.
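To make the idea concrete, here is a rough sketch of how one report per program might be rendered. It assumes a hypothetical program-report.Rmd that declares a program parameter in its YAML header and reads it as params$program in its code chunks. The helper names, file names, and the .docx extension are my own assumptions for illustration, not the Urban Institute’s approach.

```r
# A sketch, assuming the rmarkdown package is installed and that a
# hypothetical "program-report.Rmd" declares this in its YAML header:
#   params:
#     program: "Program A"

# Turn a program name into a safe file name, e.g. "Program A" becomes
# "report-program-a.docx". The extension should match the document's
# output format (Word is assumed here).
report_file_name <- function(program, extension = ".docx") {
  slug <- gsub("[^a-z0-9]+", "-", tolower(program))
  slug <- gsub("^-|-$", "", slug)
  paste0("report-", slug, extension)
}

# Render the same document once per program, passing the program
# name in as a parameter each time.
render_program_reports <- function(programs, input = "program-report.Rmd") {
  for (program in programs) {
    rmarkdown::render(
      input,
      output_file = report_file_name(program),
      params = list(program = program)
    )
  }
  invisible(report_file_name(programs))
}

# Usage (uncomment once the .Rmd file exists):
# render_program_reports(c("Program A", "Program B", "Program C"))
```

Whether you have 10 programs or 50 states, the only thing that changes is the vector of names you pass in.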

Use Version Control (and Never See a final-final-final.docx File Again)

The next level of benefit comes from combining RMarkdown with git. A version control system like git keeps a record of how your work changes over time. As Jenny Bryan writes in her book Happy Git with R,

Git is a version control system. Its original purpose was to help groups of developers work collaboratively on big software projects. Git manages the evolution of a set of files – called a repository – in a sane, highly structured way. If you have no idea what I’m talking about, think of it as the “Track Changes” features from Microsoft Word on steroids.

– Jenny Bryan

Use RMarkdown with git and you’ll never have to have files called report-final.docx, report-final-final.docx, and report-final-for-real.docx. You can keep one report file and refer back to the history to see how that file has changed over time (and if you got rid of something you want to restore, it’s easy to do!). Learning git is not easy (definitely use the Happy Git with R book), but it is well worth it.

Collaborate with Colleagues More Efficiently

Now, you just heard me say git. If you’re like most people, you probably think that’s synonymous with GitHub. I used to think this. They’re not the same, but they work closely together. GitHub facilitates working with others, another benefit of using RMarkdown.

You can create shared repositories that you and your collaborators have access to. You each edit documents locally and then push your changes to GitHub. If there are conflicts in the files (maybe you and a collaborator both edited the same section), you decide which version to use, and move on. If you’ve moved from a workflow of emailing Word documents back and forth to a shared Google doc, using GitHub is the next step in that journey toward collaboration. More and more researchers are using GitHub to collaboratively publish work (for example, R for Data Science).

Communicate Results in New Ways

A final benefit of using RMarkdown is that it enables different types of communication. I’ve worked with organizations to move away from 100-page reports that no one reads to thinking about sharing results through online, interactive reports, dashboards, and more. RMarkdown makes this type of reporting easy. There are packages for creating interactive charts and graphs, online maps, slides, and dashboards — and they can all be created starting with RMarkdown.

Are There Any Downsides?

Yes, you’re thinking. This all sounds great. There must be some downsides you’re hiding. Here are the two major downsides I’ve encountered in switching to RMarkdown.

There’s a Learning Curve

Learning R and RMarkdown does not happen overnight. It takes time to learn how to use these tools. Throw in git/GitHub and the process is even longer. Know that moving to this workflow is good in the long term, but likely to be a bit painful in the short term.

Commenting is Limited

The one thing I miss most from using a Word/Google Docs workflow is comments. RMarkdown does not have the ability to add comments in the same way that these other tools do. There are workarounds. You could, for instance, mark comments with a distinctive text pattern (similar to how editors use TK to indicate text to come later) so that you can search for them later. In any case, know that you won’t have the ability to comment on text in RMarkdown in the same way you do with other tools.
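If you go the text-pattern route, a few lines of base R can list every marker for you. This is just a sketch: the function name find_comment_markers and the file name report.Rmd are made up for illustration, and TK is only one possible marker.

```r
# A sketch using base R only. Returns the line number and text of
# every line that contains the marker (matched as a literal string).
find_comment_markers <- function(lines, marker = "TK") {
  hits <- grep(marker, lines, fixed = TRUE)
  data.frame(line = hits, text = lines[hits], stringsAsFactors = FALSE)
}

# Usage with a hypothetical file:
# find_comment_markers(readLines("report.Rmd"))

# Usage with some example text:
draft <- c(
  "We surveyed students in the spring.",
  "TK: add the final sample size here.",
  "Results are shown below."
)
find_comment_markers(draft)
```

Because the match is literal, it will also catch the marker inside other words, so pick a pattern that is unlikely to show up in your regular writing.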

Update: As Emily Kothe helpfully pointed out on Twitter, she has created a package called rmdrive, which allows you to go back and forth between RMarkdown and Google docs.

I haven’t used the package, but it does look promising!

Comments

  1. I have tried to set up a GitHub repository…. another fail. I am experiencing an issue because my local git directory is on my D drive. I am having the dickens of a time trying to set it up. As for RMarkdown, particularly with version control, it looks AWESOME! I had no idea what RMarkdown was. I thought it was for publishing web pages.

    Thanks David.
