Using Git and GitHub alongside RStudio has the power to revolutionize how you work in R. But getting everything set up can be a challenge. Join me as I walk through everything you need to do in order to use Git and GitHub alongside RStudio.
But first, some background ...
What is Git? What is GitHub?
I remember when I was starting out learning R, Git and GitHub were things I had heard about, but only vaguely understood. I had a sense that they were about collaboration and sharing code, but beyond that … 🤷
So … what are Git and GitHub? First of all, they are two separate things:
Git is open source software for version control. Using Git, you can do things like see all previous versions of code you’ve ever created in a project.
GitHub is a separate service: a website that hosts Git repositories online. It’s possible to use Git without using GitHub, though most people combine the two. Having a record of every change you’ve ever made to your code, both locally and on a remote website, is powerful.
Why Should I Use Git and GitHub?
I have seen three major motivations for people to adopt a Git/GitHub workflow:
Using Git and GitHub serves as a backup. Because GitHub has a copy of all of the code you have locally, if anything were to happen to your computer, you’d still have access to your code.
Using Git and GitHub allows you to use version control. Ever had documents called report-final.pdf, report-final-v2.pdf, and report-final-v3.pdf? Yes, yes, you have. Instead of making copies of files for fear of losing work, version control allows you to see what you did in the past, all while keeping a single version of each document.
Using Git and GitHub makes it possible to work on the same project at the same time as collaborators. Many of the teams I train that are learning R decide to switch to Git/GitHub after collaborating using Dropbox, Google Drive, OneDrive, or the like. The problem they run into is that only one person at a time can safely work on an RStudio project shared in this way. Git and GitHub have built-in tools that let several people work on the same project in parallel and then combine their changes, a major benefit for those working in teams.
The best resource I’ve found for understanding Git and GitHub comes from this 2016 talk (slides here) by Alice Bartlett of the Financial Times (hat tip to Garrick Aden-Buie of RStudio for telling me about it).
How to Set up Git
Now that you have a bit more of an understanding of what Git and GitHub are, let's talk about how to set everything up. Much of what I'll share comes from the excellent book Happy Git with R by Jenny Bryan and Jim Hester. However, at the time of writing (February 2021), some things have changed with regard to credentials. I lay out what I believe is the most up-to-date advice for getting everything set up.
The first step is to install Git. Chapter 6 of Happy Git with R lays out the process for Mac, Windows, and Linux users. I'm on a Mac so Git came pre-installed on my computer. I was able to verify that I had Git installed using the terminal in RStudio.
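If you would rather check from the R console than from the terminal, something like the following should work. This is a minimal sketch using only base R; it assumes Git, if installed, is on your PATH.

```r
# Look up the Git executable on the PATH.
# An empty string means R could not find it.
git_path <- unname(Sys.which("git"))

if (nzchar(git_path)) {
  # Ask Git to print its version string.
  system2(git_path, "--version")
} else {
  message("Git was not found; see Chapter 6 of Happy Git with R for install steps.")
}
```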
The next step is to configure Git. This is covered in Chapter 7 of Happy Git with R, though I show what I believe to be a slightly easier process. Specifically, I suggest using the edit_git_config() function from the usethis package, which will open your gitconfig file. Add your name and email, then save and close the file.
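As a rough sketch, the configuration step looks like this in the R console. The second option, use_git_config(), is not part of the walkthrough above; it is another usethis function that sets the same two fields without opening the file. The name and email are placeholders.

```r
# install.packages("usethis")  # run once if usethis is not installed
library(usethis)

# Option 1: open the user-level gitconfig file and edit it by hand.
edit_git_config()

# Option 2 (an alternative, not shown in the post): set the same
# fields directly. Replace the placeholder name and email with your own.
use_git_config(
  user.name  = "Jane Doe",
  user.email = "jane@example.com"
)
```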
Initialize a Git Repository
Now that you’ve installed and configured Git, you can use it locally. The use_git() function will add a Git repository (often referred to as a “repo”) to an existing RStudio project. Here I’ll create a new project and then initialize a Git repo.
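In code, those two steps might look like the sketch below. The project path is a placeholder, and create_project() is one of several ways to make a new project (the RStudio menus work just as well).

```r
library(usethis)

# Create a new RStudio project. The path is a placeholder.
# In an interactive session this opens the project in a new RStudio window.
create_project("~/Documents/my-project")

# Run this from inside the new project: it initializes a Git repo there
# and asks interactively before committing any existing files.
use_git()
```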
View Commit History
Now that my RStudio project has an associated Git repository, I'll see an extra tab on the top right: the Git tab. From here, I can see the entire history of changes to my code over time (not many yet!).
Make a Commit and View More History
Git doesn't automatically track changes the way a tool like Google Docs does. Instead, you have to tell Git: I made changes and I want you to keep a record of them. Telling Git this is called making a commit and you can do it from within RStudio.
Each commit has a commit message, which is helpful because, when you look at your code history, you see what you did at each point in time (i.e. at each commit). RStudio has a built-in tool to view your code history. You can click on any commit to see what changed, relative to the previous commit. Lines in green were added; lines in red were deleted.
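I do all of this by pointing and clicking in the Git tab, but if you prefer code, the same steps can be sketched with the gert package. Note that gert is my substitution here, not something the rest of this post relies on, and the file name and commit message are made up for illustration.

```r
# install.packages("gert")  # run once if gert is not installed
library(gert)

# Stage a changed file (hypothetical file name), similar to
# ticking its checkbox in the Git tab.
git_add("analysis.R")

# Record the staged changes along with a commit message.
git_commit("Add first draft of analysis script")

# Show the most recent commits, newest first.
git_log(max = 5)
```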
Connect RStudio and GitHub
The process so far has enabled us to use Git locally. But what if we want to connect to GitHub? How do we do that?
Sign up for GitHub
The first step is to sign up for a (free) GitHub account.
Create a Personal Access Token (PAT) on GitHub
Once you’ve signed up, you’ll need to enable RStudio to talk to GitHub. The process for doing so has recently changed (this is where I see the biggest difference from Happy Git with R). The best way to connect RStudio and GitHub is using your username and a Personal Access Token (PAT). To generate a personal access token, use the create_github_token() function from usethis. This will take you to the appropriate page on the GitHub website, where you’ll give your token a name and copy it (don’t lose it because it will never appear again!).
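The call itself is a one-liner. A minimal sketch, assuming usethis is installed and you are logged in to GitHub in your browser:

```r
library(usethis)

# Opens the "new personal access token" page on GitHub in your browser.
# Give the token a name there, generate it, and copy it right away.
create_github_token()
```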
Store Personal Access Token to Connect RStudio and GitHub
Now that you’ve created a Personal Access Token, we need to store it so that RStudio can access it and know to connect to your GitHub account. The gitcreds_set() function from the gitcreds package will help you here. You’ll enter your GitHub username and the Personal Access Token as your password (NOT your GitHub password, as I initially thought). Once you’ve done all of this, you have connected RStudio to GitHub!
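Here is roughly what that looks like in the console. The git_sitrep() check at the end is an optional extra that I am adding as a suggestion; it is a usethis function that prints a summary of your Git and GitHub setup.

```r
# install.packages("gitcreds")  # run once if gitcreds is not installed

# Starts an interactive prompt; paste the Personal Access Token when asked.
gitcreds::gitcreds_set()

# Optional check (not part of the steps above): print what usethis
# can see about your Git configuration and GitHub credentials.
usethis::git_sitrep()
```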
How to Connect RStudio Projects with GitHub Repositories
Now that we've connected RStudio and GitHub, let's discuss how to make the two work together. The basic idea is that you'll set up projects you create in RStudio with associated GitHub repositories. Each RStudio project lives in a single GitHub repo.
How do we connect an RStudio project to a GitHub repo? Happy Git with R goes over three strategies. I'll demonstrate two of them.
Sometimes you already have a project locally and you want to get it on GitHub. To do this, you’ll need to first use the use_git() function from usethis, as we did above. Then, you can use the use_github() function, which will create a GitHub repo and connect it to your current RStudio project.
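Put together, the RStudio-first route is two calls, run from inside the project you want to publish. A minimal sketch, assuming your Personal Access Token is already stored:

```r
library(usethis)

# Step 1: make the current project a Git repo (skip if it already is one).
use_git()

# Step 2: create a matching repo on GitHub and connect it to this project.
# By default the new repo is public; pass private = TRUE for a private one.
use_github()
```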
The most straightforward way to use RStudio and GitHub together is to create a repo on GitHub first. Create the repo, then when you start a new project in RStudio, choose the Version Control option, select Git, enter your repo URL, and you're good to go.
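If you would rather skip the New Project dialog, usethis offers a code-based route to the same result. This is an alternative I am sketching, not the click-through method described above; "OWNER/REPO" and the destination folder are placeholders.

```r
library(usethis)

# Clone an existing GitHub repo and open it as an RStudio project.
# "OWNER/REPO" is a placeholder for your own username and repo name;
# destdir is the folder the project will be created in.
create_from_github(
  "OWNER/REPO",
  destdir = "~/Documents",
  fork = FALSE  # clone your own repo directly rather than forking it
)
```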
Now that we've connected RStudio and GitHub, we can push and pull our work between the two.
Pushing means sending any changes in your code from RStudio to GitHub. To do this, we first have to commit. After committing, we can use the push button (the up arrow) in RStudio to send our code to GitHub.
The opposite of pushing is pulling. Using the down arrow button, RStudio goes to the GitHub repo, grabs the most recent code and brings it into your local editor. (Pulling regularly is extremely important if you're collaborating, though if you're the only one working on an RStudio project and associated GitHub repo, you know your local code matches what's on GitHub so it's less important.)
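The arrow buttons are all you need, but for the curious, pushing and pulling can also be sketched in code with the gert package. As before, gert is my own substitution for the buttons, and this assumes the project is already connected to a GitHub repo.

```r
library(gert)

# Pull first: bring any new commits from GitHub into the local project.
git_pull()

# Then push: send your local commits up to GitHub.
git_push()
```

Pulling before pushing is a habit worth building when you collaborate, since it surfaces other people's changes before you send your own.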
You Did It!
You're now all set up to use Git and GitHub with RStudio!
If you’re looking to learn more and I haven’t yet beaten into you the idea that you should check out Happy Git with R, let me try it once more. It’s the best book to help guide you as you go deeper with using Git/GitHub in your R work.
Additionally, materials from a 2019 rstudio::conf workshop titled “R for Excel Users” cover “version control and practice a workflow with GitHub and RStudio that streamlines working with our most important collaborator: Future You.”