In choosing subjects for My R Journey interviews, I tend to stay away from statisticians. My mission with R for the Rest of Us is to demonstrate ways of using R that go beyond how many people expect R to be used (e.g. complex statistical models). But, if you know Sharla Gelfand, you know that they're not your typical statistician.
Sharla is a statistician at the College of Nurses of Ontario, a freelance R and Shiny developer, and co-organizer of R-Ladies Toronto and the Greater Toronto Area R User Group. Their work focuses on enabling easy access to data and replacing manual, repetitive work with reproducible and future-proof processes in R. And, as they themselves put it, the “most statistical thing I do these days is calculate a median.”
Sharla posted an incredible Twitter thread (seriously, read the whole thing) earlier this year, laying out how they and their colleagues had made their process for report writing entirely reproducible using R. A report that had taken hours and hours to produce will, in future years, be done in just minutes.
This example inspired me to reach out to Sharla (who describes themselves outside of R as an eyeshadow aficionado, a shiba inu owner wannabe, a bass player, and a cyclist) to learn more about their background, how they came to learn R, how they use it today, and what advice they might have for folks looking to follow in their footsteps.
Why did you decide to learn R?
I was a statistics major in university and R was really the only option! We didn't have a dedicated programming course, though, so you were kind of expected to pick up bits and pieces in various courses along the way. This was in 2010, before RStudio existed and before the tidyverse was even a twinkle in Hadley Wickham's eye (ok, I can't actually confirm that), so for me, it was a lot of disjointed bits of code without an understanding of the bigger picture and what R could do beyond a linear regression.
For some reason (stubbornness? Greater fear of SAS?) I stuck with R and continued on to use it in grad school and at work. As soon as a coworker told me about this thing called dplyr, things got a lot easier.
How easy or difficult was learning R? What resources were most helpful?
I think that learning R in such a disconnected way was tough, and for a long time I wasn't super motivated to improve or learn more, especially when I was working in isolation. Joining Twitter a few years ago and seeing all the cool stuff people were doing with R was super motivating and a shift in mindset. Suddenly I saw people sharing what kind of work they were doing and what tools they were using. I started to understand just how big the R ecosystem is and how much you can do with R. There's also a newsletter, rweekly.org, that is a great weekly digest of highlights, new packages, and analyses in R.
I really am a learn-by-doing kind of person. I don't absorb much from video or book lessons until I need to do something myself, and I'm always super happy when I can track down someone else's blog post to see how they've done it! Things like Tidy Tuesday are super helpful for getting started with R on a specific data set and getting feedback, and it has a really welcoming community behind it to boot.
One meta thing that I think is tough is learning how and where to ask questions. I personally love the RStudio Community for this. Asking on the Community is a win-win — you get help, and it’s there as an artifact for other people to find when they have the same question. The key is to help people help you — the best way to get help is by providing a reproducible example, otherwise known as a reprex.
In what ways has learning R changed your work?
Using R has completely changed the nature of my job. At the College of Nurses of Ontario, we have an obligation to release data to the government and to the public. Previously, this process was tedious and disjointed. It involved getting a data extract from another team, cleaning the data (or, if you have a new coworker, like me, teaching them how to clean it), creating summary tables in SPSS, combining the data with previous years' in Excel, and adding it all to the old report in a Word document. Sometimes it even involved sending the data to a different team to make tables and graphics. If you mess up a data cleaning step or miss something, you have to do the whole thing again!
With R, we own the process from end to end: data extraction, cleaning, summarizing, and reporting, using the tidyverse and packages like RMarkdown, knitr (for kable), kableExtra, and ggplot2 to make beautiful, production-ready reports that are just as good as (better than, if I do say so myself!) the ones from the manual process. Since the whole process is in R, and reproducible, it'll be so much faster to do it all again next year.
What do you think people considering learning R might not appreciate about it?
Learning R really doesn't have to be about learning statistical models or advanced computation methods. I have a master's degree in Statistics and the most statistical thing I do these days is calculate a median. For me, R is a tool to facilitate communication of information, whether it be via Shiny apps or beautiful and customizable documents using RMarkdown and ggplot2. Using R for this is, in my opinion, so exciting and empowering!