Thomas Mock is a Customer Success Representative at RStudio. He fell in love with R and data science through his PhD research, which he did in Neurobiology at the University of North Texas Health Science Center. Thomas is passionate about growing the R community, and founded #TidyTuesday to help newcomers and seasoned vets improve their Tidyverse skills. In his home town of Dallas, Thomas and his wife run on the local trails, play with their Boston Terrier “Howard”, spend lots of time with their five nephews and families, and are always on the lookout for great new food, especially churros.
Why did you decide to learn R?
Up until 2018 I was a life-long academic: I completed a BS in Exercise Physiology in 2011 and a Master’s Degree in Exercise Science in 2014, then began my PhD in Neurobiology in 2014. I was first “taught” R in an Introduction to Statistics graduate course in 2016. I say “taught” because the course only covered the most basic commands in base R; we used the R Commander interface, and no concepts of proper statistical programming were introduced. Within the first month of the course I actually reverted to doing things in Systat with a GUI, as I was so frustrated with not knowing what I was doing in R.
I ACTUALLY began my R journey at the start of 2017. I was just finishing the 2nd year of my PhD, and I realized that I did not want to stay in academia and needed an “escape route”. I wanted to build a career around the things I actually cared about from my graduate training: data analysis, data visualization, presentations, solving problems and working with real people. I spent a chunk of time researching alternative career paths and found data science, which was a nice mix of the things I was passionate about, so I decided to dive into learning statistical programming/data science.
I saw that most jobs wanted R/SQL/Python, so I tried to learn all 3 concurrently while in my PhD. This didn’t go well. I dropped SQL as I didn’t have a good way of practicing without a local database, and then dropped Python as most of the training resources I could find were built around computer science concepts which I couldn’t readily apply in my daily work. Thus I committed fully to learning R, as I at least already had it installed locally and knew I could try to use it for my own work in the meantime. This was a key concept – the only way I had a realistic shot at really learning R while doing my PhD was to use R for my dissertation data analysis. I thus had to shift my whole workflow over to R from using Excel for data cleaning, Systat for statistical analysis and descriptive statistics, and Origin for plotting.
How easy or difficult was learning R? What was easiest? What was most challenging? What resources were most helpful?
Learning in isolation on any topic is always going to be challenging, and self-directed programming is no different. I had never really done any programming, much less had any computer science coursework. I was the only person in my department even attempting to use R as opposed to SPSS/Systat, so I had no one to work through my problems with, no one to help get me over hard blockers in my R progression, and no one to help guide me towards learning resources.
As such, my self-directed learning stalled and I spent all of spring 2017 preparing for my PhD oral qualifying exam. After passing my qualifying exam, I again spent most of the summer of 2017 attempting to teach myself the basics of using R for data analysis, writing code in the R console and attempting to duplicate blog posts/examples. Ironically, the most difficult task I found initially was simply reading data into R – I spent a large chunk of time trying to understand paths and directories, reading in various file types, finding packages necessary to read in specific files (Excel), etc.
I also ended up with very messy storage and scattered .R scripts as I had no consistent file structure or projects. I was still using my GUI workflows for my daily PhD work and not R for core tasks as I couldn’t yet reproduce my existing work due to my limited skillset.
A few things happened in unison that quite literally changed my career trajectory and rapidly increased my learning at the end of Summer 2017.
- I started interacting with the #rstats community on Twitter
- I had my first exposure to the Tidyverse (via Twitter)
- Jesse Mostipak created the R for Data Science “bookclub”, which eventually became the R for Data Science Online Learning Community
- I purchased the R for Data Science book
Learning the Tidyverse was a breath of fresh air, and suddenly I had additional experts and learners who I could engage with through the R4DS community. My learning accelerated: I was able to move all of my plotting into R via ggplot2, which actually improved upon the software suite we were paying thousands of dollars for. I was also able to shift all of my descriptive statistics into dplyr and my data cleaning into tidyr. This saved me days of time a month in data cleaning and summarizing, as well as plotting.
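To illustrate the kind of workflow described above, here is a minimal sketch of cleaning with tidyr and computing descriptive statistics with dplyr. The data, column names, and group labels are hypothetical and are not taken from the dissertation work. `pivot_longer()` is the current tidyr reshaping function; in 2017 the equivalent would have been `gather()`.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per subject, one column per time point
wide <- tibble(
  subject = 1:4,
  group   = c("control", "control", "treated", "treated"),
  week_1  = c(10, 12, 11, 13),
  week_2  = c(11, 13, 15, 17)
)

# tidyr: reshape wide to long so that each row is a single measurement
long <- wide %>%
  pivot_longer(
    cols      = starts_with("week_"),
    names_to  = "week",
    values_to = "score"
  )

# dplyr: descriptive statistics for every group/week combination
summary_tbl <- long %>%
  group_by(group, week) %>%
  summarise(
    n          = n(),
    mean_score = mean(score),
    sd_score   = sd(score),
    .groups    = "drop"
  )

print(summary_tbl)
```

Because the long table is the shape ggplot2 also expects, the same `long` object could feed both the summary and a plot without further reshaping.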
By October 2017, I was doing things in R I literally couldn’t do in other software suites. Most notably, one of the first public things I shared was a core dataset I couldn’t wrangle in Excel. In early 2018, I wrote an article called A Gentle Guide to Tidy Statistics in R. I eventually re-posted this on my personal blog, and most recently did a webinar on the same idea.
In April 2018, we (the R for Data Science Online Learning Community) created the TidyTuesday project, which is a weekly data analysis project run on Twitter. This was my attempt to give back to the community that had helped me so much, as I found that I learned best when doing short real-life projects on “fun” datasets, rather than just copy-pasting code or basic examples. TidyTuesday has since grown into a great way to apply your R skills, get feedback, explore others’ work, and connect with the greater #RStats community!
I share these specific dates and creations to show where I was at these moments and to really drive home the power that the Tidyverse gave me so quickly, especially as a self-directed learner. I also want to share the sincere appreciation I have for Jesse and the other R4DS members who helped me along the way.
In what ways has learning R changed your work?
R changed my workflow completely as a PhD Student — instead of manually re-running stats weekly for my 1:1 meeting with my mentor, I could generate an R Markdown report programmatically with the results, analyses, and plots. I saved hours a week and days a month by creating and running scripts/functions to clean and aggregate my data, run the many ANOVAs I needed, and save the outputs to Excel for my mentor. I appreciate my mentor being willing to let me learn and use R for all my dissertation work, once I confirmed I could “reproduce” my stats in Systat and in R.
Importantly, I was also able to keep my data in a single software suite (R) and in a single data format/arrangement, as opposed to having to read data into three separate pieces of software, all of which wanted different versions of wide, long, or summary data. The skillset I learned through purrr also allowed me to scale out my analytics workflow to do things in parallel much faster than any type of manual workflow.
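As a rough sketch of how purrr can replace manually re-running many ANOVAs, the example below fits one one-way ANOVA per response variable and collects the p-values. The built-in `iris` dataset stands in for the real data, and the choice of response variables and grouping factor is an assumption made only for illustration.

```r
library(purrr)

# Built-in iris data stands in for the dissertation data (an assumption)
responses <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Fit one one-way ANOVA per response variable, each against Species.
# set_names() names the vector by its own values so the results stay labelled.
fits <- map(
  set_names(responses),
  function(y) aov(reformulate("Species", response = y), data = iris)
)

# Extract the p-value for the Species term from each fitted model
p_values <- map_dbl(fits, function(fit) summary(fit)[[1]][["Pr(>F)"]][1])

print(p_values)
```

Note that `map()` applies the same function to each element one after another; it removes the repeated manual steps, but purrr itself does not run the models concurrently.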
Ultimately, learning R via R for Data Science/Tidyverse and the R4DS Community gave me an “escape route” from my PhD (which I still finished), helped me prepare myself for a career in Data Science, and landed me a job at RStudio helping enterprises use R in Data Science at scale and in production. I still use a good chunk of R on a daily basis, whether for TidyTuesday or for my own projects internally at RStudio.
What do you think people considering learning R might not appreciate about it?
I think what people just learning R don’t realize is that it can “level up” your workflow so much. Simply converting your workflow over to R can save you time immediately, as you can script your analyses. It also gives you greater insight into how you came to a conclusion or result, and makes that path reproducible. Couple these skills with things like purrr and R Markdown and you’re now generating elegant reports, websites, graphs, and analyses rapidly and in a reproducible way.
Additionally, the skillset you begin to curate provides additional workflows that you may not have ever considered: things like webscraping, getting tables programmatically out of PDFs, cleaning up text with stringr, and creating packages to further simplify or enhance your own workflow. At the end of it all you have a much broader skillset that both enhances your daily workflow and gives you modern skills to reach additional audiences or even open up additional career opportunities.
Lastly, I also found coding in R to be much more rewarding than working in GUIs, as you feel like you are actually creating something rather than clicking away repeatedly without really thinking about what you are doing.
Beyond any of the actual workflow changes, speedups or additional skillsets, the R community itself is the thing that makes us all more successful. Without people freely giving away their knowledge, resources, time, and software, my career wouldn’t be where it is today, and I feel blessed to continue being a part of this wonderful community.