
R for the Rest of Us Podcast Episode 5: Alison Hill

In this episode, I talk R Markdown with Alison Hill, the Director of Knowledge at Voltron Data. Before taking on this position, she held data science and education positions at IBM and RStudio. Alison has a passion for helping individuals improve the efficiency, accuracy, and reproducibility of their work through the use of R Markdown. During our chat, she shares her personal journey of learning R and the benefits of using R Markdown.

You can learn more about Alison's work on her website.

Learn More

If you want to receive emails when we publish new podcast episodes, sign up for the R for the Rest of Us newsletter. And if you're ready to learn R, check out our courses.

Audio Version

Video Version

In the video version, Alison explains the basics of R Markdown.

Resources Discussed

Transcript

[00:00:00] David: Hi, I'm David Keyes and I run R for the Rest of Us. You may think of R as a tool for complex statistical analysis, but it's much more than that. From data visualization to efficient reporting to improving your workflow, R can do it all. On this podcast, I talk with people about how they use R in unique and creative ways.

[00:00:18] Join me and learn how R can help you.

[00:00:21] I'm joined today by Alison Hill. Alison is the Director of Knowledge at Voltron Data. Prior to this role, she worked in data science and education roles at IBM and RStudio, and she was a professor at Oregon Health and Science University in Portland. She holds a PhD in developmental psychology, quantitative methods, and evaluation from Vanderbilt.

[00:00:41] Alison has long taken a keen interest in helping people use R Markdown to make their work more efficient, more accurate, and more reproducible. Thank you, Alison, for joining. I'm delighted to have you on the show today.

[00:00:53] Alison: Thank you. I'm delighted to be here with you.

[00:00:57] David: I know you've just switched to a new role.

[00:00:59] What does a director of knowledge do? What does that mean?

[00:01:04] Alison: Ah, yes. Well, Voltron Data is a company that's all about bridging hardware, software, and communities. So the director of knowledge role that I've taken on there is about building bridges internally between our different teams so that we can all stay united on the mission. You can imagine that a startup has sort of an explosive growth

[00:01:28] of new employees, and that's exactly what we're experiencing, which also means an explosive growth of ideas, and also documents. So there's a lot of knowledge being shared, and a lot of knowledge that you'd like to have transferred to certain teams and people, especially knowledge about highly technical concepts.

[00:01:49] Voltron Data is all about accelerating the Apache Arrow open source toolkit. So we've got some pretty technical people working on that, and some pretty technical, high ideals that we're trying to achieve. I'm shepherding that information and trying to figure out a strategy for us to be able to

[00:02:13] bring all the employees along for the ride and help everybody do their job better. So I'm trying to rein in some of the systems and the ways that we talk with each other, but also the actual content: what do people need to know in order to feel like they can talk about what we do as a company and what we're aiming for in our vision and our mission?

[00:02:34] So it's kind of a broad purview, but it's been really fun. I've been there for about three weeks and I'm still in the drinking-from-the-fire-hose phase, and it's been really exciting. There are also a lot of people on the more technical side who are really committed to open source and community engagement.

[00:02:53] So I really love that flavor of Voltron Data's mission.

[00:02:59] David: Right. Where does R fit in, in terms of your current role?

[00:03:04] Alison: Yeah. It doesn't fit in much in terms of my day-to-day work. Right now I'm doing a lot of knowledge sharing and knowledge transfer in Google Workspace and Notion, and reading through a lot of materials and resources that have already been created. But the company as a whole does have this mission of making it easier to work with

[00:03:29] big data in whatever language you want to use, through whatever user interface you want to use. Right now they're supporting Python and R users; for R users, that means supporting dplyr and lubridate, which are some of the core Tidyverse packages. So the idea is that you're able to work with big data wherever it is, using whatever tools you want to use.

[00:03:54] So they really want to support this polyglot workflow, where you bring your own language and work with the data that you need to. I really love that part of it. And there are definitely some friends on teams working on the Apache Arrow R package, as well as some of the other pieces that I think of as sort of glue

[00:04:17] software, which helps make those exchanges between R and big data systems a little bit easier. So there are a few open source projects going on right now. Substrait is one of them, and dplyr support is a part of Arrow. And then there's also a Python crew who's working on corollaries to dplyr in Python.

[00:04:38] That means being able to use a Python package called Ibis to interface with data the way that you'd like to. So it's pretty exciting, and it's nice to be in a place where you can really support open source development, but also focus on the people at the company who would like to understand all of that, but need it delivered at a level and in a way that's a little bit easier than reading through GitHub issues or GitHub release notes.

[00:05:08] David: Yeah. One of the things that seems to run through all of your work is being able to communicate effectively and bring together different audiences, that kind of thing.

[00:05:24] And I've asked you on today because I want to talk about R Markdown, which

[00:05:28] is, if nothing else, about being able to communicate efficiently and effectively.

[00:05:34] So before we dive into talking about R Markdown, I'm curious about your background: how you got into R, and what changed in your work when you did move to R.

[00:05:47] Alison: Yeah, well, I first got into using R when I was a new professor at Oregon Health and Science University. I had done all of my graduate research in psychology, and I had done all the statistical analyses in SAS; all my courses had been in SAS. And I really liked SAS. For those who aren't familiar with it,

[00:06:09] it's sort of like a baby step toward a command line tool. You are writing out text to interact with data, but it's a little bit different from working with a programming language like R or Python. So I used SAS; I was a happy SAS user. And then I joined OHSU and realized the cost of a SAS license. The director of the program that I was in had come from Bell Labs originally.

[00:06:38] That's where the S language was originally developed, and R sort of evolved out of S. So he was super comfortable with base R, and he said, look, I use R. It's open source. I think you should try it. At that point I was in a computational program at a center at OHSU that's no longer in existence, called the Center for Spoken Language Understanding.

[00:06:59] What we did was train machine learning and natural language processing researchers to work with medical and health-related data, and to use advanced computational methods to derive unique insights about healthcare-related issues like children's response to treatments, symptom progression, things like that.

[00:07:23] And so I started learning R and I thought, wow, our students could learn to use R to do the statistical analyses that they need to do. I found myself really drawn to helping people do the work that they needed to do their jobs, so I started teaching R for data science at OHSU.

[00:07:43] And that was sort of my hook. I felt like it had superpowered my workflows so that I could do better research. And then I started using R Markdown because I had collaborators, and my collaborators didn't know R. I was still kind of an island R user: I was in a computational department, but everybody else there actually used Python.

[00:08:02] So I used R, and I was kind of like this lone wolf, knitting my little documents and creating things that were shareable, but also really dynamic. I could go into a meeting with a collaborator who didn't want to use any of these tools, who maybe was the clinical or subject matter expert on my team, and show them: here's the data, here's what I did with it.

[00:08:24] Here are some visualizations. And we could have really productive meetings where we could iterate very quickly, because the subject matter experts could say, oh, well, what if we faceted by this? Or what if we looked at the group in this way? Or how is that different if we use this variable versus that variable?

[00:08:40] And it allowed me to just go into my R Markdown document, edit the code as we were talking, and regenerate plots and tables as we were meeting. Then I'd be able to give them an actual artifact: I could knit to PDF or to an HTML document that I could share with them.

[00:08:57] And then they'd have that in their hands and on their computers so that they could look at it later. So for me, it really supercharged my ability to actually collaborate with other people and not feel like I was doing science in a vacuum.

[00:09:12] David: That's interesting that it supercharged your ability to work with non-R users,

[00:09:17] at least initially, as much as with R users, the way you describe it.

[00:09:24] Alison: And as I became more of an evangelist, I started training more people underneath me, because I wanted to work with research assistants and graduate students who were using my same tools. That was when it got really fun: I started teaching classes in data visualization and data science.

[00:09:39] And then those people would come to work in my lab with me, and it was really supercharged fun at that point, because we could share documents. We were using GitLab to share our code with each other, and we could iterate really quickly and have fun with poster presentations, papers, progress reports, anything where we were consuming the data and trying to get a peek at what was happening and how our research was going.

[00:10:05] It was a lot more fun when I had other R users to join in, too.

[00:10:12] David: Yeah, definitely. So you mentioned R Markdown briefly.

[00:10:17] I'm curious, because I always have trouble describing R Markdown to non-R users.

[00:10:26] Maybe starting out talking about R Markdown, how

[00:10:29] do you define it when you're talking to people who are not familiar with it?

[00:10:34] Alison: That's a really good question. I think of R Markdown as something that, if data scientists didn't have it, they would've had to invent. If you're someone who doesn't interact with code frequently, or who has only consumed the output of people's code, you might be surprised that a lot of people just use scripts, going line by line and creating things, but don't necessarily have a shareable artifact of what they did.

[00:11:03] For me, R Markdown was a place to do work. It was an interactive experience because I was using the RStudio IDE, my integrated development environment, so I could run code as I was working. It allowed me to code while thinking: I could write notes to myself, try different things, and iterate quickly. But it also allowed me to package it all up with a nice little bow and say, okay, I'm basically going to print this off.

[00:11:30] It's sort of like printing a Google Doc to PDF. I can give you this fossilized version of my work, and I can also edit it to make it more or less relatable to you. If you don't want to see my code, I can just mute all my code and show you my plots. Or I can actually use it for teaching materials.

[00:11:49] If you want to learn how to code, you can see my code and exactly what it produced. So it's sort of like a Swiss Army knife. And I think that's what's hard about explaining R Markdown to people, too: it's not one thing. It's an R package, and it's a file format.

[00:12:04] It's also an ecosystem of multiple R packages. And then you've got all these different verbs around it, and all these nouns for file formats and packages. So I think it's hard because you can use the term R Markdown to refer to any of those concepts.

[00:12:22] So if you're on the outside looking in, I think it's helpful to define it at those different levels. It's a file format: a .Rmd document is what you need to be working in. But that's just a plain text document with some special R chunks that allow you to mix code with real words.

[00:12:41] David: Yeah. And R Markdown is often referred to as a form of literate programming. I'm

[00:12:47] curious: what does literate programming mean? And

[00:12:51] what's the value of literate programming?

[00:12:55] Alison: So literate programming is a concept that was developed by Donald Knuth. I believe it's pronounced "Kuh-NOOTH"; I think there's a pronunciation guide on his website. It was really this idea of weaving together code plus narrative. And I think in the original fleshing out of the idea, it was really more for programmers to be able to write more literate code.

[00:13:21] And in some of the original writings, I think he talked about how he labeled it literate programming on purpose, to give it a bit of a value judgment: you don't want to be an illiterate programmer, right? You want to be a literate programmer.

[00:13:36] So it was very intentionally labeled that way. His idea was that he ended up feeling like he wrote better code, and the people he worked with wrote better code, when they were weaving in documentation at the same time: being able to say not just what you're doing, but why you're doing it and why you're doing it this way.

[00:13:57] And I think that sort of filtered down from the programming domain to scientists and data scientists. Anybody who needs to work with data can also be inspired by that idea and think, okay, great, that's also a really helpful concept for me to package up my own ideas and work so that it has more of an impact.

[00:14:19] So I think it's one of those great programming concepts that makes a whole lot of sense. And I think you and I shared this reference to the curb-cut effect: building a system that is more open and accessible to people who are not the person who wrote the code makes things easier for everyone.

[00:14:38] The original intention was for one developer to be able to look at another developer's program and understand it better. But data science has sort of co-opted it and said, okay, here's actually a way for you to better understand the science that I'm doing.

[00:14:53] David: Yeah, that makes sense. One critique that I hear a lot, and this isn't specific to R Markdown but to R in general, is that learning R takes a while.

[00:15:05] So I'm curious what makes it worth it. In particular, in what ways do you think R Markdown makes it worth taking the time to learn R?

[00:15:16] Alison: I think it really depends on what you're trying to achieve. A lot of times when I would talk to new data scientists, or even researchers doing academic science, figuring out how to talk to other people in their groups was really important.

[00:15:34] So if you are feeling that pain, then I think R Markdown is the best Swiss Army knife solution. It has a lot of benefits for yourself as well: being able to jump back into your own code in three months, or even a week. If you take a spring break and come back, you know exactly where you left off, because you've left yourself a nice little trail, and it goes beyond commenting code.

[00:16:00] A lot of people will just add comments to their scripts and think that's enough, but it's really more than that. It's also being able to explain your thought process and why you're breaking things down that way, and also the output itself. If you have a plot, it doesn't stand on its own.

[00:16:14] It really helps to have words around it, like "here's what you're seeing in this plot" or "here's what I'm noticing and pulling out from this plot." So maybe your work is difficult for you to understand when you come back to it. Or maybe you feel like you're doing all this work, but it's not really bubbling up to the next level, to the people you work with, so that they can appreciate and understand it and give you feedback on it.

[00:16:42] If you're in either of those places, I think that's a good sign that it might be helpful to think about a way to share your work more easily so that people can consume it. The problem often masks itself as a feeling that things aren't quite flowing and collaborations aren't quite happening, and you ask yourself, what could I do differently?

[00:17:00] But R certainly does have a steep learning curve. I think it's a lot better than when I learned it. When I first learned it, the core Tidyverse packages were out there, but I don't even think "Tidyverse" as a name was out there yet. I was using ggplot2, and I was using some of the dplyr functions, but it wasn't all knit together.

[00:17:24] And when I was teaching a lot of beginners, I found the Tidyverse to be a more pleasant on-ramp for people, especially people who don't have any coding background. If they're coming from a place of "I need Excel plus," I think that's a nicer on-ramp for them.

[00:17:45] I think we might share that ethos, but for me, I think that's made...

[00:17:49] David: Yeah. Tell me if I'm misinterpreting what you said, but one thing I heard is that, in some ways, using R Markdown forces you to, well, not really verbalize, but type out what you're doing. And that process can actually help you get clearer about what it is you're doing, because if you can't articulate it to yourself, you're obviously going to struggle to articulate it to others.

[00:18:18] With just a script, you might think, oh yeah, I know what this does, but if you were actually forced to articulate it, you might struggle. R Markdown basically forces you to do that step of articulating it.

[00:18:29] Alison: Exactly. And even if you can't actually share the rendered artifact of an R Markdown process, which is usually some kind of static file that people can't really interact with but can consume, you can still use it as an interactive workspace. Especially in today's

[00:18:49] fully remote, asynchronous work environment, it's so much easier to hop on a video call with a colleague, share your screen, and walk them through it: okay, here's where I did some data exploration. And if they say, well, wait, wait, wait, what did you look at for missingness?

[00:19:03] Or something like that, you can just hop back and say, oh, right, I can skip to this place where I essentially have a bookmark that says, this is where I did the missingness analysis. And you can say, okay, let's take a step back and talk about missingness first, then, and go through that.

[00:19:17] And then they might say, oh, but you might have missed that maybe this participant, or this set of participants, should be excluded from this analysis, because we figured out that the equipment we were using wasn't correctly calibrated before that.

[00:19:30] And we haven't gotten that into the data dictionary yet. There are all kinds of things you can get from that collaborative experience, and they're a lot easier if you've got the code and your rationale behind it in one place. Otherwise, what you end up with, or at least what I had, which is not optimal, is a script,

[00:19:47] and then little files describing what each script did: okay, this is where I did this, and this is where I did that. And then here are some notes like, oh, this was weird, I should go back and check on this. And those two things never line up again.

[00:20:01] So even if you're the most meticulous and paranoid documenter, that is an unhappy workflow.

[00:20:09] David: Yeah, definitely.

[00:20:10] That's great. Well, if people want to learn more about you, what would be the best place for them to do that?

[00:20:19] Alison: My website. It's a pretty complete resource on me: https://www.apreshill.com/. I need to update it, but it has a list of all the talks that I've given, along with slides and video recordings if they were available, and links out to all my projects. I would say I blog about every six months, maybe, when I get the time. I have a five-year-old, so I am not prioritizing that right now. But you can see some of the things that I've worked on, both at RStudio and since then, on my blog as well.

[00:21:00] David: And just to point out to folks, your website is made with blogdown, right?

[00:21:08] Which is a package related to R Markdown, for making websites.

[00:21:16] Alison: For making websites, yes. I'm one of the co-authors of the blogdown book, and blogdown: Creating Websites with R Markdown is a great resource. If you want to start pretending like you're a front-end developer, you can just bypass the front-end developer stuff and go straight to making fun, cool websites.

[00:21:35] It's pretty empowering. And if you need a personal website, it's a really great package for that.

[00:21:46] David: Great. I'll include links to your website and the book in the show notes for this episode. Well, Alison, thank you so much for joining me today and sharing about R Markdown.

[00:21:59] Alison: Thank you, David. It was really nice to see you.

[00:22:04] David: Thanks again for listening. I hope you found this conversation interesting. If you have any feedback, I'd love to hear it at david@rfortherestofus.com. Thanks.


By David Keyes
February 20, 2023
