R for the Rest of Us Podcast Episode 11: Garrick Aden-Buie and Travis Gerke
In this episode, Travis Gerke and Garrick Aden-Buie join me to demystify the process behind developing custom packages in R. Travis is the Director of Data Science at The Prostate Cancer Clinical Trials Consortium (PCCTC), and Garrick is a Data Science Educator and R developer at RStudio.
During the discussion, Travis and Garrick highlight the numerous benefits of having a custom package, including making it easier to access data, automation & documentation of functions, and enhanced learning opportunities for R users seeking to upskill. They also delve into their own experiences working together at Moffitt Cancer Center, discussing how their set of custom R packages helped alleviate data reporting pain points within the organization.
Learn more about Garrick and his work at garrickadenbuie.com and connect with him on Twitter (@grrrck). Connect with Travis on Twitter (@travisgerke) and LinkedIn.
Learn More
If you want to receive emails when we publish new podcast episodes, sign up for the R for the Rest of Us newsletter. And if you're ready to learn R, check out our courses.
Audio Version
Watch the Video Version
The video version has a walkthrough of how to make a custom R package.
Resources Discussed
In my book R Without Statistics, I have a draft chapter on how to bundle your functions together in your own R package. You can check it out here.
Other resources: R Packages by Hadley Wickham and Jenny Bryan.
Transcript
[00:00:00] David: Hi, I'm David Keyes and I run R for the Rest of Us. You may think of R as a tool for complex statistical analysis, but it's much more than that. From data visualization to efficient reporting, to improving your workflow - R can do it all. On this podcast, I talk with people about how they use R in unique and creative ways.
[00:00:18] Join me and learn how R can help you.
[00:00:22] Well, I'm joined today by Travis Gerke and Garrick Aden-Buie. Uh, Travis is currently the Director of Data Science at the Prostate Cancer Clinical Trials Consortium, and Garrick is a data science educator and R developer at RStudio.
[00:00:38] Um, they actually both worked together previously at the Moffitt Cancer Center, um, and they developed, uh, a set of custom packages, which is what we're mostly gonna be talking about today. Um, but before we get into that, maybe if I can start out by just asking each of you, how you initially got into R and what benefits you found from using R compared to whatever tool you had been using previously, maybe Travis, if you wanna.
[00:01:07] Travis: Yeah. Sure. Um, so I love the, the title of part of, part of the title of your series. What is it, the R Without Statistics? Um, so I'm going to be kind of like, that's not the way I fell into R, but what's interesting is that I'm mostly back to not being in the statistics mode of using R these days. So when I first picked up R was probably 2008 or so when I was studying statistics in undergrad, um, at the University of Florida, then I went to graduate school and used a lot of R in biostatistics and later epidemiology. So it's kind of, I think I have one of the more traditional paths into using R, which is statistician, and it's probably the best statistical software out there. Um, and it was of course, an alternative to SAS. I was, I was originally trained in SAS before even R. Um, R is just much more readable. It's much more cost friendly, so quite happy to be in this space now.
[00:02:00] Um, I'm a former academic. So I, I was faculty at the University of Florida and then later faculty at Moffitt Cancer Center. Um, but then I, I left academia and I'm now in more industry facing roles, currently facilitating clinical trial research, um, doing the data science kind of work on that side. A lot of which isn't really statistics.
[00:02:19] So a lot of it's really in clinical trials, it's a lot of counting and tabling and, um, writing elegant reports in R Markdown. So quite, quite happy to talk about the use of R without statistics, cuz uh, I've, I've had a lot of impact in that particular area.
[00:02:32] David: Great. Thanks, Travis. Um, Garrick, maybe if you can talk about how you kind of first got into R.
[00:02:39] Garrick: Yeah, I, so probably somewhat similar in the sense that I started. Um, I started using R in grad school. Um, we had a. Uh, you know, at the time I think, you know, machine learning and, uh, AI were kind of cool things to, to, to get into. And so they developed a new course that they were, uh, that a new professor was teaching.
[00:03:00] And I was like super excited for this course. And then it was, uh, just like lots of math and not much programming. And almost immediately I was like, I need like to actually do this. I need to learn, you know, programming languages where I can, I can do these things. Um, and, uh, and I, and for some reason, R was the thing that I picked up. Um, I think probably because of the same, that happened to be the same year that, uh, Roger Peng's, um, Coursera course for R dropped. And, um, and so I did that and I learned, um, I learned R from, from there. It was actually a really great experience cuz um, you know, you have the, the chance to learn the programming language, but at the same time I had the, the course that I was in was giving me projects to work on.
[00:03:47] And, um, so I had, I had a way to like, apply what I was learning almost right away and just sort of jump straight to, like now I'm doing things, but I, but R's, the way that R works with data and the way that you, or rather the way that you work with data with R always spoke to me. Um, and it, I feel like it just sort of, uh, you know, working with data and R just sort of like clicked at the same time. And so that, that was probably around 2012 and I've just been kind of, uh, yeah, figuring out ways to do everything that I can with R as much as possible since then.
[00:04:21] David: Yeah. And I've heard, um, you Garrick, call yourself a full stack R developer, which is not a term I've heard, uh, many or, or any other people use. I I'm curious. What do you mean when you say that?
[00:04:34] Garrick: Yeah. That's so in the web development world, there's this idea of a full stack web developer. And that's a person who, there's a very clear division in the web world of like front end, the things that you see in a browser window, and then backend, which is like the servers that run, uh, the websites that you're using. And, um, and like, there's this weird division where, where, uh, or natural division where web developers like fall into one camp or the other. And then there are these people who kind of span everything. And, um, initially it was that I was using Shiny. Like, I think Shiny was kind of for me, the, the real, kind of Shiny and R Markdown were the things that really, um, brought me into the R world.
[00:05:14] And so it was a little bit, initially it was a little bit of a nod towards, uh, like, you know, I can use Shiny and be a full stack web developer, which is totally true. Um, but also I kind of realized at the, at a certain point, you know, there's, yeah. And, you know, people kind of think of R as, as a programming language for statistical analysis or that kind of thing. And, um, and there's so much that you can do with R and, um, there's so much that you can, so many different places where you can use R or, um, or solve problems with R that, uh, that I, I kind of realized like at, at a certain point, you know, um, I've been involved in, in, in lots of different flavors and variations of R in lots of places from, you know, um, high performance computing centers to,
[00:06:03] Like to, to, you know, servers and, and for running websites and for, uh, generating reports and for just tinkering or doing like solving problems on my own computer. And I kind of realized, like, I don't know, I, I feel like full stack, uh, developer kind of speaks to that, like a little bit of everything, but also at the same time, there's a little bit of tongue in cheek that I'm not, uh, uh, I'm not exactly a specialist in anything either.
[00:06:28] David: Yeah, that makes a lot of sense. Um, and I, I identify with that as well, because I've, I've found, you know, I like to kind of figure out, you know, how I can use R for this and that. And, and you're right, that, you know, at this point there is so much that you can do beyond, you know, just statistics, which is what many people kind of assume.
[00:06:49] Um, Travis I'm, I, I I'd say I'm, I'm far less familiar with the nuts and bolts of your work and what you do kind of on a, on a daily basis with R. Can you talk through, you know, maybe an example of what a typical day and how, um, what, where R fits into the work that you do.
[00:07:06] Travis: Sure. So I'm, uh, not a full stack R developer like Garrick. I'm mostly, I'll call it a coattail R developer, which means I follow around smart people like Garrick and use the things that they develop to actually make cool stuff happen. Um, but in practice, in the clinical trial space, um, these days, the, the process of making, when, when a trial's done or when it's underway, you have to produce very regular kinds of reports.
[00:07:34] Um, that traditionally has been a SAS dominated field. Um, and then they're often not overly statistical. Um, it'll be things like let's count the adverse events that are happening, um, in each, maybe the treatment and a placebo arm, and let's make sure that the, the patients that are receiving treatment are, are safe and, and we can monitor them and, and, you know, make sure that the trial's going well, um, with regards to safety.
[00:07:59] And so there's, there's nothing overly complicated there, but in the SAS world, the kinds of reports that you get, um, from that sort of process tend to be, like, pretty, um, they're very ASCII-looking and kind of just like, like an old dot matrix printer from the eighties. And I, and I think, you know, just
[00:08:18] out of practice, I think there, there are people in the FDA and everyone else that are very good at reading that kind of output because they've been doing it for decades now. Uh, but with R Markdown, we, we now have an opportunity to make these things look and feel very modern. Like as if you landed in an HTML webpage and it's interactive and it's, and you know, tables are color coded. And we've been producing things these days, that our safety monitoring boards, um, they'll see our reports and they'll go, oh my goodness. I'm so glad that this, you know, had this particular theme or, you know, was color coded like a FiveThirtyEight publication, because I'm able to very quickly see that we need to keep an eye on this particular patient or, you know, this particular arm of the study because I can quickly see something's going on there.
[00:09:00] So that's a really good feeling when that, when that happens, um, that they're actually having a real translational impact on patients that are actively receiving cancer care. Um, so yeah, it's much like Garrick was kind of saying, just the, the broad, the breadth of what you can do with, with R these days is so much beyond statistics that I, that I've kind of put to the side and, and I'm just quite happy to make nice looking reports.
[00:09:26] David: Yeah, that's great. Well, let's transition now from talking about the work that you do today to the work that you did when you were both at, uh, the Moffitt Cancer Center. Um, I'll let you decide who the best person to pick this one up is, but I'm, I'm wondering if you can kind of explain a bit in terms of the work that you did there, and we're gonna talk a lot about, uh, a set of custom packages that you made.
[00:09:49] So I'm curious if you could dive into what was, what were the pain points that led you to think about making a package at Moffitt in the first place? So I'll, I'll leave it to you to decide.
[00:10:00] Travis: Sounds good. I could probably
[00:10:02] Garrick: I'll let Travis take the first half of.
[00:10:04] Travis: Yeah. Yeah. Cuz yeah, that's a good
[00:10:06] way. Cuz I was kind of on the coordinating side of like what strategy and then Garrick was very much the technical implementer of, of the strategy. Um, so at, at Moffitt at the time, um, like many cancer centers or even just, um, tertiary hospitals, they were thinking about how do they store their electronic health record data, uh, for research purposes. So any hospital has like this large behemoth of a, of an EHR system, usually it's Epic or Cerner, the two kind of giants in the space and they, and they're very unwieldy. Um, and, and you can't really, for, for regulatory and other reasons, directly access those for research purposes often. So usually you have to kind of like lift and shift that data into a separate data warehouse, um, to, to make it useful or even usable at all for, for research. So they were pretty forward thinking at Moffitt at the time, they said, let's, let's move away from the classic kind of on-prem Oracle flavor, data stack like a warehouse. Um, that's been kind of the standard for, for many, many years. And they said, let's instead do the cloud thing, like, you know, like FinTech and other and other spaces are doing. And so we, we spent a lot of time strategizing around how we will put things into AWS and then put it into a Snowflake data warehouse, um, so that we could actually get more efficient ELT, so extract, load, transform for those in the data engineering space. I don't want to go into any kind of jargon, but, um, kind of, kind of a more efficient flow of the data.
[00:11:42] Travis: So they were going to put everything in AWS and, um, load the data into a Snowflake data warehouse, which, as many people have probably heard, Snowflake these days is just a fancy cloud-based data storage system. Um, so all that's set up, uh, and, and the data starts to move into there. But then the next question is, well, well, what next? Right? Like how, how do we access that data? How do we get business insights from it? Like, you know, research data deliveries, do we, how do we facilitate that process? Where an investigator at the cancer center might come to us and say, Hey, I want to know,
[00:12:16] like colorectal cancer patients have a certain kind of genetic mutation. Uh, I wanna see how they're doing on this particular therapy. Like what next? So all the data is in Snowflake and it's in AWS and then we need to facilitate that transaction. And that's where the kind of R piece starts to, to fit in and Garrick can take over.
[00:12:37] Garrick: Yeah, I think, um, I think the one piece that I'd like to kind of fill in is that we also like, uh, so Travis was working, um, as the, you were technical director, I guess, of the,
[00:12:51] um,
[00:12:52] Travis: yeah. Scientific
[00:12:52] director. Yeah. Right.
[00:12:53] Garrick: I was a part of
[00:12:54] Travis: Correct. Yep.
[00:12:56] Garrick: director. Yeah.
[00:12:56] Travis: Close enough. Yeah.
[00:12:57] Garrick: Um, yeah. So I think, I think the, the part, the point where we started thinking about developing packages, uh, was around, um, looking.
[00:13:06] So, so we had this core group of, of, uh, can Travis, can you remind me the, the full name of the core?
[00:13:13] Travis: Collaborative Data Services Core
[00:13:16] Garrick: Perfect.
[00:13:17] Travis: that.
[00:13:18] Garrick: Data Services Core. I love the name. Um, so in, in general, researchers at, at Moffitt Cancer Center would come to this core, um, asking for specific types of data. And the idea would be that the core, the CDSC would, we would go out and then, uh, interface with whatever data systems had that data and sometimes pulling from multiple data sources. Um, eventually, the idea is to have everything end up in one place. Um, but there's a, there's, like, an intermediate layer where we, um, had to, as a, as a group, we had to understand the research needs, the data needs of the researcher, and then translate that into actually getting the data from somewhere and putting it together into a structure that ultimately would still be useful to the researcher. And wasn't just like, you know, Hey, I grabbed everything that I, that looked like it was useful and now good luck figuring it out. So like the, the quality of research really depends on that transaction.
[00:14:17] And, um, and the ability of the, uh, of that intermediary group to sort of like really understand the, the research problem and also, uh, and also the data. and it's, that's also something that's much easier to do if you are writing code, um, these, those kinds of requests, uh, come in often they're often repeats.
[00:14:41] So people, you know, you, you learn one thing to solve, uh, a problem for, for one researcher. And then pretty, you know, I, within a relatively short period of time, another researcher will, will want something very similar. And, um, and if you're writing in code and if you're doing that in code, you can build off of what. what you've done before and what you've learned, or you can share knowledge much more easily. Uh, if it's all done through Excel and through clicking and, you know, UI interfaces, then, uh, then every problem you're basically approaching it to relearn it again. Um, or it's somebody new coming and relearning, um, a new, a new thing and you don't really get to like build off of what you've done before.
[00:15:19] So that definitely led to us saying like, how do, uh, how do, how do we get a, a buy in to learn R, and to get our, these employees to learn R, and also, um, to start using R as a, as a culture and especially knowing that ultimately we were going to end up with data stored somewhere, um, where, uh, where an interface through code was going to be even more important.
[00:15:45] David: Yeah. And, and it is, I mean, for people who don't know, people, who've never made a package, a package is really a, like a set of functions that you create and kind of package up. So I'm curious what. You know, you talked Garrick about the example of people, emailing you say, and asking for things, and then somebody else the next week would email you and ask for something similar.
[00:16:08] Why not just keep a series of snippets of code, you know, somewhere in a file and just send them those, those snippets, like what, what's the value of actually putting it into a formal structure to make a package?
[00:16:21] Garrick: Yeah. There's a lot of, there's a lot of value. I mean, first, so you're you're right. That's definitely. The first thing that you get almost right away is that, uh, it makes it easier to share your code with someone else. So, um, uh, but I, so
[00:16:38] the setup here for us was a lot about sort of the culture, like creating a culture of writing code and of, of learning R and, um, and so for me, the choice to start using a package was really about making it easy for us to share the code, for us to have a place to talk about code that we were writing, and um, for us to share best practices.
[00:17:01] So another, another advantage of, um, of having a package is that you're not just sharing, like with, uh, among colleagues, but somebody who, who maybe is a higher-level of developer. Can kind of lead the way and, and, um, and kind of bring everyone else with them. Um, and so those were, those were some of the, the things that we were really looking for, uh, when we started writing packages, as, you know, a way to, um, kind of bring everyone together around the same code, as opposed to bringing people together around disparate pieces of code that are in various emails or hiding in
[00:17:40] and.
[00:17:41] David: So it sounds like a packet.
[00:17:43] Travis: good point. There's
[00:17:44] David: Go ahead,
[00:17:45] Travis: it's such a good point. There. There's the parallel with the Word document. Of strike, where you get, like people are working and emailing Word documents back and forth. You end up with like underscore final underscore final fall, two, two final. The same thing can happen with code.
[00:17:58] Um, if you don't centralize it and drop it into a package where people know that that's the source of truth.
[00:18:04] David: That makes a lot of sense. And it's interesting, cuz you both talked about how putting it into a package was not just about, you know, making it simpler for say for people who are already using R but also kind of getting more buy in and, and getting people on board to, you know, who maybe weren't using R or were just using R a little bit.
[00:18:22] Can you talk about how having a package facilitates people kind of upskilling their, their R skills.
[00:18:31] Travis: I could think of one, Garrick, right away and it's
[00:18:36] Garrick: Um, the go ahead and
[00:18:37] Travis: oh yeah. It's well, it's where the, um, it's, it's why I kind of jumped to here. I'm so glad you brought in the people part of this. Um, because I jumped right to sort of talking about the tooling, because where my mind was going is setting up the Snowflake connections and connecting to AWS, or even if it's Oracle, like, it doesn't matter like what the system is, what the data system is. Uh, there, it's so complicated and nuanced to connect to these sorts of resources. Um, because like, it, it just sucks. And, um, and it's, it's really, really nice
[00:19:07] if it's in a package and someone can just like, you know, if Garrick wrote a package and they could just say, connect Snowflake Moffitt or something like that.
[00:19:15] And then boom, they're connected and they don't have to worry about all that stuff, that like administrative sort of overhead. Um, and, and it's the need for, and the utility of an internal package, because of course that, that sort of connector function wouldn't really be useful outside of Moffitt cause it's unique to their data resources and, and their connections. Um, so yeah, I mean that, that's one, one key area where just having a package that would take, take away the overhead of like, just setting up the R session. And then, so that people can actually write code and, and kind of do the data queries day to day job instead of like handling administrative
[00:19:49] silliness
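Editor's sketch: the kind of connector helper Travis describes might look roughly like the R code below. This is a minimal illustration, not the actual Moffitt package. The function names, the environment variable names, and the "Snowflake" ODBC driver name are all assumptions made for the example, and the final connection step needs the DBI and odbc packages plus a locally configured Snowflake ODBC driver.

```r
# Hypothetical internal helper: collects connection settings from environment
# variables so analysts never handle them directly. All names are illustrative.
snowflake_settings <- function() {
  vars <- c(
    server    = "SNOWFLAKE_SERVER",
    warehouse = "SNOWFLAKE_WAREHOUSE",
    uid       = "SNOWFLAKE_USER",
    pwd       = "SNOWFLAKE_PASSWORD"
  )
  values <- unname(Sys.getenv(vars, unset = ""))
  unset_vars <- vars[!nzchar(values)]
  if (length(unset_vars) > 0) {
    # Fail early with a message that tells the analyst what to set
    stop(
      "Missing connection settings: ", paste(unset_vars, collapse = ", "),
      call. = FALSE
    )
  }
  as.list(setNames(values, names(vars)))
}

# Hypothetical one-line entry point for analysts. Assumes DBI and odbc are
# installed and that an ODBC driver is registered under the name "Snowflake".
connect_snowflake <- function() {
  s <- snowflake_settings()
  DBI::dbConnect(
    odbc::odbc(),
    driver    = "Snowflake",  # assumed driver name; depends on local ODBC setup
    server    = s$server,
    uid       = s$uid,
    pwd       = s$pwd,
    warehouse = s$warehouse   # passed through as an extra ODBC keyword
  )
}
```

With something like this in an internal package, an analyst would only need to call connect_snowflake() at the top of a script, and the setup details stay in one maintained place.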
[00:19:51] David: Yeah, because it seems like what you're saying is then they can get into actually, you know, working with the data versus spending, you know, hours fighting to try to access the data. In some cases, probably failing to do that. And then being like. it, I'm not even gonna try messing around with R this is too hard.
[00:20:09] It sounds like what you're saying is this, this gives them that jump to, to, to dive right into the actual data portion of it.
[00:20:16] Travis: Yeah,
[00:20:17] Garrick: Yeah, it for me, I think it, it creates a number, a number of gradual on ramps for people who are, who are learning or are working on, uh, getting more skills in, in particular around coding. Um, the, the first is, uh, just the advantage of, um, so there are a couple things that you get by having a package that as you said, you're, you're taking, uh, functions that you've written somewhere and you're putting it all in one place, which is nice. Um, but the, the next thing that you get from that is you get to document your functions so you can write the documentation, uh, and the syntax lets you write the documentation right next to the function. So you document the function in the same place where you write the code that, uh, that the function uses.
[00:21:01] Right. And, um, and then that's super helpful because it always like the package automatically then comes with a resource for people who are learning and, you know, they have, they know that they have some, uh, documentation that they can look to as they're using the functions. Um, also by solving some basic problems to get people started early.
[00:21:21] Um, you know, you know, you have to go through the, the, and the initial effort of setting up the package and getting going and everything and, you know, installing all the software and all that. But for the most part, uh, once, you know, if you can, you can automate a lot of that too. And then you just sort of say, here, start, you can use this function and you get people, um, much closer to solving the problems that they want to solve much faster.
[00:21:46] And so that's, uh, that's huge because then, um, they have, they, they see the value in the utility of writing code quicker. Um, instead of struggling through, uh, basic examples or struggling to get the basic things working by, by having it work, uh, more quickly. And by getting much closer to like, this is why you'd want to be using this in the first place. Um, it's much more motivating. And then the, the next part is once they're used to, you know, how the package works, once they understand how functions work and as they gain more skills, then there's a place to turn when you want to actually learn, uh, like, okay, well, what are you doing behind the scenes? You know? Um, and it's not a, a, it's not a, a black magic box that, uh, you can't look into. It's uh, the, the code for whatever it is that you're doing or that you've learned to do is there it's part of the package and you could go look at it and that gives you an on-ramp to, uh, collaborating together, um, with the people who are, who have been writing the package. And, um, and at that point now you're, you're able to sort of share the code. You're at least able to learn from someone else's code. Um, and then, you know, from there you can start, maybe you notice that there's something that the function, uh, does wrong or should do differently, or you have an idea about a new feature and, and you have a place to put that idea and start working on it.
[00:23:07] And, and then, you know, it's, so like very sneakily and you have these on ramps towards getting people to use code more and to, uh, and to, to, you know, solve their problems and, and, um, and see the value of, of using code. And, and you get that from writing a
package.
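Editor's sketch: Garrick's earlier point about writing documentation right next to the function refers to roxygen2-style comments, which sit directly above the function and are turned into help pages when the package is built. The function below is a made-up example (the name, arguments, and data are assumptions), loosely based on the adverse event counting Travis mentioned.

```r
#' Count adverse events by treatment arm
#'
#' A hypothetical helper for illustration only.
#'
#' @param events A data frame with one row per adverse event.
#' @param arm_col Name of the column that holds the treatment arm.
#' @return A data frame with one row per arm and columns `arm` and `n`.
#' @examples
#' events <- data.frame(arm = c("treatment", "placebo", "treatment"))
#' count_events_by_arm(events)
#' @export
count_events_by_arm <- function(events, arm_col = "arm") {
  # table() tallies how many rows fall into each arm
  counts <- table(events[[arm_col]])
  data.frame(
    arm = names(counts),
    n = as.integer(counts),
    stringsAsFactors = FALSE
  )
}
```

Once the package is installed, a colleague could run ?count_events_by_arm to read the help page generated from those comments, and the same file shows them how the function works.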
[00:23:26] Travis: And I'm, it's, it's a good point about the, you know, reducing friction early on. I know JD Long gave a very cool talk in the 2019 RStudio conference, something like empathy and community practice or something like that was the title. Um, but, but he had this figure that stuck with me, which was like, what we really wanna do anytime we're writing code or, or building a community around programming is, is reduce the amount of time it takes for someone to start kicking ass.
[00:23:53] Um, and so that's exactly what Garrick was doing with, with those packages is like they could fire something up and like, just with, with a handful of helpers instead of like slogging through, ah, God, what do I gotta, you know, how do I connect and how do I,
[00:24:07] how do I set up the data frame and make these things happen?
[00:24:09] Like within an hour they could be like kicking ass, like, oh, this is it. Of course I should have been using R this whole time. Um, because this is, this is so, uh, streamlined and easy.
[00:24:19] David: Yeah. And I love that idea, you know, cuz I think a lot of organizations ha I mean, it's natural that they'll have some people who are say more advanced R users, whereas other people, you know, are just starting or, you know, have only been using it for a bit. And so something like a package allows you to like kind of leverage the skills of those who are, are further along to help those who are just starting out or newer to, to really kind of, you know, move, move more quickly than they would be able to on their own.
[00:24:47] I think that's such a, a good use of, of having a package. Um, One other thing I wanna ask about Garrick I, uh, was, this was last year I asked about, um, what were some of the main things that packages, uh, internal packages that you've seen do. Um, and you responded that, um, the package coerce your coworkers into following your opinionated best practices, uh, by making my version of the happy path as easy and automated as possible.
[00:25:17] Can you explain, explain yourself
[00:25:21] Garrick: Yeah, absolutely. Yes. This is my, definitely my favorite part about writing packages is that, uh, you, you get to define the workflow a little bit. And, um, and so there, so I'll, I'll talk about like the first part is if you're on your own and you are working in a group and you're like wanting it to go, uh, a certain way, like in this case, I was sort of, you know, sort of similar to the, like the lone data scientist, trying to bring other, you know, people along with them, um, you, in those cases you can be like, well, I really want this whole workflow to sort of work like this and you can write functions that make it happen. And, um, and then you sort of teach people how to use those functions.
[00:26:04] And, um, and the happy path is the easy path. So, you know, especially when, especially when there's a lot of value in that. So when you show people like, oh, this thing that you've been doing is taking a long time and, um, and you can do it like this and it'll be faster, but there's also that little bit of, of, um, behind that, where you have some, as the package author, you have some control over that, that workflow and what that workflow looks like. And so you can kind of, um, coerce people into doing things the way that you, you, uh, you feel is best, but the, so that, that's the sort of sneaky version, but the more, um, collaborative way of framing
[00:26:43] this is also that if you are in a situation where you, so packages can still be useful when you, even when you don't really know what to do, um, or you don't really, you don't have a defined workflow yet. So, um, an internal package can be a really great place to have the discussions about, well, how are we going to do this?
[00:27:03] So as you approach a difficult technical problem, like how are we going to get everybody connected to Snowflake, and how are we gonna make it possible for everyone to use our, our data warehouse, or what are the tables in the data warehouse going to look like, having, having a package as a place where you start defining what that workflow looks like gives you a place to sort of, uh, effectuate your design decisions and see what they look like and, um, and can really kind of bring the conversation together in a way that just sort of like talking through what the process looks like can be, uh, can be a lot more difficult. And um, and so like, I, I've seen internal packages be used in that way where, um, a group of people come together and say, we have to do this thing.
[00:27:48] We don't know how we're going to do it. We don't know how we're gonna teach other people how to do it. And, but we can kind of get together and talk about, well, okay, so this is how it should work. And then you start defining functions that lead people onto the happy path that you're designing. And, um, and then, uh, and then you educate them using the help materials that you've written for the package.
[00:28:06] And, um, and it really, it really brings together a lot of different, um, people and processes.
[00:28:13] Travis: You know,
[00:28:14] this, this notion of coercion, I've been thinking about that from package design as, as a user of, of a package lately. Um, and I'm very grateful for it. So, uh, as the, a lot of the field right now is moving back towards basics in the sense of like, we should all just be writing SQL. Um, and so I, I don't, I haven't used like a ton. Cool in the past. Um, but enough to like make me dangerous. And then when I, when I returned to a couple months ago, I realized that by, um, in the design of the deep PLI package, the, the Tidyverse developers chose design elements that more or less coerced me into learning sequel. So that now I'm very good at another language without even having known it.
[00:28:55] And of course they chose that as the happy path, because SQL is a very good language, and, you know, the structure of it and the syntax of it is very, very smart. Um, and so they chose the happy path. I learned the happy path without even knowing I was doing that. And later I go, oh my goodness.
[00:29:11] Like, it almost made me multilingual in a programming sense. Um, just unintentionally, which is really, really awesome.
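To make the dplyr and SQL connection concrete, here is a small sketch that prints the SQL corresponding to a dplyr pipeline. It relies on the dbplyr package, which was not mentioned in the episode and is an assumption here. The `visits` table and its columns are made up, and `simulate_postgres()` only stands in for a database connection, so no real database is needed.

```r
library(dplyr)
library(dbplyr)

# A lazy table with invented columns. It is never queried; it exists
# only so dbplyr can translate the pipeline below into SQL.
visits <- lazy_frame(
  site = "A",
  psa = 4.2,
  con = simulate_postgres()
)

query <- visits %>%
  filter(psa > 4) %>%                      # becomes a WHERE clause
  group_by(site) %>%                       # becomes GROUP BY
  summarise(
    n_visits = n(),                        # becomes COUNT(*)
    mean_psa = mean(psa, na.rm = TRUE)     # becomes AVG(...)
  )

# Print the generated SQL for the pipeline above.
show_query(query)
```

Reading the printed query next to the pipeline shows how closely the dplyr verbs track SQL clauses.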
[00:29:20] David: Yeah, that's great. Yeah. I mean, I was talking to Kyle Walker, developer of tidycensus, a couple weeks ago, and he was saying, you know, one of his motivations for making tidycensus was he wanted to get data programmatically from the Census Bureau, but also he wanted it to be in a tidy format, cuz he feels very strongly that's the best, um, way to work with data.
[00:29:41] And so, you know, his package, which he initially said he developed just for himself, but has since, you know, become quite popular, is opinionated insofar as it's not just for accessing census data, it's for accessing census data and bringing it in in a tidy format, which I think is a good example of kind of exactly what you're talking about.
[00:30:00] Not obviously in an internal package, but in a package used, you know, more broadly.
[00:30:05] Garrick: And, like, tidycensus is a great example, where that is a complicated data set. It is difficult to download the raw data set from the US Census and put it together in the appropriate way. And if you do something wrong in that process, you could end up with, uh, results that are incorrect or misleading.
[00:30:24] And by making sure that it's, uh, done in a way that, you know, follows the opinionated path, that makes it easier for the analyses that come out of it to be accurate. Uh, it also is easier to audit because it's in a package, and it's easier to supply fixes because it's in a package and the community can step in and say, wait, we should do this over here slightly differently, or whatever.
[00:30:50] David: Cool. Well, um, before we sign off, are there any resources that both of you might recommend for folks interested in learning about making their own packages?
[00:31:01] Garrick: Yeah, definitely. Um, I definitely have to jump in front of Travis here on this one and just say, um, R Packages by Hadley Wickham and Jenny Bryan is excellent. Um, it's a great introduction. It's a great resource both for introducing you to the idea and the concepts, uh, behind, uh, writing packages. Um, it's also a great resource afterwards when you need to go back and look and figure out what you're supposed to do. And, um, it's available for free online, and they're almost done updating to the next, uh, edition. So it's, uh, very up to date.
[00:31:39] Travis: Yeah, and same. Um, I use chapter two of that book most often, "The Whole Game." So it's this awesome, awesome chapter where they just go through, like, step one, step two, step three. And they don't really describe, like, why, like, that's what the rest of the book does, but they just say, if we wanna make a package, here in, like, 10 pages is exactly how you do it.
[00:31:58] And I follow that all the time. No matter how many times I've written a package, I still refer to that. Um, and, and make sure I'm not skipping a step somewhere along the
[00:32:05] David: Well, um, if folks do wanna kind of connect with you, see, uh, more about the work that you've done, what's the best place for them to do that?
[00:32:16] Garrick: So you can find me at garrickadenbuie.com, which is probably hard to spell. Or on Twitter, uh, my username is @grrrck.
[00:32:26] David: Travis, what about you? What's the best place for folks to connect with you if they want to?
[00:32:30] Travis: Oh, yeah. Um, on Twitter, I'm @travisgerke, and on LinkedIn, I'm Travis Gerke.
[00:32:37] David: Well, thank you both for chatting with me today, and I hope, um, this is helpful for folks interested in learning about packages. Um, yeah. Thanks again to both of you.
[00:32:48] Thanks again for listening. I hope you found this conversation interesting. If you have any feedback, I'd love to hear it: david@rfortherestofus.com. Thanks.