R for the Rest of Us Podcast Episode 8: Matt Herman

In this episode, I chat with Matt Herman about building websites in R. Matt shares lessons from his experience building a self-updating Covid-19 tracking site for Westchester County.

Matt is a Data Scientist at the Council of State Governments (CSG) Justice Center, where he focuses on research and policy analysis. Matt has created automated and reproducible workflows to generate outcome measures and performance indicators for several projects within the justice system.

Connect with Matt: Twitter (@buddyherms); Website: mattherman.info

Learn More

If you want to receive emails when we publish new podcast episodes, sign up for the R for the Rest of Us newsletter. And if you're ready to learn R, check out our courses.

Audio Version

Video Version

The video version has a code walkthrough of how to create a website in R.

Resources Discussed

Transcript

[00:00:00] David: Hi, I'm David Keyes and I run R for the Rest of Us. You may think of R as a tool for complex statistical analysis, but it's much more than that. From data visualization to efficient reporting, to improving your workflow - R can do it all. On this podcast, I talk with people about how they use R in unique and creative ways.

[00:00:18] Join me and learn how R can help you.

[00:00:22] I am delighted to be joined today by Matt Herman. Matt is a data scientist in the research division at the Council of State Governments (CSG) Justice Center. Matt, welcome. I'd love to have you start out by talking a bit about what the Council of State Governments Justice Center is,

[00:00:41] and what work you do there.

[00:00:44] Matt: Yeah, thanks so much for having me. So the Council of State Governments Justice Center, or CSG, as you might hear me call it, is a nonprofit organization. We do a bunch of different things, but what I'm focused on is research and policy analysis in the criminal justice field.

[00:01:03] We partner with state governments, across all three branches. A lot of what we do is analyze data from different parts of the criminal justice system (police departments, corrections agencies, courts) and, hopefully, turn that research into policy recommendations that can guide criminal justice reforms in state governments.

[00:01:30] David: Cool. And of course you use R in your work there. I'm curious, can you give an example of a project you've worked on recently and how R fits into it?

[00:01:43] Matt: Yeah. Our research division is about 14 people right now, and we're split roughly evenly between Stata users and R users, although we're trying to recruit more to the R side. One project we just wrapped up was based in Montana: an analysis of racial equity in Montana court decisions.

[00:02:12] For this project, we got lots of data: court records and records from the Montana Department of Corrections, exported from their case management system. The R process covered everything from ingesting the data to cleaning and wrangling it, doing some regression modeling, and creating charts, graphs, and tables for our final report and presentation.

[00:02:38] So as far as R goes, pretty much the whole analytic workflow, from getting the raw data to creating reports and presentations, happened more or less within R. The content of this project focused on disparities between white and Native American people in Montana court decision-making processes. One of the findings was that there were pretty consistent disparities between the Native American and white populations: Native American people were more likely to get sentences to incarceration for similar offenses compared to their white counterparts. We also did a few other analyses at different points in the criminal justice system.

[00:03:26] David: Great. Just out of curiosity, what does your reporting look like there? We're going to be talking in a few minutes about websites, and I'm guessing you're not building websites for this, so I'm curious what the reporting looks like.

[00:03:35] Matt: For this project, the deliverables were a couple of reports that we ended up writing in Word, using charts generated with ggplot, saved to PNG, and dropped into Word. There were also some PowerPoint presentations; we're a very PowerPoint-deck-heavy organization. And then, as a bonus deliverable, since there are 22 judicial districts in Montana, I ended up making a report for each judicial district.

[00:04:11] It was a short two-pager about data quality, because that was one of the elements of the project: we were looking at how often race information was collected at different courts, and it varied widely by court and judicial district. So I made an R Markdown template that I could use to generate these PDF outputs, one per judicial district.

[00:04:31] So that was a great workflow. R Markdown (and I know this from work I've seen you and your organization do) is such a powerful tool, because I couldn't make 22 of these by hand.

[00:04:43] David: Yeah. So I assume that was with parameterized reporting. Is that right?

[00:04:48] Matt: Yeah, it was basically one R Markdown template, and the only parameter that was required was the judicial district. Then it filtered the data and generated the charts and all that.
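To make the pattern Matt describes concrete, here is a minimal sketch of parameterized reporting with `rmarkdown::render()`. It is not Matt's actual code: the template name `district_report.Rmd`, the `district` parameter, the numbering of districts as 1 to 22, and the helper functions `report_filename()` and `render_district_reports()` are all hypothetical. The template is assumed to declare the parameter in its YAML header and to filter its data on `params$district`.

```r
# Sketch: render one PDF per judicial district from a single
# parameterized R Markdown template.
# Assumption (hypothetical): "district_report.Rmd" starts with a header like
#   ---
#   title: "Data quality report"
#   output: pdf_document
#   params:
#     district: 1
#   ---
# and filters its data with something like
#   subset(court_data, judicial_district == params$district)

# Build a predictable output file name for each district (vectorized).
report_filename <- function(district) {
  sprintf("district-%02d-data-quality.pdf", district)
}

render_district_reports <- function(template = "district_report.Rmd",
                                    districts = 1:22,
                                    output_dir = "reports") {
  dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)
  for (d in districts) {
    # Each call knits the same template with a different parameter value.
    rmarkdown::render(
      input = template,
      params = list(district = d),
      output_file = report_filename(d),
      output_dir = output_dir,
      envir = new.env()  # fresh environment so renders don't share objects
    )
  }
  invisible(file.path(output_dir, report_filename(districts)))
}
```

Calling `render_district_reports()` from the project folder would knit the template once per district. Passing `envir = new.env()` keeps objects created during one render from leaking into the next, which matters when the same template runs many times in one session.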

[00:05:01] David: Cool. All right, so you've talked a bit about how you use R now. Maybe you could take a step back and talk about how you initially got into R, what you switched from, and how it changed your work when you did start using R.

[00:05:16] Matt: I had a couple of tentative starts with R in and around grad school. I knew it was something I wanted to try, because I had seen really nice graphics and other cool stuff produced in R. But in my graduate program, a master's degree in sociology, none of the classes were in R specifically.

[00:05:38] Our stats classes were taught using Stata. I also took some GIS and geography classes, and those were all taught using ArcGIS and a little QGIS, but no R. I knew that R could do all of those things, and I knew I wanted to learn them. While I was in graduate school, I was interning for an organization called Measure of America, a nonprofit that uses a lot of interesting census and other public data to create maps and accessible reports about demographic and other social data.

[00:06:10] They were R users there. During that internship, I was really lucky to have the time and space to learn R without the pressure of needing to finish something, and I had resources and other folks there who could help me. So I really dove into it during that internship.

[00:06:29] During the rest of graduate school, I ended up doing all my assignments in R instead of Stata, because I enjoyed it more. I also had this idea that I was going to go into the government or nonprofit side of things, where they're less likely to be able to buy Stata licenses or whatever it is.

[00:06:50] So it seemed like the better path forward while I was already learning in that environment. A lot of it was through that experience at Measure of America, and then just on my own, with tons of blogs, Stack Overflow, and #rstats Twitter. I think the supportive R community, and all the content out there, a lot of which is tailored

[00:07:16] to new learners of R online, really got me excited about it.

[00:07:23] David: Great. And what were some of the major differences for you in terms of the work you were doing? Were there things you were able to do, or do differently, when you moved to R from Stata or ArcGIS or whatever other tools you were using?

[00:07:39] Matt: In graduate school, I was able to do comparable things in terms of running regressions, t-tests, and all the standard statistics-class stuff you would learn. But where I got excited about it at the beginning was the visualization and the mapping.

[00:07:59] Certainly ggplot2: being able to make beautiful plots. The plotting is so much better in R compared to Stata or SPSS or some of these other tools that a lot of folks use. And interactive mapping got so easy. I initially got into making maps in R using the tmap package.

[00:08:24] There's a really cool feature there where you can specify your map and then, with one switch, change it from a static map to an interactive map; it translates it into Leaflet. I thought that was so cool, because in ArcGIS you can sort of do one or the other, but you can't really do both.

[00:08:42] I really got excited about that. I was like, oh wow, with this same syntax and flipping a switch, now it's this neat interactive map I can cruise around in. So that got me excited about it.
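A minimal sketch of the tmap "switch" Matt mentions, using the `World` example data that ships with tmap: the same map object is drawn as a static plot or as an interactive Leaflet map depending on `tmap_mode()`. The `HPI` column and the choice of `tm_polygons()` are illustrative, and exact argument names differ somewhat between tmap versions.

```r
# Sketch: one map specification, two output modes.
library(tmap)
data("World")  # example country polygons bundled with tmap

# Specify the map once: country polygons filled by the HPI column.
world_map <- tm_shape(World) +
  tm_polygons("HPI")

tmap_mode("plot")  # static map
print(world_map)

tmap_mode("view")  # same object, now rendered as an interactive Leaflet map
print(world_map)
```

The map specification never changes; only the mode does. tmap also provides `ttm()`, which toggles between the two modes.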

[00:08:54] David: Yeah. I think it's interesting how, for a lot of people, even if their work isn't hugely data viz focused, the visual side of things is such a draw when they start out. I know for me, that's why I got into it. I had seen people make amazing visualizations, I saw that they were doing them with R, and I was like, I want to learn how to do that.

[00:09:17] And then once I did that, I realized there are all these other things. The parameterized reporting you described before is a great example: things you can do in R that you don't even think about until you're in it.

[00:09:29] Matt: Absolutely. Another big thing was accessing census data programmatically through an API. At Measure of America, and in graduate school too, I used a lot of census data for my work and research. At that time, American FactFinder still existed; that was the way to get information from the U.S. Census Bureau.

[00:09:53] It was a really hard-to-navigate website, and it's been replaced by data.census.gov, which is a different hard-to-navigate website. I originally found out about an R package called acs for accessing American Community Survey data, and then it was sort of superseded by Kyle Walker's tidycensus. I got really into that and excited about it.

[00:10:16] I think that pushed me further into figuring out how R worked, and I got into package development a little. I started fixing little bugs in tidycensus and responding to GitHub issues there. I just got excited about the workflow and was trying to tell everyone about it.

[00:10:40] My classmates were struggling with these downloads from the Census Bureau, and I was like, there's a better way. So that really drove me deeper into it.

[00:10:51] Yeah.
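For readers curious what the programmatic approach looks like, here is a minimal tidycensus sketch that requests tract-level American Community Survey estimates for Westchester County. The variable (`B19013_001`, median household income), the default year, and the wrapper name `get_westchester_income()` are illustrative assumptions, and a free Census API key is required.

```r
# Sketch: pull ACS estimates through the Census API instead of
# downloading files by hand.
library(tidycensus)

# Run once per machine with your own key (assumption: you have one).
# census_api_key("YOUR_KEY_HERE", install = TRUE)

# Hypothetical wrapper: median household income for every census tract
# in Westchester County, NY, from the 5-year ACS.
get_westchester_income <- function(year = 2020) {
  get_acs(
    geography = "tract",
    variables = c(median_income = "B19013_001"),  # name renames the variable
    state = "NY",
    county = "Westchester",
    year = year,
    survey = "acs5"
  )
}
```

Calling `income <- get_westchester_income()` should return a tibble with one row per tract and the columns `GEOID`, `NAME`, `variable`, `estimate`, and `moe`, ready for wrangling or mapping.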

[00:10:53] David: Yeah, that's great.

[00:10:54] So let's talk about a website you made looking at all sorts of COVID-related data for Westchester County, New York, which is where you still are,

[00:11:08] right, for the next little bit?

[00:11:10] Matt: Yep, that's right. I'm in Westchester County. It's a northern suburb of New York City, and I've been here for about two years.

[00:11:19] David: Great. Before we dive into the website itself, where did the idea for making a website with COVID data for Westchester County come from?

[00:11:35] Matt: Well, my family and I moved here to Westchester County in the summer of 2020, so that first summer of COVID. Before that I had been in Brooklyn, and New York City, the health department in particular, had done a really good job of releasing lots of data about COVID case rates, death rates, and all sorts of stuff, and had started releasing data by ZIP code.

[00:12:01] So being in New York, I had a pretty good sense of what was going on in the pandemic data world for where I was, and I used that information to make decisions about what we were going to do and where we were going to go. Then when I came to Westchester, just one county north of New York City, there was pretty much nothing.

[00:12:21] The Westchester County Department of Health had a Twitter feed that, once a day, would tweet an image of a map of Westchester with the total cases by town, plus a table that was clearly copied and pasted from Excel and mashed into the image.

[00:12:44] I can totally imagine how someone made this and why they made it. I think this is true across the country: these smaller departments of health were never prepared to do this level of analysis. So I decided I wanted to try to build something better, if the data was available.

[00:13:02] So I scoured the different resources: the Westchester County data on COVID rates by town or municipality, the reporting the state of New York did, and the New York Times, which did a lot of really good data collection. The initial reason I made it was that I just wanted to know what was going on with COVID where I was living.

[00:13:24] Then it turned into a little bit more, because around the same time I was starting to look and apply for jobs. I knew that if I made this website with all my code on GitHub for people to look at, it would be a useful portfolio piece where other folks could see what I could make and the type of code I wrote.

[00:13:50] David: That makes sense. So talk about what advantages making a website like this in R offers. I think if someone's not in the R world, they think, oh, I'm going to make a website, I'll use WordPress or Squarespace. Why make something like this in R? What advantages does that offer?

[00:14:12] Matt: The big thing for me is that I don't really know HTML, CSS, and JavaScript that well, so I couldn't even make a website from scratch. But the tools in R for building websites, or generating HTML output from familiar R code, are so rich and powerful. You can make standard single-page HTML reports from R Markdown, or you can make multipage websites.

[00:14:44] That's what I ended up building. As a pretty strong R user, I was doing all of my data collection, manipulation, wrangling, and visualization in R, so I could also just stay in R to create the website, and to create things I couldn't make myself in HTML or JavaScript. A lot of the interactive plotting libraries are really good examples of that.

[00:15:07] There are R packages that wrap JavaScript libraries, like Leaflet for interactive mapping or Plotly for other interactive visualizations. I don't really know JavaScript, but I know R, so I can write R code and use these R packages to write the JavaScript for me that I couldn't write myself.

[00:15:31] So in a way, it's just how I had to do it if this was the output I wanted to make.

[00:15:42] David: Yeah,

[00:15:42] that makes sense. I think the other thing about the website you've built is that it combines all of the code that brings in the data (scraping it or gathering it), does the analysis, builds the visualizations, and then puts them

[00:16:01] on the website. So you don't have to do those steps separately and then copy your outputs to a website somewhere; every piece of how this website works is integrated in one place, which makes it really efficient. Your website updates itself automatically, bringing in new data, I think every day.

[00:16:24] And it seems like that kind of thing is possible with R, whereas if you were cobbling it together with multiple tools, it might not be.

[00:16:34] Matt: I think so. And you don't need any sort of server hosting your data. The hosting side of it is really nice too, because I think all of the options for creating HTML output from R generate static HTML.

[00:16:55] So everything needed for the website ends up as static files, which makes it really easy to deploy or host. The Westchester COVID site we're talking about is hosted on GitHub Pages. So that's another piece: not only is all the code for gathering and cleaning the data, creating the visualizations, and building the website in one repository,

[00:17:17] it's also hosted in the same place. So it's really self-contained, this tight little package that, for me as not a web developer, is really easy to handle.

[00:17:30] David: Yeah. So for this website you used the distill package. There are multiple approaches if you want to make a website from R: there's blogdown, and probably some others I'm not thinking of. There are different ways to make websites or HTML documents, even bookdown.

[00:17:53] For example, I was giving a presentation the other day on bookdown, and I was thinking there's no reason why you can't make a website with bookdown. That's all it is, just a series of HTML files. Anyway, I'm curious why you opted for distill to make your website versus any other package.

[00:18:10] Matt: When I started designing it, this was fall 2020, distill as a framework had some really nice features and flexibility that I liked. Some of the really simple things on the website, like having little bits of text in the margin of a web page, are built in; that's a CSS class built into distill. The same goes for the way you can size or resize images, and other layout-type stuff that you certainly could do manually in CSS and HTML, but a lot of that was built into distill already.

[00:18:53] So it seemed like a pretty good framework that could do enough of the formatting I wanted while staying within distill. At this point, almost two years later, I might consider building it in Quarto (I guess that's how it's pronounced), this new multilingual product from RStudio. It has some of those same HTML layout features from distill, like the image sizing, the page sizing, and the text callouts, plus a bunch of other nice stuff. So I might consider using that as a platform.

[00:19:37] David: Yeah, that makes sense. I've always personally gone with distill. I've never actually done anything with blogdown, which is the main other one I hear people talk about, but I hear people talk about trying to update something and breaking their site, that kind of thing.

[00:19:53] And distill really seems much more straightforward.

[00:19:57] Matt: I think so. For me, it was easy to understand how it worked: each page is its own R Markdown file that gets rendered to an HTML file, and a little bit of YAML configuration tells it where to put everything. Other than that, like you said, it seems pretty straightforward and simple, which, again, as not a web developer, I really like.
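A minimal sketch of the distill setup Matt describes, where each page is an R Markdown file and `_site.yml` holds the site configuration. The folder name and title are hypothetical; `gh_pages = TRUE` configures the site for publishing with GitHub Pages (output goes to a `docs/` folder), which is how the Westchester site is hosted. Rendering requires pandoc, which ships with RStudio.

```r
# Sketch: scaffold and build a distill website.
library(distill)

site_dir <- "covid-site"  # hypothetical project folder

# Creates _site.yml plus starter pages (index.Rmd, about.Rmd).
# gh_pages = TRUE sets the site up for GitHub Pages; edit = FALSE avoids
# opening the new files in an editor.
create_website(dir = site_dir, title = "Westchester COVID-19",
               gh_pages = TRUE, edit = FALSE)

# After adding or editing .Rmd pages, rebuild every page of the site.
rmarkdown::render_site(site_dir)
```

New pages are just additional `.Rmd` files in the folder, linked from the `navbar` section of `_site.yml`.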

[00:20:20] David: Yeah, that makes sense. All right, cool. Well, this has been great. Matt, thanks for joining me today.

[00:20:25] There were a lot of things about your website that I didn't quite get that I actually get now.

[00:20:31] So that's awesome. I know you kept saying, oh, this is relatively simple, but it's actually pretty complex.

[00:20:42] Matt: Yeah, no, it totally is. But it all builds. It starts simple, right? Whether I was first building it or just building one page, the workflow is always the same, which I like: get the data from somewhere, clean it up, save it, and then use that data to make something else.

[00:21:02] That's the classic R Markdown workflow.

[00:21:07] David: Yeah. That's what I always like. When I work with organizations, people have to understand the basics first, but once they understand the basics, I'm always telling them to separate out the data importing and cleaning step. Do that in a data-raw folder or whatever, and then write your clean data to a data folder as RDS or CSV or whatever.

[00:21:33] And in your R Markdown, only reference the data folder.
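A minimal base-R sketch of the two-step workflow David describes: a script in `data-raw/` imports and cleans the data and writes an RDS file to `data/`, and the R Markdown document only reads from `data/`. The column names, the `clean_cases()` helper, and the toy values are hypothetical, and a temporary directory stands in for the project folder so the example runs anywhere.

```r
# --- Step 1: a script in data-raw/ imports and cleans the raw data ---------
clean_cases <- function(raw) {
  raw$date <- as.Date(raw$date)        # parse dates once, up front
  raw <- raw[!is.na(raw$cases), ]      # drop rows with missing counts
  raw[order(raw$municipality, raw$date), ]
}

project_dir <- file.path(tempdir(), "covid-project")  # stand-in project
dir.create(file.path(project_dir, "data"), recursive = TRUE,
           showWarnings = FALSE)

# In practice this would be read from data-raw/, e.g.
# raw <- read.csv(file.path("data-raw", "cases.csv"))  # hypothetical file
raw <- data.frame(
  municipality = c("town_b", "town_b", "town_a"),
  date = c("2021-01-02", "2021-01-01", "2021-01-01"),
  cases = c(12, NA, 3)
)
saveRDS(clean_cases(raw), file.path(project_dir, "data", "cases.rds"))

# --- Step 2: the R Markdown file only ever reads from data/ -----------------
cases <- readRDS(file.path(project_dir, "data", "cases.rds"))
cases
```

Saving to RDS rather than CSV keeps column types such as `Date` intact, so the report does not have to re-parse anything; CSV is the better choice when the clean data also needs to be shared outside R.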

[00:21:38] Matt: I think your book is going to be great, because in terms of what I use R for, it's very little statistics, almost none; at most some regression.

[00:21:49] David: Right, exactly. And I think as R becomes more widely adopted, there are a lot of people like you, or like me, who don't use it for any kind of complex statistical modeling, and it can offer a ton for them. People just need to understand what it can do.

[00:22:06] So hopefully my book will show them that; that's my goal. All right, cool. If people want to connect with you, where can they find you?

[00:22:16] Matt: I have a very limited presence on Twitter; buddyherms is my Twitter handle. And I have a very out-of-date personal website that is built, I believe, using distill, which I should update. I will.

[00:22:35] David: Well, good. Thanks again, Matt, for joining me. I appreciate it.

[00:22:38] Thanks again for listening. I hope you found this conversation interesting. If you have any feedback, I'd love to hear it at david@rfortherestofus.com. Thanks.

Sign up for the newsletter

Get blog posts like this delivered straight to your inbox.


By David Keyes
April 25, 2023
