R for the Rest of Us Podcast Episode 25: Robert Smith
In this episode, I chat with Robert Smith, a health economist and co-founder of Dark Peak Analytics. Rob shares his unique career path from academia to advising the UK government during the COVID-19 pandemic, where he helped shape public health decisions through data modeling.
We also discuss his work with Parkrun, a free community running event, and how data science can be used to promote equitable access to fitness. Using R, Rob and his team analyzed participation trends and developed models to help Parkrun expand into underserved areas.
Listen to the Audio Version
Watch the Video Version
You can also watch the conversation on YouTube.
Important resources mentioned:
Academic Papers:
https://wellcomeopenresearch.org/articles/5-9
https://www.sciencedirect.com/science/article/pii/S0033350620304066
https://www.sciencedirect.com/science/article/pii/S1353829221001222
https://link.springer.com/article/10.1186/s12889-024-20420-0
https://www.medrxiv.org/content/10.1101/19004143v1.full.pdf
Connect with Robert on:
LinkedIn: Robert Smith
Learn More
If you want to receive emails to help you on your R Journey, sign up for the R for the Rest of Us newsletter.
If you're ready to learn R, check out our courses.
Transcript
[00:00:25] David Keyes: In this episode, I sit down with Robert Smith, co-founder of Dark Peak Analytics. We talk about his journey from academia to advising the UK government during the COVID-19 pandemic. Then we move into talking about Parkrun, which is a community effort to encourage more people in the UK to be active.
Rob worked with the folks at Parkrun to help promote equitable access, and he used R to do this. It's a fascinating story, so let's dive in.
[00:00:55] David Keyes: I'm delighted to be joined today by Robert Smith. Rob is one of the co-founders of Dark Peak Analytics, a small consulting company that specializes in data science for health economics. Rob is a health economist based in Sheffield in the UK. His research focuses on the methods used to estimate the costs and benefits of public health interventions, with a specific interest in microsimulation modeling in R.
Rob, welcome and thanks for joining me.
[00:01:24] Robert Smith: Thanks for having me. It's great to be here.
[00:01:25] David Keyes: So that's, that's the kind of, you know, two sentence overview of who you are. I wonder if you could flesh that out a little bit and give a little bit about your background and the kind of work that you do.
[00:01:37] Robert Smith: Yeah, sure. So I, um, I actually started out, uh, did my undergraduate in economics, was going to go down the kind of the typical, I guess, sort of typical finance route for someone who's done a degree in economics, um, but got very interested in, in health. Uh, my wife's a family doctor, a GP. Uh, so I got very interested in that.
So I made a transition over from finance to, uh, some health services research, health economics, trying to improve the way in which we make, uh, commissioning decisions. And in the UK, a lot of that's through the state-funded, um, healthcare service, the NHS. Um, and so I did a PhD focused on public health economics, which is essentially how do we make decisions around what interventions, uh, what products, what, uh, diagnostics to fund. So I spent four years doing a PhD on that topic.
And during that time, the pandemic hit. So I took a break, uh, for, well, no, it's not so much a break, a very intense, uh, two-year period working for the UK government, advising on pandemic policy. Um, I suppose I was doing a PhD on kind of mathematical modeling of public health interventions at a time when that was kind of the big question.
Um, so I spent two years doing that and then returned to finish my PhD and, uh, work full time then on Dark Peak Analytics. And sorry, go on.
[00:03:13] David Keyes: I was just going to say, I wonder if, I mean, I didn't actually put together that you worked for the UK government during COVID. I wonder if you could just talk briefly about the work that you did, what it looked like, because that must have been, you know, a really, um, interesting time to be doing the type of work that you do.
[00:03:33] Robert Smith: Yeah, so in terms of, uh, experience of policymaking at pace, it was incredible for someone from academia, where often things can be a little bit on the slower side. Like, you're having an impact, but that impact might be 5, 10 years down the line, and you're gradually shifting the dial on certain topics and discussions.
And you may have to wait for kind of your window to open politically for your topic to be the big thing. Um, and often that might take 10, 15 years. Um, with the pandemic, this was the question. And so the work that we were doing within the UK Health Security Agency, where I was based, uh, was feeding directly through that day.
Um, so some of the modeling work we were doing was feeding through that day into Number 10 and into slide packs to the Prime Minister and, uh, Secretary of State, um, Treasury, um, all within 24 hours of the work being done. Uh, which, you know, you can imagine, if my area of research before was modeling non-communicable disease, um, if you try and have, you know, time with the Prime Minister to talk about physical activity, it's not going to happen.
Um, so, yeah, so it was incredible from that perspective and also to see the different competing interests in Whitehall. Um, and just kind of the political machine at work was, was very interesting. And, um,
[00:04:58] David Keyes: Sorry, what's
[00:04:59] Robert Smith: So Whitehall is kind of the description of, I guess, the equivalent of maybe Capitol Hill and, you know, the, uh, the kind of area in which policy is being made.
Uh, so in London, that's Whitehall. Um, and so understanding how the relationships between, say, the Treasury and the Department for Health and Number 10 and the Cabinet Office all fit together, uh, as an academic coming in with previous experience in, um, kind of consulting and academia.
I'd never sort of seen that. And so really understanding how, um, even really small things like, um, who gets to sit at the table with the Prime Minister. Um, and there's a list, there's a list that's drawn up and just getting onto that list, even if you have the best piece of analysis, the best piece of work, you might not be in the room and there's people guarding that.
Um, yeah, it's very, very interesting. I found it intriguing.
[00:05:57] David Keyes: I wonder, to the degree, I don't know what you're able and not able to talk about, but to the degree that you're able, if there's an example you could give of some, you know, type of analysis or some work that you did that, you know, ultimately had an impact at that level.
[00:06:13] Robert Smith: So I think we had a big inquiry in the UK, uh, which took a year and a half, two years, I think, to take place, and it means a lot of stuff's out in the open now, which is great. Um, we, uh, built models internally. So there were models that were commissioned out to, or modeling work that was commissioned out to, the universities.
And in the UK, um, the, the kind of big three that were impacting policy were Imperial, Warwick, um, and, uh, London School of Hygiene and Tropical Medicine. Um, and they were producing analyses that, um, that were fit to the, the data that was coming in. Um, very complicated models, took a long, long time to run.
Um, and then reports had to be written up and so on. And so there was obviously a lag between when the question was posed to when, um, decisions could be made. Um, and understanding those very complicated models was difficult, and conveying them to politicians who might not have studied mathematics since they were, you know, 18, 19 years old, and suddenly they're sat in a room with a professor of mathematical biology who's trying to explain what a SER model is.
It's very tricky. So we built, uh, essentially a simplified version of the model, with a user interface, which allowed those stakeholders, so politicians, key figures, um, advisors from Number 10, to play around with a simplified version of the model and kind of understand the key drivers.
Um, and the idea wasn't that it was a really accurate predictive tool. Um, I mean, I think we overall discovered that most of the models were kind of limited in how accurate as such they could be, but they at least enabled us to articulate the key drivers of what was going to influence, um, outcomes.
Be that, you know, if mixing was too high, then you'd expect a higher peak, or so on. Obviously, this is still quite a controversial, um, area of research and, you know, everyone has their own opinions on things, but nobody wanted, uh, for example, hospital admissions to rise above a level that the healthcare service could cope with. And so there was a lot of modeling done around: how do we keep occupancy below a certain level? And what are the thresholds that would be acceptable?
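To make the "key drivers" point concrete, here is a minimal sketch of the kind of relationship Rob describes: more mixing produces a higher peak. It is not the government model. It is a deliberately tiny compartmental (SIR-style) simulation, written in Python rather than the R used by Rob's team, and every name and parameter value (`peak_infected`, `contact_rate`, `recovery_rate`, the starting share infected) is an assumption chosen purely for illustration.

```python
def peak_infected(contact_rate, recovery_rate=0.2, days=300, initial_infected=0.001):
    """Run a daily-step SIR-style model on population shares.

    Returns the largest share of the population infected on any one day.
    All parameter values are illustrative assumptions, not fitted estimates.
    """
    susceptible = 1.0 - initial_infected
    infected = initial_infected
    peak = infected
    for _ in range(days):
        # New infections scale with how much the two groups mix.
        new_infections = contact_rate * susceptible * infected
        recoveries = recovery_rate * infected
        susceptible -= new_infections
        infected += new_infections - recoveries
        peak = max(peak, infected)
    return peak


# Compare a lower-mixing and a higher-mixing scenario.
low_mixing = peak_infected(contact_rate=0.3)
high_mixing = peak_infected(contact_rate=0.5)
print(f"peak share infected, low mixing:  {low_mixing:.3f}")
print(f"peak share infected, high mixing: {high_mixing:.3f}")
```

A tool like the one described in the episode would let a stakeholder move a slider for something like `contact_rate` and compare the resulting peak against a capacity threshold; this sketch only shows the direction of the effect.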
[00:08:42] David Keyes: Wow. That's fascinating. I mean, I'm just thinking, you know, the, like you said, you were, you were in academia and going from that with, you know, the huge time horizons to doing that type of work where, you know, the impact was going to be basically immediate. It must have been, must have given you whiplash to, to make that
[00:09:03] Robert Smith: I think a lot of the, uh, the academics were quite shocked at the pace change. Um, but then also really invigorated by the fact of having immediate impact. Um, and it's a real chance and, I mean, a lot of people sort of made careers out of it, um, which is fantastic. You saw some really good researchers who really stepped up and, you know, published a ton of really high impact work in the space of six to nine months, and they're now professors at, you know, top universities. So, um, yeah, fantastic in some ways.
[00:09:33] David Keyes: So you kind of parlayed your experience then, uh, in the pandemic and started, uh, Dark Peak Analytics. I'm wondering if you could kind of give me the backstory for how you decided to start it.
Tell me a bit about the, the typical type of work you do and how it's kind of changed over the years since you began it.
[00:09:56] Robert Smith: Yeah, so it started, um, very organically. We actually, we were both PhD students at the time, uh, myself and, uh, Paul, and we, uh, got asked to help on a project where the team was struggling to build, uh, this model in Excel. Um, and so we got asked to help, um, kind of advise them: actually, Excel's not really quite the right tool for what you're trying to do. Um, and, uh, and so we said, yeah, we can do this in R, and we were naive PhD students and, like, gave them an outrageously low quote, and they smiled and said, okay, fine.
And then we spent a really, really long time building something very big and complicated. Um, but that led to loads of other work and it kind of naturally progressed from there.
Um, and I think in our industry, um, uh, health economics, there's been a reliance historically on Excel because typically the models that were built were quite simple. Um, and I'm going to say the word model a lot, um, but when we're talking models, we're talking not statistical models of relationships between data, but, um, simulation models that try and estimate potential costs and health outcomes associated with different strategies.
Um, and because of the way that the, um, the kind of history of health economics has panned out, typically these were very simple. So they were just, what do we expect the cost to be over the next, say, 10, 20 years, or the patient's lifetime? And how do we expect their health to be affected?
Um, and they started out very, very simple, almost a kind of decision tree. You know, if we do X, then we expect this to happen. If we do Y, this is the total. Um, and gradually over time, the complexity has increased to the point that now we're running probabilistic analysis, um, running individual level patient simulations, um, agent based models, um, the type of models that were used throughout the pandemic that take days to run.
Um, but there's still this over-reliance on Excel in the industry. And so, um, our company and an academic group that I'm still part of have an objective of trying to shift the industry away from Excel, which we don't think is really fit anymore for the types of models that we're building, towards R.
And to be honest, we're not wed to R. Um, you know, if people shifted to Python, that would be great. Um, you know, any programming language. The main thing is that Excel just isn't, um, functional for what we're trying to achieve anymore.
Um, and so we've, we've really specialized in, in building models that, uh, don't fit into the standard templates that a lot of the big consulting companies have.
Um, so they're not at the simpler end, uh, they're a bit more complicated. They might require a microsimulation model. Or, um, you know, they're expected to be run for lots of different countries, and so being able to build a Shiny app and just flick between countries and things is really useful.
Um, and another case is where we anticipate having to rerun and update the models a large number of times. And we saw this a lot again during the pandemic, with data being updated daily. Um, and it's not really the case in our industry that we need to do them daily, but if you have some analysis that you want to run every month or every three months.
Um, having to kind of do a lot of copying and pasting into reports is obviously not ideal. Um, so there's a lot of trying to, to transition the industry towards R because of the models themselves, but then also the tertiary benefits. So, uh, things like being able to automate reports and, and have user interfaces that people can log onto anywhere in the world.
Um, and so we're kind of specializing in building apps with Shiny and, uh, using Quarto or R Markdown to automate reports.
[00:13:54] David Keyes: Yeah. It's funny. You mentioned, you know, you're like, Oh, this industry is all Excel and we're trying to move away from it. So many people I've talked to say that exact same thing. They're like, Oh, well, my industry is Excel. And, you know, we're really trying to get people to move away. And I'm not saying that this is what you are implying, but I think a lot of people assume, Oh, every other industry is way more advanced, and they're using, you know, R or Python or whatever, but we're still stuck in Excel. And I think the reality is, pretty much everybody is still stuck in Excel, and it's kind of rare, you know, the places that have actually moved to R or Python are actually the exceptions rather than the rule.
[00:14:33] Robert Smith: Yeah, we saw that in government as well. I mean, um, there were big steps taken during the pandemic again to, uh, to make that shift. And I think people are recognizing it. And there's a few drivers, which I'm sure we'll talk about, uh, later on. But, I mean, um, obviously the, what do you call it, the AI hype or the AI reality, um, that we've seen, that is certainly, I think, contributing a lot to shifting people towards script based analysis and modeling, because it's just way easier to get an LLM to advise how to update or add some functionality to a model in R than it is in Excel.
Um, so that's definitely helping, and also it's reducing the barrier, because people that are completely new to it can ask ChatGPT, how do I, you know, do this small piece of analysis, and often it'll get them 90 percent of the way there. So, um, certainly bringing down barriers.
[00:15:28] David Keyes: Yeah. I wonder if you could talk about, give an example of, you know, the types of projects or clients that you work with, um, because I get it at a high level, but I'm curious if you could give an example to help, you know, make it more concrete.
[00:15:43] Robert Smith: Yeah, so one of the earlier, uh, projects we worked on, uh, with another company was for the WHO, looking at the long term costs and consequences of FGM, so female genital mutilation, and looking at, um, the implications of eradicating the practice in various countries around the world. Um, and so that's kind of, uh, a bit of an abnormal project, but kind of an interesting one, which illustrates, you know, the potential long term costs and then therefore the benefits of eradicating, um, uh, something. Um, other projects include, uh, physical activity policy.
So, if you were, say, going to build, uh, a new cycle lane, uh, you might want to know, so what's the impact on, um, on commuter times, which is important. Uh, what's the impact on carbon emissions? Um, but you also want to know, if your population then becomes more active, there's a benefit to that in terms of their long term health, and that needs to have some value associated with it as well.
So, um, so there's a lot of different considerations on the public health perspective. Um, but, yeah, a huge chunk of the work, certainly in the UK, for health economists is looking at pharmaceutical interventions. Um, and that's kind of a massive industry in itself, which is: if you have a new pharmaceutical product, how do you get access to certain markets?
How do you get that into the NHS? How do you get that into, um, you know, the Dutch, uh, healthcare system or the, um, not so much the US, because you have a different system. But, um, certainly in the UK, uh, you have to show not only is the drug effective, um, compared to comparators, but is it cost effective?
So per pound spent, uh, per incremental pound, so additional pounds spent on this, uh, drug versus another one, um, what is the additional health gain? So does it provide kind of bang for buck? Um, and so that's a lot of the health economic work that's going on, um, is looking at exactly that. So running simulations of patients over time and saying how much extra health does this, uh, intervention provide.
Um, and it's not always a pharmaceutical product. Sometimes it's, uh, a sort of diagnostic or a different strategy, different treatment pathway. Um, but, um, understanding what the impact of that is on health in the long term, uh, that enables you to compare on the one hand the additional costs, the extra cost, um, and on the other hand the additional health benefits associated with it.
Um, and best case scenario, it's cheaper and produces more health. Then it's a no brainer, right? There's no real decision to be made as such, it's very simple. But quite often, uh, as you can imagine, the new product is more expensive, um, but provides a better quality of life or a longer duration of life.
And so we need to run these simulations and understand that.
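The comparison Rob describes, extra cost set against extra health, can be sketched in a few lines. This is an illustrative Python example (not R, and not Dark Peak Analytics' code); the function name `incremental_comparison`, the verdict labels, and all the numbers are made up. Health is measured in unspecified "health units" here; health economists commonly use quality-adjusted life years, though the episode does not name a unit.

```python
def incremental_comparison(cost_new, health_new, cost_old, health_old):
    """Compare a new option with a comparator on cost and health.

    Returns a verdict plus, where a trade-off exists, the extra cost
    per extra unit of health (the incremental cost-effectiveness ratio).
    """
    extra_cost = cost_new - cost_old
    extra_health = health_new - health_old

    # Cheaper (or equal) and at least as healthy: the "no brainer" case.
    if extra_cost <= 0 and extra_health >= 0:
        return {"verdict": "dominant", "icer": None}
    # More costly (or equal) and no healthier: never worth choosing.
    if extra_cost >= 0 and extra_health <= 0:
        return {"verdict": "dominated", "icer": None}
    # Otherwise there is a genuine trade-off to weigh up.
    return {"verdict": "trade-off", "icer": extra_cost / extra_health}


# Hypothetical per-patient lifetime totals from a simulation.
result = incremental_comparison(
    cost_new=12_000, health_new=6.5,
    cost_old=8_000, health_old=6.0,
)
print(result)
```

In this made-up case the new option costs 4,000 more and yields 0.5 more health units, so the ratio is 8,000 per unit of health gained. Whether that counts as good value depends on a threshold the decision maker sets, which is outside the scope of this sketch.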
[00:18:43] David Keyes: Makes sense. Um, well, going back to what you were talking about a minute ago, which was, you know, physical health, you gave the example of, um, you know, a bike lane, um, being put in. Um, I wanted to talk to you as well about a different project that you were involved in, um, called Parkrun.
Um, and you worked with this organization to kind of develop some new project sites for it to help it kind of expand. So I want to talk about what that looked like. But before we do that, I wonder if you could just give me kind of the overview of what Parkrun is and what your initial involvement with the organization was.
[00:19:22] Robert Smith: Yeah, so Parkrun, um, was, is a weekly, timed 5k run that takes place in, uh, local parks around the world. Uh, it was set up in 2004 in Bushy Park, which is in, in central London. Um, and, uh, and took a while to get going. So from 2004 to 2010, I think there were maybe two or three locations. Uh, and essentially all you have to do is go online, um, at, I think, parkrun.com and, uh, sign up, get a barcode, turn up on a Saturday morning at 9 a.m.
Um, it's completely free. You turn up, you do your run with everybody else, and often at these events there's kind of 300, 400 people on a morning, so they're not massive, but they're also not tiny. Um, and at the end of the run, you scan your barcode, and then you get given your time.
And that's all set up kind of on an automated basis now. Um, and so in the UK, this, this, the first one started in, in 2004, essentially as a group of friends who wanted to go and time themselves on a course. And loads of people decided, you know, this is a great idea. We want to come join you. Um, so more and more sites were, uh, were introduced.
And I think now we have about 700 in the UK. I think in the US, um, there's maybe 50 to 100 locations. I'm not exactly sure. Um, and, uh, and it's kind of spread worldwide, so the countries with the most parkruns are, um, generally kind of Australia, New Zealand, um, South Africa, Poland, uh, France, so all over, uh, Europe.
Um, so it's been incredibly popular, um, and it kind of just grew organically. There was no involvement from the government or, you know, no real, like, kind of subsidies or provision of bursaries or anything like that. It just grew organically. It's free to do. Um, you know, it didn't require a huge amount of resource.
Um, but over time, um, Sport England in the UK realized that they could probably provide some boost in funding, uh, to try and reach, uh, or create new runs in, places that were kind of harder to reach or weren't, um, starting them up organically, to try and reduce, um, the inequality between different areas.
Um, and so they provided, uh, 3 million pounds of funding, um, back in 2018, and did that on the basis of trying to reduce these, uh, inequalities. Um, essentially what we'd found, and I can kind of go through some charts and stuff to show it in a second, is that the people who are doing these runs tend to come from less deprived communities.
Um, and despite the fact that the events are quite, uh, accessible for everybody, uh, there's certainly a lot higher participation rates from areas with lower socioeconomic deprivation. Um, and so there was kind of a public health initiative to try and, uh, reduce this inequality. And so, um, Paul and I were doing our PhDs at the time and, uh, as a kind of pro bono work with Parkrun, uh, we were able to identify where some new events should be placed to reduce these inequalities the most or to increase the number of runs the most.
[00:23:02] David Keyes: Yeah, that makes sense. So we'll dive into that in just a second. I'm curious, was your initial involvement as a researcher or as a participant in Parkrun?
[00:23:14] Robert Smith: I actually helped, uh, set up one of the, the first events. So the, the, maybe the, the fifth event, I think, was, was based in Nottingham. And we actually went and, and helped set it up. I was, uh, I was organizing, running the, the university athletics team, um, at that time. And, uh, and so we went, they asked the, the university team to come on and do a pilot event.
And, uh, and actually we managed to, to finish the event coming into the finish line from different directions. So it was a complete failure of a, of a pilot event. So they, they set us all off running and then we ended up somehow going different ways and ended up coming into the finish line from different, different directions.
So, yeah, so
[00:23:57] David Keyes: I guess that's why you do a
[00:23:58] Robert Smith: That's why you do the pilot. Exactly. Exactly. So, but yeah, I can say now it certainly runs. So, uh, yeah, so I didn't get told off by the organizers. It's a very successful event now. Um, but, uh, yeah, so I was a pretty, um, uh, decent runner. And, uh, and so I kind of had an active involvement in it, um, uh, through the university team.
And then my mom has actually done about 280 of them. So she's, uh, she's pretty keen. She goes every week.
[00:24:29] David Keyes: That's great. I mean, that makes so much sense. Then you as a runner, plus your, your research, you know, it all kind of fits that you would, You know, be involved with this organization, not, you know, again, initially as a runner, but then later on as someone who could help them think about how do we, you know, increase equitable access, um, to these events.
So, um, you've actually written, you know, several papers and we'll post, um, links to those in the show notes. And you've, I know you've also given presentations about this. I wonder if you would be willing to share just a little bit of, of one of the presentations that you've given to, to kind of walk through what this looked like, and especially focusing on, you know, what the R work involved was.
[00:25:15] Robert Smith: Yeah, sure. So I'll share the screen. Um, yeah, so just to start with, this is a collaboration between myself and my colleague Paul, who I actually ended up co-founding the company with, and a collaboration with Parkrun and the University of Sheffield, where we were based, and I still have a small academic role there advising on various projects.
As a bit of background, so Parkrun started in 2004 and gradually grew. So we'll post these slides up, but there's a video there kind of showing that growth over time. And now there's a huge number of events, so South Africa, Australia, New Zealand, the U.S. and Europe being kind of the primary locations.
But back in 2018, Sport England put together kind of a pot of money, three million pounds, to create 200 new events, and in particular to boost participation from underrepresented groups and in areas with higher socioeconomic deprivation. And so we thought this was a good time to look at how this could be achieved.
And so, since that time, we've published a series of papers on Parkrun. Uh, it's kind of been a bit of a side project for me. The first one was looking at ethnic density, uh, which is defined, in the UK anyway, as proportion of non white British, and the influence of that and of socioeconomic deprivation, as two different dimensions, on participation rates in Parkrun.
And then we used that data to help understand and inform where Parkrun events could be located to improve both equitable access and also overall access to, and overall participation in, events. And then we looked in 2020 at the long term kind of trends in inequalities in distance to events, so access, and also participation in Parkrun.
And so we were asking the question there, how have these relationships changed over time? Are we getting more equitable or less equitable? And then in 2024, I was interested in what the effect of the pandemic was on participation, because we stopped doing Parkruns in the UK for about a year and a half during the pandemic, and when we returned there was quite a large impact. So we were keen to understand, um, what things were like and update the previous work.
[00:27:38] David Keyes: Yep, that makes sense.
[00:27:39] Robert Smith: Um, so the first paper, yeah, really focused on ethnicity, socioeconomic deprivation and access. And we were able to show that there were lower participation rates in areas with higher socioeconomic deprivation. But also, holding that constant, there were lower participation rates in areas with a lower proportion of the population that were white British. And so we wanted to understand that, because from Parkrun's perspective, of course, every community and every, um, group is kind of free to do the type of physical activity they want, that's great, but Parkrun were very, very keen to ensure that everybody felt welcome, um, and that there wasn't some driver that, you know, um, was leading to long term, um, differences between, uh, ethnic groups. Um, and so Parkrun were keen to see, you know, how that was changing over time and things like that, um, and we were able to show that.
[00:28:29] David Keyes: And so in this chart, sorry, just to clarify, like, in this chart the Index of Multiple Deprivation is on the Y and then ethnic density is on the X, and then participation rate was the gradient.
[00:28:41] Robert Smith: Exactly, yeah. So the areas that have the least deprivation and the highest proportion white British had the highest participation rates. And then as you deviate from that on, exactly, yeah, bottom left, and as you deviate from that in either direction, for either variable, so you either move to an area with higher socioeconomic deprivation or areas that have a higher proportion non white British, we see lower participation rates.
So there's kind of two effects going on simultaneously. It's not just affluence. There's kind of two different factors. Yep, okay. But that was kind of just a descriptive piece of analysis looking at a snapshot in time back in 2018. But then we repeated the analysis with more data, which was nice.
So we were able to see how that effect changes over time. We repeated all the analysis with different covariates, but this plot is probably the starkest plot from the paper, which simply shows that the relationship between the different IMDs, so deprivation, remains constant, essentially, over time.
We are not seeing that that gap is narrowing. It's staying roughly constant. So the least deprived fifth of the country has the same relative, uh, participation to the most deprived, so significantly higher levels of participation, and in the order you'd expect, so, um, the most deprived being the lowest, in red there, and then up to the least deprived in green. And it's quite discouraging when you see that by 2020 the most deprived communities in the country had participation rates that were similar to the least deprived back in 2013. So we're talking seven years' difference.
[00:30:28] David Keyes: So it seems like there's an overall issue that I know the organization wanted to address, which is we'd like to have a wider diversity of folks involved. And so I know that that's work that you got involved in then, to help them think about finding sites to ensure equitable access.
[00:30:47] Robert Smith: Exactly. That's exactly what we did in the next paper, where we looked at how you identify locations where we're going to have the largest impact on, number one, access, but also, can we try and predict participation based upon the location of new events? So those are kind of two separate questions. And so we used what's called a greedy search algorithm. And I'm sure there are people listening who are, um, kind of geographers and from a different background, who will probably have a lot of experience with this and much more than me.
But essentially what this does is look at different green spaces, so parks, across the UK and identify the park that provides either the best access gain or number of runs gain and then say okay let's pick that park, that's our best park, and then look for the second best park with the exact same method.
Assuming that we've selected the first park and implemented a park run there, and then look for the third best park, and so on and so on. Um, the equation's, uh, basically right? And that would be Sorry, go on.
[00:31:51] David Keyes: Yeah, yeah. Well, let's, let's focus on the high level then. So that would be, like, the best park in this case, would be the ones that, or the park that would provide most access to typically underrepresented folks. Um, is that right?
[00:32:09] Robert Smith: Exactly, so it depends on how we prioritize that. So we can either look at access overall, so the park that provides the biggest reduction in distance traveled for the whole country, or we can weight different areas and prioritize different areas by their level of deprivation. So we can say, if you reduce the travel distance for a more deprived community, that's worth more than reducing the travel distance for a less deprived community. And so that weighting is very important.
[00:32:41] David Keyes: So, if I can translate that or repeat it back to you, and you tell me if this is right: the idea is, when you're thinking about finding new sites, it's not just, you know, you look at a city and you say, the city is this big, we need to space out the events equally. Instead you look at the kind of demographics of the city and say, you know, these are areas with high deprivation, so we should consider putting locations in or closer to those areas, with the idea being that that will hopefully encourage participation among groups who have, again, been less involved in the past.
Is that right?
[00:33:22] Robert Smith: Yeah, that's it in a nutshell. I think the exact weighting that you apply is the difficult bit and the kind of more controversial bit. Like, how much more is improved access in the more deprived communities worth? And this is something that policy makers really struggle with in general: it's easy to say, yes, we'd like to prioritize that, but then you say, okay, so what's the actual number? And that's really, really hard to do. And so, I'll talk about it in a minute, we essentially just used a square, so took the square of the deprivation score, which is very crude, but gave us a higher weight for areas that were more deprived. But taking the deprivation component aside, in terms of the actual greedy search algorithm, what we did was, and I can give the example here of Sheffield.
So Sheffield, when we looked at it, had five parkruns. So this is the map of the city of Sheffield in the background there. It had five parkruns, shown with the blue dots here. What we did was we took data from the ONS in the UK, so the Office for National Statistics, on the locations of essentially different communities.
So what they do is they split the country up into what are called lower super output areas. And each one of the lower super output areas contains roughly 1,000 to 3,000 people. And they provide a centroid, so the location of the weighted center of that community. So, for example, in this big lower super output area here, we can see that the centroid, the dot, is located all the way over on this side
of the lower super output area, and that's because nobody really lives out here, this is the Peak District, the national park in the UK, and most of the houses are over here. And so what we have is a series of longitudes and latitudes, represented by these dots, each of which relates to a community of roughly 2,000 people.
And so then we can do analysis looking at what's the optimal location, given that we know how the population is split across this geographic region. And so we can look, for every single lower super output area, so every single black dot here, at what is the distance to the nearest parkrun event, shown in blue.
And also, where are the other parks in which we could locate events, shown in green. And so we can make that calculation for every lower super output area, what is the geodesic distance, so the distance as the crow flies to its nearest parkrun event as it currently stands. And then we can say, what if we put a new event in this park?
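As a rough illustration of the distance step described here, the following is a minimal Python sketch (the analysis discussed in the episode was done in R). It uses the haversine great-circle formula as a stand-in for "as the crow flies" distance; the exact geodesic method used in the papers is not stated in the conversation, and the coordinates and function names below are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_event_km(centroid, events):
    """Distance from one area centroid to its closest existing event."""
    return min(haversine_km(*centroid, *event) for event in events)


# Hypothetical (lat, lon) values, for illustration only.
centroid = (53.38, -1.47)
events = [(53.37, -1.50), (53.41, -1.42)]
print(round(nearest_event_km(centroid, events), 2), "km to nearest event")
```

Running `nearest_event_km` once per lower super output area centroid gives the baseline distances that the later steps try to reduce.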
What does that do to our total? And so I just provide one example here: for this dot, for this lower super output area over here, it reduces the distance from 5k, which is what they had to travel before, to 2k, and so we can sum that across all of the lower super output areas. Most of them won't be affected, only the areas for which this is now the closest parkrun, so all of the areas essentially between this new event and the other ones.
Everything else won't really be affected in terms of access because they already have one closer. But for all of these, this will now be the closest event and so will improve access. And so what we did is we ran that for the entire country, for every park. So we said, which is the park that reduces the sum of the distances traveled, weighted by the number of people in each of these lower super output areas?
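The access calculation for a single candidate park can be sketched as follows. This is a minimal Python illustration (the original work was in R), and the field names, the park name, and the populations are hypothetical; only the 5k to 2k example comes from the conversation.

```python
def access_gain(areas, candidate):
    """Population-weighted reduction in distance to the nearest event
    if a new event opened at `candidate`.

    Areas that already have a closer event contribute nothing.
    """
    gain = 0.0
    for area in areas:
        current = area["nearest_km"]
        new = area["dist_to_candidate_km"][candidate]
        gain += area["population"] * max(0.0, current - new)
    return gain


# Hypothetical data: one area drops from 5 km to 2 km, another already
# has an event 1 km away and is unaffected by the new park.
areas = [
    {"population": 2000, "nearest_km": 5.0,
     "dist_to_candidate_km": {"park_a": 2.0}},
    {"population": 1500, "nearest_km": 1.0,
     "dist_to_candidate_km": {"park_a": 4.0}},
]
print(access_gain(areas, "park_a"))  # person-km saved: 2000 * (5 - 2)
```

Only the first area contributes, which mirrors the point that most areas are unaffected by any one new event.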
Um, and then we just looped through that search algorithm and said, so give us the number one park, the one that improves it the most. Then we're gonna put a parkrun there. So put a blue dot there, and then we're just gonna run the analysis again, assuming that we've put a parkrun in that location, and then find the second best.
And so we just ran through that greedy search algorithm to find the 200 best parks that provide the biggest impact on minimizing the distance traveled, or the geodesic distance.
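The loop described above can be sketched in a few lines. This is a minimal Python illustration of a greedy search, not the authors' code (which was written in R). It assumes planar coordinates in km for simplicity rather than geodesic distance, and all names and numbers are hypothetical.

```python
import math


def greedy_select(areas, events, candidates, k):
    """Pick up to k candidate sites, one at a time, each time choosing the
    site that most reduces total population-weighted distance to the
    nearest event, given the sites already chosen.

    areas:      list of (x_km, y_km, population)
    events:     list of (x_km, y_km) for existing events
    candidates: dict of name -> (x_km, y_km)
    """
    nearest = [min(math.dist((x, y), e) for e in events) for x, y, _ in areas]
    remaining = dict(candidates)
    chosen = []
    for _ in range(min(k, len(remaining))):
        def total_after(site):
            # Total weighted distance if an event were added at `site`.
            return sum(pop * min(d, math.dist((x, y), site))
                       for (x, y, pop), d in zip(areas, nearest))

        best = min(remaining, key=lambda name: total_after(remaining[name]))
        site = remaining.pop(best)
        chosen.append(best)
        # Treat the chosen site as an existing event for the next round.
        nearest = [min(d, math.dist((x, y), site))
                   for (x, y, _), d in zip(areas, nearest)]
    return chosen


# Hypothetical toy data: three communities on a line, one existing event.
areas = [(0, 0, 1000), (10, 0, 2000), (20, 0, 1500)]
events = [(0, 0)]
candidates = {"A": (10, 0), "B": (20, 0), "C": (1, 0)}
print(greedy_select(areas, events, candidates, k=2))
```

A greedy search picks the best site at each step rather than evaluating every combination of sites, which keeps a country-wide search for 200 parks tractable but does not guarantee the best possible set.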
[00:37:25] David Keyes: Yeah, so this is just about reducing distance, right? This doesn't take into consideration the deprivation measures that you had discussed before, right?
[00:37:36] Robert Smith: Exactly, exactly. So what we then look at is this. In Sheffield we can see that we've got these five events, and we can look at the distances traveled to these five events. But we can also see other characteristics of the lower super output areas, so each one of these small communities.
So we can see the deprivation of that lower super output area. So here we show in red the more socioeconomically deprived communities, and in green the less socioeconomically deprived communities. And Sheffield, historically, was an industrial town in the north of England, so there were a lot of factories that blew out smoke, and the prevailing wind in Sheffield is from west to east, and so the smoke tended to blow to the east, and so more of the cheaper housing, essentially, was historically in the east of the city.
That would be factory workers and people in mining and things like that, whereas the wealthier, more affluent factory owners and Master Cutler and that kind of thing would be in the west of the city. And that's propagated right through till now. So Sheffield really is a kind of a city of two halves.
Um, and so we can then look at and weight the areas by index of multiple deprivation. So we can provide a weighting on each area, so rather than minimizing the sum of distances traveled, we can minimize based upon weighted distance traveled, with the weight being whatever we deem appropriate on multiple deprivation.
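To make the weighting idea concrete, here is a minimal Python sketch (the original analysis was in R). It applies the squared deprivation score mentioned earlier in the conversation as the weight. How the weight was combined with population in the actual paper is an assumption here, as are the field names, the scores, and the convention that a higher score means more deprived.

```python
def weighted_gain(area, new_km, weight=lambda score: score ** 2):
    """Deprivation-weighted value of reducing an area's distance to its
    nearest event. Assumes a higher deprivation score means more deprived."""
    reduction = max(0.0, area["nearest_km"] - new_km)
    return weight(area["imd_score"]) * area["population"] * reduction


# Two hypothetical areas with identical population and distance saving,
# differing only in deprivation score.
more_deprived = {"imd_score": 40.0, "population": 2000, "nearest_km": 5.0}
less_deprived = {"imd_score": 10.0, "population": 2000, "nearest_km": 5.0}

ratio = weighted_gain(more_deprived, 2.0) / weighted_gain(less_deprived, 2.0)
print(ratio)  # how much more the same saving counts in the deprived area
```

Swapping in a different `weight` function changes how strongly deprived areas are prioritized, which is exactly the judgment call described in the conversation.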
And just to show very quickly that relationship, so this is the participation rate, and this is the deprivation. So you can see already how closely correlated those two things are. So the areas that have higher socioeconomic deprivation have much lower participation rates.
And so that's what parkrun were trying to address. So we looked at these four things. So, how good is overall access, not looking at the socioeconomic groups? And generally overall access was, I think personally, really good. So, 50 percent of the population of England are within 3km of a parkrun.
So they live within 3km of a parkrun event. Roughly 70 percent of the population live within a parkrun of a parkrun, so they live within 5k of a parkrun event. And so I think that generally shows how many events there are. And the events tend to be clustered in urban areas where most people live.
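Coverage figures of this kind can be computed from the area-level distances. The sketch below is a minimal Python illustration (the original analysis was in R); the populations and distances are made up for the example and are not the results quoted above.

```python
def share_within(areas, threshold_km):
    """Fraction of the total population whose area centroid lies within
    threshold_km of its nearest event.

    areas: list of (population, nearest_event_km)
    """
    total = sum(pop for pop, _ in areas)
    covered = sum(pop for pop, dist in areas if dist <= threshold_km)
    return covered / total


# Hypothetical areas: (population, distance to nearest event in km).
areas = [(2000, 1.2), (1500, 2.8), (1000, 4.5), (500, 9.0)]
for threshold in (3, 5):
    print(f"within {threshold} km: {share_within(areas, threshold):.0%}")
```

Because each area is represented by a single centroid, everyone in an area is treated as living at that point, which is an approximation.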
So pretty much everybody can get to a parkrun event, you know, by public transport, or even, for a huge proportion of the population, just by walking. And generally access is quite equitable. And actually what we found when we looked at the data is, as socioeconomic deprivation increases, distance to the nearest event decreases.
So, actually, people living in more socioeconomically deprived communities have better access. And a lot of this is driven by the fact that people living in more socioeconomically deprived communities tend to live in denser populations. And denser populations tend to be better at supporting local events, because you've just got more people who might be interested in setting up a running event at the local park.
[00:41:03] David Keyes: And is that showing it before or after the new sites were added?
[00:41:09] Robert Smith: So this is before. This is before, yeah.
[00:41:11] David Keyes: Before, okay.
[00:41:13] Robert Smith: So actually access is very equitable, at least geographically. It depends how you define access, right? So, geographically, yes. Of course, if we start looking at access in terms of, is the event at a time that is suitable for different people, or, you know, the people who live in less deprived communities are more likely to have a car and so can drive to events,
then there's a different question. But just looking at geography, just looking at distance to event, actually we're, I would say, very equitable. But participation, as we've discussed, clearly isn't. So participation rates in the more deprived communities are much, much lower,
and very inequitable on a number of different dimensions. And so that's what we were trying to understand. That's why we did this analysis looking at, firstly, how can we maximize overall access, and then, can we provide a weight to prioritize areas with higher socioeconomic deprivation? And then also we used the previous analysis that we'd done looking at what the drivers of participation were, and then used that statistical model to predict what would happen if you placed a new event in a certain location, to then say, if you place a new event here, we think that you will get x extra runs, so so many extra runs. And so then we're able to rank order both on geographic access and on predicted participation increase.
[00:42:43] David Keyes: Yeah. And I can see there that, you know, you have a Shiny app, which I'm putting up now, but I'm just curious. I assume a bunch of these sites were then added. Is that right?
[00:42:54] Robert Smith: Yeah. So later on, we have a whole series of new events. We actually got a bit derailed by the pandemic, which is a shame.
So they were just starting to add events. This is 2020, when we were publishing, and they were adding these events, and then the pandemic hit and they all had to stop. But we got a few in, and one of the team at Parkrun was really nice and fed back to us at which of the 200 locations they were able to start new events. So, really nice of them, they started a new event in Bradford and messaged us to say, we've created this new parkrun event in this park and worked with the local people there to set it up. And it's sustainable, so it's still going. I checked before this podcast just to see, and it's still going and really successful, so yeah, that's really good.
[00:43:42] David Keyes: Yeah, because I was, I was going to ask like kind of how accurate your predictions have been, but given the, you know, COVID pandemic and all that, it seems like that's probably a question to be asked maybe in a couple of years.
[00:43:56] Robert Smith: Yeah, I think so. And one of the things is, we were able, with the map, to identify, you know, our suggestions, but there will be a lot of cases where we say we want to maximize total runs. So we were able to rank order and place on the map where we think the new events should be located. So for example I know that Chesterfield now has one, so they did place one in this location. But one of the things we wanted to do was just flag, you know, we're just saying, here is a potentially good location that is not really close to another one, it makes sense to have one here, and then there'd be some qualitative analysis done
to say, okay, so is this appropriate? You know, can we talk to local communities and see whether there actually is demand here? So it was more kind of a recommendation of, here's a potentially good site.
[00:44:45] David Keyes: It was like the first step.
[00:44:46] Robert Smith: Yeah, exactly. And so the team would use this tool, and they fed back on using it, and sometimes they would log in or go on and zoom in and say, okay, so maybe that particular park, there's some reason there can't be an event there.
But actually there's a park next door, and yeah, let's try and set one up here. So it was nice, and good to have it be interactive and be able to see where we recommend these new events go. Yeah.
[00:45:18] David Keyes: Cool. We'll definitely post a link to that Shiny app so folks can check it out and play with it and see what it's all about. Was there anything else? Oh, yeah.
[00:45:29] Robert Smith: Yeah, so one of the things that we've been able to do, with everything being done in R, is, alongside the publications and linked within the publications, share all of the code and the aggregated data, to be able to recreate all of the analysis we did. All four papers are open access and open source. And so the team at ReproHack, I'm not sure if this is a thing in the States, but in the UK there's a website called ReproHack where you can log on and look at an academic paper that's been provided and then try and replicate it.
And so they have these kind of hackathon sessions to try and replicate academic research, which we think is really nice. And so Paul and I have been to one, and it was great fun. Other colleagues have been to them and replicated our work, and one tweeted about it, which is a really nice thing to get on, like, a Thursday afternoon: someone's just replicated our work.
So it was really cool. And then the other thing that's come about because of this is, another colleague with the surname Smith, so no relation, used the exact same code and method to do the same analysis in Australia. So we replicated our work from the UK in Australia to help inform where new events should be there.
So yeah, this is really, really nice.
[00:46:44] David Keyes: So I think that's a great example of how your academic training, your background, your experience, plus, of course, your interest in running, kind of all came together, and you were able to use the power of R in order to, you know, take on a really interesting set of projects.
So thanks for sharing that. If folks want to learn more about you, the work that you do, Dark Peak Analytics, what are the best places to send them to?
[00:47:11] Robert Smith: So we have a website, which we can link with you. And then we also have a GitHub account where we make a lot of our work open access, open source.
So a lot of our training materials on building models in R and Shiny are open source, so we can link there as well. And then of course LinkedIn and various other things where we're active in posting, as I know you are, about the work we've done in R, and in public health in particular.
[00:47:36] David Keyes: That's great, and we'll make sure to post links to all of those things in the show notes, as well as, of course, links to the various papers that you talked about during the presentation. So, yeah, Rob, it was great chatting with you. Thanks again for coming on and talking about the work that you do in R.
[00:47:57] Robert Smith: Thanks, it's a pleasure. And yeah, thanks for running the podcast. It's fantastic.