R for the Rest of Us Podcast Episode 13: Ahmadou Dicko

In this episode, I talk with Ahmadou Dicko, a statistician based in Senegal working with the United Nations High Commissioner for Refugees (UNHCR). Ahmadou shares insights on utilizing data-driven approaches to address development obstacles, especially within humanitarian settings. He explores the innovative packages and strategies developed by his team using R for data management, analysis, and communication. Among these innovations is robotoolbox, an extensive R package designed for accessing and handling Kobo Toolbox data in a tidy format.

Learn more about Ahmadou by visiting his website and connect with him on GitLab, LinkedIn and X.

Learn More

If you want to receive emails when we publish new podcast episodes, sign up for the R for the Rest of Us newsletter. And if you're ready to learn R, check out our courses.

Video Version

The video version has a walkthrough of some of the packages that Dicko and his team have created to facilitate their work with humanitarian data.

Transcript

[00:00:00] David Keyes: Well, I'm delighted to be joined today by Ahmadou Dicko. Ahmadou is a Senegal-based statistician who works for the United Nations High Commissioner for Refugees. In this work, Ahmadou and his team have developed unique strategies using R to manage, analyze, and communicate with data.

[00:00:41] In this conversation, I'll ask Ahmadou about his work and he'll give me a walkthrough of some of the packages that he and his team have created to facilitate their work with humanitarian data. Ahmadou, welcome and thanks for joining.

[00:00:53] Ahmadou Dicko: Thanks for having me. I'm really thrilled to be a part of this, particularly talking to you about, you know, humanitarian data and what we're doing. So really excited.

[00:01:02] David Keyes: Great. Well, maybe we can start out, um, I want to get some information about your background, and I want to, I'm curious kind of how you got into R, but maybe actually starting out, if you don't mind just giving an overview of the type of work that you do today, kind of what does your, your daily use of R look like?

[00:01:19] Ahmadou Dicko: Very good question. As you said, I'm a statistician working for UNHCR, so the High Commissioner for Refugees. So we are basically working on forced displacement, and I'm working from what we call a regional bureau. So I'm covering several countries in a specific region. In my case, I'm covering West and Central Africa.

[00:01:38] So the whole West Africa, Senegal, Mali, Burkina, and also Central Africa, all these countries like Cameroon, Central African Republic. And I think if you watch the news, you know, there's a lot of conflict, um, and conflict means a lot of displacement. So we are really trying to work with this population to find solutions,

[00:01:56] also international protection. And so we are dealing with a lot of data, and I think for the organization, statistics is, um, it's not new, but, uh, definitely we have more and more new roles, like data scientists and statisticians like me, that were not there back then. And I think we are really trying to push for a new way to use data to improve and be basically more efficient with it.

[00:02:20] So my role is definitely like some sort of humanitarian data scientist. So creating data products, uh, processing information to make sure that we do proper allocation and use the data the right way. So basically a bunch of predictive models, and working on our official statistics, because we also run a lot of surveys.

[00:02:39] And, uh, so it also involves a lot of methodology, and I'm really, really, uh, enjoying that, actually, because I think getting the right data is key. So, yeah, a lot of work, really, really exciting work, to be honest. And a lot of challenges, because we are working in really fragile situations where you don't have much.

[00:03:01] And, uh, where it's also really difficult sometimes to do a lot of things. You know, I mentioned surveys: how do you run a survey when you have conflict around? So, all these types of challenges, actually. So that's pretty much what we usually do as statisticians in the organization.

[00:03:15] David Keyes: Yeah, that's great. I wonder if you could give maybe a specific example of a project that you've worked on recently, just to help kind of make it concrete for people.

[00:03:25] Because I think, you know, when I think about the UNHCR, I think a lot about programmatic work, like going and helping people who are refugees directly. But obviously that's not the work that you are directly involved with. So yeah, I'm wondering if you could give me a specific example to help understand the work that you do.

[00:03:41] Ahmadou Dicko: Yeah, sure. We have situations where we have populations of refugees that have been in an area for, I mean, like a couple of years, even more, and I think they are in a situation where they also need help. It's not like, uh, let's say the onset of a crisis, but they are still in another country, in need of international protection and support.

[00:03:59] One thing we did recently in Burkina Faso, uh, where we have a lot of internally displaced people, was to run a survey that we call a results monitoring survey, where we collect a lot of indicators on them, actually, to see, you know, the level of well-being, and, uh, to understand the needs. That's really key and important.

[00:04:19] This is like a basic household survey, but I was mentioning the challenge of doing it in an area where, I mean, you don't have access to many, many conflict zones. The rest is really fragile also, so it's really complicated. But nonetheless, uh, with my team and the work we did there, we managed, with really good partners on the ground in Burkina, to collect data on this population.

[00:04:45] And, uh, we are currently processing the data now. The goal will be really to create some sort of report for the senior managers to understand exactly the dynamic of the situation, because it's not the first survey. We ran similar surveys in the past, so we can understand whether things improved or got worse, for whatever reason, and also what we can do now to improve, because of course we also have a lot of programs that we are running.

[00:05:07] Is it working or not? That's something we'll start doing more and more, and I'm also really excited about this. It's definitely evaluating what we are doing. We would like to push this further and efficiently use every, uh, single dollar we're using for the response.

[00:05:23] So that's one example: basically collecting data, analyzing data, and giving information to senior managers so they have, you know, all the information they need to improve the life of this population. Some of them are in need of documentation. That's basically the only thing they need, you know, to just keep some of them in jobs and everything.

[00:05:41] And then you have to understand also the dynamic of the local job market and everything. It means analyzing also data from the host countries. So there's like a variety of analysis you can do with the type of data we are collecting. But the idea is always pushing for definitely solution and finding a way to improve the life of these people.

[00:05:59] And maybe one of the big changes is really to also be more and more data driven. The right way, of course, because you always have a human in the loop. But at least to get insight that you would probably not have been able to get in the past. And now, with the tools we have and the capacity, we have way more information.

[00:06:17] David Keyes: Yeah, that makes a lot of sense. Um, great. Well, let's go back just a little bit, because obviously you use R a lot now. I'm curious kind of what your introduction to R was.

[00:06:27] Ahmadou Dicko: Yeah, I think it's a long story now, because I'm a statistician, as you said, basically more a statistician slash econometrician, and I was doing my grad school here in Dakar, but in 20... I started in... And I remember back then I was the only one using R, for whatever reason.

[00:06:47] I didn't know anything about R. That was funny, because I think all my colleagues, and even the profs, were just pushing for Stata. You know, Stata is really widely used by economists and econometricians. And, uh, I was really some sort of advocate for open source back then. I still run my Linux box and everything.

[00:07:06] I'm always pushing for open source. And I didn't want to have Stata because it was not like free and it was quite expensive actually. It still is, I guess. And so I was looking for something. So I played with Octave, the free version of MATLAB. I was not satisfied. And I found R. And it was like really online.

[00:07:27] It was funny because I didn't know how to do all of this, but I was really just pulled into this thing. And, uh, yeah, so that's how I started, actually. And, uh, we didn't have much in terms of, uh, books and everything, you know. And I was just holding on to one PDF I found online from, I think, the University of Toulouse,

[00:07:46] from one prof, and it was the only source of information I had for, like, almost a year. And my prof also was not using it, so I was basically on my own. But they were cool enough to let me experiment with it. But, uh, so yeah, I started, I think, yeah, online back then. So it's been a while, I'd say, I guess, right?

[00:08:06] Love it.

[00:08:07] David Keyes: Wow. And I'm curious kind of, you know, over the years, what have been kind of milestones or like changes, in terms of how you've used R in your work life.

[00:08:17] Ahmadou Dicko: To be honest, I think it's going so fast. And, um, I think you will see I still use, uh, Emacs to do R because back then we didn't have RStudio and I'm just so used to it that I'm still using it.

[00:08:31] RStudio was the first big milestone, to be honest. Having, like, a proper IDE to play with R. Back then we had Tinn-R or something, uh. It was not that good, and Emacs was probably the best, but you have to learn Emacs, which is not always easy. But, uh, this, of course, I mean, the impact of RStudio and everything, you know, the whole ecosystem, and really building the community around the tidyverse and all these tools.

[00:09:00] I remember when I was starting, I was still thinking, should I learn ggplot, which was the new thing, or lattice, which was really more established, with the book and everything. For whatever reason, I picked ggplot, and I don't regret it, because it was a really good investment. And, uh, so yeah, I think the ecosystem, uh, mostly around RStudio, they did a lot to

[00:09:23] push R to where it is. Maybe one other thing is, like, Rcpp, and all these packages that provide more examples of how to blend C++ in R, even C. And I remember back then the issue with R. It was always, oh, R is slow. R is very slow. People were saying this, actually. And also, we cannot do big data analysis with R.

[00:09:46] I don't hear many people saying this now, actually, and I think I haven't for a long time, which is really cool, because I remember back then it was just a thing. And that's thanks, I think, to Rcpp and all these packages. Yeah, definitely. And of course the community, because many of us are still using R because of, and thanks to, the community.

[00:10:06] David Keyes: Yeah, it's really interesting to hear you say that R was slow and couldn't handle big data because I think now in many ways for people coming from, say, Excel, one of the reasons that they come is because of limitations on data size in Excel. So it's interesting to hear about a time when R was on the flip side of that.

[00:10:29] Um. So in your work now, I know it's not just you and we were talking before, um, we hit the record button about other members of your team and kind of the work that they do. So I'm curious if you can kind of give a, an overview of your team and what their roles are, what your collaboration in R looks like.

[00:10:50] Ahmadou Dicko: Yeah, that's it. Thank you. I don't even know if you can say team. We are chatting on a daily basis, working together, but we are also not in the same place. They are all working in UNHCR, so we are all in the same organization. One is a statistician, and another one is what we call an information manager, but more like something between data science and data visualization.

[00:11:10] And all this work on R is really with these two people. So the first one is Hicham. Hicham is a statistician working in Panama. We have a regional bureau covering the Americas. And the other person I'm working a lot with is Cedric.

[00:11:23] Cedric Vidonne, he's the information manager in Geneva, and the go-to person for everything data visualization. And, uh, design, also GIS. He's really good with these things. And he loves R, so that's really cool. And basically, why I'm saying not a team per se is because I think it was just something ad hoc and organic, you know. Just naturally, we found ourselves talking more and more: oh, maybe we can call, if you can start working on this.

[00:11:52] I have this issue. I have the same issue. Let's do this together, right? So it was not like, let's say, our supervisors and everything saying, oh, you should work together, or something from the top down. It was definitely something organic. But it's also because of many other people. For example, one colleague, Edouard, Edouard Legoupil, who pushed a lot for this internal R community within the organization, our chief statistician, and many people really making sure that we have the right environment to do this type of collaboration.

[00:12:19] But yeah, I think, yeah, these two people are the, probably the two people I'm working the most on this type of project. Maybe I will quickly present, but, um, but it'll just be a brief presentation because I think the thing is, I collaborate with them, but they are usually the expert on this thing. They know way more than I do on, on, on, on this.

[00:12:39] But, uh, but it's cool.

[00:12:40] David Keyes: Well, that's really interesting that it wasn't, you know, you weren't put together with them, you know, by your organization. It sounds like the R users kind of found each other, um, and found ways to collaborate, given that you're all R users and have an interest in it.

[00:12:59] So that's, that's really interesting. Um, you work, of course, a lot with humanitarian data. And, of course, you use R a lot, so I'm curious, you know, why is R an effective tool for that type of work versus any other tool that people doing that work might consider using?

[00:13:21] Ahmadou Dicko: That's a good point.

[00:13:22] Uh, it's a really good question, actually, and, uh, I'd probably be very biased because I have a little bit of strong opinion about this, but it's not really R per se. For me, it's more like, uh, open source. So it can be Python tomorrow, it can be Julia or whatever. Even if I prefer Julia to Python, but anyway.

[00:13:41] But it can be any of these tools. Actually, I believe that the humanitarian principle, and when people think about humanitarian work, aid and everything, I think it goes well with the open source philosophy also. I was lucky enough to have always worked in the public setting. I was a researcher before being a humanitarian, so I always worked on the use of data for social good.

[00:14:05] And I thought to myself, um, it makes sense to push for this community-based tool building, because with open source you have a community behind it, you have people, they are sharing, and you have all these things that I feel really go hand in hand with what we do as humanitarians. So that's the first aspect.

[00:14:27] And the second aspect, actually, is that as a statistician, I think R is just natural. You can do a lot, you have tons of libraries, and of course now in many graduate programs people are trained in R. So it's also much easier to find other people using it and to collaborate with them. So, yeah, now it's much easier.

[00:14:48] Back then it was more complicated. We are still pushing to have R play a more prominent role in the humanitarian data world, actually. I see more and more, for example, lately, R being pushed in pharma. And I know the pharma industry was more like SAS. It's still SAS, to be honest.

[00:15:06] But I see, like, more push, and I think I would love the humanitarian world to be the same. And I think we are just trying to showcase what we can do, because the best way to show that the tool is working and you can do the work is definitely saying: this is the tool I built, this is what it can do, this type of report, this type of analysis. So in terms of just capacity, flexibility and what you can do, it's just limitless, right?

[00:15:31] It's just, like, your imagination, your own time, and also surrounding yourself with people who have different skills, who can support and help you, and who have the same passion. I think that's probably the hardest, you know, finding the right person to work with. You're not gonna do everything on your own. And as a humanitarian data worker, we don't control our agenda too much.

[00:15:48] I can talk to you now and they call me tomorrow and say, oh, there's this thing, you have to go to this country, and that's it. So with the small amount of time we have to work on these things, I think having really dedicated people, passionate about it, is really key. And that's why, also, I think I was lucky enough to have worked with a lot of colleagues in the humanitarian data space

[00:16:10] who are really good in R and are also using it to make a difference in the field. And in terms of capacity, yeah, building on top of all these, uh, nice tools from the tidyverse in terms of data manipulation, data viz with ggplot and all these tools, but also all sorts of analysis, from, uh, classical stuff to machine learning, econometrics, you name it, you have everything in R.

[00:16:33] So I feel like R is really a really nice tool. And to date, we have so many courses, books, everything in a very large community.

[00:16:42] Maybe the one thing that is a little bit missing is, like, for us, and by us I mean, usually, a classical governmental organization, UN or not, or an NGO: they would rather have a contract with someone, you know, and this is probably the same for a private company.

[00:17:02] So having, like, more companies like RStudio, in France ThinkR, and even you, what you're doing, you know, with the consulting work and everything, is key, because then you have someone that, you know, you can call if something is not working. And I think we probably need more and more people around, and also pushing to work with humanitarians.

[00:17:21] I think that would probably be also one of the changes because people like myself, repatriation is one thing, but that's at some point you need to sign a contract, you need to talk to senior management, they need to put us on resources and everything. And that's probably also where I feel like we, there is something that can be done to, to, to just be at the next level, have maybe way more people using it.

[00:17:43] David Keyes: That makes a lot of sense. Just out of curiosity, what other data tools are you seeing? I mean, you mentioned, for example, in pharma, there's a pretty strong push to move to R, um, from SAS, like you said, in that industry, in the humanitarian data space, what, what are the main tools that you see people using?

[00:18:01] Ahmadou Dicko: Excel. Excel is king. And I think, yeah, and I'm not really an Excel basher, to be honest. I use it to look at the data, to do things here and there. Not real analysis, but I think it's just everywhere. And when you set up your machine, at least a work machine, you have Excel. So when you think about it, imagine someone they send, like, after an earthquake somewhere, who just has a laptop and everything.

[00:18:25] Having Excel skills is really like, you know, something key. But Excel and GIS, we do a lot of mapping, a lot. And so you have like Esri, all the Esri things, because of legacy and everything. I will say it's biased, it is my own opinion, it's not really my organization or whatever. For example, I would push for QGIS, because it's open source and it's really, really good now.

[00:18:47] It's not what it used to be back then, compared to ArcGIS. But yeah, uh, Excel, ArcGIS. And then now all the BI tools: Power BI is heavily used in some organizations, and some others are using Tableau. For example, WFP, they use a lot of Tableau. And it has changed the game, to be honest, because I remember how it was before and what it is now.

[00:19:09] Um, back then, to do a dashboard, it was more a static dashboard, and it was mostly with Adobe tools like Illustrator and stuff like this. You take your time to design your document. I think people still do this for some documents, but now it's much easier with Tableau or Power BI.

[00:19:28] So yeah, Excel, Power BI, Tableau, and, uh, one of these GIS tools, QGIS or ArcGIS, are probably the main tools for humanitarian data people.

[00:19:39] David Keyes: Yeah, that makes sense. Uh, great. Well, I want to switch gears just a little bit and talk about, um, some of the work that you've done. So, you and your team have developed several packages to work with data from various sources that are, I think, um, you know, relevant to working with humanitarian data.

[00:19:56] Um, you've built one to work with, um, Kobo. Um, then one to work with UNHCR data, and another package to work with Humanitarian Data Exchange platform data. Can you just talk about the kind of motivation to make those packages? Where did that come from? You know, what have the packages done since you've made them?

[00:20:15] Ahmadou Dicko: Yeah, thanks. I usually develop to solve one problem. Usually, not all the time: sometimes I also like to collaborate with people. If they come with an issue and I find it exciting, I work with them.

[00:20:25] But most of the time I do things just to solve something because it's frustrating for me. And Kobo Toolbox is basically the go-to tool for data collection for humanitarian workers. It's basically, uh, you have a server where you host your survey and you have clients. It can even be on your phone, working offline and everything.

[00:20:48] It's really, widely used in the humanitarian community. And not just UNHCR, but many other agencies. So, we expect usually people in that space to know a little bit about Kobo Toolbox. But, when you collect the data, then the data is on Excel on the server. I have my report, I do my survey, I ask the question to people and then the data goes to the server.

[00:21:09] And then I have to go there every time I download. I put it in a folder and I do my analysis. Well, it's not really like an ideal setup for me and I wanted something like faster in terms of just refresh and for example, building pipelines and everything. And luckily enough, I think they were also like working on some sort of API,

[00:21:33] a new version of the API, the version 2, which was like really not great compared to the first version. But we didn't have much, uh, I don't remember any R package using the v2 when I started working on this. So I said, well, why not just wrap this? And to be honest, I was highly influenced by rOpenSci.

[00:21:50] I really love the work they are doing. I forgot to mention them when I mentioned game changers in the R community, I think. Pushing for reproducibility, and all these packages to access data, is something that really resonates a lot with me.

[00:22:05] So yeah, I first saw one package called, uh, ruODK, on their website, and ODK is basically like Kobo Toolbox; more or less, Kobo is some sort of fork of ODK. So I was saying, wow, this is really cool, what they did, actually. And I was really impressed. And so when I had the opportunity, I said, yeah, why not work on this?

[00:22:24] But I really took my time. It was really slow, like one fight at a time, one commit at a time, until it was really usable. And I just pushed it, pushed it to CRAN, and I'm using it really on a daily basis. It's really, uh, really, really super useful. And I'm really happy to see that other colleagues, even outside of UNHCR,

[00:22:43] in my own community, are using it. And it's funny, because there are a lot of silent users. You don't even know how many people are using your package. Usually you just know when it's not working: you have an issue, or sometimes people say, hi, oh, I'm using it, it's really cool. But most of the time you are surprised when someone says, oh, I used it and found it really useful for my line of work, I can do a parameterized report with it. And yeah, that's really helpful.
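The workflow described above, reading survey data straight from the Kobo Toolbox server instead of downloading an Excel export each time, might look roughly like this with robotoolbox. This is a minimal sketch under stated assumptions, not the team's actual code: the helper name `fetch_kobo_data`, the asset UID you pass in, and the choice to read the server URL and API token from the `KOBOTOOLBOX_URL` and `KOBOTOOLBOX_TOKEN` environment variables are all illustrative.

```r
# Hypothetical helper: pull the submissions of one Kobo Toolbox project.
# robotoolbox calls are namespace-qualified, so the function can be defined
# even on a machine where the package is not attached.
fetch_kobo_data <- function(asset_uid,
                            url = Sys.getenv("KOBOTOOLBOX_URL"),
                            token = Sys.getenv("KOBOTOOLBOX_TOKEN")) {
  # Fail early if the project id or the credentials are missing
  stopifnot(nzchar(asset_uid), nzchar(url), nzchar(token))

  # Register the server and the API token for this session
  robotoolbox::kobo_setup(url = url, token = token)

  # Look up the project (an "asset" in Kobo terms), then read its data
  asset <- robotoolbox::kobo_asset(asset_uid)
  robotoolbox::kobo_data(asset)
}

# Usage (needs a real server, token and asset UID):
# survey <- fetch_kobo_data("<your-asset-uid>")
```

Because the data comes back as a data frame on every call, a script or parameterized report can simply be re-run to pick up new submissions.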

[00:23:08] David Keyes: Well, I would say I was almost a silent user. I was working with a client recently. Um, she works for an NGO called Everyday Peace Indicators, and they had done, um, surveys. They do post-conflict surveys in countries where there's been war. And so they did some surveys in Colombia and Sri Lanka using Kobo to collect the data.

[00:23:32] And I, I said, I think there's a package. I, I've heard of this package. I've never used it. Um, and I was trying to get us to use it. It didn't actually work out because they needed to do a bunch of things with the translation after the data came in. I don't know, maybe there are ways to do that built into Kobo Toolbox, but given our time frame, we weren't, it, it didn't make sense to use it.

[00:23:55] But that's how I came across it.

[00:23:56] Ahmadou Dicko: Oh, that's nice.

[00:23:59] David Keyes: It looked like a, uh, a great package. So just overall, I mean, it seems like, I've used, for example, um, there's a Qualtrics package to access data, or Google Sheets, which I think of as kind of in a similar vein, where you have data that lives in some source and then, as opposed to going and downloading the data, you can just access it directly.

[00:24:25] It sounds like your package, robotoolbox, is that exact same thing for Kobo.

Ahmadou Dicko: Exactly. That's exactly the same thing. And then after that, you just make a design decision here and there, but that's exactly it, actually. Qualtrics is a good example. But for Google, I was not familiar with it, actually, so I'll probably look it up. But yeah, Qualtrics is a really good example, actually, the way they did it, but we don't use Qualtrics.

[00:24:52] So, uh, I mean,

[00:24:54] David Keyes: I know it's expensive. Yeah. And I don't know, you know, how much it works offline. Obviously you have specific needs in terms of the places where you're doing surveys. Great. Well, maybe if you, um, don't mind putting your screen up, I'd love to have you kind of give a walkthrough of what it looks like.

[00:25:13] And actually, I will say this, I've never seen someone work in Emacs, so this will be the first time that I see that. So I may actually start out by asking you some questions about how that works.

[00:25:22] Ahmadou Dicko: Yeah, um, sure. No problem. So let me, um, let me know when you can see my screen.

[00:25:30] Um, yeah, okay, good. And I think you mentioned, uh, HDX, which is the humanitarian data platform. It's basically the go-to platform where we go to collect data. It was also one of the first packages, actually, I think, I developed. But I didn't push this one to CRAN because I think I was working with them back then and, uh, they have a really good team.

[00:25:51] They also really changed the game in the humanitarian data space. They really, really changed the game and are still doing amazing work.

[00:25:57] And I think one thing we go to find there usually is like layers. Uh, let's say I want the admin boundaries.

[00:26:09] They won the African Cup of Nations, right, so, let's see, Cote d'Ivoire.

[00:26:14] David Keyes: So we'll give them special, yeah, special dispensation.

[00:26:16] Ahmadou Dicko: Exactly. So basically, usually when you go there, you have something like this, right, and then you can come and download the data and play with it. So the idea of the rhdx package, which I can quickly show here, is that it's also based on CKAN, so there is a lot of work done in that. So I already set it up, and I want to pull this data. Cool. So I will do pull dataset, and I can just put the name of the dataset, this one. Pretty well. And you have a dataset object, and the dataset object is basically like this page. Boom. So you can now pull this file, and the file, in the, let's say, the jargon, the way they speak in CKAN, is a resource, so I will get the second resource.

[00:27:04] Get resource number 2. And resource number 2 is a geodatabase, and I have a lot of layers in it. I want to know which layers. Let's say I want the admin 2. So, let me find the name of the layers first, right. And this is the name of the layers. Cool. So I will just find admin 2. Admin 2 is here. I copy, and I can just come here, do read resources, layer, equal, and I put the layer I want.

[00:27:37] And, uh, it's using the sf package for everything spatial, so I will load the sf package, and that's pretty much it. And I have a spatial polygon, I can just put Cote d'Ivoire. And, uh, I need to, and of course I will load the tidyverse because I want to use dplyr, maybe something else. And, uh, I can look it up again.
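The steps narrated above (pull a dataset from HDX, pick a resource, list its layers, then read one layer as a spatial object) could be wrapped roughly as follows. This is a hedged sketch rather than the code shown on screen: `read_hdx_layer` is a hypothetical helper, the dataset name and layer name are left for the caller because the transcript does not state them, and the rhdx function names (`set_rhdx_config`, `pull_dataset`, `get_resource`, `get_resource_layers`, `read_resource`) are written as I understand them from the package README, so check them against the version you have installed.

```r
# Hypothetical helper around rhdx. Calls are namespace-qualified so the
# function can be defined without attaching the package.
read_hdx_layer <- function(dataset_name, resource_index, layer = NULL) {
  # Point rhdx at the production HDX server (the "set it up" step)
  rhdx::set_rhdx_config(hdx_site = "prod")

  # A dataset is the HDX page; a resource is one file attached to it
  dataset  <- rhdx::pull_dataset(dataset_name)
  resource <- rhdx::get_resource(dataset, resource_index)

  if (is.null(layer)) {
    # No layer chosen yet: return the layer names of the geodatabase
    return(rhdx::get_resource_layers(resource))
  }

  # Read the chosen layer; for spatial files this is expected to be an sf object
  rhdx::read_resource(resource, layer = layer)
}

# Usage (needs network access; both names below are placeholders):
# read_hdx_layer("<dataset-name-on-hdx>", 2)                       # list layers
# admin2 <- read_hdx_layer("<dataset-name-on-hdx>", 2, "<layer>")  # read one
```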

[00:28:07] That's it. All right, so I can do a map now. ggplot,

[00:28:14] um, no, I don't even need this actually, geom_sf. All right, and that's it. I have a map. Now, one other package that we developed that is also related to visualization is unhcrthemes. So unhcrthemes, this one, and you see Cedric is the main author, is basically about how to follow our guidelines on data visualization and make sure that you have the same recommendations translated into ggplot.

[00:28:45] So it's basically something related to branding. I think you are doing a lot of work in that space. I can see from the reports that you have on your page and everything. So the idea here is the same: it's definitely to have, like, uh, UNHCR-branded, uh, charts. So when you have it, you just do theme_unhcr, and since it's a map, I don't want the grid and the axis.

[00:29:14] Okay. This is more like a little bit of a style, but, uh, we are still working on a mapping style guide. And this is, uh, I mean, when you have the chance to chat with Cedric, you can probably go deeper into this. So you see, in a few lines of code, I can just do a map. And play with it and do manipulation, joins and whatever, and then share it.
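A map "in a few lines of code" along the lines described above might look like the sketch below. Assumptions are labeled in the comments: the rectangle is a made-up stand-in for the real admin boundaries, and the `theme_unhcr(grid = FALSE, axis = FALSE)` call reflects my understanding of the unhcrthemes arguments for switching off the grid and axis, applied only if that package is installed.

```r
library(sf)
library(ggplot2)

# Hypothetical stand-in for the admin layer: one rectangle roughly around
# Cote d'Ivoire, in longitude/latitude (EPSG:4326). Real boundaries would
# come from the HDX geodatabase instead.
admin <- st_sf(
  name = "placeholder",
  geometry = st_as_sfc(
    st_bbox(c(xmin = -8.6, ymin = 4.3, xmax = -2.5, ymax = 10.7),
            crs = st_crs(4326))
  )
)

# geom_sf() reads the geometry column, so no x/y aesthetics are needed
p <- ggplot(admin) +
  geom_sf()

# Optional UNHCR branding. The argument names are an assumption; check
# ?unhcrthemes::theme_unhcr before relying on them.
if (requireNamespace("unhcrthemes", quietly = TRUE)) {
  p <- p + unhcrthemes::theme_unhcr(grid = FALSE, axis = FALSE)
}

# print(p)  # draw the map in an interactive session
```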

[00:29:32] And that's the beauty of R. To be honest, back then you had to go, um, on, uh, HDX, download it, open it in QGIS or ArcGIS, and do something else just to have a quick overview. So that's one thing. So that's rhdx. And there's also one package I didn't mention, for ACLED. It's a project, like a research project slash NGO, working on conflict.

[00:29:56] And it's widely used, it's like the go-to tool if you want to follow conflict in real time. And I think with this package I created just a small interface, which is a small function, if I want to have a sense of all the conflict. For example, in Cote d'Ivoire, uh, I put Cote d'Ivoire. I hope I don't have an issue. And, uh, let me check the structure of the data.

[00:30:22] That's it. And what I can do actually is, uh, Okay, I think I have latitude and longitude. I can put it in this map. Just for fun.

[00:30:35] David Keyes: Let's see. So this package is showing conflicts that have occurred? Yeah, exactly.

[00:30:40] Ahmadou Dicko: I think it's based on this project, acled, which is really, really useful if you want to know just more about, like, Conflict all over the place is definitely, the go to place.

[00:30:49] They have this nice analysis and everything, and also a way to export the data. They have an API. They are building a lot of nice tools actually, even a prediction tool for conflict. And, uh, yeah. So the name is ACLED. Um, yeah, really cool. So the Armed Conflict Location and Event Data Project is really, really useful for my line of work because I'm working on forced displacement.

[00:31:13] So conflict is really one of the reasons why people are leaving, uh, well, their, you know, place of residence. So here, just to make it fast, I can just transform this one. I will just take, let's say, 2023 data. Um, let me see. I think I still have plenty of data in 2023. All right.

[00:31:42] It's fine. Okay, let's do this. I do this, and I will turn it into an sf object. sf, coordinates, I think that's longitude, latitude, and projection, geographic, so no projection. I think it looks like it's working. I'll call it cvsf, and that's it. And let me see if I add it here, if it will work, cvsf.

[00:32:21] All right. Okay. So here, each point is one violent event that led to something. You also have access to the fatalities actually. So you can just do size by fatalities. But Côte d'Ivoire is really a quiet country, it's really calm compared to the rest, so if you look at the events, it might not be what we have, for example, in a country like Mali, Burkina Faso, Cameroon, whatever. As I said, the Africa Cup of Nations was really, really amazing.

[00:32:57] They did a really good job. So it was just one example of different packages used, you know, together to do something, right? So I used rhdx to pull the layers to do my mapping, and this one to collect conflict data, and just put all of this together. And that's why I like it, you know, how expressive it is.

[00:33:18] You just put the pipe and then you just put things together. sf, of course, which is a huge upgrade compared to sp, I'm sorry, I'm speaking French, sp, rgeos, and everything. So, yeah.
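The steps Ahmadou walks through (keep the 2023 events, turn longitude and latitude into an sf object with geographic coordinates, and size each point by fatalities) can be sketched as follows. The rows are invented for illustration and the column names are assumptions about how the conflict data is laid out; `events_sf` is a hypothetical name, not the object from the episode.

```r
library(dplyr)
library(sf)
library(ggplot2)

# Made-up rows for illustration only; real conflict data has many more columns.
events <- tibble(
  year       = c(2022, 2023, 2023),
  longitude  = c(-5.55, -4.03, -6.45),
  latitude   = c(7.54, 5.35, 9.46),
  fatalities = c(0, 2, 1)
)

# Keep one year, then build point geometries from the coordinate columns.
# crs = 4326 means plain longitude/latitude (WGS 84), i.e. no projection.
events_sf <- events |>
  filter(year == 2023) |>
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326)

# One point per event, with point size mapped to the number of fatalities.
events_map <- ggplot(events_sf) +
  geom_sf(aes(size = fatalities))
```

To overlay the points on a boundary map, the same `geom_sf(data = events_sf, aes(size = fatalities))` layer can be added to an existing ggplot.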

[00:33:33] David Keyes: Well, and I think it's really amazing here. I mean, you know, this looks pretty simple, but when you think about what you just did, if you were trying to do that without the packages that you have here, um, even just to access that data, you know, it would be going and downloading it and figuring out what's the right one.

[00:33:56] And you can now just use that code and, you know, go back tomorrow or next week or next month, next year, whatever, and update this, which is pretty incredible.

[00:34:07] Ahmadou Dicko: No, yeah, that's the main reason why we are really sticking to this type of tool and we are pushing.

[00:34:13] We're also doing, I think, internally in the organization, as much training as we can for other colleagues, because you have a lot of colleagues who really want to learn about R, about Python, and we have more and more people who are really just eager to push and learn. And I think that's also what I like about your approach, you know, the way you show R and try not to frighten people. We are just a handful of statisticians in the organization, I don't know how many, not that many, you know, and most of the people who are working on data don't really have that profile, and some of them are not even computer scientists.

[00:34:48] So, but they are really excited. They are, they have this thing they want to learn and for me, that's the most important thing. And I strongly believe that everybody can learn this thing even easily if they just have this passion about, I want to learn and I want to improve the way I use the data and everything.

[00:35:05] So, yeah, we are pushing to have more and more people actually using it, because of what you said, because I think it will save time. And I think time is key for the line of work we are doing. Less time on this means more time to do other things, or more analysis to support people in need.

[00:35:22] David Keyes: Yeah, that's great.

[00:35:23] Um, I wonder, do you have an example of using robotoolbox that you could show? I mean, I will say, for me, selfishly, I'm just curious because, again, on the project that I worked on, we didn't quite use it, but it looked very interesting.

[00:35:39] Ahmadou Dicko: Yeah, I think what I will do here is more to show, um First, how it works, how it is in Kobo.

[00:35:46] I mentioned trainings, and this is my personal Kobo server where I have most of the surveys here, mostly tests for the package. I'm trying to make sure it covers, like, all the things. And I think one year, I remember, we were trying to work on a training for the whole continent of Africa.

[00:36:08] So it was me and two statisticians. Now they are no longer in our bureaus in Africa, because in Africa, we have three bureaus. We have my bureau covering West and Central Africa. We have one bureau covering the East and Horn of Africa, and we have one bureau in the southern part of the continent covering, you know, Southern Africa, basically.

[00:36:27] So Africa is split into three bureaus. So we decided to join forces. We said, okay, let's do a training on R. And before doing this training for all UNHCR colleagues working in Africa, we sent this survey, basically, on capacity building. And if you look at the form, I think you already opened Kobo, so you have an idea of how it looks.

[00:36:49] So this is the web form. This is not the form if you use the phone, but if you have, you use your browser to fill the form, that's something you have. So where do you work? Which area do you work for? So we have like information manager, operational data management and stuff like this. And which option best describes your knowledge of R?

[00:37:09] I don't have any knowledge. I'm good. I'm advanced. And what you'd love to learn. So we made it really simple, like two, three questions, to just have a sense of the people that will join the training. Yeah. So a basic, you know, pre-training survey or something. And the data is here actually. You can go on Kobo, export to Excel, download, and play with it.

[00:37:30] Or you can use something like robotoolbox. And the idea is basically that each survey is identified by some unique identifier, right? This is the UID. And this is what I will copy actually, and I can put it here. I can just parameterize my report. Boom. I can put it here. I add my title and I will use unhcrdown.

[00:37:52] So we are building a set of templates, actually, using R Markdown. So we have HTML, we have Word documents, PowerPoint, xaringan, and I think I will just do the HTML one, the simple one. All right. And I have this, and basically here, because I want to just go step by step, and then I'll run the server after that.

[00:38:20] Um, so the idea of getting access to your data using Kobo is first getting just your ID. And with it you can directly use the main function. The main function is kobo_data, so basically what you need to do is just to have your unique identifier, the one from the survey. And you can use, of course, the parameterized report option of R Markdown, and you can also use it with Quarto. And basically when you have it, you are set, because the way to read the data in Kobo using robotoolbox is either plugging your ID directly into the kobo_data function or using an asset.

[00:39:03] But the easiest is definitely just to plug your ID here. So let's maybe run this and go through the code. I will show you quickly how it works. All right. Good. As you can see, when you use kobo_data, you have a data frame, and once you have a data frame, you can do whatever you want with it, right? So, that part, I think it's okay.
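A rough sketch of that workflow with robotoolbox, using `kobo_setup()`, `kobo_asset()`, and `kobo_data()`. The wrapper `fetch_survey()` is a hypothetical helper, the default server URL is an assumption, and reading the token from a `KOBOTOOLBOX_TOKEN` environment variable is one possible setup rather than the one used in the episode. Nothing is fetched until the function is called with a real UID.

```r
# Hypothetical helper: the caller supplies the survey UID copied from Kobo.
# Assumptions: the server URL default and the KOBOTOOLBOX_TOKEN variable.
fetch_survey <- function(uid,
                         url = "https://kf.kobotoolbox.org",
                         token = Sys.getenv("KOBOTOOLBOX_TOKEN")) {
  # Point robotoolbox at the server and authenticate with the API token.
  robotoolbox::kobo_setup(url = url, token = token)

  # Look up the survey (an "asset") by its unique identifier...
  asset <- robotoolbox::kobo_asset(uid)

  # ...and download its submissions as a data frame.
  robotoolbox::kobo_data(asset)
}
```

In a parameterized R Markdown or Quarto report, the UID would typically arrive as a parameter, for example `fetch_survey(params$uid)`, where `uid` is a parameter name chosen here for illustration.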

[00:39:26] Now, what we did here, not to just write many things, is like, if you see, for example, where you work, which country, it is similar to what we have in the original survey. And actually, to pull this information, we use, um, here. And I was saying that the robotoolbox package relies on the labelled package, and basically you have two types of labels.

[00:39:51] You have labels for variables. You have labels for values. And here you can even play with multilingual data. We have a vignette to show how to play with multi-language surveys. And we do a lot of surveys like this actually. Yeah. Because you can work in an area where some people speak Arabic, some others speak English, and so on.

[00:40:12] So you have to make sure that you can switch. And the package is also meant to work that way. Basically, you have a lang option in kobo_data, and then you can just switch the language. You can also use kobo_lang to check the available languages, and that's it. And then you can switch on and off between the values and the labels using the labelled package.

[00:40:35] You can also access the variable label. For example, this one, what's your name? In French, nom? And so on. And I didn't invent anything. I just, you know, use the labelled package because I think it's kind of a standard now. The haven package is using labelled data under the hood to pull data from SPSS and whatever, right?
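To make the two kinds of labels concrete, here is a small sketch with the labelled package on made-up responses: `val_labels()` attaches labels to values, `var_label()` attaches the question text to the variable, and `to_factor()` swaps codes for their labels. The coding of the answers (1, 2, 3) is invented for illustration. The uncalled `fetch_in_language()` helper is hypothetical and only shows where `kobo_lang()` and the `lang` argument of `kobo_data()` would fit.

```r
library(labelled)

# Made-up answers to one survey question, stored as numeric codes.
responses <- data.frame(r_level = c(1, 3, 2, 1))

# Value labels: what each code means.
val_labels(responses$r_level) <- c("No knowledge" = 1, "Good" = 2, "Advanced" = 3)

# Variable label: the question as it appears in the form.
var_label(responses$r_level) <- "Which option best describes your knowledge of R?"

# Switch from codes to labels when you need readable output.
responses$r_level_label <- to_factor(responses$r_level)

# Hypothetical helper (not run here): pick a language for a multilingual survey.
fetch_in_language <- function(uid, lang) {
  asset <- robotoolbox::kobo_asset(uid)
  # kobo_lang() lists the languages defined in the form.
  stopifnot(lang %in% robotoolbox::kobo_lang(asset))
  robotoolbox::kobo_data(asset, lang = lang)
}
```

Keeping the codes and deriving the labelled factor on demand means the same column can be used for both modelling and reporting.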

[00:40:57] So I'm just doing the same thing because I think it fits. And that's also one thing behind this package. It's like, when something is working well elsewhere, I don't try to reimplement it. I just use it as a dependency. Same for the dm package. You have really complex surveys where you repeat questions and then you have multiple tables.

[00:41:18] But they are linked. So basically in Kobo, when you use kobo_data, out of it you don't have one data frame, but you have a dm object from the dm package. And then you can play with everything that is available in the dm package to automatically filter the data, automatically join the data, and everything. And I like the idea of relying on this type of strong package instead of trying to code the same logic myself when it's already working elsewhere. So that's the thing with this package actually.
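A toy illustration of the linked-tables idea with the dm package. The two tables, their columns, and the key name `household_id` are invented for this sketch; the tables and keys that robotoolbox actually returns for a survey with repeated questions may be named differently.

```r
library(dm)

# Made-up parent table: one row per submitted form.
main <- data.frame(
  household_id = c(1, 2),
  country = c("Mali", "Niger")
)

# Made-up child table: one row per repeated group (household member).
members <- data.frame(
  member_id = 1:3,
  household_id = c(1, 1, 2),
  age = c(34, 5, 41)
)

# Declare how the tables are linked: primary key on main, foreign key on members.
survey_dm <- dm(main, members) |>
  dm_add_pk(main, household_id) |>
  dm_add_fk(members, household_id, main)

# Let dm follow the keys and join the parent columns onto the child rows.
flat <- dm_flatten_to_tbl(survey_dm, members)
```

Because the relationship lives in the dm object, the join needs no explicit `by` argument.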

[00:41:52] But for the rest, to be honest, it's just like, well, once you have a data frame, then everything that you know about data frames you can do. You know, here, for example, playing with the data, I know where they work; they work mostly in country operations and not in a regional bureau like myself. And I can see from this map, joining the two datasets, this is a map from rnaturalearth.

[00:42:16] I see that people are mostly from Nigeria, Burkina Faso, and Ethiopia. And I even process, I think, text data using the tidytext package. And I see the most widely used words when they were talking about the wishlist, what they want to learn and everything: visualization, reports, analysis, data manipulation, and so on.

[00:42:40] And, uh, and also, you know, just finishing with this heatmap to understand the link between how good they are in R and what position they have in the organization. I think it was quite useful for us before doing the training to just have this short report without too much text. Just, you know, visualizing the data and having, like, a bar chart here and there.
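The wishlist word count can be sketched with tidytext as below. The three answers are invented stand-ins for the free-text responses, and the column names are assumptions; the steps (split into words, drop stop words, count) are the standard tidytext pattern rather than the exact code from the report.

```r
library(dplyr)
library(tidytext)

# Made-up free-text answers standing in for the survey's wishlist question.
wishlist <- tibble(
  respondent = 1:3,
  answer = c(
    "Data visualization and reports",
    "Reports and data manipulation",
    "Visualization for analysis"
  )
)

word_counts <- wishlist |>
  # One row per word, lower-cased, punctuation removed.
  unnest_tokens(word, answer) |>
  # Drop common words such as "and" and "for".
  anti_join(stop_words, by = "word") |>
  # Most frequent words first.
  count(word, sort = TRUE)
```

The resulting table feeds straight into a bar chart, for example with `ggplot(word_counts, aes(n, word)) + geom_col()`.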

[00:43:01] And in terms of code, it's just like, even the sections are directly from the survey. And you can push further the logic and having even the text, but we didn't do this fancy thing. I know Hicham and his colleagues from Panama et al. They were really good at doing this type of parameterized report.

[00:43:20] Where everything was programmed, even like the markdown text and everything. But, uh, it was too meta for me, so I did the simple thing.

[00:43:29] But if you have data with Kobo, please hit me up. I mean, we'll be happy to have a discussion with you and how to set it up. But it's really easy because then you have a data frame and then you can play with it.

[00:43:39] David Keyes: Yeah, and I mean it does seem like it just facilitates getting that data so, so easily. I mean, I know the challenge for the client that I was working with was that she's based in the U.S., but works with some contractors who are in Sri Lanka.

[00:43:55] And the problem was they had taken the data out of Kobo and then done the translations, um, in Excel or something else. So at that point it didn't really make sense to, to access it. Um, but, but yeah, I absolutely see what you mean.

[00:44:09] Ahmadou Dicko: Um, yeah, thanks. Thanks. And, uh, yeah, really useful for us, actually, humanitarians.

[00:44:16] And I think something I didn't mention, we have a group. I think humanitarians, I hope you're not making fun of us, but we are maybe the last people on Earth still using Skype. Maybe I'm wrong. No, seriously. We tried to kill Skype so many times, but we are well stuck there. Yeah. So in Skype we have, like, a few communities of data people from different agencies, of course.

[00:44:37] And we have an R group with 500 people, actually working for different agencies. And sometimes we share these things, because Kobo is widely used by many other people. We tried to move to Slack. It didn't work. We tried many things, so I don't even know, but we're still stuck.

[00:44:54] In Skype, until the day Microsoft decides to kill it. So yeah, I just wanted to mention that we have a community on Skype.

[00:45:01] David Keyes: That's great. Um, well, this is super helpful just to see, you know, the value of a package like this, or the other packages that you showed, is just that you can access that data so easily, and at that point it's just a data frame that you can do whatever else you do with data frames, um, so all of the, you know, tidyverse syntax that you're familiar with is going to work just the same with this once you bring the data in.

[00:45:26] Ahmadou Dicko: Yeah. And one thing usually also when we talk to people, they say, oh, you look at this figure, we can pull this from R quite easily. You can reproduce this report. That's why also we are trying to push, maybe one last package I didn't show, the refugees package. And, uh, basically if you are working on a report involving official statistics from UNHCR.

[00:45:52] Usually shared in this platform we call Refugee Data Finder, then you have the same data in this package actually. And, uh, it's a pure data package, so we don't pull something from an API, because it doesn't need that; the data is refreshed officially twice a year. And you have the data in a nice format, then you can play with it to do some sort of analysis, doing some reporting and everything.

[00:46:15] So I just wanted to mention that one because it's also quite useful. And I think usually journalists are asking for and using this type of data, because it's, like, official statistics. We have 110 million forcibly displaced people in the world. So they want now to deep dive into it and understand these official statistics a little bit.

[00:46:33] So this package is also quite, quite, quite useful. And I think Hisham is the main author of the package.
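A short sketch of working with the refugees data package. It assumes the package exposes a `population` table with `year`, `coo_name` (country of origin), and `refugees` columns; `top_origins()` is a hypothetical helper and is only defined here, so nothing is computed until it is called with the package installed.

```r
library(dplyr)

# Hypothetical helper: the n largest countries of origin for a given year.
# Assumption: refugees::population has `year`, `coo_name`, and `refugees` columns.
top_origins <- function(target_year, n = 5) {
  refugees::population |>
    filter(year == target_year) |>
    group_by(coo_name) |>
    # Sum across countries of asylum to get one total per country of origin.
    summarise(refugees = sum(refugees, na.rm = TRUE), .groups = "drop") |>
    slice_max(refugees, n = n)
}
```

Calling `top_origins(2022)` would return a small table ready for a chart; because the data ships with the package, no API token or network access is involved.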

[00:46:40] David Keyes: So, yeah. Great. Well, that's, that's great. We will, we'll list all of those packages, uh, in the show notes. So if people want to check them out, they can definitely do that.

[00:46:51] Um, great. Well, um, that was really useful. Super interesting to see how you use it. Um, if you want, you can, you can actually stop sharing at this, unless there's anything else you wanted to share at this point.

[00:47:08] Ahmadou Dicko: No, I think it's fine. I can go on for a long time, but I don't want to. I'm really excited about humanitarian data science and this type of thing.

[00:47:19] And I think we are just scratching the surface of it. Yeah, of course. We are so behind, and we are also looking at what other people are doing in other industries. That's why I'm also looking at a lot of your work. Your report, the parameterized report, and your work on branding. This is just great.

[00:47:36] Definitely what we need. When I look at your, your PGS work and all these reports, this is just what our senior managers want. Yeah. And that's why people are investing, you know, in Adobe tools and all these tools to just have this report. And if you can do it from R, they don't really care too much, because for them it's the end product.

[00:47:55] And I think you can save a lot of time and money actually if you invest in this tool. And in terms of quality, reproducibility, having fewer errors because it's less clicking. So I see tons of benefits actually of switching, but it comes with, we need to do a lot of trainings, and that's the thing, but I think it's a good investment.

[00:48:18] David Keyes: Well, and what we always tell people too, is if you're just going to make one report, it's not worth doing it in R, but if you're doing any kind of parameterized reporting where you're going to make dozens or hundreds of reports, you don't want to do that by hand in, you know, Illustrator, InDesign, or whatever.

[00:48:34] So at that point, that's when it makes sense to invest in something like the type of work that we do.
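As a sketch of what "dozens or hundreds of reports" looks like in practice, the helper below loops over a set of identifiers and calls `rmarkdown::render()` once per value. The function name, the template file `survey-report.Rmd`, and the `uid` parameter are all hypothetical; the template would need to declare `params: uid` in its YAML header.

```r
# Hypothetical helper: render one HTML report per identifier.
# Assumes `template` is an R Markdown file that declares a `uid` parameter.
render_survey_reports <- function(uids,
                                  template = "survey-report.Rmd",
                                  output_dir = "reports") {
  dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)

  vapply(uids, function(uid) {
    rmarkdown::render(
      input = template,
      params = list(uid = uid),            # value exposed as params$uid in the template
      output_file = paste0("report-", uid, ".html"),
      output_dir = output_dir,
      envir = new.env(),                   # keep each report's objects isolated
      quiet = TRUE
    )
  }, character(1))
}
```

The function returns the paths of the rendered files, so the same call can be rerun whenever the underlying data changes.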

[00:48:41] Um, yeah. Great. Well, Ahmadou, um, thank you very much for coming on, for, for talking about how you use R in general and doing the walkthrough. It was really, really useful. So thank you for joining us.

[00:48:54] Okay. That sounds good.

[00:49:48] That's it for today's episode. I hope you learned something new about how you can use R. Do you know anyone else who might be interested in this episode? Please share it with them. If you're interested in learning R, check out R for the Rest of Us. We've got courses to help you no matter whether you're just starting out with R or you've got years of experience.

[00:50:08] Do you work for an organization that needs help communicating effectively with data? Check out our consulting services at rfortherestofus.com/consulting. We work with clients to make high quality data visualization, beautiful reports made entirely with R, interactive maps, and much, much more. And before we go, one last request.

[00:50:29] Do you know anyone who's using R in a unique and creative way? We're always looking for new guests for the R for the Rest of Us podcast. If you know someone who would be a good guest, please email me at david@rfortherestofus.com. Thanks for listening and we'll see you next time.


By David Keyes
March 20, 2024
