Building ODK into a biocultural monitoring system for Indigenous communities

Hello everyone! :wave:

I am a geographer and technologist who works with Indigenous communities to build and use digital tools that help them achieve their goals in territorial defense and land management with as much autonomy and as little dependency on outside support as possible. I have worked in the field with Indigenous communities in South America (especially Suriname, Brazil, Colombia, and Ecuador), in Canada, and in East Africa (in particular Kenya), and have worked remotely with communities across the world.

I am a long-time ODK user who has gone from helping Indigenous people use ODK Collect in the field to integrating the toolkit more closely into a biocultural monitoring system for communities. I thought I'd share what I'm working on, which involves self-hosting ODK Central, integrating with the API, and incorporating some new features like Entities :slight_smile:

Background with ODK and open source tools

I actually started using ODK back in 2015, when I was with a non-profit organization called the Amazon Conservation Team, working on a project to help communities in Suriname and Colombia use ODK to streamline data collection for land management. The status quo at the time was that community members would collect GPS coordinates using a handheld Garmin GPS, and write down the UTM coordinates on a paper form with a description of the geopoint. As you can imagine, this process entailed a high degree of error, a steep learning curve for non-technical users, and a ton of manual processing later down the road. Once we found out that GeoODK and later ODK Collect could be used to collect exactly the same data but with better questionnaires and without needing to write down coordinates, we co-created ODK Collect forms with the communities in their own languages and with visual labels, and held a series of train-the-trainer workshops so that communities could be in a better position to manage their own data collection processes. You can read about that here.

While this streamlined the data collection process substantially and the community members were quite happy to have an easier tool to work with, the problem we faced at the time was that these communities are located in very remote parts of the Amazon rainforest or in high-altitude areas, so sending submissions to an Aggregate server proved very difficult due to the low quality or total absence of connectivity. For villages that were fortunate to have satellite WiFi, any submission with a media attachment would most often time out, leading to a lot of waiting and frustration. For entirely offline areas, community members had to actually send their devices to a connected area, and wait for them to be returned (this was before ODK Briefcase). And that was just for uploading the data. To actually receive the data back in a tangible, useful format (e.g. visualized on a map or referenced in a chart in a report), the communities had to wait even longer, because that basically depended on us, the technical team of the project, to download the data from ODK Aggregate, process it, and find a way to bring it back to the community.

To solve this problem, we tried a couple of things. We found out about Portable OpenStreetMap (POSM), and actually implemented this in the field using Intel NUCs in a few places, pretty much exclusively to use the ODK Aggregate server on there. That worked out pretty well, but still required someone locally to know how to utilize the server, work with tabular data, and visualize it on a map, and that was still a lot for Indigenous communities with limited experience with digital technology. And well, around the same time, Esri's Survey123 came into the picture, and the Amazon Conservation Team embraced that tool since they had a non-profit partnership with Esri, and the ArcGIS toolkit lets you visualize the data on a web map and web apps instantaneously (at a cost, of course, and requiring usage of cloud services.)

For my part, these and other experiences in the field led me to take a career pivot from mainly helping communities apply existing technology to being more hands-on in building the technologies, and directly co-designing them with communities. Steeped in the realities of communities being entirely offline, and also increasingly demanding full control and sovereignty over their data and expressing concerns about using tools from big tech companies like Google or Esri, I started Terrastories, a free and open-source tool for mapping place-based oral histories that runs in the browser and can be hosted online, but also entirely offline on either a mesh network or via a WiFi hotspot. I also worked with Digital Democracy and contributed to the co-creation process of Mapeo, a FOSS tool for offline field mapping that stores data in a decentralized way on the devices themselves (i.e. no centralized server) using peer-to-peer sharing protocols, and with a UI that was entirely co-designed with Indigenous communities in the Amazon. At Dd, we also started the Earth Defenders Toolkit project, which is a platform with hands-on guides and case studies about the use of digital tools for an audience of community members, and now has an offline deployment that bundles together the platform and some of the most important tools for Indigenous communities, and serves those in a way similar to POSM.

A return to ODK: Central, Entities, and more for biocultural monitoring

Currently, I am at Conservation Metrics, where we are working with a US-based non-profit called Nia Tero and three of their Indigenous partner communities to build a biocultural monitoring system that will allow them to track their own self-determined vision of well-being via indicators and metrics about their communities and their territories. The system as currently envisioned will combine data collection tools (including both Mapeo and ODK) with data visualization tools such as Apache Superset and customized Mapbox/Maplibre maps to allow communities to instantaneously explore and make sense of their field data using different views. We are also working on a workflow to circulate change detection alerts (for example, about encroaching deforestation, logging, gold mining, or other threats) to the devices being used by Indigenous communities in the field, either in the format of offline background maps or as a GeoJSON attachment to a survey.

Some of our user requirements are that the tools be primarily open-source (if not free), self-hostable, usable offline, and translatable, that they allow communities to own and control their data, and that they can be operated with as little dependence on outside support as possible.

While there are other mapping tools out there being used by Indigenous communities, ODK/XLSForm continues to be the go-to tool for non-geospatial data collection, so it will be part of our toolkit on that basis. However, one of the neat things for me as a returning ODK user is to see how many useful mapping features have been built into the toolkit since I last used it! For example, offline maps (in either raster or vector format) are a huge asset for Indigenous field data collection workflows. The ability to change the colors of past submissions on a map is really great and helpful as well. And then there is Entities, which I'll get to in a bit. I look forward to seeing even more functionality for maps in ODK (and would also be glad to help think through future features): for example, being able to add an MBTiles file via the app UI directly, style vector data (e.g. by adding a style.json file), and maybe have the option to add a label to points on the map so they can be more easily distinguished when there are numerous points in one region.

One of the tools we are building is a messaging bus service that hits the APIs of the respective data collection tools, downloads all of the submissions and media attachments into a secure data lakehouse owned by the communities, and places them in an optimized format for retrieval by the other services (such as Superset and Mapbox). The goal is essentially to automate the whole process I described above of downloading data from ODK Central and formatting it for use in a third-party tool. Using the service, ODK data, once submitted, will be downloaded and appear almost right away in the third-party services, much like how it works in Esri. One cool thing that we've already built for Superset in particular is taking the XLSForm translations and using them to create bespoke charts and dashboards in each of the languages, which may be Western or Indigenous.
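To make the translation idea a bit more concrete, here is a minimal sketch of how per-language labels could be pulled out of an XLSForm `survey` sheet for use as chart titles. It relies only on the XLSForm convention of translation columns named `label::Language (code)`; the function name, the row format (a list of dicts keyed by column header), and the sample form are assumptions made up for illustration, not how our service actually does it.

```python
import re
from collections import defaultdict

# Matches XLSForm translation columns such as "label::Spanish (es)".
LABEL_COLUMN = re.compile(r"^label::(?P<language>.+)$")


def labels_by_language(survey_rows):
    """Return {language: {question_name: label}} from XLSForm survey rows.

    `survey_rows` is assumed to be a list of dicts, one per row of the
    `survey` sheet, keyed by column header (for example, as produced by
    csv.DictReader on a CSV export of that sheet).
    """
    labels = defaultdict(dict)
    for row in survey_rows:
        name = (row.get("name") or "").strip()
        if not name:
            continue  # rows such as end_group have no name
        for column, value in row.items():
            match = LABEL_COLUMN.match(column or "")
            if match and value:
                labels[match.group("language").strip()][name] = value.strip()
    return dict(labels)


# Hypothetical two-language form, invented for illustration.
sample_rows = [
    {"type": "geopoint", "name": "location",
     "label::English (en)": "Location", "label::Español (es)": "Ubicación"},
    {"type": "text", "name": "notes",
     "label::English (en)": "Notes", "label::Español (es)": ""},
]
chart_labels = labels_by_language(sample_rows)
```

A dashboard generator could then look up `chart_labels[language][question_name]` when naming a chart, and fall back to another language where a translation is missing (as with `notes` in the Spanish column here).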

We are still in an overall research phase for the biocultural monitoring system, where we are looking at what's already out there that we can use, before we start narrowing down and building more. Here are some of the things we are tinkering around with and considering that relate to ODK:

  • Entities: Entities can be a game-changer for monitoring for Indigenous communities, because they allow the user to revisit the same place or incident and report on the status of what happened there. Take the example of an oil spill (a sadly common incident in the Ecuadorian Amazon). Once encountered, community members can create a geopoint for the location where the incident occurred, and then report back on the status of the pollution on successive visits. We are also thinking about using Entities to circulate change detection alerts: once a new change detection alert is created, we can use our messaging bus service to update a GeoJSON file on an ODK Central server, so that ODK Collect users can download it, view the alerts on a map, and submit reports on whichever incident they are visiting in the field. Generally, I'm really thrilled about Entities and feel that it's a feature that most, if not all, data collection tools are currently lacking.

  • Other ways to visualize ODK submissions: according to our user interviews thus far, the most helpful ways to visualize field data are using maps, charts & graphs, tables, and a gallery of media attachments. (Notably, these are the services provided by the KoboToolbox server as well.) We would like to find open-source, self-hostable services that can do these things and that can be integrated with our messaging bus tool to automate the process of bringing in ODK and other data. Thus far, for maps, we are considering building our own Mapbox/Maplibre tool. For charts & graphs, we're looking at Superset since we have not found anything else for easily making charts and dashboards that is free/open-source (I know Tableau and Power BI are very often used by other ODK users, but these don't meet our user requirements). For tables or a media gallery, we have yet to find the right tools. If anyone has ideas for any of the above, or experience working with ODK data in a tool like Superset, we would love to learn more about that!

  • Form builder: In my experience, one of the big blockers for Indigenous communities in autonomously using ODK is the learning curve involved in making forms. XLSForm is very powerful but does require an understanding of logic and how data values work, and depending on the complexity of the form, comfort in writing some basic validations with variables and curly brackets. This is usually asking too much, even for some of our Indigenous community users with a higher-than-usual level of experience with technology. Hence, form builders are a very important feature for us. The KPI formbuilder used by KoboToolbox and Ona is quite nice, but it's not yet clear to us how easy it will be to carve this out of the overall stack that either service uses. It's likely much easier to deploy ODK Build, and we're very interested in doing so, but we would like to see if there is openness to, or existing efforts toward, improving the UI to make it more intuitive and easy to use. We might be able to contribute.

  • ODK Central Docker images: For offline deployment, we are using Docker Compose to manage the DevOps of all of the different services we are bundling together. ODK Central has a really great Docker deployment setup, but for our infrastructure and for use with Kubernetes/clustering, we could benefit from having published images, e.g. on Docker Hub. I see from previous threads that this might get tricky with authentication. But we may want to think through this with anyone else interested, to see if it's nevertheless possible.
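On the change detection alerts mentioned under Entities above, here is a rough sketch of turning a batch of alerts into a GeoJSON FeatureCollection that could be attached to a form. The alert field names (`alert_id`, `kind`, `detected_on`, `latitude`, `longitude`) are hypothetical, and the use of a top-level `id` plus a `title` property reflects my understanding of what ODK Collect expects for map-based selects; both should be checked against the current ODK docs. Uploading the file to Central is deliberately left out.

```python
import json


def alerts_to_geojson(alerts):
    """Convert alert dicts (hypothetical schema) to a GeoJSON FeatureCollection."""
    features = []
    for alert in alerts:
        features.append({
            "type": "Feature",
            # Assumed convention: a stable identifier per feature.
            "id": alert["alert_id"],
            "geometry": {
                "type": "Point",
                # GeoJSON coordinate order is [longitude, latitude].
                "coordinates": [alert["longitude"], alert["latitude"]],
            },
            "properties": {
                # Assumed convention: "title" is used as the label on the map.
                "title": f'{alert["kind"]} ({alert["detected_on"]})',
                "kind": alert["kind"],
                "detected_on": alert["detected_on"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Invented sample alert, for illustration only.
sample_alerts = [
    {"alert_id": "alert-001", "kind": "deforestation",
     "detected_on": "2024-03-01", "latitude": -1.25, "longitude": -77.5},
]
collection = alerts_to_geojson(sample_alerts)
geojson_text = json.dumps(collection, ensure_ascii=False, indent=2)
```

The resulting `geojson_text` is what a service like ours would write out as the form attachment each time new alerts arrive.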

If anyone has any thoughts or ideas to share about any of the above, they are most welcome indeed!

This is already quite long so I'll leave it there, but I hope it's interesting for folks (if you've made it this far, thanks for reading!) and that through our use case we can find ways to contribute. Generally, I'm really glad to be returning to the ODK ecosystem (big shout out already to @LN who has pointed me to a number of the above features that were new to me). I'll also be happy to keep sharing how things are going as we build things out for the biocultural monitoring system :smiley:

Welcome to the forum, @rudo!

Thanks for this wonderful write-up and the ideas and resources posted. It's inspiring to read about your journey.

We've had recent discussions about making XLSForms more accessible and lowering the (real or perceived) entry bar for newly onboarded form designers.
Your linked resources would be amazing in a showcase discoverable to a wider ODK user audience (ping @Aly_Blenkin).

If you're looking at visualisation and dashboarding options, ruODK opens up the wide world of R based data visualisation - leaflet maps, RShiny dashboards, any JS library can be easily wrapped and used in Quarto documents.

As for Build, a recent PR updates the documentation providing an offline version through docker compose. However, Build is not actively maintained by the core team any longer and lacks a good few of the newer features such as entities (list here), but it's a fantastic starting point, as you can export to XLSForm and add the missing bits.

The Kobo form builder is deeply embedded in the stack, and it would be a non-trivial amount of work to separate it into a standalone tool. (ping @Tino_Kreutzer)

On the upside, I've recently been treated to @LN's whirlwind live demo on building entity based workflows in XLSForm and Central which showed impressively how accessible XLSForm is after all.

Let us know how you go and give us a yell here if you get stuck!

This is a great post, thanks for sharing @rudo!

I agree with @rudo that a graphical formbuilder is an important feature for some first time users. As Florian mentioned, the Kobo formbuilder was created to address this gap, and since it's built around the XLSForm standard and given that Central uses XLSForm as well, it's possible for a user to create a form in Kobo before deploying it in Central or any other XLSForm-compatible tool. It's a free tool so hopefully addresses the need you mentioned.

I'm actually speaking to Rudo today, so the timing is perfect :slight_smile:

Thank you very much @rudo for this very complete and informative presentation of your ODK story, and for the resources you pointed out.

We started at the same time with the same tools and the same desire to use self-hosted and open-source tools!

A few comments on automation and reporting in a SQL context:

We used to read data directly from Aggregate's PostgreSQL database. As all our tools are connected to our own 15-year-old PostgreSQL database, things were pretty easy, with close to real-time access.
Since we switched to Central, we've developed PostgreSQL functions to automate data transfer from Central to our database and cron jobs to schedule them:

Now that the magic ODK Team has made it possible to filter OData subtables, we will be able to increase the frequency of the cron tasks and get very close to real-time data access.
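For readers who want to try the same kind of incremental pull outside the database, here is a small Python sketch that only builds the OData request URL. The `.svc/Submissions` path and the `__system/submissionDate` field come from Central's OData feed; the `$root/Submissions/` prefix for filtering a subtable is my reading of the Central API docs and should be verified, and the server URL, project ID, form ID, and repeat name are made up.

```python
from urllib.parse import quote, urlencode


def odata_url(base_url, project_id, form_id, since, subtable=None):
    """Build an OData URL for rows submitted after `since` (ISO 8601 string).

    `subtable` is the repeat path (e.g. "samples"); None means the main table.
    """
    table = "Submissions" if subtable is None else f"Submissions.{subtable}"
    field = "__system/submissionDate"
    if subtable is not None:
        # Assumption: subtables are filtered via the parent submission.
        field = f"$root/Submissions/{field}"
    # Percent-encode spaces but keep "$", "/" and ":" readable in the query.
    query = urlencode({"$filter": f"{field} gt {since}"},
                      quote_via=quote, safe="$/:")
    form = quote(form_id, safe="")
    return (f"{base_url.rstrip('/')}/v1/projects/{project_id}"
            f"/forms/{form}.svc/{table}?{query}")


# Hypothetical server and form, for illustration only.
main_url = odata_url("https://central.example.org", 1, "lagoon_survey",
                     "2024-01-01T00:00:00.000Z")
repeat_url = odata_url("https://central.example.org", 1, "lagoon_survey",
                       "2024-01-01T00:00:00.000Z", subtable="samples")
```

A scheduled job would pass the timestamp of its last successful run as `since`, so each run only fetches new rows.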

A few years ago, we needed to create web dashboards to display charts about chemical and physical characteristics of coastal lagoons.
We chose Redash (because it can export charts as images), but other tools were considered.
We didn't try Superset, and we now have a lot of queries and dashboards in Redash.
As Redash has now been restarted as a community project, I think we'll continue with it for a while.

And even though we use Redash intensively, I really appreciate Metabase, which can be installed as a server or as a desktop tool.
Both Redash and Metabase (I don't know about Superset) lack an OData connector, which could be a game changer. An open-source dashboard tool that connects easily to Central would be great.

Thanks again for your inspiring post!

Hi all, thanks so much for the kind words and welcome! :slight_smile: I'm really excited to read all of your thoughts and suggestions. I will be digging into all of the resources that have been shared next week. (And yesterday I had a great call with @Aly_Blenkin and @issa about the system we are building, our user requirements, Entities, and more!)

I thought I'd share a bit more about what we are using to read and process data from XLSForm servers so far. We have built a custom messaging bus tool using the Thespian actor concurrency library, which can exchange data with multiple sources in an automated fashion. Since our system entails the usage of a number of different tools (e.g. ODK, Mapeo, Superset, Mapbox maps, Twilio, and more), each with a different API or expected data retrieval method, this is a useful way for us to create components to interact with each service and translate the messages.
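As a rough illustration of the "translate the messages" part, here is a simplified, standard-library-only sketch of per-source components that map records into one common shape. It is not our Thespian-based implementation. The `__id` and `__system` keys follow ODK Central's OData output and `_uuid` / `_submission_time` follow KoboToolbox's JSON output as far as I know; the common record layout and all function names are hypothetical.

```python
def from_odk_central(row):
    """Translate one ODK Central OData submission row."""
    system = row.get("__system", {})
    return {
        "source": "odk_central",
        "record_id": row["__id"],
        "submitted_at": system.get("submissionDate"),
        # Keys starting with "__" are treated as metadata, not answers.
        "answers": {k: v for k, v in row.items() if not k.startswith("__")},
    }


def from_kobotoolbox(row):
    """Translate one KoboToolbox submission record."""
    return {
        "source": "kobotoolbox",
        "record_id": row["_uuid"],
        "submitted_at": row.get("_submission_time"),
        # Keys starting with "_" are treated as metadata, not answers.
        "answers": {k: v for k, v in row.items() if not k.startswith("_")},
    }


# One translator component per upstream service.
TRANSLATORS = {
    "odk_central": from_odk_central,
    "kobotoolbox": from_kobotoolbox,
}


def translate(source, row):
    """Route a raw record to the translator registered for its source."""
    try:
        translator = TRANSLATORS[source]
    except KeyError:
        raise ValueError(f"no translator registered for {source!r}") from None
    return translator(row)


# Invented sample records, for illustration only.
odk_record = translate("odk_central", {
    "__id": "uuid:1234",
    "__system": {"submissionDate": "2024-03-01T12:00:00.000Z"},
    "species": "tapir",
})
kobo_record = translate("kobotoolbox", {
    "_uuid": "abcd",
    "_submission_time": "2024-03-02T08:30:00",
    "species": "jaguar",
})
```

Keeping one small translator per service means a new data source only needs a new entry in `TRANSLATORS`, while everything downstream sees the same record shape.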

We have already created components to read data from the ODK Central and KoboToolbox APIs, and store it in a SQL data warehouse. Then, tools like Superset can read the data from the warehouse for dynamic visualization. As of today we've already got it working with a few forms from both ODK Central and KoboToolbox, and it's running pretty nicely! Here is one sample view visualizing form submissions collected using one of the Collect apps + Enketo:

So far, we like what Superset has to offer, but I will check out Redash, Metabase, and ruODK. I also look forward to checking out central2pg and the pyODK tool! We still have to figure out an optimal way to download and store media attachments, so I'm sure there is a lot to learn there.

One of our big infrastructure challenges is that we are trying as much as possible to build a toolkit that can be self-hosted entirely, and ideally also in offline environments. Many of our Indigenous partners want this because of the lack of internet connectivity in their villages, or a desire to have full control over their data. So a self-hostable formbuilder would be a nice feature for us. It's great that we can add Build to our docker compose setup, and I'd definitely be interested in a conversation around what it might take to keep that project alive, if there is interest. That said, some members of our team have indeed used the Kobo formbuilder as a starting point before cleaning it up and adding further complexity in XLSForm, which is a good workflow overall when internet access is not an issue.

Thanks again, all! Looking forward to continuing the conversation.
