Webhooks in ODK Central

martijnvaandering · December 19, 2022, 3:18pm

1. What is the general goal of the feature?

making integrations possible by creating webhooks.

2. What are some example use cases for this feature?

Automating incoming requests and triggering automated processes, like forwarding the items into PowerBI/Teams/Slack, you name it.

3. What can you contribute to making this feature a reality?

I'd like to build this into ODK/Enketo, but I'd like some guidance from an architect on where to code this.

Cheers, Martijn

LN · April 19, 2023, 7:18pm

Thanks for writing this up @martijnvaandering and for the interest in possibly implementing webhook support. The first step will be to gather requirements: to really understand the workflows that you and others want to support, and to work out where webhooks would fit relative to the data sharing options that exist today.

Here are solutions that people currently use for data sharing from Central to other systems:

OData connection from PowerBI/Excel/Tableau. This can be configured entirely through those tools' UIs including basic auth. Refresh is done via polling either on demand or periodically. The full data document is downloaded each time. See docs
JSON request (OData document). A script or service can request this periodically similarly to PowerBI/Excel/Tableau above. It can either ignore types or get schema information from the OData metadata document like OData consumers, from the simplified schema endpoint, from the XForm, from the XLSForm.
JSON request using a $filter on updatedAt to only request new submissions. Similar to above but requires maintaining state. E.g. pyODK: Using cursors to efficiently pull new data only
CSV request. Similar to JSON above, can also be filtered, but usually a less convenient format than JSON.
Raw XML submission request. Usually this will not be as convenient as the methods listed above for building integrations.

The common thread with all of these above is that they require some system to poll Central.
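To make the "maintaining state" part of the filtered JSON option concrete, here is a minimal sketch of a cursor-based poll. It only builds the request URL and advances the cursor; fetching and authentication are left out. The function names are hypothetical, and the `__system/submissionDate` and `__system/updatedAt` field names and the `.svc/Submissions` path are assumptions about Central's OData feed that should be checked against the API docs.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode


def _stamp(ts: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 UTC with milliseconds."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def build_odata_url(base_url, project_id, form_id, cursor=None):
    """Build the Submissions URL, filtered to rows newer than the cursor.

    Assumption: the feed lives at .../forms/{form_id}.svc/Submissions and
    accepts $filter on the __system timestamps.
    """
    url = f"{base_url}/v1/projects/{project_id}/forms/{quote(form_id, safe='')}.svc/Submissions"
    if cursor is None:
        return url  # first run: no state yet, so request everything
    stamp = _stamp(cursor)
    # Check both timestamps so that new submissions and later edits are picked up.
    flt = f"__system/submissionDate gt {stamp} or __system/updatedAt gt {stamp}"
    return url + "?" + urlencode({"$filter": flt}, safe="$/:", quote_via=quote)


def next_cursor(rows, previous=None):
    """Return the newest timestamp seen so far, to be stored for the next poll."""
    stamps = [previous] if previous is not None else []
    for row in rows:
        system = row.get("__system", {})
        for key in ("submissionDate", "updatedAt"):
            if system.get(key):  # updatedAt may be null for unedited rows
                stamps.append(datetime.fromisoformat(system[key].replace("Z", "+00:00")))
    return max(stamps) if stamps else None
```

The caller would persist the value returned by `next_cursor` between runs (a file or a database row), which is the state the polling client has to maintain.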

Push vs. pull

The idea with webhooks is that Central could push notifications to external systems instead. There are always tradeoffs between pushing and polling. Having Central push updates is attractive to external systems because it may save them bandwidth and complexity. However, that comes at the cost of additional complexity for Central. Implementing webhooks well requires having strategies for retrying failed calls, stopping gracefully in the case of misbehaving target services, logging attempts to contact target services, etc.

One way to mitigate the downsides of polling is to introduce a middleware service that polls Central from the same machine and then makes webhook calls to target remote services. That middleware component can handle business logic around e.g. what to do in case of failure. This can be a small custom service (with the downside that this requires software development) or something like OpenHIM (which may introduce more complexity than desired). There are also no or low code solutions for this like Zapier, If This Then That, Activepieces (open source), OpenFn.
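As a rough illustration of the failure handling such a middleware would own, here is a sketch of delivery with a bounded number of retries, exponential backoff, and a record of every attempt. All names are hypothetical and the retry policy (three attempts, doubling delay) is only an example, not a recommendation from the ODK team.

```python
import json
import time
import urllib.error
import urllib.request


def post_json(url, payload, timeout=10):
    """POST a JSON payload and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, just not with a 2xx/3xx


def deliver_with_retries(send, payload, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send(payload) until it returns a 2xx status or attempts run out.

    Returns (delivered, attempts), where attempts is a list of
    (attempt_number, outcome) tuples that can be written to a log.
    """
    attempts = []
    for attempt in range(1, max_attempts + 1):
        try:
            status = send(payload)
        except OSError as exc:  # connection refused, DNS failure, timeout, ...
            attempts.append((attempt, f"error: {exc}"))
        else:
            attempts.append((attempt, f"status: {status}"))
            if 200 <= status < 300:
                return True, attempts
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    return False, attempts
```

A caller might use it as `deliver_with_retries(lambda p: post_json(target_url, p), event)`, where `target_url` and `event` come from the polling step, and stop polling for a target after repeated failures.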

The tradeoffs between pushing and polling also depend on the specific needs. For example, if you have a dashboard that needs to be updated once a day, polling daily may be a strictly better choice. If your dataset is relatively small (something like <10,000 submissions x 200 questions), many efficiency considerations won't really matter.

How you can help

All of this said, for anyone interested in webhooks including @martijnvaandering and @ibrahim_i3 (from Adding webhooks to a form), it would be helpful to know the following:

What system would you like to integrate with?
What action on that system would you like to perform?
What Central activity would you like to use as a trigger? E.g. a new submission comes in? A new submission or edit comes in? A submission's approval status changes? Etc.
What information from Central would you like to get when the action is triggered (e.g. just the instanceID of the new submission, the submission's data, the submission's data and approval status, the submission's data and metadata, etc)
How close to real-time does that action actually need to be? (e.g. someone's life might be in danger if a Slack message is delayed by more than 5mins vs. a batch weekly email would be sufficient)
What is your fault tolerance? (e.g. I need most submissions to get to the target system but dropping a few here and there is not a big deal vs. someone's life might be in danger if submissions are dropped between systems)
What have you tried so far?

@martijnvaandering, you specifically mention PowerBI. I think the OData feed mentioned above is your best bet in that case. I don't believe PowerBI exposes a webhook endpoint.

For Teams, it would be helpful to know the answer to the questions above and specifically what Central activity you would like to use as a trigger and what ODK data you want to use within Teams.

For Slack, you could use a custom workflow to poll Central up to daily. If you need near-real-time Slack messaging, you could use a layer like I described above to poll Central and then make Slack webhook requests. I can think of at least one person doing this now and can ask them for more specifics if that would be helpful.

You also mentioned Enketo specifically and I wanted to address that. Enketo Express is a service built to work with servers that implement the OpenRosa API. If you don't want to use the OpenRosa API, you could fork Express to send to an arbitrary endpoint or build your own lightweight wrapper around Enketo Core. At this time, we are not interested in a contribution to Enketo Express to add submission to arbitrary endpoints.

tertek · November 24, 2023, 5:03pm

This topic has become very relevant for a current business case:
The user is required to send emails to participants based on participant (in)activity. More specifically, it is a requirement to send out emails (given an email address will be supplied) for these purposes:

confirm a participant's form submission
invite a participant to a form submission
remind a participant to fulfill a form submission

TL;DR / Background: Call our API endpoint when a submission comes in, at a specific point of time or periodically based on a condition AND submit submission data to be used for email sending on our end. We have to use our own server (and not something like Zapier) because there is no way to authenticate SMTP from outside of our network.

What system would you like to integrate with?

our custom API endpoint, to perform GET/POST requests

What action on that system would you like to perform?

read data, validate data and send emails based on that data

What Central activity would you like to use as a trigger? E.g. a new submission comes in? A new submission or edit comes in? A submission's approval status changes? Etc.

Case 1: a new submission for Form F comes in
Case 2: [Not a Central activity - time based] Define an explicit point of time, and a specified set of data with conditions, e.g. a collection of submissions from Form F that fulfill condition C
Case 3: [Not a Central activity - time based] Define a timer for a limited duration that checks periodically, at Interval T, whether a Condition C is true for each submission of Form F and returns a collection of those for which it is true. The Condition can be described with Comparison and Logical Operators referencing the current instance (and its fields). For example:

this_submission.some_field_name = false

What information from Central would you like to get when the action is triggered (e.g. just the instanceID of the new submission, the submission's data, the submission's data and approval status, the submission's data and metadata, etc)

Case 1: the data specified, e.g. some fields from the submission and the instanceId
Case 2 & 3: specified fields and instanceId of all submissions for which the condition is true

How close to real-time does that action actually need to be? (e.g. someone's life might be in danger if a Slack message is delayed by more than 5mins vs. a batch weekly email would be sufficient)

both cases: a delay of 5 mins seems legit, as long as the queue is immediately visible

What is your fault tolerance? (e.g. I need most submissions to get to the target system but dropping a few here and there is not a big deal vs. someone's life might be in danger if submissions are dropped between systems)

three retries seems legit till failure, as long as jobs get logged

What have you tried so far?

We are prototyping a script to send emails by periodically calling the ODK API, checking conditions on relevant forms and fields. A webhook-provided solution would in any case be more reliable, sustainable and independently usable for non-developers.

@aurdipas

spwoodcock · November 15, 2024, 4:40pm

Quick follow up to this.

I really need to utilise webhooks in my project to trigger an action when an Entity has its status key updated.

I am looking at developing a lightweight Postgres NOTIFY/LISTEN service that triggers a POST to a given webhook URL, upon update of an Entity. Most likely it will be written in Golang and deployable alongside Central as a single binary (standalone, or within a container).

Further details to track progress: https://github.com/hotosm/fmtm/issues/1841

Ideally I want to make this as generic as I possibly can to benefit other users in the community that need this functionality (including both submission and entity triggers).

The project probably won't be started for a good month, but I'll post updates here if any!

zizi · November 17, 2024, 11:41am

@spwoodcock not sure if you might have come across this blog post https://www.crunchydata.com/blog/real-time-database-events-with-pg_eventserv

spwoodcock · November 17, 2024, 1:30pm

Love this, thanks! Crunchy have so many awesome tools.

On first inspection I'm not sure it covers what I need though, for a few reasons:

It's more for interfacing directly with the client, to receive notifications. Instead I need to update a value in another application's database.
Having a persistent web socket is a bit heavyweight compared to a simple webhook call.

Not saying that pg_eventserv isn't useful in this context! It may be exactly what somebody needs. Say they need other users to be notified real time when another user makes a submission (like a manager being notified as data comes in). I would be keen to test this!

The application I develop already has real time notifications to users when data changes. However, I need to trigger an update to the data upon submission to ODK (via webhook).

Would be great to document both approaches somewhere!

mathieubossaert · November 18, 2024, 8:35am

Hi Sam, thanks a lot for it and for considering also submissions
I was looking for such a tool to automate data retrieval from Central to our database (using a pl/pgsql or pl/python function). As I didn't find one, I still use psql calls within cron tasks.
It works really fine but a pg_listen() approach would be even more efficient, only asking our server to work when necessary, and closer to a realtime workflow.

punkch · November 19, 2024, 11:58am

@spwoodcock I've skimmed through the linked GitHub issue, but don't rule out the polling option completely. Central has a brilliant audit log API that has info for just about everything that happens server-side. It supports start and end parameters, so if you store the last time a scheduled pull ran successfully, you can pass it as start for your next run and do it on a pretty aggressive schedule. Or, if it has to be a trigger, maybe make only one, on the audits table, for a more generic approach.
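A minimal sketch of that "store the last successful run and pass it as start" idea follows. It only builds the request URL and keeps the state in a small JSON file; the HTTP call and authentication are omitted. The helper names and the state file layout are hypothetical, and the `/v1/audits` path and the `action` parameter name are assumptions to verify against the Central API docs.

```python
import json
from pathlib import Path
from urllib.parse import urlencode


def audits_url(base_url, start=None, end=None, action=None):
    """Build an audit log request URL limited to a time window.

    Assumption: the audit log endpoint is /v1/audits and takes start, end
    and (optionally) action as query parameters.
    """
    pairs = (("action", action), ("start", start), ("end", end))
    params = {key: value for key, value in pairs if value}
    return f"{base_url}/v1/audits" + (f"?{urlencode(params)}" if params else "")


def load_last_run(state_file):
    """Return the timestamp of the last successful pull, or None on first run."""
    path = Path(state_file)
    if not path.exists():
        return None
    return json.loads(path.read_text())["last_run"]


def save_last_run(state_file, timestamp):
    """Record the end of the window, but only after the pull succeeded."""
    Path(state_file).write_text(json.dumps({"last_run": timestamp}))


# Intended flow for each scheduled run (fetching not shown):
#   now = current UTC time as an ISO 8601 string
#   url = audits_url(base_url, start=load_last_run("state.json"), end=now)
#   fetch url, process the events, then save_last_run("state.json", now)
```

Because the same timestamp ends one window and starts the next, an event on the boundary could show up twice or not at all depending on how the server treats the bounds, so the consumer should probably tolerate duplicates.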

spwoodcock · November 20, 2024, 9:01am

This post is gold! Thanks so much @punkch

I have honestly never even looked at the audit logs API - assumed from the name it wasn't useful to me. That was silly!

I think you are right that any trigger based webhook-calling service should be using the audit logs, configurable by audit log event types available

If webhooks-calling was ever integrated directly into Central, I guess this would be the place!

chun_hing_yap · November 27, 2024, 11:58pm

I am trying to understand in more depth the submission.create/update and entity.create/update actions in the audit log.

For entity create/update, it shows the dataset and the UUID of the entity, but not the project ID.

Submission create/update, on the other hand, only shows the instance ID. It lacks the form ID and project ID.

With this limited info, how can I get the submission and entity data when the API requires the project ID and form ID as input? Any ideas?

spwoodcock · November 28, 2024, 4:39am

It should be easy to get the entity or submission via its ID directly in the database tables (if creating a webhook in the way described, I am assuming direct database access):

Monitor audit logs in DB.
Get ID of form, submission, or entity from event.
Filter relevant table by ID to get details of form, submission, or entity.

I haven't looked at the data included in the audit logs yet though, so perhaps the data updated is included as part of the entry, making the above unnecessary.

If getting logs entirely by the API that's a different story!

I understand the API URL structure is aligning with REST, but as UUIDs are globally unique, it could be easy enough to have /entities/{uuid} to access the data without a project id. Maybe I am missing something

spwoodcock · January 20, 2025, 7:33pm

Just a quick note to say that work is in progress here:

The logic is almost there. I just need to finalise, test, and maybe add some API auth logic too. Hopefully done in a few days

The idea is to deploy this alongside ODK Central as a lightweight service, attached to the ODK Central database. It hooks into the audits table (audit log) using LISTEN/NOTIFY and then calls a remote webhook on Entity update and new submission creation.

odkhook -db {db_connection_string} -webhook {webhook_url}

I just wanted to check, is using the names odk-webhook and odkhook acceptable for this, or a conflict for the ODK branding?

I used these, as it's quite specific for ODK, intended to be used alongside Central. But I could rename if this is a problem

Sadiq_Khoja · January 22, 2025, 12:39am

Hi @spwoodcock

This is great work, thank you. Regarding the name, I will let our legal expert @yanokwa answer :).

I tried using the service and have a few suggestions/questions; they are probably already on your radar:

Is there a way to change the logging level to verbose or debug? When I ran the Docker container, it showed {"time":"2025-01-22T00:21:15.897236988Z","level":"INFO","source":{"file":"main.go","line":64},"msg":"listening to odk-events channel"} but my webhook was not called. Also, in the database, I don't see any new trigger on the audits table.
What should the webhook expect in the request body? Will it contain the submission/entity XML/JSON or just their UUID?
What do you think about something like https://github.com/Nextdoor/pg-bifrost, which uses PostgreSQL logical replication instead of LISTEN/NOTIFY and doesn't need any trigger to be created?

spwoodcock · January 22, 2025, 7:57am

Hi @Sadiq_Khoja, thanks for the feedback!

I was perhaps a little premature in posting, as it was just a non-functional proof of concept. I wanted to let people know progress is underway.

Yesterday I updated a lot that isn't committed yet, including extracting the entity and submission data for sending in the request body.

I will definitely check the logical replication approach you mention - thanks for the heads up!

I will update here soon

yanokwa · January 22, 2025, 10:41pm

Thanks so much for checking! Would central-webhook be acceptable? Then you avoid the issue altogether. The title could then be "Webhook for ODK Central"

spwoodcock · January 24, 2025, 2:48pm

Of course!

That's perfect, in line with the central-xxx naming of the ODK repos too.

I just updated it

Hoping to finalise and release today.
If it works nicely / people like it, I'm definitely happy to donate this to the ODK org (if it's useful and wanted!)

spwoodcock · January 29, 2025, 7:46pm

Apologies for the delay everyone - as always, something urgent gets in the way!

I made the first release of central-webhook just now:

All of the details should hopefully be in the repo, but please let me know if documentation needs updating anywhere / things aren't clear

A small example is included in the README too, for a FastAPI server webhook.

I will add some links here to the fully implemented version in our app soon, as it would be best to run this via a container orchestrator to better handle the service lifecycle.

There are a couple of issues to work on (would love to update to use the logical replication approach mentioned by @Sadiq_Khoja), but for the most part everything is working as intended!

Edit: I realised I need to update that ASCII art to say 'Central Webhook' now! Will do that soon.

There are limitations to this approach.
Currently only 3 events are supported:

New submissions

Submission review (submission update)

Entity update (update to entity property values)
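For readers wondering how a listener might restrict itself to these three events, here is a small sketch that maps audit log actions to webhook event types and drops everything else. This is not the central-webhook implementation: the action strings are my reading of Central's audit log, and the event type names and the output shape are hypothetical.

```python
# Assumed audit actions for the three supported events; verify against your
# Central version before relying on them.
SUPPORTED_ACTIONS = {
    "submission.create": "new_submission",
    "submission.update": "review_submission",
    "entity.update.version": "update_entity",
}


def to_webhook_event(audit_row):
    """Convert an audit row (a dict) into a webhook event, or None to skip it."""
    event_type = SUPPORTED_ACTIONS.get(audit_row.get("action"))
    if event_type is None:
        return None  # not one of the three supported events
    return {"type": event_type, "details": audit_row.get("details") or {}}
```

Supporting another audit event would then mostly be a matter of adding an entry to the mapping, which is part of why driving everything from the audit log is attractive.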

In future we could consider a full replication of the audit event stream into another database, allowing the ultimate flexibility to use any audit event we wish.

spwoodcock · January 30, 2025, 12:26am

As promised, here is a usage example, integrating the lightweight (10-20mb, low memory footprint) central-webhook container as part of a docker compose stack:

github.com/hotosm/fmtm

Integrate central-webhook service for triggering entity status updates in FMTM database (real-time updates)

development ← feat/webhook · opened by spwoodcock, 29 Jan 2025, 11:10PM UTC · +234 -56

Type of PR: Feature, Build or CI

Related issue: #1841

Describe this PR:
- Add the [central-webhook](https://github.com/hotosm/central-webhook) to local compose stack.
- Add a route POST `/integrations/webhooks/entity-status` to be triggered by the webhook and update the entity status in our db.
- Uses the new X-API-Key header form of login from the integrations router.

TODO in the next PR: add this to prod compose setup, including environment variables config everywhere.

Review guide:
- Update the status of an entity by mapping it in ODK Collect.
- The building should turn green automatically (without needing to 'sync status').

There is also some code showing the usage of the webhook to call a FastAPI Python endpoint, using an API key
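For anyone not using FastAPI, the receiving side can be sketched with the standard library alone. The example below is not the FMTM code: it is a framework-agnostic handler that checks an `X-API-Key` header and hands the decoded event to a callback, with all function names and response bodies being hypothetical.

```python
import hmac
import json


def is_authorised(headers, expected_key):
    """Check the X-API-Key header (case-insensitive) against the expected key."""
    supplied = next((v for k, v in headers.items() if k.lower() == "x-api-key"), "")
    # compare_digest avoids leaking how many leading characters matched.
    return bool(expected_key) and hmac.compare_digest(supplied.encode(), expected_key.encode())


def handle_entity_status(headers, body, expected_key, update_status):
    """Validate a webhook call and pass the decoded event to update_status.

    Returns an (http_status, response_body) pair for the web framework to send.
    """
    if not is_authorised(headers, expected_key):
        return 401, {"detail": "invalid API key"}
    try:
        event = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400, {"detail": "body is not valid JSON"}
    update_status(event)  # e.g. write the new entity status to the app database
    return 200, {"detail": "ok"}
```

Keeping the key check and parsing separate from the web framework makes the handler easy to test without a running server.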

Edit 12-02-2025
^^^^^^^^^^^^^^^^^^^^^
Reporting back, this has been working pretty well for our use case on our staging server, no issues so far

Considering moving from release candidate --> full stable release.

Once further testing is carried out, and the stable release made, what would the ODK team say to adding this to docs as an optional extra during Central setup?