Webhooks in ODK Central

1. What is the general goal of the feature?

Making integrations possible by creating webhooks.

2. What are some example use cases for this feature?

Automating incoming requests and triggering automated processes, like forwarding items into PowerBI/Teams/Slack, you name it.

3. What can you contribute to making this feature a reality?

I'd like to build this into ODK/Enketo, but I'd like some guidance from an architect on where to code this.

Cheers, Martijn

Thanks for writing this up @martijnvaandering and for the interest in possibly implementing webhook support. The first step will be to do some requirements gathering to really understand the workflows that you and others want to support and to work out where webhooks would fit in relation to the data sharing options that currently exist.

Here are solutions that people currently use for data sharing from Central to other systems:

  • OData connection from PowerBI/Excel/Tableau. This can be configured entirely through those tools' UIs including basic auth. Refresh is done via polling either on demand or periodically. The full data document is downloaded each time. See docs
  • JSON request (OData document). A script or service can request this periodically, similarly to PowerBI/Excel/Tableau above. It can either ignore types or get schema information from the OData metadata document (as OData consumers do), from the simplified schema endpoint, from the XForm, or from the XLSForm.
  • JSON request using a $filter on updatedAt to only request new submissions. Similar to above but requires maintaining state. E.g. pyODK: Using cursors to efficiently pull new data only
  • CSV request. Similar to JSON above, can also be filtered, but usually a less convenient format than JSON.
  • Raw XML submission request. Usually this will not be as convenient as the methods listed above for building integrations.
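To make the filtered JSON option above concrete, here is a minimal, stdlib-only sketch of a polling script. The server URL, project id, form id, and token are hypothetical placeholders, and the `.svc/Submissions` path and `__system/updatedAt` filter field reflect my reading of the Central OData API, so verify them against the docs for your version:

```python
# Hedged sketch of polling Central's OData feed with a $filter cursor.
# BASE_URL, PROJECT_ID, and FORM_ID are hypothetical placeholders.
import json
import urllib.parse
import urllib.request

BASE_URL = "https://central.example.com"  # hypothetical server
PROJECT_ID = 1                            # hypothetical project
FORM_ID = "my_form"                       # hypothetical form

def submissions_url(base, project_id, form_id):
    # Central exposes submissions as an OData feed under .svc/Submissions
    return f"{base}/v1/projects/{project_id}/forms/{form_id}.svc/Submissions"

def updated_since_filter(iso_timestamp):
    # OData $filter clause keeping only rows changed after the stored cursor
    return f"__system/updatedAt gt {iso_timestamp}"

def poll(token, last_seen):
    query = urllib.parse.urlencode({"$filter": updated_since_filter(last_seen)})
    req = urllib.request.Request(
        submissions_url(BASE_URL, PROJECT_ID, FORM_ID) + "?" + query,
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["value"]  # list of submission rows
```

The caller is responsible for persisting `last_seen` between runs, which is exactly the "maintaining state" cost mentioned above.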

The common thread across all of the above is that they require some system to poll Central.

Push vs. pull

The idea with webhooks is that Central could push notifications to external systems instead. There are always tradeoffs between pushing and polling. Having Central push updates is attractive to external systems because it may save them bandwidth and complexity. However, that comes at the cost of additional complexity for Central. Implementing webhooks well requires having strategies for retrying failed calls, stopping gracefully in the case of misbehaving target services, logging attempts to contact target services, etc.

One way to mitigate the downsides of polling is to introduce a middleware service that polls Central from the same machine and then makes webhook calls to target remote services. That middleware component can handle business logic around e.g. what to do in case of failure. This can be a small custom service (with the downside that it requires software development) or something like OpenHIM (which may introduce more complexity than desired). There are also no-code or low-code solutions for this like Zapier, If This Then That, Activepieces (open source), and OpenFn.
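As a rough illustration of the failure handling such a middleware needs, here is a hedged sketch of a retry-with-backoff forwarder. The target URL would be whatever the remote service exposes; a real service would also log every attempt and stop calling a target that keeps misbehaving:

```python
# Sketch of the retry/backoff piece of a middleware forwarder (stdlib only).
import json
import time
import urllib.request

def backoff_delays(attempts, base=2.0):
    # Delays to sleep between retries: 1s, 2s, 4s, ... for base=2.0
    return [base ** i for i in range(attempts - 1)]

def post_with_retries(url, payload, attempts=3):
    # POST JSON to a target webhook, retrying on network/HTTP errors
    data = json.dumps(payload).encode("utf-8")
    delays = backoff_delays(attempts)
    for attempt in range(attempts):
        try:
            req = urllib.request.Request(
                url, data=data, headers={"Content-Type": "application/json"}
            )
            with urllib.request.urlopen(req, timeout=10) as resp:
                return resp.status
        except OSError:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(delays[attempt])
```

This is only the delivery half; the polling half feeds it payloads from Central.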

The tradeoffs between pushing and polling also depend on the specific needs. For example, if you have a dashboard that needs to be updated once a day, polling daily may be a strictly better choice. If your dataset is relatively small (something like <10,000 submissions x 200 questions), many efficiency considerations won't really matter.

How you can help

All of this said, for anyone interested in webhooks including @martijnvaandering and @ibrahim_i3 (from Adding webhooks to a form), it would be helpful to know the following:

  • What system would you like to integrate with?
  • What action on that system would you like to perform?
  • What Central activity would you like to use as a trigger? E.g. a new submission comes in? A new submission or edit comes in? A submission's approval status changes? Etc.
  • What information from Central would you like to get when the action is triggered (e.g. just the instanceID of the new submission, the submission's data, the submission's data and approval status, the submission's data and metadata, etc)
  • How close to real-time does that action actually need to be? (e.g. someone's life might be in danger if a Slack message is delayed by more than 5mins vs. a batch weekly email would be sufficient)
  • What is your fault tolerance? (e.g. I need most submissions to get to the target system but dropping a few here and there is not a big deal vs. someone's life might be in danger if submissions are dropped between systems)
  • What have you tried so far?

@martijnvaandering, you specifically mention PowerBI. I think the OData feed mentioned above is your best bet in that case. I don't believe PowerBI exposes a webhook endpoint.

For Teams, it would be helpful to know the answer to the questions above and specifically what Central activity you would like to use as a trigger and what ODK data you want to use within Teams.

For Slack, you could use a custom workflow to poll Central up to daily. If you need near-real-time Slack messaging, you could use a layer like I described above to poll Central and then make Slack webhook requests. I can think of at least one person doing this now and can ask them for more specifics if that would be helpful.
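To make the Slack path concrete, the forwarding step could be sketched like this. The webhook URL is a placeholder you would generate in Slack; the JSON body with a `text` field is the standard incoming-webhook format as I understand it:

```python
# Sketch: turn newly polled submissions into a Slack incoming-webhook call.
# The webhook URL passed in is a placeholder generated in Slack's UI.
import json
import urllib.request

def summarize(submissions):
    # Slack incoming webhooks accept a JSON body with a "text" field
    return {"text": f"{len(submissions)} new submission(s) received in Central"}

def notify_slack(webhook_url, submissions):
    data = json.dumps(summarize(submissions)).encode("utf-8")
    req = urllib.request.Request(
        webhook_url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```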

You also mentioned Enketo specifically and I wanted to address that. Enketo Express is a service built to work with servers that implement the OpenRosa API. If you don't want to use the OpenRosa API, you could fork Express to send to an arbitrary endpoint or build your own lightweight wrapper around Enketo Core. At this time, we are not interested in a contribution to Enketo Express to add submission to arbitrary endpoints.

5 Likes

This topic got very relevant for a current business case:
The user is required to send emails to participants based on participant (in)activity. More specifically, it is a requirement to send out emails (given an email address will be supplied) for these purposes:

  • confirm a participant's form submission
  • invite a participant to a form submission
  • remind a participant to fulfill a form submission

TL;DR / Background: Call our API endpoint when a submission comes in, at a specific point in time, or periodically based on a condition, AND submit submission data to be used for email sending on our end. We have to use our own server (and not something like Zapier) because there is no way to authenticate SMTP from outside of our network.

What system would you like to integrate with?

  • our custom API endpoint, to perform GET/POST requests

What action on that system would you like to perform?

  • read data, validate data and send emails based on that data

What Central activity would you like to use as a trigger? E.g. a new submission comes in? A new submission or edit comes in? A submission's approval status changes? Etc.

  • Case 1: a new submission for Form F comes in
  • Case 2: [Not a Central activity - time based] Define an explicit point of time, and a specified set of data with conditions, e.g. a collection of submissions from Form F that fulfill condition C
  • Case 3: [Not a Central activity - time based] Define a timer for a limited duration that checks periodically at Interval T whether a Condition C is true for each submission of Form F, and returns a collection of those that are true. The Condition can be described with comparison and logical operators referencing the current instance (and its fields). For example:

this_submission.some_field_name = false

What information from Central would you like to get when the action is triggered (e.g. just the instanceID of the new submission, the submission's data, the submission's data and approval status, the submission's data and metadata, etc)

  • Case 1: the data specified, e.g. some fields from the submission and the instanceId
  • Case 2 & 3: specified fields and instanceId of all submissions whose condition is true

How close to real-time does that action actually need to be? (e.g. someone's life might be in danger if a Slack message is delayed by more than 5mins vs. a batch weekly email would be sufficient)

  • both cases: a delay of 5 mins seems acceptable, as long as the queue is immediately visible

What is your fault tolerance? (e.g. I need most submissions to get to the target system but dropping a few here and there is not a big deal vs. someone's life might be in danger if submissions are dropped between systems)

  • three retries until failure seems reasonable, as long as jobs get logged

What have you tried so far?

  • We are prototyping a script to send emails by periodically calling the ODK API, checking conditions on relevant forms and fields. A webhook-provided solution would in any case be more reliable, more sustainable, and usable independently by non-developers.
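The condition-checking core of such a script could be sketched roughly as follows. The field names are placeholders, and real code would fetch submissions from the Central API and send mail via smtplib against the internal SMTP server:

```python
# Sketch of the condition/reminder logic described above.
# Field names are hypothetical placeholders.
from email.message import EmailMessage

def condition_met(submission, field, expected):
    # e.g. condition_met(sub, "some_field_name", "false") mirrors the
    # this_submission.some_field_name = false example above
    return submission.get(field) == expected

def pending_reminders(submissions, field, expected):
    # Collect submissions whose condition is true (Cases 2 and 3)
    return [s for s in submissions if condition_met(s, field, expected)]

def build_reminder(to_addr, instance_id):
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = "Reminder: please complete your form"
    msg.set_content(f"This is a reminder for submission {instance_id}.")
    return msg
```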

@aurdipas

1 Like

Quick follow up to this.

I really need to utilise webhooks in my project to trigger an action when an Entity has its status key updated.

I am looking at developing a lightweight Postgres NOTIFY/LISTEN service that triggers a POST to a given webhook URL, upon update of an Entity. Most likely it will be written in Golang and deployable alongside Central as a single binary (standalone, or within a container).
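Roughly, the idea could be sketched like this (in Python here rather than Go, just for brevity). The table and column names in the trigger are assumptions about Central's database schema and should be verified before use:

```python
# Sketch of the NOTIFY/LISTEN idea. Assumes an `entities` table with a
# `uuid` column in Central's database -- verify against the real schema.
import json

ENTITY_NOTIFY_SQL = """
CREATE OR REPLACE FUNCTION notify_entity_update() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('entity_events',
                    json_build_object('uuid', NEW.uuid)::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER entity_update_notify
AFTER UPDATE ON entities
FOR EACH ROW EXECUTE FUNCTION notify_entity_update();
"""

def parse_event(payload):
    # Decode the JSON payload delivered by LISTEN entity_events;
    # a listener process would then POST this to the webhook URL.
    return json.loads(payload)
```

A long-running listener process would execute `LISTEN entity_events`, decode each payload with something like `parse_event`, and make the outbound webhook call.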

Further details to track progress: https://github.com/hotosm/fmtm/issues/1841

Ideally I want to make this as generic as I possibly can to benefit other users in the community that need this functionality (including both submission and entity triggers).

The project probably won't be started for a good month, but I'll post updates here if any!

4 Likes

@spwoodcock not sure if you might have come across this blog post https://www.crunchydata.com/blog/real-time-database-events-with-pg_eventserv

2 Likes

Love this! Thanks :pray: Crunchy have so many awesome tools :grin:

On first inspection I'm not sure it covers what I need though, for a few reasons:

  • It's more for interfacing directly with the client to receive notifications. Instead, I need to update a value in another application's database.

  • Having a persistent web socket is a bit heavyweight compared to a simple webhook call.

Not saying that pg_eventserv isn't useful in this context! It may be exactly what somebody needs, say, when other users should be notified in real time as another user makes a submission (like a manager being notified as data comes in). I would be keen to test this!

The application I develop already has real time notifications to users when data changes. However, I need to trigger an update to the data upon submission to ODK (via webhook).

Would be great to document both approaches somewhere!

1 Like

Hi Sam, thanks a lot for it and for considering submissions too :grinning:
I was looking for such a tool to automate data retrieval from Central to our database (using a pl/pgsql or pl/python function). As I didn't find one, I still use psql calls within cron tasks.
It works fine, but a pg_listen() approach would be even more efficient, only asking our server to work when necessary, and closer to a real-time workflow.

2 Likes

@spwoodcock I've skimmed through the linked github issue, but don't rule out the polling option completely. Central has a brilliant audit log API that has info for just about everything that happens server-side. It supports start and end parameters, so if you store the last time a scheduled pull ran successfully, you can pass it as start for your next run and do it on a pretty aggressive schedule. Or if it has to be a trigger, maybe make only one on the audits table for a more generic approach.
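To illustrate the cursor idea, a small sketch. The `/v1/audits` endpoint with `action` and `start` parameters, and the `loggedAt` field on audit entries, reflect my reading of the Central API, so check them against the docs:

```python
# Sketch of an audit-log polling cursor (stdlib only).
# Endpoint and field names follow my reading of the Central audits API.
import urllib.parse

def audits_url(base, action, start):
    # e.g. action="submission.create", start=timestamp of last good run
    query = urllib.parse.urlencode({"action": action, "start": start})
    return f"{base}/v1/audits?{query}"

def next_cursor(events, previous):
    # Advance the cursor to the newest loggedAt seen, else keep the old one
    return max((e["loggedAt"] for e in events), default=previous)
```

Each successful run fetches `audits_url(...)`, processes the events, then persists `next_cursor(events, previous)` as the `start` for the next run.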

3 Likes

This post is gold! Thanks so much @punkch :pray:

I have honestly never even looked at the audit logs API. I assumed from the name it wasn't useful to me. That was silly!

I think you are right that any trigger based webhook-calling service should be using the audit logs, configurable by audit log event types available :+1:

If webhooks-calling was ever integrated directly into Central, I guess this would be the place!

3 Likes

I'm just trying to understand this more deeply, in particular for the submission.create/update and entity.create/update actions in the audit log.

For entity create/update, it does show the dataset and the uuid of the entity, but it lacks the project id.

On the other hand, submission create/update only shows the instance id. It lacks the form id and project id.

With this limited info, how can we get the submission and entity data when the API requires project id and form id as input? Any ideas?

1 Like

It should be easy to get the entity or submission via its ID directly in the database tables (if creating a webhook in the way described, I am assuming direct database access):

  1. Monitor audit logs in DB.
  2. Get ID of form, submission, or entity from event.
  3. Filter relevant table by ID to get details of form, submission, or entity.
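The steps above could be sketched as a small dispatch table. The table and column names here are assumptions about Central's schema, and the exact audit action names should be checked against the API docs:

```python
# Sketch of step 3: map an audit action to the table/column to query.
# Table and column names are assumptions about Central's schema.
ACTION_TABLES = {
    "submission.create": ("submissions", "instance_id"),
    "entity.create": ("entities", "uuid"),
    "entity.update.version": ("entities", "uuid"),
}

def lookup_query(action):
    table, column = ACTION_TABLES[action]
    # Parameterised query to fetch the full record for the event's ID
    return f"SELECT * FROM {table} WHERE {column} = %s"
```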

I haven't looked at the data included in the audit logs yet though, so perhaps the updated data is included as part of the entry, making the above unnecessary.

If getting logs entirely via the API, that's a different story!

I understand the API URL structure aligns with REST, but as UUIDs are globally unique, it could be easy enough to have /entities/{uuid} to access the data without a project id. Maybe I am missing something :sweat_smile:

1 Like