Webhooks in ODK Central

1. What is the general goal of the feature?

Making integrations possible by creating webhooks.

2. What are some example use cases for this feature?

Automating the handling of incoming requests and triggering automated processes, like forwarding the items into PowerBI/Teams/Slack, you name it.

3. What can you contribute to making this feature a reality?

I'd like to build this into ODK/Enketo, but I'd like some guidance from an architect on where to code this.

Cheers, Martijn

Thanks for writing this up @martijnvaandering and for the interest in possibly implementing webhook support. The first step will be to do some requirements gathering to really understand the workflows that you and others want to support and to work out where webhooks would fit in relation to the data sharing options that currently exist.

Here are solutions that people currently use for data sharing from Central to other systems:

  • OData connection from PowerBI/Excel/Tableau. This can be configured entirely through those tools' UIs including basic auth. Refresh is done via polling either on demand or periodically. The full data document is downloaded each time. See docs
  • JSON request (OData document). A script or service can request this periodically, similarly to PowerBI/Excel/Tableau above. It can either ignore types or get schema information from the OData metadata document (as OData consumers do), from the simplified schema endpoint, from the XForm, or from the XLSForm.
  • JSON request using a $filter on updatedAt to only request new submissions. Similar to above but requires maintaining state. E.g. pyODK: Using cursors to efficiently pull new data only
  • CSV request. Similar to JSON above, can also be filtered, but usually a less convenient format than JSON.
  • Raw XML submission request. Usually this will not be as convenient as the methods listed above for building integrations.

The common thread with all of these above is that they require some system to poll Central.
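
To make the filtered JSON option above a bit more concrete, here is a minimal sketch of building a cursor-based request URL and advancing the cursor afterwards. It is not taken from pyODK. The function names are made up for illustration, and the URL layout and the `__system/updatedAt` filter field are assumptions that should be checked against the Central API docs for your version.

```python
from datetime import datetime, timezone
from urllib.parse import quote


def build_filtered_url(base_url, project_id, form_id, cursor,
                       field="__system/updatedAt"):
    """Build an OData Submissions URL asking only for rows newer than `cursor`.

    Assumption: the path layout and the filter field name follow Central's
    OData endpoint; verify both against your server before relying on them.
    """
    # Render the cursor as an ISO 8601 UTC timestamp with millisecond precision.
    stamp = cursor.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    odata_filter = f"{field} gt {stamp}"
    return (
        f"{base_url}/v1/projects/{project_id}/forms/"
        f"{quote(form_id, safe='')}.svc/Submissions"
        f"?$filter={quote(odata_filter)}"
    )


def advance_cursor(cursor, rows, field="updatedAt"):
    """Return the latest timestamp seen in `rows`, or `cursor` if none is newer."""
    latest = cursor
    for row in rows:
        raw = row.get("__system", {}).get(field)
        if raw is None:
            continue  # e.g. a submission that has never been edited
        seen = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        latest = max(latest, seen)
    return latest
```

The state that has to be maintained between runs is just the cursor value, for example in a small file or database row. Note that a never-edited submission may have no `updatedAt` value, so a real script may need to filter on the submission date as well.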

Push vs. pull

The idea with webhooks is that Central could push notifications to external systems instead. There are always tradeoffs between pushing and polling. Having Central push updates is attractive to external systems because it may save them bandwidth and complexity. However, that comes at the cost of additional complexity for Central. Implementing webhooks well requires having strategies for retrying failed calls, stopping gracefully in the case of misbehaving target services, logging attempts to contact target services, etc.
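
To give a feel for the extra complexity mentioned above, here is a minimal sketch of just the retry and logging part of a webhook sender. It is not how Central works or would work. `deliver_with_retries` and its parameters are hypothetical, and `send` stands in for whatever function actually makes the HTTP call.

```python
import logging
import time

log = logging.getLogger("webhook")


def deliver_with_retries(send, payload, max_attempts=3, base_delay=1.0,
                         sleep=time.sleep):
    """Call send(payload) up to max_attempts times; return True on success.

    `send` is a hypothetical callable that raises on failure. The delay
    doubles after each failed attempt (exponential backoff).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send(payload)
            log.info("delivered on attempt %d", attempt)
            return True
        except Exception as exc:
            log.warning("attempt %d failed: %s", attempt, exc)
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    # Stop gracefully instead of hammering a misbehaving target forever.
    log.error("giving up after %d attempts", max_attempts)
    return False
```

Even this small piece leaves open questions a real implementation would have to answer, such as where failed payloads are kept and who gets told about them.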

One way to mitigate the downsides of polling is to introduce a middleware service that polls Central from the same machine and then makes webhook calls to target remote services. That middleware component can handle business logic around e.g. what to do in case of failure. This can be a small custom service (with the downside that this requires software development) or something like OpenHIM (which may introduce more complexity than desired). There are also no-code or low-code solutions for this like Zapier, If This Then That, Activepieces (open source), and OpenFn.
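
As a rough illustration of the small custom service option, one polling cycle of such a middleware could look like the sketch below. Everything here is hypothetical: `fetch_since` stands in for a call to Central, `forward` for the webhook call to the target service, and the `updated` key for whatever timestamp you use as a cursor.

```python
def poll_once(fetch_since, forward, cursor):
    """Run one polling cycle and return the new cursor.

    fetch_since(cursor) -> iterable of rows newer than `cursor` (hypothetical)
    forward(row)        -> True if the target service accepted the row
    """
    # Process rows oldest first so the cursor only ever moves forward.
    rows = sorted(fetch_since(cursor), key=lambda row: row["updated"])
    for row in rows:
        if not forward(row):
            # Leave the cursor before the failed row; it is retried next cycle.
            break
        cursor = row["updated"]
    return cursor
```

A scheduler (cron, a systemd timer, or a simple loop with a sleep) would call this at whatever interval matches how close to real-time the integration needs to be.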

The tradeoffs between pushing and polling also depend on the specific needs. For example, if you have a dashboard that needs to be updated once a day, polling daily may be a strictly better choice. If your dataset is relatively small (something like <10,000 submissions x 200 questions), many efficiency considerations won't really matter.

How you can help

All of this said, for anyone interested in webhooks including @martijnvaandering and @ibrahim_i3 (from Adding webhooks to a form), it would be helpful to know the following:

  • What system would you like to integrate with?
  • What action on that system would you like to perform?
  • What Central activity would you like to use as a trigger? E.g. a new submission comes in? A new submission or edit comes in? A submission's approval status changes? Etc.
  • What information from Central would you like to get when the action is triggered (e.g. just the instanceID of the new submission, the submission's data, the submission's data and approval status, the submission's data and metadata, etc.)?
  • How close to real-time does that action actually need to be? (e.g. someone's life might be in danger if a Slack message is delayed by more than 5mins vs. a batch weekly email would be sufficient)
  • What is your fault tolerance? (e.g. I need most submissions to get to the target system but dropping a few here and there is not a big deal vs. someone's life might be in danger if submissions are dropped between systems)
  • What have you tried so far?

@martijnvaandering, you specifically mention PowerBI. I think the OData feed mentioned above is your best bet in that case. I don't believe PowerBI exposes a webhook endpoint.

For Teams, it would be helpful to know the answer to the questions above and specifically what Central activity you would like to use as a trigger and what ODK data you want to use within Teams.

For Slack, you could use a custom workflow to poll Central up to daily. If you need near-real-time Slack messaging, you could use a layer like I described above to poll Central and then make Slack webhook requests. I can think of at least one person doing this now and can ask them for more specifics if that would be helpful.
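
For the Slack side of that layer, a minimal sketch of preparing an incoming-webhook request is below. Slack incoming webhooks accept a JSON body with a `text` field. The function name, the submission keys, and the message wording are made up for illustration, and the webhook URL is whatever Slack gives you when you set one up.

```python
import json
import urllib.request


def build_slack_request(webhook_url, submission):
    """Prepare (but do not send) a POST to a Slack incoming webhook.

    `submission` is a hypothetical dict with "instanceId" and "form" keys.
    """
    text = f"New submission {submission['instanceId']} for form {submission['form']}"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then a matter of passing the result to `urllib.request.urlopen` and handling errors according to your fault tolerance.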

You also mentioned Enketo specifically and I wanted to address that. Enketo Express is a service built to work with servers that implement the OpenRosa API. If you don't want to use the OpenRosa API, you could fork Express to send to an arbitrary endpoint or build your own lightweight wrapper around Enketo Core. At this time, we are not interested in a contribution to Enketo Express to add submission to arbitrary endpoints.

This topic has become very relevant for a current business case:
The user is required to send emails to participants based on participant (in)activity. More specifically, emails must be sent (provided an email address has been supplied) for these purposes:

  • confirm a participant's form submission
  • invite a participant to a form submission
  • remind a participant to complete a form submission

TL;DR / Background: Call our API endpoint when a submission comes in, at a specific point in time, or periodically based on a condition, AND submit submission data to be used for email sending on our end. We have to use our own server (and not something like Zapier) because there is no way to authenticate SMTP from outside of our network.

What system would you like to integrate with?

  • our custom API endpoint, to perform GET/POST requests

What action on that system would you like to perform?

  • read data, validate data and send emails based on that data

What Central activity would you like to use as a trigger? E.g. a new submission comes in? A new submission or edit comes in? A submission's approval status changes? Etc.

  • Case 1: a new submission for Form F comes in
  • Case 2: [Not a Central activity - time based] Define an explicit point in time and a specified set of data with conditions, e.g. a collection of submissions from Form F that fulfill condition C
  • Case 3: [Not a Central activity - time based] Define a timer for a limited duration that checks periodically, at interval T, whether a condition C is true for each submission of Form F and returns the collection of those for which it is. The condition can be expressed with comparison and logical operators that reference the current instance (and its fields). For example:

this_submission.some_field_name = false
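
In script form, a check like this could be sketched as follows. This is only an illustration under assumptions: submissions are plain dicts, the condition is an ordinary Python function, and the field names (`some_field_name`, `instanceId`, `email`) are placeholders.

```python
def matching_submissions(submissions, condition, fields):
    """Return the requested fields of every submission for which condition is true."""
    selected = []
    for submission in submissions:
        if condition(submission):
            selected.append({name: submission.get(name) for name in fields})
    return selected


# Condition C from the example above: some_field_name is exactly false.
def some_field_is_false(submission):
    return submission.get("some_field_name") is False
```

Calling `matching_submissions(rows, some_field_is_false, ["instanceId", "email"])` would then give the collection described in Case 2 and Case 3.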

What information from Central would you like to get when the action is triggered (e.g. just the instanceID of the new submission, the submission's data, the submission's data and approval status, the submission's data and metadata, etc.)?

  • Case 1: the data specified, e.g. some fields from the submission and the instanceId
  • Case 2 & 3: specified fields and instanceId of all submissions for which the condition is true

How close to real-time does that action actually need to be? (e.g. someone's life might be in danger if a Slack message is delayed by more than 5mins vs. a batch weekly email would be sufficient)

  • both cases: a delay of 5 mins seems legit, as long as the queue is immediately visible

What is your fault tolerance? (e.g. I need most submissions to get to the target system but dropping a few here and there is not a big deal vs. someone's life might be in danger if submissions are dropped between systems)

  • three retries before a job is marked as failed seems reasonable, as long as jobs get logged

What have you tried so far?

  • We are prototyping a script to send emails by periodically calling the ODK API and checking conditions on relevant forms and fields. A webhook-based solution would in any case be more reliable, more sustainable, and usable independently by non-developers.

@aurdipas
