ODK Drupal Integration

I need to persist data collected via ODK Collect in a permanent, searchable repository. By searchable I mean categorizing and tagging the data, then allowing filtering, sorting, etc. on a front end. I've been considering Drupal, since we already use it for other content in the organization. I understand I can build this myself with the Central and Drupal APIs and Drupal modules. Does anyone have experience or lessons learned with this integration that they can share or point me to?

I've seen this old post (10+ years) already.
Thank you!

For everyone's benefit, here is the complete workflow we are attempting. Given the complexity of the forms and the usage environment, this is a challenging project; I will of course share how it went when we are done!

  1. ODK Collect users from the field submit data to Central
  2. With pyODK, export the data from Central and push into Drupal content types via Drupal's APIs. This will be a scheduled automated backend job.
  3. The ODK Collect user logs in to Drupal on a desktop computer, fills in certain open-ended, long-text questions, and then submits the data in Drupal. These questions have descriptive responses that are hard to type on the ODK Collect mobile form screen, so users instead record their detailed observations as audio in Collect and later type the long-form response while listening to the recording. This step and the rest of the workflow below happen within Drupal.
  4. The backend team reviews the data and, if corrections are needed, "sends" it back to the submitter with suitable comments.
  5. The submitter logs in to Drupal, corrects/updates the data in Drupal, and resubmits.
  6. Summary reports, content discovery, drilldown, etc. happen within Drupal by back-office users.
    Note: We cannot auto-transcribe the users' voice recordings because Google's transcription and other services perform quite poorly with semi-urban and rural accents. This is in India, across multiple Indian languages.
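
For anyone curious what step 2 might look like, here is a minimal sketch, not a tested implementation. It assumes pyODK is configured through its usual config file, that Drupal core's JSON:API module is enabled with write operations allowed, and that HTTP Basic authentication is available on the Drupal side. The content type `field_observation`, the `field_*` names, and the form question `observation_audio` are hypothetical placeholders, and the Drupal fields are assumed to be plain text fields.

```python
import json


def submission_to_jsonapi(row, bundle="field_observation"):
    """Map one Central OData submission row to a Drupal JSON:API node payload.

    `bundle` and every `field_*` name are hypothetical; replace them with the
    machine names of your own content type and fields.
    """
    system = row.get("__system", {})
    return {
        "data": {
            "type": f"node--{bundle}",
            "attributes": {
                "title": f"ODK submission {row['__id']}",
                # Keep the Central instance ID so later runs can detect duplicates.
                "field_odk_instance_id": row["__id"],
                "field_submitter_name": system.get("submitterName"),
                # Hypothetical form question holding the audio attachment filename.
                "field_audio_filename": row.get("observation_audio"),
            },
        }
    }


def push_submissions(project_id, form_id, drupal_url, auth,
                     bundle="field_observation"):
    """Export all submissions of one form and create one Drupal node per row."""
    # Third-party imports are kept local so the mapping above can be tested alone.
    import requests
    from pyodk.client import Client

    with Client() as client:
        table = client.submissions.get_table(form_id=form_id,
                                             project_id=project_id)

    headers = {
        "Content-Type": "application/vnd.api+json",
        "Accept": "application/vnd.api+json",
    }
    for row in table["value"]:
        response = requests.post(
            f"{drupal_url}/jsonapi/node/{bundle}",
            data=json.dumps(submission_to_jsonapi(row, bundle)),
            headers=headers,
            auth=auth,  # (username, password) tuple, assuming Basic auth
            timeout=30,
        )
        response.raise_for_status()
```

A scheduled job could then call something like `push_submissions(1, "my_form", "https://drupal.example.org", ("api_user", "secret"))`, where all four values are placeholders. This sketch does not check whether a submission was already pushed, so a real job would first need to look up the stored instance ID in Drupal or filter the export by submission date.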

Hi @odkchandra,

Welcome to the community! Please take a moment to introduce yourself on the https://forum.getodk.org/t/introduce-yourself-here/ topic.

Some time ago @LN posted this Python script that updates a form's submissions with an AI-generated transcript of their audio attachment(s). It covers some of the topics on your list and also offers the opportunity to transcribe the audio files automatically instead of having dedicated staff do so.

Using the same approach, you could also send these transcriptions to other AI models for deeper automated analysis. :wink:

Sure, thanks for the suggestion! We did consider that, but none of the AI models out there transcribe the rural Indian language accents we work with well enough for automation to be efficient. We might still decide to transcribe and use the result on a best-effort basis. So, thanks for the reference, I'll take a look. It seems useful!

I've been working on this transcription task lately as well, and honestly, I'm in the same boat as you - here in India, native accents can be really tough for AI to pick up accurately. I won't sugarcoat it, it’s seriously frustrating trying to get clean, accurate transcripts.

Out of all the AI tools I've tested so far, only Otter AI has managed to give me around 60-70% accuracy with these regional accents. My current plan is to run the initial transcription through Otter, and then layer another AI model on top to refine it further - hopefully pushing the overall accuracy closer to 80-90%.
