Ongoing ODK sync with our Azure SQL databases via googlesheets and R!

[Note: I'm presenting what I've done in case it helps someone... but please jump to the end to see my areas where I'm inviting help from the community]

Hi all! So some time ago I mentioned that I was embarking upon the daunting task of setting up a system through which we can create ODK forms DIRECTLY from our cloud dbs. Well we've done it! Here's a description:


  1. Sometimes people would create the ODK form first and then worry about the database, which could be a problem in that not all concepts translate cleanly to relational databases (think multiple-selects, repeats, etc).
  2. Maybe we do the same survey the following year, but we change the schema in odk and then it's annoying to change the database to match the new schema (and then don't get around to uploading them)
  3. Since I can get an excel out of ODK... I don't see the value in uploading my precious impact data to the database... which of course means that we can't use it to do some deep impact analysis or data science.
  4. It's very tricky to keep validations, default values, etc syncronized
  5. Sometimes feedback gets lost between different versions of the excel file, difficult/time-consuming to recover.

Use R to create a chain whereby:

  1. ALL form design is done on the cloud db itself
  2. An authorized user sends an email that contains specific metadata to an email address that's monitored by Microsoft Flow. When there's an email of significance, the script checks to see if that user is authorized, and if so, then uses the rest of the metadata in the email to start processing a table from a database:
  3. Create an excel file that follows the protocol of xlsform (translating all the good metadata and lookups from the database)
  4. Create an .xml form from this google sheet
  5. Share permissions to both the sheet and the form based on a permissions table also managed by us
  6. Presto! People have an ODK form that they can use to gather data!
  7. (there is another script that is monitoring the results sheet of every database and will load any new data to the database)

This works really well! but we do have a few niggly little thingies, and here's where you come in:

So the way we are telling people to create fields with both name and label is by using the db field name for name, and then using the database Description field as the label. This works pretty well since the dba's have an interface to easily modify both... well... it works well when the label names are short. For longer questions, or for notes, sometimes our db gets really upset with loooong Description fields and chooses to stop working, so we have temporarily put in a "short-Descriptions-only" policy in place... but clearly this isn't comfortable.

What we COULD do is create a separate file that contains the name field and the label field, and then I stitch the two in ODK right before creating the form. Doing this could even give them a hint field, groups, and whatever else they wanted... but then how do I help the would-be form creators to build off of an updated list (for when the db schema changes)? I mean, I'll check to make sure the fields match anyway and will email them a rejection if someone tries to create a form that doesn't match perfectly... but that's not very sexy... can anyone think of a better way?

Thanks for reading, and please feel free to ask questions or provide feedback/concerns! I'm currently writing this up as a blog article to publish in our company's blog, will post the link when it's done.

1 Like