Central2pg : PostgreSQL set of functions to get data from Central

Some news from this work :wink:

I moved the functions set to github :

Feel free to improve it ! I am not a pl-pgsql guru and I am sure it could be cleaner / safer... I would be glad to discuss about that work and on how to improve it with anybody else.

The filtering capability on submission_date has been removed as it only applies to Submissions table, not to its "daughters".
So we are in a "all data download" scenario and I hope we will be able to improve it later to improve bandwidth and resources consumption.

A top level function was added to simplify the procedure with only one function call, retrieving all data from a given form of a given Central instance to the PostgreSQL schema you want :

select odk_central.odk_central_to_pg(
  'myUserName', 
  'PassW0rd', 
  'my_central_server.org',
  project_id, 
  'form_id', 
  'destination_pg_schema', 
  'geo_question,to_ignore' );
select odk_central.odk_central_to_pg(
  'me@mydomain.org', 
  'PassW0rd', 
  'my_central_server.org', 
  2, 
  'my_form_about_birds', 
  'odk_data', 
  'point_auto,point,ligne,polygone');

This simple query will automatically :

  • ask Central (at my_central_server.org) for the table list of the form "my_form_about_birds"
  • get data for each table composing the form
  • create those tables (one text attribute per form question) in the schema "odk_data" of my database, except for questions listed in the last parameter (geo widgets columns)
  • feed those tables with the retrieved data

And at next call :

  • check for new form questions / table attributes
  • create it if needed
  • insert only new data

With my knowledge of the form, I now only have to query the tables to do whatever I need with the data, and to choose the best frequency for the cron task wich call the function, to serve data as fresh as possible in our internal tools.

5 Likes