Some news about this work
I moved the function set to GitHub:
Feel free to improve it! I am not a PL/pgSQL guru and I am sure it could be cleaner and safer. I would be glad to discuss this work, and how to improve it, with anyone interested.
The filtering capability on submission_date has been removed, as it only applies to the Submissions table, not to its "daughter" tables.
So for now we are in an "all data download" scenario; I hope we will be able to improve this later to reduce bandwidth and resource consumption.
A top-level function was added to simplify the whole procedure to a single function call, retrieving all data from a given form on a given Central instance into the PostgreSQL schema of your choice:
select odk_central.odk_central_to_pg(
    'myUserName',               -- Central user
    'PassW0rd',                 -- Central password
    'my_central_server.org',    -- Central server (fully qualified domain name)
    project_id,                 -- project id (integer)
    'form_id',                  -- form id
    'destination_pg_schema',    -- destination PostgreSQL schema
    'geo_question,to_ignore');  -- comma-separated list of questions to ignore (e.g. geo widgets)
For example:
select odk_central.odk_central_to_pg(
'me@mydomain.org',
'PassW0rd',
'my_central_server.org',
2,
'my_form_about_birds',
'odk_data',
'point_auto,point,ligne,polygone');
This simple query will automatically:
- ask Central (at my_central_server.org) for the list of tables of the form "my_form_about_birds"
- get the data of each table composing the form
- create those tables (one text attribute per form question) in the schema "odk_data" of my database, except for the questions listed in the last parameter (the geo widget columns); see the catalog query below the lists
- feed those tables with the retrieved data
And at the next call, it will:
- check for new form questions / table attributes
- create them if needed
- insert only new data
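
To check the result of the first call, the created tables and their columns can be listed from the PostgreSQL catalog. This is just a convenience query; the table names themselves depend on the form definition returned by Central:

-- List the tables and their (all text) columns created in the destination schema
select table_name, column_name, data_type
from information_schema.columns
where table_schema = 'odk_data'
order by table_name, ordinal_position;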
With my knowledge of the form, I now only have to query the tables to do whatever I need with the data, and to choose the right frequency for the cron task which calls the function, so that our internal tools serve data as fresh as possible.
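
As a sketch of that scheduling part, and assuming the pg_cron extension is available in the database, the refresh could be scheduled entirely in SQL; a classic OS crontab entry calling psql does the job just as well:

-- Sketch only: requires the pg_cron extension; adjust the frequency to your needs
select cron.schedule(
    'refresh_my_form_about_birds',   -- arbitrary job name
    '*/15 * * * *',                  -- every 15 minutes
    $$select odk_central.odk_central_to_pg(
        'me@mydomain.org',
        'PassW0rd',
        'my_central_server.org',
        2,
        'my_form_about_birds',
        'odk_data',
        'point_auto,point,ligne,polygone')$$
);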