Propagate submission date to child tables

mathieubossaert · July 19, 2021, 12:38pm

Hi to all,

in order to store collected data in our PostGIS database, I am looking for an efficient way to get data from Central.

I developed a set of PostgreSQL functions that work really well as they automatically create database table from per table ODATA query. They also add new attributes to those tables if question are added in forms.

All works really fine but I would like to improve it a little bit. In order to get fresh data in our tools as soon as colleagues are back to office, I query Central twice an hour.
As I said everything works well but each time I download all submissions since the form has been created. It is not so much but it weights approx 1 Go each day (from 8am to 8 pm).

I tried to use the $expand=* option combined to submission date filter. It gives back a light json with recent data but the json is really much hard to automatically transform to database tables.
I was wondering if it would be possible to propagate submission date to child tables so we could use such a filter on those tables.

Matthew_White · August 5, 2021, 7:49pm

Hi @mathieubossaert! I think we do want to provide a way to filter subtables. I'm not sure what the best way to do that is within OData, but I've created a GitHub issue so that we can look more into that: https://github.com/getodk/central-backend/issues/391

This may or may not be helpful, but I also wanted to mention the .../fields endpoint of the API: https://odkcentral.docs.apiary.io/#reference/forms/individual-form/getting-form-schema-fields. I'm wondering whether that would make it easier to use $expand=*. The .../fields endpoint returns an array of fields, including each field's type. You could filter the result for repeat groups and for other groups (groups whose type is structure). If you then iterate over those groups, that should allow you to traverse the $expanded JSON one step at a time (storing the result of each step in a table, in a temporary table, or using a common table expression). That might end up being too complicated, but just wanted to mention this endpoint in case it helps!

mathieubossaert · August 10, 2021, 3:53pm

Thank you Matthew for your attention, for your answer and for the issue you created.

The scenario I can imagine using the .../fields endpoint :

ask Central for form fields of the form
- the very first time to first create the tables (child table under "repeat" type elements)
- and then each time I check for new data : to detect and create new attributes
pull data using the $expand=* option filtered by submission date.
parse / explore the new data json tree to feed the core and child tables.

One advantage I can see is the availability of data types. Actually with central2pg all questions are stored in text fields and I cast it in the type I want in my queries.

But I fear the complexity of the SQL functions to develop in order to explore a big json tree and dispatch it into tables. I will take a look at some complex json_path queries.
My actual set of function is quite simple, really functional and generic. As "child" tables do carry parent ID so it is really simple to transpose data in a relational data model.
So, from a lazy point of view, it would be really easier to be able to filter child tables on submission_date and I will be attentive on the issue you created.
But I am curious so I will try to spend some time this autumn on the approach you suggest.

If another SQL addict want to explore this way, please share it