As an alternative solution for filtering data subtables, and as a continuation of this starting point, I now have a first version of a pyODK function that meets our needs (getting only fresh data from Central).
For a given filter, it returns a dictionary containing, for each "table" of the form (the submissions table plus one table per repeat), the JSON data matching the filter.
It looks for nested navigation links in each pulled JSON payload and calls itself recursively on them.
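To illustrate the navigation-link extraction on its own, here is a minimal, self-contained sketch. The payload and field names (`members`, `site A`, the uuid) are invented for illustration and are not real Central output; only the regular expression is the one used in the function:

```python
import re

# Hypothetical parsed OData payload, mimicking what Central could return
# for a pull on Submissions (structure assumed for illustration only).
value = [
    {
        '__id': 'uuid:1234',
        'name': 'site A',
        'members@odata.navigationLink': "Submissions('uuid:1234')/members",
    }
]

# Same pattern as in the function: stringify the payload, then capture
# each navigationLink key together with its relative OData path.
links = re.findall(r"('\w+@odata\.navigationLink'):\s+([^\}]+)", str(value))

for key, link in links:
    # Strip only the surrounding quotes; the single quotes inside the
    # path (Submissions('uuid:...')) must be kept.
    print(key, '->', link.strip().strip("'\""))
```

Each extracted link is a relative OData path that can be fed back into the function as the next `path` argument.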
I think it is not really optimized and can certainly be improved, so anyone with more Python experience than me is welcome to review it!
```python
from pyodk.client import Client
import json
import re


def fresh_data_only(project_id, form_id, path, criteria, datas):
    """Pull filtered OData from Central and collect, in `datas`, one JSON
    list per "table" of the form (submissions + repeats)."""
    client = Client()
    client.open()
    if path == '':
        return None
    url = 'projects/' + project_id + '/forms/' + form_id + '.svc/' + path
    # Only append the $filter clause when a criterion is given
    # (recursive calls on navigation links pass an empty one).
    if criteria != '':
        url += '?$filter=' + criteria
    # The root path and single-entity paths go into 'submissions';
    # nested paths are named after their last segment (the repeat name).
    if re.match(r"Submissions$", path) or re.match(r".*\)$", path):
        tablename = 'submissions'
    else:
        tablename = path.rsplit('/')[-1]
    response = client.get(url)
    value = response.json()['value']
    # Look for nested navigation links in the pulled JSON and recurse on them.
    navigationlinks = re.findall(r"('\w+@odata\.navigationLink'):\s+([^\}]+)", str(value))
    for (key, link) in navigationlinks:
        # Strip only the surrounding quotes; quotes inside the path
        # (e.g. Submissions('uuid:...')) must be kept.
        link = link.strip().strip("'\"")
        fresh_data_only(project_id, form_id, link, '', datas)
    if tablename in datas.keys():
        datas[tablename] += value
    else:
        datas[tablename] = value
    return json.dumps(datas)


fresh_data_only(
    project_id='5',
    form_id='Sicen_2022',
    path='Submissions',
    criteria='__system/submissionDate ge 2023-03-03',
    datas={},
)
```
When this function is used in a PL/Python function, the result in PostgreSQL is a table with two columns:
- the table name
- and its json
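As an illustration of that shape, the JSON string returned by the function can be split into (table name, JSON) rows like the ones the PL/Python function yields. The table and field names below are invented for the example:

```python
import json

# Hypothetical output of fresh_data_only(): one key per table of the form,
# each holding the list of matching records (names invented).
json_datas = json.dumps({
    'submissions': [{'__id': 'uuid:1', 'name': 'site A'}],
    'members': [{'__id': 'uuid:2', 'age': 42}],
})

# One (table name, JSON document) row per key, as in the PostgreSQL result.
rows = [(tablename, json.dumps(value))
        for tablename, value in json.loads(json_datas).items()]

for tablename, doc in rows:
    print(tablename, doc)
```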
The next step is to integrate this into central2pg to get a lightweight, high-frequency workflow between Central and our databases.