As an alternative solution for filtering data subtables, and as a continuation of this starting point, I now have a first version of a pyODK function that meets our needs (getting only fresh data from Central).
For a given filter, it returns a dictionary containing, for each "table" of the form (the submissions table plus one table per repeat), the JSON data matching the filter.
It looks for nested navigation links in each pulled JSON payload and calls itself recursively on them.
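To illustrate the navigation-link extraction on its own, here is a minimal, self-contained sketch. The payload and field names (`members`, `site A`, the uuid) are invented for illustration and are not real Central output; only the regular expression is the one used in the function:

```python
import re

# Hypothetical parsed OData payload, mimicking what Central could return
# for a pull on Submissions (structure assumed for illustration only).
value = [
    {
        '__id': 'uuid:1234',
        'name': 'site A',
        'members@odata.navigationLink': "Submissions('uuid:1234')/members",
    }
]

# Same pattern as in the function: stringify the payload, then capture
# each navigationLink key together with its relative OData path.
links = re.findall(r"('\w+@odata\.navigationLink'):\s+([^\}]+)", str(value))

for key, link in links:
    # Strip only the surrounding quotes; the single quotes inside the
    # path (Submissions('uuid:...')) must be kept.
    print(key, '->', link.strip().strip("'\""))
```

Each extracted link is a relative OData path that can be fed back into the function as the next `path` argument.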
I think it is not really optimized and can certainly be improved, so anyone with more Python experience than me is welcome to review it!
```python
from pyodk.client import Client
import json
import re


def fresh_data_only(project_id, form_id, path, criteria, datas):
    """Pull filtered OData from Central and collect, in `datas`, one JSON
    list per "table" of the form (submissions + repeats)."""
    client = Client()
    client.open()
    if path == '':
        return None
    url = 'projects/' + project_id + '/forms/' + form_id + '.svc/' + path
    # Only append the $filter clause when a criterion is given
    # (recursive calls on navigation links pass an empty one).
    if criteria != '':
        url += '?$filter=' + criteria
    # The root path and single-entity paths go into 'submissions';
    # nested paths are named after their last segment (the repeat name).
    if re.match(r"Submissions$", path) or re.match(r".*\)$", path):
        tablename = 'submissions'
    else:
        tablename = path.rsplit('/')[-1]
    response = client.get(url)
    value = response.json()['value']
    # Look for nested navigation links in the pulled JSON and recurse on them.
    navigationlinks = re.findall(r"('\w+@odata\.navigationLink'):\s+([^\}]+)", str(value))
    for (key, link) in navigationlinks:
        # Strip only the surrounding quotes; quotes inside the path
        # (e.g. Submissions('uuid:...')) must be kept.
        link = link.strip().strip("'\"")
        fresh_data_only(project_id, form_id, link, '', datas)
    if tablename in datas.keys():
        datas[tablename] += value
    else:
        datas[tablename] = value
    return json.dumps(datas)


fresh_data_only(
    project_id='5',
    form_id='Sicen_2022',
    path='Submissions',
    criteria='__system/submissionDate ge 2023-03-03',
    datas={},
)
```
When this function is used in a PL/Python function, the result in PostgreSQL is a table with two columns:
- the table name
- and its json
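As an illustration of that shape, the JSON string returned by the function can be split into (table name, JSON) rows like the ones the PL/Python function yields. The table and field names below are invented for the example:

```python
import json

# Hypothetical output of fresh_data_only(): one key per table of the form,
# each holding the list of matching records (names invented).
json_datas = json.dumps({
    'submissions': [{'__id': 'uuid:1', 'name': 'site A'}],
    'members': [{'__id': 'uuid:2', 'age': 42}],
})

# One (table name, JSON document) row per key, as in the PostgreSQL result.
rows = [(tablename, json.dumps(value))
        for tablename, value in json.loads(json_datas).items()]

for tablename, doc in rows:
    print(tablename, doc)
```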
The next step is to integrate this into central2pg to get a lightweight, high-frequency workflow between Central and our databases.