Let say I have a HH survey with a repeat part. I can use $filter option to download only those submissions I need
https://{{myserver}}/v1/projects/{{projectId}}/forms/{{xmlFormId}}.svc/Submissions?$filter=__system/submissionDate le 2020-01-31T23:59:59.999Z
Now I need to download the corresponding part of the submission of the repeat table Submissions.main.hq. In this table have fields: age, name, __id and __Submissions-id
Using any of these fields with $filter doesn't work:
## Not working
https://{{myserver}}/v1/projects/{{projectId}}/forms/{{xmlFormId}}.svc/Submissions.main.hq?$filter=age le 15
## Not working either
https://{{myserver}}/v1/projects/{{projectId}}/forms/{{xmlFormId}}.svc/Submissions.main.hq?$filter=__Submissions-id in ({{comma_separated_list_of_ids}})
Hi @Odil! There's currently only limited support for filtering OData, particularly for filtering subtables.
In general, we don't plan to support filtering on form fields (like age) in the near future. See this forum post for some of the reasoning behind that. However, we support filtering the primary table on specific metadata fields.
We would like to add the ability to filter subtables on those metadata fields. We have a GitHub issue for that feature here, and there was related discussion in this forum topic. With that feature in place, you would be able to filter a subtable on __system/submissionDate. However, we aren't sure yet what the right approach or syntax within OData would be for that.
It is currently possible to fetch a subtable for a single Submission:
Thank you @Matthew_White for these comments. It's helpful. I was trying to use $filter to download only the latest data. If we can use __system/submissionDate also for subtables in the new future that would be really great.
For the time being, to download the latest data I will use OData document parameters $count and $top
This will bring the total number of the data in subtable,
...
"@odata.count": 6
With the next query I can download the submissions from the top which should be equal to the number: [total number of the data in the server] - [count of I downloaded already].
This is the only solution I could find to download only the latest data from OData tables. Hope this solution will help others too.
FWIW for my ETL pipeline I download all submissions every day with ruODK::odata_submission_get, which also downloads media attachments to a local folder. Each run only adds new media attachments, and the actual data only a few seconds (we have 100k+ records, so nowhere near LSHTM scales).
This approach works only because I persist the downloaded media files between runs.
I split the data into new (create), existing (update or skip) and existing and changed downstream (always skip) using an excerpt of the downstream database. The splitting takes a second.
This is to demonstrate my hack around subsetting subtables.