Using $filter with repeat tables to download OData

Let say I have a HH survey with a repeat part. I can use $filter option to download only those submissions I need


https://{{myserver}}/v1/projects/{{projectId}}/forms/{{xmlFormId}}.svc/Submissions?$filter=__system/submissionDate le 2020-01-31T23:59:59.999Z

Now I need to download the corresponding part of the submission of the repeat table Submissions.main.hq. In this table have fields: age, name, __id and __Submissions-id

Using any of these fields with $filter doesn't work:

## Not working
https://{{myserver}}/v1/projects/{{projectId}}/forms/{{xmlFormId}}.svc/Submissions.main.hq?$filter=age le 15

## Not working either
https://{{myserver}}/v1/projects/{{projectId}}/forms/{{xmlFormId}}.svc/Submissions.main.hq?$filter=__Submissions-id in ({{comma_separated_list_of_ids}})

Do you have any suggestions?

Hi @Odil! There's currently only limited support for filtering OData, particularly for filtering subtables.

In general, we don't plan to support filtering on form fields (like age) in the near future. See this forum post for some of the reasoning behind that. However, we support filtering the primary table on specific metadata fields.

We would like to add the ability to filter subtables on those metadata fields. We have a GitHub issue for that feature here, and there was related discussion in this forum topic. With that feature in place, you would be able to filter a subtable on __system/submissionDate. However, we aren't sure yet what the right approach or syntax within OData would be for that.

It is currently possible to fetch a subtable for a single Submission:

/v1/projects/{{projectId}}/forms/{{xmlFormId}}.svc/Submissions('{{instanceId}}')/{{repeatName}}

(Note that xmlFormId and instanceId should be percent-encoded.) However, that doesn't provide a way to fetch data for multiple Submissions at once.

Hope that helps!

2 Likes

Thank you @Matthew_White for these comments. It's helpful. I was trying to use $filter to download only the latest data. If we can use __system/submissionDate also for subtables in the new future that would be really great.
For the time being, to download the latest data I will use OData document parameters $count and $top

...{{xmlFormId}}.svc/{{subtable}}?$count=true&$top=0

This will bring the total number of the data in subtable,

...
"@odata.count": 6

With the next query I can download the submissions from the top which should be equal to the number: [total number of the data in the server] - [count of I downloaded already].

This is the only solution I could find to download only the latest data from OData tables. Hope this solution will help others too.

3 Likes

Hi @Odil

Depending on the amount of data that you expect in your repeat groups, it may also be an option to use the ?expand=* query parameter in order to obtain repeated data directly with the main submissions query. See: https://odkcentral.docs.apiary.io/#reference/odata-endpoints/odata-form-service/data-document

This is available since ODK Central v1.2.

1 Like

Yes, you are right. But this is not a solution when you want to download each table separately.

FWIW for my ETL pipeline I download all submissions every day with ruODK::odata_submission_get, which also downloads media attachments to a local folder. Each run only adds new media attachments, and the actual data only a few seconds (we have 100k+ records, so nowhere near LSHTM scales).

This approach works only because I persist the downloaded media files between runs.

I split the data into new (create), existing (update or skip) and existing and changed downstream (always skip) using an excerpt of the downstream database. The splitting takes a second.

This is to demonstrate my hack around subsetting subtables.

1 Like