Difference or use of 'pulldata' and 'instance'

Devendra_Patankar · September 22, 2023, 6:45am

Please help me understand the difference between 'pulldata' and 'instance' in an ODK XLSForm, and when to use each.

LN · September 22, 2023, 10:12pm

That's a good question, and I'm sorry that there are two overlapping ways to access CSV data.

pulldata(instance_id, desired_column, query_column, query)

is equivalent to

instance(instance_id)/root/item[query_column=query]/desired_column
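
To make the equivalence concrete, here is a small Python model of the two forms. It is only an illustration, not how Collect or Enketo implement either one: the CSV content, the column names (code, name, population) and the helper names pulldata_model and instance_model are all made up, and returning the first match (or an empty value when nothing matches) is an assumption of the sketch.

```python
import csv
import io

# Hypothetical CSV attachment; the rows and column names are invented for this sketch.
CSV_TEXT = """code,name,population
a1,Alpha,1200
b2,Beta,300
c3,Gamma,4500
"""

ROWS = list(csv.DictReader(io.StringIO(CSV_TEXT)))


def pulldata_model(rows, desired_column, query_column, query):
    """Model of pulldata: return desired_column from the row where query_column equals query."""
    for row in rows:
        if row[query_column] == query:
            return row[desired_column]
    return ""  # assumption of this sketch: no match gives an empty value


def instance_model(rows, predicate, desired_column):
    """Model of instance(...)/root/item[predicate]/desired_column."""
    matches = [row[desired_column] for row in rows if predicate(row)]
    return matches[0] if matches else ""  # assumption: take the first match


# The same lookup written both ways.
via_pulldata = pulldata_model(ROWS, "name", "code", "b2")
via_instance = instance_model(ROWS, lambda row: row["code"] == "b2", "name")
```

In this model both calls return the same value, because the pulldata form is just the instance form with the predicate fixed to a single equality test.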

The instance expression is native to the way that ODK forms are defined. In addition to querying CSVs, instance expressions can be used to get values from choice lists in the form, from attached XML, or from GeoJSON.

In the instance expression, the part in square brackets is a filter expression. The expression there will be evaluated against every row in your CSV (or feature in your GeoJSON, etc). If the expression evaluates to true, that row will be part of the result. This is more general than what pulldata enables. It's also exactly the same functionality that choice filter uses.
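
As a rough sketch of why the filter is more general, the Python below evaluates a predicate against every row and keeps the rows where it is true. The rows, the column names (code, region, population) and the helper filter_rows are hypothetical and stand in for a CSV attachment; this models the idea only and is not ODK's implementation.

```python
# Hypothetical rows standing in for a CSV attachment; the columns are invented.
ROWS = [
    {"code": "a1", "region": "north", "population": 1200},
    {"code": "b2", "region": "north", "population": 300},
    {"code": "c3", "region": "south", "population": 4500},
]


def filter_rows(rows, predicate):
    """Evaluate predicate against every row and keep the rows where it is true."""
    return [row for row in rows if predicate(row)]


# Roughly what a filter like item[region='north' and population > 1000] selects.
# A single column-equals-value lookup, as in pulldata, cannot express this.
big_north = filter_rows(
    ROWS, lambda row: row["region"] == "north" and row["population"] > 1000
)
big_north_codes = [row["code"] for row in big_north]
```

A choice filter works the same way in this picture: the predicate decides which rows (choices) remain in the result.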

pulldata was introduced for two reasons as far as I know: to make the expression friendlier, and to provide a targeted way to get higher performance. In Enketo, pulldata is an alias for an instance expression, so there's no performance difference. In Collect, pulldata uses a database whereas instance uses an in-memory cache. For lists with many tens of thousands of rows, pulldata is faster in Collect. The intent is to eliminate this performance difference within the next year.