Please help me understand the difference between `pulldata` and `instance` in an ODK XLSForm, and what each is used for.
That's a good question, and I'm sorry that there are two messy, overlapping ways to access CSV data.
`pulldata('instance_id', 'desired_column', 'query_column', query)`
is equivalent to
`instance('instance_id')/root/item[query_column = query]/desired_column`
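To make the equivalence concrete, here is a minimal Python sketch. It assumes a made-up CSV attached as `fruits.csv` and turns it into the `root`/`item` shape that the `instance` path walks: one `<item>` per row, with one child element per column. The helpers `csv_to_instance`, `pulldata` and `instance_path` are hypothetical stand-ins written for this illustration, not ODK APIs, and the query relies on the small XPath subset supported by Python's `xml.etree.ElementTree`, not a full XPath engine. In this sketch both lookups return the value from the first matching row, or an empty string when nothing matches.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up contents of a hypothetical attached "fruits.csv".
FRUITS_CSV = """name_key,name,color
mango,Mango,orange
banana,Banana,yellow
lemon,Lemon,yellow
"""

def csv_to_instance(csv_text):
    """Build <root><item>...</item>...</root>: one <item> per CSV row,
    one child element per column, matching the /root/item path."""
    root = ET.Element("root")
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = ET.SubElement(root, "item")
        for column, value in row.items():
            ET.SubElement(item, column).text = value
    return root

def pulldata(instance, desired_column, query_column, query):
    """Hypothetical pulldata stand-in: desired_column of the first row
    whose query_column equals query, or "" if no row matches."""
    for item in instance.findall("item"):
        if item.findtext(query_column) == query:
            return item.findtext(desired_column, default="")
    return ""

def instance_path(instance, query_column, query, desired_column):
    """Stand-in for instance('fruits')/root/item[query_column = query]/desired_column.
    ElementTree's find() returns the first matching node, or None."""
    node = instance.find(f"item[{query_column}='{query}']/{desired_column}")
    return node.text if node is not None else ""

fruits = csv_to_instance(FRUITS_CSV)
print(pulldata(fruits, "name", "name_key", "mango"))       # Mango
print(instance_path(fruits, "name_key", "mango", "name"))  # Mango
```

Both calls walk the same tree and pick out the same node; `pulldata` simply packages the filter and the path into four arguments.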
The `instance` expression is native to the way that ODK forms are defined. In addition to querying CSVs, `instance` expressions can be used to get values from choice lists in the form, from attached XML, or from GeoJSON.
In the `instance` expression, the part in square brackets is a filter expression. It is evaluated against every row in your CSV (or every feature in your GeoJSON, etc.). If it evaluates to `true`, that row becomes part of the result. This is more general than what `pulldata` enables, and it is exactly the same functionality that choice filters use.
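The filter behaviour can be sketched in plain Python, again with made-up data: a predicate is evaluated against every row, and every row for which it is true is kept. The names `filter_rows` and `ROWS` are hypothetical, and the lambdas only approximate the XPath predicates shown in the comments. The last lookup shows the contrast with `pulldata`, which tests a single column for equality and gives back only the first match.

```python
import csv
import io

# Made-up data standing in for an attached CSV.
FRUITS_CSV = """name_key,name,color,price
mango,Mango,orange,3
banana,Banana,yellow,1
lemon,Lemon,yellow,2
"""
ROWS = list(csv.DictReader(io.StringIO(FRUITS_CSV)))

def filter_rows(rows, predicate):
    """Evaluate the predicate against every row and keep the rows where it
    is true, like the [ ... ] part of an instance expression."""
    return [row for row in rows if predicate(row)]

# Roughly instance('fruits')/root/item[color = 'yellow' and price < 2]/name
cheap_yellow = [row["name"] for row in filter_rows(
    ROWS, lambda row: row["color"] == "yellow" and float(row["price"]) < 2)]
print(cheap_yellow)  # ['Banana']

# Roughly instance('fruits')/root/item[color = 'yellow']/name: every match is kept.
all_yellow = [row["name"] for row in filter_rows(ROWS, lambda row: row["color"] == "yellow")]
print(all_yellow)  # ['Banana', 'Lemon']

# A pulldata-style lookup: one equality test, first match only, "" if none.
first_yellow = next((row["name"] for row in ROWS if row["color"] == "yellow"), "")
print(first_yellow)  # Banana
```

A choice filter works the same way: the expression is checked against each choice, and the choices for which it is true are the ones shown.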
`pulldata` was introduced for two reasons, as far as I know: to make the expression friendlier, and to provide a targeted way to get higher performance. In Enketo, `pulldata` is an alias for an `instance` expression, so there's no performance difference. In Collect, `pulldata` uses a database, whereas `instance` uses an in-memory cache. For lists with many tens of thousands of rows, `pulldata` is faster in Collect. The intent is to eliminate this performance difference within the next year.
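The database idea can be illustrated, very loosely, with Python's built-in `sqlite3` module: the rows are copied once into a table with an index on the lookup column, and each lookup then becomes a single indexed query. This is not Collect's code. This sketch assumes that an index on the lookup column is what lets a database lookup scale well to large lists; `load_into_db`, `db_pulldata` and the table name `data` are hypothetical.

```python
import csv
import io
import sqlite3

# Made-up data standing in for a large attached CSV.
FRUITS_CSV = """name_key,name,color
mango,Mango,orange
banana,Banana,yellow
lemon,Lemon,yellow
"""

def load_into_db(csv_text, key_column):
    """Copy the CSV rows into an in-memory SQLite table named 'data' and
    index the column used for lookups. Column names are trusted here."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (%s)" % ", ".join(f'"{c}" TEXT' for c in columns))
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO data VALUES ({placeholders})",
        ([row[c] for c in columns] for row in reader),
    )
    # The index lets SQLite jump to matching rows instead of checking every row.
    conn.execute(f'CREATE INDEX data_key ON data ("{key_column}")')
    return conn

def db_pulldata(conn, desired_column, query_column, query):
    """Hypothetical pulldata-style lookup: value from the first matching
    row (in insertion order), or "" if no row matches."""
    row = conn.execute(
        f'SELECT "{desired_column}" FROM data WHERE "{query_column}" = ? '
        "ORDER BY rowid LIMIT 1",
        (query,),
    ).fetchone()
    return row[0] if row else ""

conn = load_into_db(FRUITS_CSV, "name_key")
print(db_pulldata(conn, "name", "name_key", "lemon"))  # Lemon
```

The loading cost is paid once, while each lookup only touches the matching index entries, which is where the benefit shows up for lists with tens of thousands of rows.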