Add distinct function for use with entities

What high-level problem are you trying to solve?

With entities now available, some form logic operations cannot be achieved with the existing set of form functions. For example:

Consider the "trees" dataset, which is populated with entities created from the form "Tree registration form".

In the form "Tree follow-up Form", we want to filter the "trees" dataset by the property "species", so that in a subsequent question the user only has a limited number of options to select from. We want this to respond dynamically when entities with new species are added to the dataset, rather than having to update the form definition.

There is currently no way to build a distinct list of "tree/species" values directly within the form for use in a select_one question.
This can be worked around by using multiple datasets, e.g. "trees" and "species". However, this adds complexity to manage, especially since a form can currently only create a single entity.

Previously, this type of distinct operation was rolled into the form definition (e.g. using Excel functions). However, entities should remove the need to update definitions for this reason.

Any ideas on how ODK could help you solve it?

By adding the function distinct-values.
This would allow a nodeset containing only the "species" values to be calculated, and the distinct values of that nodeset to be used in the select_one question.
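To make the intended result concrete, here is a rough Python sketch of what such a function would compute. It is only an illustration, not ODK code: the XML layout, the sample trees and the helper name distinct_values are all made up for the example, and keeping first-seen order is an assumption about how the function might behave.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents of the "trees" entity list (sample data, not a real export).
TREES_XML = """
<root>
  <item><name>t1</name><label>Tree 1</label><species>oak</species></item>
  <item><name>t2</name><label>Tree 2</label><species>pine</species></item>
  <item><name>t3</name><label>Tree 3</label><species>oak</species></item>
  <item><name>t4</name><label>Tree 4</label><species>birch</species></item>
</root>
"""


def distinct_values(values):
    """Drop duplicates while keeping the order in which values first appear."""
    return list(dict.fromkeys(values))


root = ET.fromstring(TREES_XML)

# The "species" nodeset: one value per tree, duplicates included.
species_nodeset = [node.text for node in root.findall("./item/species")]

# The list a select_one question would offer.
species_choices = distinct_values(species_nodeset)
print(species_choices)  # ['oak', 'pine', 'birch']
```

A new tree with a new species would show up in the list without touching the form definition, which is the behaviour being asked for.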

Alternatively, by extending support for regular expressions to allow groups, so that a calculation can return the matched groups and not just a boolean.

Upload any helpful links, sketches, and videos.

This function or something like it would be useful indeed! It would make dynamic 'reverse choice filtering' much easier.

Currently I create a join of all or some of my dataset where a certain field matches (e.g. for the big entity list, join all site values for the entity items where status = incomplete). I then set the choice filter on the parent question (e.g. 'select site') to search the join for matching items and exclude the others. So if only Site A, Site C, Site D and Site Z appear in the join, the choice list for 'select site' shows only those options and not the sites that have no incomplete items.

This results in a join that can be thousands of items in size and is not performant (particularly in Enketo).
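For anyone trying to picture the workaround, this is roughly the logic, sketched in Python rather than as form expressions. The rows, the site codes and the space separator are made-up assumptions for the example. The point is that the joined string carries one token per matching row, duplicates included, so it grows with the dataset rather than with the number of sites.

```python
# Hypothetical entity rows as (site, status). A real list may hold thousands of rows.
ENTITIES = [
    ("site_a", "incomplete"),
    ("site_a", "incomplete"),
    ("site_b", "complete"),
    ("site_c", "incomplete"),
    ("site_d", "incomplete"),
    ("site_d", "incomplete"),
    ("site_z", "incomplete"),
]

# Hypothetical parent choice list behind the 'select site' question.
PARENT_SITES = ["site_a", "site_b", "site_c", "site_d", "site_z"]

# Step 1: join the site value of every incomplete row (duplicates are kept).
joined = " ".join(site for site, status in ENTITIES if status == "incomplete")

# Step 2: the choice filter keeps a parent option only if it occurs in the join.
tokens = joined.split(" ")
visible_sites = [site for site in PARENT_SITES if site in tokens]

print(len(tokens))         # one token per incomplete row
print(len(set(tokens)))    # what a distinct list would need to hold
print(visible_sites)
```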

An alternative to search(distinct-values(join())), or however it ends up being implemented, would be as described above: create a unique nodeset of site values [Site A, Site C, Site D, Site Z] from the large entity list (biglist.csv), and have the question 'select the site', as select_one_from_file biglist.csv, present those options. With parameters such as value=site and label=site, and a choice filter like distinct-values(instance('biglist')/root/item[status='incomplete']/site), this would remove the need for the parent list(s) altogether (though I'm pretty sure my expression is written incorrectly!).
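In case it helps the discussion, here is a small Python sketch of the result that expression is meant to produce. It is not how ODK would evaluate it: the XML below is invented sample data standing in for biglist.csv, and the predicate is run with Python's ElementTree, which happens to support the same item[status='incomplete'] form.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows standing in for biglist.csv (sample data only).
BIGLIST_XML = """
<root>
  <item><site>Site A</site><status>incomplete</status></item>
  <item><site>Site B</site><status>complete</status></item>
  <item><site>Site C</site><status>incomplete</status></item>
  <item><site>Site A</site><status>incomplete</status></item>
  <item><site>Site D</site><status>incomplete</status></item>
  <item><site>Site Z</site><status>incomplete</status></item>
  <item><site>Site D</site><status>complete</status></item>
</root>
"""

root = ET.fromstring(BIGLIST_XML)

# Equivalent of instance('biglist')/root/item[status='incomplete']/site
incomplete_sites = root.findall("./item[status='incomplete']/site")

# Equivalent of wrapping that nodeset in the proposed distinct-values(...)
distinct_sites = list(dict.fromkeys(node.text for node in incomplete_sites))
print(distinct_sites)  # ['Site A', 'Site C', 'Site D', 'Site Z']
```

Site B drops out because it has no incomplete items, and Site A appears once even though two of its items are incomplete.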

I would also like a distinct-values(nodeset) function that returns only the unique items in the nodeset.

The use case can be as straightforward as counting distinct values, or more advanced, such as serving as the choice list in select_one/select_many questions.
