Counting how many times an option was selected in a generic way

mathieubossaert · October 13, 2020, 3:39pm

Hi to all,

This question is close to this one but different

Back to form definition for breeding birds counting protocol, locusts and grasshoppers

Colleagues iterate in a repeat loop each time they see or hear a bird within 5 minutes.
I would like to find a generic way to count how many times a specific taxon has been seen during the field session (= selected in the list), in order to show a summary of the field session before form finalization. I can do such aggregation on the database side after the data have been submitted to Aggregate, but I would like to satisfy this user need in a more generic way, without coding a calculate for each species of the list as I do here:

stoc.xlsx (33.8 KB)

LN · October 14, 2020, 12:33am

That's a good one. Unfortunately I don't think there's a straightforward way to do this, certainly not in XLSForm. Would you use this in your analysis or mainly to show a summary to the data collector? I'm wondering because there are some things that could be considered with repeat but it would not be convenient for analysis.

mathieubossaert · October 14, 2020, 5:45am

Hi @LN,

Only to show a summary. There is no problem getting that result after form submission in our analysis. But enumerators need to see this summary in order to check that they did not forget any observation.

Would be glad to test. I was thinking about using a regexp to create a kind of array in a string, and then either add the new value or add 1 to its count each time the new value matches the string...

tobiasmllr · May 26, 2022, 8:58pm

Hello, I know this is an older question, but I want to give my solution to the stated problem, given other people might find this thread helpful later.

I faced a similar problem of counting species in repeating groups while preserving a continuous species ID. This ID is used to mark samples in the field, so the ID shown to the data collector must be the same as the one submitted with the form.
Having 134 species of fish in the list of choices makes it cumbersome to keep one or two variables per species.

Disclaimer: I'm not 100 % sure this approach is fully functional, but some initial testing seems to show both options below preserve species IDs correctly.
Let me know how I could improve this approach, I'm not very familiar with the use of ODK and XLSForms yet.

Option 1 (only works in ODK Collect, not in Enketo):
I used a string-array as suggested by Mathieu, but counting grouped occurrences with regex is unfortunately not supported in ODK. Neither is string substitution; we can only have character substitution with the (poorly named?) translate().

My solution was to build a string "database" by joining the entries together, padding each species' abbreviation with the symbol '#' to a 10-character string. Note that ODK Collect joins together variables inside nested repeat groups differently than Enketo. The latter joins occurrences within the whole form, it seems.
For each individual species entry, I then use an internal repeat to iterate through all previous db entries (all 10 characters wide) and compare each of them to the current string. From this, I create a new string-array that contains one 'x' for each matching species in earlier repetitions.
The use of once() at the start of a repetition to get a current snapshot of the database causes a small problem: we can't go back and re-edit previous entries. If somebody does, the ID sequence breaks and sample labels will not be correct. So users have to be told not to go back and change repetitions while filling the form.
Enketo, unfortunately, has a bug where nested repeat groups do not update properly, which makes Option 1 only usable in ODK Collect: https://github.com/enketo/enketo-core/issues/830
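
For readers who find the padded-string idea easier to follow outside XLSForm, here is a minimal Python sketch of the counting step in Option 1. It is only an illustration, not the logic from the attached form: the names `pad` and `count_previous`, the species abbreviations, and the choice to pad on the right are all assumptions.

```python
WIDTH = 10  # every db entry is padded to this width, as described above


def pad(abbrev: str) -> str:
    """Pad a species abbreviation with '#' to WIDTH characters (right padding is an assumption)."""
    if len(abbrev) > WIDTH:
        raise ValueError(f"abbreviation longer than {WIDTH} characters: {abbrev!r}")
    return abbrev.ljust(WIDTH, "#")


def count_previous(db: str, abbrev: str) -> int:
    """Count earlier entries in db that match abbrev.

    Mirrors the internal repeat: walk the db in WIDTH-sized chunks and
    add one 'x' per match, then take the length of the 'x' string.
    """
    target = pad(abbrev)
    marks = ""
    for start in range(0, len(db), WIDTH):
        if db[start:start + WIDTH] == target:
            marks += "x"
    return len(marks)


# Hypothetical snapshot of the db at the start of a new repetition
db = pad("PIKE") + pad("PERCH") + pad("PIKE")
next_pike_id = count_previous(db, "PIKE") + 1
```

Comparing fixed-width chunks rather than searching for a substring avoids false matches when one abbreviation is contained in another.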

example_countrepeat.xlsx (21.4 KB)

Option 2 (works in both ODK Collect and Enketo)
I used a similar approach to Option 1, but instead of having a single 'db' with all joined strings, I have one string array for each species. Entries to these species-arrays are made on every repetition, either 'x' for a match or '-' for no match. I then used translate() and string-length() to get which ID the current species should have. Here I also had to use once() to keep track of the state of the string-arrays at the start of the repetition.
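
As a rough illustration of Option 2, the Python sketch below keeps one 'x'/'-' string per species and derives the ID from the number of 'x' characters. It is a sketch under assumptions, not the form itself: the function name `record`, the species abbreviations, and the use of `str.replace` to stand in for translate() are all hypothetical.

```python
def record(arrays: dict[str, str], chosen: str) -> int:
    """Append one mark per species for this repetition and return the ID of the chosen one.

    Each species array gets 'x' if it was the chosen species, '-' otherwise.
    The ID is the number of 'x' marks, obtained by deleting every '-'
    (the role translate() plays in the form) and taking the string length.
    """
    for species in arrays:
        arrays[species] += "x" if species == chosen else "-"
    return len(arrays[chosen].replace("-", ""))


# Hypothetical choice list with three species
arrays = {"PIKE": "", "PERCH": "", "EEL": ""}
ids = [record(arrays, s) for s in ["PIKE", "PERCH", "PIKE"]]
```

After the three repetitions, every array has the same length (one character per repetition), which makes it easy to check that no repetition was skipped.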

I didn't attach an xlsx for Option 2 due to limited time. I can provide that later if somebody is interested.

Edit: small clarifications and formatting

tobiasmllr · May 26, 2022, 10:06pm

I forgot to mention that there is an Option 3 that I didn't try yet.
If your species names in the choices are only one character long (I'm not sure how many ASCII characters are allowed as choice names; this might be restricted to only 36 alphanumeric characters), then you can do Option 1 without the internal repeat to count previous occurrences (i.e. it would work in Enketo). You'd just build your single db-array by joining together the different 1-character choice names. To find which number the current species has, you can then take the string-length of the whole db and subtract the string-length of the db with all occurrences of the current species' character removed. Your ID would then be something like: 1 + string-length(${db}) - string-length(translate(${db}, ${speciescharacter}, ''))
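
To check the arithmetic of this untested Option 3, here is a small Python sketch. The helper `xpath_translate` is a hand-written stand-in that follows the XPath 1.0 definition of translate(), and `next_id`, the db string, and the one-character codes are hypothetical names and data chosen for the example.

```python
def xpath_translate(text: str, from_chars: str, to_chars: str) -> str:
    """Emulate XPath translate(): map each char in from_chars to the char at the
    same position in to_chars, deleting it when to_chars is too short."""
    mapping = {}
    for i, ch in enumerate(from_chars):
        if ord(ch) not in mapping:  # the first occurrence in from_chars wins
            mapping[ord(ch)] = to_chars[i] if i < len(to_chars) else None
    return text.translate(mapping)


def next_id(db: str, species_char: str) -> int:
    """1 + string-length(db) - string-length(translate(db, species_char, ''))"""
    return 1 + len(db) - len(xpath_translate(db, species_char, ""))


# Hypothetical db of earlier entries, one character per observation
db = "abacab"
id_for_a = next_id(db, "a")  # 'a' occurs three times before, so this is 4
```

The difference of the two lengths is simply the number of earlier occurrences of the character, so a species that has not been seen yet gets ID 1.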