Proposal: customize name and label 'columns' in selects from external data in XLSForm

When creating select options from external (and internal) data in XLSForm, the "name" and "label" data columns are hardcoded in the XForm output (and therefore required in the data), when using:

  • select_one_from_file filename.csv & select_one_from_file filename.xml
  • select_multiple_from_file filename.csv & select_multiple_from_file filename.xml
  • (are there others?)

This means that you'd normally have to modify the data file to rename columns (or XML nodes), which is not great.

This is purely a pyxform/XLSForm restriction because Collect, Enketo, iXForms, etc. are perfectly capable of dealing with other names.

How about adding outrageous (but very nice) support for something like:

select_multiple_from_file hh-data.csv using hh_number as value and hh_name as label (where either one or two column/node names can be changed)

What do you think? Implementation considerations/problems in pyxform would be very valid arguments, of course.

Sorry I didn't respond to this earlier. Agreed this is important to do.

How about using the parameters column in XLSForm? We added it for question types that need any kind of specialized data. That would mean introducing value and label parameters for those types, both of which are optional. There's no possible validation because the external files wouldn't be available to pyxform. So whatever values are provided would just be passed on to the XML.

Yes, using parameters would be perfect to override value and label. Thanks!

| type | parameters | name | label |
| --- | --- | --- | --- |
| select_multiple_from_file hh-data.csv | value=hh_number, label=hh_name | a | Select |
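
A minimal sketch of how such a parameters cell could be read, assuming the defaults stay name and label when a parameter is omitted. The helper name parse_itemset_parameters is hypothetical and this is not pyxform's actual implementation; as noted above, the values are passed through without checking them against the external file.

```python
import re

# Current hardcoded column names, used when a parameter is omitted.
DEFAULTS = {"value": "name", "label": "label"}


def parse_itemset_parameters(raw):
    """Parse e.g. 'value=hh_number, label=hh_name' into a dict (hypothetical helper)."""
    result = dict(DEFAULTS)
    # Accept commas and/or whitespace between key=value pairs.
    for token in re.split(r"[,\s]+", raw.strip()):
        if not token:
            continue
        key, sep, val = token.partition("=")
        if not sep or not key or not val:
            raise ValueError(f"Malformed parameter: {token!r}")
        # No validation against the data file: the value is passed on as-is.
        result[key] = val
    return result


print(parse_itemset_parameters("value=hh_number, label=hh_name"))
```

Both parameters are optional here, so an empty cell simply falls back to the existing name/label behaviour.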

Issue was created here: https://github.com/XLSForm/pyxform/issues/461

Thanks, @martijnr! I've been having a Briefcase-related conversation with a user that overlaps with this and which led me to an enketo-transformer issue and a Kobo forum post. He's @Freeedim in those places and I'm hoping we can unify the various threads and make progress.

He rightfully points out that it's important to be able to specify labels in multiple languages, so columns for the underlying value and a single label aren't enough.

I don't find the jr:itext multi-language machinery very compatible with external secondary instances because it expects all the translations to be in the form definition. This defeats the purpose of pushing data to external files. Unless I'm missing something, I think we might need to consider extensions to the specs for jr:itext and external secondary instances to fully support using external secondary instances as sources of select choices.

search() provides multi-lingual support by:

  • having the user define column names in the choices sheet (for example, country in the name column, country_name in the label column, nom_pays in the label::French (fr) column, etc.)
  • using a static choice list with a single item in the form definition where the values for label and name define the columns to query in the external CSV
  • using a jr:itext call for the ref attribute of the label if the form is multilingual (for example, jr:itext('/data/produce/name:label') in external-csv-search-language.xml)
  • having a text item with a value that represents the corresponding label column in the CSV for each language in the itext block. (For example, the text block for /data/multi_produce/name:label in the form above has the value label::French in French).

@martijnr am I forgetting a straightforward mechanism for specifying multiple language labels in an external secondary instance? This is a big thing that has kept us from moving on from search().

Good point. Would be nice to figure that out too. Thanks!

Yes, indeed.

Nothing defined afaik, but Enketo actually does support an undocumented, rogue and forgotten method that relies on ::nl suffixes on CSV column names or lang="nl" attributes on XML nodes. I wonder if that less flexible solution would be acceptable, as it's very lightweight and I like it.

It would be exposed by a (new) translation function call. Enketo chose the function name translate (see note 1 below) and it could be used in the above proposal like this:

| type | parameters | name | label |
| --- | --- | --- | --- |
| select_multiple_from_file hh-data.csv | value=hh_number, label=translate(hh_name) | a | Select |

resulting in the XForm output:

<itemset nodeset="instance('hh-data')/root/item">
    <value ref="hh_number" />
    <label ref="translate(hh_name)" />
</itemset>

That new XPath function would return, from its node-set parameter, the first XML node (including nodes from a CSV file transformed to XML) whose lang attribute matches the current language.
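
A rough sketch of that selection rule in Python, to make the behaviour concrete. The function name pick_translation and the sample household data are made up for illustration, and returning None when no node matches is an assumption (the behaviour for a missing language is not defined above).

```python
import xml.etree.ElementTree as ET


def pick_translation(nodes, current_lang):
    """Return the first node whose lang attribute matches current_lang."""
    for node in nodes:
        if node.get("lang") == current_lang:
            return node
    # Assumption: no fallback is specified, so signal "no match" with None.
    return None


# Illustrative item, as it might look after a CSV with hh_name::en and
# hh_name::nl columns has been transformed to XML.
item = ET.fromstring(
    "<item>"
    "<hh_number>12</hh_number>"
    '<hh_name lang="en">Smith household</hh_name>'
    '<hh_name lang="nl">Huishouden Smith</hh_name>'
    "</item>"
)

match = pick_translation(item.findall("hh_name"), "nl")
print(match.text)  # Huishouden Smith
```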

Note:
1 Very bad name because XPath 1.0 already has a translate() function that does something entirely different (character-by-character replacement in a string). We could use something like current-lang instead.

A related note around select_one_from_file:

I noticed that in the latest version of Enketo a select_one with search() or a select_one_external renders the form without options. I then moved to select_one_from_file (as suggested here: https://github.com/kobotoolbox/enketo-express/issues/545#issuecomment-242851549), which works very well in Enketo but makes ODK Collect crash if I have labels in multiple languages.

So it seems that, at the moment, a form with external choices cannot be used in both Enketo and Collect.

To be precise, we don't have a specification for localized external choices. select_one_from_file works the same in Enketo and Collect as far as I know and neither supports localization with forms produced by pyxform at the moment. What @martijnr describes here implies it is likely possible to get localized files working with Enketo with modifications to the form definition XML but these modifications would currently be outside the ODK XForms spec. On the Collect side, the search() appearance/function lets you specify columns to use for different languages' labels on the choices sheet. This is also currently outside of the ODK XForms spec.

Could you please share a form that results in a crash? Here is an example that works the same in Collect and Enketo. The unlocalized labels are shown as expected and the localized columns are ignored.

To be explicit, I brought the localization concept into this thread because I think it's critical and because I also think it may make us go down a different path for the column customization. Often when people have external CSVs, those come from external processes, and the ideal for them would be to be able to use their files unchanged.

For example, they may have columns id, name, nom and they'd like to be able to specify that values in the name column are to be used as the label in the default language and that nom is to be used as the label in the French (fr) language. Putting all of this in parameters seems like it could become quite cumbersome. I'm not sure what a reasonable ODK XForms spec would look like for this (maybe a special instance to represent the mapping?) but I think it's important to at least explore. We may end up having to go with a convention like the ::language (lang code) suffix for practical reasons, but in many cases it will be less convenient from a user standpoint.

I'd really like to take this on but it will likely be a sizable project that touches multiple tools, so I'm not sure when it will happen. Spec ideas are most welcome in the meantime.

This is the revised proposal that includes translation support that @LN and I have come up with and want to present for further discussion. It includes additional label parameters with a language code (e.g. label_en, label_fr) as shown in this example:

| type | name | label::English (en) | label::French (fr) | parameters |
| --- | --- | --- | --- | --- |
| select_one_from_file my_data.csv | tree | Tree | Arbre | value=id label_en=species label_fr=espèces |

It works the same with XML external data files. It also works with choice filters, which remain unchanged.

Compared to Enketo's rogue feature mentioned above, this proposal does not require a specific CSV data structure, except that column headings should (ideally) be valid XML node names (we could work around that limitation though). So it provides the ultimate CSV flexibility. A small disadvantage is that it wouldn't work with XML files that use e.g. lang attributes to provide translations. However, since we require a specific XML structure (root > item) for select_from_file anyway, that does not seem like a big deal.

The earlier example without translations remains unchanged.

Pyxform would produce the following output (ignore this if you're not an XForm enthusiast or developer):

<itext>
    <translation default="true()" lang="English (en)">
        <text id="external_instance-my_data-0">
            <value>species</value>
        </text>
    </translation>
    <translation lang="French (fr)">
        <text id="external_instance-my_data-0">
            <value>espèces</value>
        </text>
    </translation>
</itext>
...
<instance id="my_data" src="jr://file-csv/my_data.csv" />
...
<itemset nodeset="instance('my_data')/root/item">
    <value ref="./id"/>
    <label ref="./*[local-name() = jr:itext('external_instance-my_data-0')]"/>
</itemset>

The reason for introducing local-name() instead of name() is to provide better support for external XML data which will likely be name-spaced. local-name() will ignore the namespace prefix.
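
To illustrate how the label ref would resolve, here is a small Python emulation. It is only a sketch: the ITEXT dictionary stands in for the itext block above, the namespace URI and the Oak/Chêne values are invented sample data, and local_name() and resolve_label() are hypothetical helpers rather than part of any existing tool.

```python
import xml.etree.ElementTree as ET

# Stand-in for the <itext> block: language -> text id -> column/node name.
ITEXT = {
    "English (en)": {"external_instance-my_data-0": "species"},
    "French (fr)": {"external_instance-my_data-0": "espèces"},
}


def local_name(tag):
    """Drop ElementTree's '{namespace}' prefix, similar to XPath local-name()."""
    return tag.rsplit("}", 1)[-1]


def resolve_label(item, text_id, lang):
    """Emulate ./*[local-name() = jr:itext(text_id)] for a single item."""
    wanted = ITEXT[lang][text_id]
    for child in item:
        if local_name(child.tag) == wanted:
            return child.text
    return None


# A namespaced item, as external XML data might provide it (sample data).
item = ET.fromstring(
    '<item xmlns:d="http://example.org/data">'
    "<d:id>1</d:id>"
    "<d:species>Oak</d:species>"
    "<d:espèces>Chêne</d:espèces>"
    "</item>"
)

print(resolve_label(item, "external_instance-my_data-0", "English (en)"))  # Oak
print(resolve_label(item, "external_instance-my_data-0", "French (fr)"))   # Chêne
```

Because the comparison uses only the local part of the node name, the same lookup works whether or not the data file declares a namespace.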

(Not 100% sure if this ref value is XForms-compliant. Tbc.)

Any feedback would be very welcome!

P.S. Thanks @LN! Note that I made a few tweaks: I changed the parameter name back to value in XLSForm, used * instead of node(), and changed the example to use label_en instead of label.

That's a good point. The only ref with a predicate I can quickly find in a W3C XForms example is this one and I don't know how related it is. In our spec, I think ref is always a simple path expression without a predicate with an exception for jr:itext. The JavaRosa implementation definitely would need to be reworked to support an expression with a predicate and I can't immediately tell how difficult that would be.

That said, it continues to be the most flexible and simple approach I can think of so unless we get other proposals, I'd be ok with moving forward.

It's inconsistent with the choices sheet but I don't know how many form designers would make the connection between the two. This seems fine to me.

Makes sense. And to confirm, if there were both label and label::French (fr) columns, the accepted parameters would be label and label_fr, right? I say "accepted" but I would be in favor of making defining labels for all languages otherwise declared in the form an enforced requirement. If the source document doesn't actually have columns for every language, a form designer could just do e.g. label_en=species label_fr=species.
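
A sketch of what such an enforced check might look like, assuming the label_<code> parameter names are derived from the "(code)" part of each language column and that the untranslated label column maps to a plain label parameter. Both function names are hypothetical, and treating a language without a code as plain label is an assumption.

```python
import re


def language_code(language):
    """Extract 'fr' from 'French (fr)'; return None if there is no code."""
    match = re.search(r"\(([^)]+)\)\s*$", language)
    return match.group(1) if match else None


def missing_label_parameters(form_languages, parameters):
    """List the label parameters a translated form would still need.

    form_languages uses None for the untranslated 'label' column.
    """
    missing = []
    for language in form_languages:
        code = language_code(language) if language else None
        # Assumption: no language code means the plain 'label' parameter.
        key = f"label_{code}" if code else "label"
        if key not in parameters:
            missing.append(key)
    return missing


# Form with 'label' and 'label::French (fr)' columns, but only 'label' mapped.
print(missing_label_parameters([None, "French (fr)"], {"value": "id", "label": "species"}))
```

A form designer whose file has a single label column could satisfy the check by pointing every language at it, e.g. label_en=species label_fr=species.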

CC @Xiphware who would probably find this one kind of fun, too.

Good find! Definitely would not be able to ever support predicates in ref for anything other than querying an external data file (not primary instance/bind) in Enketo. So if we go ahead, we should probably clarify its limited use in itemset value/label children in our XForms spec.

Yes, I agree, and I also agree that we may as well make things easier for ourselves by being strict for translated forms.
