Incorrect type returned by API form schema fields

Mtyszler · November 5, 2020, 9:19am

1. What is the problem? Be very detailed.
When using the API call to get the form schema (https://odkcentral.docs.apiary.io/#reference/forms-and-submissions/'-individual-form/getting-form-schema-fields) or using ruODK form_schema(), the type returned is not correct. For example, select1 fields are marked as "string".

2. What app or server are you using and on what device and operating system? Include version numbers.

at least ODK Central 0.8. + Windows / R

3. What have you tried to fix the problem?

I called the API directly from the browser, which confirmed the problem.

4. What steps can we take to reproduce the problem?

Use the API call as specified in https://odkcentral.docs.apiary.io/#reference/forms-and-submissions/'-individual-form/getting-form-schema-fields with any form containing a select1 field.

5. Anything else we should know or have? If you have a test form or screenshots or logs, attach below.

I believe this error is more general: many other types (notes, select_multiple, etc.) are also being mapped to "string".
See attachment: all-widgets.xlsx (110.8 KB)

See screenshot:

LN · November 5, 2020, 2:31pm

In XForms, selects are considered a view-only concern and the underlying type is string. Identifying a field as a select requires looking at the form definition body.

The fields endpoint intentionally only provides the bind types — the simplest way that a field can be described. We should consider a documentation update to make that clear.
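To illustrate what "looking at the form definition body" could involve, here is a minimal sketch in Python. It assumes you have the form's XForm XML as a string, that controls use absolute ref paths, and that the form uses the standard XForms and XHTML namespaces. The sample form and the function names are made up for illustration and are not part of Central or ruODK.

```python
import xml.etree.ElementTree as ET

# Standard namespaces used by ODK XForms documents.
XF = "{http://www.w3.org/2002/xforms}"
XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical, heavily trimmed form used only for this example.
SAMPLE = """
<h:html xmlns="http://www.w3.org/2002/xforms"
        xmlns:h="http://www.w3.org/1999/xhtml">
  <h:head>
    <model>
      <instance><data id="demo"><name/><color/><toppings/></data></instance>
      <bind nodeset="/data/name" type="string"/>
      <bind nodeset="/data/color" type="string"/>
      <bind nodeset="/data/toppings" type="string"/>
    </model>
  </h:head>
  <h:body>
    <input ref="/data/name"/>
    <select1 ref="/data/color"/>
    <select ref="/data/toppings"/>
  </h:body>
</h:html>
"""


def bind_types(xform_xml):
    """Map each bind nodeset to its declared type (what the model says)."""
    root = ET.fromstring(xform_xml)
    return {b.get("nodeset"): b.get("type") for b in root.iter(XF + "bind")}


def select_kinds(xform_xml):
    """Map each select control's ref to 'select_one' or 'select_multiple'."""
    root = ET.fromstring(xform_xml)
    body = root.find(XHTML + "body")
    kinds = {}
    if body is None:
        return kinds
    for el in body.iter():
        if el.tag == XF + "select1":
            kinds[el.get("ref")] = "select_one"
        elif el.tag == XF + "select":
            kinds[el.get("ref")] = "select_multiple"
    return kinds


if __name__ == "__main__":
    print(bind_types(SAMPLE))
    print(select_kinds(SAMPLE))
```

In this sample all three binds are "string"; only the body distinguishes the text input from the two selects.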

We could also explore Central doing the work to build enum types when generating the OData schema if choices are internal to the form. It would be good to know more about what you need this info for. For example, do you need to know if there was a choice that was shown but never selected? We would have to carefully consider the performance implications of such a change.

Mtyszler · November 5, 2020, 9:30pm

Thanks for the clarification.

It does make sense, but I must say it's a bit misleading. The reason I say this is that when you prepare an XLSForm, the type can indeed be select_one, but the type returned by the schema is actually the processed XForm type.

Here, I support the idea of better documentation, so we know what to expect. Maybe even add a mapping of xls form types ==> XForm types.
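As a rough idea of what such a mapping could look like, here is a small Python sketch. The table is partial and reflects my understanding of how XLSForm types are usually converted to XForm bind types; the exact output may depend on the converter version, so treat it as an assumption to verify rather than official documentation. The names XLSFORM_TO_BIND and bind_type are made up for this example.

```python
# Partial, illustrative mapping of XLSForm question types to XForm bind types.
# Assumption: typical converter output; verify against your own forms.
XLSFORM_TO_BIND = {
    "text": "string",
    "note": "string",
    "calculate": "string",
    "select_one": "string",
    "select_multiple": "string",
    "integer": "int",
    "decimal": "decimal",
    "date": "date",
    "time": "time",
    "dateTime": "dateTime",
    "geopoint": "geopoint",
    "image": "binary",
}


def bind_type(xlsform_type):
    """Return the expected bind type for an XLSForm 'type' cell, or None.

    The list name after select_one / select_multiple is ignored,
    so "select_one colors" is looked up as "select_one".
    """
    parts = xlsform_type.split()
    if not parts:
        return None
    return XLSFORM_TO_BIND.get(parts[0])


if __name__ == "__main__":
    for t in ("select_one colors", "integer", "note"):
        print(t, "==>", bind_type(t))
```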

I already contributed a function to ruODK (see here) which extracts choices.

My main use case is that I normally automate ETLs to get data ready for further processing in Stata or R. Knowing the XLSForm type allows me to trigger type-specific actions, for example applying labels to Stata variables or factors in R. Here I want to be complete.
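A minimal sketch of such a type-specific action, written in Python for illustration (the same idea applies in R or Stata). It assumes you already know each field's kind ("select_one", "select_multiple", or anything else) and have its choices as a value-to-label dictionary; it relies on the fact that select_multiple answers are stored as space-separated choice values. The function name decode_value is hypothetical.

```python
def decode_value(kind, raw, labels):
    """Turn a raw submission value into label(s) depending on the field kind.

    kind   -- "select_one", "select_multiple", or any other type name
    raw    -- the raw string from the submission (may be None or empty)
    labels -- dict mapping choice values to human-readable labels
    """
    if raw is None or raw == "":
        return None
    if kind == "select_one":
        # Unknown values are passed through unchanged.
        return labels.get(raw, raw)
    if kind == "select_multiple":
        # select_multiple answers are space-separated choice values.
        return [labels.get(v, v) for v in raw.split()]
    return raw


if __name__ == "__main__":
    colors = {"r": "Red", "g": "Green"}
    print(decode_value("select_one", "r", colors))
    print(decode_value("select_multiple", "r g", colors))
    print(decode_value("text", "free text", {}))
```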

Having said that, I think form_schema_ext from ruODK might be enough. If a question has a choice list associated with it, it must be select_one or select_multiple; I just need to figure out which.

LN · November 5, 2020, 9:47pm

I agree. I've reopened your issue at https://github.com/getodk/central-backend/issues/306 and made it about updating the docs for now.

I saw that, very cool!!

Thanks for the explanation, this makes sense. It's likely that at some point we'd do processing on Central to make this easier but I'm not sure when. Having a path through ruODK is bound to help lots of users.

Matthew_White · November 5, 2020, 10:03pm

If you're working with Stata and XLSForms specifically, you might want to check out odkmeta (which I wrote years ago) or odk2stata.

Mtyszler · November 6, 2020, 8:14am

Quoting LN: "I agree. I've reopened your issue at https://github.com/getodk/central-backend/issues/306 and made it about updating the docs for now."

I saw, thanks!

Mtyszler · November 6, 2020, 8:28am

Hi @Matthew_White. Thanks for the additions!

I am familiar with both odkmeta and odk2stata, great tools, btw.

I used odkmeta in the past, but I think I had some small issues (which I don't recall exactly now, and they might not be pertinent anymore). I know that odkmeta uses insheet, which has been deprecated in favor of import delimited. Also, we would often work with xlsx data downloads, and odkmeta needs csv. Moreover, we started with ODK Aggregate and then KOBO, and they produce slightly different outputs. Finally, one feature of our internal tool is the possibility to adapt or change the original labels in a specification file. For example, say you have a question "What is your age?", but you actually want it labelled "Age" in the data file.

As for odk2stata, I haven't tried it, to be honest, but I should. I foresee the same issue with adapting labels. Also, most of my colleagues are not familiar with Python, which creates a huge barrier.

Finally, the main use case I have at the moment is to facilitate spot checks during data collection, by quickly inspecting distributions, missing values, and nulls. For that I'd rather read all information from the source, instead of requiring the user to carry an xlsx version of the form around, hence my specific interest in the API and ruODK.

Thanks again for the great tools and for stopping by. We might collaborate in the future!