Using select_one_external with a large file

Hello everyone, nice to meet you all.

I'm designing a form that has several questions to allow users to specify their location. It uses cascading selects to work through administrative levels, from country down to village. The total village list is in excess of 100k rows. I'm making the form in XLS and using the service from OnaData to host it (which I believe also converts it to XML).

For the cascading location selection, I started with select_one_from_file but found that our older devices would run out of memory and crash when the file became too large. Even when it didn't crash, it tended to be slow to the point where it would be an issue for users. Even after splitting the file by country, it would still crash on occasion (some of the field devices are quite old).

Having read the docs here I switched to using select_one_external which has resolved the speed and memory issues completely (it's super fast now).

The only problem here is that this method requires me to add the contents of the file as a tab called external_choices whereas I'd much rather keep it as a separate standalone file, if possible, for several reasons.

For one, I'd like a single file that I can keep updated and have forms point at it, so I only have to make one set of changes. I'd also like to avoid the issue of copy/paste errors when adding it. Finally, it does bloat the XLS file size considerably, which may become an issue in future if we add more external datasets in this fashion (people like to pass the XLS form around for comment etc).

Does anyone know of any way to get the speed of select_from_external without having to create this extra tab in the XLS form? Or perhaps any other approaches that might work well for me?

Thank you.

You can use the search() appearance as described at https://xlsform.org/en/#dynamic-selects-from-pre-loaded-data

We are actively working on increasing performance and decreasing memory needs for select_one_from_file. Those improvements should be available for Entity Lists this fall and for other attached data files after that.

Hi @taindow

By splitting the village data in to two or more partitions, you can achieve a balance between performance and manageability. In your form, based on the user's previous selections (e.g., country - state - district), by using relevance logic and two or more 'select one from file' questions for the 'Village', you can dynamically ensure that only the relevant data is referenced, reducing the load on the device. You can then use a calculate field to concat all your 'Village' capturing variables and use that variable for Village data when you use the submissions data. Although it's not as streamlined as having a single external CSV file, it should mitigate the issues you're facing with file size and device limitations. It involves some upfront work in partitioning and maintaining the files, but it can significantly enhance the user experience on older devices, which seems to be a priority for your project.

I hope this helps!