I have a form for scanning school exam barcodes. The form uses select_one_from_file to look up student data from a CSV file (reg_ben.csv). The CSV contains about 100,000+ records (student details).
The problem: When I load the form in ODK Collect, the CSV file doesn't download properly, and the form fails to open. I only see a loading screen, then nothing.
What I've tried:
Reduced CSV to a small test file (10 rows) → works fine.
Original large CSV → fails.
CSV is properly formatted with correct column names matching the XLSForm.
My XLSForm structure:
Main survey sheet has select_one_from_file reg_ben.csv and calculate fields using instance('reg_ben')/root/item[...]
External CSV has columns: provice_name, school_name, ..., barcode_code
Question:
Is there a size limit for CSV files in select_one_from_file?
If I embed the CSV data directly inside the XLSForm (in the choices sheet), will it work better? How can I convert my external CSV to an internal choice list while keeping all 100k+ records?
Are there alternative approaches for large dataset lookups in ODK Collect?
I appreciate any guidance from those who've handled large datasets in ODK.
You were right to use a CSV file. Putting the data in the choices sheet is not the right choice. Using entities could also work, but I like CSV files myself.
Here is an altered version of your form that works on my setup. As I do not speak Persian, I made a few modifications to test it out on my setup. I believe the issues were due to typos rather than the size of the CSV file. ODK handles large files quite well, in my experience.
For your use case, you should be using pulldata() instead of instance() in the calculations, since the select_one_from_file isn’t actually ever used! If you just want access to the CSV data, pulldata() makes much more sense for this use case. Here is the documentation for pulldata.
I moved the getbarcode variable out of the field-list group. The XLSForm logic generally flows from top to bottom of the form, and I’ve modified the logic so that you first scan a barcode, then the calculates look up the data, and then it starts displaying that data to the user.
I also added the _key suffix to the barcode in the .csv file to make lookups quicker. You can read about how this works in the data preloading documentation.
This probably isn’t totally perfected yet, but it should get you pointed in the right direction. In the future, make sure you remove all personally identifiable information from your .csv files when uploading to the forum .
Thank you for the support and for sharing the corrected XLSForm with me. I tested the file that was sent, but I noticed some problems compared to what I need:
In the corrected version, only three attributes are shown after scanning the barcode. In my original form, I wanted the full data (province, school, attendance number, student name, father’s name, etc.) to appear directly in text fields after scanning.
In the shared version, the data does not appear in the text type fields. Instead, it only shows a message saying whether the barcode is registered or not.
When I scan a barcode that should be registered, the form says “not registered,” while in my original design it should display the student’s details and confirm registration.
My goal is:
After scanning the barcode, the form should automatically pull all the student’s information from the CSV file and display it in text fields.
Then it should also show a message confirming whether the barcode is registered or not.
Could you please advise how I can adjust the form so that the full data is displayed in text fields (not just the registration status), while still keeping the registration check?
Indeed, the corrected form probably doesn’t correspond to your original design. However, all the variables for province, school number, attendance number, etc., should all be set in calculate fields, so you can just reference them as needed in labels by adding them in as ${variable_name}s to the label. As I don’t speak Persian, it was difficult for me to replicate exactly what you needed.
When I scan a barcode that should be registered, the form says “not registered,” while in my original design it should display the student’s details and confirm registration.
After scanning the barcode, the form should automatically pull all the student’s information from the CSV file and display it in text fields.
Then it should also show a message confirming whether the barcode is registered or not.
A clarification question: how does the form have access to a student’s information if they are not registered? I assumed that every student in reg_ben.csv was registered. If so, then you won’t be able to display student information for students that are not in reg_ben.csv.
If my assumption is correct - that every student in reg_ben.csv is registered - then it sounds like an issue in the logic of the calculate field that confirms registration. If you scan a barcode that is not in the reg_ben.csv, none of the other variables should be populated (e.g. school_name, father_name, etc.). You could check that the school_name is defined after the pulldata() call. If the school_name is defined, the student is registered. If it is not, the student needs to be registered, and you could show the text inputs for the registration using the relevant field.
You had part of this logic in the form already, and you’ll probably have to play around with it a bit to make it work to your requirements.