10,000 question form not working

1. What is the problem? Be very detailed.
When I upload a form containing around 10,000 questions, the ODK Central server returns a 502 error.

2. What app or server are you using and on what device and operating system? Include version numbers.
ODK Central (Version:v1.2.1) along with Docker, Nginx
Ubuntu 20
Device Specifications:
32 GB RAM
850 GB Hard Drive

3. What have you tried to fix the problem?
I tried increasing client_max_body_size from 100m to 400m in the odk.conf.template file, but the error persists.
I also tried running "docker-compose restart pyxform", but that did not work either.

4. What steps can we take to reproduce the problem?
Try uploading a form containing around 10,000 questions and check server response.

5. Anything else we should know or have? If you have a test form or screenshots or logs, attach below.

Hi @sidra786, a 10k question form is uncommon but it should work. Please send a copy of the form and any form attachments to support@getodk.org so we can try to reproduce the issue.

Please check your email. The issue occurs while uploading the form to the ODK Central server.

Still waiting for an answer or solution.

Still waiting for your email to support@getodk.org. Are you sure you sent it to the correct address?

Hi, I have sent it from another email address. Please check now.

Still waiting. A prompt response would be appreciated. Thanks.

Check your email. A response was sent a day ago.

Hi,
Thanks for your help. I followed the steps you provided and everything works fine. However, when I load this form in ODK Collect, it gives a "form update failed" error. Can you offer any suggestions? Note that this form is generated automatically by my application. Your help is much appreciated.

Media used for the generated form:
https://drive.google.com/file/d/1yqpzNnrGEGvHSOQWkhjQvbJKrSvc3etJ/view?usp=sharing

I wanted to summarize the outcome of my investigation so others can benefit.

This form, which is auto-generated by an external app, is quite large at 10,000 questions. It has 2,000 repeats with images inside those repeats. A form with this structure can work in ODK, but it's not ideal. Further, popular downstream tools you might use (e.g., for visualization and analysis) might not work at all.

I was able to get this form to work by installing pyxform locally on a machine with lots of RAM and CPU, converting the XLSForm to XForm using the xls2xform command, and uploading the XForm directly to Central. The file was 40 MB, but Central can easily handle 100 MB or more.
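For anyone who wants to script that local conversion, here is a minimal sketch. It assumes pyxform has been installed with pip so that the xls2xform command is on the PATH; the file names and the helper functions are hypothetical and only illustrate the step described above.

```python
import shutil
import subprocess
from pathlib import Path


def build_xls2xform_command(xlsform: Path, xform: Path) -> list[str]:
    """Build the command line: xls2xform <input XLSForm> <output XForm>."""
    return ["xls2xform", str(xlsform), str(xform)]


def convert(xlsform: Path, xform: Path) -> Path:
    """Run the conversion locally and return the path of the XForm written."""
    if shutil.which("xls2xform") is None:
        raise RuntimeError("xls2xform not found; install it with 'pip install pyxform'")
    # check=True raises CalledProcessError if pyxform reports a conversion error.
    subprocess.run(build_xls2xform_command(xlsform, xform), check=True)
    return xform


if __name__ == "__main__":
    # Hypothetical file names; substitute the form generated by your application.
    print(build_xls2xform_command(Path("big_form.xlsx"), Path("big_form.xml")))
```

Calling convert() with the real paths writes the XForm XML, which can then be uploaded to Central in place of the XLSForm.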

Collect, after a short delay, was able to load the form. Navigation was quick and so was form saving. I did not try data submissions. The form could easily generate 1 GB or more of submission data, but as long as the files themselves are reasonable in size (e.g., 5 MB) my guess is that it'd work.

The current performance bottlenecks are in tools that Central depends on: pyxform-http (for converting XLSForm to XForm) and Enketo (for form previews and submission editing). This form is an outlier (it's twice the size of anything I've seen in the last 12 years), so addressing those bottlenecks is not a top priority at the moment.

If others have large forms that aren't performant in pyxform-http or Enketo, please let me know. Once we have a handful of examples, then we can look at what we can do to improve our support of very large forms.

Until then, if you come across a form this large, chances are you can restructure it for more efficiency. Here are approaches I recommend.

  • Only collect the data you use, not the data you think you might need. Pictures and videos in particular are often unused. If you need to verify data quality, log enumerator behavior or use random() to make a picture question relevant 10% of the time.
  • Break up the form into different sections to reduce the burden. Instead of a Zoo form, maybe a Tiger form, a Lion form, and a Bear form would be easier.
  • Find ways to reduce the number of repeats. If for example you have repeats on Tigers, Lions, and Bears, use an Animals repeat and ask what type of animal you have inside the repeat.
  • Reduce the number and complexity of calculates. If a relevance applies to multiple fields, wrap those in a group and put the relevance on the group.
  • Reduce duplicate choice lists (e.g., Yes/No/Maybe). Use only one of each type and refer to them.
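Since this form is generated by an application, two of the suggestions above could be applied in the generator itself. The sketch below is only an illustration: it assumes the generator holds choice lists as a dict of (name, label) pairs and emits survey rows as dicts keyed by XLSForm column names. All function, list, and question names are hypothetical.

```python
def dedupe_choice_lists(choice_lists):
    """Merge choice lists that contain identical (name, label) options.

    Returns (unique_lists, rename), where rename maps every original
    list name to the name of the list that is kept.
    """
    unique_lists = {}  # kept list name -> options
    seen = {}          # options as a tuple -> kept list name
    rename = {}
    for list_name, options in choice_lists.items():
        key = tuple(options)
        if key not in seen:
            seen[key] = list_name
            unique_lists[list_name] = list(options)
        rename[list_name] = seen[key]
    return unique_lists, rename


def sampled_photo_rows(name, label, fraction=0.1):
    """Survey rows that make an image question relevant for about `fraction` of records."""
    draw = f"{name}_draw"
    return [
        # once() keeps the first random() value so the draw is not re-rolled.
        {"type": "calculate", "name": draw, "calculation": "once(random())"},
        {"type": "image", "name": name, "label": label,
         "relevant": f"${{{draw}}} < {fraction}"},
    ]


if __name__ == "__main__":
    lists = {
        "yn_tiger": [("yes", "Yes"), ("no", "No")],
        "yn_lion": [("yes", "Yes"), ("no", "No")],
        "species": [("tiger", "Tiger"), ("lion", "Lion"), ("bear", "Bear")],
    }
    unique, rename = dedupe_choice_lists(lists)
    print(sorted(unique), rename["yn_lion"])
    print(sampled_photo_rows("cage_photo", "Take a photo of the cage"))
```

The rename mapping tells the generator which list name each select question should reference after merging. The draw is stored in a calculate so that the photo question stays visible or hidden consistently once the value is set.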

As to what you should do, @sidra786, my recommendation is to rethink the form design. If that's something you can't do yourself, post at https://forum.getodk.org/c/marketplace/8 and perhaps someone in the community can help.
