ODK Central submissions zip export fails with large dataset

1. What is the issue? Please be detailed.

We are using ODK Central (version details at the end of this post) on an AWS EC2 t2.large instance (8 GB RAM). This server hosts dozens of forms, including one very large form (dozens of repeats, hundreds of fields, and tens of thousands of submissions already on the server).

When we attempt to download a zip file from the ODK Central web application by clicking "Download" -> "All data without media files (.zip)", we experience two kinds of issues:

  1. Usually, the download begins (i.e., the file shows up in the download bar at the bottom left of Google Chrome), proceeds slowly, and then stalls; the download appears to finish after approximately 8.0 MB.
  2. Once, the download had not begun after approximately a minute, and a 504 Gateway Time-out page appeared instead.

We have also tried to download the submissions via R, using the ruODK package's submission_export() function. This fails in a similar way.
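
For reference, the R attempt looked roughly like this. A minimal sketch with placeholder server URL and credentials, assuming a recent ruODK where submission_export() accepts a media argument; as I understand it, this hits the same zip export endpoint the web UI uses:

```r
library(ruODK)

# Placeholder server, project, form, and credentials -- substitute your own.
ru_setup(
  svc = "https://central.example.org/v1/projects/1/forms/my_big_form.svc",
  un  = "user@example.org",
  pw  = "***",
  tz  = "UTC"
)

# Request the export zip; media = FALSE mirrors the web UI's
# "All data without media files (.zip)" option.
zip_path <- submission_export(local_dir = tempdir(), media = FALSE)
```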

I suspect that this issue is related to RAM, similar to the one described here:
Failed download records .zip folder from project submission page

My suspicion is further supported by the fact that provisioning the server with more memory (16 GB) makes exports work.

2. What steps can we take to reproduce this issue?

Reproducing the exact issue is not possible due to the sensitivity of the data. However, I suspect that anyone with (a) a very large number of submissions and (b) a modest amount of memory would see a similar result.

3. What have you tried to fix the issue?

We have provisioned the server with more memory, which fixes the issue. However, it seems a shame to keep such an over-provisioned server running all the time (given the recurring costs) for a task that happens at most a few times per day. Though the current "fix" (more memory) works, it would be useful to know whether there are better approaches.

Version details

versions:
24ee74e5f974a518aa1cc8b06e7addb3be6b4690 (v1.3.3-2-g24ee74e)
 5cc6fd79d112ce36d6298c61bb8817689c4c323b client (v1.3.2)
 1d1a3a59969e61383da74119e405e67778b7a170 server (v1.3.3)

Having more RAM on the machine itself doesn't mean that Central will use it.

My guess is that when you over-provisioned the machine, something else (e.g., a reboot, swap, a faster disk, more cores) came along with the bigger machine and alleviated the problem.

Here's what I'd recommend to start:

  1. Go back to t2.large. It really should be enough.
  2. Allocate more memory to Central. If you have an 8 GB machine, allocating 3.5 GB is a good place to start. See https://docs.getodk.org/central-install-digital-ocean/#increasing-memory-allocation for how.
  3. Add 1 GB of swap to give yourself a little breathing room in the unlikely case that memory is exhausted. See https://docs.getodk.org/central-install-digital-ocean/#adding-swap for how; a sketch of those commands follows below.
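
For step 3, these are the standard Linux swap-file commands that the linked doc walks through; a minimal sketch, run on the host (verify against the doc itself, which is authoritative):

```bash
# Create and enable a 1 GB swap file.
sudo fallocate -l 1G /swapfile
sudo chmod 600 /swapfile     # swap files must not be world-readable
sudo mkswap /swapfile        # format the file as swap space
sudo swapon /swapfile        # enable it immediately
# Persist the swap file across reboots:
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```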

The other thing I'd recommend is upgrading to v1.5.3 so you get the newest features and fixes. In your case, the upgrade would also help with any potential database connection leaks you might be running into.
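
For reference, upgrading usually follows the standard Central procedure; a hedged sketch of the typical commands (check the official upgrade docs for your version, since exact steps can differ between releases):

```bash
cd central                  # your Central install directory
docker-compose stop         # stop the running containers
git pull                    # fetch the latest release
git submodule update -i     # update the client/server submodules
docker-compose build        # rebuild the images
docker-compose up -d        # start Central again in the background
```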

Hello @joebrew, sometimes I have the same problem in my projects. I use a date variable in the form and then filter on it to download the data in parts; if the project spans several days, you will have to download several times. I don't think it is due to the server configuration; it seems to be a database problem, because once I detect the day that has a problem, I download record by record to find out which one is at fault.
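
If you want to script that kind of date slicing, Central's OData endpoint supports $filter on system metadata such as __system/submissionDate (so this filters by submission date rather than by a form field). A hedged R sketch using ruODK, assuming your ruODK version exposes a filter argument on odata_submission_get(); otherwise the same $filter can go on the .svc URL directly:

```r
library(ruODK)

# Assumes ru_setup() has already pointed ruODK at your form's .svc URL.
# Fetch one day's submissions at a time and work through the data in parts.
day_batch <- odata_submission_get(
  filter = paste(
    "__system/submissionDate ge 2022-01-01T00:00:00Z and",
    "__system/submissionDate lt 2022-01-02T00:00:00Z"
  )
)
```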


Thanks for writing this up, @joebrew, and for chiming in, @abelinuxmx. We're sorry you're experiencing these challenges. We do some testing on large forms and try to keep them exportable even on modest machines. It would help us to learn more about your forms.

Could you please give closer ranges on these so we can try to reproduce? My guess is that this is mostly related to repeats. I believe that with repeats there are cases where the whole zip has to be kept in memory. How many repeat instances are there for each of those repeats (rough average)?

@abelinuxmx Does your form have repeats too? If so, how many repeats and roughly how many instances per submission?

Do either of your submissions capture images or other multimedia?


Hello Helen, I do not use repeats in my forms, but I do use the audit log, along with geolocation and audio. It has happened to me with relatively small questionnaires of 50-60 questions. I have a DigitalOcean instance with 4 GB of RAM and 50 GB of SSD disk space, and these are small projects with 600-1,000 cases. This happens only occasionally, and since there are few cases, I have no problem downloading them one by one.

@yanokwa I am pleased to report that I was able to (i) downsize to t2.large, (ii) allocate more memory per your instructions, and (iii) add swap. The result: I successfully exported data for the very large form! Thank you!

@abelinuxmx Thanks for the tip on dividing by date. It was not needed in this specific case (since swap plus the memory allocation change resolved the issue), but it's useful to know.

@LN Regarding your request for "closer ranges on these so we can try to reproduce", here is the info:

  • 30,988 submissions
  • 31 repeats (i.e., 32 tables total, including Submissions.csv)
  • Size of each repeat table: between 1 and 149,235 rows; 4 to 762 columns
  • Largest repeat table: 149,167 rows and 648 columns

You asked "Do either of your submissions capture images or other multimedia?" In my case, no.

Thanks!
