500 error code issues with an ODK Central server v1.2.1

Thalie · June 17, 2021, 2:12pm

Hi all,

One of my research partners is also encountering 500 error codes with an ODK Central server v1.2.1. The admin told me that it was even difficult to restart the server because the docker-compose command failed with the error "cannot create temporary directory", which seems to indicate a storage space issue.

My first instinct is that it may be a general server performance issue (the original server requirements were 4 cores / 12 GB RAM / 300 GB storage), but I am wondering whether I should investigate other leads.
Unfortunately I am fairly in the dark about this system, since I do not have an admin role on this server (only project manager rights on a few projects) and need to go through the admin to better understand what may be going wrong. As I understand it, the backup archive is about 1.3 GB.

Even after the server was restarted normally, I am unable to upload any new forms, and the server seems generally quite slow, which I had not experienced before.

To better understand the context, I asked the admin to send me the exact server specifications, the server logs, and the number of ODK Central projects and forms on the server. I learnt that the server is shared between different projects: I only see 12 projects (3 of which have ~10 forms), but I cannot see the projects I am not involved in. I will update this post as soon as I have more information. I may also ask the admin to check the format of the encrypted audio files generated by some forms for testing purposes, to make sure that users have not sent 30 x 200 GB interviews...
The issue happened just after we accessed the API through various R requests, so I am still running tests on my own servers to check whether concurrent requests could have caused these issues, similar to the timeout issues in ODK Central v1.1 when requesting an encrypted data download with an incorrect passphrase. That said, since the problem persists after the services were restarted, this seems less likely to be the root cause.

Many thanks for any suggestions

yanokwa · June 17, 2021, 4:23pm

Sorry you are having problems, @Thalie. Your post suggests that this is more of a server problem rather than a Central problem.

The first thing I would recommend your IT team check is disk usage (I use ncdu) and RAM usage (I use htop) to see if there is anything unusual there. If all is well there, then look at the output of docker-compose logs service in the central directory to see if there is anything obviously broken.
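For an admin who wants a quick scripted version of the disk and RAM checks, here is a minimal sketch in Python. It is not part of ODK Central; it assumes a Linux host (where /proc/meminfo exists), and the function names are hypothetical. It uses the standard library's shutil.disk_usage for disk space and parses /proc/meminfo for available memory. Interactive tools like ncdu and htop remain more useful for finding *what* is using the space or memory.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


def parse_meminfo(text: str) -> dict:
    """Parse Linux /proc/meminfo-style text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def memory_available_percent(meminfo: dict) -> float:
    """Share of total RAM the kernel reports as available for new work."""
    return 100.0 * meminfo["MemAvailable"] / meminfo["MemTotal"]


if __name__ == "__main__":
    # Hypothetical paths: "/" for the root filesystem; point this at the
    # volume holding Docker data if it is mounted separately.
    print(f"Disk in use on /: {disk_usage_percent('/'):.1f}%")
    try:
        with open("/proc/meminfo") as f:
            mem = parse_meminfo(f.read())
        print(f"Memory available: {memory_available_percent(mem):.1f}%")
    except FileNotFoundError:
        print("/proc/meminfo not found (not a Linux host?)")
```

A disk close to 100% in use would be consistent with the "cannot create temporary directory" error mentioned above.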

Thalie · June 17, 2021, 8:39pm

Many thanks @yanokwa for confirming that this is more likely a server problem, and for the suggestions. I will definitely ask the admin to investigate the points you mentioned. Our main data collection with this specific server starts in ~10 days, so we still have a bit of time to make sure this is properly solved (and I will try not to become a complete control freak until then).

Thalie · July 8, 2021, 2:18pm

For documentation: the root cause behind this issue was indeed a memory issue. It was actually caused by a backup Cron job that had been incorrectly set up to run every minute, which was "slightly" excessive.
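To illustrate how much a per-minute schedule adds up, here is a small sketch in Python that counts how many times per day a five-field cron schedule fires. It is a deliberately simplified parser, not a full cron implementation: it only expands the minute and hour fields (supporting *, */n, a-b, a-b/n, single numbers and comma lists) and ignores the day-of-month, month and weekday fields. The schedules below are hypothetical examples; the thread does not say what the backup schedule was meant to be.

```python
def expand_field(field: str, lo: int, hi: int) -> set:
    """Expand one simplified cron field into the set of values it matches."""
    values = set()
    for part in field.split(","):
        base, _, step_str = part.partition("/")
        step = int(step_str) if step_str else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            a, b = base.split("-", 1)
            start, end = int(a), int(b)
        else:
            if step_str:
                # Forms like "5/15" are not supported in this sketch.
                raise ValueError(f"unsupported step syntax: {part!r}")
            start = end = int(base)
        if step < 1 or not (lo <= start <= end <= hi):
            raise ValueError(f"out of range: {part!r}")
        values.update(range(start, end + 1, step))
    return values


def runs_per_day(schedule: str) -> int:
    """Runs on a day the job is active, from the minute and hour fields only."""
    fields = schedule.split()
    if len(fields) != 5:
        raise ValueError("expected 5 cron fields")
    minutes = expand_field(fields[0], 0, 59)
    hours = expand_field(fields[1], 0, 23)
    return len(minutes) * len(hours)


if __name__ == "__main__":
    print(runs_per_day("* * * * *"))   # every minute: 1440 runs per day
    print(runs_per_day("0 2 * * *"))   # once a day at 02:00: 1 run per day
```

With a backup archive of roughly 1.3 GB, 1440 runs per day means the server is almost constantly producing backups, which fits the slowdown described in this thread.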