I am facing the same issue on two of my ODK Central installations, both containing a fairly large amount of data. One instance has around 40k submissions (each with at least 2 images), while the other has around 6k submissions (each with around 3 images).
I suspect the issue is triggered when someone starts a download of all data with media images. The Central server then starts showing the following symptoms:
Unable to log in; the message shown is SOMETHING WENT WRONG - ERROR 500.
If you already have a logged-in session, the portal shows: THE USER IS ALREADY LOGGED IN. PLEASE REFRESH THE PAGE.
There is no way to log in to the Central portal from any account.
Field workers are also unable to send data to the server; the Collect app starts showing error 500 and CANNOT CONNECT messages.
Central version is 1.3.3. There is no shortage of RAM, CPU, or disk space.
Temporary workaround: nothing else I tried worked, so I had to restart the whole machine, after which everything worked again. But after a couple of days, the same thing happened.
If someone could help, I would be grateful. Let me know if you need me to pull out some logs.
Thanks, but that does not work for me. I have tried clearing the cache, changing browsers, and even changing computers.
Restarting is not a good solution, although it is the only one I know so far. There is always a risk of something going wrong and losing all the data. The data is now so large that it's not easy to back it up frequently. There is no shortage of RAM, CPU, or disk space. The version is 1.3.3, but I assume 1.4 has similar issues (I saw some threads on the forum).
I pushed ODK Aggregate to its limits with huge amounts of data across different projects, and it never broke down. But Central keeps having these issues, and I need a permanent fix.
Let me know if I could get you some logs for troubleshooting.
It seems the initiated download is causing the issue. I have an 8 GB dual-core server that shows below 5% RAM usage at all times, even when the issue occurs. As far as I can tell, the service containers are not running into memory pressure either. The version of Central I am using is 1.3.3.
Sorry, looks like I missed that in your original message. We've made a lot of changes related to database connections since then. If you can make a backup and upgrade to v1.5.3, I think that's likely to solve the problem.
My best guess is that you're running into the issue reported in this thread. You can read through the thread for some ideas to get useful logging. In particular, this thread may be helpful. Once you get 500s, it's safe to run docker-compose stop && docker-compose up --detach to restart.
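For anyone following along, that restart sequence looks something like this. The install directory is an assumption; run it from wherever your Central docker-compose.yml actually lives:

```shell
# Assumed install path -- substitute your own Central checkout.
cd /root/central

# Stop the containers, then bring them back up in the background.
# This restarts Central without rebooting the whole machine.
docker-compose stop && docker-compose up --detach
```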
Thanks. Yes, an upgrade is on my mind (I am pretty much a new-version enthusiast!). But since it's a production machine, I am hesitant to upgrade for fear of losing data. Backing up the data is also an issue, because pulling 40 GB of image data out through the ODK interface does not really work for me. When the server goes into the broken (error 500) state, the only way I've found to restore it is rebooting.
I had seen these threads while researching on the support forum. However, I was hoping to pinpoint the issue and get confirmation that the upgrade will really solve this problem.
Please also suggest a manageable method of exporting the data.
You can turn off the AWS machine and then take a snapshot. If anything at all goes wrong, you can revert. For extra redundancy, you could do a pg_dump to disk and then upload it to S3, which will take a while. But snapshots are generally considered reliable and have the advantage of being simple.
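A rough sketch of the pg_dump-to-S3 route, assuming a default docker-compose install. The service name, database/user names, install path, and bucket are all assumptions on my part; verify them against your own docker-compose.yml before running anything like this:

```shell
cd /root/central  # assumed install path

# Dump the database from inside the postgres service container.
# Service name, user, and database are assumed defaults -- check
# your own configuration.
docker-compose exec -T postgres \
  pg_dump -U odk -d odk --format=custom > central-$(date +%F).dump

# Push the dump off-machine for redundancy (hypothetical bucket).
aws s3 cp "central-$(date +%F).dump" s3://my-backup-bucket/central/
```

The custom format (--format=custom) compresses the dump and lets you restore selectively with pg_restore later.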
The threads about this issue have a number of sample logs. You would be looking for something like this. No confirmation is perfect but that would give you a pretty good idea of whether that's what you experience.
Alternatively, you can try the monit approach that @yanokwa suggested here. How frequently is this happening?
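If you go the monit route, a minimal check might look something like the sketch below. The address, port, and script path are assumptions; the script would just wrap the docker-compose stop/up commands mentioned earlier:

```
# /etc/monit/conf.d/central -- hypothetical monit check.
# Restart Central's containers when the frontend stops answering.
check host central with address 127.0.0.1
  if failed
     port 443
     protocol https
  then exec "/usr/local/bin/restart-central.sh"
```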