Severe issue on multiple Central servers when working with images - Error 500

Hi,

I am facing the same issue on 2 of my ODK Central installations, both containing a fairly large amount of data. One instance has 40k+ submissions (each with at least 2 images), while the other has around 6k submissions with around 3 images each.

I suspect that the issue starts when someone tries to download all data with media images. The Central server then shows the following issues:

  • Unable to log in. The message shown is SOMETHING WENT WRONG - ERROR 500.
  • If you already have a logged-in session, the portal starts showing the message: THE USER IS ALREADY LOGGED IN. PLEASE REFRESH THE PAGE.
  • There is no way to log in to the Central portal with any account.

Also, the field workers are unable to send data to the server. The Collect app also starts showing error 500 and CANNOT CONNECT messages.

Central version is 1.3.3. RAM, CPU and disk space do not appear to be a problem.

Temporary workaround: I tried everything but nothing worked. I had to restart the whole machine, and then everything started working again. But after a couple of days, the same thing happened again.

If someone could help, I would be grateful. Let me know if you need me to pull some logs.

Thanks,
Saad

I get those user login issues every now and then. Deleting browser cookies fixes that.

I've seen issues fixed temporarily through a server restart here on this forum. Are you low on disk space? Running the latest Central version?

Hi @Florian_May

Thanks. That does not work for me. I have tried clearing the cache, changing browsers and changing computers as well.

A restart is not a good solution, although it is the only one I know so far. There is always a risk of something going wrong and losing all the data. There is already so much data that it is not practical to back it up frequently. RAM, CPU and disk space are not a problem. The version is 1.3.3, but I assume 1.4 has similar issues (I saw a thread about it on the forum).

I used ODK Aggregate to its breaking point with a huge amount of data in different projects, and it never broke down. But Central is causing issues. I need a permanent fix for this.

Let me know if I could get you some logs for troubleshooting.

Many thanks,
Saad

We run Central instances with much more data than you have that don't have this problem. The issue is most likely with your infrastructure.

Look at your logs. What do they say when you start a download with media?

You say you have no issue with RAM or CPU. How frequently are you measuring usage? Are you looking at the host only or also at each container's stats?
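For the per-container numbers, one possible approach is to poll `docker stats` every few seconds while a download with media is running. This is only a rough sketch and not something from the Central docs: the function names and the 5-second interval are assumptions made for illustration, while `docker stats --no-stream --format` and the `.Name`, `.CPUPerc`, `.MemUsage` and `.MemPerc` placeholders are standard Docker CLI.

```python
import subprocess
import time

# Go-template format handed to `docker stats`; tab-separated so it is easy to split.
STATS_FORMAT = "{{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"


def to_pct(text):
    """Convert a string such as '12.50%' to a float, or None if it is not a number."""
    try:
        return float(text.strip().rstrip("%"))
    except ValueError:
        return None


def parse_stats(output):
    """Turn tab-separated `docker stats` output into a list of dicts."""
    rows = []
    for line in output.splitlines():
        parts = line.strip().split("\t")
        if len(parts) != 4:
            continue  # skip blank or unexpected lines
        name, cpu, mem_usage, mem_pct = parts
        rows.append({
            "name": name,
            "cpu_pct": to_pct(cpu),
            "mem_usage": mem_usage,
            "mem_pct": to_pct(mem_pct),
        })
    return rows


def sample_once():
    """Take one snapshot of every running container (needs access to Docker)."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", STATS_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return parse_stats(result.stdout)


def watch(interval_seconds=5, samples=120):
    """Print a timestamped line per container at a fixed interval."""
    for _ in range(samples):
        stamp = time.strftime("%H:%M:%S")
        for row in sample_once():
            print(f"{stamp}  {row['name']:<30} cpu={row['cpu_pct']}%  "
                  f"mem={row['mem_usage']} ({row['mem_pct']}%)")
        time.sleep(interval_seconds)


# Example (run on the Central host while the download is in progress):
# watch(interval_seconds=5, samples=120)
```

Host-level averages taken every few minutes can hide a short spike in a single container, which is why sampling per container at a short interval is worth trying here.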

Hi @yanokwa,

Thanks, very valid points. My infra is in AWS, and I select pretty powerful machines (never lighter than t3.large). However, I check RAM and CPU on the host only, not inside the containers.

I would need some hand-holding on collecting logs (which ones and how), and also on how to check container resources.

Many thanks,
Saad

Note that t3.large is burstable and so performance isn't predictable.
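To make "burstable" concrete, here is a small back-of-the-envelope sketch of the CPU credit arithmetic. The t3.large figures used below (2 vCPUs, 36 credits earned per hour, a maximum balance of 864 credits) are what I believe AWS publishes for that instance type, so treat them as assumptions and check the current AWS documentation. The function name is made up for this example.

```python
def hours_until_credits_run_out(vcpus, credits_per_hour, starting_credits,
                                utilization=1.0):
    """Hours until the credit balance reaches zero at a sustained utilization.

    One CPU credit is one vCPU running at 100% for one minute.
    Returns None when credits are earned at least as fast as they are spent.
    """
    spent_per_hour = vcpus * utilization * 60
    net_drain = spent_per_hour - credits_per_hour
    if net_drain <= 0:
        return None
    return starting_credits / net_drain


# Assumed t3.large figures: 2 vCPUs, 36 credits/hour, 864 credits when full.
full_burst = hours_until_credits_run_out(2, 36, 864, utilization=1.0)
print(f"Full balance lasts about {full_burst:.1f} hours at 100% CPU")
```

Under those assumptions a full balance lasts roughly ten hours of sustained 100% CPU, and less if the balance was already partly spent. What happens afterwards depends on the instance's credit mode: in standard mode the CPU is throttled to the baseline, while in unlimited mode it keeps bursting and the extra usage is billed.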

Independent of that, a machine with lots of RAM won't help because Node (which Central uses) won't use more than 2 GB of RAM unless you specifically allocate more to it.

See https://docs.getodk.org/central-troubleshooting/#reading-container-logs for how to read logs and check status.

See https://docs.getodk.org/central-install-digital-ocean/#increasing-memory-allocation for allocating more memory to the service container.
