1. What is the issue? Please be detailed.
We're running ODK Central on Kubernetes, and in the Kubernetes configuration we cap CPU and memory by setting resource limits. When we start a bulk download of completed forms to a compressed file, the service claims more memory than our resource constraints allow and crashes. This probably happens because the service is looking at the memory available on the host rather than the memory allocated to the pod/container. 'vmstat' in the container shows the resources available to the pod, while 'vmstat -s' shows the memory available on the host. The latter is used in some script to determine the number of workers to start, so it might also be used elsewhere.
2. What steps can we take to reproduce this issue?
Deploy on Kubernetes, limit the resources, upload many forms, and try to download them in a compressed file.
3. What have you tried to fix the issue?
Some folks have just upped the resources to the service, like setting the memory limit to 4 or even 8GB, although monitoring seems to show a maximum usage of around 1.5GB of memory. I'm now considering using the following:
- name: SERVICE_NODE_OPTIONS
I'm hoping that adding this to my configuration will actually limit the amount of memory the service can and will use.
4. Upload any forms or screenshots you can share publicly below.
The last time this happened was before I joined the team trying to solve it. I'm just looking for confirmation that the setting above will limit the memory usage, even if the pod sees much more available memory.
1.5 GB is very little RAM for a server. New phones have 6X that much.
You'll have more success with 2 GB or more. If you really can't do that, at least add swap.
https://docs.getodk.org/central-install-digital-ocean/#increasing-memory-allocation has more on this.
As to your specific question...
SERVICE_NODE_OPTIONS sets the maximum amount of RAM that Node will use, but if there isn't enough, the application will crash. I'd recommend at least 2048 MB to give yourself space. You can dial it back if you find it's not necessary.
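To make the sizing concrete, here is a rough sketch of how a heap cap could be derived from a container memory limit. It assumes the value passed through SERVICE_NODE_OPTIONS is Node's `--max-old-space-size` flag (the thread does not show the actual value), which takes a number in MiB and caps V8's old-generation heap rather than the whole process. The function name and the 25% headroom for non-heap memory are illustrative choices, not ODK recommendations.

```python
def node_heap_option(limit_bytes: int, headroom_fraction: float = 0.25) -> str:
    """Build a --max-old-space-size value that leaves headroom below a limit.

    limit_bytes is the container memory limit in bytes. headroom_fraction is
    the share of the limit kept free for everything outside the V8 heap
    (buffers, native memory); 0.25 is an arbitrary starting point.
    """
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    if not 0 <= headroom_fraction < 1:
        raise ValueError("headroom_fraction must be in [0, 1)")
    # The flag is expressed in MiB, so convert and round down.
    heap_mib = int(limit_bytes * (1 - headroom_fraction)) // 2**20
    return f"--max-old-space-size={heap_mib}"


if __name__ == "__main__":
    # A 2 GiB container limit with the default 25% headroom.
    print(node_heap_option(2 * 2**30))
```

If the cap ends up lower than what a bulk export really needs, Node will still abort with an out-of-memory error, so treat the result as a starting point and adjust it against monitoring.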
Do you know a command that can, inside the container, reliably show the memory limit? For example, what does
cat /sys/fs/cgroup/memory/memory.limit_in_bytes show when you limit memory with Kubernetes? And what does it show when you have no limit?
Apologies for the late response.
cat /sys/fs/cgroup/memory.max shows 2147483648 which equals the 2Gi memory limit configured for the pod.
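Since the two commands in this thread point at different files (the cgroup v1 path in the question, the cgroup v2 path in the answer), a small helper that tries both may be useful. This is a minimal sketch: the function name and the `root` parameter are made up for illustration, and it only looks at the two paths mentioned here. On cgroup v2 an unlimited container reports the literal string "max"; on cgroup v1 it reports a very large number instead, which this sketch returns unchanged.

```python
from pathlib import Path
from typing import Optional

# cgroup v2 file first, then the cgroup v1 file from the earlier question.
_CANDIDATES = ("memory.max", "memory/memory.limit_in_bytes")


def read_cgroup_memory_limit(root: str = "/sys/fs/cgroup") -> Optional[int]:
    """Return the container memory limit in bytes, or None if none is reported."""
    for relative in _CANDIDATES:
        path = Path(root) / relative
        try:
            raw = path.read_text().strip()
        except OSError:
            continue  # file missing or unreadable: try the next layout
        if raw == "max":
            return None  # cgroup v2 writes "max" when no limit is set
        return int(raw)
    return None  # neither file exists under this root


if __name__ == "__main__":
    limit = read_cgroup_memory_limit()
    print("no limit reported" if limit is None else f"{limit / 2**20:.0f} MiB")
```

Run inside the pod described above, this should print 2048 MiB for the 2Gi limit.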
Unfortunately, I can't test it with no limit set, as that is not allowed in our production environment. I'll try to find a test environment where I can check this.
I'm not sure this statement is necessarily true.
Commands such as lscpu and vmstat generally report kernel-level resources, which are shared between the containers/pods running on the host. So the information you get corresponds to the underlying host and not the container.
However, when limiting via docker compose or Kubernetes resource limits, the container will be restricted by a cgroup. So even if lscpu / vmstat displays all of the host resources, the container can still only use the resources defined in the restrictions.
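The difference between what the host reports and what the container may use can be sketched as below. The code takes the text of /proc/meminfo (which reflects the host) and a cgroup limit in bytes, and returns the smaller of the two. Both function names are hypothetical, and how the limit is obtained is left to the caller.

```python
import re
from typing import Optional


def host_mem_total_bytes(meminfo_text: str) -> int:
    """Parse MemTotal (reported in kB) out of /proc/meminfo-style text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found in meminfo text")
    return int(match.group(1)) * 1024


def usable_memory_bytes(meminfo_text: str, cgroup_limit: Optional[int]) -> int:
    """Memory the process can actually use: host total capped by the cgroup.

    cgroup_limit is the container limit in bytes, or None when unlimited.
    """
    host = host_mem_total_bytes(meminfo_text)
    return host if cgroup_limit is None else min(host, cgroup_limit)


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:
        text = handle.read()
    # 2 GiB stands in for the pod limit discussed in this thread.
    print(usable_memory_bytes(text, 2 * 2**30))
```

Any sizing logic based only on the host figure will overshoot whenever the cgroup limit is the smaller number, which is the situation described in the original report.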
In a related issue, ODK Central currently only runs one worker by default (WORKER_COUNT is broken as of now, see this PR), so my assumption is that the system just doesn't have enough resources.
Have you tried running with more?