We don't see this issue on ODK Cloud, so there is some combination of your infrastructure and usage that is triggering this issue.
The Central team is investigating the possibility of a database leak and we'll update this topic once we learn more. We are also considering releasing a patch to increase the database pool size as a short-term workaround while we try to get to the root cause. More on that later.
In the meantime, here are a few things you can do to reduce the load on the server:
- If you are exporting submissions to CSV via the API, use the $filter syntax to export only recent submissions. See https://odkcentral.docs.apiary.io/#reference/submissions/submissions/exporting-form-submissions-to-csv for more.
- If you are feeding data indirectly to Power BI or Excel, consider using the OData feed instead. See https://docs.getodk.org/central-submissions/#connecting-to-submission-data-over-odata for more.
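As a rough sketch of the first suggestion, the snippet below builds an export URL that only asks for submissions received on or after a given date. The server name, project ID, form ID, and seven-day window are placeholders, and it assumes GNU date and that your Central version accepts $filter on __system/submissionDate for this endpoint, so check the API docs linked above before relying on it.

```shell
#!/bin/bash
# Hypothetical values: replace with your own server, project ID, and form ID.
CENTRAL_URL="https://central.example.org"
PROJECT_ID=1
FORM_ID="my_form"

# Print an export URL limited to submissions received on or after the date
# passed as the first argument (YYYY-MM-DD).
# Spaces in the $filter expression are percent-encoded as %20.
export_url() {
  printf '%s/v1/projects/%s/forms/%s/submissions.csv.zip?$filter=__system/submissionDate%%20ge%%20%s\n' \
    "$CENTRAL_URL" "$PROJECT_ID" "$FORM_ID" "$1"
}

# Example: only the last seven days (the -d option assumes GNU date).
SINCE=$(date -u -d '7 days ago' +%Y-%m-%d)
export_url "$SINCE"
```

The printed URL can then be passed to whatever authenticated client you already use for exports. Narrowing the date range keeps each export small, which is the point of the suggestion above.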
You can also automate restarting the Central service when you get 500s. This is not a great long-term fix, but the restart happens very quickly, so it's something to consider if you don't want to babysit the server.
Assuming you are using Ubuntu, here are the instructions:
Install monit
apt-get install monit;
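Create /etc/monit/conf.d/central.conf with the following contents. If your service container is not named central_service_1 (check with docker ps), adjust the restart command to match.
check program central-healthcheck with path /usr/local/bin/central-healthcheck.sh
    if status > 0 for 2 cycles then exec "/bin/bash -c '/usr/bin/docker restart central_service_1'"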
Create /usr/local/bin/central-healthcheck.sh with the following contents:
#!/bin/bash
# Post deliberately invalid credentials to the sessions endpoint.
# A healthy Central rejects them with a 401.
CENTRAL_HTTP_CODE=$(curl --silent --insecure --include --request POST --header "Content-Type: application/json" --data-binary "{\"email\": \"email\",\"password\": \"password\"}" --write-out "%{http_code}" --output /dev/null https://127.0.0.1/v1/sessions)
# Any other code (for example a 500) counts as a failure.
if [[ "$CENTRAL_HTTP_CODE" -ne 401 ]]; then
    echo "$CENTRAL_HTTP_CODE" >&2
    exit 1
fi
exit 0
Set permissions and reload monit
chmod 0755 /usr/local/bin/central-healthcheck.sh;
chmod 0644 /etc/monit/conf.d/central.conf;
monit reload;
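To see how the check classifies responses before trusting it with restarts, the comparison can be pulled out into a small function and tried against a few codes. This is only an illustration: check_code is a hypothetical helper name, not part of Central or monit, and it uses an exact string match on 401 rather than the numeric test in the script.

```shell
#!/bin/bash
# Hypothetical helper: only the exact code 401 counts as healthy,
# because the healthcheck sends deliberately invalid credentials.
check_code() {
  if [ "$1" != "401" ]; then
    echo "unhealthy: $1" >&2
    return 1
  fi
  return 0
}

# Try a healthy response, a server error, and curl's "no response" code.
for code in 401 500 000; do
  if check_code "$code" 2>/dev/null; then
    echo "$code: healthcheck exits 0, monit does nothing"
  else
    echo "$code: healthcheck exits 1, monit restarts after 2 failed cycles"
  fi
done
```

You can also run /usr/local/bin/central-healthcheck.sh by hand and inspect its exit status with echo $? to confirm it returns 0 while Central is responding normally.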