1. What is the problem? Be very detailed.
ODK Central stopped responding after a non-ODK related upgrade of packages. I have been receiving timeout and proxy errors since.
I had a perfectly working Central setup v. 0.8.0 (whole story here) and after doing a general apt-get upgrade, Central started to return 504 Gateway timeout errors.
The packages that could matter (apache, libssl, certbot, iptables, systemd, python) had only security updates, nothing significant.
I upgraded Central to version 1.2.0. Now the following happens:
- On plain http, when accessing static content, like
version.txt
, I get an immediate response. Good, I suppose that means server works, at least partially. - On plain http, when accessing the main page, background gray appears. After 60 seconds the rest of the page with a login window appears. The
restore
file returns a HTTP 504 Gateway time-out. - On https, for any request, i get a HTTP 500 Proxy Error: The proxy server could not handle the request. Reason: Error during SSL Handshake with remote server
Also, the docker-compose@central service is down, but I have no idea if it is relevant.
Postgresql database works. I can connect manually and perform queries.
2. What app or server are you using and on what device and operating system? Include version numbers.
First, Central 0.8.0, now 1.2.0
Standalone machine with Debian 10, kernel 4.19.0.
The network setup: There is another web server running concurrently on the same machine, so an Apache (2.4.38) reverse proxy is forwarding packets with the address survey.example.com to ports :4480 and :4443.
There is a Let's Encrypt certificate that was already used.
3. What you have you tried to fix the problem?
I upgraded Central to version 1.2.0. Because some settings have changed between versions, I upgraded by ditching my old working central folder and instead git-cloned the repository in a new folder.
I had to stop and delete the current nginx
docker container, before starting a new one. Otherwise it would start but was "unhealthy".
My .env
file looks like this:
# Use fully qualified domain names. Set to DOMAIN=local if SSL_TYPE=selfsign.
DOMAIN=survey.example.com
# Used for Let's Encrypt expiration emails and Enketo technical support emails
SYSADMIN_EMAIL=mail@example.com
# Options: letsencrypt, customssl, upstream, selfsign
SSL_TYPE=upstream
# Do not change if using SSL_TYPE=letsencrypt
HTTP_PORT=4480
HTTPS_PORT=4443
Having everything configured in .env
, I could leave docker-compose.yml
in its original state. (Thank you for this change.)
The relevant Apache config:
<VirtualHost *:80>
ServerName survey.example.com
ProxyPreserveHost On
ProxyPass / http://localhost:4480/
ProxyPassReverse / http://localhost:4480/
</VirtualHost>
<VirtualHost *:443>
SSLProxyEngine On
ServerName survey.example.com
ProxyPreserveHost On
SSLEngine On #
SSLProxyVerify none # These settings are
SSLProxyCheckPeerCN off # supposed to help but
SSLProxyCheckPeerName off # seem to have no effect.
SSLProxyCheckPeerExpire off #
ProxyPass / https://localhost:4443/
ProxyPassReverse / https://localhost:4443/
SSLCertificateFile /etc/letsencrypt/live/my.example.com/fullchain.pem
SSLCertificateKeyFile /etc/letsencrypt/live/my.example.com/privkey.pem
Include /etc/letsencrypt/options-ssl-apache.conf
</VirtualHost>
I tried various combinations of SSL_TYPE and HTTP ports (80 vs 4480 and 443 vs 4443) and this combination kind of worked the best. I'm not really sure what settings to choose. If I have a pre-existing certificate that is from Let's Encrypt, is it upstream
or letsencrypt
?
4. What steps can we take to reproduce the problem?
This is probably a configuration problem than a straight out error of ODK.
Because plain HTTP works, I think the problem is related to HTTPS misconfiguration. The timeouts can be explained by databases refusing non-secure connections, so fixing those should help with the timeouts as well (or so I hope).
If someone could give me any insight what could go wrong, i would really appreciate it.
5. Anything else we should know or have? If you have a test form or screenshots or logs, attach below.
Some log excerpts that may be useful:
Nginx
2021/06/08 10:16:04 [error] 14#14: *2101 upstream timed out (110: Connection timed out) while connecting to upstream, client: 172.19.0.1, server: , request: "GET /v1/sessions/restore HTTP/1.1", upstream: "http://172.19.0.7:8383/v1/sessions/restore", host: "survey.example.com", referrer: "http://survey.example.com/"
Service
starting cron..
using 4 worker(s) based on available memory (32894260)..
starting server.
unable to connect to ipc-file `naught.ipc`
removing the ipc-file and attempting to continue
(node:59) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
ConnectionError: Connection terminated due to connection timeout
at Object.createConnection (/usr/odk/node_modules/slonik/dist/src/factories/createConnection.js:54:23)
at processTicksAndRejections (internal/process/task_queues.js:85:5)ConnectionError: Connection terminated due to connection timeout
at Object.createConnection (/usr/odk/node_modules/slonik/dist/src/factories/createConnection.js:54:23)
at processTicksAndRejections (internal/process/task_queues.js:85:5)Failed to audit-log task success message!
ConnectionError: Connection terminated due to connection timeout
at Object.createConnection (/usr/odk/node_modules/slonik/dist/src/factories/createConnection.js:54:23)
at processTicksAndRejections (internal/process/task_queues.js:85:5)
'{"did":"nothing","because":"no backup configured"}'
Enketo
13:48:20 0|enketo | Worker 4397 ready for duty at port 8005! (environment: production)
13:50:30 0|enketo | Error: Redis connection to enketo_redis_main:6379 failed - connect ETIMEDOUT 172.19.0.3:6379
13:50:30 0|enketo | at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1144:16) {
13:50:30 0|enketo | errno: 'ETIMEDOUT',
13:50:30 0|enketo | code: 'ETIMEDOUT',
13:50:30 0|enketo | syscall: 'connect',
13:50:30 0|enketo | address: '172.19.0.3',
13:50:30 0|enketo | port: 6379
13:50:30 0|enketo | }
Redis Cache and Main
1:C 07 Jun 2021 17:48:58.506 # Configuration loaded
1:M 07 Jun 2021 17:48:58.507 * Running mode=standalone, port=6380.
1:M 07 Jun 2021 17:48:58.507 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 07 Jun 2021 17:48:58.507 # Server initialized
1:M 07 Jun 2021 17:48:58.507 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 07 Jun 2021 17:48:58.507 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 07 Jun 2021 17:48:58.507 * DB loaded from disk: 0.000 seconds
1:M 07 Jun 2021 17:48:58.507 * Ready to accept connections
Apache
[Tue Jun 08 12:23:24.357237 2021] [proxy:error] [pid 33513] [client 233.252.0.0:50800] AH00898: Error during SSL Handshake with remote server returned by /version.txt
[Tue Jun 08 12:23:24.357250 2021] [proxy_http:error] [pid 33513] [client 233.252.0.0:50800] AH01097: pass request body failed to [::1]:4443 (localhost) from 233.252.0.0 ()
Docker-compose via systemd
systemd[1]: Started central via docker-compose.
systemd[14217]: docker-compose@central.service: Changing to the requested working directory failed: No such file or directory
systemd[14217]: docker-compose@central.service: Failed at step CHDIR spawning /usr/bin/docker-compose: No such file or directory
systemd[1]: docker-compose@central.service: Main process exited, code=exited, status=200/CHDIR
systemd[1]: docker-compose@central.service: Failed with result 'exit-code'.
systemd[1]: docker-compose@central.service: Service RestartSec=100ms expired, scheduling restart.
systemd[1]: docker-compose@central.service: Scheduled restart job, restart counter is at 5.
systemd[1]: Stopped central via docker-compose.
docker ps
IMAGE COMMAND CREATED STATUS PORTS NAMES
central_service "docker-entrypoint..." 25 hours ago Up 20 hours 8383/tcp central_service_1
central_enketo "docker-entrypoint..." 25 hours ago Up 20 hours 8005/tcp central_enketo_1
itsissa/namshi-smtp:4.89-2.deb9u5 "/bin/entrypoint.s..." 25 hours ago Up 20 hours 25/tcp central_mail_1
postgres:9.6 "docker-entrypoint..." 6 days ago Up 20 hours 5432/tcp central_postgres_1
getodk/pyxform-http:v1.5.1 "gunicorn --bind 0..." 6 days ago Up 20 hours central_pyxform_1
redis:5 "docker-entrypoint..." 6 days ago Up 20 hours 6379/tcp central_enketo_redis_cache_1
redis:5 "docker-entrypoint..." 6 days ago Up 20 hours 6379/tcp central_enketo_redis_main_1
central_nginx "/bin/bash /script..." 20 hours ago Up 20 hours (healthy) 0.0.0.0:4480->80/tcp, central_nginx_1
0.0.0.0:4443->443/tcp