Problem with Central 1.4 upgrade "User already logged in"

1. What is the issue? Please be detailed.
I am running ODK Central on a Digital Ocean droplet.
I followed the steps for Central upgrade and everything seemed to go smoothly.

When I try to log in on my domain, I receive the message
"A user is already logged in. Please refresh the page to continue."

When I try to download data via ruODK, I get the following error message:

se <- submission_export(local_dir="odk_exports", overwrite=TRUE, media=FALSE)
Request failed [502]. Retrying in 1.2 seconds...
Request failed [502]. Retrying in 2.1 seconds...
Error in response %>% httr::stop_for_status(task = glue::glue("get desired response from server {url} as user "{un}".\n\n", :
Bad Gateway (HTTP 502). Failed to get desired response from server https://libodk.com as user "nicholai@lib.solar"

Here is the version info from version.txt:

cd751d789e35e5cb1469cf651d6ba7275903bc03 (v1.4.2-4-gcd751d7)
 461c1037997f9fb88075b75612b83cea297ced42 client (v1.4.0)
 367a7f572d3ede411cea16d0fa1decb98e280e92 server (v1.4.2)

When I check the status using docker-compose ps, everything seems fine:

root@odk:~/central# docker-compose ps
            Name                          Command                  State                     Ports               
-----------------------------------------------------------------------------------------------------------------
central_enketo_1               docker-entrypoint.sh /bin/ ...   Up             8005/tcp                          
central_enketo_redis_cache_1   docker-entrypoint.sh redis ...   Up             6379/tcp                          
central_enketo_redis_main_1    docker-entrypoint.sh redis ...   Up             6379/tcp                          
central_mail_1                 /bin/entrypoint.sh exim -b ...   Up             25/tcp                            
central_nginx_1                /bin/bash /scripts/odk-set ...   Up (healthy)   0.0.0.0:443->443/tcp,             
                                                                               0.0.0.0:80->80/tcp                
central_postgres_1             docker-entrypoint.sh postgres    Up             5432/tcp                          
central_pyxform_1              gunicorn --bind 0.0.0.0:80 ...   Up                                               
central_secrets_1              docker-entrypoint.sh ./gen ...   Exit 0                                           
central_service_1              docker-entrypoint.sh ./wai ...   Up             8383/tcp

Here is an excerpt from the nginx logs, which shows "connection refused" errors when nginx tries to reach the upstream service:

2022/04/20 18:47:09 [error] 199#199: *123 connect() failed (111: Connection refused) while connecting to upstream, client: 69.209.6.149, server: libodk.com, request: "GET /v1/sessions/restore HTTP/1.1", upstream: "http://172.18.0.8:8383/v1/sessions/restore", host: "libodk.com", referrer: "https://libodk.com/"
69.209.6.149 - - [20/Apr/2022:18:47:09 +0000] "GET /v1/sessions/restore HTTP/1.1" 502 568 "https://libodk.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36"
69.209.6.149 - - [20/Apr/2022:18:47:09 +0000] "GET /fonts/icomoon.ttf?ijnc2n HTTP/1.1" 304 0 "https://libodk.com/css/app.e22ccca1.css" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36"
69.209.6.149 - - [20/Apr/2022:18:47:09 +0000] "GET /favicon.ico HTTP/1.1" 304 0 "https://libodk.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36"
69.209.6.149 - - [20/Apr/2022:18:47:24 +0000] "GET /version.txt HTTP/1.1" 304 0 "https://libodk.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36"
2022/04/20 18:47:30 [error] 199#199: *191 connect() failed (111: Connection refused) while connecting to upstream, client: 41.191.107.44, server: libodk.com, request: "GET /v1/key/1234/projects/2/formList HTTP/1.1", upstream: "http://172.18.0.8:8383/v1/key/1234/projects/2/formList", host: "libodk.com"
41.191.107.44 - - [20/Apr/2022:18:47:30 +0000] "GET /v1/key/1234/projects/2/formList HTTP/1.1" 502 166 "-" "org.odk.collect.android/v2021.2.4 Dalvik/2.1.0 (Linux; U; Android 11; TECNO PR651H Build/RP1A.201005.001)"
192.241.206.190 - - [20/Apr/2022:18:47:34 +0000] "GET /portal/redlion HTTP/1.1" 301 185 "-" "Mozilla/5.0 zgrab/0.x"
2022/04/20 18:48:17 [error] 199#199: *196 connect() failed (111: Connection refused) while connecting to upstream, client: 69.209.6.149, server: libodk.com, request: "GET /v1/test/ABCD/projects/2/forms/RegistrationCentral3/draft/formList HTTP/1.1", upstream: "http://172.18.0.8:8383/v1/test/ABCD/projects/2/forms/RegistrationCentral3/draft/formList", host: "libodk.com"
69.209.6.149 - - [20/Apr/2022:18:48:17 +0000] "GET /v1/test/ABCD/projects/2/forms/RegistrationCentral3/draft/formList HTTP/1.1" 502 166 "-" "org.odk.collect.android/v2022.2.0 Dalvik/2.1.0 (Linux; U; Android 11; SM-A725M Build/RP1A.200720.012)"
69.209.6.149 - - [20/Apr/2022:18:48:24 +0000] "GET /version.txt HTTP/1.1" 304 0 "https://libodk.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36"

Here is an excerpt from docker logs --tail 100 central_service_1, which shows migration issues:

generating local service configuration..
running migrations..
(node:27) UnhandledPromiseRejectionWarning: error: the database system is starting up
    at Parser.parseErrorMessage (/usr/odk/node_modules/pg-protocol/dist/parser.js:278:15)
    at Parser.handlePacket (/usr/odk/node_modules/pg-protocol/dist/parser.js:126:29)
    at Parser.parse (/usr/odk/node_modules/pg-protocol/dist/parser.js:39:38)
    at Socket.<anonymous> (/usr/odk/node_modules/pg-protocol/dist/index.js:10:42)
    at Socket.emit (events.js:400:28)
    at Socket.emit (domain.js:470:12)
    at addChunk (internal/streams/readable.js:290:12)
    at readableAddChunk (internal/streams/readable.js:265:9)
    at Socket.Readable.push (internal/streams/readable.js:204:10)
    at TCP.onStreamRead (internal/stream_base_commons.js:188:23)
(Use `node --trace-warnings ...` to show where the warning was created)
(node:27) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 5)
(node:27) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
checking migration success..
*** Error starting ODK! ***
After attempting to automatically migrate the database, we have detected unapplied migrations, which suggests a problem with the database migration step. Please look in the console above this message for any errors and post what you find in the forum: https://forum.getodk.org/

3. What have you tried to fix the issue?
I have tried restarting the droplet.
I've tried systemctl stop docker-compose@central followed by docker-compose build.
I've confirmed that there is enough disk space using df -h.
I've created a new administrator account with a different email / login, but I get the same error.

I tried installing docker-compose based on instructions here:

So now I am on docker-compose version 1.29.2.

I noticed that my .env didn't have port settings, so I modified it like this:

SSL_TYPE=letsencrypt
DOMAIN=libodk.com
SYSADMIN_EMAIL=nicholai@lib.solar
HTTP_PORT=80
HTTPS_PORT=443

I tried increasing the memory allocation following the instructions here:
https://docs.getodk.org/central-install-digital-ocean/#central-install-custom-memory

I used - NODE_OPTIONS=--max_old_space_size=6144 since I have 8GB available.
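For anyone following along, the setting goes in the environment section of the service container in docker-compose.yml. This is a sketch only; the exact surrounding keys and indentation should follow your existing file:

```yaml
# Excerpt (sketch) of docker-compose.yml; only the NODE_OPTIONS line is added.
services:
  service:
    environment:
      - NODE_OPTIONS=--max_old_space_size=6144  # ~6 GB for Node, out of 8 GB RAM
```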

This seems to have fixed the migration problem but the main issues remain.

Any help would be very much appreciated.

Thank you,
Nicholai

Your migrations have run out of memory. Increase your memory allocation. This is documented at https://docs.getodk.org/central-troubleshooting/#migration-fails-due-to-out-of-memory-error.

Whoops. Looks like you already tried the above!

Now that the migration has been fixed, what is the remaining problem? ruODK? What version are you running? You can check with this command: packageVersion("ruODK").

Thank you @yanokwa .
I followed the instructions and increased the memory allocation with - NODE_OPTIONS=--max_old_space_size=6144 since I have 8GB available.
Then I rebuilt the service container.

This seems to have fixed the migration issues, but the nginx logs still show a connection refused error.
When I try to log in on my domain, I still get "A user is already logged in", etc.

I am running ruODK version 1.3.1, but that issue is less urgent than the login / overall access issue.

"User already logged in" is usually a caching issue. Try a different browser or an incognito window to see if that helps.

Ok I just tried in an incognito window.

When I log in, I get a 502 error.

This is the same error I get when I try to download via ruODK:
Bad Gateway (HTTP 502). Failed to get desired response from server https://libodk.com as user "nicholai@lib.solar"

I tried creating another administrator, in case it was an issue with my login. But no luck.

That issue is definitely upstream of ruODK if you get the same error through the Central web interface.

We’ve been able to resolve this issue and I wanted to update this topic with the solution and our next steps in case someone else runs into this problem.

Central deletes abandoned drafts during the upgrade to v1.4. It's ordinarily a very fast operation, but this particular server had 60k abandoned drafts, each with a large form attachment. That rare combination, caused by an API integration, meant the migration was taking a very long time.

The migration is not critical to Central functioning and this server was scheduled to be retired anyway, so the best solution was to disable the drafts migration on this server.

A better long-term solution is to change the draft migration so that it runs in batches in a background task. We believe this specific issue is quite rare, so we'll wait to see if anyone else reports it before we implement the longer-term fix for the migration.

We will also explore ways to prevent this broader type of issue by batching and backgrounding time-insensitive operations in Central.
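As a generic sketch of the batching pattern described above (this is not Central's actual code; purgeInBatches and the deleteBatch callback are hypothetical stand-ins), the idea is to delete rows in small chunks so that no single transaction blocks the upgrade:

```javascript
// Sketch of batched deletion (hypothetical; not Central's implementation).
// deleteBatch(n) removes up to n rows and resolves with the count removed.
// Small batches keep each transaction short, so a background task can chip
// away at 60k drafts without stalling the migration or the server.
async function purgeInBatches(deleteBatch, batchSize = 500) {
  let total = 0;
  for (;;) {
    const deleted = await deleteBatch(batchSize);
    total += deleted;
    if (deleted < batchSize) break; // final (possibly partial) batch reached
  }
  return total;
}

// Example with an in-memory stand-in for the abandoned-drafts table:
const drafts = Array.from({ length: 1234 }, (_, i) => i);
const fakeDeleteBatch = async (n) => drafts.splice(0, n).length;

purgeInBatches(fakeDeleteBatch, 500).then((total) => {
  console.log(total, drafts.length); // 1234 0
});
```

In a real database-backed version, deleteBatch would issue a bounded DELETE per call, committing between batches.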
