Can't log in after upgrading Central to v2024.1

aankrah · July 23, 2024, 1:23pm

I recently tried upgrading my Central instance from, I believe, v1.3 to v2024.1.0-1-g42d83f1. The good news is that it was successful and I am able to get to the login page when I visit the domain.

However, when I try to log in with my credentials, I get the error message "Incorrect email address and/or password."

I have tried the good ol' fixes: stopping and restarting the docker cluster, restarting the host device, allocating even more RAM and rebuilding the service, and checking whether the postgres14 upgrade-successful file has been created (it has). None of them helped my case. Running docker logs --tail 15 central-service-1 gives this output. I don't know if that is helpful.

Any pointers would be greatly appreciated.

danbjoseph · July 23, 2024, 3:51pm

You can try directly (re)setting a user's password with the command line tools. You could also try creating a new user via the command line tools and then also setting their password with the tools as well. See: https://docs.getodk.org/central-command-line/#getting-to-the-tools
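A rough sketch of what that could look like. The subcommand names (user-create, user-promote, user-set-password) and the service name "service" are as I recall them from the linked docs page, so treat them as assumptions and verify there before running anything. The email address is a placeholder, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints the odk-cmd invocations unless DRY_RUN=0 is set.
# Assumptions: run from the central directory, the Compose service is
# named "service", and EMAIL is a placeholder address.
EMAIL="${EMAIL:-you@example.com}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Reset the password of an existing user (odk-cmd prompts for the new one).
reset_password() {
  run docker compose exec service odk-cmd --email "$1" user-set-password
}

# Create a new user and promote it to administrator.
create_admin() {
  run docker compose exec service odk-cmd --email "$1" user-create
  run docker compose exec service odk-cmd --email "$1" user-promote
}

reset_password "$EMAIL"
```

On older installs the command may be docker-compose (with a hyphen) rather than docker compose.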

aankrah · July 24, 2024, 7:34am

@danbjoseph I tried creating a new user with odk-cmd and, to my surprise, all my projects, forms and associated submissions were nowhere to be found.

I tried restoring my last backup and using the recovery steps outlined here but neither could help my case.

aankrah · July 24, 2024, 11:22am

UPDATE
After restarting Central a couple of times, the web UI was no longer accessible from its domain. Taking clues from this section of the docs, I navigated to /var/lib/docker/volumes/ and checked the size of all files there to kinda gauge which one matched the size of my last backup, using the command du -sh * | sort -hr

I then made a copy of the output of CENTRAL_NEW_DB=$(docker inspect --type container central-postgres14-1 -f '{{(index .Mounts 0).Source}}' | cut -d / -f 6) and renamed the randomID file with the largest size to the value of "$CENTRAL_NEW_DB". I executed the build and up -d commands after that, and Central is still not accessible, even via 127.0.0.1 on the server hosting it.
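For anyone following along, the cut -d / -f 6 part of that command just pulls the volume directory name out of the mount path. Here is a small sketch with a made-up path. The abc123 ID is hypothetical (a real one is a long hash), and the field number assumes Docker's default /var/lib/docker root.

```shell
#!/bin/sh
# Extract the volume directory name from a Docker mount path.
# Splitting on "/" gives: 1="" 2=var 3=lib 4=docker 5=volumes 6=<volume id>
volume_id_from_path() {
  printf '%s\n' "$1" | cut -d / -f 6
}

# Hypothetical path, shaped like the .Source value of a volume mount.
volume_id_from_path "/var/lib/docker/volumes/abc123/_data"
```

If Docker's data root is somewhere else, the field number would be different, so it is worth printing the full path first.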

I have attached the output of docker compose logs --tail 10 below.
log_output.txt (9.5 KB)

aankrah · July 25, 2024, 3:36pm

UPDATE 2
To avoid delaying my entire team's work due to this issue, I stopped the old instance, renamed the central folder and installed a fresh instance of Central. Everything went well until it was time to create a new user via odk-cmd. I again saw an error similar to the one from the last instance (shown below).

sebtux · July 26, 2024, 12:34am

Hi,

I regret I don't have any particular advice. But once I was in a similar situation because I didn't follow the installation instructions until the end. In my case, I didn't change upstream servers' config for the SSL certificates which had changed.

It seems to me (subjective assessment here) you may have missed some step during your upgrade process.

Regarding your clean setup... have you checked there are no old containers running?

HTH,

--
Seb

aankrah · July 26, 2024, 6:00am

Hi @sebtux, thanks for your response. I don't quite think upstream is an issue with my current setup. I've deployed Central 3 times now and I was just fine using letsencrypt or customssl. The section of the old instance's logs that I found concerning was

central-service-1 | running migrations..
central-service-1 | Error: connect ECONNREFUSED 172.19.0.6:5432
central-service-1 | checking migration success..
central-service-1 | Error: connect ECONNREFUSED 172.19.0.6:5432
central-service-1 | *** Error starting ODK! ***
central-service-1 | After attempting to automatically migrate the database, we have detected unapplied migrations, which suggests a problem with the database migration step. Please look in the console above this message for any errors and post what you find in the forum: https://forum.getodk.org/
central-service-1 | wait-for-it: waiting 15 seconds for postgres14:5432
central-service-1 | wait-for-it: timeout occurred after waiting 15 seconds for postgres14:5432
central-service-1 | generating local service configuration..

Regarding the new instance, I made sure to stop the cluster and restart the host device before commencing the deployment. So, I'm kind of certain there is no old central container still running. Also, the section of the logs above from the old instance can be observed in the logs of central-service-1 for the new deployment as well. I don't know if it has to do with some new requirement in central that my environment triggers. Hoping someone out there could help me figure this out ASAP.
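A small sketch for pulling just the database-related failures out of a long log, in case it helps compare the old and new instances. The db_errors name is made up for illustration, and the patterns are simply the phrases that appear in the log lines quoted above.

```shell
#!/bin/sh
# Keep only log lines that hint at database connection or migration trouble.
# Reads log text on stdin, so it can be tried on a saved file first.
db_errors() {
  grep -E 'ECONNREFUSED|timeout occurred|Error starting ODK|unapplied migrations'
}

# Example with two lines copied from the logs above; only the second matches.
printf '%s\n' \
  'central-service-1 | running migrations..' \
  'central-service-1 | Error: connect ECONNREFUSED 172.19.0.6:5432' | db_errors
```

Against a live install this could be fed with docker compose logs service | db_errors, assuming the Compose service is named "service".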

alxndrsn · July 27, 2024, 7:50am

This sounds like a big jump!

I hope your data is reliably backed up and you haven't lost anything while attempting these upgrades.

I have a few questions about the process followed and your setup:

Did you follow the upgrade instructions for each intermediate version (https://docs.getodk.org/central-upgrade/)?
Have you tried upgrading from v1.3 to any of the intermediate versions?
Are you running PostgreSQL from the standard ODK Central docker images, or are you running it externally?

aankrah · July 27, 2024, 11:21am

Hello @alxndrsn,

I sort of misinterpreted the whole "Central v2023.5, v2024.1: no upgrade notes" part of the documentation and went straight to executing this section of the docs. I did not try upgrading to any intermediate version and yes, I'm using the default postgres deployment with central.

LN · July 28, 2024, 5:24pm

Really sorry you’re in this state. Like @alxndrsn said, do make sure you know where your backup is and keep it safe.

It sounds like you pulled the new 2024.1 changes and then rebuilt without having gone through any of the upgrade instructions. Did you get any kind of error about the postgres14 upgrade? The more you can tell us about what actually happened in your upgrade attempt the more likely someone will have an idea.

You’re likely going to have some challenges trying to run Central twice from the same host and that may be what the migration failures are about. One thing you could do is start on a different host with a fresh install and practice restoring your backup. That will give you a chance to get familiar with the steps so you can then do it safely on your real server. ~~Maybe before you do that @alxndrsn can remind us whether a Postgres12 backup can be restored directly to a fresh install with postgres14?~~ (I confirmed that this should work)

Note also that if you use web links you’ll have to restore those separately so do not delete anything from your original host until you are absolutely certain that you have a fully working test server up.

aankrah · July 29, 2024, 2:36am

Hello @LN

I did not get any errors with the upgrade itself. I remember checking for the successful install file afterwards and it was present on the initial instance of Central after the upgrade. My troubles started when I wanted to restore the backup I had. Unfortunately, I did not take note of the error message. Is there a way to retrieve that from the old folder I renamed?

aankrah · July 29, 2024, 2:41am

One more thing,
the instance of Central 2024.1 running right now always shows the status of central-service-1 as "Up X seconds", with X always less than 10, while the others (nginx, enketo, postgres14, etc.) have been running for days.

Hopefully, this bit of info is helpful.
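A status that never gets past a few seconds usually suggests the container is being restarted in a loop. Here is a minimal sketch for spotting that from a status listing. The flapping name is made up, and the sample input is hypothetical, shaped like the output of docker ps --format '{{.Names}}\t{{.Status}}'.

```shell
#!/bin/sh
# Flag containers whose status suggests a restart loop: either "Restarting"
# or "Up N seconds". Expects lines of "<name><TAB><status>" on stdin.
flapping() {
  awk -F '\t' '$2 ~ /^Restarting/ || $2 ~ /^Up [0-9]+ seconds?/ { print $1 }'
}

# Hypothetical sample input; only the first container is flagged.
printf 'central-service-1\tUp 7 seconds\ncentral-nginx-1\tUp 3 days\n' | flapping
```

A container that was started only a moment ago would be flagged too, so it is worth running this a few times. docker inspect -f '{{.RestartCount}}' central-service-1 should also show how often the container has been restarted.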

alxndrsn · July 30, 2024, 8:13am

If you send me a larger set of logs from your postgres containers via direct message I can take a look for any obvious causes. (Click "Message" on https://forum.getodk.org/u/alxndrsn/summary).

The output from the following would be helpful:

docker logs central-postgres
docker logs central-postgres14

aankrah · July 30, 2024, 8:50am

Okay @alxndrsn, coming right up.

FYI,

docker logs central-postgres

yielded the response Error response from daemon: No such container: central-postgres.
However, the command

docker logs central-postgres-1 >> ~/Desktop/central_logs/central-postgres-1.txt

yields touch: cannot touch '/var/lib/postgresql/14/data/../.postgres14-upgrade-successful': No such file or directory
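Possibly relevant: that touch: message showing up in the terminal rather than in the file is what you would expect if it was written to stderr, since >> on its own only redirects stdout. As far as I know, docker logs passes a container's stderr through on its own stderr. Here is a small sketch of capturing both streams, using a stand-in function (fake_logs is hypothetical) instead of docker.

```shell
#!/bin/sh
# Stand-in for a command that writes to both stdout and stderr.
fake_logs() {
  echo "normal line"
  echo "error line" >&2
}

out_file=$(mktemp)
fake_logs >> "$out_file" 2>&1   # 2>&1 sends stderr to the same file as stdout
cat "$out_file"                 # both lines end up in the file
rm -f "$out_file"
```

Applied to the command above, that would be docker logs central-postgres-1 >> ~/Desktop/central_logs/central-postgres-1.txt 2>&1.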

yanokwa · August 5, 2024, 6:05am

Just to close this thread, it's not clear what went wrong here, but the problem was fixed by switching the Central install to v1.5.4, restoring the data backup, and upgrading to the latest version of Central.

We'll be adding stronger recommendations to the documentation about the need for full-system backups and regular upgrades.