I recently tried upgrading my Central instance from, I believe, v1.3 to v2024.1.0-1-g42d83f1. The good news is that the upgrade completed and I am able to get to the login page when I visit the domain.
However, when I try to log in with my credentials, I get the error message "Incorrect email address and/or password."
I have tried the good ol' stop and restart of the docker cluster, restarting the host device, allocating even more RAM and rebuilding the service, and checking whether the postgres14 upgrade successful file has been created (it has). None of these helped. Running docker logs --tail 15 central-service-1 gives this output. I don't know if that is helpful.
You can try directly (re)setting a user's password with the command line tools. You could also try creating a new user via the command line tools and then also setting their password with the tools as well. See: https://docs.getodk.org/central-command-line/#getting-to-the-tools
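A minimal sketch of what that looks like, assuming a default Docker-based install and that you run it from the directory containing Central's docker-compose.yml. The email address is a placeholder, and the run helper with its DRY_RUN switch is only an illustration: by default it prints each command for review, and setting DRY_RUN=0 executes them.

```shell
#!/bin/sh
# Placeholder address (assumption): replace with the account to recover.
EMAIL="${EMAIL:-you@example.com}"

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Reset the password of an existing account (prompts for the new password).
run docker compose exec service odk-cmd --email "$EMAIL" user-set-password

# Or create a fresh account and make it an administrator.
run docker compose exec service odk-cmd --email "$EMAIL" user-create
run docker compose exec service odk-cmd --email "$EMAIL" user-promote
```

If the service container is not staying up, these commands will fail too, which is itself a useful signal that the login error is a symptom rather than the cause.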
UPDATE
After restarting Central a couple of times, the web UI was no longer accessible from its domain. Taking clues from this section of the docs, I navigated to /var/lib/docker/volumes/ and checked the size of everything there with the command du -sh * | sort -hr to gauge which entry matched the size of my last backup.
I then made a copy of the output of CENTRAL_NEW_DB=$(docker inspect --type container central-postgres14-1 -f '{{(index .Mounts 0).Source}}' | cut -d / -f 6) and renamed the randomID file with the largest size to the value of "$CENTRAL_NEW_DB". I ran the build and up -d commands after that, but Central is still not accessible, even via 127.0.0.1 on the server hosting it.
I have attached the output of docker compose logs --tail 10 below. log_output.txt (9.5 KB)
UPDATE 2
To avoid delaying my entire team's work over this issue, I stopped the old instance, renamed the central folder and installed a fresh instance of Central. Everything went well until it was time to create a new user via odk-cmd. I again saw an error similar to the one from the previous instance (shown below).
I regret I don't have any particular advice. But once I was in a similar situation because I didn't follow the installation instructions until the end. In my case, I didn't change upstream servers' config for the SSL certificates which had changed.
It seems to me (subjective assessment here) you may have missed some step during your upgrade process.
Regarding your clean setup... have you checked there are no old containers running?
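One way to check, sketched under a few assumptions: it relies on the com.docker.compose.project.working_dir label that Docker Compose v2 attaches to the containers it creates, and EXPECTED_DIR and not_from_dir are hypothetical names used only for this example. It lists every container, running or stopped, that was not created from the install directory you expect to be live.

```shell
#!/bin/sh
# Print the names of containers whose compose working directory differs
# from the given one. Input lines: "name<TAB>status<TAB>working_dir".
not_from_dir() {
  awk -F '\t' -v dir="$1" '$3 != dir { print $1 }'
}

# Assumption: adjust to the folder of the install that should be running.
EXPECTED_DIR="${EXPECTED_DIR:-$HOME/central}"

if command -v docker >/dev/null 2>&1; then
  docker ps -a \
    --format '{{.Names}}\t{{.Status}}\t{{.Label "com.docker.compose.project.working_dir"}}' \
    | not_from_dir "$EXPECTED_DIR"
fi
```

Containers that were not started by Compose have no such label, so they show up in the list as well.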
Hi @sebtux, thanks for your response. I don't think upstream is an issue with my current setup. I've deployed Central 3 times now and it was fine with both letsencrypt and customssl. The section of the old instance's logs that I found concerning was
central-service-1 | running migrations..
central-service-1 | Error: connect ECONNREFUSED 172.19.0.6:5432
central-service-1 | checking migration success..
central-service-1 | Error: connect ECONNREFUSED 172.19.0.6:5432
central-service-1 | *** Error starting ODK! ***
central-service-1 | After attempting to automatically migrate the database, we have detected unapplied migrations, which suggests a problem with the database migration step. Please look in the console above this message for any errors and post what you find in the forum: https://forum.getodk.org/
central-service-1 | wait-for-it: waiting 15 seconds for postgres14:5432
central-service-1 | wait-for-it: timeout occurred after waiting 15 seconds for postgres14:5432
central-service-1 | generating local service configuration..
Regarding the new instance, I made sure to stop the cluster and restart the host device before commencing the deployment. So, I'm kind of certain there is no old central container still running. Also, the section of the logs above from the old instance can be observed in the logs of central-service-1 for the new deployment as well. I don't know if it has to do with some new requirement in central that my environment triggers. Hoping someone out there could help me figure this out ASAP.
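The ECONNREFUSED and wait-for-it lines mean the service could not reach the database on postgres14:5432, so the database container is the first thing to look at. A rough sketch of that check follows, using the container names that appear in this thread; count_refused is a hypothetical helper and the --tail values are arbitrary.

```shell
#!/bin/sh
# Count "connection refused" errors in whatever log text arrives on stdin.
count_refused() {
  grep -c 'ECONNREFUSED' || true
}

if command -v docker >/dev/null 2>&1; then
  # Is the database container up at all?
  docker ps -a --filter name=central-postgres14-1 --format '{{.Names}}\t{{.Status}}'
  # How often did the service fail to reach it in its recent log?
  docker logs --tail 200 central-service-1 2>&1 | count_refused
  # The database's own log may say why it is not accepting connections.
  docker logs --tail 50 central-postgres14-1
fi
```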
I sort of misinterpreted the whole "Central v2023.5, v2024.1: no upgrade notes" part of the documentation and went straight to executing this section of the docs. I did not try upgrading to any intermediate version and yes, I'm using the default postgres deployment with central.
Really sorry you’re in this state. Like @alxndrsn said, do make sure you know where your backup is and keep it safe.
It sounds like you pulled the new 2024.1 changes and then rebuilt without having gone through any of the upgrade instructions. Did you get any kind of error about the postgres14 upgrade? The more you can tell us about what actually happened in your upgrade attempt the more likely someone will have an idea.
You’re likely going to have some challenges trying to run Central twice from the same host and that may be what the migration failures are about. One thing you could do is start on a different host with a fresh install and practice restoring your backup. That will give you a chance to get familiar with the steps so you can then do it safely on your real server. Maybe before you do that @alxndrsn can remind us whether a Postgres12 backup can be restored directly to a fresh install with postgres14? (I confirmed that this should work)
Note also that if you use web links you’ll have to restore those separately so do not delete anything from your original host until you are absolutely certain that you have a fully working test server up.
I did not get any errors with the upgrade itself. I remember checking for the successful install file afterwards and it was present on the initial instance of Central after the upgrade. My troubles started when I wanted to restore the backup I had. Unfortunately, I did not take note of the error message. Is there a way to retrieve that from the old folder I renamed?
One more thing: the instance of Central 2024.1 running right now always shows the status of central-service-1 as "Up X seconds", with X always less than 10, while the others (nginx, enketo, postgres14, etc.) have been running for days.
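An uptime that never gets past a few seconds typically means the container is crashing and being restarted in a loop. A small sketch for confirming that, assuming the container name from this thread; recently_started is a hypothetical helper that filters the docker ps status column.

```shell
#!/bin/sh
# Print the names of containers that are restarting or have been up for
# less than a minute. Input lines: "name<TAB>status".
recently_started() {
  awk -F '\t' '$2 ~ /^(Restarting|Up Less than a second|Up [0-9]+ seconds?)/ { print $1 }'
}

if command -v docker >/dev/null 2>&1; then
  docker ps --format '{{.Names}}\t{{.Status}}' | recently_started
  # How many times Docker has restarted the service container so far.
  docker inspect -f '{{.RestartCount}}' central-service-1
fi
```

A restart count that keeps climbing between runs would match the migration errors in the service log.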
If you send me a larger set of logs from your postgres containers via direct message I can take a look for any obvious causes. (Click "Message" on https://forum.getodk.org/u/alxndrsn/summary).
Just to close this thread, it's not clear what went wrong here, but the problem was fixed by switching the Central install to v1.5.4, restoring the data backup, and upgrading to the latest version of Central.
We'll be adding stronger recommendations to the documentation about the need for full-system backups and regular upgrades.