Upgrade from 1.2 to 2023.2 did not migrate database corectly

dakotabenjamin · April 14, 2023, 2:54pm

1. What is the issue? Please be detailed.

We performed an upgrade following the docs. We go to the steps of upgrading the database (we use the default) and there were no errors present. During the step to migrate postgres > postgres14 (docker compose up postgres) we got the following response:

ubuntu@ip-xxxx:~/central$ docker compose up postgres
[+] Running 3/3
 ✔ Network central_default       Created                                                                                                                                                 0.1s
 ✔ Volume "central_postgres14"   Created                                                                                                                                                 0.0s
 ✔ Container central-postgres-1  Created                                                                                                                                                 0.1s
Attaching to central-postgres-1
central-postgres-1  | Fri 14 Apr 2023 02:17:53 PM GMT [upgrade-postgres.sh] Checking for existing upgrade marker file...
central-postgres-1  | Fri 14 Apr 2023 02:17:53 PM GMT [upgrade-postgres.sh] No old data found.
central-postgres-1  | Fri 14 Apr 2023 02:17:53 PM GMT [upgrade-postgres.sh] Complete.
central-postgres-1 exited with code 0

The new database was empty, meaning it appears we have lost all our data and users (we did perform a database backup and a server backup, so if we cannot continue we will revert to 1.2). Does anyone know what may have gone wrong, how how we can ensure the database upgrade works?

3. What have you tried to fix the issue?
I haven't tried anything yet for fear of losing the ability to easily update the database from 9.6 / odk version 1.2

yanokwa · April 14, 2023, 5:03pm

Did anything unusual happen doing the upgrade or did it work step-by-step exactly as written in docs?

Did the docker compose build before this step you showed have information about a database upgrade happening?

Does ls ./files/postgres14/upgrade/upgrade-successful show that the file has been created?

As far as next steps, I'd recommend reverting to the server backup so you aren't blocking work. Then, I'd restore the server backup to another machine and walk through the process again to see what happens.

dakotabenjamin · April 14, 2023, 5:50pm

All the steps we took returned usual responses- no errors that we found. We did get the `upgrade-successful file as well. Gist for docker build: https://gist.github.com/dakotabenjamin/b9f37cc5dc2f9825f474ae327d44efc7- I don't see anything wrong per se but nothing about a db upgrade happening.

Unfortunately, a separate issue has arisen as we attempt to restore the backup to another machine but we are working through it. Do you know if it's possible to revert the version back to 1.2 in-place? can we simply checkout the older version and downgrade the submodules? alternatively, can we simply restore a database dump to a fresh server without running the backup restore script?

yanokwa · May 4, 2023, 8:17pm

A post was split to a new topic: Stuck on Central upgrade

Saad · April 15, 2023, 2:19pm

I guess i read somewhere that for versions below 1.3, its better to do upgrade in 2 steps. First from 1.x to 1.5, and then 1.5 to 2023.x. when u restore your server from backup, try this.

dakotabenjamin · May 2, 2023, 9:57pm

Update:

After much effort (we had a separate issue of a borked backup using the API, luckily we took a pg_dump as well) we managed to restore the original 1.2 server. From there, we did the following steps to work towards an updated solution:

Created a fresh backup using pg_dump directory format (seemingly the same format the API creates- I haven't had good luck with that for some reason)
Setup a local version of central v 1.2.2 and restored the backup. As a note- enketo uses a debian stretch based image which fails to get apt repos properly due to repo urls changing to archive.debian.org. I had to add the following to some of the dockerfiles:

RUN echo "deb http://archive.debian.org/debian stretch main" > /etc/apt/sources.list

To restore the backup, I added the directory as a volume to the postgres container then ran pg_restore:

docker-compose exec postgres pg_restore -Fd -d postgres -U odk --clean --create /data/restore/tmp/odk-2023-05-02

I started working through the upgrade docs for every major upgrade. @Saad that is good to hear- as a precaution I worked one by one.

Once completed with that, I will take another backup of the local server, fully upgraded, and get that onto our new production server. If that doesn't work we will have to take another backup from our old restored server, and run through this whole process again on production instead of local.

All in all this has been a bit of a gruesome experience. We will definitely be more vigilant about upgrades in the future, but I also hope that the devs will consider more robustly supported upgrades/migrations. Or at least to clarify which versions won't work to be upgraded in the upgrade docs.

dakotabenjamin · May 3, 2023, 12:59am

Everything was going good until the big upgrade:

odk-central-postgres-1  | Performing Upgrade
odk-central-postgres-1  | ------------------
odk-central-postgres-1  | Analyzing all rows in the new cluster                       ok
odk-central-postgres-1  | Freezing all rows in the new cluster                        ok
odk-central-postgres-1  | Deleting files from new pg_xact                             ok
odk-central-postgres-1  | Copying old pg_clog to new server                           ok
odk-central-postgres-1  | Setting oldest XID for new cluster                          ok
odk-central-postgres-1  | Setting next transaction ID and epoch for new cluster       ok
odk-central-postgres-1  | Deleting files from new pg_multixact/offsets                ok
odk-central-postgres-1  | Copying old pg_multixact/offsets to new server              ok
odk-central-postgres-1  | Deleting files from new pg_multixact/members                ok
odk-central-postgres-1  | Copying old pg_multixact/members to new server              ok
odk-central-postgres-1  | Setting next multixact ID and offset for new cluster        ok
odk-central-postgres-1  | Resetting WAL archives                                      ok
odk-central-postgres-1  | Setting frozenxid and minmxid counters in new cluster       ok
odk-central-postgres-1  | Restoring global objects in the new cluster                 ok
odk-central-postgres-1  | Restoring database schemas in the new cluster
odk-central-postgres-1  |                                                             ok
odk-central-postgres-1  | Copying user relation files
odk-central-postgres-1  | error while copying relation "pg_toast.pg_toast_17363": could not write file "/var/lib/postgresql/14/data/base/16401/17366.5": No space left on device

I was playing it pretty close, but the sudo ./files/postgres14/upgrade/check-available-space command returned successful so I thought I was in the clear. Any suggestions? Can I make some space and then rerun docker compose up postgres?

yanokwa · May 4, 2023, 8:14pm

I'm sorry you've had an unpleasant experience self-hosting, @dakotabenjamin, and I appreciate you documenting what worked and what did not so others can benefit.

In general, you can skip upgrades and directly install the latest version as long as you follow all relevant upgrade instructions (e.g., upgrading Docker). That said, we recommend you run the latest version of Central because it tends to be the safest. That's what we do on ODK Cloud.

To answer your immediate question, yes, you can make some space and re-run docker compose up postgres. How close were you playing it? My guess is that between when you ran check-available-space and did the migration, some other process wrote to disk and that pushed you over the limit. I've filed an issue at https://github.com/getodk/central/issues/424 to add some buffer to our estimate.

We spent a lot of time planning and testing this migration. The code was reviewed by multiple people on the team and went through several QA passes. We also made this release infrastructure only so users wouldn't rush to upgrade to get new features.

It's been a smooth migration for most, but the reality is that there are edge cases we can't predict. I know that's cold comfort, but please know we've heard your feedback and we'll work to catch more of those edge cases as Central evolves. It might also be time for us to introduce a Central beta program.

If you're still stuck after you increase disk space space, shoot me an email at yanokwa@getodk.org and I'm glad to take a quick look at your install and see if I can help get the upgrade to completion.

yanokwa · July 21, 2023, 5:10pm

A post was split to a new topic: Upgrade from 1.5.2 to latest failed