ODK Central with custom database with SSL

Thanks for the thorough reply, Matt!

In attempts 4 and 5, I used only one option. In my follow-up, I used both.

I read my errors as meaning that I can't append parameters to the db name at all.

My IT crew pointed me to a PEM certificate at https://docs.microsoft.com/en-us/azure/postgresql/concepts-ssl-connection-security which I'll try next. Here's hoping that Slonik supports the config setting!

1 Like

Yeah, I'm now unsure whether that approach is possible. In particular, I think the UnhandledPromiseRejectionWarning comes from this. That approach is definitely a workaround, so it'd be better anyway for us to find an alternative.

I'm guessing that Slonik does, but right now Central doesn't have a way to pass those options to Slonik.

The Central team will discuss further over the coming days. I'll post an update here once we know more. In the meantime, let us know what else you learn/try!

1 Like

Solution: If you use an external, hosted database that requires TLS/SSL, you must provide a root certificate.
Reference for Postgres hosted on Azure: https://docs.microsoft.com/en-us/azure/postgresql/concepts-ssl-connection-security
ODK Central uses slonik, which in turn uses node-postgres. SSL options: https://node-postgres.com/features/ssl

It appears that ca is the correct option; the node-postgres and Azure docs are unclear about which option does what, and they are not as beginner-friendly as the ODK docs.
Migrations, still using knex, meanwhile use the option rejectUnauthorized: false.
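
For orientation, here is a minimal node-postgres sketch of what those two options mean (placeholder values only; this is not the Central config, which is shown in the steps below):

const fs = require('fs');
const { Pool } = require('pg');  // node-postgres

const pool = new Pool({
  host: 'DBHOST',
  user: 'DBUSER',
  password: 'DBPASS',
  database: 'DBNAME',
  ssl: {
    // Verify the server against the downloaded root certificate:
    ca: fs.readFileSync('/home/USERNAME/DigiCertGlobalRootG2.crt.pem').toString(),
    // Or, less strictly, encrypt but skip certificate verification:
    // rejectUnauthorized: false,
  },
});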

  1. Download the Microsoft PEM certificate linked from https://docs.microsoft.com/en-us/azure/postgresql/concepts-ssl-connection-security into my home directory on the server running Central (e.g. using wget).
cd ~
wget https://cacerts.digicert.com/DigiCertGlobalRootG2.crt.pem
  2. vim ~/central/files/service/config.json.template (comments inserted for clarity, not in production config)
{
  "default": {
    "database": {
      "host": "DBHOST",
      "user": "DBUSER",
      "password": "DBPASS",
      "database": "DBNAME",
      "ssl": {
        "rejectUnauthorized": false, # this is for migrations using knex
        ca: fs.readFileSync('/home/USERNAME/DigiCertGlobalRootG2.crt.pem').toString() # this is for ops using slonik
      },
    },

... # other config, e.g. custom email
2 Likes

Glad you have a solution! It sounds like everything is working? Where did you place this configuration?

@Matthew_White sorry, meant to reply sooner. I've edited the post marked as Solution with further details for clarity.

1 Like

Thanks, @Florian_May!

It's interesting, I actually don't think the ca option is being passed to Slonik. We construct a connection string to pass to Slonik, and we don't use the ca option when constructing that connection string. The ca option would be passed to Knex though.
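
To illustrate (a rough sketch with placeholder values, not Central's actual code): if the connection string is built only from the basic fields, anything under ssl in the config never reaches Slonik.

const db = { host: 'DBHOST', user: 'DBUSER', password: 'DBPASS', database: 'DBNAME',
             ssl: { ca: '...' } };
// Only host/user/password/database end up in the string; db.ssl is never consulted.
const connectionString =
  `postgres://${encodeURIComponent(db.user)}:${encodeURIComponent(db.password)}` +
  `@${db.host}/${db.database}`;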

I'm also surprised that you're able to call fs.readFileSync(), since that would involve executing JavaScript rather than parsing static JSON values. We use the config package to parse the config file, and I believe that config parses .json files as static JSON. (config supports loading dynamic .js files, but we don't use that functionality.)
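
For what it's worth, the distinction looks roughly like this (a generic illustration of the config package's behaviour, not Central's file layout):

// A dynamic config file (e.g. config/local.js) is executed as JavaScript, so this works:
const fs = require('fs');
module.exports = {
  database: {
    ssl: { ca: fs.readFileSync('/home/USERNAME/DigiCertGlobalRootG2.crt.pem', 'utf8') },
  },
};
// A static .json file, by contrast, is parsed as plain JSON: function calls are never
// evaluated, and non-JSON syntax (comments, unquoted keys) is a parse error.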

In any case, this has been a useful discussion, and we have a clearer sense of the changes we'll need to make (probably for v1.4). Hopefully all this will be simpler before long!

1 Like

I'm upgrading my prod server from 1.1 to the latest version (1.2) and am running into trouble.
Specifying the SSL cert as ca: fs.readFileSync('/home/USERNAME/DigiCertGlobalRootG2.crt.pem').toString() throws parser errors in the Central service. I've tried every permutation of trailing commas / quotes / comments / indents, but haven't found a way to provide the SSL root certificate via the config. This is blocking my upgrade path to the ODK Central versions that use slonik.

I'm also surprised that you're able to call fs.readFileSync()

On my UAT server, this somehow worked.
On my production server, I'm getting parsing errors indicating that this is not valid JSON. The node-postgres docs on SSL do indeed show the SSL parameters being set from JS. I agree this shouldn't have worked on my UAT server, but since I can still access my old DB there, I absolutely won't touch that server until I've rescued all the data from that old DB.

Would anyone be able to take a look at this? This post boils down to one question: how do I provide, via the ODK Central config, a root certificate for an external database that requires TLS/SSL?

Unfortunately, as I understand it so far, this isn't possible at the moment. I think we could find a way to pass the ca option through to Knex. But unlike Knex, Slonik doesn't directly accept node-postgres options and will only connect to the database via a connection string. We construct that connection string ourselves, but when we do so, we ignore all SSL options. That makes me think that a code change will be required in order to pass SSL options through to Slonik. I'll discuss what that would involve with the rest of the Central team and report back here.

However, one point of confusion I have is why you didn't have to specify ca when you were running v1.1. If your database required that option, wouldn't you have needed to specify that for Knex in v1.1?

When you were running v1.1, what did your database config look like? Did you specify anything under "ssl" other than "rejectUnauthorized": false? I think not, but just wanted to check.

1 Like

Cheers! I'll report back here with versions and configs in a few hours. An external DB with TLS/SSL is a hard requirement for our record keeping, so it would be great if it were possible past v1.1.

1 Like

It's not ready for production yet, but I'm working on a way to pass a subset of SSL options through to Slonik. I'm hoping that if we can just pass "rejectUnauthorized": false to both Slonik and Knex, that'll match what worked for you in v1.1 and will allow you to connect in v1.2.

1 Like

Oh that's fantastic, thanks! That would help me greatly.

FYI here are my configs:

Server 1

Test server; works, but performance is slow, and exports of submissions and media to ZIP time out.

versions:
e16b79530756bbd825f68275b3b6e53951409474
 70ec99f02885a06e37709db6319bfdf96fac84eb client (v1.2.2)
 59556ecea91f0a25678cd6da0084adb7e66ca099 server (v1.2.1)

Database: TLS/SSL enabled, Azure hosted Postgres, upgraded from ODK Central 0.6
Config (sensitive credentials replaced by allcaps):

{
  "default": {
    "database": {
      "host": "DBHOST",
      "user": "DBUSER",
      "password": "DBPASS",
      "database": "DBNAME",
      ssl: {
        rejectUnauthorized: false,
        ca: fs.readFileSync('/home/USER/DigiCertGlobalRootG2.crt.pem').toString(),
        # key: fs.readFileSync('/home/USER/DigiCertGlobalRootG2.crt.pem').toString(),
        # cert: fs.readFileSync('/home/USER/DigiCertGlobalRootG2.crt.pem').toString(),
      },
    },

Note the missing quotes, trailing commas, and use of JS functions inside a JSON config.
However, the connection to the db works.

Server 2

Prod server; drafts are broken after the DB was migrated to v1.2 and back to v1.1, but the DB connects.
Versions:

versions:
86d3ebc27ca8e3feb1b6a107da7f8f4fbb21f3f1
 cddb691e40e84aabff87b9d427e22a50282d6f99 client (v1.1.2)
 a33bc6fb3c34fe38894b0e9d0bb404f81da325e6 server (v1.1.1)

Database: TLS/SSL enabled, Azure hosted Postgres, upgraded from ODK Central 0.6. Lives on same host as test db.
Config:

{
  "default": {
    "database": {
      "host": "DBHOST",
      "user": "DBUSER",
      "password": "DBPASS",
      "database": "DBNAME",
      "ssl": {
        "rejectUnauthorized": false
      }
    },

Note the absence of trailing commas, the full quotation of keys, and the absence of comments and JS functions. Any of these threw parser errors, crashing the service container.

Use cases

Quick write-up for core team discussion. Feature request: support external DB with TLS/SSL enabled.

Our IT and record-keeping policies mandate the use of our hosted DBs, which come with failsafes and maintained backups. Production data in Docker containers is deemed too flimsy, and storing the submissions with media attachments locally would blow out our server size within a field season. The DBs have TLS enabled for additional security. It's great to have a stateless server on one side, and all state and growth on the other side in an external, secure DB!

We could accept ignoring the TLS/SSL via "rejectUnauthorized: false", but would prefer to use the readily available certificate. I've noticed a certificates folder in the build process; could it be an option to specify an internal path to a DB SSL certificate and copy the certificate file into the source prior to docker-compose build?

Thanks, @Florian_May! I have additional thoughts that I'll write up over the next couple of days, but I wanted to give you an update on the PR. We're double-checking some things related to the PR, but I think it's ready to try on your test server.

With the PR in place, you'll be able to specify true for ssl:

{
  "default": {
    "database": {
      "host": "DBHOST",
      "user": "DBUSER",
      "password": "DBPASS",
      "database": "DBNAME",
      "ssl": true
    },

Doing so will automatically specify false for rejectUnauthorized. The above configuration should work the same as how this configuration would have worked in v1.1:

{
  "default": {
    "database": {
      "host": "DBHOST",
      "user": "DBUSER",
      "password": "DBPASS",
      "database": "DBNAME",
      "ssl": {
        "rejectUnauthorized": false
      }
    },

We're taking this approach because it matches how Slonik works. Whenever SSL is specified, Slonik seems to specify false for rejectUnauthorized. I've made a note of that behavior on the Slonik repo (here) because it'd be nice to have the option to specify true. However, given that that's how configuring Slonik works today, I think we should match it.
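
As a rough sketch of that mapping (hypothetical, not the PR's actual code): "ssl": true in the config ends up meaning "encrypt the connection, but don't verify the server certificate":

// Hypothetical sketch, not the PR's code: translate the config value into
// node-postgres-style SSL options.
const configSsl = true;                      // "ssl": true from config.json
const sslOptions = configSsl === true
  ? { rejectUnauthorized: false }            // encrypt, but skip certificate verification
  : configSsl;                               // pass any object through unchanged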

To deploy the PR on your test server:

  • Log into the server and navigate to the project folder (cd central).
  • cd server
  • git fetch
  • git checkout origin/db-connect
  • cd ..
  • Follow the regular upgrade instructions, starting at docker-compose build.

Let us know if this works!

1 Like

Thanks Matt! I'll try this in about 6 hours and will report back here. The super quick turnaround on this is much appreciated!

Edit: tried it; SSL is looking good, but there are a few bugs

Steps

Compressed steps to upgrade

  • Central to latest origin/master
  • central-frontend to latest origin/master (was at tag 1.2.2 but there are newer commits)
  • central-backend to branch origin/db-connect
~/central/$ git fetch --all && git pull
~/central/client$ git pull origin master
~/central/server$ git fetch --all && git checkout origin/db-connect
~/central/$ docker-compose build && docker-compose stop && docker-compose up -d
~/central/$ docker-compose logs -f --tail=200 service nginx

Versions:

versions:
e16b79530756bbd825f68275b3b6e53951409474(latest master)
+9d97de6397317dc4ce046d60cbb2ccaf4e967f40 client (v1.2.2-14-g9d97de63) (latest master)
+93732a496bc9daa40cdbb36247c908649a96f10a server (v1.2.1-4-g93732a4) (latest db-connect)

Config (verified through git diff showing only my changes):
files/service/config.json.template - credentials replaced with allcaps.

{
   "default": {
     "database": {
-      "host": "postgres",
-      "user": "odk",
-      "password": "odk",
-      "database": "odk"
+      "host": "DBHOST",
+      "user": "DBUSER",
+      "password": "DBPASS",
+      "database": "DBNAME",
+      "ssl": true
     },
     "email": {
       "serviceAccount": "no-reply@${DOMAIN}",
       "transport": "smtp",
       "transportOpts": {
-        "host": "mail",
-        "port": 25
+        "host": "mail-relay.lan.fyi",
+        "port": 587
       }
     },
     "xlsform": {

Logs

Starting up looks good!
~/central$ docker-compose logs -f --tail=200 service nginx

Attaching to central_nginx_1, central_service_1
nginx_1               | writing a new nginx configuration file..
nginx_1               | starting nginx without certbot..
service_1             | wait-for-it.sh: waiting 15 seconds for postgres:5432
service_1             | wait-for-it.sh: postgres:5432 is available after 0 seconds
service_1             | generating local service configuration..
service_1             | running migrations..
service_1             | starting cron..
service_1             | using 4 worker(s) based on available memory (8153436)..
service_1             | starting server.

I can see that migrations (using knex) have run or at least didn't fail, and there's no trace of SSL errors in the logs.
(Mind that postgres:5432 is the local DB, which I'm not using; I use the hosted DB with TLS/SSL.)

I was able to demote an admin via the GUI and promote them again via the command line, with no SSL issues in the logs. I believe this process used slonik.

docker-compose exec service odk-cmd --email EMAIL user-promote
'{"success":true}'

Bugs and issues

I'm running the latest master for central and the front-end, and the latest origin/db-connect for the backend. I'm not sure whether these versions are supposed to mesh together. The following bugs are likely the same ones I'd get with the latest tags (1.2.1/1.2.2) through the standard install steps.

Authentication bug "login not persistent"

The site loads, but authentication is shifty: reloading the site requires me to log in again. This also happens in a private browser window, so it's not a stale cookie.
After some to and fro of logging in again and again, a site reload eventually remembers my login, for both normal and forced reloads. This is puzzling, but not a blocker, as not many users access the server directly.

Authentication bug "export not permitted to Administrator"

Exporting to ZIP fails with a 403.1 "The authentication you provided does not have rights to perform that action.", although my account is an admin account.
This error also occurs on forms with no submissions (export without media).
However, download of smaller datasets to ZIP (or larger ones but without media files) seems to work fine in a private browser tab.
Puzzling.

Bug "export stalls for more than 10 seconds causing timeout"

Exporting larger forms with attachments times out after 10 seconds: Central takes too long to start streaming, and our cache cuts the connection off. Likely a config issue on our end.
The server has 13G of free disk space, 8G of RAM (2G used), and 4G of swap; CPU usage is low. A hardware bottleneck seems unlikely, but could there be a lag in the software somewhere?

Bug "new form inaccessible for a few minutes"

Creating a new form works and lets me upload a form def, but the form doesn't immediately show up in the form list.

Bug "form draft pages inaccessible for a few minutes"

Creating a new form draft immediately redirects me to the index page, later processes the event, and much later shows the draft.

# I hit "create new draft"
 172.30.1.9 - - [12/Aug/2021:02:45:04 +0000] "POST /v1/projects/2/forms/build_Field-observation-form1-4_1571300489/draft HTTP/1.1" 200 16 "https://odkc-uat.dbca.wa.gov.au/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
nginx_1               | 172.30.1.10 - - [12/Aug/2021:02:45:05 +0000] "GET /v1/projects/2/forms/build_Field-observation-form1-4_1571300489/versions HTTP/1.1" 200 624 "https://odkc-uat.dbca.wa.gov.au/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
service_1             | ::ffff:172.19.0.9 - - [12/Aug/2021:02:45:05 +0000] "GET /v1/projects/2/forms/build_Field-observation-form1-4_1571300489/versions HTTP/1.0" 200 624
# I get bumped to the index page

# a few seconds later this happens
service_1             | [2021-08-12T02:45:11.852Z] start processing event form.update.draft.set::2021-08-12T02:45:04.884Z::95e60ac8-991c-4f97-aecd-4650953602b2 (1 jobs)
service_1             | [2021-08-12T02:45:11.915Z] finish processing event form.update.draft.set::2021-08-12T02:45:04.884Z::95e60ac8-991c-4f97-aecd-4650953602b2

# Minutes later, I can visit the form overview and see the new draft. The "upload new form def" works.

Performance issues - seem fixed

Exporting submissions and media via ruODK was super slow, but seems to work at normal speeds now.

@Matthew_White am I running the right versions together?
What would you like me to do with the bugs I found?

Awesome, thanks for trying out the PR!

I actually recommend staying at v1.2.2. In general, commits on the master branch of central-frontend that are after a release tag probably haven't gone through QA and may or may not even match the master branch of central-backend. The db-connect branch of central-backend branches off the v1.2.1 tag, so the only way it differs from the latest release is the changes related to the database connection.

Excellent! It sounds like the server is able to connect to the database via both Slonik and Knex. :tada:

As part of v1.2, we changed how some things work around login and logout. One change is that it shouldn't be possible to log in twice in two different tabs. If you log in in one tab, then try to log in in a second tab, the second tab should show the message, "A user is already logged in. Please refresh the page to continue." It sounds like you didn't see that message, right? Could you have been using an older version of the frontend, from before v1.2? This shouldn't be needed, but you could also try clearing your browser cache.

Also, what happens if you use your browser to clear the cookies and site data for the frontend? You could also check that the browser isn't set to block cookies or site data for the frontend.

This issue doesn't feel related to the db-connect PR. It could be related to another part of v1.2. If it ends up not being resolved, I recommend creating a separate forum topic.

That does sound puzzling! This sounds like an issue with your cookie, since that's what's used to authenticate most file downloads. The cookie is removed whenever you log out, including when you log out in a different tab. Maybe something related to the first issue above led to the cookie being removed? If you end up creating a forum topic for the issue above, and this issue doesn't end up being resolved, that topic might be a good place to discuss this issue as well.

We've tried to make that part of the code high-performance, but there's also a fair amount of complexity there. 10 seconds seems pretty short for a timeout!

This one seems quite unusual to me! Some tasks are run after a form is created, and those tasks usually take a few seconds. However, even if those tasks take a while or even fail, the form should be immediately visible in the form list. Could something be caching the API response for the form list?

This feels related to the previous issue. The fact that you don't see the new draft for minutes and the fact that you're immediately redirected are probably related. It sounds like the API is telling the frontend that a draft doesn't exist. When you create a new draft, you're redirected to the Draft Status page. If the frontend tries to render that page, then is told that a draft doesn't actually exist (a contradiction), it will redirect to the index page.

The service log indicates that a task was started and finished for the draft soon after it was created. That should mean that the draft was successfully committed to the database. So I'm again thinking that there's some issue communicating that change to the frontend.

It's not clear that this is related to the db-connect PR, so I'd recommend creating a separate forum topic to discuss this issue and the previous issue together.

Awesome, this is starting to make sense.

I've deleted my cookies, and site reload now keeps me logged in, and allows me to export data.

I've seen the words "varnish cache" in the timeout message, which indicates that our IT have some kind of cache in front of Central. I'll kindly ask them to disable it and increase the timeout. I expect this could also fix the auth issues. I'll create new topics for any leftover issues.

Edit: our IT disabled the newly installed cache server for my ODK Central installs.
They advise setting a cache-control: private header on the software side (ODK Central) to enforce no caching. That seems like a sensible default for Central, seeing that it produces unexpected behaviour with caching enabled. (Crusty Django SSR dev speaking here...)
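
For illustration, a hedged sketch of what that could look like in a Node/Express backend (hypothetical middleware, not Central's actual code; an equivalent header could also be added in the nginx config):

const express = require('express');
const app = express();

// Mark API responses as private and non-cacheable so shared caches (e.g. Varnish)
// pass requests straight through to Central.
app.use('/v1', (req, res, next) => {
  res.set('Cache-Control', 'private, no-cache');
  next();
});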

Since the SSL issue seems fixed, could it be merged?

1 Like

Hello all, I just wanted to know: can we connect a single Central Backend instance to two Central Frontends? And where do we configure the APIs in the frontend?
Thanks in advance.

@Florian_May, that PR is now merged! I think you can go ahead and deploy that change to your production server. Since you downgraded your production server to v1.1, you might also need to redo the special instructions for upgrading to v1.2 (not sure).

Even though the PR is merged into the master branch, you should not use the master branch on your server, as the branch contains commits that have not gone through QA. Instead, you should use origin/db-connect. If your server is on the commit 93732a496bc9daa40cdbb36247c908649a96f10a, you should be all set.

I think we'd be open to a PR or a detailed issue along those lines. I'm thinking that this header would be needed for both backend responses and frontend resources? If so, I'm guessing that the header would probably go in the nginx configuration in the central repository. But I also know that there are a few different caching-related headers; not sure whether any of those would need to be changed as well.

This seems farther off the beaten path compared to the standard Central configuration, but I think we'd be open to a PR along these lines. As a first step, if you want to create a detailed issue in the central repository, that'd be a good way for us to think more about the approach.

I'll note that Slonik and Knex seem to accept certificate options in different ways: Slonik wants a filename, while Knex wants the file contents. And like you said, in order for the backend to be able to access the file, the Docker setup would probably need to change. So both the central and central-backend repositories would need to change.
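
A hedged illustration of that difference (the parameter names here are assumptions, not Central's code):

const fs = require('fs');
const caPath = '/home/USERNAME/DigiCertGlobalRootG2.crt.pem';

// Slonik side: the certificate is referenced by *path*, e.g. via a libpq-style
// connection-URI parameter (whether Slonik honours sslrootcert is an assumption here):
const slonikUri =
  `postgres://DBUSER:DBPASS@DBHOST/DBNAME?sslrootcert=${encodeURIComponent(caPath)}`;

// Knex side: the certificate file *contents* go into the connection object:
const knexConnection = {
  host: 'DBHOST', user: 'DBUSER', password: 'DBPASS', database: 'DBNAME',
  ssl: { ca: fs.readFileSync(caPath).toString() },
};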

But I'm also wondering, given that Slonik will always specify false for rejectUnauthorized when SSL is specified, is there much benefit to specifying the certificate at this point? Maybe Slonik would need to change first?

In any case, I'm glad your server can connect to the database now! :rocket:

Thanks, @Matthew_White!

My prod server is already happily running the db-connect branch. I had to upgrade as soon as UAT showed that it works, as the DB (already migrated to 1.2) combined with the 1.1 frontend gave me trouble.

PROD: tagged frontend with db-connect backend.

versions:
e16b79530756bbd825f68275b3b6e53951409474
 70ec99f02885a06e37709db6319bfdf96fac84eb client (v1.2.2)
+93732a496bc9daa40cdbb36247c908649a96f10a server (v1.2.1-4-g93732a4)

UAT: latest master frontend, db-connect backend.

versions:
e16b79530756bbd825f68275b3b6e53951409474
+9d97de6397317dc4ce046d60cbb2ccaf4e967f40 client (v1.2.2-14-g9d97de63)
+93732a496bc9daa40cdbb36247c908649a96f10a server (v1.2.1-4-g93732a4)

re caching headers
Disabling caching for the backend API could work, but I'm unclear on how caching in VueJS works.
Let me talk to my IT folks and see which headers should go where. If I find a robust suggestion, I'll send a PR.

re SSL certificate
Since our DB and VMs running Central are well protected from the outside world, we're good with ignoring SSL. I just thought it might be safer to specify a certificate, rather than to ignore a safety feature.
Out of interest: I see Central uses two different ORM libraries (knex and slonik), which makes things like configuration options harder. Could Central migrate to just one?

1 Like

We've mostly moved from Knex to Slonik and generally like Slonik. However, Slonik doesn't have its own migration mechanism, so we've continued to use Knex for that. Mostly that's been fine, though like you said, configuration has turned out to be a challenge. I do think it'd be a significant effort to move away from Knex though.

1 Like

As an update, this functionality is now available as part of v1.3. Thank you for all your help troubleshooting this issue!

1 Like