ODK Central returns 500 if encrypted submission export is called multiple times with an incorrect passphrase

1. What is the problem? Be very detailed.
Originally reported by @Thalie at https://github.com/ropensci/ruODK/issues/30#issuecomment-831171343

2. What app or server are you using and on what device and operating system? Include version numbers.
ODK Central versions - @Thalie?

3. What have you tried to fix the problem?
Currently triaging.
ruODK tests run against a production server, so I'm not immensely keen to crash it on purpose.

4. What steps can we take to reproduce the problem?
Thalie reports this behaviour when ruODK::submission_export is called with an incorrect passphrase against an encrypted form.

These are the unit tests for ruODK::submission_export with encrypted forms: https://github.com/ropensci/ruODK/blob/main/tests/testthat/test-submission_export.R#L104
I do not have tests that use an incorrect passphrase and then crash the server.
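
A minimal repro sketch (not part of the ruODK test suite). The sandbox server, project, and form are placeholders, and the argument names (svc, un, pw, pp, local_dir) follow my reading of the ruODK documentation, so treat them as assumptions:

  library(ruODK)

  # Point ruODK at a disposable sandbox form, never at production.
  ru_setup(
    svc = "https://sandbox.example.org/v1/projects/14/forms/my_encrypted_form.svc",
    un  = Sys.getenv("ODKC_UN"),
    pw  = Sys.getenv("ODKC_PW"),
    pp  = "deliberately wrong passphrase"  # incorrect on purpose
  )

  # Thalie needed 3-4 calls before her server became unresponsive, so repeat a few times.
  for (i in 1:4) {
    try(submission_export(local_dir = tempdir(), overwrite = TRUE))
  }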

5. Anything else we should know or have? If you have a test form or screenshots or logs, attach below.
I'd expect:

  • ODK Central not to crash
  • ODK Central to return an informative HTTP status and error message when the passphrase is incorrect, unless this is a security risk (brute-force attacks?)

@Thalie:

  • What ODK Central version are you using?
  • What are the server logs at the time of the crashes?

Thanks @Florian_May

DigitalOcean 4 GB Memory
client v1.1.2
server v1.1.0-7-ga33bc6f (not sure why this is not v1.1.1 like the other ODK Central instances I am using, as I thought I had applied all required updates; I may have to look into this in more detail...)

I have just checked the query outside of ruODK and get
code 400.12 - Could not perform decryption. Double check your passphrase and your data and try again.
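
Roughly, the query looked like the httr sketch below (placeholders for server, project, form, and key ID; the POST body format, key ID mapped to passphrase, is from my reading of the Central API docs and should be double-checked there):

  library(httr)

  res <- POST(
    "https://sandbox.example.org/v1/projects/14/forms/my_encrypted_form/submissions.csv.zip",
    authenticate(Sys.getenv("ODKC_UN"), Sys.getenv("ODKC_PW")),
    body = list(`1` = "wrong passphrase"),  # "1" is the encryption key ID (assumed format)
    encode = "json"
  )
  status_code(res)  # 400 here (the 400.12 code is in the JSON error body)
  content(res)      # "Could not perform decryption. Double check your passphrase ..."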

I was not very precise when I said "crash": the web interface is still online, but you can no longer visualise content (e.g., projects), the backend becomes unresponsive, requests no longer work, and the services need to be restarted.

The behaviour is not absolutely reproducible (I had to run the request 3-4 times to make the server unresponsive), so I would suspect a timeout due to exhausted memory resources? (Sorry, servers are really outside my field.)

Addressed in ruODK: https://docs.ropensci.org/ruODK/news/index.html#major-fixes-1
ruODK now terminates submission_export requests immediately on HTTP 500, which is returned by ODK Central on incorrect passphrases.
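
The client-side pattern is roughly the following (a sketch only, not ruODK's actual implementation; URL and credentials are placeholders, and the passphrase body is omitted for brevity): retry transient failures, but give up immediately once the server answers with the decryption error.

  library(httr)

  res <- RETRY(
    "POST",
    "https://sandbox.example.org/v1/projects/14/forms/my_encrypted_form/submissions.csv.zip",
    authenticate(Sys.getenv("ODKC_UN"), Sys.getenv("ODKC_PW")),
    times = 3,
    terminate_on = c(400, 500)  # never re-send the export after a failed decryption
  )
  if (http_error(res)) stop(http_status(res)$message)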

Suggestions for ODK Central:

  • Is it possible to stop processing the request early if the passphrase is incorrect, to preserve server memory? Thalie's log indicates the DB connection pool was exhausted.
  • What is the most informative HTTP status to return - 500, or something in the 400 range as @Thalie suggested? My ODK Central 1.1 returns HTTP 500, while Thalie's Central returns 400.12 (nice - why am I not getting a 400.12?).

@LN even a single request with an incorrect passphrase can crash my ODK Central server (v1.1) when sent from sufficiently many different test environments.

I'd like to file this here (for visibility) as a bug report against ODK Central's backend: prevent submission_export requests against encrypted forms with an incorrect passphrase from consuming too much server memory, and return an appropriate HTTP status (if that's not already done).

Thanks for the detailed report.

The 400.12 is what's returned by Central, which I think is good and expected. I think you should probably terminate requests on that as well, since it's not recoverable. The 500 is likely because you got to the point of hitting the KnexTimeoutError that @Thalie shows in her screenshot.

Chatted briefly with @issa about this, and it does seem like there's a database connection leak, even though the Central code does seem to free connections when it should. It doesn't appear to be related to memory. The first thing will be to see whether it's reproducible on v1.2, where the database layer has been switched out (in part because you experienced an issue we thought was fixed by @issa's patch to the database library).


Thanks, that's good enough for me for now.
I'll repro with Central 1.2 and take it from there.
Looking forward to the release!


Apologies for jumping in on the thread here...

One of our servers running 1.1.2 is failing intermittently with Knex timeout errors. Service docker log:

mservice  | KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
mservice  |    at Client_PG.acquireConnection (/usr/odk/node_modules/knex/lib/client.js:348:26) {
mservice  |  name: 'KnexTimeoutError',
mservice  |  sql: undefined,
mservice  |  bindings: undefined