Struggles in keeping the database to a reasonable size (AKA deleting submissions, deleting/clearing submissions attachments)

1. What is the issue? Please be detailed.

Since a while now I’m struggling to find a way to keep the database from growing to sizes that are hardly manageable. I have postponed any action for a while, but now the DB size is about 120GB and I really need to find a solution.

Among the other options I’m exploring the possibility to move the attachments to an S3 compatible storage

but this solution is not free of doubts -that I have outlined in the post linked here above- and anyway it would add a layer of complexity that I would prefer to avoid, unless really necessary.

Since a while I have then started to look into deleting submissions or deleting submissions attachments. There are 2 threads that I have carefully read

and where I also left some comments and already received answers, but to be honest I’m still not 100% clarified in this matter so I’m looking for some final answer/confirmation (or not) and this will hopefully help others facing the same problem.

node:internal/modules/cjs/loader:1386
throw err;
^

Error: Cannot find module '/usr/odk/lib/bin/purge-forms.js'
at Function._resolveFilename (node:internal/modules/cjs/loader:1383:15)
at defaultResolveImpl (node:internal/modules/cjs/loader:1025:19)
at resolveForCJSWithHooks (node:internal/modules/cjs/loader:1030:22)
at Function._load (node:internal/modules/cjs/loader:1192:37)
at TracingChannel.traceSync (node:diagnostics_channel:328:14)
at wrapModuleLoad (node:internal/modules/cjs/loader:237:24)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:171:5)
at node:internal/main/run_main_module:36:49 {
code: 'MODULE_NOT_FOUND',
requireStack: 
}

Node.js v22.21.0
  • In the threads I liked above there are also valuable comments by other users and by @yanokwa but there are also statements I really struggle to understand. For example ODK Central - How to delete a submission - #34 by yanokwa it ends with “So in summary, if you delete data in Central, it eventually frees up space for re-use inside the database, but it may not free up space for re-use by the operating system” but I really can’t understand what this means. On a “normal” server the space that a DB can use to store data is the free space on the disk where the PostgreSQL data folder is in… so what is the difference?

Sorry for the long post and thanks in advance

Specifically in response to:

On a “normal” server the space that a DB can use to store data is the free space on the disk where the PostgreSQL data folder is in…

Yes, that space, but also, the space marked as free, internally by PostgreSQL, inside its data files. So for instance, you might delete a bunch of rows, and if enough and perhaps consecutively stored, this eventually frees up a filesystem page (usually 4K or 8K) somewhere halfway one of PostgreSQL’s backing files. PostgreSQL will mark that space (a “hole” in the file, so to speak") as free to reuse for itself, but doesn’t give it back to the operating system (although it could, on some operating systems and some file systems, with (if I recall correctly) some “hole punching” flag on a fallocate system call).

IIRC a vacuum full would return that space to the filesystem because it’ll rewrite the backing files without holes.

1 Like

thanks for the explanation.

Hi @manghig, I'm replying to a variety of your posts and comments, partly trying to make sure other forum threads don't lead others to confusing outcomes later.

As you saw in my other response Deleting media from Central submissions - does it reduce database size - #7 by ktuite "clearing" a submission attachment and "deleting" a submission attachment are the same concept.

I think it should be possible in your case to reduce your database size by clearing submission attachments but not deleting the entire submission. You've identified that the deleting in Central usually just means "soft-deleting" and you also need to either wait for or manually trigger the purging process. The problem you're having with the purge script is that it is no longer called purge-forms.js, just purge.js. This is the kind of small change leading to big problems that I would personally find very frustrating, so I'm sorry if we've caused that for you! I just updated another thread ODK Central - How to delete a submission - #40 by ktuite to correct this there, too.

I'm now very invested if this works for you:

  1. clear submission attachments (see blobs still exist in database)
  2. run purge script
  3. see that there are now fewer blobs in database taking up less space
  4. vacuum or whatever needs to happen to reclaim that space

I also think the daily purge script should have deleted/removed any blobs for attachments you previously cleared, without the 30 day wait.

Edited to add: Do your submissions have multiple versions / have they been edited? We just realized that if that is the case, clearing the submission attachments will only work on the latest version of the submission, in which case the underlying blob wont be truly unlinked in the database and wont be cleaned up by the purge script. :frowning: :frowning: :frowning: I've put this in an issue. Maybe we need a more powerful attachment delete.

1 Like

@ktuite yes! with the command you provided here ODK Central - How to delete a submission - #40 by ktuite I can now use a relatively simple workflow that allows to clear attachments, purge them and then recover disk space.

Thanks for the heads-up. Fortunately in my case edited submissions are only a few. But good to know you have identified this issue.

Thanks!

1 Like