Since a while now I’m struggling to find a way to keep the database from growing to sizes that are hardly manageable. I have postponed any action for a while, but now the DB size is about 120GB and I really need to find a solution.
Among the other options I’m exploring the possibility to move the attachments to an S3 compatible storage
but this solution is not free of doubts -that I have outlined in the post linked here above- and anyway it would add a layer of complexity that I would prefer to avoid, unless really necessary.
Since a while I have then started to look into deleting submissions or deleting submissions attachments. There are 2 threads that I have carefully read
and where I also left some comments and already received answers, but to be honest I’m still not 100% clarified in this matter so I’m looking for some final answer/confirmation (or not) and this will hopefully help others facing the same problem.
This comment by @ktuiteDeleting media from Central submissions - does it reduce database size - #2 by ktuite seems to imply that there is a API method to delete submissions attachments which I can’t find in docs. I’m assuming that it would be something different from the “clear submission attachment” one and assuming that it really exists, is also unclear if that would trigger the database size to shrink
Deleting a submission entirely https://docs.getodk.org/central-api-submission-management/#deleting-a-submission seems to be the only clear way to get the database to shrink in size. It is still unclear to me if after the 30 days retention period is necessary to manually issue a VACUUM or VACUUM FULL commands inside the database or if database size will shrink automatically
API docs refer that “Purging deleted Submissions can also be triggered from the command line” but I can’t find here https://docs.getodk.org/central-command-line/ how that command would look like. In this other comment by @ktuiteODK Central - How to delete a submission - #19 by ktuite there is a reference to the command docker compose exec service node lib/bin/purge-forms.js --help but this does not seems to work on recent Central versions (see here below). If this command still exists, would this help avoid wait the 30 days retention period, and would it trigger the database to shrink in size?
node:internal/modules/cjs/loader:1386
throw err;
^
Error: Cannot find module '/usr/odk/lib/bin/purge-forms.js'
at Function._resolveFilename (node:internal/modules/cjs/loader:1383:15)
at defaultResolveImpl (node:internal/modules/cjs/loader:1025:19)
at resolveForCJSWithHooks (node:internal/modules/cjs/loader:1030:22)
at Function._load (node:internal/modules/cjs/loader:1192:37)
at TracingChannel.traceSync (node:diagnostics_channel:328:14)
at wrapModuleLoad (node:internal/modules/cjs/loader:237:24)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:171:5)
at node:internal/main/run_main_module:36:49 {
code: 'MODULE_NOT_FOUND',
requireStack:
}
Node.js v22.21.0
In the threads I liked above there are also valuable comments by other users and by @yanokwa but there are also statements I really struggle to understand. For example ODK Central - How to delete a submission - #34 by yanokwa it ends with “So in summary, if you delete data in Central, it eventually frees up space for re-use inside the database, but it may not free up space for re-use by the operating system” but I really can’t understand what this means. On a “normal” server the space that a DB can use to store data is the free space on the disk where the PostgreSQL data folder is in… so what is the difference?
On a “normal” server the space that a DB can use to store data is the free space on the disk where the PostgreSQL data folder is in…
Yes, that space, but also, the space marked as free, internally by PostgreSQL, inside its data files. So for instance, you might delete a bunch of rows, and if enough and perhaps consecutively stored, this eventually frees up a filesystem page (usually 4K or 8K) somewhere halfway one of PostgreSQL’s backing files. PostgreSQL will mark that space (a “hole” in the file, so to speak") as free to reuse for itself, but doesn’t give it back to the operating system (although it could, on some operating systems and some file systems, with (if I recall correctly) some “hole punching” flag on a fallocate system call).
IIRC a vacuum fullwould return that space to the filesystem because it’ll rewrite the backing files without holes.
Hi @manghig, I'm replying to a variety of your posts and comments, partly trying to make sure other forum threads don't lead others to confusing outcomes later.
I think it should be possible in your case to reduce your database size by clearing submission attachments but not deleting the entire submission. You've identified that the deleting in Central usually just means "soft-deleting" and you also need to either wait for or manually trigger the purging process. The problem you're having with the purge script is that it is no longer called purge-forms.js, just purge.js. This is the kind of small change leading to big problems that I would personally find very frustrating, so I'm sorry if we've caused that for you! I just updated another thread ODK Central - How to delete a submission - #40 by ktuite to correct this there, too.
I'm now very invested if this works for you:
clear submission attachments (see blobs still exist in database)
run purge script
see that there are now fewer blobs in database taking up less space
vacuum or whatever needs to happen to reclaim that space
I also think the daily purge script should have deleted/removed any blobs for attachments you previously cleared, without the 30 day wait.
Edited to add: Do your submissions have multiple versions / have they been edited? We just realized that if that is the case, clearing the submission attachments will only work on the latest version of the submission, in which case the underlying blob wont be truly unlinked in the database and wont be cleaned up by the purge script. I've put this in an issue. Maybe we need a more powerful attachment delete.