Apologies if the subject of this post has already been discussed elsewhere; unfortunately I don’t have the time I wish I had to closely follow ODK developments.
What high-level problem are you trying to solve?
Nowadays, every time a picture taken with a smartphone/tablet is attached to a form, there is a good chance that several MB of data are added to the ODK database backend. Even with a limited number of forms/devices/collaborators, this means the size of the DB quickly escalates, making management of the available disk space problematic (especially, but not only, for those who use VPS servers), not to mention backups and restores.
Any ideas on how ODK could help you solve it?
Before you “stone” me: yes, I know that attachments can be placed in S3-compatible storage, which of course is a good option to have. However, this adds a layer of complexity (and another possible point of failure) that not everyone can handle. Third-party providers are plentiful (AWS, Backblaze, etc.), but my experience is that the cost at the end of the month is often unpredictable, because it is not just a matter of how much storage is used but also of how much data is moved back and forth, the number of API calls, etc. Self-hosting is an option too, of course, but the open-source edition of MinIO is dead (they now have a “free” tier, but it remains to be seen what that entails), and other projects like garageHQ or RustFS are very green and it is unclear whether they are compatible with ODK Central. Speaking of the setup, it is still very unclear to me what “The names of objects stored in S3-compatible storage do not stand alone and must be converted to useful filenames and connected to the right forms and/or submissions by Central. For example, object names will look like blob-412-950ababd4c8cf8d11rf5421433b5e3dafx5f6e75” (as seen in the Central docs) entails in practice.
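To make that last point concrete, my reading of it (and I am only guessing at Central’s internal schema here, so treat every table and column name below as an assumption, not documentation) is that the object name encodes little more than an internal blob id, so getting from “blob-412-…” back to “form X, submission Y, photo1.jpg” would mean joining against Central’s Postgres database, something like this untested sketch:

```python
# Untested sketch, NOT an official tool: map an S3 object name like
# "blob-412-..." back to a form / submission / attachment filename.
# All table and column names are my guess at Central's internal schema.
import psycopg2

blob_id = 412  # assumption: the number in "blob-412-..." is blobs.id

conn = psycopg2.connect("dbname=odk_central user=odk")  # hypothetical connection string
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT f."xmlFormId", s."instanceId", sa.name
        FROM submission_attachments sa
        JOIN submission_defs sd ON sd.id = sa."submissionDefId"
        JOIN submissions s ON s.id = sd."submissionId"
        JOIN forms f ON f.id = s."formId"
        WHERE sa."blobId" = %s
    """, (blob_id,))
    for xml_form_id, instance_id, filename in cur.fetchall():
        print(xml_form_id, instance_id, filename)
```

If that is roughly what “must be converted by Central” means, it would be good to have it spelled out in the docs, because it matters a lot for disaster-recovery planning.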
Has the possibility of using “normal”/system disk space for the attachments been considered? If it was discarded as an option and/or is not on the roadmap, why?
That said, I think it would be nice if there were improvements to help keep the database size at bay. From what I understand, the only real option today is to delete the submissions (that have attachments).
From the UI it is possible to delete submissions, but unless I’m blind there is no way to delete multiple submissions at a time, only one by one. With thousands of submissions to delete, that is impractical. Checkboxes to allow multiple selection, or a way to select by date range, would help a lot.
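In the meantime, something along these lines should be possible against the API: list the submissions through the OData feed filtered by __system/submissionDate, then delete them one by one (which, as far as I understand, is the same soft delete as the trash can in the UI). This is an untested sketch; the URL, credentials, project id, form id and cutoff date are placeholders:

```python
# Untested sketch: bulk soft-delete of submissions received before a cutoff date.
import requests

BASE = "https://central.example.com/v1"   # your Central instance (placeholder)
PROJECT_ID = 1                            # placeholder project id
FORM_ID = "my_form"                       # placeholder xmlFormId
CUTOFF = "2024-01-01T00:00:00.000Z"       # delete submissions received before this

# 1. Log in and get a session token
session = requests.post(f"{BASE}/sessions",
                        json={"email": "admin@example.com", "password": "secret"})
session.raise_for_status()
headers = {"Authorization": f"Bearer {session.json()['token']}"}

# 2. List matching submissions through the OData feed
odata_url = f"{BASE}/projects/{PROJECT_ID}/forms/{FORM_ID}.svc/Submissions"
params = {"$filter": f"__system/submissionDate lt {CUTOFF}"}
listing = requests.get(odata_url, headers=headers, params=params)
listing.raise_for_status()
instance_ids = [row["__id"] for row in listing.json()["value"]]

# 3. Soft-delete each one via the REST API
for iid in instance_ids:
    r = requests.delete(
        f"{BASE}/projects/{PROJECT_ID}/forms/{FORM_ID}/submissions/{iid}",
        headers=headers)
    print(iid, r.status_code)
```

A script like this is a workaround, not a substitute for proper multi-select in the UI, especially for project managers who are not comfortable with the API.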
About the 30-day retention time for the “soft delete”: it has always seemed arbitrary to me. For someone in desperate need of shrinking the database (and reducing the overall amount of disk space used on the server), this is a long wait. Why not let the system administrator decide via a configuration parameter? Why not add a button in the UI (with all the flashing warnings that are necessary) to allow a project manager to hard-delete the previously soft-deleted submissions?
Last but not least: deleting an entire submission with heavy attachments just to reduce the DB size seems a bit drastic. Deleting only the attachment(s) (which in the meantime could have been downloaded by other means, such as the UI or the API) would be much better. I was expecting the API method “Clearing a Submission Attachment” to do just that, but I tried it and, even after a manual VACUUM FULL, the database size does not change, so I don’t understand what it is really for.
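For reference, this is roughly the call I made (placeholders for the ids, filename and token):

```python
# Rough reconstruction of what I tried: "Clearing a Submission Attachment".
# Project id, form id, instance id, filename and token are placeholders.
import requests

BASE = "https://central.example.com/v1"
headers = {"Authorization": "Bearer <session token>"}

r = requests.delete(
    f"{BASE}/projects/1/forms/my_form/submissions/<instanceId>/attachments/photo1.jpg",
    headers=headers,
)
print(r.status_code, r.text)
```

The request succeeds, but as said above the on-disk size of the database stays the same even after running VACUUM FULL in psql.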
In an old thread I found a reference to what apparently is/was a real “delete submission attachment” API method, but I can’t see it anywhere in the docs, so maybe I misinterpreted the post or maybe it was removed(?). Either way, such a method would also help.
Regards