ODK Central - How to delete a submission

Am I right to say that at this time (August 2024) there is still no way to delete submissions from the Central GUI or odk API? If the answer is "yes" then this is really surprising, I can think at least 3/4 good reasons for that, one of them has been pointed in this thread: to programmatically allow to recoup space on the server where Central runs. A project/form with a few tens of app users sending over data with heavy media (photos/videos) will quickly drain disk space especially on cloud servers. And why I should want to keep them? (Hopefully) the API allows to download submissions + media filtering by date, so it is very reasonable to throw away submissions older than a specific point in time.

At very least I would expect an API call/method to delete/remove media files from submissions...

Anyway... as I guess that there is still no way to delete submissions as one would expect, what would be the correct procedure to delete them from within the DB? I can't really afford to mangle the DB/form/central installation and the last time I checked the data model used by Central was all but obvious.

Thanks in advance.

1 Like

Hi @manghig,

The API now supports submission deletion: https://docs.getodk.org/central-api-submission-management/#deleting-a-submission

https://docs.getodk.org/central-install-digital-ocean/#using-s3-compatible-storage might solve your storage space constraints.

As for partial downloads, the API definitely supports pagination as well as filtering by e.g. date.
Both pyODK (https://getodk.github.io/pyodk/submissions/#pyodk._endpoints.submissions.SubmissionService.get_table) and
ruODK support pagination via top/skip and OData filters where you can e.g. filter for submission date, see examples.

Downloading data via ruODK::odata_submission_get() doesn't take all that long as it skips already downloaded attachments.

2 Likes

Great news, thanks!

Yes of course, but that is not an option for many.

Yes I am aware that the API can do that, my whole rant (and I'm sorry for that) in other posts too, is that with briefcase it was also possible without needing to write "complex" scripts: just parameters on a easy to use command line tool (I'm not even taking into account the GUI that was a blessing for many not tech people).

Only rust?

regards

Hi @manghig, the ability to delete Submissions from the UI in Central is coming soon in the next release! We know this is a highly requested feature and we don't want to limit it to just API users

2 Likes

That is great news, thanks!
Would be possible to select >1 submissions to be deleted?

In general (API or GUI), deleting submissions will recoup disk space when they contain media files? I remember something about PostgreSQL and some extra step to do when deleting records with BLOBs and disk space...

MAYBE this answers the question - assuming that the delete submission function works in the same way...

So it might not be immediate recuperation of space, but once the 'trash' is purged I think that it will be recovered...

Any idea of how the SQL command looks like? Also some help on how to run it with CRON given that the database is inside docker...

Sorry, way beyond my competence - that's under the bonnet / here be dragons kind of stuff. All I know is that there is a CRON that does the necessary compaction of the database when the media 'expires' in the trash (what I'm referring to as a purge) - my guess is that this now takes place as part of deleting the submission (i.e. the submission and its media go to trash). An earlier post in this thread indicated a way of shortening the 'wait time' to purge, but that has a 'with caution' caveat on it...

That was all I needed to know - that it happens, not how to do it manually... I self host, so better for me to push the envelope than throw it on the fire.

So are you saying that it is all automatic? No need to set up that cron job manually?

That's my understanding - have a look at the thread that I linked to from @ktuite - it says "a daily cron job for purging deleted forms ... will do its work." and being a little 'retentive' I asked the very same question:

Not sure if this will help, but here is a basic pyODK script that works for deleting specific attachments, so should be possible to adapt very slightly for the new delete submissions [remove +'/attachments/'+str(filename) from the end]... You will need to set up the credentials for pyODK - which are in the DOCS.

This would give you the opportunity to loop through a series of submissions and/or attachments in standard python methods - you may be far more skilled with pyODK, so please don't be offended by me offering this:

from pyodk.client import Client
client = Client()
projectId = [a number]
xmlFormId = 'your_form'
instanceId = 'uuid:...'
filename = 'e.g. epoch_time.jpg'
response = client.delete('/projects/'+str(projectId)+'/forms/'+str(xmlFormId)+'/submissions/'+str(instanceId)+'/attachments/'+str(filename))
print(str(instanceId)+' / '+str(filename)+' status: '+str(response.status_code))

(the projectId and xmlformId can be found in Central under the link 'analyse with OData' in the form submissions page)

Also, I don't know how to retrieve attachments that are in Trash - so some caution is required...

Good luck with the deletions and disk space recovery.

It should not so so hard to require any luck... It is really a basic feature... and there almost no point to delete submissions with media files if disk spaced is not recovered in an automatic way...

The 'luck' is in getting everything right, not how hard it is - if, like me, you have thousands of images to manage and some to delete, you might wish for a little luck that you don't have to come to the forum asking for help to undelete / recover something you did by mistake. And that you'd be lucky enough for someone to answer...

Whilst I might agree that being able to delete submissions should be a core function, I do understand why the ODK team have been cautious in letting something out into the wild that could have devastating consequences to data integrity, possibly in a mission-critical situation (by which I mean life or death, not 'am I going to get into trouble for deleting a spurious image'). Nothing basic about that.

And I would encourage you to read the posts that I linked to before repeating the complaint about automatic disc space recovery - you are turning it into a conspiracy. It does happen automatically and it was confirmed unequivocally by a member of the ODK team. That's why I linked to it.

I understand your frustration with this, but please remember that this is open source software - my time is not free, but I might choose to give you it. I could not operate my business as effectively without ODK but I don't pay cash to use it. The least we can do is to contribute something and support its continued development...

2 Likes

As @ktuite noted, when you delete a form or submission in Central, all of its resources and attachments are soft-deleted (put in trash) for 30 days. During those 30 days, you can restore that data. After the 30 days, the data is purged (trash is emptied) and marked in the database as "dead".

PostgreSQL, the database used by Central, automatically vacuums up "dead" data and makes it available for re-use. When the autovacuum happens and whether or not it returns space to the operating system (as opposed to only the database) is decided by Postgres based on how it's configured and how you are using it.

Postgres doesn't always return disk space to the operating system because doing so would be an expensive operation and so there is an optimization tradeoff that it makes. True recovery (aka full vacuum or shrinking) by Postgres typically requires blocking reads/writes and twice the size of the database because it's essentially a backup and restore. This is one of the reasons why most cloud databases (e.g., RDS) can't shrink a database once it grows.

If you want to force vacuuming, you can do so:

  1. Run docker exec -it central-postgres14-1 psql -U odk -W odk
  2. Enter the default password: odk
  3. Run vacuum
    • Reclaims space and makes it available for re-use by the database. Does not always return space to the operating system.
  4. Run vacuum full
    • Reclaims more space, but takes much longer, locks the database so you can't use it, and requires extra disk space. Returns space to the operating system.

So in summary, if you delete data in Central, it eventually frees up space for re-use inside the database, but it may not free up space for re-use by the operating system.

2 Likes