Bulk delete (and restore) submissions on ODK Central using pyODK

I am sharing this in case it is useful to anyone who wants to bulk delete submissions from ODK Central leveraging the API and pyODK with the appropriate permissions to access your ODK Central instance. I'm not pretending it is well constructed, so please feel free to pimp my script!

And maybe share examples of other criteria you have used to identify submissions for deletion.

NOTE: You need to be running ODK Central v2024.2 or later for this functionality to work. And I'm assuming you have pyODK set up - I use Jupyter Lab

Summary
The script loads the submissions from a Project / Form into a dataframe and then uses criteria to filter those submissions for deletion. In this case I've chosen a cut-off date (so anything older than a particular date is deleted), but you could delete rejected submissions using the same principle.

The script iterates through the dataframe and if it finds and deletes a submission (i.e. Response = 200) it changes the field __system.status to deleted and exports the dataframe to csv as an audit log.

This will also delete the attachments associated with the submission (so it opens the door to reducing the database footprint)

#delete submissions - equivalent of API call:
#DELETE /v1/projects/{projectId}/forms/{xmlFormId}/submissions/{instanceId}

import pandas as pd
from pyodk.client import Client

#Delete submissions from the specified projectID and xmlFormID, that are older than the specified cutoffDate
#change these variables to suit your Central Server
projectId = [number]
xmlFormId = 'YourFormID'
cutoffDate = '2024-01-01T00:00:00Z'
output = 'local/path/to/audit/folder/' #output folder for recording deleted submissions
client = Client()
submissions = client.submissions.get_table(project_id=projectId, form_id=xmlFormId)
df = pd.json_normalize(data=submissions['value'], sep='.')
#loop through submissions, 
for index,row in df.iterrows():
    submitted = df.loc[index, '__system.submissionDate']
    instanceId = df.loc[index, '__id']
    if submitted < cutoffDate:   #check if submission date is before the specified cutofDate
        response = client.delete('/projects/'+str(projectId)+'/forms/'+str(xmlFormId)+'/submissions/'+str(instanceId))
        if response.status_code == 200:    #update status to deleted in df to identify the submission
            df.at[index, '__system.status'] = 'deleted'
        print('Instance: '+str(instanceId)+' submitted on: '+submitted +' status: '+str(response.status_code))
    else:
        print('Instance: '+str(instanceId)+' submitted on: '+submitted +' not deleted')
#export submission list to csv
df.to_csv(output+xmlFormId+cutoffDate[:10]+'.csv', columns=['__id','__system.submissionDate','__system.status'])

For safety purposes I also allow myself the luxury of restoring deleted submissions given that they are soft-deleted. This uses your audit-log.csv Remember that this is time-limited (30 days by default) so be sure to check your work before opening any refreshments!

#restore soft deleted submissions (must be done within the purge period - 30 days by default)
df = pd.read_csv(output+xmlFormId+cutoffDate[:10]+'.csv')
for index,row in df.iterrows():
    submitted = df.loc[index, '__system.submissionDate']
    instanceId = df.loc[index, '__id']
    status = df.loc[index, '__system.status']
    if status == 'deleted':
        response = client.post('/projects/'+str(projectId)+'/forms/'+str(xmlFormId)+'/submissions/'+str(instanceId)+'/restore')
        df.at[index, '__system.status'] = 'restored'
        print('Instance: '+str(instanceId)+' submitted on: '+submitted +' restored')
df.to_csv(output+xmlFormId+cutoffDate[:10]+'.csv', columns=['__id','__system.submissionDate','__system.status'])

Again this produces a csv audit log.
NOTE: this does not change __system.status on the Server, just your local audit log file.

Hope this helps with the scary world of deleting submissions... Maybe it's just me :slight_smile:

REMEMBER: once it's gone, it's gone. Measure twice, cut once...

9 Likes

Is code deleting from the actual central server?

Hi @Kon_Li
Yes! The code sends the Delete command via the API to ODK Central and actually deletes submissions (sends them to trash). I have assumed that you have pyODK installed and have set up the account credentials.

I added the local audit log so that I can see which submissions have been affected, and restore any if I change my mind. Of course, I have a copy of the data before running the script - including any repeats and attachments!
After running the delete script, if you log into your ODK Central server and list submissions for the selected form, you will find that the chosen submissions have disappeared. If you run the restore script, they reappear!

Again, note that the soft-delete becomes hard-delete after a period (usually 30 days unless you've manually changed settings in ODK Central).

Yes i have the credentials already. Will try it then