I am sharing this in case it is useful to anyone who wants to bulk delete submissions from ODK Central using the API and pyODK (you'll need the appropriate permissions on your ODK Central instance). I'm not pretending it is well constructed, so please feel free to pimp my script!
And maybe share examples of other criteria you have used to identify submissions for deletion.
NOTE: You need to be running ODK Central v2024.2 or later for this functionality to work, and I'm assuming you already have pyODK set up - I use Jupyter Lab.
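If you haven't got pyODK configured yet, it reads your connection details from a small TOML file (~/.pyodk_config.toml by default). It looks something like this - the values here are placeholders you'd swap for your own server and credentials:
[central]
base_url = "https://your-central-server.example.com"
username = "you@example.com"
password = "your-password"
default_project_id = 1
(I believe you can also point Client() at a different file via its config_path argument.)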
Summary
The script loads the submissions from a Project / Form into a dataframe and then uses criteria to filter those submissions for deletion. In this case I've chosen a cut-off date (so anything older than a particular date is deleted), but you could delete rejected
submissions using the same principle.
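For example, here's a rough (untested) sketch of selecting rejected submissions instead - the OData __system metadata includes reviewState, which is 'rejected' for rejected submissions. The projectId and xmlFormId values are placeholders, just as in the main script below:
#sketch: count rejected submissions instead of old ones (same idea, different filter)
import pandas as pd
from pyodk.client import Client

client = Client()
projectId = 1             #placeholder - your numeric project ID
xmlFormId = 'YourFormID'  #placeholder - your form ID

submissions = client.submissions.get_table(project_id=projectId, form_id=xmlFormId)
df = pd.json_normalize(data=submissions['value'], sep='.')
rejected = df[df['__system.reviewState'] == 'rejected'] #reviewState is None if never reviewed
print(str(len(rejected)) + ' rejected submissions would be deleted')
In the main script you'd then swap the cut-off date check for a check on __system.reviewState.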
The script iterates through the dataframe and, whenever it finds and deletes a submission (i.e. the response code is 200), it changes the field __system.status to deleted; it then exports the dataframe to csv as an audit log.
This will also delete the attachments associated with the submission (so it opens the door to reducing the database footprint).
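Before deleting anything you can get a rough idea of how much attachment data is in scope. This is just a sketch: it assumes df and cutoffDate are already loaded as in the script below, and that __system.attachmentsPresent (from the same OData metadata) is the count of uploaded attachment files:
#rough idea of the attachment footprint of what is about to be deleted
#assumes df and cutoffDate are already defined as in the script below
in_scope = df[df['__system.submissionDate'] < cutoffDate]
print(str(len(in_scope)) + ' submissions in scope, '
      + str(int(in_scope['__system.attachmentsPresent'].sum())) + ' attachment files between them')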
#delete submissions - equivalent of API call:
#DELETE /v1/projects/{projectId}/forms/{xmlFormId}/submissions/{instanceId}
import pandas as pd
from pyodk.client import Client
#Delete submissions from the specified projectId and xmlFormId that are older than the specified cutoffDate
#change these variables to suit your Central Server
projectId = 1 #replace with your project's numeric ID
xmlFormId = 'YourFormID'
cutoffDate = '2024-01-01T00:00:00Z'
output = 'local/path/to/audit/folder/' #output folder for recording deleted submissions
client = Client()
submissions = client.submissions.get_table(project_id=projectId, form_id=xmlFormId)
df = pd.json_normalize(data=submissions['value'], sep='.')
#loop through submissions and delete those submitted before the cutoff date
for index, row in df.iterrows():
    submitted = df.loc[index, '__system.submissionDate']
    instanceId = df.loc[index, '__id']
    if submitted < cutoffDate: #check if submission date is before the specified cutoffDate
        response = client.delete('/projects/'+str(projectId)+'/forms/'+str(xmlFormId)+'/submissions/'+str(instanceId))
        if response.status_code == 200: #update status to deleted in df to identify the submission
            df.at[index, '__system.status'] = 'deleted'
            print('Instance: '+str(instanceId)+' submitted on: '+submitted+' status: '+str(response.status_code))
        else:
            print('Instance: '+str(instanceId)+' submitted on: '+submitted+' not deleted')
#export submission list to csv
df.to_csv(output+xmlFormId+cutoffDate[:10]+'.csv', columns=['__id','__system.submissionDate','__system.status'])
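As a quick sanity check after the loop (just a sketch, reusing the same client and dataframe): the deleted submissions should drop out of the OData table, so the overlap below should come back empty.
#sanity check (sketch): re-query the table and confirm the deleted instance IDs are gone
remaining = client.submissions.get_table(project_id=projectId, form_id=xmlFormId)
remaining_ids = {r['__id'] for r in remaining['value']}
deleted_ids = set(df.loc[df['__system.status'] == 'deleted', '__id'])
print('Deleted but still visible: ' + str(deleted_ids & remaining_ids)) #expect an empty set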
For safety purposes I also allow myself the luxury of restoring deleted submissions, given that they are soft-deleted. This uses the audit log csv produced above. Remember that this is time-limited (soft-deleted submissions are purged after 30 days by default), so be sure to check your work before opening any refreshments!
#restore soft-deleted submissions (must be done within the purge period - 30 days by default)
#reuses client, projectId, xmlFormId, cutoffDate and output from the delete script above
df = pd.read_csv(output+xmlFormId+cutoffDate[:10]+'.csv')
for index, row in df.iterrows():
    submitted = df.loc[index, '__system.submissionDate']
    instanceId = df.loc[index, '__id']
    status = df.loc[index, '__system.status']
    if status == 'deleted':
        response = client.post('/projects/'+str(projectId)+'/forms/'+str(xmlFormId)+'/submissions/'+str(instanceId)+'/restore')
        if response.status_code == 200: #only mark as restored if the server accepted the restore
            df.at[index, '__system.status'] = 'restored'
            print('Instance: '+str(instanceId)+' submitted on: '+submitted+' restored')
        else:
            print('Instance: '+str(instanceId)+' not restored - status: '+str(response.status_code))
df.to_csv(output+xmlFormId+cutoffDate[:10]+'.csv', columns=['__id','__system.submissionDate','__system.status'])
Again this produces a csv audit log. NOTE: this does not change __system.status on the server, just your local audit log file.
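If you want to see what the server itself considers deleted, rather than trusting the local csv, I believe Central (v2024.2+) can also list soft-deleted submissions over the REST API with a deleted=true query parameter. A sketch below - it assumes client.get() passes params straight through to requests, so check the API docs for your version:
#sketch: ask the server for its list of soft-deleted (not yet purged) submissions
response = client.get('/projects/'+str(projectId)+'/forms/'+str(xmlFormId)+'/submissions',
                      params={'deleted': 'true'})
for sub in response.json():
    print(sub['instanceId'] + ' deleted at: ' + str(sub.get('deletedAt')))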
Hope this helps with the scary world of deleting submissions... Maybe it's just me
REMEMBER: once it's gone, it's gone. Measure twice, cut once...