ODK Central: how to delete a submission

Hi, dear all.
Please, can somebody tell me how to delete a submission for a given project?
In advance, thanks for your help.
Best,

We don't yet offer a UI to delete submissions (but it is doable at the DB level).

Learning more about the reason you want to delete will help us shape the feature. For example, is it because you have training data that needs to be cleared out? Did someone make a mistake in entry that needs correction?

Yaw

@yanokwa,
thanks for your reply.
The reason is that we have training data to be cleared out.
Note:
please, what do you mean by "DB" in "DB level"?
Thanks, again.
Amal

Sorry for using technical jargon! DB means database. If you have direct access to the database, you can delete the submissions.

For training, you can use two forms, but one technique I like is adding a question at the start that asks if the form is for training or for real. This approach is nice because then you are using the exact same form so there is no confusion. It's also nice because your data collectors can practice even if there is a real campaign going on.

Who can have direct access to the DB?
How can I access the DB directly?
I can do it without causing damage.

Thanks for this solution, though I find it takes longer than deleting at the DB level.
Amal

Can you please give a little more explanation?

@Amal It's generally not a good idea to edit ODK's underlying database, and I'm regretting even mentioning it because it can be tricky to do correctly!

@Raj_Pravat Add a select_one question at the beginning of your form that asks "Is this data collection effort for practice or for real?". Then when you download your CSV at the end of your campaign, you can filter by that column.
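
Here is a minimal sketch of that filtering step, in case it helps. The column name `collection_type` and the value `real` are assumptions for illustration; use whatever name and choice values your select_one question actually has.

```python
import csv
import io

# Hypothetical names: replace with your form's question name and choice value.
TRAINING_COLUMN = "collection_type"
REAL_VALUE = "real"


def keep_real_rows(csv_text, column=TRAINING_COLUMN, real_value=REAL_VALUE):
    """Return only the rows whose practice/real answer equals real_value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row.get(column) == real_value]


if __name__ == "__main__":
    # Tiny in-memory example standing in for the downloaded CSV.
    sample = (
        "instanceID,collection_type,age\n"
        "uuid:1,practice,30\n"
        "uuid:2,real,41\n"
    )
    for row in keep_real_rows(sample):
        print(row["instanceID"], row["age"])
```

For a real export you would read the file contents and pass them in, for example `keep_real_rows(open("my_form.csv", encoding="utf-8").read())`.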


Hi @yanokwa,
Thanks a lot. I am so slow sometimes, excellent trick.
Regards, Raj

@yanokwa,
thanks so much.
Best,

Hi All,

I'd like to restart this discussion on deleting submissions tied to a form. I would be happy to have this exposed as just an API call if needed, or for someone to help me figure out the SQL code to run to delete the submissions associated with a form.

The scenarios above don't work well at our scale. Here is why:

  1. We are managing well over 6,000 devices, totally offline. The devices are set up centrally, then deployed for training, testing, and data collection away from any network access. Many of our data collectors are not experienced Android users and get confused when we have multiple forms to choose from. To avoid getting training data in the real dataset or vice versa, we prefer to have only a single form.
  2. We have collected over 9 million records so far. Beyond the risk of data entry errors described above, an additional question takes time and consumes phone battery. Adding a "training or not" question is silly if you know that all your data past a certain date will be production data.

Why do we want to delete records?

We have dashboards that pull directly from ODK Central to show the progress of our data collection campaigns. Rather than having to delete data in each connected service, it would be nice to just remove the submissions from Central.

Again, happy to have someone teach me the SQL and I can do it myself, but I think this need might be felt by other orgs operating at our scale.

Thanks!

Since you can QA records in Central, you could mark test or training submissions as rejected (at your scale probably best in bulk, through the API) and exclude these from your dashboards. What do you think of that approach?
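
As a rough sketch of the dashboard side, assuming each submission record your dashboard pulls from Central carries a `reviewState` field (the same field used when rejecting through the API), the exclusion could look like this. The sample records are made up for illustration.

```python
def exclude_rejected(submissions):
    """Drop submissions whose reviewState is "rejected"; keep everything else."""
    return [s for s in submissions if s.get("reviewState") != "rejected"]


if __name__ == "__main__":
    # Made-up records; unreviewed submissions are assumed to have no review state.
    sample = [
        {"instanceId": "uuid:a", "reviewState": "rejected"},
        {"instanceId": "uuid:b", "reviewState": None},
        {"instanceId": "uuid:c", "reviewState": "approved"},
    ]
    for submission in exclude_rejected(sample):
        print(submission["instanceId"])
```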

Thanks for your reply Florian.

That might work. Is there a way to do so in bulk? Specifically, to mark all of the records in a form as "rejected"?

You could go via the API:

  • Download all submissions
  • Based on your data, identify which submissions to "soft delete" (how and which depends on your use case, so that's hard to generalise)
  • For each submission ID on the soft-delete list, send an API call to mark it as rejected. I'd have to dig through the API docs, but this might be https://odkcentral.docs.apiary.io/#reference/submissions/submissions/updating-submission-data - a surefire way would be to reject a submission through the ODK Central GUI and inspect the requests being sent.

Thanks Florian. I have ~2K submissions to reject, and I worry about being rate-limited with time. I'll give it a shot though.

I would suggest adding a way to either (1) bulk reject the form submissions or (2) bulk delete submissions. I think that (2) will probably satisfy more use cases than (1) in the long run.

@jniles There is no rate-limiting on the API, so it should go by quickly. I'm not a Python expert, but here's a quick script that rejects submissions before today.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from datetime import datetime
from time import strftime
import json
import pytz
import requests
import sys

server_url = "https://example.getodk.cloud"
admin = "admin@example.com"
password = "horsestaplebatterygenerator"
form_url = "/v1/projects/1/forms/my_form"

review_states = ["approved", "hasIssues", "rejected"]


def get_admin_token():
    """Log in and return a session token, or None if the login fails."""

    admin_token_response = requests.post(
        server_url + "/v1/sessions",
        data=json.dumps({"email": admin, "password": password}),
        headers={"Content-Type": "application/json"},
    )

    if admin_token_response.status_code == 200:
        return admin_token_response.json()["token"]
    return None


def get_submissions(admin_token):

    submissions_response = requests.get(
        server_url + form_url + "/submissions/",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + admin_token,
        },
    )

    return submissions_response


def review_submission(admin_token, instance_id):

    review_submission_response = requests.patch(
        server_url + form_url + "/submissions/" + str(instance_id),
        data=json.dumps({"reviewState": review_states[2]}),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + admin_token,
        },
    )

    return review_submission_response


def error(message):
    """Print a timestamped message and exit with a non-zero status."""
    print(strftime("%Y-%m-%d-%H-%M-%S"), message)
    sys.exit(1)


# For anything beyond a one-off run, cache the token instead of logging in every time.
admin_token = get_admin_token()
if not admin_token:
    error("get_admin_token")

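submissions = get_submissions(admin_token)
if submissions.status_code != 200:
    error("get_submissions: " + submissions.text)

# Midnight UTC today: only submissions created before this moment are rejected.
today_start = datetime.now(pytz.utc).replace(hour=0, minute=0, second=0, microsecond=0)

for submission in submissions.json():
    # Parsing the trailing "Z" with %z requires Python 3.7 or later.
    submission_date = datetime.strptime(
        submission["createdAt"], "%Y-%m-%dT%H:%M:%S.%f%z"
    )
    if submission_date < today_start:
        review = review_submission(admin_token, submission["instanceId"])
        if review.status_code != 200:
            print(review.text)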
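
On caching the token: here is a minimal sketch of one way to do it, by saving the token to a local file and reusing it while it is still fresh. The file name `odk_token_cache.json` and the 12-hour lifetime are my own assumptions, not values from the Central docs; check how long sessions last on your server and keep the file somewhere only you can read.

```python
import json
import os
import time

# Hypothetical cache location and lifetime; adjust both for your setup.
TOKEN_CACHE = "odk_token_cache.json"
MAX_AGE_SECONDS = 12 * 60 * 60


def get_cached_token(fetch_token, cache_path=TOKEN_CACHE,
                     max_age=MAX_AGE_SECONDS, now=None):
    """Return a fresh cached token, else call fetch_token() and cache its result."""
    now = time.time() if now is None else now
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cached = json.load(f)
        # Reuse the saved token only while it is younger than max_age.
        if now - cached["saved_at"] < max_age:
            return cached["token"]
    token = fetch_token()
    if token:
        # Only cache successful logins; a failed login returns a falsy value.
        with open(cache_path, "w") as f:
            json.dump({"token": token, "saved_at": now}, f)
    return token
```

With the script above, `admin_token = get_cached_token(get_admin_token)` would replace the direct call to `get_admin_token()`.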

Thanks Yaw!

No worries, I've been working on NodeJS and figured it out already. I really appreciate the effort you put in helping me.

If you would consider this feature for a future roadmap, that would be welcome. At the moment, I just have a "clean" button that downloads all submissions, loops through them, and fires off rejection queries, then cleans out the local cache. This works and I appreciate the advice to do so.
