ODK-Central_how to delete a submission

Hi, dear all.
Please, can somebody tell me how to delete a submission for a given project?
In advance, thanks for your help.
Best,

We don't yet offer a UI to delete submissions (but it is doable at the DB level).

Learning more about the reason you want to delete will help us shape the feature. For example, is it because you have training data that needs to be cleared out? Did someone make a mistake in entry that needs correction?

Yaw

@yanokwa,
thanks for your reply.
The reason is that we have training data to be cleared out.
Note:
please, what do you mean by "DB" in "DB level"?
Thanks, again.
Amal

Sorry for using technical jargon! DB means database. If you have direct access to the database, you can delete the submissions.

For training, you can use two forms, but one technique I like is adding a question at the start that asks if the form is for training or for real. This approach is nice because then you are using the exact same form so there is no confusion. It's also nice because your data collectors can practice even if there is a real campaign going on.

Who can have direct access to the DB?
How do I access the DB directly?
I can do it without causing damage.

Thanks for this solution, though I find it longer than deleting at the DB level.
Amal

Can you please give a little more explanation?

@Amal It's generally not a good idea to edit ODK's underlying database, and I'm regretting even mentioning it because it can be tricky to do correctly!

@Raj_Pravat Add a select_one question at the beginning of your form that asks "Is this data collection effort for practice or for real?". Then when you download your CSV at the end of your campaign, you can filter by that column.
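
In case it helps, here is a rough Python sketch of that filtering step. It assumes the select_one question is named `collection_mode` and that its choices are `practice` and `real`; those names are made up for the example, so substitute whatever your form uses. The sample CSV is invented too.

```python
import csv
import io

# Hypothetical column name and choice value; replace with the ones in your form.
MODE_COLUMN = "collection_mode"
REAL_VALUE = "real"


def keep_real_rows(csv_text, column=MODE_COLUMN, real_value=REAL_VALUE):
    """Return only the rows whose mode column equals the 'real' choice."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row.get(column) == real_value]


# Invented sample export: one practice row, one real row.
sample = "instanceID,collection_mode,age\nuuid:1,practice,30\nuuid:2,real,41\n"
real_rows = keep_real_rows(sample)
print([row["instanceID"] for row in real_rows])
```

For a real export you would read the downloaded CSV file instead of the inline sample string.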

Hi @yanokwa,
Thanks a lot. I am so slow sometimes, excellent trick.
Regards, Raj

@yanokwa,
thanks so much.
Best,

Hi All,

I'd like to restart this discussion on deleting submissions tied to a form. I would be happy to have this exposed as just an API call if needed, or for someone to help me figure out the SQL code to run to delete the submissions associated with a form.

The scenarios above don't work well at our scale. Here is why:

  1. We are managing well over 6,000 devices, totally offline. The devices are set up centrally, deployed for the training, testing, and data collection away from any network access. Many of our data collectors are not experienced android users and get confused when we have multiple forms to choose from. To avoid getting training data in the real dataset or vice-versa, we prefer to only have a single form.
  2. We have collected over 9 million records so far. Besides the risk of data entry errors described above, an additional question takes time and consumes phone battery. Adding a "training or not" question is silly if you know that all your data past a certain date will be production data.

Why do we want to delete records?

We have dashboards that pull directly from ODK Central to show progress of our data collection campaigns. Instead of having a caveat of having to delete data in each connected service, it would be nice to just remove the submissions from Central.

Again, happy to have someone teach me the SQL and I can do it myself, but I think this need might be felt by other orgs operating at our scale.

Thanks!

Since you can QA records in Central, you could mark test or training submissions as rejected (at your scale probably best in bulk, through the API) and exclude these from your dashboards. What do you think of that approach?
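
A minimal sketch of the dashboard side, assuming each submission record your dashboard pulls carries a `reviewState` field, as the JSON submission listing from Central does. The function name and the sample records are invented for illustration.

```python
def exclude_rejected(submissions):
    """Drop submissions whose review state is 'rejected'.

    Submissions that have not been reviewed yet (reviewState is None) are kept.
    """
    return [s for s in submissions if s.get("reviewState") != "rejected"]


# Invented sample records standing in for what the dashboard fetched.
sample = [
    {"instanceId": "uuid:a", "reviewState": None},
    {"instanceId": "uuid:b", "reviewState": "rejected"},
    {"instanceId": "uuid:c", "reviewState": "approved"},
]
kept = exclude_rejected(sample)
print([s["instanceId"] for s in kept])
```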

Thanks for your reply Florian.

That might work. Is there a way to do so in bulk? Specifically, to mark all of the records in a form as "rejected"?

You could go via the API:

  • Download all submissions
  • Based on your data, identify which submissions to "soft delete" (how and which depends on your use case, so that's hard to generalise)
  • For each submission ID on the soft-delete list, send an API call to mark it as rejected. I'd have to dig through the API docs, but this might be https://odkcentral.docs.apiary.io/#reference/submissions/submissions/updating-submission-data - a surefire way would be to reject a submission through the ODK Central GUI and then inspect the requests being sent.
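
The second step could look roughly like the sketch below, if "training data" simply means everything submitted before a known cutoff date. It assumes each downloaded submission has `createdAt` and `instanceId` fields, as in Central's JSON submission listing. The function name, the sample records, and the cutoff are invented.

```python
from datetime import datetime, timezone


def ids_created_before(submissions, cutoff):
    """Return instance IDs of submissions created strictly before `cutoff`."""
    selected = []
    for s in submissions:
        # Timestamps are assumed to look like 2021-03-01T09:15:00.000Z
        created = datetime.strptime(s["createdAt"], "%Y-%m-%dT%H:%M:%S.%f%z")
        if created < cutoff:
            selected.append(s["instanceId"])
    return selected


# Invented sample: one training record, one production record.
sample = [
    {"instanceId": "uuid:train-1", "createdAt": "2021-03-01T09:15:00.000Z"},
    {"instanceId": "uuid:real-1", "createdAt": "2021-03-10T08:00:00.000Z"},
]
cutoff = datetime(2021, 3, 5, tzinfo=timezone.utc)
to_reject = ids_created_before(sample, cutoff)
print(to_reject)
```

The resulting list is what you would then loop over in the third step.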

Thanks Florian. I have ~2K submissions to reject, and I worry about being rate-limited with time. I'll give it a shot though.

I would suggest adding a way to either (1) bulk reject the form submissions or (2) allow for bulk deletion of submissions. I think that (2) will probably satisfy more use cases than (1) in the long run.

@jniles There is no rate-limiting on the API, so it should go by quickly. I'm not a Python expert, but here's a quick script that rejects submissions before today.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from datetime import datetime
from time import strftime
import json
import pytz
import requests
import sys

server_url = "https://example.getodk.cloud"
admin = "admin@example.com"
password = "horsestaplebatterygenerator"
form_url = "/v1/projects/1/forms/my_form"

review_states = ["approved", "hasIssues", "rejected"]


def get_admin_token():

    admin_token_response = requests.post(
        server_url + "/v1/sessions",
        data=json.dumps({"email": admin, "password": password}),
        headers={"Content-Type": "application/json"},
    )

    if admin_token_response.status_code == 200:
        return admin_token_response.json()["token"]


def get_submissions(admin_token):

    submissions_response = requests.get(
        server_url + form_url + "/submissions/",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + admin_token,
        },
    )

    return submissions_response


def review_submission(admin_token, instance_id):

    review_submission_response = requests.patch(
        server_url + form_url + "/submissions/" + str(instance_id),
        data=json.dumps({"reviewState": review_states[2]}),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + admin_token,
        },
    )

    return review_submission_response


def error(error):
    print(strftime("%Y-%m-%d-%H-%M-%S"), error)
    sys.exit()


# Don't do this on every run. Cache the token and reuse it until it expires!
admin_token = get_admin_token()
if not admin_token:
    error("get_admin_token")

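submissions = get_submissions(admin_token)

# "Before today" is taken to mean before midnight UTC of the current day.
today_start = datetime.now(pytz.utc).replace(hour=0, minute=0, second=0, microsecond=0)

for submission in submissions.json():
    # createdAt is an ISO 8601 timestamp with milliseconds and a UTC offset
    submission_date = datetime.strptime(
        submission["createdAt"], "%Y-%m-%dT%H:%M:%S.%f%z"
    )
    if submission_date < today_start:
        review = review_submission(admin_token, submission["instanceId"])
        # Print the server's response for any submission that could not be rejected
        if review.status_code != 200:
            print(review.text)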
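
On caching the token: a minimal sketch, assuming the JSON returned by /v1/sessions contains an `expiresAt` timestamp next to `token` (please check that against your Central version). The helper names `save_session` and `load_cached_token` and the cache file location are hypothetical and not part of any ODK library.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def save_session(path, session):
    """Store the token and its expiry (assumed field: expiresAt) for later runs."""
    with open(path, "w") as f:
        json.dump({"token": session["token"], "expiresAt": session["expiresAt"]}, f)


def load_cached_token(path, now=None):
    """Return the cached token if the file is readable and not expired, else None."""
    now = now or datetime.now(timezone.utc)
    try:
        with open(path) as f:
            session = json.load(f)
        expires = datetime.strptime(session["expiresAt"], "%Y-%m-%dT%H:%M:%S.%f%z")
        token = session["token"]
    except (OSError, ValueError, KeyError):
        # Missing file, malformed JSON or timestamp, or missing fields
        return None
    return token if expires > now else None


# Demo with an invented session; a real one would come from the server.
cache_path = os.path.join(tempfile.mkdtemp(), "odk_session.json")
save_session(cache_path, {"token": "abc123", "expiresAt": "2999-01-01T00:00:00.000Z"})
print(load_cached_token(cache_path))
```

In the script above you would try load_cached_token first and only request a new session, and save its JSON, when it returns None. Keep the cache file readable only by your own user, since it holds a live credential.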

Thanks Yaw!

No worries, I've been working on NodeJS and figured it out already. I really appreciate the effort you put in helping me.

If you would consider this feature for a future roadmap, that would be welcome. At the moment, I just have a "clean" button that downloads all submissions, loops through them, and fires off rejection queries, then cleans out the local cache. This works and I appreciate the advice to do so.

This is our use case for permanently deleting a submission. We want to free up the ODK database as much and as soon as possible because the service runs on a VPS with low storage. We can't use the ODK DB as storage for all the field-captured data for the duration of the project; we can only use it as a temporary point of entry for the data, which we then pull into our own data models and servers. Is there still no way to delete a submission? Can the waiting time of the trash (recycle bin) be reduced below 30 days?

Welcome @JTrejos! Please take a moment to introduce yourself so that we can learn a bit more about your projects.

At the moment, there is still no way to delete a submission from Central, but it is on our roadmap. We understand there are issues with the database storage size, especially when using Central as part of a larger data pipeline.

If you have a good technical understanding of how Central is set up (which it sounds like you might), there is a way to lower the waiting period by editing a config file in the backend server. It is not officially documented and it is not configurable via the main .env configuration (used in 2023.2 and later), but is something we use in testing. Please use responsibly/with caution and know it could make upgrading awkward or change in the future.

Alternatively, without changing any config files, you can run the form purging command directly with a special flag. See the flag options here:
docker compose exec service node lib/bin/purge-forms.js --help

I'm trying to avoid any procedure in the database or operating system. I deleted the form and then uploaded the same form again. I received a warning that the form exists in the trash and that, if I moved forward, the deleted version would no longer be available for restoring. I accepted and it was marked as deleted without the waiting period. Does this mean that the form and its submissions are immediately deleted without a waiting period?

Hi @JTrejos, The message about a deleted form not being available for restoring just means you can only have one form with a particular ID active/undeleted/outside of the trash at one time. If you deleted the new form, it would soft-delete it/move it to the trash and you would be able to undelete/restore the old form.

In other words, no, the data is still in the database and has not been purged.

Although, looking at your screenshot and the deleted dates, I imagine the top form(s) has really been purged by now?

"deleted" = soft-deleted = in the trash = not showing up in form lists or accepting submissions, etc, but the data is still in the database and could be recovered
"purged" = all data associated with a form and its submissions deleted from the database

I hope I am making things more clear instead of more confusing!