Deleting many, (but not all) submissions from ODK Aggregate database

What is the problem? Please be detailed.
I want to remove (several hundred) old submissions from the ODK aggregate form that they are stored on, because the only way to view the newest submissions is to plough through pages of these historic records, during which process it frequently crashes (because we have too many records on there, because deleting is SO tedious) and then we have to start all over again making our way to the present day through pages of historic records.
So. Making myself a promise to be more pro-active about database management in future, I spent the morning deleting submissions. I did this one at a time, using the two-step process (hit the red X, confirm that I want to delete) which is the only way I know how to delete submissions. As I was nearing the target record from which I wanted to leave the submissions alone, it crashed (i.e. went blank, and threw me back onto our original database which seems to act as a landing page). When I went back into the form I had been working in, ALL the records were still there!

What ODK tool and version are you using? And on what device and operating system version?
I'm using ODK aggregate on my laptop, I don't know if it has a version or release number.

So: What is the best method to delete (many) selected submissions from a form, (because I need to tidy up five more forms) and
Why did it restore the records I had so painstakingly deleted?
Just to add that we have almost NEVER been able to export these databases successfully (it just doesn't seem to work).

1 Like

I have had the same persistent problem for the past 5 years. There is an option on one of the menus that says it will delete all the records up to a specific date - but I have tried it many times and have had no indication that it has ever worked. It is a feedback-less option. However, I would note that once it was selected, there was zero evidence that any submissions had been removed, so there is no reason to believe that it actually worked.

Manual deletion, one record at a time does work. But it is painstakingly slow and prone to the refresh problem you have described. The way we do it that seems to work okay, is to delete a couple at a time. This way, one can constantly monitor what has actually worked and what has not. It also tends to lessen the crashes. But, oh so tedious.

What we do is try to schedule a manual deletion session about once a week. In our case, it means that the the build-up is kind-of manageable and also that the time-outs/crashes are also manageable. Also, the smaller number of records that need to be deleted at a time, means that the time spent deleting them in any one session is manageable.

I have submitted a feature request that would allow the cleaning up of Aggregate using a command line option on Briefcase, but this was considered to not be a good idea. I tend to think it would still be a great idea given the perpetual difficulties the alternative brings. If I had the time and means, I would volunteer to fix it myself - it really is the main operational bug we have found with using Aggregate over the years.

Thanks Mark for this response. It at least lets me know that the time spent this week would have to have been spent regardless: either all in one, long, mind-numbing session, or in a series of more manageable ones! Guess what my New Year's resolution will be...and how effective it will be...

I think the reason Aggregate's delete doesn't work or seem to work is that App Engine kills the task. @ggalmazor any improvements that we can make here?

Just an option to disable the "Are you sure" step would reduce the risk of carpal tunnel syndrome by half...
In all seriousness, it's such an amazing system for collecting and storing data that it really needs a similarly slick way to selectively manage/delete that data. I have between 6 and hundreds of records arriving every day. I'm sure this does not make me in any way unique.
There must be a better way to manage this than deleting record by record.

1 Like

@ggalmazor @yanokwa
Or a check box based delete mechanism - so that multiple records could be selected, then deleted at once.

But, the best option would be to be able to do it via Briefcase. Once the record has successfully been pulled into the next system via Briefcase for downstream processing is exactly the point at which we want to remove it from the database (or if we are being more careful - make a record of which ones we want to delete, and the delete later).

1 Like

First, let me say it's super important, valuable, and useful having users like you expressing their needs. It helps a lot to move the tools forward.

Regarding @Ljforster1's matter at hand, it looks like you're running Aggregate with MySQL or PostgreSQL which would make it possible to remove data efficiently with some SQL wrangling. I could provide some guidance there if you're interested.

Regarding @Mark_Schormann1's comments, Aggregate is limited by what AppEngine & DataStore will let us do: we can't have long processes and normal database interactions (like easily deleting stuff). I know it's frustrating for you and other users, and it's frustrating for us as well each time we want to help you and we can't.

Unfortunately, we can't provide a new feature that would let users delete submissions in bulk while Aggregate supports AppEngine.

AppEngine has let hundreds of users easily deploy Aggregate at virtually zero cost for years, but we are also aware there is another segment of users that are running big/long campaigns and will eventually run into these kinds of data maintenance problems, and we're looking for solutions for them as well.

Looking forward, I think we have a couple of good things coming:

  • First, there's ODK Central, which will eventually replace Aggregate and won't have any of its current limitations.
  • Second, we will be eventually dropping support for AppEngine from Aggregate, probably in an upcoming v2.0 version, but I can't be specific on this topic just yet. We've just started discussing this in the @TSC.
    • If that happens, we could totally provide a new feature to delete submissions base on filters and/or checkboxes.
1 Like

Thanks for the response @ggalmazor . The other possible mechanism that would be useful and would seem to be do-able (I suspect) with the current AppEngine technology and codebase would be to have a switch on Briefcase whereby a deletion of a specific submission is allowed. The switch would take a UUID and delete the submission that corresponds to that UUID. This mechanism would be

  1. Relatively safe, in that the exaxt UUID would need to be known in order to make the deletion
  2. Affecting only a single submission at a time, so in effect operating at the same level as a "pull"
  3. Work within the aforementioned time/resource constraints as it only affects a single record at a time.
  4. Sufficient - in that it doesn't matter how long the delete procedure takes - as long as it doesn't take hours of clicking....

Anyway - perhaps there is some way this could be done....

1 Like

@Mark_Schormann1 Are you proposing to invest resources in taking this on?

@yanokwa - if I could I would. But I don't seem to be able to get to a position where I can. Sadly.