Removing entities by batch

1. What is the issue? Please be detailed.
On ODK Central, I can only delete entities one by one. I would like to delete many of them at once. Is it possible?

2. What steps can we take to reproduce this issue?
Simply upload way too many entities.

3. What have you tried to fix the issue?
I deleted entities one by one, but that takes too long for 75K.

4. Upload any forms or screenshots you can share publicly below.

I don't have experience doing so, but this can be done programmatically via the Central API. There is an endpoint for deleting an entity: https://docs.getodk.org/central-api-entity-management/#deleting-an-entity. If you try using the API and have problems, someone on the forum should have insights.

Agree this is more of an API thing. You could use pyodk with some basic knowledge of Python:

  • Return all entities in an entity list.
  • Iterate and call the API to delete required entity IDs.
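The two steps above could be sketched roughly as follows. This is a minimal sketch, not tested against a live server: the path follows the delete endpoint in the Central API docs linked earlier, `entity_delete_path` and `delete_entities` are hypothetical helper names, and the actual delete call is passed in as a plain function so the loop can be tried without a server. The pyodk wiring in the comment assumes that the pyodk `Client` exposes raw HTTP verb methods taking a path relative to the `/v1/` base URL; project ID `1` and list name `plots` are placeholders.

```python
from typing import Callable, Iterable


def entity_delete_path(project_id: int, entity_list: str, uuid: str) -> str:
    """Build the relative path for Central's delete-entity endpoint."""
    return f"projects/{project_id}/datasets/{entity_list}/entities/{uuid}"


def delete_entities(
    uuids: Iterable[str],
    delete_one: Callable[[str], object],
) -> dict[str, str]:
    """Call delete_one for each UUID and return {uuid: error} for failures."""
    failures: dict[str, str] = {}
    for uuid in uuids:
        try:
            delete_one(uuid)
        except Exception as err:
            # Keep going so that one bad ID does not stop a 75k-item run.
            failures[uuid] = str(err)
    return failures


# Assumed wiring with pyodk (an assumption, not verified here):
#
#   from pyodk.client import Client
#   with Client() as client:
#       failed = delete_entities(
#           uuids,
#           lambda u: client.delete(
#               entity_delete_path(1, "plots", u)
#           ).raise_for_status(),
#       )
```

Collecting failures instead of stopping at the first error makes it easier to re-run the script on just the IDs that did not go through.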

However, you mention deleting 75k entities at once.

This would be really hard to achieve via the UI: you would have to select 75k items to delete.

Instead, could you possibly just delete and re-create the entire entity list?

I think pyODK is a good way to go here (probably for all of your workflow) as it has some functions to make entity management more straightforward:

You can do this by creating a CSV (on your PC) with all the entities that you wish to delete (e.g. download the Entity List from Central). Then run a loop to provide pyODK with the UUID of each entity that needs to be deleted (this is stored in Central as __id).
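Reading the IDs out of that CSV might look something like the sketch below. It uses only the standard library; `read_entity_ids` is a hypothetical helper name, and the only thing taken from Central is the `__id` column mentioned above. Blank and repeated IDs are skipped so that each entity is only sent for deletion once.

```python
import csv
from typing import Iterable


def read_entity_ids(csv_lines: Iterable[str], id_column: str = "__id") -> list[str]:
    """Return the non-empty, de-duplicated IDs from an Entity List CSV export."""
    reader = csv.DictReader(csv_lines)
    if reader.fieldnames is None or id_column not in reader.fieldnames:
        raise ValueError(f"column {id_column!r} not found in CSV header")
    seen: set[str] = set()
    ids: list[str] = []
    for row in reader:
        # A short row yields None for missing columns, so fall back to "".
        uuid = (row[id_column] or "").strip()
        if uuid and uuid not in seen:
            seen.add(uuid)
            ids.append(uuid)
    return ids
```

With a file on disk this could be called as `read_entity_ids(open("plots_to_delete.csv", newline="", encoding="utf-8-sig"))`, where the file name is a placeholder. The resulting list is what the delete loop would iterate over.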

Take a look at the example for creating entities:

Then adapt it for the delete function...

OR

You could adapt the script from "Bulk delete (and restore) submissions on ODK Central using pyODK" to use the endpoint suggested by @danbjoseph - I think this is more complex (my script!)

I think at the moment there is no function to delete an Entity List, but I believe that it is in progress - this would be 'too easy' :slight_smile:

My advice would be to become familiar with managing submissions and Entities through pyODK (or ruODK if you already work with R) - working at a scale of 75k+ will give you a big headache otherwise. And if you go into a data collection phase without these tools, it will hurt. A friend told me...

If you are working with geometry based entities, check out QuODK as it might help (assuming that you are familiar with QGIS!).

Other tools are available, and there are still gaps in functionality for that specific 'recipe' I describe above. But it will get you closer.

I know that this is not a 'cut and paste' answer, but it is a fishing rod, rather than a fish... Good luck.

Thank you for your quick answers, I appreciate it.

I want to be able to manage this big list of entities over time, so I would rather use pyODK to manage the entities. Is there a way to filter the entities based on their properties? Nothing is mentioned in the pyODK docs. So I need to maintain the whole list on my side as well to decide which ones to delete, right?

Can you share a little bit more about the workflow? That context would help us make more helpful recommendations. Questions that come to my mind:

  1. When and why is a plot deleted? Would a plot need to be undeleted?
  2. You have an external system that has the plot data. Are you doing a one-time import into Central and managing everything there, or are you flowing data between ODK and some other system?

Yes, we have a large dataset of 75K plots mapped over the years. The project will carry on mapping new plots and also updating existing plots when ownership changes. For example, an owner passes away and his 2 sons each inherit part of the land. We need to delete the father's plot entity and create 2 new entities.
In terms of implementation with ODK, we need a form for "deleting" the father's plot. Then the field agent will create 2 entities through the usual new mapping form. Once the mapping has been done and the 2 new entities created, the data engineer has to clean the geometry (remove self-intersections, duplicated points, ...). This means accessing the geometries, fixing them with QGIS if necessary, then updating the entities back in ODK with the fixed geometry.

Does that answer the question?

Thanks for the additional context.

You could store everything in ODK and use that as the source of truth.

Instead of deleting, you could likely unlock more value by storing statuses and relationships. Here's a quick model.

plots:
  - id: 123
    status: inactive
  - id: 456
    status: active
  - id: 789
    status: needs_cleaning
plot_relationships:
  - parent_id: 123
    child_id: 456
  - parent_id: 123
    child_id: 789

Yes, you can have a form that deactivates the father, activates the children, and so on. Or use some Python script to do that.
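As a rough illustration of the "Python script" option, here is a minimal in-memory sketch of the split operation on the model above. `split_plot` is a hypothetical helper, the plots are plain dictionaries keyed by ID, and nothing here talks to Central; writing the changed statuses back would be a separate step through whatever entity update mechanism you use.

```python
def split_plot(
    plots: dict[str, dict],
    relationships: list[dict],
    parent_id: str,
    child_ids: list[str],
    child_status: str = "needs_cleaning",
) -> None:
    """Deactivate the parent plot, add the children, and link them together."""
    if parent_id not in plots:
        raise KeyError(f"unknown parent plot {parent_id!r}")
    clashes = [c for c in child_ids if c in plots]
    if clashes:
        raise ValueError(f"child IDs already exist: {clashes}")

    # The parent is kept for history rather than deleted.
    plots[parent_id]["status"] = "inactive"
    for child_id in child_ids:
        plots[child_id] = {"status": child_status}
        relationships.append({"parent_id": parent_id, "child_id": child_id})


plots = {"123": {"status": "active"}}
plot_relationships: list[dict] = []
split_plot(plots, plot_relationships, "123", ["456", "789"])
```

New children start as `needs_cleaning` in this sketch so that they show up for geometry review before being marked `active`; that default is a design choice, not something the model requires.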

You could use the status property to identify which plots need cleaning and have another script export just those Entities as GeoJSON for the data engineers to import and re-import.
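A sketch of such an export script is below. It assumes the entities have already been fetched as a list of dictionaries, that the status lives in a property called `status`, and that the plot outline is stored in a property called `geometry` as an ODK geoshape string (`lat lon alt acc; lat lon alt acc; ...`). Both property names and both function names are assumptions for illustration.

```python
def odk_shape_to_ring(odk_geometry: str) -> list[list[float]]:
    """Convert 'lat lon alt acc; ...' into a closed ring of [lon, lat] pairs."""
    ring: list[list[float]] = []
    for point in odk_geometry.split(";"):
        parts = point.split()
        if len(parts) < 2:
            continue  # skip empty fragments such as a trailing ';'
        lat, lon = float(parts[0]), float(parts[1])
        ring.append([lon, lat])  # GeoJSON order is longitude, then latitude
    if ring and ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # GeoJSON polygons need a closed ring
    return ring


def entities_to_geojson(
    entities: list[dict],
    status: str = "needs_cleaning",
    geometry_key: str = "geometry",
) -> dict:
    """Build a GeoJSON FeatureCollection from entities with the given status."""
    features = []
    for entity in entities:
        if entity.get("status") != status:
            continue
        properties = {k: v for k, v in entity.items() if k != geometry_key}
        features.append(
            {
                "type": "Feature",
                "geometry": {
                    "type": "Polygon",
                    "coordinates": [odk_shape_to_ring(entity[geometry_key])],
                },
                "properties": properties,
            }
        )
    return {"type": "FeatureCollection", "features": features}
```

The returned dictionary can be written to disk with `json.dump` and opened directly in QGIS. Keeping `__id` in the feature properties matters, because that is what ties the cleaned geometry back to the right Entity when it is sent back.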

That's essentially what I designed QuODK to be able to do (version 1.2). The last (or maybe latest) piece in my puzzle is using the pyODK merge function to allow update of specific properties within the entity (e.g. just the geometry or an owner ID) - work in progress!

Not disagreeing with @yanokwa on the use of GeoJSON especially if you have a dedicated data engineer to write scripts and transfer data. However, if you are already using QGIS, QuODK will load submissions and entities to the Canvas as temporary layers to allow editing (of geometry and attributes)... Then you can export any features as a CSV (with a subset of attributes if required) to send back to Central using pyODK. Not quite automated, but hopefully just enough checkpoints in there to prevent inadvertent data loss.
