Entity uuid - cannot be duplicated on the same Central server

1. What is the issue? Please be detailed.
When uploading Entities via pyODK, it is not possible to upload another copy of a CSV if it includes a pre-defined __id / uuid column for the entity. This holds even in a separate Project, and even after the entities have been deleted.

I suppose this is logical and SHOULD be expected; I just didn't anticipate it being 'site-wide' or including entities in the Trash.

In my case I am generating the CSV from QuODK and it includes a UUID so that it is linked to the feature within QGIS...

2. What steps can we take to reproduce this issue?
Create a CSV with a column __id (or any other name, and refer to it as the uuid). Upload it to Central.
Use the script below (adapted from the pyODK docs). The entity list is created and populated correctly. The linked form can update and create entities within this list.

from csv import DictReader

from pyodk.client import Client

projectId = 3
listname = "mylist"
csv_path = "path/to/my.csv"
entity_label_field = "label"
entity_properties = ("geometry", "Type", "Notes", "length", "stroke")
eid = "__id"  # This is the uuid field for the entity

# newline="" is what the csv module docs recommend when opening CSV files
with Client(project_id=projectId) as client, open(csv_path, newline="") as csv_file:
    # Create the entity list, then declare each property before uploading rows
    client.entity_lists.create(entity_list_name=listname)
    for prop in entity_properties:
        client.entity_lists.add_property(name=prop, entity_list_name=listname)
    for row in DictReader(csv_file):
        client.entities.create(
            label=row[entity_label_field],
            uuid=row[eid],  # pre-defined uuid taken from the CSV
            data={k: str(v) for k, v in row.items() if k in entity_properties},
            entity_list_name=listname,
        )

Then go to a different project (in my case projectId = 6) and repeat. The entity list is created, but the upload of entities fails with the error below. This is the second error I encountered: I had tried deleting the entities, but they are sitting in the Trash, so they are still technically on the server.

PyODKError: ('The request to https://mycentralserver/v1/projects/6/datasets/mylist/entities failed. Status: 409, content: {"message":"The following UUID(s) cannot be used because they are associated with deleted Entities: (baabe091-cd2f-4469-a3ed-797ddf24bf45).","code":409.19,"details":{"entityUuids":["baabe091-cd2f-4469-a3ed-797ddf24bf45"]}}', <Response [409]>)
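In case it helps anyone scripting around this: the content part of that error is JSON, so the blocked UUIDs can be pulled out of it. This is only a rough sketch. The function name is mine, and how you get the body text out of the PyODKError is left to you (I'm assuming you already have it as a string); the field names and the 409.19 code are taken from the error above.

```python
import json


def deleted_uuid_conflicts(body: str) -> list[str]:
    """Return the UUIDs that Central reports as tied to deleted Entities.

    `body` is the JSON text of the error response. Returns an empty list
    for any other error code.
    """
    payload = json.loads(body)
    # 409.19 is the code shown in the error message in this post
    if payload.get("code") != 409.19:
        return []
    return list(payload.get("details", {}).get("entityUuids", []))


# The response content copied from the error above
body = (
    '{"message":"The following UUID(s) cannot be used because they are '
    'associated with deleted Entities: (baabe091-cd2f-4469-a3ed-797ddf24bf45).",'
    '"code":409.19,'
    '"details":{"entityUuids":["baabe091-cd2f-4469-a3ed-797ddf24bf45"]}}'
)
print(deleted_uuid_conflicts(body))
```

With the list of blocked UUIDs in hand you can decide per row whether to skip it, or to upload it without the pre-defined uuid.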

3. What have you tried to fix the issue?
As above, I deleted the entities in the other entity list.

I haven't waited 30 days for the trash purge (impatient, I know, sorry!) and have not manually purged.

I can work around this by omitting / commenting out uuid = row[eid] (I confirm that this works) - but this breaks the link to QGIS as the entities have a new uuid that has been generated by Central.

There may not be a 'solution' except manual purging of Trash, but I thought it might be helpful to document my heuristics (mostly Error, and plenty of Trial) - in case it saves blushes for anyone else.

I will update the script to include an option to remove the pre-defined uuid.
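Something along these lines is what I have in mind for that option. It is an untested-against-Central sketch: build_entity_kwargs, use_predefined_uuid and source_id_property are names I made up, not part of pyODK. The idea is to build the keyword arguments for client.entities.create separately, so the uuid can be left out (Central then generates one). Optionally the original id is kept as an ordinary property so the QGIS link is not lost; that property (here the hypothetical "qgis_uuid") would have to be added to the entity list with add_property first.

```python
def build_entity_kwargs(
    row,
    label_field,
    properties,
    list_name,
    uuid_field="__id",
    use_predefined_uuid=True,
    source_id_property=None,  # hypothetical: property name to keep the CSV id in
):
    """Build keyword arguments for client.entities.create from one CSV row."""
    data = {k: str(v) for k, v in row.items() if k in properties}
    if source_id_property is not None:
        # Keep the original id as plain data so it can still be matched in QGIS
        data[source_id_property] = row[uuid_field]
    kwargs = {
        "label": row[label_field],
        "data": data,
        "entity_list_name": list_name,
    }
    if use_predefined_uuid:
        kwargs["uuid"] = row[uuid_field]
    return kwargs


row = {"__id": "baabe091-cd2f-4469-a3ed-797ddf24bf45", "label": "Path 1", "Type": "track"}
kwargs = build_entity_kwargs(
    row, "label", ("Type",), "mylist",
    use_predefined_uuid=False, source_id_property="qgis_uuid",
)
print(kwargs)
# Inside the upload loop this would be used as:
#     client.entities.create(**kwargs)
```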

We made the decision early on to have the entity UUIDs be globally unique across a server, partly with the idea that we could always relax this constraint if needed, but not go the other way. For example, if we started with UUIDs that were unique-per-project (or even per entity list) it would be hard to update things to make them globally unique.

However, in practice, it seems like there are many awkward issues arising from this constraint. You can't move entities from one project to another. You can reuse a UUID if the entity is deleted and purged, but that takes time.

Maybe it's time we consider making the entity UUIDs just unique across a project.

Thanks for clarification - I figured it was a deliberate, and wise, choice.

I can see advantages and disadvantages but don't have a big enough brain to be able to conclude which scenario would be better. I think if it is well documented as a 'feature' we can find ways to work with globally unique UUIDs consciously - we can map relationships either internally or externally to Central if it becomes a need to share (beyond the testing phase, for example). I didn't find anything prior to this to warn me, so I just wanted to share :slight_smile:

This is the one I hit the other month. The goal was to avoid having to create new entities and then add the new parent ID to the new child entities.

I also didn't expect the behaviour and had to ask to confirm it was intended.

I believe it's preferable for UUIDs to be globally unique across the entire server, rather than just within individual projects. The main reason is that in certain scenarios - such as when appending or aggregating data across multiple projects (forms) or entity lists - it's critical that UUIDs remain unique without requiring manual intervention.

Could you please confirm if the same uniqueness applies within forms across different projects as well? Specifically, if I were to append data for the same form coming from two different projects, can I rely on the UUIDs being globally unique? From what I've observed in the structure of the ODK Postgres container, it seems that UUIDs are indeed unique across projects at the server level - but I just want to be sure.

This is definitely true -- if you want to aggregate data across multiple projects, globally unique UUIDs are important. I guess the problem is that manual intervention is currently required for some other reasonable-seeming things that people want to do, like what's discussed in this post.

Which UUIDs are you referring to here?


Hi! I'm sorry for the delayed response. What I meant was submission UUIDs across multiple projects / forms.

Thanks..!

Ah, thanks for the clarification!

These submission instance IDs are only constrained to be unique within a form. You could theoretically have submissions with the same instance ID uploaded to different forms in the same project. If you delete a submission and it's in the trash, you can't reuse that ID in that form until the submission (or form) is purged from the trash.

In practice, these submission instance IDs (and the entity IDs) generated by Collect/another client are v4 UUIDs and unlikely to collide.
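To put a rough number on "unlikely to collide", here is a small sketch using Python's standard uuid module. The collision_probability helper is just the usual birthday-bound approximation, not anything from Central or Collect, and it assumes the IDs come from a good random source.

```python
import uuid

# A v4 UUID has 128 bits, 6 of which are fixed (version and variant),
# leaving 122 random bits.
RANDOM_BITS = 122


def collision_probability(n: int) -> float:
    """Approximate chance that any two of n random v4 UUIDs collide.

    Uses the birthday-bound approximation: (number of pairs) / 2**122.
    """
    return n * (n - 1) / 2 / 2**RANDOM_BITS


new_id = uuid.uuid4()
print(new_id, new_id.version)
print(collision_probability(1_000_000_000))
```

Even for a billion IDs the estimate is far below one in a quintillion, so a collision between independently generated IDs is not a practical concern. The conflicts in this thread come from deliberately reusing the same UUID.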
