Updating entities with CSV upload

We would like to update some or all entities of an entity list by uploading a CSV containing the updated data to Central.

More specifically, the original entity list was downloaded from Central and the geometry data in particular was modified in QGIS. The aim now would be to update the existing entities on Central by uploading the CSV exported from QGIS, containing the modified geometries and all other (unchanged) fields, in the same way as you would add new entities to an entity list.

Reading the documentation, it is not quite clear to me whether uploading a CSV into an entity list can only add new entities or whether existing entities can also be updated.

Another issue related to the same procedure is that in the meantime more entities were added to the entity list in Central. The idea would be that the CSV containing the updated geometry data only updates the corresponding entities of the entity list on Central while leaving the entities added later untouched.

I also noticed that the template downloaded from the entity list does not contain the __id field, which would be needed to uniquely identify an entity in order to update it the way we intend.

Updating each unit individually would be very tedious and error-prone.

Thanks for guidance.

Hi @dast

Currently, it is not possible to use a CSV to update entities, but it is in our plans to implement this feature. The risk associated with such a feature is quite high, especially the unintended overwriting of data, so we aim to design a careful and safe UI/UX.

For now, you can create a script that reads rows from the CSV file and calls the Entity Update API to update the Entities. If you prefer Python, you can use pyODK to do the same:

Sample script:

:warning: Warning: This is an irreversible action, so make sure you are totally certain that the data in your CSV is correct.

import csv

from pyodk.client import Client

client = Client()

# Replace with the name of your Entity List
entity_list_name = 'trees'

# Replace the value with your Project ID
project_id = 513

# Change the path here:
with open('/Users/johndoe/Downloads/trees.csv', mode='r') as file:
    csv_reader = csv.DictReader(file)
    for row in csv_reader:
        # Remove the system columns
        row.pop('__createdAt')
        row.pop('__creatorId')
        row.pop('__creatorName')
        row.pop('__updates')
        row.pop('__updatedAt')

        # If the Entities have been updated since the CSV file was downloaded,
        # the API call will fail, so exception handling should be added
        base_version = row.pop('__version')

        client.entities.update(
            row.pop('__id'),
            entity_list_name,
            project_id,
            row.pop('label'),
            row,
            base_version=base_version
        )
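To cover the exception handling mentioned in the comment above, here is a minimal sketch of the same loop with the update call wrapped in a try/except. It assumes that pyODK raises PyODKError (from pyodk.errors) when an API call fails; double-check this against the pyODK version you have installed.

import csv

from pyodk.client import Client
from pyodk.errors import PyODKError

client = Client()

# Same placeholders as above
entity_list_name = 'trees'
project_id = 513

with open('/Users/johndoe/Downloads/trees.csv', mode='r') as file:
    for row in csv.DictReader(file):
        # Drop the system columns, keeping __id, label and __version for the call
        for column in ('__createdAt', '__creatorId', '__creatorName',
                       '__updates', '__updatedAt'):
            row.pop(column)

        base_version = row.pop('__version')

        try:
            client.entities.update(
                row.pop('__id'),
                entity_list_name,
                project_id,
                row.pop('label'),
                row,
                base_version=base_version
            )
        except PyODKError as err:
            # Assumption: a stale base_version (the Entity changed on Central
            # after the CSV was downloaded) ends up here; report it and keep going.
            print(f"Skipped a row, update failed: {err}")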

Thanks @Sadiq_Khoja for your reply and for sharing the script! I already thought the solution would go in this direction. As I am not yet familiar with pyODK or with using the API, it will take some time to test and implement this workflow.

Just one further question: Does this script:

  • override all versions of the dataset or
  • override just the latest version of the dataset or
  • add a new version of the dataset while preserving the previous versions?

Thanks and best regards, Daniel

The script (and underlying API) will create a new version of the Entities that are in the CSV file, and all the previous versions will remain intact. On Central, you will be able to see all the versions and the differences between them.
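If you would like to check that programmatically rather than in the Central UI, here is a rough sketch that lists the versions of a single Entity through pyODK's raw session (the project ID, Entity List name and Entity UUID below are placeholders, and the field names follow the Central Entities API):

from pyodk.client import Client

client = Client()

# Placeholders for illustration: replace with your own values
project_id = 513
entity_list_name = 'trees'
entity_uuid = 'paste-the-__id-of-one-entity-here'

# List all versions of one Entity via the Central API
response = client.session.get(
    f"projects/{project_id}/datasets/{entity_list_name}/entities/{entity_uuid}/versions"
)
for version in response.json():
    print(version['version'], version['label'])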

Not sure when it will be released, but for future reference: today we've merged this PR for a pyodk feature to help manage insert/update/delete for entities: client.entities.merge. Example script, method details. It allows specifying match keys, keys to add as properties, etc. to help with different scenarios depending on the source of the data.
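Not speaking for the final released interface, but based on that PR a call could look roughly like the sketch below (the parameter names are my reading of the PR and may change before release):

import csv

from pyodk.client import Client

client = Client()

# Read the updated rows from the CSV exported from QGIS
with open('/Users/johndoe/Downloads/trees.csv', mode='r', encoding='utf-8-sig') as file:
    rows = list(csv.DictReader(file))

# Hypothetical sketch: match existing Entities on the label column, update the
# matched ones, and leave Entities that are not in the CSV untouched.
client.entities.merge(
    rows,
    entity_list_name='trees',
    project_id=513,
    match_keys=['label'],
    delete_not_matched=False,
)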


I have tested the script with pyODK in Jupyter after going through the great introductory video by @LN and doing some exercises. But I encountered some problems with the "system" columns and the "conflict" column. The CSV (created with OData in Excel) apparently also added a BOM to the __id column ('\ufeff__id'). The script below now handles these issues.

I also added a step that increments the version from the CSV by 1. This ensures the script can still update the entities if no new version was added in the meantime, without having to increment the version number manually in the CSV.

Some debugging output is also added. Thanks for the support!

import csv
from pyodk.client import Client

client = Client()

# Replace with the name of your Entity List 
entity_list_name = 'your_entity'

# Replace the value with your Project ID
project_id = 513

# Change the path here:
with open('/Users/johndoe/Desktop/your_entity.csv', mode='r', encoding='utf-8-sig') as file:
    csv_reader = csv.DictReader(file)
    
    for row in csv_reader:
        # Strip whitespace from the keys (the BOM at the start of the file is
        # already removed by opening it with encoding='utf-8-sig')
        row = {key.strip(): value for key, value in row.items()}
        
        # Remove the conflict field if it exists
        row.pop('conflict', None)
        
        # Remove the system columns
        row.pop('createdAt', None)
        row.pop('creatorId', None)
        row.pop('creatorName', None)
        row.pop('updates', None)
        row.pop('updatedAt', None)

        # Increment the version by 1
        try:
            base_version = int(row.pop('version')) + 1
        except (KeyError, ValueError):
            print(f"Missing or invalid version number in the row: {row}")
            continue

        try:
            client.entities.update(
                row.pop('__id'),
                entity_list_name,
                project_id,
                row.pop('label', None),
                row,
                base_version=base_version
            )
        except KeyError as e:
            print(f"KeyError encountered: {e}")
            continue
        except Exception as e:
            print(f"An error occurred: {e}")
            continue