pyODK merge - updates and conflicts

The `force` and `base_version` options are relevant when your pyodk script has a local representation of Entities that may become outdated and that should not be treated as the primary source of truth. This can happen if you download an Entity List at one point in time and then use it as the basis for updates significantly later.

In more dynamic contexts, updates might come in at any moment from different Collect users, for example. Even if you download an Entity, change it, and immediately upload it again, there's a risk that the server's version has changed in the meantime. That would be a conflicting update. If that's possible, it's highly recommended to specify `base_version` when making an update. That way, if someone else updates an Entity while your script is running, your update will be rejected and you can decide what to do.

`merge` is appropriate when you have some Entity data source outside of Central that's intended to be the source of truth, or when you're making an update that definitely won't overlap with other users'.

For example, maybe you have an external process for registering participants (or trails) and whatever workflow you have in ODK is done in phases with batches of participants/trails. You could use `merge` periodically to set up each new batch. When the data source you upload has no overlap with the existing Entities in the system, all existing Entities will be deleted and all new ones will be added.

Another example: you want to add a new property that comes from an external data source. If your data has a natural id (something like a participant id or a trail id), you could use that as the sole value in `match_keys`. You could also specify the new property as the sole value in `source_keys` so that only the values for that property are updated. In this case, you don't have to worry about conflicting updates because you're setting a property that wasn't set before.
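For that new-property scenario, the call might look like the sketch below. The Entity List name (`trails`), the `trail_id` natural id, and the `surface` property are all invented; `client` is an open `pyodk.client.Client`.

```python
# Hypothetical external data source: a natural id plus the one new property.
surface_data = [
    {"trail_id": "T-001", "surface": "gravel"},
    {"trail_id": "T-002", "surface": "paved"},
]


def add_surface_property(client, data):
    """Write only the new 'surface' property onto existing Entities."""
    client.entities.merge(
        data,
        entity_list_name="trails",  # hypothetical Entity List name
        match_keys=["trail_id"],    # natural id pairs each row with an Entity
        source_keys=["surface"],    # only values for the new property are written
    )
```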

Or maybe you really do want to update all properties from an external data source. That's fine to do without worrying about `base_version` as long as you can guarantee either that other work that would result in Entity updates is paused, or that your external data source contains the most accurate/true data.

If you think concurrent updates are likely and both processes can update the same properties, I would recommend using `update` with `base_version` specified for any updates you want to make. You can still use `merge` with `update_matched` set to `False` to delete Entities that aren't in your external data source.
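A sketch of that split: `merge` keeps the Entity List's membership in sync without touching the values of matched Entities, while any value changes go through `update` with `base_version`. The names are hypothetical and `client` is an open `pyodk.client.Client`.

```python
def sync_membership(client, data):
    """Add Entities that are new in the external source, but never
    overwrite the values of Entities that matched, since another
    process may be updating those concurrently."""
    client.entities.merge(
        data,
        entity_list_name="trails",  # hypothetical Entity List name
        match_keys=["trail_id"],    # hypothetical natural id
        update_matched=False,       # leave matched Entities' values alone
    )
```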

Yes, exactly: you can specify only the columns to update. Those not specified will be "forwarded" from the existing version on the server.

Let's say you have a CSV with a column header of `TrailName`, and that's the column you want to use for your Entities' labels. You could do that by specifying `TrailName` as `source_label_key`. In that case, you may also want to explicitly specify `source_keys` to exclude `TrailName`, since it's already being remapped to `label`.

If you want to use the trail name as the key for matching between your local data and remote Entities, you would specify it as `label` in `match_keys` (not `TrailName`) because of that remapping.
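Putting the last two points together, a sketch for such a CSV. Only `TrailName` comes from the discussion above; the other column names, the Entity List name, and the helper are invented, and `client` is an open `pyodk.client.Client`.

```python
import csv


def read_rows(path):
    """Read the CSV into the list-of-dicts shape that merge accepts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def merge_trails(client, path):
    client.entities.merge(
        read_rows(path),
        entity_list_name="trails",              # hypothetical Entity List name
        source_label_key="TrailName",           # TrailName becomes the Entity label
        source_keys=["trail_id", "length_km"],  # exclude TrailName: already remapped
        match_keys=["label"],                   # match on the remapped label, not TrailName
    )
```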

Hopefully those explanations help! Do let me know if you have further questions. I agree it would be nice to have something like a Jupyter notebook with some examples. If you end up producing a resource as you're experimenting with these concepts, please share!