I am using pyODK to download our data from our Central server. Since our database will keep growing and I do not want to download the entire database on a daily basis, I am trying to create a sync script that detects changes (new submissions, deletions and modifications). My idea is to check the database against an existing submission ID and its updatedAt value. That way I can efficiently check whether an ID exists and/or was modified (by comparing updatedAt).

However, deletions might cause issues: as far as I know there is no field in the API JSON that indicates a submission was deleted. And APIs can experience hiccups that cause certain IDs not to be downloaded, so simply coding that any ID missing from the download can be deleted from the database might cause problems. Any ideas on how to solve this? Will there be a sync JSON in the future that only shows the changes (modified, new and deleted submissions) on the Central server? Thanks for your ideas.
I am not familiar with pyODK, but check the audit log API. It will give you everything that happened on the server between a start and an end timestamp.
If you are only after submissions, set the action query parameter to submission and it will give you all the submission-related log entries. Every submission log entry has actee and details properties; from those, extract the projectId, xmlFormId and submission GUID. Then, depending on the action (i.e. submission.create, submission.update, submission.delete, etc.), act accordingly.
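In rough Python terms it could look something like the sketch below. I have not run this with pyODK myself; it assumes the Client exposes a raw get() that proxies requests to the API (as used later in this thread) and that the extended-metadata response carries actee and details objects, so verify against your own server.

from pyodk.client import Client

client = Client()  # reads credentials from the pyODK config file

# All submission-related audit entries in a given time window.
entries = client.get(
    'audits?action=submission&start=2023-01-01T00:00:00Z&end=2023-01-02T00:00:00Z',
    headers={'X-Extended-Metadata': 'true'},
).json()

for entry in entries:
    action = entry['action']              # e.g. submission.create / submission.delete
    details = entry.get('details') or {}  # holds e.g. the submission's instanceId
    actee = entry.get('actee') or {}      # the acted-on resource (extended metadata)
    if action == 'submission.create':
        pass  # insert the new submission into the local database
    elif action == 'submission.update.version':
        pass  # re-download and overwrite the local copy
    elif action == 'submission.delete':
        pass  # remove or flag the local copy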
I did not know ruODK, but a quick read told me it's for R coding, and I am not familiar with R. I think Punkch's solution is the most promising / lowest-hanging fruit for me. But thanks for the help!
Any idea how I can add an 'action' to the GET (audit log) request? For example, I want to see the audit log entries for the deletion of submissions: "submission.delete". How do I add this action to the following GET request?

audit_report = client.get('audits').json()

Thanks for the support!
The overview of deleted submissions works fine with the audit log API. However, I am now also trying to get an overview of modified submissions. I expected to find this in the audit log as well, under the action submission.update, but I can't find the instanceIds that were modified in there (at least not the instanceId and the modification date updatedAt). Am I mistaken, or should I look somewhere else (another API or another action)? The only things I need are the instanceId and the updatedAt, so that I know which instanceId in my (Postgres) database needs to be overwritten (by the modified submission from Central). Thanks for the feedback!
There are a couple of different submission update events:
submission.update is logged when a submission is approved/rejected/etc. and only changes the submission metadata, without changing the underlying submission data
submission.update.version is logged when the data of the submission has been modified.
The details of events with action submission.update.version do contain the instanceId, even though submission.update's details don't. If you're looking for submissions whose data has actually changed, this is probably the event you want anyway.
You mentioned monitoring deleted submissions, too. The submission.delete event's details also contains instanceId.
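As a small sketch (reusing the client.get pattern from elsewhere in this thread), collecting the instanceIds for both event types might look like this; the details.instanceId and loggedAt fields are the ones described above, but treat the exact response shape as something to verify on your server:

from pyodk.client import Client

client = Client()

modified, deleted = [], []
for action, bucket in [('submission.update.version', modified), ('submission.delete', deleted)]:
    entries = client.get(f'audits?action={action}', headers={'X-Extended-Metadata': 'true'}).json()
    for entry in entries:
        details = entry.get('details') or {}
        if 'instanceId' in details:
            bucket.append((details['instanceId'], entry['loggedAt']))

# modified / deleted now hold (instanceId, loggedAt) pairs to reconcile with the local database.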
Thanks for the response. Your confirmation that I was looking in the right direction made me dive into it again, and after more puzzling I finally figured out how the update JSON works. The confusion was caused by the key names in the JSON. As it turns out, an uploaded instance gets a meta_instanceID and an __id (key). Both are identical after the first submission (e.g. 7YH and 7YH). After a modification, the meta_instanceID changes, and the new value is what appears in the update request JSON (submission.update.version) (e.g. 7YF). The meta_instanceID is also changed in the submission data (also 7YF). The submission (now having two different keys: 7YH and 7YF) can then be linked to the update JSON (key: 7YF) and be updated. I'm leaving this explanation here for others who want to do the same.
The code is straightforward:

audit_report_modified_submissions = client.get('audits?action=submission.update.version', headers={'X-Extended-Metadata': 'true'}).json()
Then I parse through the json for the variables I need, for example:
for json_in in audit_report_modified_submissions:
    created_at = json_extract(json_in, 'updatedAt')[0]
    meta_instanceID = json_extract(json_in, 'instanceId')[0]  # be careful: the key in this JSON is instanceId, NOT instanceID!
    updated_at = json_extract(json_in, 'loggedAt')[0]  # as it seems, the 'loggedAt' key represents the modification date
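(json_extract isn't shown here; it's basically a recursive key lookup over nested dicts and lists. A minimal sketch of such a helper, not necessarily identical to the one in my script, could be:)

def json_extract(obj, key):
    # Recursively collect every value stored under `key` in nested dicts/lists.
    found = []

    def _walk(node):
        if isinstance(node, dict):
            for k, v in node.items():
                if k == key:
                    found.append(v)
                _walk(v)
        elif isinstance(node, list):
            for item in node:
                _walk(item)

    _walk(obj)
    return found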
But for me the confusing part was the content of the JSON I got back. As I mentioned in my previous post, audits?action=submission.update.version gives back an ID that is different from the instanceId that was generated during registration. Every modification creates a new ID that you have to link back to your original instanceId. To understand it, it is best to download the CSV file after you made a modification, then make another modification and download the CSV again. Compare both CSV files and you will see how the IDs are created and how they are linked.
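For anyone who wants to automate that linking instead of comparing CSVs by hand, here is a rough sketch of the idea. The form_id 'my_form' is a placeholder, and the assumption that __id stays at the original value while meta/instanceID moves to the new one is based on my observations above, so verify it against your own data:

from pyodk.client import Client

client = Client()

# Map each submission's current meta/instanceID to its original __id (OData table).
rows = client.submissions.get_table(form_id='my_form')['value']
current_to_original = {
    (row.get('meta') or {}).get('instanceID'): row['__id']
    for row in rows
}

# For each edit event, resolve which original submission it belongs to.
events = client.get(
    'audits?action=submission.update.version',
    headers={'X-Extended-Metadata': 'true'},
).json()
for event in events:
    new_id = (event.get('details') or {}).get('instanceId')
    original_id = current_to_original.get(new_id)
    if original_id:
        print(f'{original_id} was modified; latest version is {new_id}')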