Synchronising database with API

Hi all,

I am using pyODK to download our data from Central Server. As our database will be growing, and I do not want to download the entire database on a daily basis, I am trying to create a sync-script that checks changes (new submissions, deletions and modifications). My idea is to check the database on an existing 'submission-id' and 'updatedAt'. By doing so, I can efficiently check if an id exists and/or was modified (by comparing the 'updatedAt'). However, for the deletions this might cause issues: As far as I know there is no variable in the API-json that says that a submission was deleted. And, API's can experience hickups causing that certain id's are not downloaded. So simply coding that id's that are not present can simply be deleted from the database might cause problems. Any idea's on how to solve this? Will there be a sync-json in the future that only shows the changes (modified, new- and deleted submissions) in the Central Server? Thanks for your ideas.

1 Like

Hi,

I am not familiar with pyODK, but check the audits log api. It will give you everything that happened on the server between start and end timestamps.
If you are only after submissions, set the action query parameter to submission and it will give you all the submission related log entries. For every submission log entry, you'll have an actee and details properties. From there extract the projectId, xmlFormId and submission guid. Then depending on the action (i.e. submission.create, submission.update, submission.delete, etc.) action accordingly.

3 Likes

ruODK supports the various audit endpoints, and pyODK supports generic queries to endpoints it doesn't support by name yet.

I was not aware of these audit log api's. This looks promising. Thanks for the help!

Did not know ruODK but a quick read told me it's for R coding. I am not familiar with R. I think the solution of Punkch is most promising/low hanging fruit for me. But thanks for the help!

1 Like

Hi Punkch,

Any idea how I can add an 'action' to the GET (Audit log) request? For example, I want to see the Audit log on the deletion of submissions: "submission.delete". How do I add this action to the GET request?: audit_report = client.get('audits').json(). Thanks for the support!

Sorry @Edmonds I am not familiar with pyODK so I hope someone else can help you out.

Instinctively, I feel something like the below might work

audit_report = client.get('audits?action=submission').json()

You might also need to set the header X-Extended-Metadata: true which I have no clue how to achieve with pyODK.

Hi @Edmonds, @punkch has the right intuition.

?action=submission and action=submission.delete would both work to filter by all submission actions or just submission delete actions.

You can get extended metadata by adding headers={"X-Extended-Metadata": "true"}. This will expand some of the information in the audit response.

  • actorId => actor object with more details about the actor
  • acteeId => actee object with more details about the object (for a submission action, this will actually be the form)

Example:

client.get(
    url="audits?action=submission.delete", headers={"X-Extended-Metadata": "true"}
)
4 Likes

Hi ktuite and punkch,

Worked smoothly. Just what I needed. Thank you both for the help!

1 Like

Hi ktuite,

The overview of deleted submissions works fine with the audits log api. However, I am now also trying to get an overview of modified submissions. I assumed to find this also in the audits log api with the action submission.update. However, I can't find the instanceId's that were modified in there (at least not the instanceId and the date of modification updatedAt). Am I mistaken or should I find it somewhere else (another api or with another action)? The only thing I need is the instanceId and the updatedAt so that I now which instanceId in my (Postgres) database needs to be overwritten (by the modified instanceId at Central Server). Thanks for the feedback!

There are a couple of different submission update events:

  • submisison.update is logged when a submission is approved/rejected/etc. and just changes the submission metadata without changing the underlying submission data
  • submission.update.version is logged when the data of the submission has been modified.
  • (more actions listed here)

The details of events with action submission.update.version do contain the instanceId even though submission.update's details don't. If you're looking for which submissions have had their data actually change, this is probably the event you want anyway.

You mentioned monitoring deleted submissions, too. The submission.delete event's details also contains instanceId.

Hi ktuite,

Thanks for the response. Your confirmation that I was looking in the right direction made me dive into it again and after more puzzling I finally found out how the update json works. The confusion was caused by the key names in the json. As it appears an uploaded instance get's a meta_instanceID and a __id (key). Both are identical after the first submission (e.g. 7YH and 7YH). After a modification, the meta_instanceID changes and is passed to the update request json (submission.update.version) (e.g. 7YF). The meta_instanceID is also changed in the submission data (also 7YF). Then the submission (now having 2 different keys: 7YH and 7YF) can be linked to the update json (having key: 7YF) and be updated. This explanation to others that also want to do this.

Thanks for pointing me to the right direction.

2 Likes

@Edmonds please share the whole code you have come up with if you dont mind

Hi @Kon_Li ,

The code is straight forward: audit_report_modified_submissions = client.get('audits?action=submission.update.version', headers={'X-Extended-Metadata': 'true'}).json()

Then I parse through the json for the variables I need, for example:

for json_in in audit_report_modified_submissions:
    created_at = json_extract(json_in, 'updatedAt')[0]
    meta_instanceID = json_extract(json_in, 'instanceId')[0] # be carefull: the variable in this json is instanceId NOT instanceID!
    updated_at = json_extract(json_in, 'loggedAt')[0] # As is seems, the 'loggedAt' key represent the modification date

But for me the confusing part was the content of the json I got back. As I mentioned in my previous post the audits?action=submission.update.version gives back an ID that is different from the instanceId that is generated during the registration. It continuously creates a new ID that you have to link again to your instanceId. To understand it, it is best is to download the CSV file after you did a modification, then do another modification and download the CVS again. Then compare both CSV files, so you see how the ID's are being created and how they are linked.