Include server log in data download from Central

Florian_May · April 15, 2021, 7:35am

Here's a scenario from a non-clinical context:

A corporation operates in a biodiversity asset's home range, which potentially endangers the asset. Think turtle nesting beaches being impacted by infrastructure developments.
A regulator monitors the biodiversity asset using ODK, and sends the QA'd data to modellers.
The modellers use the data, together with the underlying assumptions, caveats, limitations, and study methodology, to draw inferences about the population health of the biodiversity asset.
Should the modellers find that certain metrics of the biodiversity asset decline beyond a defined threshold, all regulatory hell breaks loose, meaning the corporation has to pay extra offsets and faces potential restrictions on its operations.

This means that lots of money rides on the veracity of the analysis and the trustworthiness of the data.
Since observational data captured by human enumerators consists of their claims to have encountered and measured specific things, we don't have truth, only claims. QA operators overlay the initial claim (the unedited, raw Submission) with their opinion (Submission edits) based on their expert knowledge.
This means that the audit trail of submission edits is a critical piece of information which could be subject to hostile inquiry.

If submission editing comes to ODK Central (in addition to editing the exported submissions in a downstream data warehouse, where these edits are logged), every pathway to access and analyze the submission edit trail would help make edits made inside ODK Central transparent and defensible.

How we would use the ODK Central submission edit logs:

On a per-record basis: Filter Submissions to "edited" and inspect each by hand, as seen in the screenshot above. This can answer questions about an individual record, but might not scale terribly well.
On a per-form, per-project, or per-server basis: Get a very generic table of form ID, submission ID, submission version (one for the original submission, one per edit), submission content (the entire submission after that edit), editor, datetime, (edit comments?). Answer some questions in bulk by analysing that entire table. This would certainly be a good workout for the poor analyst, but could scale well. I don't have a hostile inquiry here yet, so I don't know which questions might be asked.
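
To make the bulk option a bit more concrete, here is a rough Python sketch of the kind of analysis I have in mind. It assumes the generic table described above already exists in some form. The `SubmissionVersion` class, its field names, and the sample rows are all made up for illustration and are not an ODK Central schema or API.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SubmissionVersion:
    """One row of the hypothetical edit-log table (not an ODK Central schema)."""
    form_id: str
    submission_id: str
    version: int          # 1 = original submission, 2+ = one per edit
    content: dict         # the entire submission after this version
    editor: str
    edited_at: datetime
    comment: str = ""


def changed_fields(before: dict, after: dict) -> dict:
    """Return {field: (old, new)} for every field whose value differs."""
    keys = before.keys() | after.keys()
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(keys)
        if before.get(key) != after.get(key)
    }


def edit_trail(rows):
    """Yield (row, changes) for each edit, compared to the previous version."""
    by_submission = {}
    for row in rows:
        by_submission.setdefault((row.form_id, row.submission_id), []).append(row)
    for versions in by_submission.values():
        versions.sort(key=lambda r: r.version)
        for prev, curr in zip(versions, versions[1:]):
            yield curr, changed_fields(prev.content, curr.content)


def edits_per_editor(rows):
    """Count edits (not original submissions) per editor."""
    return Counter(row.editor for row, _ in edit_trail(rows))


# Made-up sample rows, purely illustrative (not real data).
rows = [
    SubmissionVersion("turtle_track", "uuid:1", 1,
                      {"species": "unknown", "nest": "no"},
                      "enumerator_a", datetime(2021, 4, 1, 8, 0)),
    SubmissionVersion("turtle_track", "uuid:1", 2,
                      {"species": "green", "nest": "no"},
                      "qa_b", datetime(2021, 4, 2, 9, 0),
                      "species identified from photo"),
]

trail = list(edit_trail(rows))
for row, changes in trail:
    print(row.submission_id, row.version, row.editor, changes, row.comment)
```

Because every version carries the full submission content, the field-level changes can be derived by comparing consecutive versions, so the table itself would not need a separate "what changed" column.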

The outcome would be that we can defend the data as "as truthful as we can make it" by showing every edit to the inquiry.