Collect: keep history of changes to values in the form

I agree that it's redundant and doesn't reflect exactly when the enumerator modified the data, but my sense is that it will be easier for users to analyze the data if those columns are populated. They at least give a sense of when the change happened, so it's possible to get a reasonable picture of the edits made to the data by filtering the CSV to see only value change events.

It would definitely be preferable (ideal) to get a clear change history so you can see not only the changes made but also the order in which they were made, ideally with a date/time stamp.

I agree that we want to have a timestamp because it makes it a lot easier to order the changes. I'd also argue for including the location values. I've updated my post accordingly.

This will be a lot of data, yes, but this is an opt-in feature, and Central and Aggregate on Tomcat can both zip the data in transit.

Decisions like the structure of the log are very hard to change in the future, so I want to make sure we've carefully considered which structure is going to be most useful to users. That depends on the type of analysis that is eventually going to be done, and I don't have a good feel for that.

I see three different options that have been discussed:

  1. Adding old-value and new-value columns and tracking both as part of a question event.
  2. Adding a value column and tracking the current value as part of a question event.
  3. Adding a new value changed event and a value column that would only be populated for that new event.

To make things concrete, I have put a form and examples of the three logs in this Gdrive folder. Each log records the following form-filling session: swipe through and fill all fields, swipe back from school details to age to see age without modifying it, swipe forward to school details without modifying it, swipe to the end screen, swipe back to school details and fix teacher name and clear first class time, swipe to age and modify it, jump to last name to view it and then jump to end. @Grzesiek2010 also has two prototype implementations at collect#3042 and collect#3024.

The big question for me is whether users want to be able to identify when values have changed with simple spreadsheet analysis rather than through visual inspection or more sophisticated analysis software. To make this concrete, do we think users will want to answer questions like:

  • which question's value was revised the most times (this could help identify an unclear question if done across enumerators)
  • how many questions' values were modified by this enumerator?

If this is desirable, then I think always logging the current value when there's a question event (option 2) may not be ideal, because detecting when a change occurred requires doing comparisons across rows.
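To illustrate the cross-row comparison that option 2 would require, here is a minimal Python sketch. The log contents and the `value` column name are made up for illustration and are not taken from the example logs; the sketch also assumes the value is recorded once per question event. It has to remember the last value seen for each question and compare against it, which is hard to do with a simple spreadsheet filter.

```python
import csv
import io

# Hypothetical option-2 log: every question event records the current value.
# Column names and rows are assumptions for illustration only.
OPTION2_LOG = """event,node,start,end,value
question,/data/age,1000,1010,30
question,/data/teacher,1010,1020,Smith
question,/data/age,1020,1030,30
question,/data/teacher,1030,1040,Smyth
question,/data/age,1040,1050,31
"""


def detect_changes(csv_text):
    """Return (node, old, new) tuples found by comparing each question row
    with the previous value logged for the same node."""
    last_seen = {}  # node -> most recent value logged for that node
    changes = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["event"] != "question":
            continue
        node, value = row["node"], row["value"]
        # A change is only visible relative to an earlier row for this node.
        if node in last_seen and last_seen[node] != value:
            changes.append((node, last_seen[node], value))
        last_seen[node] = value
    return changes


changes = detect_changes(OPTION2_LOG)
```

Note that this sketch does not count the first time a value is entered as a change, because there is no earlier row to compare against.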

I don't have a strong sense of the tradeoffs between options 1 and 3 when it comes to analysis. There's also the possibility of a hybrid approach where old-value and new-value are only logged when there's a change. I have included that as option 4. This makes it easy to identify when changes occurred and what those changes are. Logging only when there's a change can't be done with a single value column, because a blank cell would then be ambiguous: it could mean either that nothing changed or that the value was changed to blank.
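As a rough sketch of the kind of analysis option 4 would allow, the Python below filters a log to the rows where the two columns differ and counts changes per question. The log rows are invented for illustration, and the `old-value` and `new-value` column names are assumed from the option description above. The fourth row (both columns blank) is a visit with no change, while the sixth row (old value set, new value blank) is a cleared value, which is the distinction a single value column could not make.

```python
import csv
import io
from collections import Counter

# Hypothetical option-4 log: old-value/new-value are filled only on change.
# Column names and rows are assumptions for illustration only.
OPTION4_LOG = """event,node,start,end,old-value,new-value
question,/data/age,1000,1010,,30
question,/data/teacher,1010,1020,,Smith
question,/data/first_class,1020,1030,,08:00
question,/data/age,1030,1040,,
question,/data/teacher,1040,1050,Smith,Smyth
question,/data/first_class,1050,1060,08:00,
question,/data/age,1060,1070,30,31
question,/data/age,1070,1080,31,32
"""


def changed_rows(csv_text):
    """Keep only rows where the value actually changed."""
    return [
        row
        for row in csv.DictReader(io.StringIO(csv_text))
        if row["old-value"] != row["new-value"]
    ]


def changes_per_question(csv_text):
    """Count how many times each question's value changed."""
    return Counter(row["node"] for row in changed_rows(csv_text))


counts = changes_per_question(OPTION4_LOG)
most_revised = counts.most_common(1)[0][0]  # question changed most often
questions_modified = len(counts)            # distinct questions changed
cleared = [r["node"] for r in changed_rows(OPTION4_LOG) if r["new-value"] == ""]
```

The same filter (old value not equal to new value) can be done in a spreadsheet without comparing across rows, which is the main appeal of this layout for the two questions listed above.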

I would be quite interested in this feature, particularly for clinical trials, where an audit trail is needed for trial monitoring and reporting.

Thanks for these examples. I think a visual inspection of the values that have changed is reasonable, as well as the means to undertake a more sophisticated analysis with software. I would prefer not to have the old and new value columns; presumably new columns would be added every time a value is changed?

I would prefer the new value changed event and value column approach (3audit-values-new-event). But if it is not possible to keep the values in one column, then option 4 (4audit-values-old-value-new-value-only-on-change) seems valid and appropriate. It also makes it easier to read across the rows to see when and what the change was.

best,
John

Thank you for sharing your preferences, @CharlieKeyes!

That is the 4th option I showed an example for. The columns could also be populated every time a question is visited (1st option). But presumably the 4th option is always more desirable than the 1st and provides just as much information so probably I should not have included the first one at all.

It would be great to hear from another user or two, particularly @dr_michaelmarks or @chrissyhroberts.

Hi @LN
I agree that option 4 is best, as it is concise and can be interpreted by eye by non-specialists.

I also tested the demo form and it worked nicely, though I didn't yet test with encryption.

Other things like username need to be included in the audit if it is to be used for things like clinical trials. I think that we provided a list of these criteria, but I can send it again if needed.

Best wishes
Chrissy h

Agree with above from @chrissyhroberts

This feature shipped in Collect v1.22 and the docs are at https://docs.opendatakit.org/form-audit-log/#change-tracking.
