Approaches for linking form instance changes to individuals

LN · June 1, 2019, 1:07am

“Link each change to form data to an individual” describes a high-level need to keep track of who has made changes to form instance contents across instance opens.

Below are some options for how this could be done, none of which I am fully satisfied with. I hate to provide so many options, but I think there are lots of different ways we could meet this requirement and that putting them out there will help us get to a better approach. I’m very interested in:

Which option(s) you like
Which options you’d like to take off the table
Any other directions we should explore

Option 1: generic way to clear a form field on instance load and require it filled before form save

Introduce an event that is dispatched every time an instance is loaded (as opposed to odk-instance-first-load which is dispatched only the first time an instance is loaded). This would make it possible to author a form that clears one or more fields on instance open using setvalue. For example, consider fields that prompt for the “Last person to modify this form” or “Comments about the latest revisions to this form”. If these are required and cleared on form open, whoever is editing the form would need to populate them. Only the last value would be saved in the data submission but if the audit log is configured to track changes, all values would be saved in the log.
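
A minimal sketch of how a form could use such an event, assuming it were named odk-instance-load (a placeholder name for the proposed event) and that setvalue could be bound to it the same way it is bound to odk-instance-first-load. The field names are made up for illustration and namespace declarations are omitted:

```xml
<model>
  <instance>
    <data id="tracked_form">
      <last_modified_by/>
      <revision_comments/>
      <meta>
        <instanceID/>
      </meta>
    </data>
  </instance>

  <!-- Both fields are required, so they must be refilled once cleared -->
  <bind nodeset="/data/last_modified_by" type="string" required="true()"/>
  <bind nodeset="/data/revision_comments" type="string" required="true()"/>

  <!-- "odk-instance-load" is a hypothetical name for the proposed event.
       A setvalue with no value attribute and no content sets the node to an empty string. -->
  <setvalue event="odk-instance-load" ref="/data/last_modified_by"/>
  <setvalue event="odk-instance-load" ref="/data/revision_comments"/>
</model>
```

This only forces re-entry if the client also validates required fields on save, which is the second half of this option.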

In Collect, this would also require introducing a configuration option to validate on save. Currently validation is only required on finalization (and by default it's done on forward swipe as well). I think design decisions of when to validate, user interface settings, etc, have been left up to individual clients so I don’t believe there would be spec modifications needed for this.

Option 2: audit log attribute, custom client UI and new audit log convention

Add an audit attribute (e.g. odk:track-user). If that attribute is set to true, clients will define UI outside the form to prompt for user identification each time that a form instance is opened. For example, Collect might pop a dialog up while the form instance is loading in the background.

Sub-option 1:
Add an audit log event type (e.g. user details) and when the user provides identification, write one of these events with the user identification in the new-value column.

Sub-option 2:
When the user provides identification, write the user identification to the new-value column of the form start or form resume event.
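
As a rough sketch, the form definition side of this could be as small as one attribute. The attribute name is the example from above; putting it on the bind for the audit node is my assumption, as is the odk prefix being declared on the root element:

```xml
<model>
  <instance>
    <data id="tracked_form">
      <meta>
        <audit/>
        <instanceID/>
      </meta>
    </data>
  </instance>

  <!-- odk:track-user is the proposed attribute, not something clients understand today -->
  <bind nodeset="/data/meta/audit" type="binary" odk:track-user="true"/>
</model>
```

Either sub-option would then only differ in which audit log row carries the identification, not in the form definition.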

Option 3: generic way to identify a field as needing confirmation

Add a bind attribute that can go on any field and that if set to true, prompts the user for confirmation with client-specific UI any time an instance is opened. This is similar to the option above but it’s on specific fields and the latest confirmed value is what gets written to the form instance. Interim values are saved to the audit log by the usual mechanism.

Option 4: generic way to link forms

The idea here would be to make it possible to log change information with a companion form. The main form would prompt the person editing it to fill this companion form on each open.

This might best be done in a client-specific way. For example, this is almost possible today in Collect using a combination of the activities the app exports and the external data questions. The big thing that’s missing is a way to pass the instanceID of the main form to the dependent form. I’m guessing it’s either possible or almost possible in Enketo with HTML links. On reflection, no: this should be done in a portable way.

Option 5: generic way to insert a node on instance load

Introduce something like the W3C XForms insert action, perhaps limited to repeat instances and an event that fires every time an instance is opened. This would make it possible to add arbitrary new repeat instances with required questions (or perhaps fields not in a repeat).

Like with option 1, clients would need a way to require validation on save for this to force user identification.
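
To give a feel for it, here is a sketch loosely modeled on the W3C XForms 1.1 insert action. Nothing here is supported by ODK clients today; the event name odk-instance-load, the revision repeat, and the revision-template instance are all hypothetical, and W3C forms would normally write the event attribute as ev:event:

```xml
<model>
  <instance>
    <data id="tracked_form">
      <reading/>
      <!-- One revision node is meant to be added per instance open -->
      <revision>
        <editor/>
        <comment/>
      </revision>
      <meta>
        <instanceID/>
      </meta>
    </data>
  </instance>

  <!-- Blank copy of the repeat node, used as the source for each insert -->
  <instance id="revision-template">
    <root>
      <revision>
        <editor/>
        <comment/>
      </revision>
    </root>
  </instance>

  <bind nodeset="/data/revision/editor" type="string" required="true()"/>

  <!-- Append a blank revision after the last existing one on every open -->
  <insert event="odk-instance-load"
          nodeset="/data/revision"
          at="last()"
          position="after"
          origin="instance('revision-template')/revision"/>
</model>
```

As written, the very first open would also trigger an insert, so how the event interacts with first load is one of the details that would need working out.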

The fifth option (generic insert) is the most principled but it also represents a ton of work and I’m not sure it provides a ton of user benefit beyond this context.

Introducing an event that’s dispatched on each instance open (perhaps this is xforms-model-construct-done?) would be the simplest. I’ll admit option 1 does feel like a bit of a hack, though.

I’m getting pretty intrigued by the possibilities that doing this with linked forms might open up.

paul_macharia · June 3, 2019, 6:37am

@LN,

Option 2 sounds to me the most efficient and effective way to document user changes to form instances.

Paul

dr_michaelmarks · June 3, 2019, 1:05pm

These all sound plausible.
I agree Option 2 sounds pretty efficient.

dr_michaelmarks · June 3, 2019, 1:42pm

@LN - can I add that regardless of which option is selected that I feel that adding a "Validate on Save" option (proposed as part of option 1) is an independently useful and non-disruptive option which would be very beneficial within ODK Collect.

Xiphware · June 3, 2019, 11:54pm

It seems like the desired requirement here is for the most part already being accomplished via the existing audit mechanism, albeit perhaps without logging (and explicitly confirming?) the current enumerator/user whenever a form is opened. So I think something based around the audit log makes the most sense (ie I don't quite see why coming up with an entirely new mechanism for this feature would make sense...).

Another variation to consider, which might better support asynchronously working on different forms concurrently, would be to log the userid on each audit value-changed event (along with, say, the timestamp). Then it'd simply be a matter of filtering the audit log for each desired form to extract what you need (as opposed to having to track 'sign-ins' / 'sign-outs' when jumping between forms...)

I assume the intention is that this per-form audit track gets pushed back up as part of form submission (as a new attachment)?

LN · June 4, 2019, 6:54pm

Agreed that option 2 would be expedient but it's also the least flexible. That is, if it's defined as prompting the user for an alphanumeric value, that's all it can do. It can't collect signatures, identifiers from a fingerprint app, etc. Similarly, once it's decided that it tracks username and comment, that's all it can do. With some of the other options, the form designer could choose to do things like require comments on certain fields.

That sounds right.

Agreed that the goal is to get a user identifier written to the audit log. A question underlying the various options I've provided is how much flexibility to give the form designer to achieve that. We have this wonderful, flexible tool for defining data capture (XForms) and it feels unfortunate to go completely outside of that.

I agree this is appealing. It would require all clients to have the notion of a session and of users logging in and out which I don't think any do at the moment. This needs to happen offline so using a remote server for identity confirmation is not an option.

Exactly, it is per-instance and uses the same machinery as upload. More in the spec.

dr_michaelmarks · June 4, 2019, 7:33pm

The technical aspects are slightly above my pay grade but I do agree that a flexible system that can do more than (whilst facilitating) collecting a user ID is preferable if implementable.

Xiphware · June 5, 2019, 2:10am

Re-reading your original post:

4.9.3 Any change or correction to a CRF should be dated, initialed, and explained (if necessary)...

An audit log would accomplish dated via its timestamp. You could probably argue that all changes made between the audit log's 'sign-in' and 'sign-out' are effectively initialed [or worst case, you'd add the userid explicitly to each change entry]. But I don't really see any reasonable way to start inserting user comments into the (background) audit log, as necessary to explain potentially every change! I think that might almost require a parallel form (ie option 4) and hence would pretty much rule out using an audit log, right?

LN · June 6, 2019, 11:18pm

I think you see why I've changed my mind a few times about this already.

I do think a generic way to link forms could address this and a lot of other use cases in an elegant way from a form design perspective. From an analysis perspective, until there's some kind of server support, it would mean joining the value change info from the audit log to one of these secondary forms. Definitely doable but much less convenient than having everything in one place.

Also, I wrote "This might best be done in a client-specific way" in my original post but I think I had a weak moment of laziness (forgive me, @martijnr). That would really be a missed opportunity. It would be such a useful feature that it should be portable. So I take that back.

My sense is that these corrections wouldn't happen all that often and that the comments wouldn't be all that long. I'm imagining something like "inverted 9 to 6" or "verified sensor reading" or something like that. Plenty of studies use paper margins for this which can be cramped. Really, there is no limit to what could be stored in the audit log and it would be plenty comfortable to read in a spreadsheet program.

Xiphware · June 7, 2019, 4:09am

Just to throw something crazy out there...

Instead of pushing back an external attachment containing the (selective) change log info captured with the form, would it be possible to submit an XML result containing both the primary and a secondary instance (!)? In this case, this secondary instance would record changes made to whatever was tagged in the primary instance as requiring change logging (ie date,userid,description). The benefit of having this in an external instance is that it may make it susceptible to being handled via our existing XForm functionality (as you state: "We have this wonderful, flexible tool for defining data capture (XForms) and it feels unfortunate to go completely outside of that.")

Any controls/fields requiring these change logs would be tagged with xforms-value-changed; this could fire a client-specific UI popup to capture an optional description and userid (signature, text initial, fingerprint, iris scan widget...), which would then be inserted into the secondary instance, possibly using the equivalent element names and group hierarchy. Thus not polluting the original primary instance with a lot of audit info, but (somehow? TBD?) still allowing us to specify and capture the necessary audit data within the confines of our existing XForm framework.

And the whole she-bang would be submitted back as potentially one XML blob, containing both a primary and this secondary instance. Or the submit process could strip out this secondary instance and submit it as an external XML attachment.
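
Purely to illustrate the shape I have in mind, a combined submission might look something like the following. All element and attribute names, and all values, are invented for the example:

```xml
<submission>
  <!-- Primary instance: final values only -->
  <data id="example_form">
    <weight_kg>69</weight_kg>
    <meta>
      <instanceID>uuid:00000000-0000-0000-0000-000000000000</instanceID>
    </meta>
  </data>

  <!-- Secondary instance: one entry per logged change, mirroring the primary hierarchy -->
  <changes>
    <weight_kg>
      <change date="2019-06-07T10:15:00Z" userid="example-user">
        <description>transcription error</description>
      </change>
    </weight_kg>
  </changes>
</submission>
```

Stripping the changes element out into its own attachment at submit time would give the second variant.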

Like I said, 'crazy'...

[I guess this is a bit like Option 4 (secondary instance vs linked form), with a smattering of Option 5 (XForms events) thrown in]

aurdipas · June 7, 2019, 12:39pm

From our med dept:

"I also think option 2 is the most promising.

The important thing for an audit trail is that for each data point changed the following is captured (in addition to the original entry):

ID of person making the change
change made
date of change made
reason for change (not sure this is possible with option 2, though)

If a data point is changed 3 times, this should be visible in the audit trail.

In a typical clinical trial, electronic CRFs usually have the option to put questions to individual data points (with ID, date/time), e.g. when a study monitor finds that a data point is not reflected in the source document (e.g. by a transcription error) he flags this in the CRF. This form is then sent back to the investigator who in turn replies to the monitor's comment and makes the appropriate corrections (again with ID and date/time). So in the audit trail we clearly see why a change was made, by whom and when. If ODK is to be used for regulatory clinical trials, this is something that the system should be able to do. Note also that there are normally not that many data changes."...

PS: waiting for a second opinion from the med dept data manager, who is the expert on the electronic data capture tool we use for clinical trials.

martijnr · June 7, 2019, 4:11pm

Very thorough options! I'm going to absorb these options for a bit, but wanted to quickly point out that when reading this, I'm realizing OpenClinica (clinical trial software) has built something very similar on top of Enketo (i.e. ODK XForms): audit trail, comments, reason-for-change (and a lot more). There is some magic involved, and the form format extensions we used would not pass muster for our generic specification (stored as stringified JSON in a single XML node per question), but maybe some elements are of use. It is built on top of a basic comment feature in Enketo, which I think would pass muster spec-wise (I didn't get it into the ODK spec, though I tried). FYI, see this XLSForm (it can be tested on http://opendatakit.org/xlsform/)

yanokwa · June 7, 2019, 8:13pm

Option 2 is nice because it's going into the audit log where we've started to put these things. Also, it's relatively fast to do.

I like it because you can imagine getting this dialog at the question level instead of the form level and that'd better satisfy the requirement. That workflow would be something like this:

On first launch, you enter the username/id. Maybe we auto-fill this if you have the username question.
On second launch, if you change a question, you confirm username and enter a short reason for change. Timestamp and the change itself are already in the log and the dialog keeps the previous entered information to speed up entry.

Yes, this narrows the type of data you can collect, but:

Systems with CRF support get this same narrow set of data and everyone is fine with it.
We can't even use fingerprints in Collect outside the audit. There isn't widely available hardware so why let this be a blocker?
Signatures are pretty large images. This could be fixed, I suppose.

To me, the big negative is that if we go down this road, Enketo won't have this feature, but Enketo (and thus Central at some point) could support it with relatively little work. Of course, we have to convince @martijnr, but he's less fanatical than he's ever been!

Xiphware · June 7, 2019, 8:51pm

This is an important point, which may preclude certain options (eg I think it could be difficult to accomplish using a parallel form/secondary instance/per XML element metadata). If each and every change must be logged, even undoing an edit and setting a field's value back to its original during the same session, this rather implies a continuous change event log, ie an audit log.

LN · June 12, 2019, 4:56am

I'm starting to warm up more to a specialized feature linked to the audit log (option 2) with all the arguments made here. It sounds like "for each change made on form re-entry, gather an alphanumeric user identifier and free text comment" is a very common requirement and narrowly addressing that case in a way that is easy for form designers would provide a lot of value. We could write a spec that describes what pieces of information need to be gathered from the user, when the user is required to enter them, and where in the audit log they get written to. Clients could choose an appropriate presentation.

If we had existing full-featured XForms engines to work with, there would be lots of interesting options, but we don't, and the XForms-y solutions I've come up with are either a ton of work for little marginal user benefit, not a great user experience, a bit of a hack, or some combination of these.

Agreed, the specs for the for attribute and the comment appearance look good. When those comments are used to track reasons for change, is there something that forces the comments to be filled in? Is user identity tracked based on who is logged in to a server?

I do think that for our users, separating the form data and the audit metadata is valuable. It's not always the same people who analyze the two and working with XML is generally a higher technical bar. All that to say, if something like the for bind attribute were used for this purpose, I think the servers would have to do additional steps to extract the audit information. Given server fragmentation, I would say that's less than ideal.

Similarly, regarding the form design side, @martijnr wrote

That doesn't feel particularly more flexible than a single attribute that magically requires comments on all fields.

What I'm most nervous about are follow-on requirements like coded (rather than free-form) reasons for change.

dr_michaelmarks · June 12, 2019, 6:19am

@Xiphware asked "If each and every change must be logged - even undoing an edit and setting a field's value back to its original during the same session"

Several levels of answer here.

The audit function @LN @yanokwa built does this already (for example: answer question, swipe forward, swipe back, change answer).

This is actually a level of detail BEYOND what REDCap actually does.
On the other hand it is an accurate reflection of what should ideally happen (i.e. REDCap misses some timepoints where ideally you would have audit data).

The key here therefore is not tracking the changes (which the newly done audit log does) or when they were made (again already done) but the WHY and the WHO
In general what the main user wants to export is the FINAL data values.
The audit log exists so one can go back if requested and see what changed, but it is not routinely looked at as part of analysis.

aurdipas · June 12, 2019, 10:10am

second feedback:

"Option 2 looks most promising. In addition to that, for clinical trials I would like to add that:

It is absolutely key that there is user verification with password when entering data (whether initial entry or changing data). Otherwise you cannot trust the name that is assigned to the action in the audit trail. The audit trail is used in clinical trials to prove that no one who should not have had access to the data did have access, that only trained and qualified personnel have entered data, and to check whether data has been entered according to a logical time frame (done to detect errors and/or fraud).
It would be ideal if you can set reason for change as mandatory at a question level when building the CRF. Because if it is optional, you will still need to check if reasons for change are being given, which leads to extra work. "

Xiphware · June 12, 2019, 12:39pm

I could easily see standard responses (aka select one) becoming a high priority fairly quickly. Having to type the same change reason in repeatedly gets annoying quickly. That is, @dr_michaelmarks' "...but the WHY".

martijnr · June 12, 2019, 5:25pm

I agree with @aurdipas, that user information ideally comes from the credentials used to retrieve the form, and not from a user-entered field (and OpenClinica maintains sessions to add user info automatically).

I'm fine with option 2 as well.

FYI, to share how Reasons For Change (RFC) are done in OpenClinica. First of all there are special views that require RFCs and others that don't have them at all. I believe it's related to the role of the user and the stage of the review process. This is probably easier to do with webforms than with a mobile app. For the views that require this, we automatically add an input field at the bottom of the page for each field the user changes. These fields are separate from the form. The fields have to be filled in before page-flipping or submission is allowed. There is an option to fill in all of the fields together with one reason or add reasons for individual questions.

Xiphware · June 12, 2019, 9:39pm

Could you post example of how these look in an XForm (XML?) definition? Or is this tagging accomplished entirely outside the form definition?

I'm thinking

<bind ... jr:rfc-prompt="what is your favorite color" jr:rfc-required="true()" ... />

to trigger popup when filling in form... Or perhaps better in the control definition itself?
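
For comparison, the control-level variant might look like the following. The jr:rfc-* attributes are just the names I made up above, and the field is a placeholder:

```xml
<!-- Hypothetical: reason-for-change attributes placed on the control instead of the bind -->
<input ref="/data/favorite_color"
       jr:rfc-prompt="Why did you change this answer?"
       jr:rfc-required="true()">
  <label>What is your favorite color?</label>
</input>
```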