Approaches for linking form instance changes to individuals

Thanks so much to all who have contributed to improving this spec. @yanokwa has invited me to be on the TSC call tomorrow. I can be there about 15 minutes in.

I took a moment to listen to the last call. I should have done that before responding. Luckily what I wrote in my last comment is still relevant but I realize it may need more context. First, thanks to @aurdipas for bringing up the question about the non-blank to blank to non-blank case and to @adam.butler for the explanation of the intended behavior in the current spec draft.

Why not ask for change reasons only on form re-open

An earlier draft of the spec proposed asking for change reasons on form re-open but as mentioned by @tomsmyth in the call, it’s common to partially fill a form and to complete it on re-open. It doesn’t make sense to ask for change reasons then.

Also, asking for change reasons any time a non-blank value is changed better matches paper. For example, if I write down a child’s age as 12 and then I need to cross it out to write 13, I would need to initial and explain this because it was written in ink. This is the case even if I immediately catch the mistake and make the fix.

Possible strategies to overcome issues with asking for change reasons only on form re-open

The big downside of letting the form designer choose when to ask for change reasons (always, after save, after finalize…) as @tomsmyth suggested or leaving it up to the enumerator as @Xiphware described is that both require more training and leave more room for enumerator error. For example, if the form designer can choose when change reasons are requested, enumerators have to be trained on when to save vs. when to finalize and have to be given instructions on what to do if they do the wrong one.

In the case of the enumerator deciding, another downside is that it leaves the door open for both accidental and deliberate opt-out.

Why tracking pristine status doesn’t seem worth it

On the call, @martijnr described requesting change reasons when non-blank values are changed as a crude stand-in for requesting change reasons when a previously-set value is changed. That’s correct, though it’s not clear to me that requesting reasons for changes to previously-set values is much better.

Tracking whether each field has ever been set (“pristine status”) is possible. Then explanations for changes could be requested when changes are made to non-pristine values regardless of whether the new value is blank or not.

I believe this would lead to the following two differences when compared to what is currently written in the spec:

  • Changes to default values would not require explanations because they are pristine
  • In a case where the user blanks a value and then sets it again, explanations would be requested both when blanking the value and when it is set again (even if immediately after).

I’m not entirely sure that those are improvements. In the default case, I can see it either way. In the blanked to non-blank case, requesting two reasons (one for clearing the value and one for re-entering a value) seems like it would generally be redundant.
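To make the two differences concrete, here is a minimal sketch that replays a sequence of edits under both rules and counts how many times a reason would be requested. The `Field` class, the `pristine` flag and the function names are hypothetical and only illustrate my reading of the two rules; they are not taken from the spec or from any client.

```python
from dataclasses import dataclass


@dataclass
class Field:
    value: str = ""
    pristine: bool = True  # hypothetical flag: True until the user first changes the field


def needs_reason_non_blank(old: str, new: str) -> bool:
    """Non-blank rule: ask when a non-blank value is changed."""
    return old != "" and old != new


def needs_reason_pristine(field: Field, new: str) -> bool:
    """Pristine rule: ask when a field the user has already set is changed."""
    return (not field.pristine) and field.value != new


def count_prompts(initial: str, edits: list[str]) -> tuple[int, int]:
    """Return (prompts under the non-blank rule, prompts under the pristine rule)."""
    field = Field(value=initial)  # a default value leaves the field pristine
    non_blank = pristine = 0
    for new in edits:
        if needs_reason_non_blank(field.value, new):
            non_blank += 1
        if needs_reason_pristine(field, new):
            pristine += 1
        if field.value != new:
            field.value = new
            field.pristine = False
    return non_blank, pristine


# Default of "12" changed to "13": only the non-blank rule asks.
print(count_prompts("12", ["13"]))
# Set to "12", blanked, then set to "13": one prompt vs. two.
print(count_prompts("", ["12", "", "13"]))
```

Under these assumptions the first case gives one prompt for the non-blank rule and none for the pristine rule, and the second gives one and two respectively, which matches the two bullets above.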

Tracking pristine status would add complexity and possibly have performance implications. I think it would need to provide clear user benefit for it to be worthwhile.

Am I underestimating the importance of the blanked to non-blank case?

See my previous post for more on why I think the blanked to non-blank case is uncommon and above for why I think suggested ways to address it are not much better or have other downsides. But am I underestimating its importance and/or overblowing the challenges with the alternatives? @aurdipas, is it possible you reacted really strongly to this because the wording in the limitation section made it sound like no change reason was asked for at all in that case?

When to log events when there are multiple questions on a screen

As @Xiphware has pointed out in his recent post and @martijnr highlighted in the doc, it’s not as clear when question events should be logged when there are several questions on a page and especially when some question types can be updated continuously.

This affects all audit features and not this one specifically. Currently, Collect defines the start and end times of question events from within field lists as the times when the field list is entered and exited, respectively. This definitely needs to be improved, but I don’t see a reason for it to block this spec.

I propose we make separate decisions on this, possibly for each question type separately, as other clients get ready to implement the audit log features.

Study design standards

On the call, @martijnr also asked about study design standards. See bullet 2 in the original feature description for an example. One thing I’ve understood from @dr_michaelmarks (correct me if I’m getting this wrong) is that protocols and training are more important than tech.

That is, the technology has to make it possible to collect things like user identifiers and reasons for change but beyond that, a robust protocol with things like independent oversight goes a long way in determining whether an approach is standards compliant. It’s possible to design a bad protocol with great tech or to design a great protocol with limited tech (e.g. paper).

100% correct @LN

In reality many minor changes don't have an explanation given either; i.e., if I cross out age 8 and put 9 (or even, say, 20 and put 40), I might well just initial and date-time it.

That is to say, ideally there needs to be an option to record reasons for changes, but it's not compulsory to do so (it is compulsory to date the change and identify who made it).

Correct, although it's not quite so bad to continuously log value changes when moving a slider. But you certainly don't want to pop up and block on a 'Reason for change?' window every time... [I'm trying to think how on earth I'd implement this tracking for my current sliders :anguished: ]

I violently agree :grin:

I agree that many minor changes don't need a reason. As I said in the call, ideally the reason for change should be requested at question level. It would be great if in the XLSForm we could specify for each question whether a reason for change is required or not.
Depending on the CT (more or less strict, let's say), a reason is not needed for all changes.
But if it is compulsory to date the change and identify who made it, that is OK.

I didn't react "really strongly" :slight_smile: I just said that if it is a choice, it should not be listed as a "technical limitation".

@dr_michaelmarks Are you going to the ECTMIH in Liverpool? I'll be there and present the ODK implementation of the WHO VA. It would be cool if you are around and we could meet in person.

Yes I'll be there on the Thursday and Friday

Thanks for inviting me to the TSC call and to all TSC members for being so thoughtful in your feedback.

On the call, we decided the next step would be for @aurdipas to get one last round of feedback from his team. I also mentioned I'd show a sample interaction possibly from a form @aurdipas provided.

I went ahead and built a quick example with a simple toy form. Please note that this shows the interaction in a one-question-per-screen context. There will need to be an addition made to the spec for clients that show multiple questions per screen by default. See conversation at Define audit log event behavior when there are multiple questions on one screen.

This is an extremely rough prototype so the dialogs are unstyled, the text is not polished, the question shouldn't change until the reason for change dialog is dismissed, etc. What it does show is the flow intended by the current spec. You can also see the resulting audit log with a description of how events relate to the video in this Google Sheet which also contains the blank form.

It's probably worth making explicit (since I'm not sure it has been yet?) that the sticking point seems to be specifically when to prompt the user to enter change reasons.

Requiring the enumerator to enter their id/name at the start (and assuming an implied opt-out if they don't?), and then logging this info and a timestamp whenever anything changes (including blank-to-non-blank!), is probably a no-brainer; it's a minimal amount of additional audit data, and it happens entirely behind the scenes on the client. However, what is important is when we will interrupt the user while they are in the midst of filling in a form, and require them to enter an RFC (Reason For Change), e.g. via a popup.
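A rough sketch of that split, assuming a hypothetical `record_change` helper on the client: every change is logged with user and timestamp, and the decision to interrupt with an RFC prompt is a separate, swappable policy. The dictionary keys and the placeholder policy (prompt only when a non-blank value changed) are illustrative assumptions, not spec text.

```python
from datetime import datetime, timezone


def record_change(log, user, node, old, new, now=None):
    """Always log who/when/what; return True if the client should also ask for an RFC."""
    now = now or datetime.now(timezone.utc)
    log.append({
        "node": node,
        "user": user,
        "time": now.isoformat(),
        "old-value": old,
        "new-value": new,
    })
    # Placeholder policy (assumption): interrupt only when a non-blank value changed.
    return old != ""


log = []
prompt_first = record_change(log, "amina", "/data/age", "", "12")     # blank to non-blank
prompt_second = record_change(log, "amina", "/data/age", "12", "13")  # non-blank changed
```

Both calls add an entry to `log`; only the second returns `True`. Changing the prompting rule would only touch the last line of `record_change`, which is the part still under discussion.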

I'm not saying this gets us any closer to answering the question (alas) but maybe it'll help focus on the core issue? IMO it's not so much "When to link form instance changes to individuals" as it is "When MUST we interrupt users to enter RFCs".

There's no link to a Google Sheet :frowning:

Yikes! Fixed. :see_no_evil:

Yes, thank you for making that really explicit. With that framing, the little demo video I put together and the resulting audit log are meant to show what it looks like when the reason for change is requested whenever any non-blank value is changed.

Another good point. I believe this means we can file issues and start work on the odk:identify-user portion of the spec since it is now entirely disaggregated from reasons for change. Does that sound right?

Yes, I believe we can probably disaggregate logging user + timestamp from RFCs in the change log, and treat the former as low-hanging fruit (so we can at least make some progress whilst struggling with RFCs...). But before deciding whether we should, I'd still want to hear back from some users of this feature on whether they see any problem with this (@dr_michaelmarks? @aurdipas?). The only potential issue I can foresee is that you could well end up with entries in your change log without an associated reason, although it should be a simple matter to filter these out.

Do note that timestamp and change values are already in the spec and supported by Collect as described in https://opendatakit.github.io/xforms-spec/#audit-attributes and https://docs.opendatakit.org/form-audit-log/#change-tracking.

The spec additions for identity tracking would be an odk:identify-user audit attribute and a user column in the log. Clients that support odk:identify-user would be responsible for getting a string user identifier and including that identifier with each event. I believe that in prior discussions we agreed that this would be useful even without reasons for change (I feel like maybe you even suggested the disaggregation initially!).
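As a hedged illustration of what a log with a `user` column could look like, here is a small sketch that writes events to CSV and stamps each row with the same identifier. Only the `user` column comes from the proposal above; the other column names and their order are my assumption based on the linked docs, and `write_audit_log` is a hypothetical helper, not a Collect API.

```python
import csv
import io

# Assumed column layout; only "user" is the proposed addition.
COLUMNS = ["event", "node", "start", "end", "old-value", "new-value", "user"]


def write_audit_log(events, user):
    """Serialize events to CSV, including the user identifier with each event."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, restval="", lineterminator="\n")
    writer.writeheader()
    for event in events:
        writer.writerow({**event, "user": user})
    return buf.getvalue()


events = [
    {"event": "form start", "start": 1000},
    {"event": "question", "node": "/data/age", "start": 1001, "end": 1005,
     "old-value": "12", "new-value": "13"},
]
print(write_audit_log(events, "amina"))
```

Events without change values simply leave those cells blank, so a log produced without reasons for change stays valid on its own.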

I have been digging in to this a bit more.

Here is an example of the change log that REDCap (widely considered GCP compliant) produces.

As you will see, it captures username, time, date and change.

As far as I can see, there is actually NO function in REDCap, via either the online or the mobile interface, to capture reasons for change.

The following is taken from a redcap FAQ page:
"For each event that changes data in the database, REDCap records the time and date, the username of the person logged in at the time, the type of event, and the changes made. The entire audit trail is stored during the lifecycle of the database."

So it seems to me that any Reason for Change data we collect will be beyond what most people do.

I would therefore suggest we record it at the level of the whole form, i.e., not per question or per screen.

@LN can you remind me what triggers an audit log capture? Definitely if you move off the screen, but does it also capture changes made whilst still on the screen (i.e., I type 7, delete it and type 6)? I'd think that was overkill.

I got feedback from our data manager in the Med department.

"Regarding the spec, all fine except as I commented before:

User not having to login with a password to confirm identity is a major weakness and for this reason the ODK audit trail/reason for change is not suitable for 'electronic' clinical trials."

But we knew this was a limitation. ODK will be, let's say, "paper equivalent". But considering that new clinical trial regulations are now moving towards electronic data capture tools, it would be super if at some point ODK Collect allowed user login with a password :smiley:

On the other hand I got the attached doc.

User Story_Reason_for_change.docx (12.9 KB)

Have a great weekend
Aurelio

@dr_michaelmarks see you soon in Liverpool

Technically this is not required by GCP.

GCP states

5.5.3 When using electronic trial data handling and/or remote electronic trial data systems, the sponsor should:

(a) Ensure and document that the electronic data processing system(s) conforms to the sponsor’s established requirements for completeness, accuracy, reliability, and consistent intended performance (i.e. validation).

(b) Maintains SOPs for using these systems.

(c) Ensure that the systems are designed to permit data changes in such a way that the data changes are documented and that there is no deletion of entered data (i.e. maintain an audit trail, data trail, edit trail).

(d) Maintain a security system that prevents unauthorized access to the data.
(e) Maintain a list of the individuals who are authorized to make data changes (see 4.1.5 and 4.9.3).

(f) Maintain adequate backup of the data.

(g) Safeguard the blinding, if any (e.g. maintain the blinding during data entry and processing).

In fact the word password occurs at no point ever in the GCP regulations.

My question @aurdipas for your data manager would be

"Where is the password for paper?"

We propose to get round this with other approaches

  1. Device-level locking
  2. If necessary, one username/password per device, i.e. a different Collect account for every fieldworker

To me this means some sort of password protection. But I'll ask our data manager to be more precise.

The FDA also expects that:

  • Access must be limited to authorized individuals
  • Each user should have an individual account/password
  • Passwords should be changed at established intervals
  • The system should limit and record the number of unauthorized log-in attempts
  • Automatic log off for long idle periods

We are not on paper; we use an electronic tool for data collection, Macro.

And our data manager agreed that this system is paper equivalent (without password).

I'll ask him for more details as soon as possible and share them with you.

Yes so we propose that you can achieve:

(d) Maintain a security system that prevents unauthorized access to the data.

At the device level - just password protect the phone/tablet.

Maintain a list of the individuals who are authorized to make data changes
This is normally at the server level, I think; i.e., who can change the raw data.
But it could also be done by giving each data collector a unique username.

The login/password is a wish of mine for the future.

For the moment, being paper equivalent is OK.

Does this mean one tablet per fieldworker that cannot be used by anyone else, or do you have multiple users per tablet?

Another limitation of not having a login is that the user who first fills in the form is not tracked (a reason for change is recorded for a modification, not for the first fill of the form).

Not necessarily at server level. We are also talking about access to the tablet.
And doesn't giving each data collector a username imply some sort of login?

We can continue the discussion on Thursday next week with a beer, I pay the first one :smiley:

I think between this, and longitudinal studies, we may want to consider moving the convening to a brewpub! :grin:

This is certainly possible but it seems like it would then be worse than paper. I think from a user experience standpoint, it's also easier to explain a change right after it has been made rather than explaining multiple changes at the end. Are you suggesting this as a simplification or do you see an advantage to it?

In a one-question-per-screen context (Collect's default), changes while on the same screen are not captured. You can see this in action at second 9 of the little demo I built:

The resulting log would look like this.
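Separately from the demo's actual log, the same-screen behavior can be sketched as follows. This assumes the client compares the value when the screen is entered with the value when it is exited and ignores everything in between; `change_on_screen_exit` is a hypothetical name, not a Collect function.

```python
def change_on_screen_exit(value_on_entry, intermediate_values):
    """Return (old, new) if the value differs when leaving the screen, else None."""
    final = intermediate_values[-1] if intermediate_values else value_on_entry
    if final == value_on_entry:
        return None  # nothing to log: the net effect on this screen is no change
    return (value_on_entry, final)


# Typing 7, deleting it, then typing 6 on a blank question yields one change.
typed = change_on_screen_exit("", ["7", "", "6"])
# Clearing an existing 7 and retyping 7 yields no change at all.
retyped = change_on_screen_exit("7", ["", "7"])
```

Under this reading, `typed` is the single pair `("", "6")` and `retyped` is `None`, so the intermediate 7 never appears in the log.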

I'm looking forward to seeing the notes from that conversation! :wink::beers:

I can see pros and cons of both per-question and per-form reason-for-change recording. It's fidelity vs. intrusiveness, I guess.