OpenRosa spec proposal: add matchExactly attribute to form list response

My apologies for slacking off this past month... So it looks like there was already a Roadmap issue ostensibly covering the client-server form sync requirement (added by @yanokwa), so I've added a link to this thread to it. But neither is on the agenda for the upcoming TSC call, and it looks like we have a full agenda covering post-convening governance, funding, and similar topics... Can this be pushed to the following TSC call for review/approval, or has it become more immediately pressing from a development standpoint?

FYI the plan is to review this feature proposal (and hopefully approve it) in the next TSC meeting, which will be Jan 8. Do you, @ln or @issa, want to call in?

We want to get it right or at least to an acceptable compromise state so it can take the time it needs to.

Fantastic, I'll be there!

Thanks for digging further, @adam.butler and sorry I've taken so long to react. I share your instinct of wanting to match what's there already. @issa and I discussed various possibilities including something similar to this and I may actually be relaying arguments she made.

It's intended that there could be multiple xforms-group blocks and a mix of groups and loose forms. The groups give clients information to visually organize forms from the form list. Collect doesn't have support for xforms-group at the moment but with Central having more capacity for grouping forms, I think it's something we'd want to add. What we want to express with this new feature is that the whole package of groups and forms needs to be exactly matched by the client so unfortunately I do think that we need to use the xforms root node to add this modifier.

When it comes to element vs. attribute, there are no hard rules in XML so we could do either. In this case, all of the existing elements feel like data whereas this is a modifier. That's what leads me to think it "feels" more like an attribute. Additionally, parsers will generally happily ignore attributes so if we're adding this optionally, an attribute is safer. If we add an element, we run the risk that different clients of the form list might choke on an unrecognized element name.
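To make the "parsers will happily ignore attributes" point concrete, here is a minimal Python sketch. The form list document is a made-up example in the OpenRosa form list shape, and the `matchExactly="true"` spelling and value are assumptions, since the exact syntax has not been settled in this thread. `list_forms` stands in for an existing client that knows nothing about the attribute, and `must_match_exactly` for a newer client that does; both names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by OpenRosa form list responses.
NS = {"fl": "http://openrosa.org/xforms/xformsList"}

# Made-up response; the matchExactly attribute and its value are assumptions.
FORM_LIST = """<xforms xmlns="http://openrosa.org/xforms/xformsList" matchExactly="true">
  <xform>
    <formID>formA</formID>
    <name>Form A</name>
    <version>v1</version>
    <hash>md5:0123</hash>
    <downloadUrl>https://example.org/formA.xml</downloadUrl>
  </xform>
  <xform>
    <formID>formB</formID>
    <name>Form B</name>
    <version>v2</version>
    <hash>md5:4567</hash>
    <downloadUrl>https://example.org/formB.xml</downloadUrl>
  </xform>
</xforms>"""


def list_forms(xml_text):
    """Older client: reads (formID, version) pairs and never looks at root attributes."""
    root = ET.fromstring(xml_text)
    return [
        (x.findtext("fl:formID", namespaces=NS), x.findtext("fl:version", namespaces=NS))
        for x in root.findall("fl:xform", NS)
    ]


def must_match_exactly(xml_text):
    """Newer client: treats a missing attribute as 'do not enforce an exact match'."""
    root = ET.fromstring(xml_text)
    return root.get("matchExactly") == "true"


print(list_forms(FORM_LIST))
print(must_match_exactly(FORM_LIST))
```

The older client's code path is untouched by the new attribute, which is the backwards-compatibility argument for an attribute over a new element.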

Good points! I agree this feels a bit more like a 'modifier' than actual meaningful data ( :wink: ), which as you say are easier for a client/parser to basically scan over and ignore if unsupported.

Thank you @TAB for staying a little longer to discuss this one today!

There continues to be a fair amount of discomfort around this. Some of the themes that came up echo earlier parts of the thread and some were new. Here is a summary:

  • It feels like adding a feature that's not quite right and is going to become obsolete as a client setting API and a more managed longitudinal data collection experience are introduced (perhaps with client multi-tenancy). Attempts at addressing this here and here.
  • Requiring an explicit user action to trigger the form update feels incomplete to some TSC members. It feels like if the project designer's intent is for client devices to have an exact form list, that should happen automatically.
  • Relatedly, it doesn't play well with the existing auto-update features. Enketo periodically polls to update existing forms. Collect can also do this if a client setting is enabled (Form management > Form update). It feels like there should be a companion setting to hide forms that are not part of the form list when that check happens.

Hopefully I have accurately captured the major concerns. As we agreed on, I will chew on those and consult with @issa (or of course feel free to answer directly). If anyone has alternate approaches to suggest, please do.

i hope i have made the underlying user problem clear here. i'll restate that from the research i have done, solving that problem will make a significant impact on users: actual hundreds or thousands of hours of frustration and error may be returned to people's lives.

i don't feel i understand the ecosystem or Collect well enough to recommend a specific path forward relative to the sensible concerns expressed here, and i leave it with faith in y'all's hands to find it.

If the ultimate desired behavior is to ensure the forms present on a deployed device remain synchronized (aka 'match') with the associated deployment server, then yes - I agree this could probably be accomplished better; specifically, a setting of some nature - either client-side, or remotely configurable, or both - which (1) maintains an identical list of (fill-able) forms on the device, and (2) ensures the specific version of said forms remains likewise identical (subject to connectivity...)

The current form-update feature seems to only constitute half of the desired behavior: forms presently loaded on the device should auto-update their version, and this is presently a purely client-side/manual config option (but which says nothing about what forms should actually be on the device). Whereas the proposed matchExactly arguably only constitutes the other half of the desired behavior: what forms should be present/available/permitted, and as proposed would be a server-side/automatic config option (but which doesn't dictate what specific form version thereof?).

It feels like:

  • if the ultimate (90%) desired behavior is simply synchronization (between deployed client/deployment server), then we should be able to accomplish this with a single configuration option; eg for lack of a better term: sync [that is, I don't see compelling project deployment use cases for maintaining synchronization of form versions that ignores synchronizing the list of forms itself. Or vice versa]
  • this single, collective setting should either be client-side, or remotely configurable (which opens up the whole remote mobile management/profile can-o-worms...), or both.

Perhaps a first pass is to at least add a new (manual) client-side (ie Collect) config option that dictates whether to auto-sync both the list of available forms and their versions [eg periodically, or whenever the app launches, then - if online - check the list of forms and (re)load as necessary]. Then decide if and how this functionality should best be exposed for remote management [sic]?

[aside: I do agree with @tomsmyth that this may well spill over elsewhere, specifically longitudinal surveys, and indeed potentially whitelisting ODK Collect clients. eg in GoMobile we have a distinct client configuration API to set both client branding and configure app behavior (eg synchronizing forms, etc) upon app launch. This could be a similar path ODK Collect might need to consider].

I have been trying to avoid the word "synchronize" because to me it implies something more complete that would include submissions and settings. We know that if we are going to provide a more managed experience where multiple forms are connected, we will have to have a more complete synchronization. I would like to reserve the word "sync" for that. Ideally "sync" would include the client to server direction (submissions get synchronized from the client to server). If that feels too pedantic, I think that "sync" should at least be qualified as "sync form list" because it is not a full synchronization.

It should have no impact on what is permitted to be submitted. As described, it should just mean that the form_id-version pairs of forms in the form list and downloaded to the device should match exactly.

I don't know how common this is but I have seen it and it is currently possible: field workers fill out forms that go to different servers in a single encounter. That means that when a particular server is selected, the forms that it serves should be updated but other forms should not be touched. If we think this is not common enough to maintain compatibility for, then yes, we could maintain a single setting.

I said something to this effect before but I'll say it again. It's a little frustrating that while this is very easy for forks/alternate implementations to do, it's a heavy process for the core to undertake. I don't have a great alternative to propose and I think the TSC structure is a great one but I just want to flag that.

Since adding an attribute to the form list response is not looking like the no-brainer we had hoped for, I'm going to see if I can spend some time putting together a rough proposal for pushing settings from servers to clients. This is something @issa and I have also discussed but we started with this proposal mostly because we thought the settings route led to too many contentious issues (see this comment for some). Hopefully we were wrong on that one!

Perhaps I'm not understanding what the exact proposed behavior will be. For example, if I have a client currently loaded with forms:

  • formA v1
  • formB v1
  • formC v2
  • formD v1

and the client 'refreshes' its form list against the server and receives a matchExactly OpenRosa formlist specifying:

  • formA v1
  • formB v2
  • formC v1
  • formE v1

what form(s)+version(s) are presented on the device after this operation that the user can fill out? [please include any other permutations I might have missed]

I believe the desired user-story here is that the client would have:

  • formA v1
  • formB v2
  • formC v1
  • formE v1
  • plus any forms that were previously started but haven't yet been completed/submitted (irrespective of whether they're in the matchExactly list or not), but which you otherwise couldn't start a brand new record with.

Have I got that right?

Thanks for bringing up a concrete example, that's helpful. I believe it is exhaustive. The desired behavior should be the same whether we go with an attribute on the form list response or a client setting so it's worth taking a step back to make sure we are aligned on it.

And you're right that presented is the right thing to think about. With the proposed spec, the client would present exactly what was returned by the form list:

  • formA v1
  • formB v2
  • formC v1
  • formE v1

Like we said on the TSC call, I think this would be all that the spec mandates.

In Collect's case, formB v1 and formC v2 would be soft deleted, which is the current behavior when form versions are superseded. This happens regardless of the version string so v2 -> v1 is a valid transition if the server has mandated a downgrade. The soft delete is so that records created with the superseded form versions can still be used. I don't think the spec would have anything to say about this so other clients could choose alternate behavior like not allowing a form version change until all records have been submitted. Collect would likely additionally fully delete soft deleted form versions once there are no records associated with them.

On Collect, formD v1 would also be soft deleted which would be new behavior.
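As a rough illustration of the behavior described above, here is a small Python sketch that works through the formA..formE example. It treats each form definition as a (form ID, version) pair and uses plain set arithmetic. The function name `plan_exact_match` and the result keys are hypothetical, and the "soft_delete" bucket reflects what Collect would do, not something the spec would mandate.

```python
def plan_exact_match(on_device, form_list):
    """Compare (form_id, version) pairs on the device with the server's form list.

    Versions are compared for equality only, so a v2 -> v1 downgrade is
    handled the same way as an upgrade.
    """
    on_device = set(on_device)
    form_list = set(form_list)
    return {
        # What the client presents for filling: exactly the form list.
        "present": sorted(form_list),
        # Definitions in the list that are not on the device yet.
        "download": sorted(form_list - on_device),
        # Definitions on the device that are not in the list
        # (Collect-style soft delete; other clients may choose differently).
        "soft_delete": sorted(on_device - form_list),
    }


on_device = [("formA", "v1"), ("formB", "v1"), ("formC", "v2"), ("formD", "v1")]
form_list = [("formA", "v1"), ("formB", "v2"), ("formC", "v1"), ("formE", "v1")]

plan = plan_exact_match(on_device, form_list)
for key, pairs in plan.items():
    print(key, pairs)
```

In this example formA v1 is left alone, formB v2, formC v1 and formE v1 need to be fetched, and formB v1, formC v2 and formD v1 fall into the soft-delete bucket.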

In the Collect universe, records are made available separately from form definitions so I wouldn't tend to include that bullet in the same list. But I believe you're stating what is intended. In your example, if the device had any records corresponding to the formD v1 form definition, those would still be viewable/editable/submittable even though it would not be possible to fill out a new record from formD v1. For a client that makes all form actions available after selecting a form definition (e.g. select formD v1 to view/edit/submit records), "fill new" functionality should be disabled.
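For clients that hang all actions off a selected form definition, the rule above could be sketched like this in Python. This is only an illustration under assumptions: the function name `available_actions`, the action names, and the idea of passing in a record count are all hypothetical and are not taken from Collect or any other client.

```python
def available_actions(definition, form_list, record_count):
    """Return the actions a client might offer for one (form_id, version) pair.

    "fill_new" is offered only when the definition is in the server's form list.
    Existing records stay usable whether or not the definition is still listed.
    """
    actions = set()
    if definition in form_list:
        actions.add("fill_new")
    if record_count > 0:
        actions.update({"view", "edit", "submit"})
    return actions


FORM_LIST = {("formA", "v1"), ("formB", "v2"), ("formC", "v1"), ("formE", "v1")}

# formD v1 is no longer listed, but the device still holds two records for it.
print(sorted(available_actions(("formD", "v1"), FORM_LIST, record_count=2)))
# formE v1 is newly listed and has no records yet.
print(sorted(available_actions(("formE", "v1"), FORM_LIST, record_count=0)))
```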

But the sticking point is that formE v1, formB v2 and formC v1 wouldn’t actually be loaded onto the device yet; this would require a subsequent manual operation. Correct?

That's one of the sticking points. :smile:

The OpenRosa specification would not say anything about that. Or do you think it should?

With Collect, the model has always been that users have control over what forms are on their devices. With default settings, there is currently no attempt to fetch the form list until the user explicitly triggers one. We know that in some data collection campaigns, data collectors are trained to perform updates at certain times. That's why in "Have Collect exactly match the forms on Central" we have described that in Collect, the match would only happen on explicit user action.

However, it's true that in a world where a project manager specifies a form list to be matched exactly, expectations may be that this operation happens automatically. This is in contrast with the Aggregate world, where every data collector sees every form and likely needs to pick and choose relevant ones to download. We could do something like have Collect always perform a form list request the first time that the user has connectivity after the server setting is changed. If the response has the matchExactly attribute, then we could set Collect to periodically poll the server and automatically match the form list. The user could additionally manually trigger the update from Get Blank Form as convenient.

That's why this doesn't feel quite right... obtaining the form list has to be manually initiated anyway; something the campaign manager cannot presently control remotely. The retrieved list of forms (specifically those in 'open' state) already unambiguously states the list of forms that the server will accept, ie submittable. It would therefore seem that the default client behavior should be to only present to the user those forms which may be submitted [sic]. So in that sense adding matchExactly doesn't seem to provide any useful additional information in terms of conveying to the client what forms should be presented (ie those which are submittable). Which is to say, the actual client-initiated operation of calling GET formList in the first place already implies the client wants to know what forms are submittable and should be presented as such. So why [that is, what use case is served?], after calling GET formList, would a client present (existing) non-submittable forms as candidates to be filled out for submission?!

It feels like what we're actually trying to accomplish here is to instead perform remote client configuration, specifically turning on/off a "Show Unsubmittable Forms" client option (which one would expect default=No), and doing so somewhat indirectly by piggy-backing an optional flag onto the existing OpenRosa formList API. That just doesn't feel right... :neutral_face:

If matchExactly (in whatever form it may take: API vs config option) isn't somehow going to automatically fetch and load any missing forms, then I don't feel like it's adequately solving the problem of ensuring remote deployed devices maintain the forms the project manager requires, especially since it still requires a best-practice manual intervention to initiate getting the formList - and hence this flag - in the first place.

I am struggling to follow this back and forth between @LN and @xiphware but it sounds like the best way of achieving our goal here is to:

  1. Create a new Form List Sync setting, and
  2. Fast track a remote settings management feature and API spec

Do I have that right?

I like this direction. It feels like mustering a strong effort that will achieve two big wins in one go, versus mustering a "good-enough" effort that may add to our legacy support burden down the road.

Of course we will need to be deliberate with the settings push design, but I think we are up to it.

I know that touching the OpenRosa spec can be burdensome, but I think it's deeply in our interest to streamline that process. The long term sustainability of this project hinges on our ability to innovate in an agile way. I would be happy to help come up with process streamlining ideas for same. @LN, thoughts on this? Would this be helpful?

To perhaps state more succinctly...

matchExactly, as stated, only advises the client to hide (ie not present) any forms/versions not in the list. It does so by adding a new flag to the existing formList API. It does not perform any synchronization of what forms are actually loaded onto the device.

From an API purist standpoint, IMO the current OpenRosa formList API is already well specified and unambiguous in terms of its intended purpose: listing the forms you can download and for which submissions will be accepted. In particular, advising the client what to do with other forms not in this list is out of scope for this specific API endpoint. Instead, whether to show or hide '3rd-party' forms is a client-side configuration option - of which there are potentially many - and controlling such things remotely probably requires introducing a new API.

The counter-argument is this matchExactly provides a quick (and dirty?) low impact mechanism to tell ODK Collect to hide these other forms from the user (overriding Collect's current default behavior), which doesn't require developing any new APIs.

Thanks for the patience as I work through being sick and having my kiddo out of daycare. :nauseated_face:

Starting to explore a more general feature for servers to communicate settings to clients sounds like a good next step.

I strongly agree that we will need to keep evolving server-client specifications and that if we can find ways to be more agile about it that will have a lot of user benefits.

I’m going to need to step away from this for a bit as I prioritize "Collect will need to stop using /sdcard/odk for files". We need to start making changes towards scoped storage to stay in the Play Store.

FYI I'm happy to remain involved in discussions around client-server configuration, and perhaps share how we went about it. It may also overlap with ideas about how we might accomplish re-branding of a common ODK Collect client for different 'consumers', eg KoboCollect --> ODK Collect... [we - aka GoMobile - use the same API to communicate both client-side default config settings as well as council branding info: eg council logo, name, color palette, operating hours, etc].

Sounds great! Do you want to share documentation for your existing API and settings keys?

I'll hopefully get a chance to do a more complete writeup soon but here is a taste of the challenges:

  • client settings have not been standardized so Enketo and Collect (and likely iXForms) have some settings that do the same things but with different names.
  • worse, it's likely that clients have slightly different takes on similar settings concepts so we have to decide what to do with those (keep both, try to design a middle ground, etc)
  • Collect has an admin/general settings split which I don't think other implementing clients have
  • users can specify a server in Collect while offline and this is relatively common. So there's some design work to do on when settings get pulled. I assume you do something like pull them on successful login?

None of these are insurmountable and certainly knowing that there's energy behind getting this done is really positive.

For those who want to follow along, we have a user-facing feature description and a new proposal for adding a client-configuration endpoint.