OpenRosa spec proposal: add matchExactly attribute to form list response

LN · November 4, 2019, 4:42am

In support of the user-facing feature described at "Have Collect exactly match the forms on Central". The high-level goal is to give servers a way to indicate that the form list they provide should be matched exactly by the client. All forms in the list should be downloaded and any other forms the client knows about should become inaccessible to the data collector.

This is a proposal to amend the form list response document specification with the following:

“if a matchExactly attribute is provided on the root <xforms> form list node, with a value of true, the client MAY treat the provided list of forms as the exact set of forms and versions that should be available on the client.”

The specification has, as far as I know, never been amended. Because this is purely additive, client-optional and doesn’t affect clients like Enketo that don’t have a landing page for locally-available forms, I think it would be ok to add while maintaining the OpenRosa API version at 1.0. I’m interested to hear what others think.
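To make the proposal concrete, here is a minimal sketch of what a server might emit. It assumes the attribute keeps the name matchExactly and is simply omitted when not set; the helper name `build_form_list` and the sample form values are hypothetical, and only a subset of the per-form elements is shown.

```python
import xml.etree.ElementTree as ET

# Namespace used by the OpenRosa form list response document.
NS = "http://openrosa.org/xforms/xformsList"


def build_form_list(forms, match_exactly=False):
    """Serialize a form list; `forms` is a list of dicts of element text."""
    # Register the namespace as the default one so output has no prefixes.
    # Note: this registration is global to ElementTree.
    ET.register_namespace("", NS)
    root = ET.Element(f"{{{NS}}}xforms")
    if match_exactly:
        # The proposed attribute: purely additive, absent unless requested.
        root.set("matchExactly", "true")
    for form in forms:
        node = ET.SubElement(root, f"{{{NS}}}xform")
        for key in ("formID", "name", "version", "hash", "downloadUrl"):
            ET.SubElement(node, f"{{{NS}}}{key}").text = form[key]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = [{
        "formID": "household_survey",
        "name": "Household survey",
        "version": "3",
        "hash": "md5:c28fc778a9291672badee04ac770a931",
        "downloadUrl": "https://example.org/forms/household_survey.xml",
    }]
    print(build_form_list(sample, match_exactly=True))
```

A client that does not know about the attribute would parse this document exactly as it does today, which is the sense in which the change is additive.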

Xiphware · November 4, 2019, 7:35am

This doesn't feel quite right... The form list returned by Central is, in my view, fully specified: it is the response to a well-defined REST query and represents the definitive list of available forms, and those for which submissions will be accepted (and by implication, anything else will almost certainly be rejected). What the client should do if it just so happens to have obtained other forms (from elsewhere?) - or otherwise unknown to the Central server - is completely a client-side configuration option IMO; not something the Central server that you happen to be talking to should dictate (or even care about...)

The need for a matchExactly implies that the original REST operation used to obtain the list of forms is inadequate in some way.

[REST or OpenRosa for that matter, depending on which API you use to get the list. Either way, it still represents the definitive list in response to the original query].

seadowg · November 4, 2019, 7:43am

Yeah I think I agree with @Xiphware here. In my head it's also simpler for the client to be able to make the decision about whether it's going to pull down all the forms/delete "old" ones up front rather than parsing the response and then making that decision.

Are there reasons for this not to be a client side configuration in Collect?

LN · November 4, 2019, 5:21pm

Thanks for the thoughtful feedback, @Xiphware and @seadowg! I can't say I disagree with y'all.

The primary motivation here is giving project managers more control over what enumerators experience without needing to have physical access to the devices. One piece of feedback we routinely get is that forms that need to be filled out often change within a training period or across campaigns and that having clients be out of sync with what the project manager has set up is confusing at best and leads to incorrect data at worst (more in the user-facing feature description). Paging @Ukang_a_Dickson @Tino_Kreutzer @tomsmyth to see if users of alternate servers provide feedback on this pain point as well.

Again, the idea behind letting the server dictate this is that it gives the project manager more control from the server interface that they're already familiar with.

If a project manager needs to manually change client settings, project setup becomes a lot more time-consuming and error-prone. The eventual goal is "a single button push that will sync forms, submissions, and settings" (from What's coming in Central) but there are a few challenges with getting there:

there is no specification that unifies client settings. That is, Collect and Enketo have their own configurable options but there has been no effort to match their naming or behavior
different data collection campaigns may require different settings. Doing this right may require some kind of project sandbox concept on the client side. That is, something like clients being able to switch between different groupings of forms and settings

The idea with something like this matchExactly proposal is that it's a step to giving project managers more control that is straightforward for any client and server to implement.

That said, Collect can be configured via a QR code and in fact Central currently must be configured that way because of the way auth works. Another option could potentially be to introduce a client-side setting like @Xiphware and @seadowg have described and to include the setting in the configuration QR code.

If you have other ideas for satisfying the need of project managers to set exactly what forms are available on clients that have offline form lists, please share.

Xiphware · November 4, 2019, 8:56pm

I think this is a very legitimate usecase. Indeed, I do precisely this with our solution (GoMobile); basically the mobile client app first loads a bunch of (council specific) server-side settings when you login: eg logo, branding colors, auto-generate PDF on submission, etc. However, this is a separate - and largely client-specific - API, which is entirely distinct from the generic and standardized XForm REST/OpenRosa APIs used to retrieve form lists, download forms, etc.

There is probably a very good usecase for a mechanism to communicate client-side settings, but I think these are quite distinct from and ideally should not 'contaminate' what should remain clean, well-defined REST/OpenRosa APIs for the more fundamental XForm operations.

eg this is some of the config we fetch (as JSON blob):

backgroundcolour = EEEEEE;
controlcolour = 346BC3;
fontcolour = E6E6E6;
gradientendcolour = 03562C;
gradientmidcolour = 03562C;
gradientstartcolour = 03562C;
titlecolour = 185563;
toolbarbackgroundcolour = 03562C;
autodeleteaftersubmit = false;
loadprevious = true;
reinspect = reinspect;
status = status;
showtimetable = true;
start = "7:00";
end = "19:00";
"council_name" = "GoGet City Council";
isTest = 0;
"logo_url" = "https://dl.dropboxusercontent.com/u/30978741/logo2.gif";
"min_supported_version_android" = "0.3.0.0";
"min_supported_version_ios" = "0.9.0.0";
"min_supported_version_winphone" = "0.0.0.0";

[as you can see, very client-specific...]

If in reality this is the goal here - to perform the equivalent of config'ing Collect with a QR code when they connect to a Central server - then perhaps that should be exactly what we do: a (new) simple API on Central to retrieve the QR code config data? eg it could be returned as an additional field along with the authorization token? or it could change in response to the client identifying itself in the HTTP header (a la User-Agent used for browser detection)? ... (obviously excluding the server url/username/password that an actual QR code can include)

issa · November 6, 2019, 12:17am

i think my defense of the proposal as it stands is that the flag is not a command so much as it is a semantic declaration of the meaning of the provided data. in this sense, perhaps matchExactly is not the correct name as the term should be something declarative.

but i think the goal behind the proposal as-offered is:

allow the server to express the semantic intention that the provided list of forms is to be interpreted as the canonical full set of forms the client is meant to use, rather than as a catalogue of forms the user can pick and choose from. whether the client chooses to honor this intent can be implementation- and situation-specific.
do so in a way that minimally impacts the existing ecosystem.
do so without requiring us to answer much more difficult questions like "what would a client configuration specification look like?"
as @ln suggests, allow greater and finer control over rote administrative tasks from the server, and in a manner that offers much greater peace of mind that the administration is airtight.

issa · November 6, 2019, 12:36am

i have replied to the feature thread accompanying this specification proposal with some context that might lend insight to the suggested direction.

tomsmyth · November 6, 2019, 3:38pm

This makes sense to me. I think it's quite imaginable that an enumerator could end up with a bunch of old forms on their device if they've been at it for a while. I don't see this as a deficiency of the API or anything. It's just the way things have worked up until now. You periodically fetch new forms and they stick around on your device even if they go away on the server. Adding the ability to semantically declare that this is not the desired behavior makes sense.

As with spacebar heating, I bet there are people out there who also rely on the old workflow... perhaps some folks fill out forms from two different servers on one device and toggle their server settings back and forth and don't want to have to re-download forms... So getting rid of the old behavior is probably not an option?

LN · November 6, 2019, 5:08pm

That's a good point. I haven't come up with something I love yet but some alternate attribute name ideas that might change the way @Xiphware and @seadowg are thinking about this are authoritative or complete.

I think this is big and points to a challenge we keep running into. While it makes total sense for @Xiphware or maintainers of other forks/unofficial tools to introduce a client-specific API and settings keys, it's the kind of thing that doesn't feel appropriate for Central or Collect to do. In particular, users do seem to mix Collect and Enketo in the same deployment. I understand what @Xiphware suggests with something like a User-Agent key but that pushes the burden of maintaining several disparate client configs to Central. It therefore feels like we should make an effort to unify possible settings as much as possible before moving forward on an API and that this API should include consultation with other server creators. This means we have no control over the timeline.

This proposal is compatible with an eventual new API for settings and provides significant user value on its own, including to users of other clients and servers because it's really straightforward to implement (or ignore).

And is it something that you imagine nemo having support for?

Agreed.

tomsmyth · November 6, 2019, 5:29pm

Yes, I'd say there is a good chance of that.

issa · November 6, 2019, 9:11pm

I understand what @Xiphware suggests with something like a User-Agent key but that pushes the burden of maintaining several disparate client configs to Central.

i think my issue here is not so much burden as it is unpredictability and quirk fragmentation. the whole client config area of things feels like the sort of thing that will just end up being really messy and difficult for users to reason about unless somehow they can be really nailed down and standardized, which itself seems like a very difficult proposition.

otherwise the only reasonable approach is to just allow a bunch of arbitrary k/vs and force the user to read the documentation and leverage correctly for their particular client. i also vote yuck on that.

on the other hand, on the protocol side rather than the content side, the original thought by @xiphware:

If in reality this is the goal here - to perform the equivalent of config'ing Collect with a QR code when they connect to a Central server - then perhaps that should be exactly what we do: a (new) simple API on Central to retrieve the QR code config data?

i think is a sensible option to offer; configure over web rather than by qr code if you'd like.

Xiphware · November 6, 2019, 11:29pm

I think this is what is making me most uncomfortable, and why I made the passing comment that "The need for a matchExactly implies that the original REST operation used to obtain the list of forms is inadequate in some way."... I don't actually believe there is any semantic ambiguity to OpenRosa's formList (or xFormList) API. It's well-defined as returning the definitive list of available forms, and (ignoring things like user roles, form status, etc) the definitive list of forms for which the server will accept submissions. Upon calling this API, a client knows a priori exactly what it is getting; that is, this list is already both semantically authoritative and complete.

I do feel the real (and useful!) usecase of introducing something like matchExactly isn't so much semantic rigor, but rather to convey a client-side directive to purge everything from the client UI not in this list. This feels like a client-specific, and quite possibly session-specific, configuration setting and is not a semantic clarification necessary to correctly interpret the response data.

Controlling how a client should present form data is a very good usecase, but I feel this should be kept distinct from the more fundamental, open APIs around what data the client should ~~present~~ consume. Inter-mixing the two indiscriminately could lead to an increasingly less 'open' API that becomes highly customized to a specific client-server implementation. I would be far more receptive to remotely configuring client behavior - eg only display submittable forms, delete after submitting, generate PDF, etc - via a separate API or similar mechanism, and keep the actual data APIs minimalist, generic, and distinctly open.

issa · November 7, 2019, 2:59am

i think i would argue the following things:

1. my goal is not semantic rigor, but rather semantic expression.
2. arguing that the present API is already authoritative or complete is to argue that the only semantically reasonable way to present the formList is as a menu of every possible thing the server would ever accept, and that this is well-suited to all implementations. from my conversations with users, it is not.
3. i think the approach of using a separate configuration specification is dicey: i have no guarantee as a project implementer that a client is actually using the correct configuration relative to the particular formList that it is accessing. having the two resources be completely orthogonal to each other invites air gaps for deployment failure and we are back to where we started.
4. i also think everything to do with a client configuration specification is more risky to the openness of the API because having attempted to do the homework on this i have found no remotely satisfactory way to describe such an API without mandating proprietary client behaviour across the ecosystem. i think if this is the approach you would like to push, i would like to see what you have in mind in concrete terms.
5. wrapping back around to #1, i explicitly specified "MAY" in the proposal because again, i think even in the case of Collect there may be uncommon cases where whatever this flag is called ought to be ignored. it is up to the client to determine the client's behaviour in response to the information. relatedly i don't think in my head i would describe the goal here as "controlling how a client should present form data"; i think it is "describing the meaning of the formList given."

edit: i am just beginning to understand that perhaps your intent here is to say that a client configuration API should explicitly not be part of any standard openrosa specification, but rather in our case for example a proprietary Collect/Central API that we work out internally. i guess i would have no problem with this because it would allow us to move forward, but i feel like it's counter to the spirit of the open API?

Xiphware · November 7, 2019, 3:41am

For general reference (and not arguing one way or the other), here's the relevant formlist spec:

Form List API

This standard specifies how clients discover a list of available blank forms on a server.

Discovery Request

The discovery request should be sent in compliance with the HTTP 1.1 protocol.

If a server will filter the set of forms based upon the user's identity, then the server should require that the user be authenticated through either the Authentication API or through an alternative authentication mechanism. The server can then make use of the user's authenticated identity through those mechanisms to filter the set of forms to be returned.

The device will make a discovery request to a configured URI with a single query parameter, the deviceID. The deviceID should be the same id as provided by the default population mechanism defined in the Metadata Scheme. The server may filter the set of forms returned using this information.

Together, the authentication and deviceID enable a server to tailor the set of forms to both the user and the device (and therefore the device's capabilities).

Query Parameters

Optional query parameters MAY also be supplied:

formID If specified, the server MUST return information for only this formID.

verbose If specified with the value true, the server MAY include a <descriptionText> or <descriptionUrl> element providing a longer description of an XForm.

listAllVersions If specified, provides a listing of all hosted versions of each form (including the <version> element) in the response document (see below).

...

I've highlighted what I think may be of interest to this discussion.

Is the concern that the formList returned is still excessive (even after filtering on user and/or device) and perhaps needs to be filtered on additional dimensions?
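For readers less familiar with the spec excerpt quoted above, here is a minimal sketch of how a client might assemble the discovery request URI from those query parameters. The function name `build_discovery_url`, the example host, and the choice to send `true` as the value of listAllVersions are assumptions for illustration; the spec text above only says "if specified".

```python
from urllib.parse import urlencode


def build_discovery_url(base_uri, device_id, form_id=None,
                        verbose=False, list_all_versions=False):
    """Build a form list discovery URI with the optional query parameters."""
    # deviceID is the single query parameter the spec always expects.
    params = {"deviceID": device_id}
    if form_id is not None:
        # Server MUST return information for only this formID.
        params["formID"] = form_id
    if verbose:
        params["verbose"] = "true"
    if list_all_versions:
        # Assumption: any value would do; "true" is sent for readability.
        params["listAllVersions"] = "true"
    return f"{base_uri}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_discovery_url("https://example.org/formList", "device-123",
                              form_id="census", verbose=True))
```

Note that none of these parameters lets the client or server say how the returned list is meant to be interpreted, which is the gap the proposal is trying to address.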

issa · November 7, 2019, 6:30am

no; the concern is described here.

central already provides the exact list of forms that the device is intended to carry. the issue presently is that because the only mechanism for describing "a list of forms" only allows the expression of "available blank forms to choose from" as you highlight, there is no way for it to declare anything other than "here are some forms maybe."

the goal here is to provide a way for a server to say, "here are the exact forms this server's administrator expects to exist on the device." whether the device honors that information or not is up to the client software.
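The difference between "a catalogue to choose from" and "the exact set expected on the device" can be sketched as a small planning function. This is only an illustration under stated assumptions: forms are identified by hypothetical `(formID, version)` tuples, the names `plan_sync` and `honor_flag` are invented here, and hiding (rather than deleting) unlisted forms is one possible client choice, not something the proposal mandates.

```python
def plan_sync(server_forms, local_forms, match_exactly, honor_flag=True):
    """Decide what a client might do with a freshly fetched form list.

    server_forms, local_forms: sets of (formID, version) tuples.
    match_exactly: value of the proposed attribute on the form list.
    honor_flag: the client MAY ignore the attribute, so this is its opt-in.
    """
    missing = server_forms - local_forms
    if match_exactly and honor_flag:
        # Exact mode: fetch everything listed, hide everything unlisted.
        return {
            "download": missing,
            "hide": local_forms - server_forms,
            "offer": set(),
        }
    # Catalogue mode (today's behavior): the user picks what to download
    # and nothing already on the device is touched.
    return {"download": set(), "hide": set(), "offer": missing}


if __name__ == "__main__":
    server = {("census", "2"), ("household_survey", "3")}
    local = {("household_survey", "3"), ("old_pilot", "1")}
    print(plan_sync(server, local, match_exactly=True))
    print(plan_sync(server, local, match_exactly=False))
```

The `honor_flag` parameter is where a client-side setting, if one exists, would plug in.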

LN · November 15, 2019, 10:24pm

@Xiphware, I'd love your thoughts on @issa's latest response.

I see what you're getting at. I think that with the current ecosystem members, a small, optional addition to an existing API is much more likely to get adopted than a whole separate API. I think we do want to design that API for broader configuration in the near term but it's likely that it will remain fairly niche. In that sense, the former feels more 'open'. So basically I agree with @issa's point 4 above and would be interested in a concrete example of how a settings API could be more open.

Xiphware · November 15, 2019, 11:45pm

I think adding an optional flag on the form list as a quasi-directive to tell the client - specifically ODK Collect - to flush anything not in the list is a relatively lightweight and unobtrusive means to solve an immediate (and significant!) pain-point for ODK tool-stack users.

The only readily obvious alternatives would be trying to come up with a whole new spec around remote client configuration (a lovely concept but... OMG, where to start!?!), or perhaps adding a brand new config option to the base ODK Collect build that, by default (ie unless explicitly disabled by the user), performs this flush automatically whenever you refresh a form list. But anything that auto-deletes stuff is extremely dangerous and has to be VERY carefully thought thru (whilst survey deployments continue to suffer from enumerators inadvertently filling in out-of-date/deprecated forms...)

So yup, this is OK. Pragmatism Rulez!

There's probably a subset of common (but optional) client-side workflows that we could come up with that could be candidates for remote config. eg delete submissions upon successful upload, disable validation, treat all XPath references precisely as specified... yadda, yadda. But maybe that's something the respective client-owners could flesh out (over a beer) and we go from there?

LN · November 16, 2019, 2:58am

So maybe the next step is to draw TSC attention here and see if anyone else has other comments? Then perhaps it can briefly be discussed at the next meeting.

That's sounding pretty great! Let's see where we go with this and then schedule the international beer session.

Xiphware · November 16, 2019, 7:24pm

Is there an existing github issue I can refer to? (Presumably ODK spec?). I can open a roadmap issue referencing it, and these associated forum discussions, and put on next TSC agenda to review. [I figure we might as well follow the process we’re supposed to be following... ]

Btw is matchExactly (or what-have-you) a Boolean attribute, ie can also be false?

LN · November 19, 2019, 7:49pm

No GitHub issue yet. It looks like at some point @yanokwa added https://github.com/opendatakit/roadmap/projects/1#card-28845254 which seems sufficient to me.

It might be good to have at least a note about pushing client configuration from servers in the proposed column.

I think the allowed values for the attribute would be true and false. I still have a slight preference for matchExactly but I do see an argument for authoritative or some other declarative name.
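A minimal sketch of how a client could read the attribute with those two allowed values. It assumes that a missing attribute and any value other than `true` are both treated as false, which seems the conservative choice but is not something settled in this thread; the function name `is_exact_list` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_exact_list(form_list_xml):
    """Return True only if the root node carries matchExactly="true"."""
    root = ET.fromstring(form_list_xml)
    # Unprefixed attributes are not namespaced, so a plain lookup works.
    raw = root.get("matchExactly")
    # Assumption: absent, "false", or unrecognized values all mean False.
    return raw is not None and raw.strip() == "true"


if __name__ == "__main__":
    ns = 'xmlns="http://openrosa.org/xforms/xformsList"'
    print(is_exact_list(f'<xforms {ns} matchExactly="true"/>'))
    print(is_exact_list(f'<xforms {ns} matchExactly="false"/>'))
    print(is_exact_list(f'<xforms {ns}/>'))
```

If the attribute is renamed to something like authoritative, only the lookup key changes.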