OpenRosa spec proposal: add optional /client-settings endpoint

LN · February 26, 2020, 9:09pm

This is a proposed optional extension to the core OpenRosa standards for OpenRosa-compliant servers to provide client settings. This is in response to ongoing user requests to configure a fleet of ODK clients remotely and an alternative to the rejected matchExactly proposal.

The only component of this extension is a /client-settings endpoint. A GET request to this endpoint with OpenRosa headers results in a response with OpenRosa response headers and a JSON body. ~~It does not use the OpenRosaResponse XML envelope.~~ (edit: not relevant)

The response body is JSON. There are at most two top-level objects with keys "general" and "admin". Each of these objects has a set of optional properties.

{
  "general": {
    "protocol": {"odk_default", "google_sheets", "other"},
    ...
  },

  "admin": {
    "admin_password": Boolean,
    ...
  }
}

The general and admin split is because Collect lets users set an admin password and disable some subset of functionality. Having two separate objects makes it possible to use the same key for the actual setting and whether or not it should be enabled. This also matches the structure of the existing Collect settings QR code. Clients could choose to support some subset of the optional settings keys.
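A minimal sketch, in Python, of how a client might read a body of this shape. The function name and the example payload are assumptions made for illustration; only the "general" and "admin" top-level keys and the "protocol" key come from the proposal.

```python
import json
from typing import Tuple


def parse_client_settings(body: str) -> Tuple[dict, dict]:
    """Split a response body into its optional "general" and "admin" objects."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    # Both objects are optional, so a missing one is treated as empty.
    general = data.get("general", {})
    admin = data.get("admin", {})
    if not isinstance(general, dict) or not isinstance(admin, dict):
        raise ValueError('"general" and "admin" must be JSON objects')
    return general, admin


# Hypothetical payload that only provides general settings.
example_body = '{"general": {"protocol": "odk_default"}}'
general, admin = parse_client_settings(example_body)
```

Treating a missing object as empty keeps the client code the same whether a server sends one object, both, or neither.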

I’ll create a thread to standardize settings keys and values since that feels like a separate discussion from how the data would be exchanged.

Similar to the formList endpoint, the standard would leave it up to servers how to generate this settings file. For example, it may be up to a user to log into the server and select from a series of named configurations or a project administrator may impose a configuration on all users collecting data for that project. This will depend on the design of the server. CC @Ukang_a_Dickson

The standard would leave it up to clients when (if ever) to actually fetch those settings. This is not a trivial question for Collect to answer since it is never guaranteed to be online.

My current thinking is that in Collect, when any change is made to server URL or credentials, a settings fetch attempt would be made. If that attempt failed because of no connectivity, Collect would try again the next time it got connectivity. It would also try to fetch settings every time a form list request was made. The Last-Modified and If-Modified-Since headers would be used to avoid unnecessary transfers.
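The fetch triggers described above could be sketched roughly as follows. This is not Collect code: the class, its method names, and the convention that the fetch callable returns False when there is no connectivity are all assumptions for illustration.

```python
class SettingsFetchScheduler:
    """Tracks when a settings fetch should be attempted or retried."""

    def __init__(self, fetch):
        # fetch: hypothetical callable returning True on success,
        # False when the attempt failed for lack of connectivity.
        self._fetch = fetch
        self.pending = False

    def _attempt(self):
        # Remember a failed attempt so it can be retried later.
        self.pending = not self._fetch()

    def on_server_or_credentials_changed(self):
        self._attempt()

    def on_connectivity_restored(self):
        # Only retry if an earlier attempt failed.
        if self.pending:
            self._attempt()

    def on_form_list_requested(self):
        self._attempt()


# Simulated connectivity for the example.
online = {"value": False}
calls = []


def fake_fetch():
    calls.append(online["value"])
    return online["value"]


scheduler = SettingsFetchScheduler(fake_fetch)
scheduler.on_server_or_credentials_changed()  # offline: fails, retry is queued
online["value"] = True
scheduler.on_connectivity_restored()  # retry succeeds
scheduler.on_connectivity_restored()  # nothing pending, no extra fetch
```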

I’m looking forward to seeing what others think, particularly @Xiphware and @tomsmyth who provided valuable feedback on the previous related proposal. Some guiding questions:

Should OpenRosa headers be used?
Are we ok with a JSON response ~~and ditching the OpenRosaResponse XML envelope~~ (edit: the envelope is only relevant for transactional APIs)?
Is client-settings a reasonable name?

tomsmyth · February 28, 2020, 3:59pm

THIS IS SO EXCITING! Thanks Helene!

I'm wondering if it's worthwhile doing OpenRosa headers when they don't add much, and I'm thinking our strategy should be to call this part of a new API standard that wraps OpenRosa rather than extending it. More on that in a separate post... But yeah, let's nix the OpenRosa stuff?

By extension of the above, yes I think it's fine to go with JSON.

I like client-settings if we are thinking of a "client settings" as a singular resource. The word "settings" is always confusing there because it is both a collection of individual settings and a singular resource (the collection).

Using dashes is a break from OpenRosa which used lowerCamel. But I'm fine with that. Also Central appears to use dashes, as does NEMO.

Thanks for matching the QR code structure. That's a big win.

Does this mean the spec should also require support for a HEAD request? Or is the idea that Collect would GET the whole thing but only apply the settings if they had changed? Kind of confused...

I think working out when Collect asks for settings is a separate discussion. But one thought did come to mind that could have impact here: folks may end up using both this feature and the QR code feature, right? The latter to initially config the device with just the server and credentials, and then relying on settings push from then on? This is intricate but I think it makes sense.

Another possibility is that an org may have a whitelabeled build of Collect that includes a certain default server and initial, generic "bootstrap" credentials. When Collect is first installed or opened it may call the server to get its proper settings including the correct username and password for an individual enumerator the server selects. This could actually be a very powerful way of setting up a large fleet of devices without having to scan barcodes over and over again. All you'd have to do is install and run the app. The server could e.g. decide to associate phone #123 with user X, phone #124 with user Y, etc., and provide some really nifty web-based UI for showing how all the devices have been allocated so they can be labeled and handed out. But this would require the incoming GET request to identify the device somehow. So can we include the new Collect: style device ID in the GET request params? Compliant servers can then decide to use it or not.

Another option would be to lean into REST and treat the device ID as the identifier for the resource, so like /client-settings/Collect:ABCDEFGHIJKLMNOP. This seems semantically more correct than putting it as a QS param, because it's not really a filter. Servers not using the device ID can just ignore it and return the same settings for each device.

Anyway lots to be discussed here but yeah, super exciting, and thanks a ton!

LN · February 28, 2020, 5:20pm

That sounds fine to me. Somewhat related, I thought about identifying or versioning this API in some way but I don't think it's necessary. I think that as long as we can agree on a fixed format for the returned JSON and that we say that all keys are optional, clients just have to be resilient against keys they may not be aware of.

Perhaps client-configuration is more clearly a singular resource?

No. A server adds the Last-Modified header when it serves a resource. The client then sends the date it received back in an If-Modified-Since header on its next request. If the server doesn't have a newer resource to provide, it responds with a 304 (Not Modified). More on MDN.
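To make the exchange concrete, here is a small self-contained simulation in Python. The toy server function, the client class, and the transport callable are all hypothetical; only the Last-Modified / If-Modified-Since / 304 behavior is standard HTTP.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def serve_settings(request_headers, body, last_modified):
    """Toy server: returns (status, headers, body), honoring If-Modified-Since."""
    since = request_headers.get("If-Modified-Since")
    if since is not None and last_modified <= parsedate_to_datetime(since):
        return 304, {}, None  # nothing newer, so no body is sent
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    return 200, headers, body


class SettingsClient:
    """Toy client that remembers the Last-Modified value it was given."""

    def __init__(self, transport):
        self._transport = transport  # callable(headers) -> (status, headers, body)
        self.last_modified = None
        self.settings_body = None

    def refresh(self):
        """Return True if new settings were received, False on a 304."""
        headers = {}
        if self.last_modified:
            headers["If-Modified-Since"] = self.last_modified
        status, response_headers, body = self._transport(headers)
        if status == 304:
            return False
        if status == 200:
            self.last_modified = response_headers.get("Last-Modified")
            self.settings_body = body
            return True
        raise RuntimeError(f"unexpected status {status}")


modified = datetime(2020, 3, 13, 17, 31, tzinfo=timezone.utc)
client = SettingsClient(lambda h: serve_settings(h, '{"general": {}}', modified))
first = client.refresh()   # full body transferred
second = client.refresh()  # server answers 304, nothing transferred
```

No HEAD request is involved: the same GET either returns the full body or a 304 with no body.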

Agreed. We just have to make sure that we don't design a beautiful API that doesn't make sense for any of the existing clients.

Yes, absolutely.

I think this is a great concept and agree that just as servers use the deviceID when receiving submissions they may want to use it here. Both auth and deviceID may be used by servers to make decisions about which configuration to send. Because there are both of these types of information possibly available, deviceID does feel more like a filter to me so I'd tend to put it as a query parameter. The other advantage to that approach is that it matches the OpenRosa endpoints.
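For comparison, the two URL shapes under discussion could be built like this. The base URL is a placeholder and the `deviceID` parameter name is an assumption borrowed from the way OpenRosa submission requests carry the device ID; neither is fixed by this proposal.

```python
from urllib.parse import quote, urlencode


def settings_url_query(base_url, device_id):
    """Device ID as a query parameter (treated as a filter)."""
    return f"{base_url}/client-settings?{urlencode({'deviceID': device_id})}"


def settings_url_path(base_url, device_id):
    """Device ID as a path segment (treated as the resource identifier)."""
    return f"{base_url}/client-settings/{quote(device_id, safe='')}"


query_style = settings_url_query("https://example.org", "Collect:ABCDEFGHIJKLMNOP")
path_style = settings_url_path("https://example.org", "Collect:ABCDEFGHIJKLMNOP")
```

Either way the colon in the device ID is percent-encoded, so servers would need to decode it before matching.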

tomsmyth · March 2, 2020, 7:11pm

I like!

That's fine, I don't feel strongly about it.

Thanks for the replies.

LN · March 13, 2020, 5:31pm

Here is a revised proposal taking into account @tomsmyth's excellent comments:

This is a proposed optional extension to the core OpenRosa standards for OpenRosa-compliant servers to provide client settings. The only component of this extension is a /client-configuration endpoint. A GET request to this endpoint results in a response with a JSON body.

The JSON body contains at most two top-level objects with keys "general" and "admin". Each of these objects has a set of optional properties.

{
  "general": {
    "protocol": {"odk_default", "google_sheets", "other"},
    ...
  },

  "admin": {
    "admin_pw": String,
    ...
  }
}

The general and admin split is because Collect lets users set an admin password and disable some subset of functionality. Having two separate objects makes it possible to use the same key for the actual setting and whether or not it should be enabled. This also matches the structure of the existing Collect settings QR code. Clients could choose to support some subset of the optional settings keys.

See also Proposal: publish a settings key/value standard.

Similar to the formList endpoint, the standard would leave it up to servers how to generate this settings file. For example, it may be up to a user to log into the server and select from a series of named configurations or a project administrator may impose a configuration on all users collecting data for that project. This will depend on the design of the server.

The standard would leave it up to clients when (if ever) to actually fetch those settings.

Remaining questions:

Does this make sense for Enketo (@Martijnr) and iXForms (@Xiphware)? For Enketo, I imagine it would be requested when form resources are fetched from the server. I believe iXForms has a required login so that would be a natural time to make the call.
Is the general/admin structure ok even if it sounds like Collect may be the only client that uses admin?
Is it ok to say all keys are optionally supported by clients? This puts the burden on users to verify which settings are supported by which clients. It could be confusing if a user sets some keys, uses both Collect and Enketo, and only some subset of keys is honored by each. I'm thinking that we should add that clients must show the user a list of unsupported keys in the configuration.
Should this spec say anything about whether clients should continue to allow manual changes to settings after having pulled a remote configuration or about periodic polling? Or is this up to each client to define and document?
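As a rough illustration of the "list of unsupported keys" idea, a client could compare the received configuration against the keys it knows about. The function, the `section.key` naming, and the `some_future_key` entry are hypothetical.

```python
def unsupported_keys(configuration, supported):
    """Return 'section.key' names present in the configuration but unknown to this client."""
    missing = []
    for section in ("general", "admin"):
        known = supported.get(section, set())
        for key in configuration.get(section, {}):
            if key not in known:
                missing.append(f"{section}.{key}")
    return sorted(missing)


# Hypothetical: this client only understands the "protocol" general setting.
supported_by_client = {"general": {"protocol"}, "admin": set()}
received = {"general": {"protocol": "odk_default", "some_future_key": True}}
to_report = unsupported_keys(received, supported_by_client)
```

The resulting list is what a client would surface to the user so that ignored settings do not go unnoticed.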

Xiphware · March 13, 2020, 11:10pm

At the risk of starting a REST vs GraphQL holy-war (and I won't say which side I'm on)...

There already appears to be a desire to allow the client to retrieve subsets of settings, and I could envision that with a potential diversity of clients, there could be a large diversity of client-specific settings. This sorta situation is where a GraphQL approach - whereby the client can convey via the GraphQL API precisely which settings it desires - might be preferable to a REST API where you pretty much get everything dumped into the REST JSON payload whether you want it or not. And GraphQL tends to be a bit more amenable to incremental revisions (which may make adding misc settings over time smoother), as opposed to REST's more typical abrupt version changes (which require either changing HTTP headers to indicate version, and therefore implementing these headers server-side, or changing the endpoint path to include a version).

Anyway, I'm not suggesting we must necessarily dump REST in favor of GraphQL for settings, rather just raising something to perhaps think about API-wise before we commit anything to stone.

[in truth, I'm more a REST aficionado than GraphQL groupie]

martijnr · March 17, 2020, 3:13pm

Thanks @LN! I wanted to give some initial feedback, but I am still trying to form opinions.

Yes, and when Enketo checks for a form update. Makes sense.

I don't completely understand the need to maintain this separation in this new spec. If an admin setting disables something but this conflicts with a general setting that enables it, it would still be the responsibility of the client to know that the general setting should be ignored? Couldn't we let the server have the responsibility to never serve conflicting settings? Or is this separation really related to still maintaining local device overrides by users (that could partially be disabled with an admin setting)?

Maybe we should make a distinction between required and optional settings in order for a client to claim compatibility. UI stuff optional and anything dealing with logic required perhaps? I imagine it will be difficult to determine this for some settings.

I think it may be good to describe that.

LN · March 17, 2020, 6:38pm

I think that the more we can avoid this the better but I also recognize that it's going to be tough given the different mental models that the different clients have. Ideally we can at least document official keys and have a process for proposing and approving additions. Naturally, nothing would stop "rogue" clients and servers from going outside of that list.

I see what you're getting at but I'm not really sure that it would be a good thing for the clients to get only the subset that they care about. There's benefit to having a smaller response payload but the cost is having a request payload. That payload will need to include all of the settings that the client supports even though the server might only need to specify a couple of settings.

What I'm more concerned about is letting somebody know that settings that were specified were ignored by the client. That could even be between versions of the same client. Let's say ClientX v1.2 has awesomeSetting but ClientX v1.1 does not. I think we want someone to be able to tell that awesomeSetting was not applied. Having the client only request the settings it cares about doesn't help with that.

There's a lot going on here. I want to be clear that I'm still forming mine as well and everything is up for discussion.

Yes, exactly. A general setting provides a setting value. An admin setting determines the visibility of its setting/feature. For example, you can set auto-send to true in general settings and then set the corresponding admin setting to false. That means submissions are autosent and the setting is not visible to a user without the admin password. On the other hand, if the admin auto-send setting is true, any user can override that setting.
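The auto-send example can be expressed as a small lookup. This is a sketch: the `autosend` key name is made up, and treating a missing admin entry as "visible" is an assumption rather than something stated in the thread.

```python
def effective_setting(general, admin, key):
    """Return (value, user_can_change) for one settings key."""
    value = general.get(key)
    # Assumption: with no admin entry, the setting stays visible to any user.
    user_can_change = admin.get(key, True)
    return value, user_can_change


# Submissions are auto-sent and the setting is hidden without the admin password.
locked = effective_setting({"autosend": True}, {"autosend": False}, "autosend")

# Same value, but any user may override it.
open_to_users = effective_setting({"autosend": True}, {"autosend": True}, "autosend")
```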

I think it definitely makes sense to include admin settings in QR code configuration since that generally happens once. You could imagine giving e.g. supervisors the admin password and letting them modify some settings in the field if needed.

User override is much harder to reason about and may not make sense in the case of remote settings management. As you say in your last point, this spec should probably make a statement about whether they're allowed.

I'm guessing that for Enketo it makes perfect sense to just not accept manual settings changes for the current form if remote settings are received. It's a little trickier for Collect because the server/username/password must be reconfigurable somehow or the app ends up forever 'locked' to a specific server unless that server provides the URI and credentials settings for a new server. At the same time, we don't want just any enumerator to be able to reset server and credential settings. Here are two ideas:

Don't include server and credentials in these remote-configurable settings. I don't think they would be useful for Enketo anyway. For Collect, they need to be provided outside of a settings API initially anyway to bootstrap. Only include a single admin setting: admin password. It only governs whether server settings can be changed. If there's no admin password, server settings are always available to change (basically logout/login). If there's an admin password, the password is required to change server settings. No other setting is available to change unless the server is changed to one that doesn't provide a remote configuration.
Include server info and credentials. Same thing with the admin password being the only admin setting and governing access to the server and credentials settings. The advantage here is that it makes changing servers or projects much easier. Without those settings in the configuration, switching to new server information e.g. at the start of a new project would require physical access to devices.
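In both ideas the admin password is the only gate on server settings. A minimal sketch of that rule, with a hypothetical function name and plain-text passwords purely for illustration:

```python
import hmac


def may_change_server_settings(admin_pw, entered_pw=None):
    """Decide whether server URL and credentials may be edited on the device."""
    if not admin_pw:
        # No admin password configured: server settings are always editable.
        return True
    if entered_pw is None:
        # A password is configured but none was entered.
        return False
    # Constant-time comparison of the entered password with the configured one.
    return hmac.compare_digest(admin_pw.encode("utf-8"), entered_pw.encode("utf-8"))
```

For example, `may_change_server_settings(None)` allows the change, while `may_change_server_settings("s3cret", "wrong")` refuses it.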

I agree with the sentiment. As you say, it's hard to reason about. For example, whether or not drafts may be edited can make a big difference in the meaning of the data collected. I'll try to make a shortlist of keys that seem to change the meaning of data collected in that way when I revisit Proposal: publish a settings key/value standard.

Xiphware · March 18, 2020, 3:17am

I think we may want to tread carefully here, to not intermix server-side config settings with client-side authentication... Admin/root privileges on any client, however they are acquired, typically let said privileged user pretty much do whatever they want (!). Whereas I can see, in the general case, some settings may support the (regular) user overriding them (eg show/hide hints, ie read/write), whereas others may be strictly read-only (eg branding), and still others potentially either, depending on a particular deployment (eg delete-after-submit, which you may or may not want the user to be able to change).

So I'm wondering if 'general' vs 'admin' might fundamentally be read-only vs read/write?

I do think that it could be useful for generic settings to have either an implicit definition (ie one in the Settings API specification, against which a client implementation can claim 'conformance' depending on whether it obeys accordingly) of which settings are user-modifiable and which are not, for regular non-privileged users, or alternatively to have this conveyed explicitly as additional per-setting metadata. (I don't know which approach would be best.) Thoughts?
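The explicit per-setting metadata option might look something like the following sketch. The `Setting` type, the `user_modifiable` flag, and the `show_hints` and `branding` key names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setting:
    value: object
    user_modifiable: bool  # hypothetical per-setting metadata flag


def apply_user_change(settings, key, new_value):
    """Return a new settings dict with the change applied, or raise if read-only."""
    current = settings[key]
    if not current.user_modifiable:
        raise PermissionError(f"{key} is read-only for regular users")
    updated = dict(settings)
    updated[key] = Setting(new_value, current.user_modifiable)
    return updated


settings = {
    "show_hints": Setting(True, user_modifiable=True),           # read/write
    "branding": Setting("ExampleOrg", user_modifiable=False),    # read-only
}
changed = apply_user_change(settings, "show_hints", False)
```

Attempting `apply_user_change(settings, "branding", "Other")` raises `PermissionError`, which is where a client would decline the edit.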

I'm not sure I fully understand the usecase here. Features, and their associated settings, evolve over time, and it is always going to be the case that older versions of something won't support some features that newer versions do; indeed, the older version won't even have any comprehension of them. This seems like it would be MS Word 10.0 telling a user that it doesn't support X, Y, Z features of MS Word 16.0, which seems rather odd... Can you think of a case where an older version of a client explicitly reports things it doesn't support which a later version does?

LN · April 23, 2020, 7:05pm

How the world has changed since we started these discussions! I believe everyone in this thread has been involved in the COVID-19 response or the getodk migration. Thanks for your efforts on both.

Here are some things I've done related to this:

started designing a profile/multitenancy concept for Collect with @seadowg. We think that a server managing settings for a specific profile makes a lot of sense. The profile concept solves a lot of user issues so we want to make sure that what we come up with here works well with it.
iterated on the Collect-only setting that would achieve the desired formList matching behavior. That was the trigger for this conversation and we don't want to lose sight of it. The sooner we can get this feature to users in a manually-configurable way, the better it will be once servers can mandate it. Preliminary update at Have Collect exactly match the forms on Central - #8 by LN with more next week.
learned more about and used Enketo settings. Like I said before, I want to reconcile the Collect and Enketo settings as much as possible and I hadn't previously had much experience with what is configurable on Enketo. So many of the Collect settings have to do with multiple form management so really aren't relevant to Enketo which deals with single forms. I'm still working through how best to segment these.
reviewed Collect settings and started working towards deprecating some. Related to the previous point, I think it's important to keep the settings list to what is truly useful. I've put analytics on some settings that I figured probably weren't often used and we have some good candidates for deprecation.

Xiphware · April 23, 2020, 8:41pm

Sounds good, I can't wait to see more... Do you think it would be useful to discuss anything on next week's TSC call? Or wait a bit for it to 'firm up'?...

I think we still have a clear agenda [hint, hint, @yanokwa]. And since I'll be too busy taking notes to keep interjecting, it might even not get derailed this time, haha!

LN · April 23, 2020, 8:47pm

Yes, I do think so! We're aiming to have something concrete around the Collect setting out early next week. I nudged @yanokwa accordingly and I think that will show up on the agenda shortly. Hopefully @seadowg will be able to attend as well.

seadowg · April 24, 2020, 10:44am

@LN I'll be able to make it!