Form spec proposal: support Entity updates from multiple offline clients

Starting with Central v2023.5, it has been possible to update Entities from form submissions. As part of that work, we introduced ODK’s model for handling multiple Entity updates based on the same version: all updates are applied in the order that they are received and Central detects and displays conflicts where they can be resolved or dismissed. You can read about this approach in the original proposal and in the documentation.

Conflict detection is based on an incrementing integer “base version” that each Entity update includes. This represents the Entity version that the update is intended to be applied to. In our original proposal, we wrote:

Sequential integer versions are easy to work with but are not guaranteed to be unique across clients and don’t uniquely identify the contents of updates. That means they do not work well in contexts where multiple offline clients are likely to make several updates AND submit those updates at the same time. For example, clients A and B get EntityA version 1. They both go offline and generate versions 2, 3, 4. If they then submit at the same time, the server may interleave versions from the two clients leading to a difficult history to understand. We believe that this kind of scenario is rare.
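
The interleaving described in that quote can be illustrated with a small simulation. This is only a sketch: the `Update` class and `apply_in_arrival_order` function are hypothetical names, and the conflict check (base version differs from the current server version) is a simplification of what a server might do.

```python
from dataclasses import dataclass


@dataclass
class Update:
    client: str
    base_version: int  # the Entity version this update was meant to apply to


def apply_in_arrival_order(server_version, updates):
    """Apply every update in arrival order and flag base-version mismatches."""
    history = []
    for update in updates:
        conflict = update.base_version != server_version
        server_version += 1  # every applied update creates a new server version
        history.append((update.client, update.base_version, server_version, conflict))
    return history


# Clients A and B both start from version 1 and each make three offline updates.
a = [Update("A", v) for v in (1, 2, 3)]
b = [Update("B", v) for v in (1, 2, 3)]

# One possible interleaving when both clients submit at the same time.
arrival = [a[0], b[0], a[1], b[1], a[2], b[2]]
history = apply_in_arrival_order(1, arrival)

for client, base, new_version, conflict in history:
    print(f"client {client}: base {base} -> server version {new_version}, conflict={conflict}")
```

With base versions alone, the server sees two updates claiming each of base versions 1, 2 and 3 and has no way to tell which chain each one belongs to.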

As we are getting more deeply into the offline entities implementation, we have decided to better handle this case:

  • We think it will be somewhat common for a server-based edit to happen while clients are offline, which can also lead to a confusing history if not detected.
  • We initially thought we would prioritize assigning Entities to specific users and strongly assume that any given Entity would only be interacted with by one Collect user, but we are now hearing of more use cases that involve different individuals interacting with the same Entity.
  • We would rather have all of the information needed to reconstruct the full history, even if we don’t display it in Central right away.

This is in line with some thinking that we shared last October.

⚡ Next steps:

  • Post a companion proposal for extensions to the OpenRosa spec
  • Schedule a call for anyone interested to discuss these proposals
  • Continue exploratory implementations to further validate the proposal

The concept: keep track of offline branches

We propose adding the following concepts:

  • Branch ID: a unique identifier (UUID) grouping together submissions that update the same Entity while offline. When a client receives a server update that applies to a specific entity, it generates a new branch ID which gets used in all update submissions until the next server update. This then makes it possible for the server to differentiate between sequences of updates from different clients or made at different times from the same client and to guarantee that submissions within a branch are well-ordered.
  • Trunk version: the last Entity version that the client got from the server. This stays the same for every update made offline whereas base version is incremented by 1 with each update. For the first update in a branch, the trunk version and base version are equal. This information will make it possible for Central to detect if a whole branch was based on an older Entity version.

The branch ID would be included in submissions that result in Entity creation or update and the trunk version would be included for Entity updates only.
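
As a rough sketch of how a client might track these two values, assuming a hypothetical `LocalEntity` record and helper functions (none of these names come from Collect or the spec):

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalEntity:
    version: int                  # exposed to forms as __version
    trunk_version: Optional[int]  # exposed to forms as __trunkVersion
    branch_id: Optional[str]      # exposed to forms as __branchId


def on_server_update(server_version: int) -> LocalEntity:
    """A server update starts a new branch rooted at the server's version."""
    return LocalEntity(
        version=server_version,
        trunk_version=server_version,
        branch_id=str(uuid.uuid4()),
    )


def make_offline_update(entity: LocalEntity) -> dict:
    """Build the values an update submission would carry, then advance local state."""
    submission = {
        "baseVersion": entity.version,
        "trunkVersion": entity.trunk_version,
        "branchId": entity.branch_id,
    }
    entity.version += 1  # base version grows by 1 with each offline update
    return submission


tree = on_server_update(server_version=4)
updates = [make_offline_update(tree) for _ in range(3)]
for u in updates:
    print(u)

# A later server update for the same Entity starts a fresh branch.
refreshed = on_server_update(server_version=9)
```

The three submissions share one branch ID and a trunk version of 4, while their base versions are 4, 5 and 6. The first one has equal trunk and base versions, as described above.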

We propose using a trunk/branch metaphor that will be familiar to users of version control systems. These concepts will not be exposed to end users who will continue to see a linear update history (at least initially).

XForms spec

We propose making the following additions to the XForms Entities update spec:

  • The entity element MAY have a branchId attribute, which is populated with a UUID generated by clients with an offline Entity representation.
    • Clients that do have an offline Entity representation MUST keep the same branchId for a given Entity for an entire offline update sequence. They MUST generate a new branchId for a new update sequence after receiving a server update.
    • When a branchId attribute is present in a submission, servers SHOULD attempt to process submissions with that same attribute in the correct order.
    • When a branchId attribute is present, a trunkVersion representing the latest version received from the server MUST also be present.
    • When a server receives values for branchId and trunkVersion, it MUST send those values back to clients as __branchId and __trunkVersion when clients get updates for that Entity.
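
A server or validator could check the attribute rules above along these lines. This is a minimal sketch using Python's standard library. The function name and the decision to treat an empty branchId the same as a missing one are assumptions, not part of the proposal.

```python
import uuid
import xml.etree.ElementTree as ET


def check_entity_attributes(entity_xml: str) -> list:
    """Return a list of problems found on a submitted <entity> element."""
    entity = ET.fromstring(entity_xml)
    problems = []
    branch_id = entity.get("branchId")
    if not branch_id:
        return problems  # branchId is optional, so nothing else to check
    try:
        uuid.UUID(branch_id)
    except ValueError:
        problems.append("branchId is not a UUID")
    if entity.get("trunkVersion") is None:
        problems.append("trunkVersion is required when branchId is present")
    return problems


good = ('<entity dataset="trees" id="t1" update="true" baseVersion="2" '
        'trunkVersion="1" branchId="123e4567-e89b-12d3-a456-426614174000"/>')
missing_trunk = ('<entity dataset="trees" id="t1" update="true" baseVersion="2" '
                 'branchId="123e4567-e89b-12d3-a456-426614174000"/>')
no_branch = '<entity dataset="trees" id="t1" update="true" baseVersion="2"/>'

print(check_entity_attributes(good))
print(check_entity_attributes(missing_trunk))
print(check_entity_attributes(no_branch))
```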

Adding these new concepts will result in an ODK XForms specification version update as described here. This means older versions of Collect and Central without offline Entities awareness will not accept forms with this new functionality. Clients like Enketo that are not Entity-aware will continue to work as they do now (without an offline representation of Entities).

These changes will be introduced as Entities spec version 2024.1.0. Clients should only apply creates and updates offline for forms with that spec version or higher.
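
The "that spec version or higher" check could be as simple as comparing the dotted version numerically. The helper names below are hypothetical and the sketch assumes versions always consist of dot-separated integers.

```python
def parse_spec_version(version: str) -> tuple:
    """Turn a dotted version such as '2024.1.0' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


OFFLINE_MIN_VERSION = parse_spec_version("2024.1.0")


def applies_offline(entities_version: str) -> bool:
    """True if a form with this entities-version should be applied offline."""
    return parse_spec_version(entities_version) >= OFFLINE_MIN_VERSION


for candidate in ("2023.1.0", "2024.1.0", "2025.0.0"):
    print(candidate, applies_offline(candidate))
```

Comparing integer tuples rather than strings avoids ordering mistakes such as "2024.10.0" sorting before "2024.2.0".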

XForms example

<model odk:xforms-version="1.0.0" entities:entities-version="2024.1.0">
  <instance>
    <data id="mysurvey" orx:version="1">
      <meta>
        <instanceID/>
        <entity dataset="trees" id="" update="" baseVersion="" trunkVersion="" branchId="" />
      </meta>
    </data>
  </instance>
  ...
  <bind nodeset="/data/meta/entity/@id" type="string" calculate="/data/tree" />
  <bind nodeset="/data/meta/entity/@update" type="string" calculate="true()" />
  <bind nodeset="/data/meta/entity/@baseVersion" type="string" calculate="instance('trees')/root/item[name=/data/tree]/__version" />
  <bind nodeset="/data/meta/entity/@trunkVersion" type="string" calculate="instance('trees')/root/item[name=/data/tree]/__trunkVersion" />
  <bind nodeset="/data/meta/entity/@branchId" type="string" calculate="instance('trees')/root/item[name=/data/tree]/__branchId" />
</model>

The XForm for an Entity create would not include any of branchId, baseVersion or trunkVersion.

No longer planned -- XLSForm: Explicitly opt out of offline Entities

Given the XForms spec proposal above, it's possible to opt out of offline Entity support by omitting the branchId attribute. We could expose this capability in XLSForm. This can be helpful if:

  • We’re not confident all users will be able to update to a version of a client with offline Entity support
  • We know we’re going to be online most of the time and don’t need clients to store state offline
  • We’re using a mix of clients, some of which may not have offline Entities, and we want them all to behave the same way

We're not sure how significant these considerations are so are currently leaning towards NOT exposing this. One of our goals in sharing this proposal is to get feedback on whether this seems important to do.

If it doesn't serve immediate user need, we would rather omit it because each configuration point adds complexity and opportunity for bugs to be introduced.

If we do end up wanting to make this configurable, here is our proposal:

We propose adding this as an XLSForm setting on the entities sheet. This means that when we support creating or updating multiple Entities per form, whether creates and updates are applied offline would be controlled independently for each affected Entity List (and each form since this is a form-based setting).

We propose introducing an offline column to the entities sheet with allowed values ‘yes’ and ‘no.’ When the value is ‘no,’ the trunkVersion and branchId attributes will be omitted in the converted XForm. For now, the entities-version attribute will be set to 2023.1.0 for compatibility with older clients.

Entities sheet

list_name entity_id offline
trees ${tree} no

XForm

<model odk:xforms-version="1.0.0" entities:entities-version="2023.1.0">
  <instance>
    <data id="mysurvey" orx:version="1">
      <meta>
        <instanceID/>
        <entity dataset="trees" id="" update="" baseVersion="" />
      </meta>
    </data>
  </instance>
  ...
  <bind nodeset="/data/meta/entity/@id" type="string" calculate="/data/tree" />
  <bind nodeset="/data/meta/entity/@update" type="string" calculate="true()" />
  <bind nodeset="/data/meta/entity/@baseVersion" type="string" calculate="instance('trees')/root/item[name=/data/tree]/__version" />
</model>

Notes from 7/19 specification call:

  • Collect auto-send will do its best to send submissions in order but server processing could happen out of order or someone could use manual submission (not recommended)
  • These spec additions only matter for scenarios in which users are making several stacked offline updates to the same properties AND need the server Entity to be correct (sometimes they just need the info while offline to drive their workflow)
  • Servers could choose to ignore branchId and trunkVersion initially or possibly forever
  • Central is planning to implement a queuing system to process Entity updates within a branch in order but will only minimally surface the concept in the UI
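
To make the queuing idea concrete, here is one way a server could hold out-of-order updates until their predecessors in the same branch arrive. This is a sketch only, not Central's implementation: `BranchQueue` is a hypothetical name, it assumes trunkVersion is always an integer, and it ignores timeouts for updates that never arrive.

```python
from collections import defaultdict


class BranchQueue:
    def __init__(self):
        self.next_base = {}            # branchId -> baseVersion expected next
        self.held = defaultdict(dict)  # branchId -> {baseVersion: update}
        self.applied = []              # updates in the order they were applied

    def receive(self, update: dict) -> None:
        branch = update["branchId"]
        # The first update in a branch has baseVersion equal to trunkVersion.
        self.next_base.setdefault(branch, update["trunkVersion"])
        self.held[branch][update["baseVersion"]] = update
        # Release every held update whose predecessor has now been applied.
        while self.next_base[branch] in self.held[branch]:
            ready = self.held[branch].pop(self.next_base[branch])
            self.applied.append(ready)
            self.next_base[branch] += 1


queue = BranchQueue()
# Three updates from one branch arrive in reverse order.
for base in (6, 5, 4):
    queue.receive({"branchId": "branch-a", "trunkVersion": 4, "baseVersion": base})

print([u["baseVersion"] for u in queue.applied])
```

Nothing is applied until the update with base version 4 arrives, after which all three are released in branch order.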

Looking at how this would work in Collect, I ended up wondering about two things:

  1. Should the spec define branchId for create similarly to id and enforce that the form populates it with a UUID? This would feel more consistent and would prevent a client from having to directly manipulate an attribute within the submission. Like with id, we could have XLSForm handle generating a convenience calculation for this.
  2. Should the client update its local representation of branchId and trunkVersion based on an update submission, or should the client be the source of truth (and ignore the values in the entity node)? I'd lean towards the former here as it (again) feels more consistent, although it does mean there's more risk of badly behaved forms creating weird states (a form that always sets branchId to "blah" for instance).

@LN and I discussed these questions on a call:

We're actually going to experiment with an implementation that doesn't include a branchId on the entity node for creation forms as it doesn't seem needed for the server or the client. We'll continue to discuss an alternative way to provide an opt-out.

The client should be the source of truth for __version, __branchId and __trunkVersion. These values should be stored and updated regardless of what their corresponding entity node values (baseVersion, branchId and trunkVersion) end up being. This is only really important for non-spec compliant forms, but is important to define. For example, if a form always sets baseVersion to "5" instead of to the provided __version, Collect would still increment __version in its local representation instead of always ending up with "6".
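
The baseVersion "5" example can be sketched as follows. The function and dictionary shapes are hypothetical. The point is only that the local values advance independently of whatever the form wrote to the entity node.

```python
def record_update(local: dict, submitted_entity_node: dict) -> dict:
    """Advance the client's local state after an offline update.

    The submitted entity node is deliberately ignored: the client, not the
    form, is the source of truth for __version, __branchId and __trunkVersion.
    """
    return {**local, "__version": local["__version"] + 1}


local = {"__version": 7, "__branchId": "branch-a", "__trunkVersion": 7}

# A non-compliant form that always sets baseVersion to "5".
bad_node = {"baseVersion": "5", "branchId": "blah", "trunkVersion": "5"}

after_first = record_update(local, bad_node)
after_second = record_update(after_first, bad_node)
print(after_second)
```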

We certainly got a bit tangled up with branchId in the Entity creation case! I think that as we were discussing it initially we were imagining that, like with the update case, the client could have generated the branchId and made it available to the form. But in reality there isn't yet an Entity to query before the creation form is finalized.

We added the idea of using branchId as a way to opt in or out of offline Entities without giving it deep enough thought, I think. The way the proposal is written currently, it's very easy to end up in a situation where for the same Entity List, some forms create/update Entities offline and others do not. That would be implicit based on the version of pyxform that the forms were last converted with. Upon deeper reflection, I think this is way too confusing.

This makes me think that we should not allow opting in or out of offline Entities from individual forms, no matter the mechanism.

Now I'm feeling even less sure that we need a way to opt in or out of offline Entity updates. It does feel like the expected and more desirable behavior in most cases.

If we do want to have a way to opt out of creating/updating Entities offline, I now feel strongly that it should be done consistently for all forms affecting the same Entity List. Some options we've previously discussed:

  • Make it an Entity List setting on the server side
  • Make it a project-level setting on the client side

I'm currently preferring the client-side setting because it feels more flexible. A project manager could configure it within a QR code or someone in the field could change the setting if they find a need to do so.


Just reading through this recent discussion! Two topics that stood out to me:

Opting out of offline entities

This sounds more reasonable to me:

The reverse seems confusing: what would it mean for Central to block offline updates? Its main role is just trying to correctly order and track offline updates. But if the setting is on the client, then it just won't send more than one update at a time.

Offline entity creation without branchId

If there are things that should change with Central based on what's actually possible with Collect, let me know!

So if there was an offline branch that started with an entity create, the actions might look like this

action: create, uuid: 123, branchId: null
action: update, uuid: 123, branchId: xyz, trunkVersion: null, baseVersion: 1
action: update, uuid: 123, branchId: xyz, trunkVersion: null, baseVersion: 2

I guess if an update came in before the create and the entity didn't yet exist, instead of waiting for the beginning of the branch, it would wait for the entity with that uuid? And it would know to do so because trunkVersion is null.
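
That rule could be expressed roughly like this. It is a sketch of the guess above rather than settled behavior, and `waits_for_create` is a hypothetical name.

```python
def waits_for_create(action: dict, existing_ids: set) -> bool:
    """True if an update should be held until its Entity has been created.

    A null trunkVersion signals that the branch started with an offline
    create, so there is no server version to wait for, only the Entity itself.
    """
    return (
        action["action"] == "update"
        and action["uuid"] not in existing_ids
        and action["trunkVersion"] is None
    )


create = {"action": "create", "uuid": "123", "branchId": None}
first_update = {"action": "update", "uuid": "123", "branchId": "xyz",
                "trunkVersion": None, "baseVersion": 1}

# The update arrives before the create has been processed.
print(waits_for_create(first_update, existing_ids=set()))

# Once the create has been processed, the update no longer needs to wait.
print(waits_for_create(first_update, existing_ids={"123"}))
```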


I have updated the original post to reflect the latest thinking. Specifically, I have removed branchId from Entity create and have clarified that only forms with Entity spec version 2024.1.0 or higher should result in offline Entity creation or update. This is because we want servers to have branch id and trunk version information if there's the possibility of a chain of updates happening offline.

We have decided not to offer an explicit opt-out mechanism for offline Entities because it leads to confusing states when multiple forms interact with the same Entity List. A user or server implementation could choose to generate forms with a lower Entity spec version for a time if they wanted to initially opt out. They could do this either by manually generating form XML or by using an older version of pyxform.

Because there is a lot of new functionality here with a lot of interaction between client and server, we will roll offline Entities out progressively and make time for quality assurance. As long as we don't find any serious issues with what we have built so far, we currently expect offline Entities to be fully available by late November. Here is a summary of our release plan: [image: release plan summary]

Note that Central will first have a release in which offline Entities functionality exists but is off by default because it will not generate forms with the v2024.1.0 Entities spec by default. We will then do another release of Central which both generates forms with the v2024.1.0 Entities spec and migrates existing Entities forms for a consistent client experience. @Ukang_a_Dickson this will likely be of interest to you.

Please let us know if you have any questions or comments!


Thanks a lot @LN for the update


When can we expect the next release of ODK Central v2024.2?

We are currently planning to start regression testing early next week. Unless we find any major issues in that process, we should be able to release by the end of the month, hopefully the week of the 23rd.


Thanks a lot for the updates.
