OpenRosa spec proposal: support offline Entities

So far, Entities support in Collect has relied entirely on existing functionality: CSV form attachments, automatic form updates, etc. You can try out experimental support for offline Entities by joining the beta program and following instructions shared in the v2024.2 beta release notes.

As we work to polish offline Entities support in Collect, we find that we need more information from the server. Specifically, we need to know:

  • if a CSV attachment is an Entity List so that we can share that list between forms
  • whether an Entity that Collect has locally but that wasn’t in the last data refresh should be deleted by Collect

Identify CSV attachments that are Entity Lists

We propose adding a type attribute to the mediaFile node in the OpenRosa form manifest response. Initially, the only supported value for the attribute would be entityList. If a media file has a type of entityList, clients that support offline Entities would be responsible for creating an Entity List representation of that media file and sharing it between forms based on the filename without the .csv extension.

Example manifest response:

<mediaFile type="entityList">
  <filename>people.csv</filename>
  <hash>md5:9fd39ac868eccdc0c134b3b7a6a25eb7</hash>
  <downloadUrl>https://some.server/blobSource?foo=222</downloadUrl>
</mediaFile>

Given this manifest entry, a client would know to create an Entity List named people and to share it between forms in the current client project. Every form with a mediaFile of type entityList with filename people.csv in the current client project would share access to the same Entity List.
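
As a rough illustration of the client-side step described above, the following Python sketch picks out media files of type entityList from a manifest and derives the shared Entity List name from each filename. The function names, the second media file (logo.png) and the namespace-agnostic matching are assumptions made for this example and are not part of the proposal.

```python
import xml.etree.ElementTree as ET

# Sample manifest. The first mediaFile is the example from this proposal;
# the second (logo.png, its hash and URL) is made up to show an ordinary attachment.
MANIFEST = """<manifest xmlns="http://openrosa.org/xforms/xformsManifest">
  <mediaFile type="entityList">
    <filename>people.csv</filename>
    <hash>md5:9fd39ac868eccdc0c134b3b7a6a25eb7</hash>
    <downloadUrl>https://some.server/blobSource?foo=222</downloadUrl>
  </mediaFile>
  <mediaFile>
    <filename>logo.png</filename>
    <hash>md5:00000000000000000000000000000000</hash>
    <downloadUrl>https://some.server/blobSource?foo=223</downloadUrl>
  </mediaFile>
</manifest>"""


def local_name(tag):
    """Drop the "{namespace}" prefix ElementTree puts on namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def entity_list_names(manifest_xml):
    """Return the Entity List name for every mediaFile of type entityList."""
    names = []
    for media_file in ET.fromstring(manifest_xml).iter():
        if local_name(media_file.tag) != "mediaFile":
            continue
        if media_file.get("type") != "entityList":
            continue  # ordinary attachment, or a type this client does not know
        filename = next(
            ((child.text or "").strip() for child in media_file
             if local_name(child.tag) == "filename"),
            "",
        )
        # The list name is the filename without the .csv extension (Python 3.9+).
        names.append(filename.removesuffix(".csv"))
    return names


print(entity_list_names(MANIFEST))  # ['people']
```

A client that does not support offline Entities would simply never look at the type attribute and would treat both files as regular attachments.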

We can’t use existing information to reliably determine whether a media file should be considered an Entity List, or whether two forms reference the same one. Some servers such as Central may serve the same file from different URLs, so the downloadUrl can’t be used to match files across forms. The md5 hash is based on the current content of the file, so two forms attached to the same Entity List could report different hashes even if they are updated in close succession.

This change is additive and clients that do not support offline Entities would continue to ignore the new attribute if it’s provided by a server.

Identify Entities that Collect should delete locally

Currently, all Entities are downloaded every time that Collect requests a form update from the server and detects that the Entity List has changed. Entities that Collect previously received from the server and that are no longer in the Entity List response from the server can be safely deleted in Collect’s representation. However, there are various edge cases in which Entities are never deleted. For example:

  • An Entity is created offline, received by the server, and immediately deleted from the server before the client receives it again from the server
  • An Entity List is configured to only create Entities on submission approval. Collect will create Entities offline in that case. If an Entity-creating submission is rejected, the Entity should be deleted from Collect
  • (Future) An Entity is created offline, received by the server, and immediately archived on the server before the client receives it again from the server
  • (Future) Collect only requests Entities created or updated since it last received an update. It will need a way to also learn about deletes/archives.

To support this need, we propose adding an integrityUrl element to mediaFile blocks with type entityList in the OpenRosa form manifest response. The value of this element would represent a URL that the client can use to ensure the integrity of its local Entity List representation.

We propose using the generic integrityUrl name because although that URL would initially only provide information about Entities that should no longer be in the client’s local Entity List representation, we imagine that in the future it could also provide information like the total expected number of Entities. We propose having the server specify the integrity URL in the manifest rather than defining a fixed endpoint so that different server implementations can use URL structures that make sense for their context.

For the current need, we propose requiring that the URL provided by the server accept an id query parameter whose value is a comma-separated list of Entity ids. The server would be responsible for splitting that list and returning, for each specified id, whether the Entity should be deleted on the client.

Example manifest response:

<mediaFile type="entityList">
  <filename>people.csv</filename>
  <hash>md5:9fd39ac868eccdc0c134b3b7a6a25eb7</hash>
  <downloadUrl>https://some.server/blobSource?foo=222</downloadUrl>
  <integrityUrl>https://some.server/forms/12/integrity</integrityUrl>
</mediaFile>

A client receiving this manifest response would be responsible for comparing its local people Entity List representation to the people.csv file it downloads. If the client has any offline Entities that are not represented in the downloaded CSV, it would be responsible for making a request to https://some.server/forms/12/integrity with the id query parameter set to a comma-separated list of all the deletion candidates it identified.
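
The comparison and request construction described above could be sketched in Python as follows. The function names and the third Entity id are hypothetical, and the proposal does not say whether commas in the id value should be percent-encoded, so leaving them literal here is an assumption.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def deletion_candidates(local_ids, downloaded_ids):
    """Ids held locally that are missing from the freshly downloaded CSV."""
    downloaded = set(downloaded_ids)
    return [entity_id for entity_id in local_ids if entity_id not in downloaded]


def integrity_request_url(integrity_url, candidate_ids):
    """Add the id query parameter to the integrityUrl from the manifest.

    Returns None when there are no candidates, so no request is needed.
    """
    if not candidate_ids:
        return None
    parts = urlsplit(integrity_url)
    # Preserve any query parameters the server already put in the URL.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("id", ",".join(candidate_ids)))
    # safe="," keeps commas literal (an assumption; %2C would decode the same way).
    return urlunsplit(parts._replace(query=urlencode(query, safe=",")))


# The first two ids come from the example response in this proposal;
# the third is made up and stands for an Entity still present in people.csv.
local_ids = [
    "24b47424-ccf8-4f4b-b4cd-34ff5c71eddd",
    "9e32d18f-d51a-4826-a8b2-e9b1c6d10b58",
    "00000000-0000-0000-0000-000000000001",
]
downloaded_ids = ["00000000-0000-0000-0000-000000000001"]

candidates = deletion_candidates(local_ids, downloaded_ids)
print(integrity_request_url("https://some.server/forms/12/integrity", candidates))
```

Because the server chooses the integrity URL, the sketch appends to whatever query string is already there rather than assuming the URL has none.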

We propose the following structure for the server response:

<?xml version='1.0' encoding='UTF-8' ?>
<data>
  <entities>
    <entity id="24b47424-ccf8-4f4b-b4cd-34ff5c71eddd">
      <deleted>true</deleted>
    </entity>
    <entity id="9e32d18f-d51a-4826-a8b2-e9b1c6d10b58">
      <deleted>false</deleted>
    </entity>
  </entities>
</data>

We propose returning an XML response rather than JSON to match the existing OpenRosa API specification. The data root element is included to provide flexibility to eventually add more data like the total Entity count (XML documents may only have a single root).

The deleted element represents whether an Entity should be deleted in the client’s offline representation.
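
To show how a client might act on this response, here is a minimal Python sketch that reads the proposed XML and drops the Entities marked as deleted from a local store. The dictionary standing in for the client’s local Entity List, its labels and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# The example response from this proposal, as bytes as it would arrive over HTTP.
RESPONSE = b"""<?xml version='1.0' encoding='UTF-8' ?>
<data>
  <entities>
    <entity id="24b47424-ccf8-4f4b-b4cd-34ff5c71eddd">
      <deleted>true</deleted>
    </entity>
    <entity id="9e32d18f-d51a-4826-a8b2-e9b1c6d10b58">
      <deleted>false</deleted>
    </entity>
  </entities>
</data>"""


def ids_to_delete(response_xml):
    """Return the ids of all entity elements whose deleted child is "true"."""
    root = ET.fromstring(response_xml)
    return [
        entity.get("id")
        for entity in root.iter("entity")
        if (entity.findtext("deleted") or "").strip() == "true"
    ]


def apply_integrity_response(local_entities, response_xml):
    """Return a copy of the local store without the Entities marked deleted."""
    doomed = set(ids_to_delete(response_xml))
    return {
        entity_id: entity
        for entity_id, entity in local_entities.items()
        if entity_id not in doomed
    }


# Hypothetical local store: Entity id -> label.
local_entities = {
    "24b47424-ccf8-4f4b-b4cd-34ff5c71eddd": "Person A",
    "9e32d18f-d51a-4826-a8b2-e9b1c6d10b58": "Person B",
}

print(ids_to_delete(RESPONSE))
print(apply_integrity_response(local_entities, RESPONSE))
```

Entities reported with deleted set to false are kept, which covers the case where an offline-created Entity simply has not reached the downloaded CSV yet.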

Feedback wanted

Please let us know if you have any feedback on these proposals. See also a companion form spec proposal.
