Collect: Coming soon, Offline Entities!

We have made good progress on offline Entities! Read on for some detail on what to expect as a user, some limitations we're currently planning for the first release, and an update on timeline. If you are also interested in the form specification, please see this thread.

Please do let us know if you have any questions or feedback, particularly about some of the limitations we are planning for the first release. We will use the notes in this post for writing user documentation so any clarifying questions you have will be very helpful.

There is a lot of detail here! Don't worry, you don't need to understand all of it to make great use of Entities. We want to make sure this information is available for troubleshooting and so that community members can have input if they would like.

Timeline

Here's our tentative timeline, which is also linked from the form specification thread.

First, we will release a Collect beta and share access to one of our test servers so you can try it out. Later this month, we will release Central with offline Entities turned off by default. This lets users try offline Entities with their own forms and gives us time for quality assurance leading up to the full Collect release. About a month after the full Collect release, we will release a version of Central that turns offline Entities on by default.

Old versions of Collect will unfortunately silently fail when attempting to download forms with the new spec version. Our goal is to give enough time for most users to upgrade Collect before we turn offline Entities on by default in Central.

How Central will work

Central's primary goal for offline Entities is to capture enough information about Submissions that affect Entities so that it can reconstruct the full history, including offline branches from different clients.

However, Central will not expose all of this to users; instead, it will continue to follow the "last write wins" approach with conflict detection. Our goal is to give users just enough information when Submissions are known to have come from offline branches so that issues are easier to identify and fix.

When the same user makes multiple updates offline, Central will mark them as offline updates.

It will also detect conflicts between multiple offline branches from different clients.
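To make "last write wins with conflict detection" concrete, here is a minimal Python sketch. It assumes each update carries the Entity version the client based its edit on (`base_version`); the `Entity` class, `apply_update` function and field names are hypothetical and are not Central's actual data model. An update based on the current version applies cleanly; an update based on an older version is still applied (last write wins) but is flagged as a conflict.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    # Hypothetical in-memory Entity, not Central's real schema.
    uuid: str
    version: int = 1
    data: dict = field(default_factory=dict)
    conflict: bool = False


def apply_update(entity: Entity, base_version: int, changes: dict) -> Entity:
    """Apply an update with last-write-wins semantics and flag conflicts.

    base_version is the Entity version the client saw when it made the edit
    (an assumption for this sketch).
    """
    if base_version < entity.version:
        # Another update landed since this client's copy was made. The update
        # still wins, but we record that it may have overwritten other data.
        entity.conflict = True
    entity.data.update(changes)
    entity.version += 1
    return entity


# Two clients both start from version 1 while offline.
e = Entity("abc", version=1, data={"status": "new"})
apply_update(e, base_version=1, changes={"status": "visited"})  # clean
apply_update(e, base_version=1, changes={"status": "closed"})   # conflict
print(e.version, e.data["status"], e.conflict)  # 3 closed True
```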

Additionally, Central will have new behavior to handle out-of-order submissions. In the ideal case, when multiple updates have been made from Collect while offline, Collect sends those in the order in which they were created. Central can then process them in the order it receives them and match the intended history.

In some cases, Submissions may be sent out of order. This can happen if Collect is configured to let the user submit manually or if sending a Submission fails. In the upcoming release, Central will detect a Submission that specifies an out-of-order Entity update and wait to apply it. A held Submission will be processed immediately once the missing earlier Submission(s) are received.
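The holding behavior can be sketched in Python as follows. This is an illustration under stated assumptions, not Central's implementation: each Submission is a hypothetical dict with `entity`, `base_version` and `changes`, and an update can only be applied when its `base_version` matches the Entity's current version. Anything ahead of that is held, and every successful apply checks whether a held Submission can now follow it.

```python
from collections import defaultdict


class EntityProcessor:
    """Hypothetical sketch of holding out-of-order Entity updates."""

    def __init__(self):
        self.versions = {}              # entity id -> current version
        self.data = defaultdict(dict)   # entity id -> current data
        self.held = defaultdict(dict)   # entity id -> {base_version: submission}

    def receive(self, sub):
        entity = sub["entity"]
        # Assumption: the Entity already exists at version 1.
        current = self.versions.get(entity, 1)
        if sub["base_version"] > current:
            # An earlier Submission is missing: hold this one.
            self.held[entity][sub["base_version"]] = sub
            return
        self._apply(sub)
        # Applying may unblock held Submissions, so release them in order.
        while self.versions[entity] in self.held[entity]:
            self._apply(self.held[entity].pop(self.versions[entity]))

    def _apply(self, sub):
        entity = sub["entity"]
        self.data[entity].update(sub["changes"])
        self.versions[entity] = self.versions.get(entity, 1) + 1


p = EntityProcessor()
p.receive({"entity": "e1", "base_version": 2, "changes": {"step": "second"}})  # held
p.receive({"entity": "e1", "base_version": 1, "changes": {"step": "first"}})   # releases it
print(p.versions["e1"], p.data["e1"]["step"])  # 3 second
```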

If a Submission is held for more than 5 days, then it will be applied as an update even if there are missing Submissions that should have come before it. In that case, the Submission is said to have been “force-processed” and will be marked as a conflict that you can resolve.
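A periodic check for the 5-day rule might look like the Python sketch below. The `HeldSubmission` record, the `force_process_expired` function and the dict layout of `entities` are hypothetical; only the 5-day window and the "apply anyway and mark as a conflict" behavior come from the description above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

HOLD_LIMIT = timedelta(days=5)  # the 5-day window described above


@dataclass
class HeldSubmission:
    # Hypothetical record of a Submission waiting for earlier ones.
    entity: str
    base_version: int
    received_at: datetime
    changes: dict = field(default_factory=dict)


def force_process_expired(held, entities, now):
    """Force-process Submissions held longer than HOLD_LIMIT.

    entities maps entity id -> {"version": int, "data": dict, "conflict": bool}.
    Expired Submissions are applied lowest base version first and flagged as
    conflicts; the ones still within the window are returned.
    """
    still_held = []
    for sub in sorted(held, key=lambda s: (s.entity, s.base_version)):
        if now - sub.received_at > HOLD_LIMIT:
            entity = entities[sub.entity]
            entity["data"].update(sub.changes)
            entity["version"] += 1
            entity["conflict"] = True  # missing predecessors never arrived
        else:
            still_held.append(sub)
    return still_held
```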

If an update is received for an Entity but no create for that Entity has been received within 5 days, the update will be force-processed as a create. If the update did not specify a label, a label will be auto-generated, because every Entity must have a label. If the original create does eventually arrive after the update has been force-processed as a create, it will be processed as an update. Central's goal is to use all submitted Entity data, even if it arrives late or out of order.
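The two halves of this rule (update force-processed as a create, late create processed as an update) can be sketched in Python. The function names, the dict layout and especially the placeholder label format are hypothetical; Central's real auto-generated labels may look different.

```python
def force_update_as_create(entities, entity_id, update):
    """Create an Entity from an update whose create never arrived in time."""
    # Every Entity must have a label; this placeholder format is hypothetical.
    label = update.get("label") or f"Auto-generated label ({entity_id})"
    entities[entity_id] = {
        "label": label,
        "data": dict(update.get("data", {})),
        "version": 1,
    }


def receive_create(entities, entity_id, create):
    """Process a create; if the Entity already exists, treat it as an update."""
    entity = entities.get(entity_id)
    if entity is None:
        entities[entity_id] = {
            "label": create["label"],
            "data": dict(create.get("data", {})),
            "version": 1,
        }
        return
    # Late create: the Entity was already force-created from an update, so
    # the create's values are applied as an ordinary update.
    entity["label"] = create.get("label", entity["label"])
    entity["data"].update(create.get("data", {}))
    entity["version"] += 1
```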

How Collect will work

Collect will keep a database-backed representation of each Entity List. When a form instance that creates or updates an Entity is finalized, Collect will apply the change to its local Entity List representation, but only if the Entity spec version in the form definition is v2024.1.0 or newer. For older spec versions, it will leave the Entity List unchanged.
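The version gate can be sketched in Python as below. The function names are hypothetical, and a plain dict stands in for Collect's database-backed Entity List. Versions are compared as tuples of integers rather than as strings, because a string comparison would wrongly treat "2024.10.0" as older than "2024.9.0".

```python
MIN_OFFLINE_SPEC = (2024, 1, 0)


def parse_spec_version(version: str) -> tuple:
    """Turn '2024.1.0' (or 'v2024.1.0') into (2024, 1, 0) for numeric comparison."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def supports_offline_entities(spec_version: str) -> bool:
    return parse_spec_version(spec_version) >= MIN_OFFLINE_SPEC


def on_finalize(entity_list: dict, spec_version: str, entity_id: str, values: dict) -> None:
    """Apply a finalized form's Entity change locally, only for new-enough specs."""
    if not supports_offline_entities(spec_version):
        return  # older spec: leave the local Entity List unchanged
    entity_list.setdefault(entity_id, {}).update(values)
```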

Sending filled forms and receiving form updates will continue to be completely independent of each other. Eventually, we may combine the two into a single synchronization operation, but for this initial release the Collect user experience will remain unchanged. We strongly recommend using automatic submission and form updates when using Entities to keep the server and client data as closely aligned as possible.

Collect will process Entity List updates in the background when it updates forms. For every Entity in the list, Collect will compare the server's version to its local version. If the server's version is the same as or greater than the local version, Collect will take the update from the server. In some cases, this will mean temporarily replacing newer user data with older data from the server. Eventually, the user's data will be submitted and processed by the server, and Collect will receive the combined Entity version.

Collect will always use the highest Entity version between its local representation and the server, regardless of conflict status. Conflict status will only be shown to server users who have more context about what's happening across their full project.

Collect will keep track of whether its copy of each Entity was created locally or came from the server. If the copy came from the server and a later Entity List update from the server no longer includes that Entity, Collect will delete it locally.
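The version comparison and the deletion rule described above can be combined into one merge step, sketched here in Python. The `merge_server_list` function and its dict layout (`version`, `data`, `from_server`) are hypothetical stand-ins for Collect's database tables; the rules themselves follow the text: a server copy with the same or a greater version wins, conflict status is not consulted, and only copies that came from the server are deleted when the server stops sending them.

```python
def merge_server_list(local, server):
    """Merge a freshly downloaded Entity List into Collect's local copy.

    local:  entity id -> {"version": int, "data": dict, "from_server": bool}
    server: entity id -> {"version": int, "data": dict}
    """
    merged = {}
    for entity_id, entity in local.items():
        if entity["from_server"] and entity_id not in server:
            # This copy came from the server, which no longer sends it: delete.
            continue
        merged[entity_id] = entity
    for entity_id, remote in server.items():
        current = merged.get(entity_id)
        # Same or greater server version wins, even over unsent local edits.
        if current is None or remote["version"] >= current["version"]:
            merged[entity_id] = {**remote, "from_server": True}
    return merged
```

Note that a locally created Entity that the server never sends back is always kept by this rule, which is also why a server-side deletion of such an Entity can go unnoticed on the client, as described under the planned limitations.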

To avoid corrupt data while filling out a form, if a form is updating while a user tries to open it, the user will need to wait until the update is done before the form opens. If an automatic form update attempts to run while that form is being filled, the update will be rescheduled for later.
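These two rules can be modeled with a small per-form guard, sketched in Python with `threading.Condition`. The `FormUpdateGuard` class and its methods are hypothetical and only illustrate the behavior: opening a form waits for a running update to finish, and an automatic update that finds the form open returns False so the caller can reschedule it.

```python
import threading


class FormUpdateGuard:
    """Hypothetical sketch of the open/update rules for a single form."""

    def __init__(self):
        self._cond = threading.Condition()
        self._updating = False
        self._open = False

    def try_start_update(self) -> bool:
        """Start an automatic update, or return False so it can be rescheduled."""
        with self._cond:
            if self._open:
                return False  # form is being filled: try again later
            self._updating = True
            return True

    def finish_update(self) -> None:
        with self._cond:
            self._updating = False
            self._cond.notify_all()  # wake anyone waiting to open the form

    def open_form(self, timeout=None) -> bool:
        """Wait until no update is running, then mark the form as open."""
        with self._cond:
            if not self._cond.wait_for(lambda: not self._updating, timeout):
                return False  # still updating when the timeout expired
            self._open = True
            return True

    def close_form(self) -> None:
        with self._cond:
            self._open = False
```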

The database-backed representation of Entity Lists will allow faster lookups with less memory. In this initial release, only some expressions will be optimized this way; more will be optimized over time.
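As a rough illustration of why a database helps, the Python sketch below loads an Entity List into an in-memory SQLite table and answers a lookup by name with an indexed query instead of scanning every item. The table schema and function names are hypothetical, and this post does not say which expressions Collect optimizes; a lookup such as instance('trees')/root/item[name=${tree}]/label is simply the kind of expression that maps naturally onto a query like this.

```python
import sqlite3


def build_entity_table(rows):
    """Load (name, label, status) rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    # PRIMARY KEY gives an index on name, so lookups by name avoid a full scan.
    conn.execute("CREATE TABLE entities (name TEXT PRIMARY KEY, label TEXT, status TEXT)")
    conn.executemany("INSERT INTO entities VALUES (?, ?, ?)", rows)
    return conn


def lookup_label(conn, name):
    """Return the label for the Entity with this name, or None if absent."""
    row = conn.execute("SELECT label FROM entities WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None
```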

Planned limitations

In order to release as soon as possible and start gathering real world experience with the system, we are planning to leave a few limitations in the initial release. Some we will definitely address over time but others we may leave if we don't hear that they are blockers for users. If any of these seem critical for your use case or if you have questions about other scenarios, please ask below.

Clients download full Entity List with each update

Entity Lists will continue to be served as CSVs that include all Entities. This helps maintain data integrity between client and server, but it means a lot of data has to be sent from server to client, even though the CSVs are zipped. It also means the server has to do a lot of work. Eventually, clients will be able to request only the Entities that have changed.
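To show what "full Entity List with each update" means in practice, this Python sketch serializes every Entity as a CSV row and zips the result, which is the payload a client would re-download on each form update. The `entity_list_zip` function and the column names are illustrative assumptions, not Central's exact CSV format.

```python
import csv
import io
import zipfile


def entity_list_zip(name, entities):
    """Serialize every Entity as a CSV row and zip it, like a full download."""
    text = io.StringIO()
    writer = csv.writer(text)
    writer.writerow(["name", "label", "version"])  # illustrative columns
    for e in entities:
        writer.writerow([e["name"], e["label"], e["version"]])
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"{name}.csv", text.getvalue())
    return buf.getvalue()
```

Even when only one Entity changed, every row is rebuilt and sent again; requesting only changed Entities would shrink both the payload and the server's work.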

Rejecting Entity-creating submissions on the server will not reject them on the client

In this first release, Entities will always be created offline, even if submission approval is required on the server side. This means that some submissions which have been rejected by the server may have created Entities on the device. This is something we will address better in the future. For now, we recommend keeping Central's default behavior of automatic creation of Entities without submission approval.

This limitation may not be a problem for you if:

  1. Entity creation and update are done by different people
  2. You intend to hold submissions so you can make edits and then always approve them

Entity created locally and immediately deleted from server

In this system, Entity creates/updates on the client, Entity creates/updates on the server, and Entity updates from server to client are completely independent of each other and can happen in any order. With our current implementation, this means there are cases in which an Entity is created on the client and then deleted on the server, and that deletion is never synchronized to the client. We have described a proposed approach to address this in this thread, but it will not be in the first release.

CSV form attachments with the same name as an Entity List will be conflated with the Entity List

If a project contains a form that reads from a CSV with a name that matches an Entity List's name, that form will read from the Entity List in Collect instead of reading from the attached CSV. We expect that this is very rare but will try to address it better soon.
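The conflation can be pictured as a name lookup that checks Entity Lists before the form's own attachments, as in this Python sketch. The `resolve_csv` function, the ".csv" suffix handling and the data shapes are hypothetical; they illustrate the limitation, not Collect's actual resolution code.

```python
def resolve_csv(csv_name, entity_lists, attachments):
    """Return (source, rows) for a CSV a form reads, mimicking the limitation.

    entity_lists: Entity List name -> rows; attachments: file name -> rows.
    """
    list_name = csv_name.removesuffix(".csv")
    if list_name in entity_lists:
        # A same-named Entity List shadows the form's own attachment.
        return "entity_list", entity_lists[list_name]
    return "attachment", attachments[csv_name]
```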

Multiple clients that submit interleaved offline branches

If multiple clients each make several edits while offline and then submit at the same time, there's a risk that their updates will be interleaved. It will be hard to understand what happened from the Central representation and conflicts may be marked as soft when they are in fact more serious.

Next steps

Thanks to all of you who have contributed to making this complex functionality a reality!

Although this post contains a lot of details about special cases, we believe that overall the system will behave like most users expect. If that's not the case, please let us know. We are looking forward to your feedback on the Collect beta and the Central release after that.
