@issa and I were discussing version in https://github.com/opendatakit/build/issues/154 and I think this is a longer discussion that should be had more broadly.
Historically, people have used the form id as a kind of a version. So the first version of the form is my_form_1 and the second version is my_form_2. This has been the recommended practice (see #6 in form design guidelines) for a while. And since it's user-defined, there are no checks (e.g., on structure) on what a new version means. Aggregate currently treats them as totally separate forms and it's up to the user to decide how to merge them.
The version attribute was added sometime later, and the guidelines read:
a separate version setting is used for revisions that do not change the data being collected or their data types. These 'minor' revisions include adding language translations, correcting spelling errors within the text, changing external dataset files, and updating multi-media prompts. E.g., changing the media files used in audio, image or video labels (prompts). Using the version setting enables form definitions to be enhanced mid-survey while allowing all of the older and newer submissions to be stored in the same data table on the ODK Aggregate server. The version of the form definition used is retained in the meta-data of each individual submission so that it is available during data analysis.
version -- must be a small 10-digit-or-less numeric string. We recommend using strings of the form: 'yyyymmddrr' e.g., 2015012901 (the 1st revision on January 29, 2015). Revised form definitions must have a version string that compares lexically (alphabetically) greater than the current form definition. I.e., ODK Aggregate does not accept changes to a form definition unless the version string is different and lexically (alphabetically) greater than the version string of the existing form definition (and the version string must be a 10-digit-or-less integer).
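To make the quoted rule concrete, here is a minimal Python sketch of the check as the guidelines describe it. The function names are hypothetical and this is not Aggregate's actual code; it only assumes the two stated conditions (a numeric string of at most 10 digits, and a plain string comparison against the current version).

```python
import re

# Hypothetical helpers; names are illustrative, not taken from ODK Aggregate.
VERSION_RE = re.compile(r"[0-9]{1,10}")


def is_valid_version(version: str) -> bool:
    """True if the version is a numeric string of at most 10 digits."""
    return VERSION_RE.fullmatch(version) is not None


def accepts_update(current: str, proposed: str) -> bool:
    """Apply the quoted rule: valid, and lexically greater than the current version."""
    return is_valid_version(proposed) and proposed > current


print(accepts_update("2015012901", "2015012902"))  # True
print(accepts_update("2015012901", "2015012901"))  # False
print(accepts_update("2015012901", "3"))           # True: compared as strings, not numbers
```

The last case shows why a fixed-width form such as yyyymmddrr is convenient: with equal-length digit strings, lexical order and numeric order agree.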
So that's what Aggregate does, but I don't know what ELMO, ONA, Enketo, Kobo do as far as version is concerned.
My guess is that as we move towards supporting form updates, we'll adjust these guidelines, so I think we should have a discussion about what version means now and what we want it to mean.
Kobo overrides the version setting with a unique alphanumeric id from our database. We allow any number of changes to be pushed in an update, and then we use the version ID to craft the export in a way that doesn't mangle data.
A table view is currently in the works which will show exactly which question+version each response corresponds to, and also will allow filtering by version.
Enketo doesn't actively concern itself with versions currently. It downloads the first matching <formID> found in the /formList, and will pro-actively update the browser- and server-cached form with the first match found. It continuously checks for a new "version" by comparing the cached hash with the current /formList-published <hash> of the first <formID> match. There is no user interaction, and a user is also not able to stop a form update from occurring (whilst online).
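The update check described above can be sketched roughly as follows. This is not Enketo's code: the function names are hypothetical, and the XML namespace is my assumption about the OpenRosa formList format. It only illustrates "take the first <formID> match and compare its <hash> with the cached one".

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed namespace of the OpenRosa formList document.
NS = {"x": "http://openrosa.org/xforms/xformsList"}


def first_match_hash(form_list_xml: str, form_id: str) -> Optional[str]:
    """Return the <hash> of the first <xform> whose <formID> matches, else None."""
    root = ET.fromstring(form_list_xml)
    for xform in root.findall("x:xform", NS):
        if xform.findtext("x:formID", namespaces=NS) == form_id:
            return xform.findtext("x:hash", namespaces=NS)
    return None


def needs_update(form_list_xml: str, form_id: str, cached_hash: str) -> bool:
    """True if the published hash of the first match differs from the cached hash."""
    published = first_match_hash(form_list_xml, form_id)
    return published is not None and published != cached_hash


# Made-up sample data: two entries share a formID; only the first is considered.
SAMPLE = """<xforms xmlns="http://openrosa.org/xforms/xformsList">
  <xform><formID>my_form</formID><version>2015012901</version><hash>md5:aaa</hash></xform>
  <xform><formID>my_form</formID><version>2015012902</version><hash>md5:bbb</hash></xform>
</xforms>"""

print(needs_update(SAMPLE, "my_form", "md5:aaa"))  # False
print(needs_update(SAMPLE, "my_form", "md5:old"))  # True
```

Note that the version value plays no role in this check; only the hash of the first match matters, which is why a second entry with the same formID is never seen.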
The submitted record will contain the version attribute+value at the time data collection started (for a particular record).
A problem we have (had) is when a user wants to edit an already existing record that was created with an older version of the form. A good solution would be to publish all form versions in the /formList perhaps. Not sure if the hash is still required in that case.
Ona will make use of the form version provided in the settings page. In the event it is missing, we generate a date-based numeric version. Replacing a form creates a new version. At the moment exports can be filtered by version; the export uses the fields of the latest form version, though the data will be for the selected version.
We do have a pull request to always generate a form version according to the current guidelines https://github.com/onaio/onadata/issues/982; the only holdback is that the form builder is using alphanumeric versions.
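For illustration, generating a version that follows the yyyymmddrr form from the guidelines could look something like the sketch below. This is not Ona's implementation; the function name and the revision-bumping behaviour are assumptions made for the example.

```python
from datetime import date
from typing import Optional


def next_version(current: Optional[str], today: date) -> str:
    """Return a yyyymmddrr version for today, bumping rr if current is from today.

    Hypothetical helper: assumes `current`, if given, is already a 10-digit
    yyyymmddrr string and is not dated later than `today`.
    """
    prefix = today.strftime("%Y%m%d")
    if current and len(current) == 10 and current.startswith(prefix):
        revision = int(current[8:]) + 1
        if revision > 99:
            raise ValueError("more than 99 revisions in one day")
        return f"{prefix}{revision:02d}"
    return prefix + "01"


print(next_version(None, date(2015, 1, 29)))          # 2015012901
print(next_version("2015012901", date(2015, 1, 29)))  # 2015012902
print(next_version("2015012902", date(2015, 1, 30)))  # 2015013001
```

Because every result has the same width, each new version also compares lexically greater than the previous one, which is what the guidelines require.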
It would be helpful to be able to edit a submission created by a previous version in the event the structure differs considerably.
We used to use integer sequence numbers, but we've recently switched to using a random three-letter code that doubles as an identifier in coded SMS. So I guess we're not compliant with the Aggregate standard, but I don't think that's an issue for us.