Match the uid preload structure to the uuid() function

Currently, pyxform and Build (the most commonly-used form builders) generate an instanceID with something like:

<bind nodeset="/data/meta/instanceID" type="string" readonly="true()" calculate="concat('uuid:', uuid())"/>

This works great for records that are only edited once and then sent, but the calculate is re-evaluated on subsequent form loads, so it's not great for editing records.
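
To make the difference concrete, here is a toy Python model (not JavaRosa or Enketo code) of one saved instance being loaded twice. A calculate-style instanceID is recomputed on every load. A preload-style value is written only when the instance is first created and then kept. The `Instance` class and `load_*` functions are hypothetical, and both use the same "uuid:" + UUID format on purpose so that only the timing differs.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    """Toy stand-in for a saved form instance; only tracks instanceID."""
    instance_id: Optional[str] = None


def load_with_calculate(inst: Instance) -> None:
    # Models calculate="concat('uuid:', uuid())": the expression is
    # re-evaluated on every load, overwriting whatever was saved before.
    inst.instance_id = "uuid:" + str(uuid.uuid4())


def load_with_preload(inst: Instance) -> None:
    # Models a preload that fires only on the instance's first load:
    # a value that is already present is left untouched.
    if inst.instance_id is None:
        inst.instance_id = "uuid:" + str(uuid.uuid4())


calc = Instance()
load_with_calculate(calc)
first_calc = calc.instance_id
load_with_calculate(calc)  # reopened for editing
print(first_calc == calc.instance_id)  # False: a new ID on every load

pre = Instance()
load_with_preload(pre)
first_pre = pre.instance_id
load_with_preload(pre)  # reopened for editing
print(first_pre == pre.instance_id)  # True: the ID survives the edit
```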

There's some discussion over at https://github.com/XLSForm/pyxform/issues/94 about changing the form builder output to use the existing uid preload so that it's evaluated exactly once. That would look something like:

<bind nodeset="/readonly/meta/instanceID" jr:preload="uid" type="string"/>

example ID generated by uuid(): 0eff14cb-638e-4b1b-a43c-0ceced25c5d8 (RFC 4122 Version 4 UUID)
example ID generated by uid: W4498RLERDRSPPIILOAEO6IDI (25-char string)

I don't know this for sure, but I wouldn't be surprised if users relied on the format of the instanceID for something. Currently, using the preload would not only give the ID a different format but would also drop the uuid: prefix.
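
If downstream code does depend on the format, a tolerant parser is one way to cope with both shapes. The sketch below is a hypothetical helper (not part of any ODK tool). It accepts an instanceID with or without the uuid: prefix and reports whether the remainder is a canonical RFC 4122 UUID. The 25-char value is simply classified as "other", since its generation algorithm isn't described here.

```python
import uuid


def classify_instance_id(value: str) -> str:
    """Return 'prefixed-uuid', 'bare-uuid', or 'other'."""
    prefixed = value.startswith("uuid:")
    body = value[len("uuid:"):] if prefixed else value
    try:
        parsed = uuid.UUID(body)
    except ValueError:
        return "other"
    # uuid.UUID also accepts '{...}' or 32 hex digits without dashes, so
    # require the canonical 8-4-4-4-12 text and the RFC 4122 variant bits.
    if str(parsed) != body.lower() or parsed.variant != uuid.RFC_4122:
        return "other"
    return "prefixed-uuid" if prefixed else "bare-uuid"


# Prints bare-uuid, prefixed-uuid, then other for the examples above.
for example in [
    "0eff14cb-638e-4b1b-a43c-0ceced25c5d8",
    "uuid:0eff14cb-638e-4b1b-a43c-0ceced25c5d8",
    "W4498RLERDRSPPIILOAEO6IDI",
]:
    print(example, "->", classify_instance_id(example))
```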

I really don't have a good sense of the user impact of just switching the form builder without also changing what the uid preload does. Does anyone have a sense of that?

For aesthetic and standards-compliance reasons, I would prefer not using that arbitrary 25-char format.

If we do decide to use a preload approach for generating the instanceID we could:

  • change the uid preload to generate an RFC 4122 UUID
  • change the uid preload to generate an RFC 4122 UUID and include the uuid: prefix
  • introduce a new uuid preload which includes the prefix and uses an RFC 4122 UUID
  • do nothing and use the current uid

I'm leaning towards the second option, which changes the existing uid preload to generate an RFC 4122 UUID and include the uuid: prefix. That seems the most consistent with current usage. Reusing the uid preload name means new forms will work with old clients, albeit with an unexpected format. I doubt anyone would miss the 25-char format, but I'm interested in other opinions on that.
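
Option 2 amounts to something like the following Python sketch: a random version 4 UUID in canonical RFC 4122 text form, prefixed with uuid:. The function name is hypothetical, and JavaRosa's actual implementation is in Java and may differ in detail.

```python
import uuid


def new_instance_id() -> str:
    """Sketch of option 2: 'uuid:' + an RFC 4122 version 4 UUID."""
    return "uuid:" + str(uuid.uuid4())


instance_id = new_instance_id()
parsed = uuid.UUID(instance_id[len("uuid:"):])
# uuid4() sets the version and variant bits required by RFC 4122.
assert parsed.version == 4 and parsed.variant == uuid.RFC_4122
print(instance_id)  # random, e.g. uuid:0eff14cb-638e-4b1b-a43c-0ceced25c5d8
```

The result is always 41 characters (the 5-character prefix plus the 36-character UUID), and consumers that want the bare UUID can strip the fixed prefix.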

Coincidentally, @martijnr and I had some discussion around this just yesterday (https://github.com/opendatakit/xforms-spec/issues/2#issue). The thought was that, since preloads are less than ideal to begin with, and if we need to revisit the uid anyway, another possibility is for this (very) important submission identifier to be inserted as a (new) attribute on the XML document root. Since we are already effectively populating the 'universally unique' form identifier there, via the id and version attributes, it would flow somewhat naturally to add a third attribute, say uid, to further uniquely identify a specific (submitted) instance of that form (!). And since these root attributes already have to be parsed out and extracted anyway when processing a submission, this would avoid having to look further for a 'meta' subgroup, and a specific 'instanceID' element within it, just to uniquely identify the incoming submission.
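
The difference for a submission processor can be sketched with Python's xml.etree.ElementTree. The root-level uid attribute is only the proposal above, not something any ODK tool reads today; the meta/instanceID lookup reflects the current convention. The form id, version and UUID are made up, and real submissions may use XML namespaces, which this sketch omits for brevity.

```python
import xml.etree.ElementTree as ET

# Made-up submission carrying the identifier both ways: as the proposed
# (hypothetical) root attribute and as the current meta/instanceID element.
SUBMISSION = """
<data id="household_survey" version="2018061801"
      uid="uuid:0eff14cb-638e-4b1b-a43c-0ceced25c5d8">
  <name>Example</name>
  <meta>
    <instanceID>uuid:0eff14cb-638e-4b1b-a43c-0ceced25c5d8</instanceID>
  </meta>
</data>
"""

root = ET.fromstring(SUBMISSION.strip())

# Form identity already lives in root attributes...
form_key = (root.get("id"), root.get("version"))

# ...so under the proposal, submission identity is one more attribute lookup.
uid_from_root = root.get("uid")

# Current convention: descend into meta and read the instanceID element.
uid_from_meta = root.findtext("meta/instanceID")

print(form_key, uid_from_root == uid_from_meta)
```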

All that aside, I agree that there's no compelling reason to include a "uuid:" prefix, and I think an RFC 4122 UUID is sufficient (and preferable) on its own.

Strictly speaking, this metadata calculation should be re-evaluated, just like every other XForm calculation, whenever anything in the form changes (!). Since it isn't wrapped in once(), that would imply constant recalculation; this appears not to happen only because of a workaround [at least, that's how I understand Enketo handles it, from what @martijnr has told me].
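
The ODK XForms spec defines once(expr) as returning the expression's result only when the current node is empty, and the node's current value otherwise. A toy recalculation loop in Python shows why that matters for a uuid()-based calculate that is re-run on every change. The function names are hypothetical and the instance is modeled as a plain dict.

```python
import uuid
from typing import Callable, Dict

Calc = Callable[[Dict[str, str]], str]


def plain_calc(inst: Dict[str, str]) -> str:
    # concat('uuid:', uuid()): a fresh value on every evaluation.
    return "uuid:" + str(uuid.uuid4())


def once_calc(inst: Dict[str, str]) -> str:
    # once(concat('uuid:', uuid())): keep the current value if there is one.
    current = inst.get("instanceID", "")
    return current if current else "uuid:" + str(uuid.uuid4())


def distinct_values(calc: Calc, passes: int = 3) -> int:
    """Run several recalculation passes and count distinct instanceIDs seen."""
    inst: Dict[str, str] = {}
    seen = set()
    for _ in range(passes):
        inst["instanceID"] = calc(inst)
        seen.add(inst["instanceID"])
    return len(seen)


print(distinct_values(plain_calc))  # 3
print(distinct_values(once_calc))   # 1
```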

Can anyone explain to me why preloads are a bad option? I'm asking because I don't know :slight_smile:

I think 'bad' is probably too strong an assessment, but there was some discussion a while back (on Slack? in 2017? although it may have fallen off the history by now...) that these preloads were a bit hacky, and didn't really give the desired level of event-driven granularity in any case. But I can't recall that any decision for or against preloads in general was ever reached.

Mostly it's that if we're considering reworking the (currently unsatisfactory?) handling of submission UUID tagging anyway, perhaps we should step back and see if there's a better approach. I still think there are useful use cases for defining best-practice metadata, but giving each submission a unique UUID seems a bit more fundamental than that... Again, I draw the comparison to how ODK currently uniquely identifies each form: uniquely identifying each form submission seems like a relatively small step beyond that, so why not basically follow a similar pattern?

Dunno. I'm not 100% convinced either way yet (other than that the existing preloads don't quite 'feel right'... :expressionless: ). So I genuinely look forward to others' viewpoints; I think there's an interesting discussion to be had here [and I hope @martijnr and @LN don't disagree too much... :slight_smile: ]

To me preloads are just an unnecessary customization to the XForms spec now that we have Actions and Events.

However, I don't think we should be in a particular hurry to remove them.

The current spec for the uid preload actually is exactly what @LN is proposing in the second option (and what Enketo supports):

change the uid preload to generate an RFC 4122 UUID and include the uuid: prefix

So changing the pyxform output to a uid preload seems an improvement to me. The spec also specifies the event and the fact that it will only populate once.

So I vote to go ahead and change the pyxform output to a uid preload after fixing uid in Collect to comply with the spec (and save more drastic changes for the future).

Certainly not a coincidence, and I should have made that clearer! Deeper changes, such as altering how instanceID is represented, are certainly worth considering, and I trust they would be carefully designed and implemented, since they require changes across the whole ecosystem.

My preference would be to consider deep changes to how unique identifiers are represented and computed in the context of formalizing behavior that is directly valuable to users like edit workflows.

But since I may not be around to participate in those discussions (!), my priority in the short term is making sure that if the XLSForm change is considered, there is a public, archived discussion describing the context. It's the kind of thing that seems really minor (and very well might be) but I have a hard time predicting the side effects.

I should have said that, from a user perspective, the calculation is recomputed on subsequent form loads. What you describe isn't necessarily accurate either, because implementations generally maintain a dependency graph for performance reasons and only recompute calculations whose inputs have changed.

Note that none of this has been much of a user-facing issue, because typically when users want to make edits to data, they want those edits to be linked to the prior data but not replace it. In other words, it's not desirable for an edited filled form to have the same instanceID as its parent; that is what deprecatedID is for. Also, clients like Collect don't make an edit workflow all that visible. I presume/hope Enketo uses deprecatedID in the way I described, but that has yet to be formalized.
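
The edit convention described here can be sketched as follows. The edited copy gets a fresh instanceID, and the parent's instanceID is recorded as deprecatedID so the two records stay linked. The helper name is hypothetical, and a flat dict stands in for the meta block; only the instanceID and deprecatedID names come from the discussion.

```python
import uuid
from typing import Dict


def meta_for_edit(parent_meta: Dict[str, str]) -> Dict[str, str]:
    """Return meta for an edited copy of a submission (hypothetical helper)."""
    edited = dict(parent_meta)
    # Link back to the record being superseded...
    edited["deprecatedID"] = parent_meta["instanceID"]
    # ...and give the edited record its own identity.
    edited["instanceID"] = "uuid:" + str(uuid.uuid4())
    return edited


original = {"instanceID": "uuid:0eff14cb-638e-4b1b-a43c-0ceced25c5d8"}
edit = meta_for_edit(original)
print(edit["deprecatedID"] == original["instanceID"])  # True
print(edit["instanceID"] != original["instanceID"])    # True
```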

My primary objection is that they're "magic" unrelated to the XForms spec. They're not terrible, though, and do get the job done. But if there are going to be more configuration options made available to the form from the client, I do think it's worth considering alternatives. For example, an option would be for clients to generate or download real or virtual XML documents and for forms to be able to query those in a similar way to how secondary instances are queried -- https://opendatakit.github.io/xforms-spec/#secondary-instances---internal.
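
As a rough illustration of that alternative, a client could expose its configuration as an XML document, and the form could look values up by path, much as it does with secondary instances. Everything here is hypothetical: the instance name client, its element names and values, and the `lookup` helper. Real XForms engines evaluate full XPath such as instance('client')/root/username, whereas ElementTree supports only a limited path subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical "virtual" documents a client might generate at runtime.
VIRTUAL_INSTANCES = {
    "client": ET.fromstring(
        "<root><deviceid>collect:abc123</deviceid>"
        "<username>enumerator1</username></root>"
    ),
}


def lookup(instance_name: str, path: str) -> str:
    """Resolve a simple child path inside a named virtual instance."""
    root = VIRTUAL_INSTANCES[instance_name]
    value = root.findtext(path)
    if value is None:
        raise KeyError(f"{path!r} not found in instance {instance_name!r}")
    return value


# Roughly what a reference like instance('client')/root/username would yield.
print(lookup("client", "username"))  # enumerator1
```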

OK, but the spec does not match the implementation. The uid preload has been in ODK since the beginning of time, so I'm not exactly sure how the spec came to be written that way without discussion.

I think it would be ok to change JavaRosa/Collect but as with similar changes that we have made recently, the form builder change should not come right away in order to minimize user impact.

Yes, understood. I was editing my post while you were writing your response. Yes, Collect should definitely be fixed first if that is deemed safe (Enketo has implemented it as written in the spec, also since forever I think...).

I wonder how the spec ended up written that way... :wink: (For context on my wink for those who may be wondering, JavaRosa/Collect predates the spec document. @martijnr put a lot of effort into bringing together documentation from various places into the original pass at the spec)

Not sure how to proceed here. I have a JavaRosa PR at https://github.com/opendatakit/javarosa/pull/379 and if a couple of folks think through it and agree it is entirely safe/non-disruptive, we could get it into the 2.12 release. @ggalmazor, @yanokwa? That would pave the way for the XLSForm/pyxform change to happen in the relatively short term.

I'm a fan of incremental and non-disruptive changes that get us closer to an ideal. @LN's proposal is exactly that, so I'm in favor.

To be specific, I think we should merge the PR at https://github.com/opendatakit/javarosa/pull/379 that changes the uid preload to generate an RFC 4122 UUID with uuid: prefix.

I think it's a safe change because it seems extremely unlikely that someone is using uid in a way where this would be disruptive. They'd have to have read the spec, built their own form designer (or written their own forms by hand), and then relied on JavaRosa's very specific format.

I think we can ship that in JR v2.12 and set ourselves up for making this change in pyxform. And once that's done, we're no worse off, because it's a really narrow change that gets Collect and Enketo matching in behavior.

Curious... is this prefix actually being stripped off by any of the ODK tools presently, or is the whole UUID string basically being used verbatim?

I can't speak for all the tools, but Briefcase uses the string to name the folder it stores the submission in. It drops the : because colons aren't allowed in names on some file systems.
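
A minimal sketch of that idea, not Briefcase's actual code (which is written in Java): drop the colon from the instanceID before using it as a directory name. The function name and the "instances" base directory are hypothetical.

```python
from pathlib import Path


def submission_dir(base: Path, instance_id: str) -> Path:
    """Build a per-submission folder path from an instanceID (hypothetical)."""
    # ':' is not allowed in file names on some file systems (e.g. Windows),
    # so drop it, as described above for Briefcase.
    return base / instance_id.replace(":", "")


folder = submission_dir(Path("instances"), "uuid:0eff14cb-638e-4b1b-a43c-0ceced25c5d8")
print(folder.name)  # uuid0eff14cb-638e-4b1b-a43c-0ceced25c5d8
```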

:+1:

I might still go ahead and implement 'the ideal' :roll_eyes: in my iOS prototype... Basically, I wasn't actually populating instanceID to begin with (!) because I only just managed to implement the uuid() XPath extension function for libxml2 [which is where things started to go pear-shaped: because the original Kobo form's binding calculation lacked once(), my recalculate mechanism got itself into an infinite loop...].

So, as a strawman, perhaps something like this?

<data id="household_survey" version="2018061801" instanceID="123e4567-e89b-12d3-a456-426655440000">
...
</data>
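
For comparison, a client following this strawman might stamp the attribute once, when it first creates the instance. This Python sketch uses the attribute name instanceID and the bare (unprefixed) UUID from the example above; it illustrates the proposal only, and the helper name is hypothetical.

```python
import uuid
import xml.etree.ElementTree as ET


def stamp_instance_id(root: ET.Element) -> str:
    """Add an instanceID attribute to the instance root unless one exists."""
    if root.get("instanceID") is None:
        root.set("instanceID", str(uuid.uuid4()))
    return root.get("instanceID")


root = ET.Element("data", {"id": "household_survey", "version": "2018061801"})
first = stamp_instance_id(root)
assert stamp_instance_id(root) == first  # stamping again keeps the same value
print(ET.tostring(root, encoding="unicode"))
```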

[and I thank you @yanokwa and @LN for your continuing 'unflinching patience' with me... :wink: ]

[Also, this'll give me some bragging rights over @martijnr's Enketo (and javaRosa?) 'cause I can finally say I implement something 'better' than he does! :stuck_out_tongue_closed_eyes: ]

If you only knew how easy that was! There are a few skeletons in Enketo's closet, but I am of course not going to make it easier for you by revealing them :stuck_out_tongue_winking_eye:

BTW thnx for that! I haven't done so yet, but I know it needs to be done for performance reasons. So it's on my TODO list [one of the many things @martijnr and javaRosa do better than I... sigh :pensive:]

Would one of them be that Enketo thinks 1+1=1193? ... :grin:

skeleton.xml

Haha, I get 43 before the browser breaks off the infinite loop.

Your form made me realize Enketo Validate needs to detect such self-references though, so thanks!
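
Detecting such self-references amounts to finding a cycle in the graph of calculate dependencies. A minimal sketch follows, assuming the references have already been extracted from each bind's calculate expression (a real validator would parse the XPath, and the node paths here are made up). It uses a standard depth-first search that marks nodes on the current path.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes, or None if acyclic.

    deps maps each calculated node to the nodes its calculate references.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {node: WHITE for node in deps}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for ref in deps.get(node, []):
            state = color.get(ref, WHITE)
            if state == GREY:
                # ref is already on the current path, so we found a cycle.
                return path[path.index(ref):] + [ref]
            if state == WHITE and ref in deps:
                found = visit(ref)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in deps:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# A calculate that references its own node, e.g. calculate="/data/x + 1" on /data/x.
print(find_cycle({"/data/x": ["/data/x"]}))  # ['/data/x', '/data/x']
print(find_cycle({"/data/a": ["/data/b"], "/data/b": ["/data/c"]}))  # None
```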

So what yer saying is: Don't Panic, Enketo will give up after 42 attempts to quiesce?... :wink:

JavaRosa's implementation of the uid preload was released in v2.12.0 and Collect v1.18.0 (10/29/2019). The pyxform update that went out some hours ago (thanks, @Ukang_a_Dickson!) and the corresponding updates to XLSForm online and offline (thanks, @yanokwa!) contain the change so that the instanceID is set by the uid preload.

Thanks to everyone involved in making these decisions and changes happen.