New number-or-zero() XPath function

Xiphware · December 10, 2018, 7:36am

Good news (and not so good news)... After more experimentation I've found I can actually successfully register a custom libxml2 XPath extension to an existing function - which means I could in fact implement a new number(x,default) function (as well as ODK's round(number,places) Yay!) iff I first explicitly de-register the pre-existing native XPath function of the same name!

The bad news: you've basically 'lost' the original native function, so your new custom function must re-implement all the existing functionality of the original, plus whatever new functionality you've added. Sigh. Oh well, at least it's possible.
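To make the re-implementation cost concrete, here is a minimal Python sketch of what a replacement number(x, default) has to cover: the XPath 1.0 string, boolean and number conversions of the native number(), plus the new fallback. The name `xpath_number` is hypothetical, node-set arguments are left out, and this is an illustration of the semantics rather than the libxml2 code.

```python
import math
import re

# XPath 1.0 Number production: Digits ('.' Digits?)? | '.' Digits,
# optionally preceded by a minus sign. Exponents are not allowed.
_XPATH_NUMBER = re.compile(r"-?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def xpath_number(value, default=None):
    """Hypothetical number(x, default): native number(x) plus a fallback."""
    if isinstance(value, bool):
        # Booleans convert to 1 and 0.
        result = 1.0 if value else 0.0
    elif isinstance(value, (int, float)):
        result = float(value)
    else:
        # Strings: surrounding whitespace is ignored, anything that is
        # not a valid Number becomes NaN.
        text = str(value).strip(" \t\r\n")
        result = float(text) if _XPATH_NUMBER.fullmatch(text) else math.nan
    if math.isnan(result) and default is not None:
        # The new behaviour: fall back to the default instead of NaN.
        return xpath_number(default)
    return result
```

Called with one argument it behaves like the native function (an empty string gives NaN); called with two, as in `xpath_number("", 0)`, the NaN is replaced by the default.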

That still leaves open my question whether such changes to existing XPath functions, in the same namespace, are legit spec-wise...?

martijnr · December 10, 2018, 5:20pm

Strictly speaking it is probably not the right thing to do, but we agreed to be lenient with functions (only) to be able to hide namespaces from users (see long discussion).

Xiphware · December 10, 2018, 8:44pm

I'm a bit confused then by your Jan 14 comment on the thread:

Perhaps there is no real benefit to namespacing new functionality after all?
There is. It is to avoid collisions between custom extensions by different parties and with the W3C spec(s) (ie. W3C-spec-compliant libraries that may be used in data collection clients).

Which seems to be exactly the (my) problem here, which namespacing would prevent, right?

I guess I'm 'OK', knowing now that at least there's a mechanism by which I can accomplish these custom extensions to existing functions. But I'm not sure I'm 100% comfortable not-doing-what-we-should-be-doing - ie prefixing custom extensions to avoid collisions - just to save XLSForm users from having to remember to type in a function prefix.

Should we instead leverage pyxform to (further) insulate users from XML XForms? pyxform is already checking XPath arguments, so it should be relatively simple for it to check the number of arguments and substitute either number(num) or orx:number(num,default) as appropriate in its generated XPath expression output. This lets the core javaRosa/ODK stuff remain strictly spec conformant, while pyxform/XLSForm continues to provide the more user-friendly 'interface' to XForms/XPath.
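A rough Python sketch of that substitution, assuming nothing about pyxform's real internals: the helpers `split_args` and `prefix_overloads` are hypothetical names, and the scanner is deliberately simple (it would also rewrite a `number(` that happens to appear inside a string literal).

```python
import re

# Match an unprefixed call to number(, but not foo-number( or orx:number(.
_CALL = re.compile(r"(?<![\w:.-])number\s*\(")


def split_args(expr, start):
    """Return the top-level arguments of the call whose '(' is at start."""
    args, current, depth, quote = [], [], 0, None
    for ch in expr[start:]:
        if quote:
            # Inside a string literal: copy until the closing quote.
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
            current.append(ch)
        elif ch == "(":
            depth += 1
            if depth > 1:
                current.append(ch)
        elif ch == ")":
            depth -= 1
            if depth == 0:
                arg = "".join(current).strip()
                if arg or args:
                    args.append(arg)
                return args
            current.append(ch)
        elif ch == "," and depth == 1:
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    raise ValueError("unbalanced parentheses")


def prefix_overloads(expr, prefix="orx"):
    """Prefix two-argument number() calls; leave one-argument calls alone."""
    out, pos = [], 0
    for match in _CALL.finditer(expr):
        if len(split_args(expr, match.end() - 1)) == 2:
            out.append(expr[pos:match.start()])
            out.append(prefix + ":")
            pos = match.start()
    out.append(expr[pos:])
    return "".join(out)
```

With this, `number(${a}) + number(${b}, 0)` would come out as `number(${a}) + orx:number(${b}, 0)`, so the form author never types the prefix.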

martijnr · December 12, 2018, 5:40pm

I'm a bit confused then by your Jan 14 comment on the thread:

Sometimes, I put aside my purist tendencies...

I agree it would be good to consider having pyxform deal with adding a namespace for changes to native functions. We may have already gone too far with deviations, but I guess that's not an argument to make it worse (and we could also try to correct).

Should we be primarily concerned about deviating from XPath 1.0, 2.0 and 3.0? Or also about deviating from XForms 1.0, 1.1?

I did a quick look up in our XForms specification.

So far our XPath deviations are concat(), round() and string-length().

Our XForms deviations are position(), boolean-from-string(), random() (which would be fixable) and now().

Xiphware · December 12, 2018, 7:41pm

Well, I'm known to abandon my purist tendencies at the drop of a half-decent hat too... Unfortunately, in this case, it looks like it's not merely a moral failing, but has real consequences in terms of development effort: you've basically got to re-implement the entire original function in order to add an 'extension' (at least for libxml2; I'm not sure if the same applies to Enketo's XPath lib).

Xiphware · December 12, 2018, 8:55pm

It would be nice if we could just get away with saying "XPath 1.0 support required", plus such-n-such custom 'odk:' and 'jr:' extension functions. [full disclosure: I say that somewhat selfishly, because Daniel has made it pretty clear libxml2 won't ever support XPath 2.0].

Oops, that should probably be "XPath 1.0 support required", plus these XPath2 functions (if, regex, ...), plus such-n-such custom 'odk:' and 'jr:' extension functions. Since I don't expect javaRosa will ever support a full XPath2 with all its axes...

That seems less an issue, simply because ODK is - for all practical purposes - pretty much the last word in XForms these days [sic].

Xiphware · December 16, 2018, 9:35pm

So before I proceed any further, we probably need to come to some sorta consensus around this: namely, (1) that ODK extensions to existing XPath functions should be added to a different namespace, specifically those adding additional arguments; and if so, (2) that we in addition exploit pyxform to identify - by number of arguments - and add the appropriate namespace prefix to identically-named XPath functions (to better insulate XLSForm users from these implementation details).

It makes little difference to javaRosa per se, because javaRosa custom implements all this XPath stuff itself, so adding additional arguments to existing functions is only nominally different from adding a new function. But it appears that duplicating XPath functions (within the same namespace) can have some serious repercussions in other environments that use other native XPath libraries (Enketo, libxml2, ...) for performance reasons. But if we take this stance, then, as @martijnr suggests, we also probably need to revisit some of the 'mistakes' that have already been made and maybe come up with a plan for when, or even if, to fix them.

Is this something we can decide here between ourselves? (Most everyone who really cares is probably on this thread. With the obvious exception of @LN of course... But hey, when the cat's away the mice can play, eh!) Or does this need broader TSC discussion and review?

LN · December 18, 2018, 8:28pm

I am still "stuck under a baby" so thanks for bearing with my hit-and-run impressions!

Here is how I would characterize the challenge that needs to be addressed:

form nodes can be empty
empty values need to be handled explicitly in certain contexts (e.g. using aggregation operations)
XLSForm use doesn't require any experience programming and dealing with empty values is a rather foreign concept for someone who hasn't programmed before
it's not always obvious that a node can be empty

I don't think this is a problem for folks with a programming background because the if solution is quite intuitive for them. It's also close enough to a sentence that once it has been explained, users tend to remember and use it. I don't personally love coalesce and would not recommend it to users but I think as @ggalmazor has said, it's pretty clear for someone with a programming background. That suggests to me that the XPath/XForms level is the wrong level to address this at -- it's not a problem for folks who use that level directly.

I think end users get a confusing error and assume they have made a mistake. I suspect that introducing a new XPath function would just result in three different answers to the same question rather than an intuitive solution for users that they would tend to find on their own. I haven't had a chance to think through this deeply but here are the kinds of solutions I would tend to explore:

improve existing documentation for this issue and make sure the problem is easy to find on the forum (by e.g. tweaking post titles and cross linking them and making sure the full error message is searchable)
have pyxform always wrap with the if function in certain contexts. I'm not sure whether this is practical -- it would still need to be possible for users to build their own complex expressions so pyxform would have to identify and bypass that case
have clients intercept the confusing error about NaN and show a message that better guides the form designer to a solution
introduce a warning at the pyxform level to guide the form designers to understand the challenge and address it if appropriate
introduce some syntactic sugar at the XLSForm level to address this. It could even be number-or-zero that gets expanded to the if expression but I am not sure that's more intuitive. Perhaps it could be something in the parameter column. It could even be a form-level configuration (i.e. "if you run into any NaNs, always convert them to N").

I like the number approach somewhat better than number-or-zero but I still think it's unlikely it would address the confusion -- it would again just create a third answer to the question of "how do I fix this error." I suspect that the shorthand would be less intuitive and memorable than the longhand if. I'm also not thrilled about overloading an existing XPath function.

Xiphware · December 18, 2018, 8:42pm

Thanks for the great response! So my thinking, now:

number-or-zero(x) doesn't really add anything significant enough over what's already available to justify its existence.
the alternative of overloading an existing XPath function number(x,default) also arguably doesn't really add anything all that significant to the end user either [and the subtlety of returning 0 for invalid strings will be utterly lost on them]. And it probably just compounds an existing legacy problem/mistake that's already been made, namely overloading existing XPath functions in the same namespace, requiring complete reimplementation.

So it probably makes sense to withdraw this, and perhaps think how we might address the problem at a higher - ie XLSForm/pyxform - level, with better documentation, etc. as @LN suggests.

Xiphware · December 18, 2018, 9:46pm

This may be quite doable in pyxform: when substituting /data/foo for ${foo} while translating any calculations, pyxform can check whether any referenced field ${foo} is required or not [eg its binding is anything but required=false()]. If not, then there is the potential for ${foo} to be null, so either throw a warning, or even automatically insert an appropriate coalesce(${foo},0), with no user interaction required.
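A minimal Python sketch of the automatic variant, under stated assumptions: `guard_optional_refs` and the `required_by_name` mapping are hypothetical and not part of pyxform, and the rewrite does not notice references that are already wrapped by the form author.

```python
import re

# XLSForm-style field reference, e.g. ${foo}.
_REF = re.compile(r"\$\{([^}]+)\}")


def guard_optional_refs(calculation, required_by_name, default="0"):
    """Wrap references to non-required fields in coalesce(ref, default)."""

    def replace(match):
        name = match.group(1)
        if required_by_name.get(name, False):
            # Required fields cannot be empty, so leave them untouched.
            return match.group(0)
        return "coalesce(" + match.group(0) + ", " + default + ")"

    return _REF.sub(replace, calculation)
```

For example, with `a` required and `b` optional, `${a} + ${b}` would become `${a} + coalesce(${b}, 0)` before the usual substitution of /data/... paths.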

Xiphware · December 18, 2018, 9:58pm

Hmm... somehow my original XPath feature request (ostensibly for my Objective-C based iXForms) has transmogrified itself into a python-based pyxform feature, passing thru java-based javaRosa in the process...

I guess I should put my "Java for Dummies" back on the shelf, and pull out "Python for Dummies" now.

Xiphware · December 18, 2018, 10:54pm

So if there are no objections, I'll close out this feature request (soon-ish), and perhaps open something suitable against https://github.com/XLSForm/pyxform/issues. OK?

yanokwa · December 19, 2018, 9:11pm

If number-or-zero() has brought @LN out of maternity leave, then I think it's best we close those issues

Docs help, but not everyone reads the docs. Catching the error in Collect seems unhelpful because the empty node can happen after the form has been deployed. And finally, I don't like pyxform doing magical things in certain contexts because historically, it has bitten us (see or_other).

I think adding a warning in pyxform is the best place because that's when the form designer can make the appropriate choice. And to be specific, if we see an integer or decimal that is used downstream in a calculation, we warn that it can be NaN and that users should consider using an if or coalesce to set a default value.
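As a sketch of what such a check might look like (in Python, with a hypothetical `nan_warnings` helper and rows modelled as plain dicts keyed by the XLSForm survey columns type, name and calculation; this is not pyxform's actual data model):

```python
import re

# XLSForm-style field reference, e.g. ${age}.
_REF = re.compile(r"\$\{([^}]+)\}")
NUMERIC_TYPES = {"integer", "decimal"}


def nan_warnings(rows):
    """Warn for each numeric field that is referenced in a calculation."""
    numeric = {row["name"] for row in rows if row.get("type") in NUMERIC_TYPES}
    warnings = []
    for row in rows:
        calculation = row.get("calculation") or ""
        # De-duplicate so a field used twice in one calculation warns once.
        for name in sorted(set(_REF.findall(calculation))):
            if name in numeric:
                warnings.append(
                    "'" + name + "' is used in the calculation of '"
                    + row["name"] + "' and can be NaN when empty; "
                    "consider if() or coalesce() to set a default value."
                )
    return warnings
```

The check only flags direct references; following a value through chains of intermediate calculations would need real dependency tracking.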

Xiphware · December 19, 2018, 9:21pm

Ha! number-or-zero() was so bad it managed to raise @LN from the dead!

Unfortunately, I suspect all the downstream, dependency-checking code currently only exists in the javaRosa/Enketo calculation engines, and pyxform doesn't presently do anything here? [can anybody confirm?]. If that's the case, it might be non-trivial. But that can be a discussion for pyxform.... Case closed.