New number-or-zero() XPath function

Xiphware · October 29, 2018, 9:53pm

What is the general goal of the feature?
A new ODK XPath function that returns a valid number for arithmetic calculations even when the input is null (or otherwise invalid), in which case it returns 0.

What are some example use cases for this feature?
Presently, null inputs have to be converted to 0 - to avoid the dreaded 'NaN' problem - using either coalesce(${x},0) or if(${x}!='',${x},0). Both are rather verbose, and so error-prone, and involve evaluating XPath expressions containing multiple function arguments. It's also unclear, from a strictly XPath data type perspective, what datatype these should return. It would be more convenient, and somewhat faster, to have a specific XPath function for this common use case, along the lines of number-or-zero(${x}), which would perform the equivalent of: if(number(${x}) != NaN, number(${x}), 0)
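The intended semantics can be sketched in Python as follows. This is only an illustration, not the JavaRosa or libxml2 implementation: the names xpath_number and number_or_zero are hypothetical, and the regex merely approximates XPath 1.0's string-to-number rule. One caveat on the pseudo-expression above: NaN never compares equal to anything, itself included, so a literal "!= NaN" test would always be true. The sketch therefore uses an explicit is-NaN check.

```python
import math
import re

# Approximation of XPath 1.0 string-to-number: optional whitespace, optional
# minus sign, then digits with an optional fraction (or a bare ".5" form).
NUMBER_RE = re.compile(r"\s*-?(\d+(\.\d*)?|\.\d+)\s*")


def xpath_number(text):
    """Hypothetical stand-in for XPath number(): NaN for anything non-numeric."""
    if text is None or not NUMBER_RE.fullmatch(text):
        return math.nan
    return float(text)


def number_or_zero(text):
    """Hypothetical number-or-zero(): number(text), or 0 when that would be NaN."""
    n = xpath_number(text)
    return 0.0 if math.isnan(n) else n


# Why "x != NaN" cannot be the test: the comparison is true even for NaN itself.
print(math.nan != math.nan)
print(number_or_zero("12.5"), number_or_zero(""), number_or_zero("abc"))
```

An unanswered question (empty string) and a non-numeric answer both come back as 0, so either can be used directly in a sum.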

What can you contribute to making this feature a reality?
I can implement it for libxml2.

Xiphware · November 26, 2018, 9:31pm

bump.

Any interest in this, as a (simpler) alternative for dealing with the not uncommon NaN problem in arithmetic calculations?

yanokwa · November 26, 2018, 10:51pm

I don't have an opposition to the feature. Perhaps this can be your foray into contributing to JavaRosa and the spec?

Xiphware · November 26, 2018, 11:13pm

I'll just go dust off my "Java for Dummies"...

[Aside, but as it happens, I only just this morning finally finished implementing if(cond,true,false) for libxml2... And in case yer wondering how in the heck I managed without if() till now, you can actually accomplish it with just the basic XPath 1.0 math & string functions, although it's gawd-awful messy! lol]

Xiphware · November 27, 2018, 8:53pm

github.com/getodk/javarosa

add new number-or-zero() XPath function

getodk:master ← tiritea:patch-1

opened 08:52PM - 27 Nov 18 UTC

tiritea

+8 -0

Returns 0 instead of NaN if the argument is not a valid numeric expression, e.g. number-or-zero(${foo}). Specifically intended to replace the more long-winded coalesce(${foo},0) or if(${foo}!='',${foo},0).

Closes #

#### What has been done to verify that this works as intended?

#### Why is this the best possible solution? Were any other approaches considered?
This solution is intended to make more 'complicated' (and mistake-prone) solutions less necessary.

#### Are there any risks to merging this code? If so, what are they?
There shouldn't be. This new XPath function won't ever get used unless a form writer explicitly adds it to an XForm expression, so it will not affect any existing forms.

yanokwa · November 28, 2018, 5:29am

@martijnr I think we should have had a spec discussion before we got to the PR stage, but we shouldn't let perfect be the enemy of good!

Would this number-or-zero function be acceptable to you as an addition to the spec? Any namespace considerations we should be thinking of?

Xiphware · November 28, 2018, 5:54am

Actually, "number-or-zero" was @martijnr's idea - I had my heart set on "numberOr0", but I acquiesced to his better judgement [apparently he doesn't like camels...]

ggalmazor · November 28, 2018, 11:53am

Sorry to be nitpicky, but adding a function for a specific application of coalesce sounds a little bit off.

coalesce is already a broadly used verb for this operation in the context of data. In any case, I think that it should be called coalesce-zero.
I don't get how number-or-zero(${x}) is less verbose than coalesce(${x}, 0) (it's longer!).
If the context is arithmetic operations, why zero? Why not one instead? If you're going to multiply numbers, you need a one. Other aggregation operations in other contexts could require other neutral terms, which makes this one look weirdly specific.

If the problem we're trying to solve is handling null values while aggregating nodesets, we could change those sum(), max(), min(), etc. functions to gracefully handle null values with proper neutral values in the context of the operation we're trying to do e.g. Integer.MIN_VALUE in the context of max().

martijnr · November 28, 2018, 6:03pm

Thanks for posting this. Sorry, I somehow overlooked it.

it's longer!

I think it may only make sense when it covers 'NaN' and any non-numbers, which if(number(${x}) != NaN, number(${x}), 0) does and coalesce doesn't. Whether it is useful enough to deserve its own shortcut function, I don't feel qualified to comment on. It would be good to hear from users as well.
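To make that distinction concrete, here is a small Python sketch. It rests on assumptions: coalesce, xpath_number and number_or_zero are hypothetical stand-ins written for illustration, with coalesce following the documented "first non-empty value" behaviour and the regex approximating XPath 1.0's string-to-number rule.

```python
import math
import re

NUMBER_RE = re.compile(r"\s*-?(\d+(\.\d*)?|\.\d+)\s*")


def xpath_number(text):
    """Hypothetical stand-in for XPath number(): NaN for non-numeric text."""
    return float(text) if NUMBER_RE.fullmatch(text) else math.nan


def coalesce(first, second):
    """First non-empty value of the two arguments."""
    return first if first != "" else second


def number_or_zero(text):
    """Hypothetical number-or-zero(): 0 whenever number(text) would be NaN."""
    n = xpath_number(text)
    return 0.0 if math.isnan(n) else n


# Compare both approaches for a numeric, an empty, and a non-numeric answer.
for answer in ["7", "", "n/a"]:
    via_coalesce = xpath_number(coalesce(answer, "0"))
    via_proposal = number_or_zero(answer)
    print(repr(answer), via_coalesce, via_proposal)
```

An empty answer becomes 0 either way, but a non-empty, non-numeric answer such as "n/a" passes straight through coalesce and still ends up as NaN in arithmetic.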

we could change those sum(), max(), min(),

I would not be in favor of changing those. We'd be changing native XPath functions. It can be very useful for sum() to return NaN until all nodes in a node-set have a value.

he doesn't like camels

Indeed!

Xiphware · November 28, 2018, 7:37pm

Not nitpicky at all - your comments are most welcome. Basically, the 'problem' this is specifically trying to solve is the quite common mistake newbie (XLSForm) writers make where they - somewhat naturally - assume (numeric) questions that are not answered will be zero when used in any subsequent calculations. Addressing this can certainly be accomplished with coalesce(${foo},0) [which in turn is arguably redundant since if(${foo}!='',${foo},...) can do anything coalesce() does...]. But for someone living in XLS-land - who is used to just dealing with 'variables' like ${foo}, and has little or no concept of XML nodesets and XPath - the concept behind coalesce() is pretty obscure...

coalesce(arg, arg)
Returns first non-empty value of the two args. Returns an empty string if both are empty or non-existent.

"huh?"

I think it'll be a bit more self-explanatory, for XLSForm writers, to just say if you want to include values in a calculation that might be unanswered/null, you should put number-or-zero(${foo}) instead of just ${foo}.

A minor technical advantage of number-or-zero(x) over coalesce(x,y) is that the latter can require two XPath argument evaluations, whereas the former only ever requires one. It's a minor efficiency advantage, but when, say, a summation calculation involves lots of operands, every one may need to be recalculated until the calculation quiesces.

If the consensus is that number-or-zero() is too specific for general use and unnecessarily redundant, then I'm happy to withdraw the proposal. However, I suspect the opposite may in fact be true - pretty much the only time I ever see coalesce() being used is for this exact purpose, and if a simpler alternative like number-or-zero() were available it would probably get used instead (and we may hardly see coalesce() popping up anymore...). Which I guess you could consider an argument in favor of introducing it.

Xiphware · November 28, 2018, 8:29pm

Agreed. We don't want to make existing forms now return different results, nor change the behavior of existing W3C XPath functions unless there is an extremely compelling reason to do so (round() was borderline IMHO...).

ggalmazor · December 3, 2018, 11:09am

OK, that makes sense to me now

I think what I still don't get is why a fixed zero as a default value. Are we only supporting sums?

Maybe overloading number() to admit a second optional argument would work:

It's shorter
We're not adding a new function, nor changing current behavior
We're not deciding a fixed value for our users, supporting other scenarios for free, like multiplication, or any other thing users want
number(a, b) calls number(a), which somehow makes more sense and is more cohesive.
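The overload idea can be sketched in Python with an optional second parameter. This is a sketch under assumptions, not a proposed implementation: the function name number mirrors the XPath one, but the default parameter, the helper regex and the sample values are hypothetical.

```python
import math
import re

NUMBER_RE = re.compile(r"\s*-?(\d+(\.\d*)?|\.\d+)\s*")


def number(text, default=math.nan):
    """Hypothetical number(a, b): like number(a), but returns b instead of NaN.

    With the second argument omitted, the fallback is NaN itself, so the
    one-argument behaviour is unchanged.
    """
    n = float(text) if NUMBER_RE.fullmatch(text) else math.nan
    return default if math.isnan(n) else n


# The caller picks the neutral value that suits the operation.
answers = ["4", "", "2.5"]
total = sum(number(a, 0) for a in answers)          # 0 is neutral for sums
product = math.prod(number(a, 1) for a in answers)  # 1 is neutral for products
print(total, product)
```

Existing one-argument calls keep returning NaN for unanswered questions, which is the "not changing current behavior" point from the list above.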

Xiphware · December 3, 2018, 9:55pm

Interesting idea @ggalmazor, I hadn't thought of it that way!

Although it involves more arguments (and is therefore less efficient), I do certainly see an appeal to a number(${foo},x) function that attempts to perform a number(${foo}) but now allows you to specify a result instead of NaN if it can't (which will probably be 0 in 99% of cases). It's also better than coalesce(${foo},0) in its handling of NaN, plus the name "number" is less obscure to newbies than "coalesce".

I'm agreeable to this change. @martijnr? @yanokwa?

yanokwa · December 3, 2018, 10:05pm

There's something very elegant about number(${foo}, default) and we can use that idea elsewhere. I'm OK with it. @martijnr, is this OK for you?

Xiphware · December 4, 2018, 2:17am

In my defense, 99.9% of the time NaNs seem to blow up calculations 'cause somebody assumed an unanswered question would be 0...

martijnr · December 5, 2018, 6:11pm

I can see overriding number() with a second argument is the most attractive solution for users, so it's fine with me.

It's a little problematic for Enketo developers (and perhaps @Xiphware?) that try to leverage a native XPath evaluator as much as possible (for a ~100x performance improvement), but I'm sure those poor folks can figure out a way, right @alxndrsn?

Xiphware · December 5, 2018, 9:12pm

Good point! I'll need to see how (or if!?) I can catch libxml2 errors when the built-in number() function fails due to an 'invalid' number of arguments, so that I can launch mine.

martijnr · December 7, 2018, 3:37pm

Right, or perhaps a regex replace for 2-arg usage to if(number([1]) != NaN, number([1]), [2]) before sending to the evaluator.

Xiphware · December 9, 2018, 9:34pm

Presumably Enketo's XPath support is already somehow handling these 'extensions' to existing XPath functions, since you have to do something similar to handle ODK's round(number, places), which extends the base XPath 1.0 round(number) function with an additional optional 2nd parameter?

Xiphware · December 10, 2018, 1:56am

Ewww.... yuck! [I can't imagine the regex necessary to correctly deal with substituting different flavors of nested number() calls...]
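For what it's worth, nested calls are easier to handle with a small parenthesis-aware scanner than with a regex. The Python sketch below is only an illustration of that pre-processing idea and is not how Enketo or libxml2 does anything: all function names are hypothetical, it ignores whitespace between "number" and "(", and it uses the self-comparison number(a) = number(a) as the NaN test (an assumption swapped in for the "!= NaN" shorthand, since NaN is never equal to itself).

```python
import string

NAME_CHARS = set(string.ascii_letters + string.digits + "_-.:")
PREFIX = "number("


def find_close(expr, start):
    """Index of the ')' matching the '(' just before `start`, skipping quoted text."""
    depth, quote = 1, None
    for i in range(start, len(expr)):
        ch = expr[i]
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced parentheses in expression")


def split_args(text):
    """Split argument text on top-level commas only."""
    args, depth, quote, start = [], 0, None, 0
    for i, ch in enumerate(text):
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            args.append(text[start:i].strip())
            start = i + 1
    args.append(text[start:].strip())
    return args


def rewrite_number_calls(expr):
    """Rewrite two-argument number(a, b) calls, innermost first, into if(...)."""
    out, i, quote = [], 0, None
    while i < len(expr):
        ch = expr[i]
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif expr.startswith(PREFIX, i) and (i == 0 or expr[i - 1] not in NAME_CHARS):
            close = find_close(expr, i + len(PREFIX))
            inner = expr[i + len(PREFIX):close]
            args = [rewrite_number_calls(a) for a in split_args(inner)]
            if len(args) == 2:
                value, default = args
                out.append(f"if(number({value}) = number({value}), number({value}), {default})")
            else:
                out.append(PREFIX + ", ".join(args) + ")")
            i = close + 1
            continue
        out.append(ch)
        i += 1
    return "".join(out)


print(rewrite_number_calls("number(${a}, 0) + number(number(${b}, 1) * 2, 0)"))
```

Note that the first argument appears three times in each rewritten call, so deeply nested calls grow quickly. That is a real cost of pre-processing compared with a native two-argument function.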

So, it looks like libxml2 will fail to register an XPath extension having the same name as an existing function (as opposed to gracefully failing over, which I was rather hoping for...), and there is no obvious way to intercept the resulting error (in order to insert my valid value) before the entire XPath expression evaluation aborts. [So I'm not even sure I can implement support for ODK's round(number,places)!]

I think the only way to accomplish this (in libxml2) would be to have the two functions in distinct namespaces. Which begs the question, is it actually strictly legitimate - wrt the W3C XPath spec - to have different versions of the same XPath function name in the same namespace? I poked around the W3C specs a bit and didn't find anything explicitly indicating one way or the other; @martijnr, do you happen to know?