Unexpected behaviour of the now() function

sebmercier · January 17, 2019, 8:56pm

Hi everyone,

As suggested by @yanokwa (thanks!), I will copy/paste here an ODK related issue that I posted on GitHub yesterday. Here it goes, and thanks in advance for your help:

This is my first contribution here [i.e. on GitHub, but on the ODK Forum as well!], from a freshly created GitHub account, so I hope this the right place (and procedure) to post my comment. Please let me know if that's not the case or if I should create a new issue for this.

I'm precisely coming from the link posted by @adammichaelwood ( https://docs.opendatakit.org/form-logic/#when-expressions-are-evaluated ), because I realised that the now() function doesn't behave (anymore?) as described there. Used on its own (as in the "datetime_last_saved" example), it is evaluated only once when the survey is opened. I've read several discussions here [on GitHub; I couldn't find anything on the ODK forum] about this subject (and related ones) over the past few weeks and I'm not sure where the debate is at today. But because the ODK documentation is an important source of information for XLSForm coders such as me, I thought I should point out this problem. (BTW, at the end of this same documentation section, the calculation formula should read if(${age} = '', '', once(now())) instead of if(age = '', '', once(now())). I wasted more time than I care to admit because of this, back in December, but really, that's on me. )

Here is some context, in case you'd have some guidance on the particular problem I'm facing. We're suspecting massive "Interviewer Fabrication" (to quote the title of Benjamin Birnbaum's dissertation, linked in another discussion [on GitHub] by @LN) in some of our recent assessments, initially based on how quickly the surveys were filled. So to be able to monitor this in the future, we'd like to calculate the duration of each interview in-survey. The reasons for using this instead of the more elaborate "audit" feature (which I haven't tested yet) are 1) to keep it light, and 2) to give the survey managers an easy way to monitor this indicator on the KoBo online interface, even mid-assessment, so they can take measures if necessary.

My first implementation of this was to use a "calculate" variable with ${end}-${start} (and some time unit conversion). It worked really well in my first tests with Enketo, but I realised a few days ago that ODK/KoBo Collect behave differently: variables using ${end} are not re-evaluated when the survey is finalised and their values remain empty.

Yesterday, I finally came up with another solution: I'm still using ${start} as the initial timestamp, but I'm now using if(${some_variable}='','',now()) for the final timestamp, again with some time unit conversion. I'm not sure why exactly (the behaviour of now() is still a bit of a mystery to me), but this seems to work as I want it to. (See attached XLS form: Convert timestamps and calculate durations within survey.xlsx) However, I still have two problems with this solution:

I don't really know why it works and how now() works, so I'm afraid it may not work anymore in the future if the behaviour of now() changes at some point.
It uses a reference to an unrelated variable (which, in addition, needs to be required and always relevant), which may need to be changed when coding a different survey: I'd like to find a more self-contained solution.

Any insight into this matter would be greatly appreciated.

yanokwa · January 21, 2019, 6:13am

Hi @sebmercier, thanks for the detailed message! I'll take on the issues you raised piece-by-piece.

Are the docs wrong?

ODK Collect, from May 2018 and likely before, should behave as is described at https://docs.opendatakit.org/form-logic/#when-expressions-are-evaluated. I have tried the example in the docs with ODK Collect v1.18.2 (the latest version) and it works as described. That is, when I use now() in a calculate, I get the datetime of the last finalized save.

I've sent in a pull request to fix the minor ${age} issue you found at https://github.com/opendatakit/docs/pull/943. In the future, please do file issues and send in pull requests if you find mistakes in the docs. We really need help with the docs, so if you find those docs valuable and you want to contribute, let me know and we can help you get started.

Why forks of Collect and Enketo might behave differently

ODK Collect and Enketo use different form processing engines. While we work really hard to make sure their behavior is the same (Enketo's developer is on the ODK technical steering committee), their behavior can be different.

Kobo Collect is a clone of ODK Collect with the icons changed, so you'd think the behavior should be the same, but it's sometimes not because it's a clone from April 2018. We've shipped a lot of fixes and features since then...

Recommended ways to do timestamping

If you do have interviewer fabrication, the form audit log is what I'd recommend you use. It should work really well for you. If it doesn't let us know! P.S. We'll be adding background GPS coordinates to the audit log in the next few weeks!

If you need lightweight timestamping, try the technique described by @LN at Timestamping responses - #3 by LN. Personally, I'd do this in addition to turning on the audit log so you have both data for quick review and for detailed analysis.

One more thing

When you get a chance, please introduce yourself here. It helps build community!

sebmercier · January 21, 2019, 6:09pm

Hi @yanokwa, and thanks for your detailed answer!

About now()
My previous tests were with ODK Collect v1.18.2, but on a rather old smartphone running Android 4.2.2. To make sure my problems with now() were not related to that, I've just tested the following form on a more recent phone running Android 7.0 and ODK Collect v1.17.2 (October 2018): test once and now functions.xls (23.5 KB). However, I still get a value for now() that corresponds to the (last) opening of the form.

Here is an example list of values that I get for the calculated functions used in this form:

start: 15:27:07.411
end: 15:29:08.191
once(now()): 15:27:07.436. As expected, this is shortly after start.
now(): 15:27:07.439. This is the unexpected result: it's basically equal to start instead of end.
if(${age}='','',once(now())): 15:27:55.967. As expected, this is the timestamp of when the value of ${age} was set. (To be exact, it is the timestamp of the last time that the value of ${age} went from empty to non-empty: if you set it, then empty it, then set it again, the latter will determine the timestamp. But still, that's expected.)
if(${age}='','',now()): 15:29:08.188. This is shortly before end, which means it gets re-evaluated when saving/finalising the survey. Is it expected? I don't know anymore, but this is the trick that I use as a final timestamp to compute the duration of the survey, as mentioned at the end of my first message.

Am I missing something here? Is there something wrong in my XLSForm?

About timestamping in general
I will probably test the audit log feature at some point, but I doubt that we'll have the time/staff resources to analyse such data in the long run. That's why I'm aiming for a simple, relatively easy-to-monitor "survey duration" indicator that may become a standard part of our assessments. We may also try and add additional timestamps with the if(${variable}='','',once(now())) method that you mention.

Misc.
Thanks for the pull request concerning the documentation. If I find anything else worth mentioning, I'll sure look into how to do that myself!

About the differences between Enketo and ODK/KoBo Collect, I imagined it would be something like that, but thanks for the clarification. We usually use ODK Collect on the phones and KoBoToolbox as the platform to manage our assessments.

yanokwa · January 21, 2019, 6:47pm

Assuming you have timestamp.xlsx like so:

+-----------+----------+-------+-------------+
|   type    |   name   | label | calculation |
+-----------+----------+-------+-------------+
| start     | start    |       |             |
| end       | end      |       |             |
| calculate | once_now |       | once(now()) |
| text      | text     | text  |             |
| calculate | now      |       | now()       |
+-----------+----------+-------+-------------+

If you open the form in ODK Collect v1.18.2, save it as finalized, edit it a minute later, and then save as finalized. This is the output on the SD card. Note that now is similar to end.

<?xml version='1.0' ?>
<timestamp id="timestamp" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:jr="http://openrosa.org/javarosa" xmlns:odk="http://www.opendatakit.org/xforms" xmlns:orx="http://openrosa.org/xforms" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
	<start>2019-01-21T10:35:59.535-08</start>
	<end>2019-01-21T10:37:27.650-08</end>
	<once_now>2019-01-21T10:35:59.536-08</once_now>
	<text>789</text>
	<now>2019-01-21T10:37:21.979-08</now>
	<meta>
		<instanceID>uuid:ca51ec70-44e0-49f2-b9c5-978919e3face</instanceID>
	</meta>
</timestamp>

sebmercier · January 21, 2019, 7:46pm

It is similar, but the difference of 6 seconds between now and end, I assume, is the time during which the survey was open the second time. My understanding is that now() is evaluated only once at each opening of the survey (hence my usage of "the (last) opening" in my previous post), and that your now variable (10:37:21.979) is when you reopened the form, not when you last saved it.

yanokwa · February 4, 2019, 12:43am

@sebmercier, I haven't forgotten about you! I've filed an issue at https://github.com/opendatakit/docs/issues/961 and will try to dig into this more over the next week.

sebmercier · February 5, 2019, 5:03pm

Hey, no worries! I've been quite busy myself these past two weeks. Thanks a lot for opening the issue: I'll try to keep an eye on the discussion regularly enough (and possibly participate if I feel I can contribute in any way).

LN · March 19, 2019, 7:55pm

There's nothing special about how now() is evaluated -- it follows the same evaluation rules as any expression. But the tricky thing is that those evaluation rules are geared towards expressions that only deal with form data and now() deals with state outside the form: time. random() is similar in that it's accessing a pseudorandom number generator outside the form.

Different form engines can and should implement efficiency measures to avoid recomputing expressions unnecessarily. Those shouldn't affect the behavior of expression evaluation but in practice they do in the case of functions that access state outside the form.

In particular, Collect keeps track of form fields that are related to each other through calculations and uses that information to determine what needs to be recalculated. This approach works really well for calculations that use form values. For example, it's clear that if you have a field set to the result of the expression ${age} + 1, you don't need to recompute it unless ${age} changes. And conversely, if it's recomputed only when ${age} changes, the value is always expected and the behavior is indistinguishable from if it were recomputed each time anything in the form changed.

In the case of expressions that only access state outside the form, the efficiency measures may lead to inconsistent behavior depending on what, if anything, has changed during a single form editing session, the particular implementation, etc.

I believe what is going on with now() vs if(${age}='','',now()) is that the engine sees the former as not being worth recomputing because it "couldn't have changed" whereas the second refers to a form element which could have changed.

All that to say that the datetime_last_saved example should definitely be removed. Perhaps the documentation should simply say that functions that only access state outside the form (now() and random()) should never be used on their own because exactly when they are evaluated is not precisely defined between implementations.

CC @ggalmazor for fact checking my explanation

LN · March 29, 2019, 11:55pm

I have submitted a pull request to remove the incorrect example and simplify some of the language at https://github.com/opendatakit/docs/pull/1004.

As I was doing that, I went back to @sebmercier's original question about ${end} and had an additional thought. Why not just store the values of ${start} and ${end} and compute the difference in analysis? ${end} in Collect is just about the last thing that gets set at the, well, end.

Xiphware · March 30, 2019, 7:15am

Define ‘end’...

This is actually something I raised A while back. Presumably ‘end’ = ‘Save’. But if it is possible - in any manner whatsoever - to edit a so-called ‘saved’ Form prior to submission, then when the form is finally, deterministically, ‘done’ (aka end) becomes a bit of a greased pig...

[Bit of an aside, but this is why I took a somewhat different approach and made ‘end’ (or more precisely, inspection ‘finished’ wholly user-specified. Right up until the point of actual submission anything can be changed, freely]

sebmercier · April 18, 2019, 2:21pm

@LN, Thanks a lot for the detailed explanation of the (somewhat unpredictable) behaviour of now(). I think I now have a better understanding of it and I conclude that the if(${required_and_relevant_variable}='','',now()) method is currently the only way to reliably achieve what I'm trying to do.

As for your question about using ${start} and ${end} later in analysis, I'm assuming you refer to the metavariables start and end. I'm stressing this because in ODK Collect, a 'calculate' variable defined as ${end} will remain empty, which is precisely the first problem I ran into, as described in my original message. If so, the main (and closely related) reasons I would like to compute the duration within the survey itself are:

To spare the survey managers the need to compute it from start and end, especially since those are stored in this not-so-user-friendly ISO 8601 format. Of course, I could develop a simple spreadsheet tool that would do it all for them (I've already started to), but I'm afraid that they generally won't have/take the time to do this mid-assessment instead of after data collection is complete (if at all), which would be too late to take corrective action.
To give them a relatively easy way (although not ideal) to monitor this duration indicator directly on the KoBo online interface, e.g. at the end of each day of data collection. This could be done without having to download and analyse the dataset each time and would allow them to react quickly if needed.

Finally, thanks for the pull request concerning the ODK documentation. However, now that I've reread this "When expressions are evaluated" section, I believe the following paragraph is a bit confusing:

The once() function prevents multiple evaluation by only evaluating the expression passed into it if the node has no value. That means the expression will be evaluated once either on form open or when any values the expression depends on are set.

My understanding (after a few tests that I've just done) is that the first sentence is correct and corresponds to the definition of once() given here:

once(expression): Returns the value expression if the question's value is empty. Otherwise, returns the current value of the question.

On the other hand, the second sentence is misleading, because once(expression) will actually return expression whenever the node is empty. That is, if the node remains empty even if expression is not (depending on the node's formula), the output of once(expression) can still change. In particular, I've just realised that the example given at the end of the section, if(${age}='','',once(now())), should be corrected, because it actually returns the last time (not the first) that ${age} went from empty to non-empty, as I pointed out earlier in this thread:

If you empty ${age} after setting it, the if(...) formula will evaluate to empty again.
If you then reset ${age} to a non-empty value, once(now()) will be re-evaluated at this particular time.

The correct way to record the first time that ${age} was set, I think, would be to use this formula:

once(if(${age}='','',now()))

This will continuously evaluate to an empty value until ${age} is set for the first time, after which it won't be able to change, because the node cannot be emptied again.

Am I getting this correctly? Should I maybe start a new thread to discuss this particular problem, since it concerns the once() function and no longer the now() function?

Also, please let me know if any part of my explanation is unclear: I've read over my message time and time again, but I don't know anymore!

LN · January 15, 2021, 9:27pm

Apologies for the long silence, @sebmercier. I hope the solution you ended up finding was workable for your project.

This thread popped back up as I was writing some documentation for a new strategy for lightweight timestamping: https://github.com/getodk/docs/pull/1308/files.

It's now possible to explicitly trigger calculations on value change so that you can do things like calculate the time elapsed between filling q1 and q3. An XLSForm example:

type	name	label	calculation	trigger
string	name	Name
dateTime	name_last_update_time		now()	${name}
string	life_story	Life story
dateTime	life_story_last_update_time		now()	${life_story}

It should now be much more explicit why this is a "last update": every time the trigger question changes, the timestamp updates.

For capturing a first update timestamp, the calculation would look something like if(${name_first_update_time}='', now(), ${name_first_update_time}) or coalesce(${name_first_update_time}, now()).

once has a misleading name and is hard to reason about, as you noted. Our goal would be for these triggered calculations to entirely replace it.

I've also filed an issue for calculations that use end not being evaluated.