They would not because there would be no body element for the label to be nested in. They do work with calculations but then the calculation is displayed by the form filling client (and should always be marked as read-only unless the calculation is wrapped in once() but really default is the way to go now!).
This is also what @danbjoseph proposed here. I want to gently push back on this being simpler. It implies that you'd be thinking about audio recording as you author the form and probably not modifying the recording start/end. I've been imagining adding audio recording as a step that happens after a form is complete. That is, I'd like to test the form, make sure all my logic makes sense, get a feel for how it flows, and once it's final, define what portions I'd like to record. I also imagine I might adjust when recordings start and end as I try the form out and having those defined together would make that process easier.
One additional idea you might like that @seadowg and I discussed is exposing question and group attributes for cases where just a single question or group should be recorded.
If the general sense is that start/end is more intuitive, it should be possible.
Another related theme that came out of the TAB call is whether there would be a benefit to using the event/action mechanism we have instead of bind attributes. I told @Xiphware I'd write that up for consideration in the next couple of days. That would correspond more closely to this start/end XLSForm concept.
Below is an alternative XForms concept that uses events and actions. This would require introducing either a new action for recording audio (e.g. odk:startaudio) or, as I've shown below, a combination of a generic background recording action (e.g. odk:startrecording) and an attribute to indicate the type of recording (e.g. odk:type="audio").
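The XML example this post refers to did not survive in this copy of the thread. As a rough illustration only (not the original example), here is a minimal Python sketch of how a client might discover such an action in a form definition. The `odk:startrecording` element and `odk:type="audio"` attribute come from the paragraph above; the `xforms-ready` event, the `ref` target `/data/recording` and the placement as a sibling of `<bind>` are assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET

XF = "http://www.w3.org/2002/xforms"
ODK = "http://www.opendatakit.org/xforms"

# Hypothetical form fragment. The event name, the ref target and the placement
# as a sibling of <bind> are assumptions made for illustration only.
FORM = f"""
<model xmlns="{XF}" xmlns:odk="{ODK}">
  <bind nodeset="/data/recording" type="binary"/>
  <odk:startrecording event="xforms-ready" ref="/data/recording" odk:type="audio"/>
</model>
"""


def find_recording_actions(xml_text):
    """Return (event, ref, type) for each odk:startrecording action found."""
    root = ET.fromstring(xml_text)
    actions = []
    for el in root.iter(f"{{{ODK}}}startrecording"):
        # The type attribute is namespaced, the other two are not.
        actions.append((el.get("event"), el.get("ref"), el.get(f"{{{ODK}}}type")))
    return actions


print(find_recording_actions(FORM))
```

With a single generic action like this, a new kind of background data would only need a new `odk:type` value, whereas a dedicated action such as `odk:startaudio` would need a new element name per kind.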
Adding recording for a range would require introducing an event for a question being reached and a stoprecording action (the recording implicitly stops on form exit in the example above). (Side note, I don't know if it would really make sense to allow partial recording within a field-list or an Enketo form not in pages mode but it would be possible by doing something like using the value of the immediately preceding question changing as triggering this new event.)
The purist side of me really likes this. It's extremely flexible and powerful and it's consistent with concepts we've already introduced. The pragmatist side of me is concerned. I think that in XLSForm we'd only expose triggering on the odk-question-reached event and we could do a fair amount of validation at that level. But to really follow the specs, clients would need to be able to handle actions being triggered by all the events we support, mismatched start/stop actions, etc. That may be handled by our existing generic support for events and actions but I have a feeling that there will be issues. For example, xforms-value-changed events can be triggered in really quick succession and that might cause instability for media recording. If we're unlikely to expose that functionality in XLSForms anyway, I'd rather avoid it.
One of the big points @Xiphware brought up during the TAB call is that there might be other kinds of background recording and that we'd want to introduce an approach that can be extended to e.g. video, locations, humidity, etc. I don't think that there's a big advantage of one approach vs. the other for this.
In the original attributes approach I outlined, we'd have to introduce a new attribute name for each type of data to record. Alternately, the attribute name could be something like odk:auto-populate with values like audio, location, video (which I think I prefer).
In the event/action approach defined above we'd need to introduce either new actions or new types for each new type of data to record.
I think of the attributes approach as a flattened version of the actions/events one that implicitly always uses a "question reached" event to trigger recording start and stop. It's less powerful but I think that's an asset because we get more control over what can be expressed by a form. It's also generally simpler to only have to deal with the bind rather than having information about the recording in actions as well.
Thanks @LN for the heads up and for this proposal. Sorry, I had not seen it. I'll focus on the XForms side.
Functionally this seems closest to our existing preload items, and since we'd like to eventually deprecate those and replace them with setvalue actions (for consistency), I think your 2 setvalue proposals make the most sense.
I like and would very much prefer the simple <odk:startrecording> action as sibling of <bind> but as you mentioned this would depend on whether we can accept not having fine-grained start and stop control.
If this start/stop functionality really is required (really?), there is an issue with how to determine when a question is reached (the trigger question may never get focus or a value, as the user may skip it - so it would require lots of require/field-list logic to make it all work). Depending on how this could be implemented in ODK, I'm wondering if something like odk-page-shown would be precise enough and perhaps reflect better how it would be implemented anyway.
Is your primary dislike of the attribute-based option that it's not consistent? That was also one of @Xiphware's complaints. I'd argue this case is very similar to audit. In that case, we used a fixed node name to signal to clients that a certain file node should be populated by an audit. If we'd thought of it at the time I might have preferred using a bind attribute. Then the audit is configured through bind attributes which would be the same. The major difference between the two is that there's exactly one audit versus possibly multiple background recordings. Other than that they do feel fairly analogous in that we're asking the client to populate a certain field with a particular kind of data.
Yes, from what we've heard from users, it's quite important. Imagine you have a 3000-question survey and there's one section you suspect is not being handled consistently. It would be much better to ask for those few questions to be recorded than the whole survey.
Conceptually, users want to specify a range of questions and know that whenever the enumerator is operating within that range, audio is being recorded. I've been imagining that clients would pre-compute identifiers (e.g. XPath paths) for all questions between the specified start and end. They'd initiate recording when any of those nodes is "reached" (for whatever that means for the specific client and view) and stop recording when any node not in the set is "reached." This implementation concept is one of the things that has me looking away from actions/events -- we likely wouldn't actually use the events. Instead it would make the pre-computation work harder than if the information about start and end were available in the same place.
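The pre-computation idea described above can be sketched in a few lines of Python. This is an illustration under assumptions, not Collect's implementation: the names `questions_in_range` and `RangeRecorder` are hypothetical, questions are identified by plain path strings in form order, and "reached" is modelled as a single callback that the client would call however it defines that notion.

```python
from dataclasses import dataclass, field


def questions_in_range(ordered_paths, start, end):
    """Return the set of paths from start to end inclusive, in form order."""
    i, j = ordered_paths.index(start), ordered_paths.index(end)
    if i > j:
        raise ValueError("start must come before end in form order")
    return frozenset(ordered_paths[i:j + 1])


@dataclass
class RangeRecorder:
    """Tracks whether recording should be active as the user navigates."""
    recorded_paths: frozenset
    recording: bool = False
    log: list = field(default_factory=list)

    def on_question_reached(self, path):
        inside = path in self.recorded_paths
        if inside and not self.recording:
            self.recording = True
            self.log.append(("start", path))
        elif not inside and self.recording:
            self.recording = False
            self.log.append(("stop", path))


# Hypothetical form with five questions; record q2 through q4.
paths = [f"/data/q{n}" for n in range(1, 6)]
recorder = RangeRecorder(questions_in_range(paths, "/data/q2", "/data/q4"))

# Non-linear navigation: jump into the middle of the range, then back out.
for reached in ["/data/q1", "/data/q3", "/data/q4", "/data/q1", "/data/q2"]:
    recorder.on_question_reached(reached)

print(recorder.log)
```

Because the decision depends only on set membership, jumping into the middle of the range starts a recording just as reaching the first question would. No event tied to the start or end question is involved.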
That would be a better name if we're pretty confident clients with single-page views wouldn't want to use focus, presence on screen or value change in the same context. On the Collect side, since we allow non-linear paths through a form, we'd likely do the kind of implementation I described above. In other words, we wouldn't really use existing action and event implementations.
EDIT: maybe something like odk-question-reached-or-passed or odk-page-shown-or-passed would capture the concept I'm describing?
That's what still bothers me. Obviously, the desired behavior - at least for a form designer - is that they can (somehow) explicitly state the specific conditions when to auto-start and end audio recording. So this needs to be conveyed explicitly in the form definition and cannot otherwise be client specific.
In Collect there is a reliable expectation of overt user interaction involved around flipping between each question (although there is nothing in the XForm definition stating this...), but in more paged/web interfaces like Enketo (or iXForms for that matter) form navigation and interaction is more free form; there's really no explicit assumption that can be made about the order in which users may fill in questions [short of overloading the form definition with relevant dependencies...]. Even triggering it around 'entering' or 'exiting' a group is problematic: e.g. in iXForms groups are merely used to tell the form renderer to show these questions within a new tableview section, so depending on the screen size you can readily have multiple groups visible at once.
I'm not totally sure about this. This kind of artifact would be a supporting artifact for training or quality control and so I think it's more about getting some ability to cut down on what's recorded and there is likely some tolerance on exactly what is included. Naturally each client would need to explicitly document their behavior but I don't think this is a case where they all need to behave precisely the same way.
An alternative that I'd be open to is to say that we expect the start/end concepts will only be defined as applied to questions that each take up a whole screen (e.g. Collect not in field-lists or Enketo pages mode). Otherwise any specified start/end would be ignored.
That said, we still need to handle jumping around between questions and the event/action model doesn't seem very well suited to that.
Maybe that's worth going back to our feature advocates about; i.e. is there an expectation that initiating audio recording is something the form designer has a (high?) degree of control over [which they may want if it's for, say, auditing purposes], or something more left up to user/enumerator discretion...
I'm definitely not suggesting that the enumerator should have control over the recording beyond how they navigate the form. The question is more whether it'd be ok for two different clients with the same form to have slightly different recording triggers based on what their display modality is. Each client would still precisely define what it does in its documentation. In other words, is it realistic/desirable to come up with an event type that triggers recording, or can we say "clients will find a reasonable way to capture audio roughly in the range of questions from start to end and will make sure that their strategy is precisely defined"?
(Some of my initial input was meant to just be questions to better understand the proposal, not necessarily recommendations for any particular implementation.)
How does where the proposal stands now affect the scenario mentioned on the TAB call, in which a user may tend to advance to the next question/screen before the interviewee finishes talking? If you swipe to advance or click to navigate to a new question that is outside of the recording range, would the client show a confirmation prompt notifying you that the recording will be stopped if you continue?
Yes, totally. I had completely forgotten about the audit feature. I have no objection to this more magical way of implementing this feature (but in the <orx:meta> block with a fixed nodeName). The multiple-file generation issue you mentioned is something to solve indeed.
Using bind attributes instead of meta node attributes also seems better to me, but we'd have to also do that for the existing audit feature, then... right?
Good discussion about start/end. It's pretty difficult to figure out. Nothing new to contribute there yet.
That's a good idea and might be the way to go if it's really critical that no part of the question is missed. My immediate reaction is that it would be pretty disruptive. As an alternative, documentation could suggest things like starting the recording one or more questions before what's really important and ending after. We can also show example forms with a screen that asks the data collector to acknowledge that the next N questions will be recorded and/or get consent (like our foreground recording demo form). For Collect, we can also suggest disabling settings for moving backwards and going to the jump view to force fully linear navigation through the form. Using documentation this way would allow for more flexibility depending on the context.
We could. The special thing about audit that I don't think would be the case for any other kind of automatic background population is that there can only ever be one. So it feels ok that it has its own special spec.
But if consistency is really important there, it should be ok for Collect to support both the fixed audit name in orx:meta and a specific attribute that applies to binary fields without a body element anywhere in the form (e.g. odk-audit=true or maybe even odk-background-populate=audit). We could do what we've done before and eventually shift docs and pyxform to the preferred spec. I think that approach can address the multiple-file requirement: we can allow any number of fields with the attribute. That would also be fine with audit, we'd just say that all would be populated with the same filename (same as with background audio without a range specified).
Here's another concept tying in various parts of the conversation above. It sounds like we generally like an action for the case of a single recording that captures everything. Perhaps we could do that and introduce the artificial restriction that it can only be triggered by xforms-ready (and thus can't nest in the body).
I've used odk:recordaudio instead of odk:startrecording with a type like above because I now think it's easier to read and more coherent with implementation. I can't imagine that e.g. audio recording and video recording would be able to share a lot in any platform, even less so other types of data we've discussed like humidity readings.
For recording only in a question range, we could specify attributes on the action:
This would mean "start the recording engine when the form loads and actually record audio when the enumerator is interacting with a question between /data/q1 and /data/q13 inclusively." Alternately, as came up in conversation with @Xiphware above, it could mean "start the recording engine when the form loads if and only if /data/q1 and /data/q13 are questions displayed on their own screens (otherwise ignore the whole thing). Actually record when a question between /data/q1 and /data/q13 inclusively is on screen."
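To make the difference between the two readings concrete, here is a small Python sketch. The function name `should_record` and the `own_screen` parameter are hypothetical, and the 15-question form is made up; only the /data/q1 to /data/q13 range comes from the text above.

```python
def should_record(current, ordered_paths, start, end, own_screen=None):
    """Decide whether audio should be captured while `current` is on screen.

    own_screen: optional set of paths shown on their own screen. When given,
    the stricter reading applies: the range is ignored unless both endpoints
    are displayed on their own screens.
    """
    if own_screen is not None and not {start, end} <= own_screen:
        return False  # stricter reading: ignore the whole range
    i, j = ordered_paths.index(start), ordered_paths.index(end)
    return i <= ordered_paths.index(current) <= j


paths = [f"/data/q{n}" for n in range(1, 16)]

# First reading: any question from q1 to q13 inclusive is recorded.
print(should_record("/data/q13", paths, "/data/q1", "/data/q13"))  # True
print(should_record("/data/q14", paths, "/data/q1", "/data/q13"))  # False

# Second reading: q13 sits in a field-list, so the range is ignored.
solo = set(paths) - {"/data/q13"}
print(should_record("/data/q5", paths, "/data/q1", "/data/q13", own_screen=solo))  # False
```

Under the second reading a form author gets no recording at all if an endpoint is not on its own screen, which is predictable but easy to trigger by accident. That may be an argument for validating it at the XLSForm level.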
@LN I like the idea of moving the onus on documentation and advice on how best to use the feature. In my experience, the need to record audio for a specific section always assumes that users move in a linear way, or at least jump to a section and move on linearly from there.
I'm in favor of just documenting whatever the final behavior is for each client for someone jumping into the middle of a recording section. We already manage this with many subtle differences between Collect and Enketo so this shouldn't be an issue from my perspective.
Hello spec enthusiasts! Team Collect has completed most of an implementation for this functionality and is now blocked on the specification. I think this is a largely unprecedented scenario because we typically wait to have an approved spec before implementing. In this case, our commitments and constraints are such that it made sense to prioritize the implementation. Our ideal would be to release it in 2 weeks which would require a finalized spec by this Friday at the latest. I don't want to rush the process and if there are strong feelings that we need to continue consideration, we can delay. On the other hand, if folks don't feel very strongly, perhaps we can try to get to a quick conclusion. I think this is fairly self-contained and a less than optimal spec here seems unlikely to have much impact.
Are there any other clients that are very eager to implement background audio recording (@martijnr, @Xiphware)? If so, do you think recording a limited range of questions would be relevant in that context?
On the XLSForm side, we are considering a parameter-based option and a nesting option.
If I had to make a decision immediately, I would go with the initial attribute-based proposal for both XForms and XLSForm. There's a lot I like about the XForms hybrid concept and would be happy if we went with it but I am somewhat uncomfortable with having to restrict events to only xforms-ready. I continue to believe that introducing an action that can be triggered by any event adds a lot of complexity and I'm not seeing benefits that would make it worthwhile.
@Xiphware and I had some back and forth today and he prefers the hybrid approach overall. He highlighted that he is thinking about event-triggered recording because he is interested in user-directed data capture, with something like events from a client pause/record button triggering data capture.
One additional possible wrinkle I want to mention if we do go in an action/event direction is that xforms-ready is still not well-defined. We had a long conversation about it when we added odk-instance-first-load. Upon reflection, I think that the question about whether xforms-ready is fired or not on reentry into a saved form might be irrelevant in W3C XForms because the assumption is that forms are filled out while online and that there's no "save as draft" concept. Either way, we'd need to decide whether we re-introduce xforms-ready with a different meaning than what it used to have or introduce a new event.
I propose we introduce the odk-instance-load event. I don't think there's much benefit to using xforms-ready and there's something nice about having odk-instance-first-load and odk-instance-load. That would give us:
(To be clear, I don't feel very strongly about this and would also be happy with re-introducing xforms-ready and documenting that it's fired both on an empty form and when loading a draft record)
On the XLSForm side, I think it makes sense to have a type for whole-form recording even if we do eventually introduce a start/stop syntax. Having to have start/stop directives that wrap the whole form definition seems really annoying.