Placeholder/alt text for images (eg screen readers)

Xiphware · November 18, 2024, 1:29am

What high-level problem are you trying to solve?

Address the desire to display a text placeholder for images when they cannot be displayed, or when displaying them is not appropriate: for example, when using a screen reader or when a form needs to be rendered in a purely textual manner (eg preview, automated testing, etc).

Any ideas on how ODK could help you solve it?

Introduce a new (optional) feature to XLSForm to be able to provide a text 'description' alongside any image media attachment, in either the main survey or in choice options; eg image_description
Introduce a corresponding new (optional) form= attribute option (for text strings) that - in addition to the existing <value form="image"> that today associates image media to the text - can provide an additional textual description of the image; eg <value form="image_description">. [I'll call this option A]
Alternatively, instead of adding a new form= attribute, extend the existing <value> element, only for form="image", to include an optional description field [I'll call this option B]

Upload any helpful links, sketches, and videos.

This feature request is being driven by the desire to be able to associate a text description placeholder for any images shown in a form, primarily to support screen readers for web-based forms. Specifically, if this information is available in a form then the (web) renderer, eg Enketo or ODK's new Web-Forms can use it to appropriately populate a suitable <img src="img_girl.jpg" alt="Girl in a jacket"> alt attribute in the resulting HTML rendition, to then be picked up by a screen reader.
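
As a rough illustration of the rendering step described above, here is a minimal Python sketch of how a web renderer might turn an image path and an optional description into an `<img>` tag. The function name `render_img` is hypothetical and this is not Enketo or Web Forms code; it only assumes the description has already been looked up for the current language. An empty `alt` is used when no description exists, which in HTML conventionally marks an image as decorative.

```python
from html import escape
from typing import Optional


def render_img(src: str, description: Optional[str] = None) -> str:
    """Build an <img> tag, using the description (if any) as alt text.

    Hypothetical helper for illustration only.
    """
    # Escape both values so quotes or angle brackets in a description
    # cannot break out of the attribute.
    alt = escape(description or "", quote=True)
    return f'<img src="{escape(src, quote=True)}" alt="{alt}">'


print(render_img("img_girl.jpg", "Girl in a jacket"))
```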

Further, as noted, there may be other circumstances where a purely text-based rendering of forms is desired - eg preview, printing, testing - so the availability of image descriptions in form definition could have uses elsewhere too.

Proposal: XLSForm image_description column

Add a new column to XLSForm for specifying an image description. This column will contain text to be associated with the image, eg a description. This text will obviously need to change according to the selected language (and in fact a form can already display different images according to the current language!).

For a monolingual form, this would look like:
MonoLingualImages_proposed.xlsx (18.6 KB)

For a multi-lingual form this would look like:
MultiLingualImages_proposed.xlsx (19.0 KB)

Proposal: XForm new form attribute [option A flavor]

The above new image_description column would - under option [A] - be translated into a new optional element alongside the existing optional media associated with text. In the above example of a monolingual form, this would look like the following in XForm:
MonoLingualImages_proposed.xml (3.1 KB)

For a multi-lingual form, this would look like the following in XForm:
MultiLingualImages_proposed.xml (4.7 KB)

I believe this is consistent with how related things like media and guidance hints are being expressed in ODK XForms presently:

These image descriptions are basically supplementary to existing images (in both form controls and options), which in turn are supplementary to text labels. Hence it seems natural to add them as a new additional form="image_description" attribute to the existing form="short" (unused?) and form="image" extensions for labels.

(an option [B] flavor of this is ostensibly the same; I summarize the difference in Unresolved Issues, below)

Unresolved Issues

select a suitable new name for the XLSForm column containing image descriptions. I propose image_description (see above XLSForm examples), but have no strong preference.
[A] select a suitable name for a new form='...' attribute in XForm with which to define the associated text description; eg <value form="image_description">eagle</value> (see above XForm examples)
or [B] select a suitable optional attribute name (only) for the existing form='image' attribute in XForm, with which to add additional optional text for the specified image file; eg <value form="image" description="whale">a.jpg</value>

I have a slight preference for [A] just because in [B] the manner in which you specify the image filename (XML element body) vs image description (XML element attribute) is quite different. [A] also lends itself well to being able to specify an image description alone, although I don't see this being particularly useful for screen readers or when such might be required.
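
To make the difference between [A] and [B] concrete from a client's point of view, here is a small Python sketch that reads the image and its description from either shape. The `<text>` snippets, the id `q1-label` and the function name are made up for illustration; neither form name nor attribute name is part of the ODK XForms spec at this point.

```python
import xml.etree.ElementTree as ET

# Option [A]: the description is a separate <value> with its own form.
OPTION_A = """
<text id="q1-label">
    <value>Which bird?</value>
    <value form="image">jr://images/a.jpg</value>
    <value form="image_description">eagle</value>
</text>
"""

# Option [B]: the description is an attribute on the image <value>.
OPTION_B = """
<text id="q1-label">
    <value>Which bird?</value>
    <value form="image" description="eagle">jr://images/a.jpg</value>
</text>
"""


def image_and_description(text_xml):
    """Return (image path, description) for either proposed shape."""
    image = description = None
    for value in ET.fromstring(text_xml).findall("value"):
        form = value.get("form")
        if form == "image":
            image = value.text
            # Option [B]: keep any description already found otherwise.
            description = value.get("description", description)
        elif form == "image_description":
            # Option [A]
            description = value.text
    return image, description


print(image_and_description(OPTION_A))
print(image_and_description(OPTION_B))
```

Under [A] the filename and the description are read the same way (element body), whereas under [B] one comes from the body and the other from an attribute.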

If [A] is the chosen path, then another unresolved issue - for both XLSForm and XForm - is:

if an image description is provided but an actual image is not, is this a fatal error (in either XLSForm, or in XForm for that matter) or just a warning?
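
For what it's worth, the "warning" flavor of this check is simple to sketch. The Python below assumes survey rows are available as dictionaries keyed by column name, with the proposed image_description column; the function name is hypothetical and this is not how pyxform is structured internally.

```python
def check_image_descriptions(rows):
    """Return one warning per row that has a description but no image."""
    warnings = []
    # Spreadsheet row 1 is the header, so data rows start at 2.
    for index, row in enumerate(rows, start=2):
        if row.get("image_description") and not row.get("image"):
            warnings.append(
                f"Row {index}: image_description given without an image"
            )
    return warnings


rows = [
    {"name": "logo", "image": "logo.png", "image_description": "Project logo"},
    {"name": "intro", "image_description": "A whale"},
]
print(check_image_descriptions(rows))
```

Turning this into a fatal error instead would just mean raising on the first hit rather than collecting messages.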

Other things considered

For the purpose of populating alt text just for the images shown against form controls - eg logos at the start of a form - this could also be accomplished using custom body attributes.

However, this would not readily accommodate multi-language translations for image descriptions (it would effectively entail adding a custom attribute for every language!).

Nor can corresponding custom attributes be specified for choice options AFAIK (and even if you could, you are back to a custom attribute for every language...). Meaning you couldn't provide screen readers an alt text description for image-only based select questions.

Xiphware · November 18, 2024, 1:55am

BTW, another thing considered (and rejected) - for the purpose of populating web-rendered <img alt="xxx"> text - would be to have the web client (ie Enketo) simply always populate the alt text with, say, the label text, which is already being translated. Or possibly repurposing the existing hint or guidance_hint to acquire this text.

The problem with this is that it precludes the existing desired usecase of wanting to display a logo at the top of the form (without any text!). Hence purely separate placeholder text for the image.

Xiphware · November 18, 2024, 2:22am

Note, such image descriptions/alt text could also be useful for quickly previewing forms, without having to worry about the actual media files just yet. eg XLSForm Online will currently preview but only shows "image" for every placeholder.

Tino_Kreutzer · November 18, 2024, 3:55am

Thanks @Xiphware. Just to add that Kobo would happily implement this new feature. But as always, we'd love the wise input of @LN and others since this would add an optional column to pyxform and would introduce an XForm spec change.

boazsender · November 18, 2024, 6:46pm

Thanks for posting this! I support this feature from an accessibility perspective and would like to use it in my work on Kobo and Enketo.

LN · November 21, 2024, 12:35am

Thanks for the thorough proposal and happy birthday!

This feels more like a downside than an advantage to me. Specifically, like you said, it means we have to figure out what to do with the case where a description exists without an image. Can you elaborate a bit on why you like [A] more? The description does feel like attribute data to me so [B] feels a bit more natural.

Have you considered accessibility needs for video and audio? I think it'd be worth giving them a bit of thought to see whether it could change this proposal. Both questions and choices can have any or all of image, video and audio associated with them.

I think it's defensible to recommend that any necessary description of audio or video go in label or hint text but you may have other ideas.

The short form is used by Collect in the summary screen. It's not currently exposed in pyxform so is probably really rare.

Xiphware · November 21, 2024, 8:38pm

So my interpretation of all the various permitted label form embellishments is that the XForm definition is basically providing a number of additional and/or alternative ways (ie different 'forms') in which to render its label - eg as abbreviated text, or visually, or audibly, ... - depending on the desired context and needs. eg

<text id="how-old-label">
    <value>How old are you?</value>
    <value form="short">Age</value>
    <value form="image">jr://images/b.jpg</value>
    <value form="big-image">jr://images/b_big.jpg</value>
    <value form="audio">jr://audio/goldeneagle.mp3</value>
</text>

so in that sense the text to be rendered in the context of a screen reader (when the control appearance or content doesn't permit rendering the regular label text...) is mostly just another flavor of a form embellishment. Hence "form=image_description", or equivalently "form=accessibility", seems a fairly natural extension of this rather functional approach to providing various alternatives to show instead of the base text label.

It is also the case that with images we already have a situation where one form, form="big-image", effectively embellishes an existing one ("image") even further. So I don't see form="image_description" implicitly requiring an associated form="image" as necessarily divergent or incongruous (from media: "Specifying “big-image” alone has no effect, you must always include “image”"). Whereas introducing a description= attribute to the existing <value> element would be something entirely new to this sub-framework.
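
A quick Python sketch of that reading, for anyone who finds code easier to follow: collect every form of a label into a dictionary, then drop the forms that only make sense alongside "image". The description text, the "default" key and the function name are invented for the example, and treating an orphaned image_description like an orphaned big-image is an assumption, not something the spec says.

```python
import xml.etree.ElementTree as ET

TEXT = """<text id="how-old-label">
    <value>How old are you?</value>
    <value form="short">Age</value>
    <value form="image">jr://images/b.jpg</value>
    <value form="big-image">jr://images/b_big.jpg</value>
    <value form="image_description">A child holding up five fingers</value>
</text>"""

# Forms that embellish "image" and have no effect without it (assumed rule).
DEPENDS_ON_IMAGE = {"big-image", "image_description"}


def label_forms(text_xml):
    """Map each form name to its value; the plain label is keyed 'default'."""
    forms = {}
    for value in ET.fromstring(text_xml).findall("value"):
        forms[value.get("form", "default")] = value.text
    if "image" not in forms:
        for name in DEPENDS_ON_IMAGE:
            forms.pop(name, None)
    return forms


print(label_forms(TEXT))
```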

But again, it's a slight preference and I certainly have no violent objection to the other.

LN · November 22, 2024, 6:36pm

That's compelling! Sounds good to me.

Any thoughts around any potential future desire to add accessibility information related to video and audio? In HTML5, both can include fallback content within their tag. But I think there's also a general recommendation to include descriptive text around the media assets so they may not need to be addressed in a special way.

Xiphware · November 22, 2024, 9:20pm

The obvious extrapolation - if we choose to go with a form="image_description" (or whatever name we want to call it) - would be a similar form="video_description" to provide text in lieu of a video snippet for such things like screen readers. And then of course, a form="audio_description" would seem to naturally follow.

Again, I think it's worth noting that these can also serve a useful purpose for things other than screen readers. This text can, for example, be used to provide a more meaningful placeholder for quickly previewing forms (without having to fetch everything in the manifest), or indeed especially in the case of audio when printing the form.

[It also just occurred to me that another potential issue with going with an option [B] approach - that is, introducing a new optional description attribute for all these media-related <value> elements - is that there is now the potential of, say, having both image and big-image each having their own (different!) description [so presumably a screen reader may then need to flip between them?]. Or what does a client do when the big-image has a description but the image doesn't? ...
A separate <media-type>_description form flavor - option [A] - would avoid this potential ambiguity.]

Xiphware · November 26, 2024, 10:09pm

Bump. So is there a general consensus that this would indeed be a useful new feature to add to ODK forms, and that the proposal above ([A]) is a reasonable approach to it? So that we may begin sizing the effort that will be entailed to get a PR ready, initially targeted for Enketo browser-based screen readers.

Or are there other outstanding issues that you think still need to be addressed first before proceeding? Thanks.

LN · November 26, 2024, 10:43pm

I'd like to get a sense of how likely it is that there will be a desire to have targeted alternative text for audio and video, possibly for the uses you mentioned. If you expect it's likely, I think it's worth spending a bit more time trying to come up with a way to express it in XLSForm that doesn't require this explosion of columns. It could be something like a single alt_text::lang column with key/value pairs like we do with parameters, for example (image="", audio=""). In that case we'd want to take that approach in XLSForm from the start.
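
To give a feel for what a single key/value column could involve, here is a minimal Python sketch that parses a cell such as `image="..." audio="..."` into a dictionary. The column name alt_text, the exact quoting rules and the function name are all assumptions for illustration; this is not pyxform's parameters parser.

```python
import shlex


def parse_alt_text(cell):
    """Parse 'key="value" key2="value2"' into a dict.

    Hypothetical cell format; shlex handles the quoted values so that
    spaces inside a description survive.
    """
    result = {}
    for token in shlex.split(cell):
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"Expected key=value, got: {token!r}")
        result[key] = value
    return result


print(parse_alt_text('image="A golden eagle in flight" audio="An eagle call"'))
```

One trade-off with this shape is that form designers have to get the quoting right inside a spreadsheet cell, which separate columns avoid.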

Unless anyone else has anything to add I think the client work could start any time introducing new forms.

Xiphware · November 26, 2024, 10:58pm

Not sure I fully understand... Are you saying that the underlying proposed XForm representation - ie a new form attribute (eg form="image_description", or some other keyword) - is appropriate, but that there is still a question over how this should best be exposed in XLSForm (ie multiple columns for each media-type flavor vs single column with key/value pairs...)?

Client work can only meaningfully proceed once a suitable XForm representation is agreed upon.

Lindsay_Stevens_Au · December 2, 2024, 12:53pm

I think the bar for adding more translatable columns should be relatively high. Currently there are many such columns available in XLS/XForms to meet a wide range of needs: label, hint, guidance_hint, constraint_message, required_message, image, audio, video. Maintaining / testing these with all possible feature regressions in pyxform is a challenge, and sometimes adding too many options can make it harder to learn and use the tools.

Can the use cases be elaborated with practical usages or motivating examples? The examples are quite abstract e.g. an option labelled "yes" having alt-text "eagle", and a question labelled "This is an English note" with "whale". If it is just about logos then maybe a different approach altogether is needed. Is it a realistic/likely scenario that a user would specify and translate alt-text, and require the no-buttons appearance such that this potentially useful content is not visible by default, and be deploying their survey exclusively to browser clients (in order to make use of the alt-text, assuming Android/Collect has no equivalent to alt-text)? I would assume any useful alt-text is good content that would be helpful for all users - in which case the existing text options of label/hint/guidance_hint could be used, and a screen reader would pick up these text fields. I mean, I don't really need subtitles yet but I always enable them in Netflix because they help comprehension sometimes. Is it realistic that a user would need to specify different alt-text for each media type? Has there been user research conducted with people that use screen readers, and people that design forms for those users?

Also I'm not clear on why the option of multi-purposing label/hint/guidance_hint was "rejected". If WebForms/Enketo could interpret a setting (survey-level, or question-level like the existing "appearance" column) to use text content as alt-text instead of displaying it normally, then the requirement is met. The WebForms/Enketo teams should probably have a look at this because it would be premature to add support to XForms/pyxform without knowing how, whether, or when those clients can use it - or what other solutions might be possible or feasible.

LN · January 6, 2025, 10:06pm

I do agree with this.

My hunch is that this is about compliance and not about direct end-use need. Every once in a while we'll talk to an organization that has a checklist related to section 508 or an equivalent accessibility law and wants to use alt tags. As far as I know, all of these laws mandate that alternate text be available somewhere but none of them mandates use of the alt tag. Some organizations or departments may have more restrictive requirements, however. Kobo team, have you been working with an organization like that by any chance?

Agreed, and so far all of the users who have brought this up to us have been able to find a suitable way to describe their visual content in a way that would be both accessible to screen readers and helpful to sighted users. Using the alt tag has not been a deal breaker. I can imagine that for some organizations it might be.

I'll admit I've gone back and forth on this as we've talked to organizations with related requirements. If you or someone you know is a screen reader user who uses ODK forms, we'd love to talk to you about your experience and needs! @Tino_Kreutzer @Xiphware, if you have such connections, please feel free to get us in touch.

Tino_Kreutzer · February 13, 2025, 12:22am

Hi @LN @Lindsay_Stevens_Au I'm so sorry for missing this, I didn't mean to take this long to respond!

Can the use cases be elaborated with practical usages or motivating examples?

The use case is completely blind educators in India who are taking surveys on Enketo. So no, this isn't just a compliance issue, it's the practical ability to read and answer the survey with the help of screen readers. I realize this critical detail was missing from @Xiphware's first post.

Also I'm not clear on why the option of multi-purposing label/hint/guidance_hint was "rejected".

I guess Gareth tried to explain this above:

...but it's a fair question whether Enketo could repurpose the guidance_hint for a note question that contains an image and inject it as the alt-text attribute. It just seems quite hacky. What if you want to display a guidance hint for a note question and also have an image? Maybe the guidance hint needs to be prefaced with alt-text:? It's doable but not elegant.

Maybe your suggestion was "have the hint explain what the image is about" - that's also a fair idea we suggested. The challenge is that screen readers will just say something like "image" and that can be confusing as the meaning of the image isn't obvious and writing "The image above is displaying ... " is a confusing message for non-blind people.

We are planning the other necessary work on the Enketo side to make real world forms work with screen readers for blind people (there are a few changes needed, we'll discuss these separately), but creating alt descriptions for images does require a spec change, hence this thread.

LN · February 13, 2025, 1:15am

Thank you, that's really useful context!

Did you consider a question at the survey start to opt in or out of image descriptions? All the survey labels could be dynamic so that a person who opts in gets all of the image descriptions and a person who does not gets none of them. It could even do things like show image choices vs. text choices. Someone sighted who has trouble understanding images could toggle that mode at any time.

I'd be very interested to hear whether these users would prefer alt text over an approach like that.

My understanding is that most screen readers skip images without alt text but it may depend on the specific software.

We came to a reasonable agreement on using a new form in the XForm spec and like I said above I think that's enough to get the client work started if that still feels like the preferred direction. I wanted to push one more time because inline image descriptions can have benefits for sighted users as well! There are lots of interesting ways to include them in forms and I'm happy to share more form design ideas or examples if that would be helpful.

That's great, I think there are other aspects that will likely have even greater impact.

Tino_Kreutzer · February 13, 2025, 8:35pm

Thanks for the quick response! Yes, we explored workarounds like using dynamic text fields alongside images based on a screening question. However, the feedback was that this approach isn’t ideal, as it would require users to set up a relatively complex workaround just to ensure their Enketo forms are readable. And yes, some software doesn’t skip the image entirely but instead reads out “unlabeled image,” which can be confusing.

Given that we have an agreement on adding the new form attribute in the XForm spec, I think we can move forward in that direction. Of course, without pyxform/XLSForm support, it’s like having a car without wheels. Or a steering wheel.

So to clarify—are we aligned on figuring out the exact XLSForm method for implementing this, with the understanding that support for it will be added in some way?

LN · February 13, 2025, 10:16pm

Thanks, I'm glad you've had a chance to talk this through with end users!

Yes, exactly. If we're introducing it as a form, in XLSForms it will be passed through and localizable with e.g. media::image-description(::lang) without any pyxform changes. We can keep exploring whether we want to add an explicit alias for the column, checking for conditions like an image-description without an image, and explicit testing for it or simply document it as media::image-description for those who need it.
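
For readers less familiar with the media::<form>(::lang) header convention mentioned above, here is a small Python sketch of how such a column header breaks down into a form name and an optional language. The function name is hypothetical and this is not pyxform's actual header handling; it only illustrates the "::"-delimited shape.

```python
def parse_media_header(header):
    """Split 'media::<form>' or 'media::<form>::<lang>' into (form, lang).

    Returns None for headers that are not media columns.
    """
    parts = header.split("::")
    if parts[0] != "media" or len(parts) not in (2, 3):
        return None
    form = parts[1]
    lang = parts[2] if len(parts) == 3 else None
    return form, lang


print(parse_media_header("media::image-description"))
print(parse_media_header("media::image-description::French (fr)"))
```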

@Lindsay_Stevens_Au maybe something we should do is add some testing around the generic media::<form> if there isn't some already.

It does! Collect provides some content descriptions already and this would allow it to do the same for dynamic form content.

Xiphware · February 13, 2025, 11:32pm

Are you specifically thinking of this: https://developer.android.com/guide/topics/ui/accessibility/apps#describe-ui-element

(if so, that is what I was alluding to above when I said "there may be other circumstances where a purely text-based rendering of forms is desired - eg preview, printing, testing - so the availability of image descriptions in form definition could have uses elsewhere...". That is, perhaps being able to drive specific test flows for forms with images via their image tags; or in this case android:contentDescription)

Xiphware · February 17, 2025, 8:27pm

This seems like the most obvious manner by which to (fully) expose this via XLSForm; I'm not sure adding an additional explicit alias for it necessarily adds a great deal (whereas having image as an alias for media::image is a nicety), but it's obviously a possibility in the future.

However, this will require some pyxform changes, because pyxform is currently checking for valid media types, and will throw a fatal error on something like "image-description" as being unrecognized. eg

produces

There appears to be an explicit set of recognized media types, defined in constants.py:

SUPPORTED_MEDIA_TYPES = {"image", "big-image", "audio", "video"}

This is being checked in survey.py, resulting in the above error:

if media_type not in constants.SUPPORTED_MEDIA_TYPES:
    raise PyXFormError(f"Media type: {media_type} not supported")

However, I suspect it may be a minimally invasive, minor code change to introduce a new media type for this purpose.
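
Something along these lines, as a self-contained Python sketch. The `PyXFormError` class here is a stand-in so the snippet runs on its own, `check_media_type` is a made-up wrapper around the check quoted above, and "image-description" is the hypothetical new entry; the real change would be made in pyxform's own constants.py and may need more than this (tests, docs, any description-without-image check).

```python
class PyXFormError(Exception):
    """Stand-in for pyxform's error class, so this sketch is runnable."""


# Existing set from constants.py plus the proposed (hypothetical) new form.
SUPPORTED_MEDIA_TYPES = {"image", "big-image", "audio", "video", "image-description"}


def check_media_type(media_type):
    """Mirror of the existing check, against the extended set."""
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise PyXFormError(f"Media type: {media_type} not supported")


check_media_type("image-description")  # accepted with the extended set
```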

To be precise, I believe the proposal is that the following XLSForms would produce the following XForms (assuming we introduce something like "image-description" as a new form attribute in the ODK XForm spec):

Untranslated:

ImageDescription_Proposed.xlsx (18.4 KB)
ImageDescription_Proposed.xml (3.5 KB)

Translated:

ImageDescriptionTranslated_Proposed.xlsx (18.8 KB)
ImageDescriptionTranslated_Proposed.xml (5.2 KB)

@LN @Tino_Kreutzer Is this correct/does this match your understanding?