Handling XLSForm --> XForm conversion in memory

Issue

The current implementation of pyxform requires an on-disk XLSForm to be converted to an on-disk XForm.

This is useful for running pyxform via command line, but if the code is used within a script or other tool, it may be useful to handle this entirely in memory.

Example

For example, pyxform may be used within a web API that converts a user-provided XLSForm to an XForm and then continues with further processing.

The XLSForm is sent from the frontend to the backend API and received as an in-memory object.

The code would have to write this to a temp file, convert that file, and then output the result to another temp XForm file.
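For comparison, the temp-file round trip described above might look roughly like this. It is a sketch only: `convert_file` is a hypothetical stand-in for whatever file-based conversion step is used (for example, running pyxform on a path). Using `tempfile.TemporaryDirectory` at least guarantees the intermediate files are removed, even if conversion fails.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical file-based converter: (input path, output path) -> None.
FileConverter = Callable[[Path, Path], None]


def convert_via_temp_files(xlsform_bytes: bytes, convert_file: FileConverter) -> str:
    """Round-trip an in-memory XLSForm through temporary files on disk."""
    # The directory and its contents are deleted when the block exits,
    # including when convert_file raises an exception.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "form.xlsx"
        dst = Path(tmp) / "form.xml"
        src.write_bytes(xlsform_bytes)  # disk write: the uploaded XLSForm
        convert_file(src, dst)  # the converter writes the XForm to disk
        return dst.read_text(encoding="utf-8")  # disk read, before cleanup runs
```

Even with guaranteed cleanup, every request still pays for one write and one read on disk, which is the overhead an in-memory conversion avoids.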

Possible Solution

Alternatively, the in-memory XLSForm could be converted directly to an in-memory XForm, avoiding disk usage entirely. This has two benefits:

  • Less file system clutter, and no risk of undeleted temp files filling up disk space.
  • Better performance, as no disk I/O is required and everything is handled in memory.

This approach could also bring these benefits to pyxform-http, which is maintained by ODK.

Discussion

I made a PR for this to XLSForm/pyxform, but thought it would be best to gauge interest from the community here.

At HOT, we have a use case that is very similar to pyxform-http: a web API that receives a user-provided XLSForm and needs to generate an XForm.

Does anyone else have a similar workflow, requiring the XLSForm --> XForm conversion to be handled in memory rather than with files on disk?

@Ivangayton @Niraj_Adhikari


What high-level problem are you trying to solve?
If your XLSForm definitions are not necessarily files on a filesystem, and you don't necessarily want files to be written for the conversion result, it can be tricky to see where to plug in to pyxform.

Any ideas on how ODK could help you solve it?
The PR below proposes a centralised interface that accepts a range of data types (now including markdown tables and dictionary data), feeds them through the conversion pipeline, and returns the result data. Apart from moving a few functions around, pyxform should remain compatible with existing use cases, although it might make sense to deprecate direct use of individual pieces of the pipeline.
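To illustrate what a single entry point over several input types involves, here is a minimal sketch of the type-dispatch step. It is not pyxform's implementation: the helper name `detect_input_kind` and the detection rules (ZIP magic bytes for .xlsx, the OLE2 header for legacy .xls, a leading `|` for markdown table text) are assumptions for illustration.

```python
from io import BytesIO
from os import PathLike
from typing import Any

XLSX_MAGIC = b"PK\x03\x04"  # .xlsx workbooks are ZIP archives
XLS_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls (OLE2 compound file)


def detect_input_kind(data: Any) -> str:
    """Guess which conversion path an XLSForm input should take (hypothetical helper)."""
    if isinstance(data, dict):
        return "dict"
    if isinstance(data, (bytes, bytearray)):
        data = BytesIO(bytes(data))  # treat raw bytes like any other binary stream
    if hasattr(data, "read"):
        # Peek at the leading bytes, then restore the caller's stream position.
        pos = data.tell()
        head = data.read(8)
        data.seek(pos)
        if head.startswith(XLSX_MAGIC):
            return "xlsx"
        if head.startswith(XLS_MAGIC):
            return "xls"
        raise ValueError("binary input is neither an .xlsx nor an .xls workbook")
    if isinstance(data, str) and data.lstrip().startswith("|"):
        return "markdown"  # pipe-delimited table text rather than a file path
    if isinstance(data, (str, PathLike)):
        return "path"
    raise TypeError(f"unsupported XLSForm input type: {type(data).__name__}")
```

Sniffing content rather than trusting a filename matters for the web API case, where an upload may arrive without a reliable extension.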

Upload any helpful links, sketches, and videos.
Please see the PR here: https://github.com/XLSForm/pyxform/pull/712. If you have any comments, ideas, or feedback, please discuss them here.


Love this :heart:

Converting an XLSForm from software was previously a multi-step process.

Now I could simply run:

from pyxform.xls2xform import convert as xform_convert

xform_data = xform_convert(xlsform_data)
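In the web API scenario from the original post, that call could sit inside an upload handler along these lines. This is a sketch: `handle_xlsform_upload` is a hypothetical name, and it assumes `convert` accepts the uploaded workbook wrapped in a `BytesIO` (the PR describes accepting a range of in-memory data types). The converter is injectable so the function can be exercised without pyxform installed.

```python
from io import BytesIO
from typing import Any, Callable, Optional


def handle_xlsform_upload(
    body: bytes, converter: Optional[Callable[[Any], Any]] = None
) -> bytes:
    """Convert an uploaded XLSForm to XForm XML bytes without touching disk."""
    if converter is None:
        # Assumption: a pyxform release that includes PR #712 is installed.
        from pyxform.xls2xform import convert as converter
    result = converter(BytesIO(body))
    # The XForm is a str on the result object; encode it for the HTTP response.
    return result.xform.encode("utf-8")
```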

@Ukang_a_Dickson I think an official public API for converting from Markdown (md) will be of interest to you; please take a look at the PR if so.

@Lindsay_Stevens_Au I hope you don't mind that I moved your Ideas thread to this existing thread! There's always some ambiguity in where developer-focused ideas should go. Here I figured there'd be value in keeping all context together.


Note to anyone needing to use this functionality in the future:

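  • The result from xls2xform.convert is an instance of the ConvertResult dataclass:
from dataclasses import dataclass
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from pyxform.survey import Survey


@dataclass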
class ConvertResult:
    """
    Result data from the XLSForm to XForm conversion.

    :param xform: The result XForm
    :param warnings: Warnings raised during conversion.
    :param itemsets: If the XLSForm defined external itemsets, a CSV version of them.
    :param _pyxform: Internal representation of the XForm, may change without notice.
    :param _survey: Internal representation of the XForm, may change without notice.
    """

    xform: str
    warnings: list[str]
    itemsets: str | None
    _pyxform: dict
    _survey: "Survey"
  • You must access the xform attribute to get the XForm text, for example to wrap it in a BytesIO:
from io import BytesIO

from pyxform.xls2xform import convert as xform_convert

xform_bytesio = BytesIO(xform_convert(input_data).xform.encode("utf-8"))
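The other public fields are worth handling too. A minimal sketch, relying only on the `xform`, `warnings` and `itemsets` attributes listed above (the helper name `unpack_result` and the output names are hypothetical; `itemsets.csv` follows the usual ODK convention for external choices): it logs conversion warnings and returns in-memory buffers for the XForm and, when present, the external itemsets CSV. The `_pyxform` and `_survey` fields are deliberately ignored, since they may change without notice.

```python
import logging
from io import BytesIO
from typing import Any

log = logging.getLogger(__name__)


def unpack_result(result: Any) -> dict[str, BytesIO]:
    """Turn a ConvertResult-like object into named in-memory byte buffers."""
    for message in result.warnings:
        log.warning("XLSForm conversion warning: %s", message)
    files = {"form.xml": BytesIO(result.xform.encode("utf-8"))}
    if result.itemsets is not None:
        # itemsets is None unless the XLSForm defines external itemsets.
        files["itemsets.csv"] = BytesIO(result.itemsets.encode("utf-8"))
    return files
```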