Pyxform / XLSForm proposed deprecation of XLS format support in favour of XLSX

As discussed in Pyxform Issue #501, the following deprecation plan for XLS in favour of XLSX has been proposed. There is no specific timeframe for these steps. However, PR #575 has now been merged, so Phase 1 would begin with the next Pyxform release.

Phase 1: Release next version with both xlrd and openpyxl dependencies
Phase 2: (Optionally) Isolate each backend into a separate file (e.g. pyxform/backends/xlsx.py, pyxform/backends/xls.py).
Phase 3: Move xlrd from install_requires to an entry in extras_require and make all xlrd imports conditional (See this example). Users who pip install pyxform will not have XLS support by default, but those who need it would pip install pyxform[oldexcel] instead.
Phase 4: Remove XLS support entirely.
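
To make Phase 3 a bit more concrete, here is a rough sketch of what a conditional xlrd import could look like. This is not Pyxform's actual code: the names `EXTRAS_REQUIRE`, `backend_for_extension` and `load_xlrd` are made up for illustration, and only the `oldexcel` extra name comes from the plan above.

```python
import importlib
import os

# Hypothetical setup.py fragment: xlrd moves out of install_requires
# and becomes an optional extra named "oldexcel".
EXTRAS_REQUIRE = {"oldexcel": ["xlrd"]}


def backend_for_extension(path):
    """Return the name of the reader library for a given file path."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".xlsx":
        return "openpyxl"
    if ext == ".xls":
        return "xlrd"
    raise ValueError(f"Unsupported file extension: {ext!r}")


def load_xlrd():
    """Import xlrd only when an XLS file is actually being read."""
    try:
        return importlib.import_module("xlrd")
    except ImportError as err:
        # Point the user at the optional extra instead of a bare traceback.
        raise ImportError(
            "XLS support needs the optional xlrd dependency. "
            "Try: pip install pyxform[oldexcel]"
        ) from err
```

With something like this, a default install never imports xlrd, and users opening an XLS file without the extra get an error that says how to fix it.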

Please add any comments or feedback on the above plan or desired timeframes for support. For those requiring XLS support, please share any particular roadblocks that prevent XLSX adoption, and any ideas around what it would take to facilitate XLSX adoption (for example docs for using LibreOffice to convert).
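
As a starting point for such docs, a batch conversion with LibreOffice could be scripted roughly as below. This is a sketch that assumes the `soffice` binary is on the PATH; the function names are made up for illustration, and the output path assumes LibreOffice keeps the original file stem.

```python
import subprocess
from pathlib import Path


def build_convert_command(xls_path, out_dir):
    """Build the LibreOffice headless command that converts one file to XLSX."""
    return [
        "soffice",
        "--headless",
        "--convert-to", "xlsx",
        "--outdir", str(out_dir),
        str(xls_path),
    ]


def convert_xls_to_xlsx(xls_path, out_dir):
    """Run the conversion and return the path where the XLSX is expected."""
    subprocess.run(build_convert_command(xls_path, out_dir), check=True)
    return Path(out_dir) / (Path(xls_path).stem + ".xlsx")
```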

I get the feeling phase 4 could be a bit controversial, but I really don't know the user base well enough to say for sure.

I imagine that eventually there will be Python updates that will break xlrd. That feels like it will be the natural driver to remove xls support. I don’t see a reason to remove it before, do you, @Lindsay_Stevens_Au?

I also don’t see a big need for phases 2 and 3 proposed above. Phase 2 certainly would make things neater but if it’s not code that is likely to be modified it doesn’t really matter (perhaps I’m wrong on this). Phase 3 would only make sense to me if there’s some risk in maintaining xls support. I can’t think of one and if you can it would be helpful to have it spelled out.

Phase 3 might be a nice speed increase, right? I suppose importing xlrd even when we don't need it isn't the biggest bottleneck, though.

If Phase 2 is literally just moving some functions around, then it could make the code a lot easier to follow. Perhaps it's worth doing if it takes less than an hour or so.

I think the main issues are 1) xlrd is an unmaintained dependency so it may not necessarily be known if/when some kind of security issue or bug is found, and 2) there are two code branches (XLS/XLSX) in the backends doing much the same thing, so changes (if any) need to be replicated carefully.

Phases 2 and 3 are mostly about making the transition to deprecation gentle, but they could be combined or shortened, e.g. by giving fair warning and jumping straight to Phase 4.

Since posting about 10 days ago we have almost 100 views but no new commenters (outside of issue #501) which suggests perhaps XLS is not that widely used? I'm relatively new to this forum so perhaps this actually does not mean anything.

It seems like it shouldn't be painful for users to convert their legacy XLS files to XLSX going forward, but I'd love to know what percentage are still using that format. We will try to find out from kobo users and will share back here.

@LN @yanokwa Are you tracking by chance what file extensions are being uploaded to http://xlsform.getodk.org/?

My feeling is that if the percentage is >30%, it would be worth investigating whether there are reasons beyond old templates being re-edited over time.

There are also free converter extensions for old MS Excel versions available. I found this list of tools that support XLSX very useful.

We don't actively track, but looking at the logs, over the last 1.5 months, we've had ~16k forms uploaded and it's been less than 5% XLS. And there are likely duplicate forms there, so the number is lower.
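
For anyone who wants to run the same kind of check on their own logs, a tally by file extension could look roughly like this. It is only a sketch: it assumes the uploaded filenames have already been extracted into a list, and `extension_shares` is a made-up helper name. It does not deduplicate forms.

```python
import os
from collections import Counter


def extension_shares(filenames):
    """Return the fraction of uploads per lower-cased file extension."""
    counts = Counter(os.path.splitext(name)[1].lower() for name in filenames)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ext: count / total for ext, count in counts.items()}
```

The same tally would also answer questions about other upload types, since every extension seen in the list gets its own share.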

@yanokwa while we're at it - what's the proportion of XForms uploaded?