Reduce the File Size of the Google Sheet Template XLSForm in the Docs

What high-level problem are you trying to solve?

While many developed countries provide internet speeds that can download/upload files at 1 MB/s or more, areas with poor internet are still getting speeds in the KB/s and sometimes even Bytes/s…yes that’s right…millions of times slower than the developed world. While most web pages won’t even load with that poor of a connection, sending form data, etc. can often be done without many problems.

I recently noticed that the sample ODK Template in Google Sheet form, when downloaded as an XLSX is about 525KB, but if the google sheet has its empty columns deleted, it’s reduced down to 105KB. While this may be trivial for most, this simple change doesn’t take away anything and could help in reducing unnecessary bandwidth for poor internet contexts by reducing the ODK Template File size from 525kb to 105kb simply by deleting empty columns in the google sheet. I’m consistently iterating and uploading to ODK Central on a poor connection and I’m wondering if this would also help for devices downstream that need to download the form for Collect/Enketo. I’m not sure if the devices on Collect are only getting the XML, but I think about José operating in places where people have to walk days to get internet running a project with 1,000s of devices.

Looking at the docs here there are two links :

Google Sheets file when downloaded as XLSX is 455KB

Excel file is currently 591KB

I’m not sure what additional data is being stored in the Excel file, but reducing an unnecessary empty file space by 4.5x on a poor connection sounds good to me! :smiley: @yanokwa has talked about how little changes being multiplied millions of time can be quite a big deal with ODK. I don’t know if this is too trivial of a change, but just for fun I did the math if this change saved 1s for 1million downloads/uploads that would be:

1,000,000s / (60s x 60min x 24hrs) = 11.57 days (is that math right?)

Any ideas on how ODK could help you solve it?

Here is a link to a google sheet with the empty columns deleted. I changed to formula in C2 of the Settings sheet to be a little easier to read

=TEXT(NOW(),"yyyy_mm_dd_hh:mm-ss")

The survey and choices sheet now only have 200 rows but people can easily add more as they need or you could add more since I think somewhere it was stated that the average XLSForm has around 400 questions.

Upload any helpful links, sketches, and videos.

2 Likes

I don’t have the privileges to fix it but it seems to be due to cell styling applied to all rows/columns in the template. Downloading as .ods format gives 80KB file but you’d still need to convert back to .xlsx (which becomes 500KB again) to upload it. There are much lighter file formats (.csv, .md - mere bytes!) that pyxform can process but that would require a local pyxform installation which is less easy than fixing the template. (logged an issue for follow-up)

1 Like

As far as I know, the XLSForm is converted/”compiled” to a XForm - which is the format used for deployment. So, to check, please, if your size finding is then still relevant.

Thanks for highlighting this and proposing at least one piece of the solution, @Tyler_Depke! You’re absolutely right that we try to be mindful of file size and completely missed this.

@wroos you’re right that the XLSForm file size won’t affect people using forms to capture data. However, it’s pretty annoying and time consuming to have to upload a big file when iterating on a form’s design on a poor connection.

This post may be better for a new topic, but while I’m looking at changing my own workflow I decided to do a few upgrades to the template. Check out the original link I posted again. In the survey tab I added a column in the survey called “allowed_appearances” which uses an arrayformula to create a CSV list based on type in Column A by checking that with the values in the :eyes: Appearances tab. I also created a conditional formatting rule for the appearance column to get highlighted red if the appearance value doesn’t exist in the allowed_appearances column. This also works for multiple appearances so something like

no-buttons columns-pack

wont get highlighted.

One can still enter appearances that are highlighted as invalid, which for example columns-2 doesn’t match the literal string for columns-n, but I don’t think there are other issues. Ideally one would be able to make a dynamic dropdown for only the valid appearance types, but without a google script or creating dynamic list for every single row, I don’t think that’s possible. Here is a GScript that does that, but this requires internet to run/populate so maybe include somewhere as optional…

appearance_auto_dropdowns.gs.zip (1.1 KB)

In summary I made two changes to the survey sheet.

  1. I inserted a column called allowed_appearances with this formula in the header:

={"allowed_appearances"; ARRAYFORMULA(
IF(A2:A="","",
BYROW(A2:A, LAMBDA(type_full,
LET(
type, INDEX(SPLIT(type_full, " "), 1),
TEXTJOIN(",", TRUE,
VSTACK(
IFERROR(
FILTER(
'👀 Appearances'!$B$7:$B$48,
REGEXMATCH('👀 Appearances'!$D$7:$D$48, "(^| )" & type & "( |$)")
),
""
),
IF(type="","",
IF(LEFT(type,6)="select",
IFERROR(
FILTER(
'👀 Appearances'!$B$7:$B$48,
'👀 Appearances'!$D$7:$D$48 = "all selects"
),
""
),
""
)
)
)
)
)
))
)
)}

  1. I added conditional formatting rule located on top of the other two that applies to the appearance column G2:G with a custom formula of:

=AND( G2<>"", SUM( N( ISNA( MATCH( SPLIT(TRIM(G2), " "), SPLIT(H2, ","), 0 ) ) ) )>0 )

  1. I also added the counter appearance to the appearances tab and not sure if that only applies to integer type or not.
1 Like