Excluding fields from output file

Paul_Bessell · May 19, 2023, 9:07am

Hello friends

I don't think that this feature exists, but please correct me if I have simply not found it.

I have a form with a large number of calculate and note question types. These are not outputs from the survey; instead they control the internal form logic and provide feedback to the ODK Collect user. As a result, I do not need to export them, but when I export the data from ODK Central, the exported CSV files, or the data imported into R, contain a lot of columns that are of no use, and the first step is to strip out all the clutter. I find this a more acute problem when demonstrating ODK's functionality to people who are not so familiar with this great tool.

I think it would be great if one could control from XLSForm which columns get exported: a single column in XLSForm called export, and if it is set to false for an item, that item would not be exported and we could have a clean export dataset. I wonder if this is something being considered among the other great developments that are in the pipeline.

LN · May 25, 2023, 5:46pm

Thanks for filing this.

Here are a few things you can do today:

- Add a prefix to the names of all fields that you want to drop from analysis. For example, you can use a note_ prefix for all notes or you can use a general noanalysis_ prefix. Then you can quickly drop those columns as the first step in your analysis (e.g. you can see something similar in this pyODK example where we drop all columns with the __system prefix). This is simple but it doesn't help reduce data that's transferred from the server.
- When importing into your analysis platform, if you use the OData feed, you can specify which columns you want to request using the $select query parameter. For example, if you specify $select=__id,species,color,weight, you will only get those 4 fields. This limits what actually gets pulled from the server so it can make the fetch much faster. But of course the downside is that you have to explicitly list every field in the query.
- Create entities from your submissions and only save a subset of form fields as entity properties. More in the docs: https://docs.getodk.org/central-entities/. You can then connect to the OData feed for the generated Dataset rather than the OData feed for the raw submissions. Note that once an entity is generated from a submission, submission edits don't get applied to the entity.
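
For anyone who wants to try the first two options, here is a rough sketch in plain Python (standard library only). It is an illustration, not an official ODK recipe: the server address, project number, form ID `birds`, the sample CSV and the two helper function names are all made up for the example. Only the note_ / noanalysis_ prefixes and the $select field list come from the suggestions above.

```python
import csv
import io
from urllib.parse import quote


def drop_prefixed_columns(rows, prefixes):
    """Return copies of the rows without columns whose names start with any prefix."""
    prefixes = tuple(prefixes)  # str.startswith accepts a tuple of candidates
    return [
        {name: value for name, value in row.items() if not name.startswith(prefixes)}
        for row in rows
    ]


def build_select_url(feed_url, fields):
    """Append a $select query parameter that lists only the wanted fields."""
    # Commas are kept literal so the URL reads like the example in the post.
    return f"{feed_url}?$select={quote(','.join(fields), safe=',')}"


# Hypothetical export: two real answers plus two helper columns to discard.
sample_csv = "species,weight,note_intro,noanalysis_check\nheron,1.5,,ok\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))
clean_rows = drop_prefixed_columns(rows, ["note_", "noanalysis_"])

# Hypothetical Central server, project and form; substitute your own feed URL.
feed_url = "https://central.example.org/v1/projects/1/forms/birds.svc/Submissions"
select_url = build_select_url(feed_url, ["__id", "species", "color", "weight"])

print(clean_rows)
print(select_url)
```

The prefix approach still downloads every column and filters afterwards, while the $select URL asks the server for only the listed fields, which is the trade-off described above.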

We'll also consider this but no sense of timeline yet!

wroos · September 30, 2023, 4:26pm

Hint: KoboToolbox has an option where you can choose the form items to download at the server level (and save the settings).

I think this is a more flexible approach than defining this inside the form. For example, you might have a question from a previous form version that has since been deleted, but you might still want to include its data in the download.

wroos · October 5, 2023, 6:38pm

In addition to LN's suggestions: if the export is done after data collection has finished, you could create (and deploy) a new version of your form for the data export that excludes those variables. (But this will not work well with the "all versions" export option.)

Be sure to keep an XLS copy of the previous version so that you can restore it afterwards.