Excluding fields from output file

Hello friends

I don't think that this feature exists, but please correct me if I have simply not found it.

I have a form with a large number of calculate and note question types. These are not outputs of the survey; instead, they control the internal form logic and give feedback to the ODK Collect user. As a result, I do not need to export them. However, when I export the data from ODK Central, the exported CSV files (or the import into R) contain a lot of useless columns, and the first step is always to strip out all that clutter. I find this problem more acute when demonstrating ODK's functionality to people who are not so familiar with this great tool.

I think it would be great if one could control from the XLSForm which columns get exported. For example, there could be a single XLSForm column called export: if it is set to false for a question, that item would not be exported, and we would get a clean export dataset. I wonder if this is something being considered among the other great developments in the pipeline.


Thanks for filing this.

Here are a few things you can do today:

  • add a prefix to the names of all fields that you want to drop from analysis. For example, you can use a note_ prefix for all notes, or a general noanalysis_ prefix. Then you can quickly drop those columns as the first step of your analysis (you can see something similar in this pyODK example, where we drop all columns with the __system prefix). This is simple, but it doesn't reduce the amount of data transferred from the server.
  • when importing into your analysis platform, if you use the OData feed, you can specify which columns you want to request using the $select query parameter. For example, if you specify $select=__id,species,color,weight, you will only get those 4 fields. This limits what actually gets pulled from the server, which can make the fetch much faster. The downside, of course, is that you have to explicitly list every field you want to keep in the query.
  • create entities from your submissions and only save a subset of form fields as entity properties. More in the docs: https://docs.getodk.org/central-entities/. You can then connect to the OData feed for the generated Dataset rather than the OData feed for the raw submissions. Note that once an entity is generated from a submission, submission edits don't get applied to the entity.
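To make the first two options a bit more concrete, here is a minimal Python sketch (Python only because pyODK comes up above; nothing here uses pyODK itself). The helper names drop_prefixed_columns and odata_select_url are hypothetical, and the server URL, project ID, and form ID are placeholders. It assumes Central's OData submissions path is /v1/projects/{projectId}/forms/{xmlFormId}.svc/Submissions. It only builds the URL; a real request would also need authentication. Also note that in Central's CSV export, fields inside groups may have the group path prepended to the column name, so a plain prefix check on the header might miss them.

```python
import csv
import io
from urllib.parse import quote, urlencode


def drop_prefixed_columns(csv_text, prefixes):
    """Option 1: return CSV text without any column whose header starts with one of `prefixes`.

    Assumption: the prefix is at the very start of the header (fields inside
    groups may carry a group path first, which this sketch does not handle).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    keep = [name for name in reader.fieldnames if not name.startswith(tuple(prefixes))]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped columns in each row
    writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


def odata_select_url(server, project_id, form_id, fields):
    """Option 2: build an OData Submissions URL that asks the server for only `fields`.

    Hypothetical helper; it builds the URL but does not authenticate or fetch.
    """
    base = (
        f"{server.rstrip('/')}/v1/projects/{project_id}"
        f"/forms/{quote(form_id, safe='')}.svc/Submissions"
    )
    # keep "$", "," and "/" readable instead of percent-encoding them
    return base + "?" + urlencode({"$select": ",".join(fields)}, safe="$,/")


if __name__ == "__main__":
    export = "__id,species,note_intro,noanalysis_total,weight\nuuid:1,oak,,3,12\n"
    print(drop_prefixed_columns(export, ["note_", "noanalysis_"]))
    print(odata_select_url("https://central.example.org/", 7, "trees",
                           ["__id", "species", "color", "weight"]))
```

The prefix approach still downloads everything and cleans up locally, while the $select URL limits what the server sends, which mirrors the trade-off described in the list above.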

We'll also consider this, but there's no sense of a timeline yet!
