Backend consequences of modifying ODK Collect forms

I regularly need to change our ODK Collect forms while data is already being collected. Sometimes I need to, for example, delete columns, rename them, or group them. What are the backend consequences of such modifications, specifically for the API JSON?

For example, if I rename a column, is the old data then stored under the renamed column in the API JSON?

Is there any documentation on how the backend (API JSON) behaves when you modify an existing form by:

  1. Deleting columns
  2. Renaming columns
  3. (Un)grouping existing columns
  4. Putting columns in a repeat
  5. etc.

Of course I can figure this out myself, but any feedback would help me take things into account. Thanks!

Short hint: Try your best to avoid form changes during data collection. And always test any changes, including the API and other exports, with a cloned form before deploying them to the field.

In particular, changing names (questions, choices), variable types, group or repeat structures, constraints, and relevance conditions may create severe challenges for data export and analysis. See posts in the forums, e.g. https://community.kobotoolbox.org/t/download-data-from-files-before-changes-were-made/73225/15.

Hi @Edmonds, here is a documentation section that I think will answer at least some of your questions: https://docs.getodk.org/central-forms/#updating-forms-to-a-new-version

One important thing to note is that no data is ever deleted by changes in the form definition. If you’re not seeing a field you expect in your downstream analysis tools, there’s always a way to access that data (not always pretty, but possible!).

Currently the JSON API (pyodk) can only return fields that are in the latest form definition. If you want your analysis to include fields that have been deleted without changing what data collectors see, you can reintroduce those fields in the form definition with a relevance of false(). For example, I’ve seen some XLSForms with a few groups and calculates at the bottom that exist only for the purpose of making values from earlier form versions available for analysis.

In general, I recommend keeping the columns that hold values and are used for analysis the same across form versions if you can. You can still make significant changes to what data collectors experience. For example, if you introduce a repeat, you can join a repeated field’s values with “|” outside the repeat so that you can continue your analysis on flat data. Feel free to describe specific examples if you would like ideas.
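
On the analysis side, a value joined with “|” can be split back into a list when needed. Here is a minimal Python sketch of that step. The field name `child_names_joined` and the sample submission are hypothetical and only illustrate the idea; they do not come from any real form.

```python
def split_joined(value, sep="|"):
    """Split a separator-joined string back into a list of values.

    Returns an empty list for None or an empty string, which is assumed
    here to be how a submission without repeat instances shows up.
    """
    if not value:
        return []
    return [part.strip() for part in value.split(sep)]


# Hypothetical submission with a top-level field holding the joined values.
submission = {"household_id": "H-001", "child_names_joined": "Ana|Ben|Chi"}

child_names = split_joined(submission["child_names_joined"])
print(child_names)
```

This only works cleanly if the individual values can never contain the separator themselves, so it is worth picking a separator that cannot appear in the answers.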

No, old data is not stored under the renamed column. Renaming a field is the same as deleting it and then adding a new one.

You can either consolidate the data in your analysis step or keep the old field as a calculate that will store data from the user-facing field for analysis. When do you find yourself renaming columns? Has the question changed significantly in that case? Is it to have a better name for analysis?
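
Consolidating in the analysis step could look roughly like the Python sketch below. It assumes both the old and the new field name are available in the rows you analyze (for example because the old field was kept in the form definition); the names `age` and `respondent_age` and the sample rows are hypothetical.

```python
def coalesce_fields(row, names):
    """Return the first non-empty value found under any of the given names."""
    for name in names:
        value = row.get(name)
        if value not in (None, ""):
            return value
    return None


# Hypothetical rows: the first was collected before the rename, the second after.
rows = [
    {"__id": "uuid:1", "age": 34, "respondent_age": None},
    {"__id": "uuid:2", "age": None, "respondent_age": 51},
]

# Prefer the new name and fall back to the old one.
ages = [coalesce_fields(row, ["respondent_age", "age"]) for row in rows]
print(ages)
```

Listing the new name first means that a row which somehow has both values uses the current field, which is usually the safer default.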

A form field’s identity is actually represented by its name AND its parent groups and repeats. So changing the group and repeat structure that a field is in is like deleting the original field and then adding a new one.
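
To make the "name plus parents" idea concrete, here is a small Python sketch that lists each field by its full path and compares two versions of a submission. It assumes groups appear as nested objects in the JSON; the field and group names are hypothetical.

```python
def field_paths(node, prefix=""):
    """Return the set of slash-separated paths to every leaf in a nested dict."""
    paths = set()
    for key, value in node.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            # A nested dict stands in for a group; recurse into it.
            paths |= field_paths(value, path)
        else:
            paths.add(path)
    return paths


# Hypothetical submissions before and after moving two fields into a group.
before = {"name": "Ana", "age": 34}
after = {"respondent": {"name": "Ana", "age": 34}}

removed = sorted(field_paths(before) - field_paths(after))
added = sorted(field_paths(after) - field_paths(before))
print("no longer present:", removed)
print("newly present:", added)
```

Even though the names `name` and `age` are unchanged, every path differs between the two versions, which is why regrouping behaves like deleting the fields and adding new ones.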

Your questions have made me think of several improvements we could make to the documentation. If you have any other specific suggestions, we’d love to hear them.