Missing data from ODK Central export when the form structure has been updated

1. What is the problem? Be very detailed.

I had to move some variables from one group (or page) to another in an effort to optimise our data collection, while keeping the same name since we are still collecting the same information.
For instance, say I started with var1 in group1 and later decided to move it to group2.

I was expecting the export to contain two columns, group1-var1 and group2-var1 (which I would then have to merge retrospectively). However, the export seems to be constrained by the latest published version of the ODK form: var1 is exported only once, for the group used in that latest version. So with the CSV export I get group2-var1 but no longer group1-var1. (I cannot tell about OData since these data are encrypted.)

Similarly, values for variables that have been removed from the form during data collection are not exported. This is not the behaviour I would expect, especially when adjusting the data collection strategy while collection is ongoing.

Not sure if I am missing something absolutely obvious here, as I have not seen this reported anywhere, but it does not seem to me like desirable behaviour for the export function: I would like to be able to download all the data that have been collected for this form (independently of its current structure). Any thoughts welcome.

2. What app or server are you using and on what device and operating system? Include version numbers.

ODK Central
versions:
client (v1.2.2)
server (v1.2.1)

3. What have you tried to fix the problem?

I have moved var1 back to group1 to check that the data are still there. I do now retrieve the column group1-var1, but I lose group2-var1.
I have tried exporting CSVs filtering submissions up to the date of the form update, but I am unable to retrieve the same data structure I was able to export before the form update.
I have also looked at the description of the API (Exporting Form Submissions to CSV via POST) to see if there is a parameter that could be used to retrieve the whole dataset, but I have not seen anything that could help me to get around this issue (except maybe publishing a previous version of the form to retrieve the former data structure, but then I will miss the most recent data that are collected with the current form)

4. What steps can we take to reproduce the problem?

  • Use a form with 2 groups
  • Start collecting data with var1 in group 1
  • Then publish a new form version with var1 moved to group2, or with var1 removed from the XLSForm
  • Collect more data
  • Download the CSV export

5. Anything else we should know or have? If you have a test form or screenshots or logs, attach below.

Hello @Thalie ,

Maybe this discussion and Hélène's explanations will help:

Can you still send data from old versions of forms to ODK central?


Oh, thanks a lot @mathieubossaert! This is indeed extremely useful, I had not seen this thread.

Just wondering if somebody can confirm if it is possible to retrieve XML submissions when the project is encrypted? I see the GET request to retrieve XML submissions, but I guess I would need a POST request?

Also, in my case I unfortunately cannot use LN's trick for the variables that have been moved between groups (I had thought of something similar using calculation types), since form validation does not allow a variable name to be used several times in the same form (even in different groups). From a data management perspective, I still feel that no assumption should be made about the data export/analysis based on the data collection tool in use at time T. These are two separate things for me, and I would leave this decision to the person who is managing the data (or at least provide it as an option).

To give an example, in the database I am talking about, we have 2000+ older submissions with diagnoses. I have moved this variable to another group in the same form (because of operational constraints on when this data needs to be collected in our data collection workflow). The statistical team is still very keen to analyse all the diagnoses we have collected, and it is my task to merge the information spread across different columns back into a single column, so that the fact that different form versions were used to collect the same information is not directly visible in the final database. We would also like to be able to compare the earlier strategy with the latest one. Even for variables we stopped collecting, we may want to do a subset analysis.

Another example: we have automated reports that run on the database. Now that some variables have been removed from the form, the statistics associated with these variables display 0% for the whole database, which is not consistent with the reports generated 1 week ago (that is actually how I started investigating and realised these data were no longer exported).

Lastly, imagine somebody with project manager rights publishes, in the last week of data collection, a version of the form where some variables have been removed. Other users could not detect this, and the export may be missing variables that had actually been collected in 99% of the submissions.
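One way to notice such a silent change is to compare the header row of an older export with that of a newer one. The snippet below is only a minimal sketch: the two CSV strings are made-up stand-ins for real exports, and the function names are hypothetical.

```python
import csv
import io


def read_header(csv_text: str) -> list[str]:
    """Return the column names from the first row of a CSV export."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader, [])


def dropped_columns(old_csv: str, new_csv: str) -> list[str]:
    """List columns present in the older export but absent from the newer one."""
    new_cols = set(read_header(new_csv))
    return [col for col in read_header(old_csv) if col not in new_cols]


# Made-up exports taken before and after var1 moved from group1 to group2.
old_export = "KEY,group1-var1,group2-var2\nuuid:1,a,b\n"
new_export = "KEY,group2-var1,group2-var2\nuuid:1,,b\n"

print(dropped_columns(old_export, new_export))
```

Run against the previous export each time a new one is downloaded, a non-empty result would flag columns that have disappeared since the last download.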

Have you got earlier exports where the data is not yet missing? Would your analysts be willing to merge the different exports at their end?

I manage form schema churn via ETL into a custom data warehouse. Form changes simply require a change in the ETL, and sometimes we change the warehouse data schema too.
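As a rough illustration of merging an earlier export with a later one, here is a minimal sketch using only the Python standard library. It assumes both exports share an identifier column, called `KEY` here; adjust that to whatever your export actually uses. The sample data and the function name are made up.

```python
import csv
import io


def merge_exports(old_csv: str, new_csv: str, key: str = "KEY") -> list[dict]:
    """Union the columns of two exports, matching rows on `key`.

    Non-empty values from the newer export win; the older export
    supplies columns that are empty or absent in the newer one.
    """
    merged: dict[str, dict] = {}
    for text in (old_csv, new_csv):  # older first, so newer values overwrite
        for row in csv.DictReader(io.StringIO(text)):
            target = merged.setdefault(row[key], {})
            for col, value in row.items():
                if value or col not in target:
                    target[col] = value
    return list(merged.values())


# Made-up data: the old export still has group1-var1, the new one does not.
old_export = "KEY,group1-var1\nuuid:1,malaria\n"
new_export = "KEY,group2-var1\nuuid:1,\nuuid:2,flu\n"

for row in merge_exports(old_export, new_export):
    print(row)
```

Rows that exist only in the newer export will not have the old columns at all, so downstream code should read them with `row.get(...)`.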

In addition to setting the relevant column to false(), as suggested by @LN for variables that have been dropped from the data collection, I am copying here the solution kindly provided by @aurdipas to handle variables that have been moved to another group in the same form, in case anybody else ends up with the same issue.

  • Use ODK XLSForm Offline (release v2.0)
  • You can uncheck the option to validate, so a form with variable X under the same name in different groups will still be converted.
  • The only thing you need to change is the way you refer to these variables if you use them in calculation/relevant/constraint columns
  • You cannot use the typical ${a3_b_2}; you need to use the full path /data/previous_enrolment/a3_b_2 (as an example)
  • After conversion, the form uploads to Central smoothly
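If many expressions need their references rewritten, the substitution can be scripted. This is only a sketch: the helper name and the mapping are hypothetical, and you would have to supply the name-to-path mapping yourself from your own form.

```python
import re


def to_absolute_refs(expression: str, paths: dict[str, str]) -> str:
    """Replace ${name} with its absolute path when `name` is in `paths`."""

    def swap(match: re.Match) -> str:
        name = match.group(1)
        # Leave the reference untouched if no path was supplied for it.
        return paths.get(name, match.group(0))

    return re.sub(r"\$\{([^}]+)\}", swap, expression)


# Mapping built by hand for the duplicated variable from the example above.
paths = {"a3_b_2": "/data/previous_enrolment/a3_b_2"}

print(to_absolute_refs("${a3_b_2} = 1 and ${age} > 5", paths))
```

Only names listed in the mapping are rewritten, so references to variables that are still unique in the form keep the usual ${...} syntax.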

Note that you need to manually edit your XML form if you moved child groups within parent groups (as the converter does not allow several groups with the same name).

Still not ideal for everyday data management, but handy in "emergency" situations.

You then have to merge "duplicated" columns from the export (but this was expected as the structure had been changed) so that the change is transparent for the team analysing the data.
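The merge itself can be a simple "first non-empty value wins" pass over the export. The sketch below uses the group1-var1 / group2-var1 names from my example above; the function name, the output column name var1, and the sample data are made up.

```python
import csv
import io


def coalesce(rows, sources, target):
    """Yield copies of rows with the first non-empty `sources` value stored in `target`."""
    for row in rows:
        merged = dict(row)
        merged[target] = next((row[c] for c in sources if row.get(c)), "")
        yield merged


# Made-up export where var1 was collected in group1 first, then in group2.
export = "KEY,group1-var1,group2-var1\nuuid:1,malaria,\nuuid:2,,flu\n"

reader = csv.DictReader(io.StringIO(export))
rows = list(coalesce(reader, ["group1-var1", "group2-var1"], "var1"))

print([row["var1"] for row in rows])
```

The original columns are kept alongside the merged one, so it stays possible to check which form version each value came from.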


Thanks @thalie!

We had the same problem a few days ago. We were lucky because no submissions came in after we updated the form and changed its structure (by moving a question into a group). So we created a new version, hiding the old question outside the group (relevant set to false()) and creating a new question with a new name inside the group.
Thus, we were able to recover all the data thanks to OData :wink:
