I made a lengthy form with around 500 variables, and, as humans do, I discovered a few spelling mistakes in variable names. But the form has already been deployed and has around 300k submissions. I know that if I correct a variable name now, Central will treat it as a new field altogether, with all the resulting issues for the old data. But the variable name needs to be fixed somehow.
Any ideas or methods for doing this internal 'surgery'?
I would say that you can download the data and then rename your variables. Once they have been renamed, you can use all the data as usual.
For renaming the variables, I personally use STATA. Please follow this guide.
What I usually do is export all the variable names into one column, enter the new names in the next column, and use Excel's CONCAT function to build a rename command for each row. Once the code is ready in Excel, I copy all of it into a STATA .do file.
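The same CONCAT step can also be scripted. Below is a minimal sketch in Python, assuming the two columns are saved as a CSV with the hypothetical headers `old_name` and `new_name`; the function name `build_do_file` is also hypothetical. It emits one Stata `rename old new` line per changed name, ready to paste into a .do file.

```python
import csv
import io


def build_do_file(mapping_csv: str) -> str:
    """Turn a two-column CSV (old_name,new_name) into Stata rename commands.

    Rows where the new name is empty or identical to the old one are skipped,
    so the sheet can list every variable and only the typos get renamed.
    """
    reader = csv.DictReader(io.StringIO(mapping_csv))
    lines = []
    for row in reader:
        old = row["old_name"].strip()
        new = row["new_name"].strip()
        if new and new != old:
            lines.append(f"rename {old} {new}")
    # One command per line, each terminated by a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    mapping = (
        "old_name,new_name\n"
        "person_disablility,person_disability\n"
        "person_age,person_age\n"
    )
    # prints: rename person_disablility person_disability
    print(build_do_file(mapping), end="")
```

Listing unchanged names is harmless, which keeps the sheet a complete inventory of all ~500 variables.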
Example: I named one variable 'person_disablility' (note the spelling mistake). Now if I correct it to 'person_disability', two things will happen:
1. Central will treat it as a new variable and make a new field/column for it.
2. My submission data for this variable will be left behind in the old column, while new data will go into the new, correctly spelled column. This breaks the link for any tool that processes or visualizes the data coming out of ODK Central.
As noted in the link I previously shared, you can change the old field to a calculate that pulls in the new field's value. Your analysis can then continue on the old field.
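In XLSForm terms, this means giving the question its corrected name and keeping the misspelled name as a `calculate` row whose `calculation` is `${new_name}`. Below is a minimal sketch of that transformation, assuming the survey sheet has already been read into a list of dicts (one per row, keyed by column header); the helper name `add_alias_calculates` is hypothetical.

```python
def add_alias_calculates(survey_rows, renames):
    """Rename questions and keep each old name as a calculate alias.

    survey_rows: list of dicts for the XLSForm survey sheet (needs 'name').
    renames: dict mapping the misspelled name to the corrected name.
    Returns a new list; the input rows are not modified.
    """
    out = []
    for row in survey_rows:
        new_name = renames.get(row.get("name"))
        if new_name is None:
            out.append(row)
            continue
        # The question itself now uses the corrected name.
        out.append({**row, "name": new_name})
        # The old name survives as a calculate that copies the new value.
        out.append({
            "type": "calculate",
            "name": row["name"],
            "calculation": "${" + new_name + "}",
        })
    return out


if __name__ == "__main__":
    rows = [{"type": "select_one yn", "name": "person_disablility", "label": "Disability?"}]
    for r in add_alias_calculates(rows, {"person_disablility": "person_disability"}):
        print(r)
```

Because the old name still exists as a calculate, any `${...}` references to it elsewhere in the form keep working.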
Looks like I misunderstood your question. You would like to pull the old values into the new field.
If you are comfortable with Python, you can rename the field in the submission XML with pyODK after you update the form definition with the new spelling.
Below is a very untested example Python script to rename the agge column to age. You should run this on a test form and a test submission first. It will not work for repeats.
With 300k submissions, you'll also want to run the script on the server itself so you don't have to deal with a network delay.
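The core of such a script is editing each submission's XML. Below is a minimal, independent sketch of only that step, using Python's standard library; it is not the original script. Fetching the XML and uploading the edited version would go through pyODK (for example its `client.submissions.edit` method; check the pyODK documentation for the exact parameters). The helper names are hypothetical. The sketch assumes the field is a direct child of the root element, so, like the original, it will not handle repeats (nor fields inside groups). It follows ODK Central's rule for edited submissions: the current `instanceID` is copied into `deprecatedID` and a new `instanceID` is generated.

```python
import uuid
import xml.etree.ElementTree as ET

# Keep the usual ODK prefixes when the tree is serialized again.
ET.register_namespace("jr", "http://openrosa.org/javarosa")
ET.register_namespace("orx", "http://openrosa.org/xforms")


def local_name(tag):
    """Drop the '{namespace}' part ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def find_child(parent, name):
    """Return the first direct child with the given local name, or None."""
    for el in parent:
        if local_name(el.tag) == name:
            return el
    return None


def rename_field(submission_xml, old_name, new_name):
    """Hypothetical helper: rename one top-level field in a submission XML.

    Fields inside groups or repeats are not touched. The current instanceID
    is copied into deprecatedID and a fresh instanceID is generated.
    """
    root = ET.fromstring(submission_xml)

    field = find_child(root, old_name)
    if field is None:
        raise ValueError(f"no top-level field named {old_name!r}")
    # Keep any namespace on the element; only swap the local name.
    field.tag = field.tag[: len(field.tag) - len(old_name)] + new_name

    meta = find_child(root, "meta")
    instance_id = find_child(meta, "instanceID") if meta is not None else None
    if instance_id is None:
        raise ValueError("submission has no meta/instanceID")

    # Reuse the instanceID's namespace ('' or '{uri}') for deprecatedID.
    ns = instance_id.tag[: len(instance_id.tag) - len("instanceID")]
    deprecated = find_child(meta, "deprecatedID")
    if deprecated is None:
        deprecated = ET.SubElement(meta, ns + "deprecatedID")
    deprecated.text = instance_id.text
    instance_id.text = f"uuid:{uuid.uuid4()}"

    return ET.tostring(root, encoding="unicode")
```

In a loop you would pass each submission's XML through `rename_field(xml, "agge", "age")` and upload the result, trying it on a test form and a test submission first.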