Correcting a mis-spelled variable in deployed form

Hi,

I made a lengthy lenghty form with around 500 variables. And as humans may, I discovered few spell mistakes in variable names. But the form has been deployed and has adound 300k submissions. I know that if i correct the variable name now, central will take it as a new field altogether, and the subsequent old data issues. But the variable is needed to be fixed somehow.

Any idea and method to do this internal 'surgery'? :slight_smile:

I will say that you can download that data and then rename your variable. Once your variable name has been renamed you can easily use all the data.
For renaming of the variables. I personally use STATA. Please follow this guide.

What I usually do is export all the variable name in one column and in next column I entered new variables for that team and used concat function into excel. Once my code is ready in excel. I copy all the code from excel and put into STATA.do file.

This is small hack for the work.

Best,
@iamnarendrasingh

There are lots of options to consider at https://docs.getodk.org/central-forms/#updating-forms-to-a-new-version. What exactly do you mean by "the subsequent old data issues".

1 Like

Example: I named on variable as 'person_disablility' (note the spelling mistake). Now if I correct it to 'person_disability', 2 things will happen:

  • Central will treat it as a new variable, and make a new field/column for it.
  • My submission data for this variable/column will be left behind in the old variable/column, while the new data will come in new variable (the correct spelling one). This will cause a disconnect in any tool which is doing data processing or visualization at the output of ODK Central.

As noted in the link I previously shared you can change the old field to a calculate that pulls the new field's value in. Your analysis can then continue on the old field.

2 Likes

Looks like I misunderstood your question. You would like to pull the old values into the new field.

If you are comfortable in Python, you can rename the field in the submission XML with pyODK after you update the form definition with the new spelling.

Below is a very untested example Python script to rename the agge column to age. You should run this on a test form and a test submission first. It will not work for repeats.

With 300k submissions, you'll also want to run the script on the server itself so you don't have to deal with a network delay.

python3 central-submission-column-rename.py --project_id 1 --form_id my-form --old_name age_group/agge --new_name age_group/age

central-submission-column-rename.py
import argparse
import uuid
import xml.etree.ElementTree as et

from pyodk.client import Client

parser = argparse.ArgumentParser()
parser.add_argument('--project_id', type=int, required=True)
parser.add_argument('--form_id', required=True)
parser.add_argument('--old_name', required=True)
parser.add_argument('--new_name', required=True)
args = parser.parse_args()

client = Client()

for submission in client.submissions.list(args.form_id, args.project_id):

    old_xml = client.get("projects/"+ str(args.project_id) + "/forms/" + args.form_id + "/submissions/" + submission.instanceId + ".xml").text
    xml_root = et.fromstring(old_xml)

    node = xml_root.find(args.old_name)
    
    if node is not None:

        node.tag = args.new_name
        
        meta_node = xml_root.find('meta')
        instance_id_node = meta_node.find('instanceID')
        deprecated_id_node = meta_node.find('deprecatedID')
        if deprecated_id_node is None:
            # submission has never been edited, add deprecated_id
            deprecated_id_node = et.SubElement(meta_node, 'deprecatedID')
        # in case submission has been edited, use the instance_id in the xml
        deprecated_id_node.text = instance_id_node.text
        instance_id_node.text = 'uuid:' + str(uuid.uuid4())
        
        new_xml = et.tostring(xml_root)
        client.submissions.edit(instance_id, new_xml, form_id=args.form_id, project_id=args.project_id)