Changing entities in ODK Central with pyODK

Hi there,
I've recently started using pyODK to interact with our ODK Central API, for both submissions and entities. Thank you for the useful guidance here, @LN!

I have something specific I would like to do with our entities dataset.

I would like to return a list of all the entities and change some of the entries based on a filter condition. For example, we would like to add the correct spellings of villages for over 16,000 entries in an entity list.

Rather than create an ODK form that does this manually I thought I could do this with pyODK.

I'm using a Jupyter notebook and got this far. I'm not sure if this is the correct way to do it. I was trying to return a dataframe and then modify a column based on a condition. The first three lines of code work, but the rest do not.

Any help would be much appreciated.

Charlie

import pandas as pd
from pyodk.client import Client
client = Client()
data = client.entities.list(project_id='x1', entity_list_name = 'entity_name')
# following lines throw the error: TypeError: list indices must be integers or slices, not str
df = pd.json_normalize(data=data['value'], sep='/')    
df.head(3)
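The `TypeError` most likely arises because `client.entities.list()` returns a Python list of entity objects rather than the OData JSON dict, so there is no `"value"` key to index. `client.entities.get_table()`, used in the replies, returns the OData dict with a `"value"` list that `pd.json_normalize` can flatten. A minimal sketch of the flattening and filtering step, using a hypothetical hand-written response in place of the real `client.entities.get_table(entity_list_name=..., project_id=...)` call:

```python
import pandas as pd

# Hypothetical stand-in for: client.entities.get_table(entity_list_name=..., project_id=...)
# The real response is an OData dict whose "value" key holds one dict per entity.
odata_response = {
    "value": [
        {"__id": "uuid-1", "label": "Patient 1", "pid": "1", "village": "unknown",
         "__system": {"createdAt": "2024-01-01T00:00:00Z"}},
        {"__id": "uuid-2", "label": "Patient 2", "pid": "2", "village": "Foxville",
         "__system": {"createdAt": "2024-01-02T00:00:00Z"}},
    ]
}

# Flatten nested dicts: {"__system": {"createdAt": ...}} becomes a "__system/createdAt" column
df = pd.json_normalize(data=odata_response["value"], sep="/")

# Select the rows to change with an ordinary boolean filter
to_fix = df[df["village"] == "unknown"]
print(to_fix[["__id", "pid", "village"]])
```

Changing values in the DataFrame only changes your local copy; each change still has to be sent back to Central, for example with `client.entities.update(...)` as in the replies.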

Hey, @CharlieKeyes! I wrote the following code to do this. It should give you an idea of how it can be achieved! :smile:

Python Code:

from pyodk.client import Client

########################### Parameters ###############################
config_path = "config.toml"
cache_path = "cache.toml"
project_id = 1
name_of_entity_list = "my_entity_list"
column_name_in_entity_list_containing_incorrect_village_name = "village_name"
incorrect_village_name = "Bearville"
correct_village_name = "Foxville"
######################################################################

data = {}
data[column_name_in_entity_list_containing_incorrect_village_name] = correct_village_name
with Client(config_path=config_path, cache_path=cache_path) as client:
    submissions = client.entities.get_table(
        entity_list_name=name_of_entity_list, project_id=project_id
    )["value"]
    for submission in submissions:
        if (submission[column_name_in_entity_list_containing_incorrect_village_name] == incorrect_village_name):
            client.entities.update(
                uuid=(submission["__id"]),
                entity_list_name=name_of_entity_list,
                project_id=project_id,
                data=data,
                force=True,
            )
            print("Successfully updated " + submission["__id"])
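To correct several misspellings in one pass, the single incorrect/correct pair can be generalized into a dictionary. A minimal sketch, assuming a hypothetical `SPELLING_FIXES` mapping and a hypothetical helper `plan_spelling_updates`; it builds the list of `(uuid, data)` pairs first so you can review it as a dry run before sending anything to the server:

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping of known misspellings to the correct village name
SPELLING_FIXES: Dict[str, str] = {
    "Bearville": "Foxville",
    "Foxvile": "Foxville",
}


def plan_spelling_updates(
    rows: List[dict], column: str, fixes: Dict[str, str]
) -> List[Tuple[str, Dict[str, str]]]:
    """Return (entity uuid, data) pairs for rows whose `column` value has a known fix."""
    updates = []
    for row in rows:
        current: Optional[str] = row.get(column)  # None if the property is empty
        if current in fixes:
            updates.append((row["__id"], {column: fixes[current]}))
    return updates


# Hypothetical rows shaped like client.entities.get_table(...)["value"]
rows = [
    {"__id": "uuid-1", "village_name": "Bearville"},
    {"__id": "uuid-2", "village_name": "Foxville"},
    {"__id": "uuid-3", "village_name": "Foxvile"},
    {"__id": "uuid-4", "village_name": None},
]

updates = plan_spelling_updates(rows, "village_name", SPELLING_FIXES)
print(updates)
```

Once the list looks right, each pair can be passed to `client.entities.update(uuid=uuid, entity_list_name=name_of_entity_list, project_id=project_id, data=data, force=True)` inside the `with Client(...)` block.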

Hope you find it helpful! :smile:


Thank you, that is useful. I also wanted to know how to do this with multiple incorrect villages across multiple entities. For example, I have the following patient IDs and I would like to change their village name from 'other' to a correct village.

corrected village names
pid, correct_village
1, xyz
2, zsw
3, cvf
4, afc

entity data to be changed
pid, village
1, unknown
2, unknown
3, unknown
4, unknown

I think my Python skills are not quite there yet (I mainly use R and don't think this is possible with ruODK yet), but I think I have to create an array, merge it, and loop over it?

Any code example would be helpful!
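The merge-then-loop idea does work in pandas. A minimal sketch with hypothetical in-memory tables (in practice `entities` would come from `pd.json_normalize(client.entities.get_table(...)["value"])` and `corrections` from `pd.read_csv`): an inner merge on `pid` keeps only the entities that have a correction, and the loop then yields one `(uuid, data)` pair per entity to update.

```python
import pandas as pd

# Hypothetical copies of the two tables above; pid is kept as text on both sides
corrections = pd.DataFrame(
    {"pid": ["1", "2", "3", "4"], "correct_village": ["xyz", "zsw", "cvf", "afc"]}
)
entities = pd.DataFrame(
    {
        "__id": ["uuid-a", "uuid-b", "uuid-c", "uuid-d", "uuid-e"],
        "pid": ["1", "2", "3", "4", "5"],
        "village": ["unknown"] * 5,
    }
)

# Inner merge: only entities whose pid appears in the corrections table survive
merged = entities.merge(corrections, on="pid", how="inner")

# One (uuid, data) pair per entity; this list could drive client.entities.update()
updates = [
    (row["__id"], {"village": row["correct_village"]})
    for _, row in merged.iterrows()
]
print(updates)
```

The dictionary approach in the reply below reaches the same result without building a merged table, which is lighter for a one-off fix.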

Thank you

Hey @CharlieKeyes, I have written the code for this! :smile:

  1. I am assuming that the pid in the corrected village names table (a CSV file with only two columns) is unique for each record and is not repeated.
  2. I will also be using a slightly more advanced approach here ("hashing": loading the CSV into a Python dictionary so each pid lookup is fast) to significantly improve the performance of the code, since you mentioned there might be 16,000 records to update.
  3. Please make sure to test the code thoroughly in a test environment before using it on the official entity list!
  4. As there may be 16,000 entries to update, that is equivalent to making 16,000 requests to the server within a short span of time (the code will run quite fast)! So make sure the server does not rate-limit that many requests in a short period (AWS, for example, often blocks large numbers of requests made within a short duration to prevent DDoS attacks). If that protection triggers, the next best approach would be either to temporarily disable it at AWS (or your provider), to update the code to add a delay after every 1,000 requests, or to use proxies! :sweat_smile:
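The "add a delay after every 1,000 requests" option can be sketched as a small wrapper that pauses after each batch. This is a minimal sketch, assuming you pass in a function that performs one update (for example, a hypothetical `lambda uuid, data: client.entities.update(uuid=uuid, entity_list_name=..., project_id=..., data=data, force=True)`); the batch size and pause length are placeholders to tune for your server, not known limits.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def throttled_apply(
    updates: Iterable[Tuple[str, Dict[str, str]]],
    apply_update: Callable[[str, Dict[str, str]], None],
    batch_size: int = 1000,
    pause_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Apply each (uuid, data) update, sleeping after every `batch_size` requests."""
    sent = 0
    for uuid, data in updates:
        apply_update(uuid, data)
        sent += 1
        if sent % batch_size == 0:
            sleep(pause_seconds)  # give the server (or its firewall) a breather
    return sent


# Demo with a fake update function and a fake sleep, so nothing hits a real server
calls = []
pauses = []
demo_updates = [(f"uuid-{i}", {"village": "Foxville"}) for i in range(5)]
total = throttled_apply(
    demo_updates,
    apply_update=lambda uuid, data: calls.append(uuid),
    batch_size=2,
    pause_seconds=1.5,
    sleep=pauses.append,
)
print(total, pauses)
```

Passing `sleep` in as a parameter keeps the wrapper testable without actually waiting.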

Python Code:

import pandas as pd
from pyodk.client import Client

######################################### Parameters ############################################
path_to_csv_file_containing_unique_pid_and_the_corresponding_correct_village_name = "my_csv.csv"
column_name_for_pid_in_csv = "pid"
column_name_for_corresponding_correct_village_in_csv = "correct_village"
column_name_for_pid_in_entity_list = "pid"
column_name_for_corresponding_incorrect_village_in_entity_list = "village"
config_path = "config.toml"
cache_path = "cache.toml"
project_id = 1
name_of_entity_list = "my_entity_list"
#################################################################################################

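# Read all columns as text so pids like 1 match the string values stored in the entity list
df = pd.read_csv(path_to_csv_file_containing_unique_pid_and_the_corresponding_correct_village_name, dtype=str, skipinitialspace=True)
correct_village_dict = df.set_index(column_name_for_pid_in_csv)[column_name_for_corresponding_correct_village_in_csv].to_dict()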
with Client(config_path=config_path, cache_path=cache_path) as client:
    submissions = client.entities.get_table(
        entity_list_name=name_of_entity_list, project_id=project_id
    )["value"]
    for submission in submissions:
        if submission[column_name_for_pid_in_entity_list] in correct_village_dict:
            data = {
                column_name_for_corresponding_incorrect_village_in_entity_list: correct_village_dict[submission[column_name_for_pid_in_entity_list]]
            }
            client.entities.update(
                uuid=(submission["__id"]),
                entity_list_name=name_of_entity_list,
                project_id=project_id,
                data=data,
                force=True,
            )
            # An f-string also copes with empty (None) village values, which "+" concatenation cannot
            print(
                f"Successfully updated {submission['__id']}"
                f" (from {submission[column_name_for_corresponding_incorrect_village_in_entity_list]}"
                f" to {correct_village_dict[submission[column_name_for_pid_in_entity_list]]}"
                f" for {submission[column_name_for_pid_in_entity_list]})"
            )
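One thing to watch with the dictionary lookup: Central stores entity properties as text, so an entity's `pid` most likely arrives as `"1"`, while `pd.read_csv` parses a numeric `pid` column as integers by default. The membership test `"1" in {1: ...}` is then `False`, and nothing gets updated. A minimal sketch of building the lookup with every column read as text, using the sample table from the question (note the spaces after the commas):

```python
import io

import pandas as pd

# The corrections table from the question, including the spaces after each comma
csv_text = """pid, correct_village
1, xyz
2, zsw
3, cvf
4, afc
"""

# dtype=str keeps pid as "1" rather than 1; skipinitialspace drops the space after each comma
df = pd.read_csv(io.StringIO(csv_text), dtype=str, skipinitialspace=True)
df.columns = df.columns.str.strip()  # defensive: remove any stray whitespace in headers

# Drop rows with a missing correction so NaN is never sent to the server
df = df.dropna(subset=["pid", "correct_village"])

correct_village_dict = df.set_index("pid")["correct_village"].to_dict()
print(correct_village_dict)

# Lookups now match string pids as they come back from the entity list
entity_row = {"__id": "uuid-1", "pid": "1", "village": "unknown"}
print(entity_row["pid"] in correct_village_dict)
```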

Hope you find it helpful! :smile:


The next ruODK release will support entities as well as create/delete actions. Coming soon!


Thank you, this is perfect. I had some missing values which threw an error, so I updated it to the following:

import pandas as pd
from pyodk.client import Client

######################################### Parameters ############################################
path_to_csv_file_containing_unique_pid_and_the_corresponding_correct_village_name = "my_csv.csv"
column_name_for_pid_in_csv = "pid"
column_name_for_corresponding_correct_village_in_csv = "correct_village"
column_name_for_pid_in_entity_list = "pid"
column_name_for_corresponding_incorrect_village_in_entity_list = "village"
config_path = "config.toml"
cache_path = "cache.toml"
project_id = 1
name_of_entity_list = "my_entity_list"
#################################################################################################

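# Read all columns as text so pids like 1 match the string values stored in the entity list
df = pd.read_csv(path_to_csv_file_containing_unique_pid_and_the_corresponding_correct_village_name, dtype=str, skipinitialspace=True)
# Skip CSV rows with a missing pid or village instead of sending empty values to Central
df = df.dropna(subset=[column_name_for_pid_in_csv, column_name_for_corresponding_correct_village_in_csv])
correct_village_dict = df.set_index(column_name_for_pid_in_csv)[column_name_for_corresponding_correct_village_in_csv].to_dict()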
with Client(config_path=config_path, cache_path=cache_path) as client:
    submissions = client.entities.get_table(
        entity_list_name=name_of_entity_list, project_id=project_id
    )["value"]
    for submission in submissions:
        pid = submission.get(column_name_for_pid_in_entity_list)
        if pid in correct_village_dict:
            old_village = submission.get(column_name_for_corresponding_incorrect_village_in_entity_list)
            new_village = correct_village_dict[pid]
            data = {column_name_for_corresponding_incorrect_village_in_entity_list: new_village}
            try:
                client.entities.update(
                    uuid=submission["__id"],
                    entity_list_name=name_of_entity_list,
                    project_id=project_id,
                    data=data,
                    force=True,
                )
                # f-strings cope with missing (None) values, unlike "+" concatenation
                print(f"Successfully updated {submission['__id']} (from {old_village} to {new_village} for {pid})")
            except Exception as e:
                print(f"Failed to update {submission['__id']}: {e}")
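With 16,000 entities, printed messages are easy to lose. A minimal sketch of an alternative, assuming a hypothetical helper `apply_with_report` that records failures instead of only printing them, so the failed uuids can be inspected and retried afterwards; `apply_update` stands in for the `client.entities.update(...)` call:

```python
from typing import Callable, Dict, Iterable, List, Tuple


def apply_with_report(
    updates: Iterable[Tuple[str, Dict[str, str]]],
    apply_update: Callable[[str, Dict[str, str]], None],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Apply each update; return (succeeded uuids, [(failed uuid, error message)])."""
    succeeded: List[str] = []
    failed: List[Tuple[str, str]] = []
    for uuid, data in updates:
        try:
            apply_update(uuid, data)
            succeeded.append(uuid)
        except Exception as e:  # keep going; one bad entity shouldn't stop the run
            failed.append((uuid, str(e)))
    return succeeded, failed


def fake_update(uuid: str, data: Dict[str, str]) -> None:
    """Hypothetical stand-in for client.entities.update that rejects empty values."""
    if not data["village"]:
        raise ValueError("empty village")


ok, errors = apply_with_report(
    [("uuid-1", {"village": "xyz"}), ("uuid-2", {"village": ""})], fake_update
)
print(f"{len(ok)} updated, {len(errors)} failed: {errors}")
```

The `errors` list can then be written to a CSV and re-run once the underlying data problems are fixed.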

Great! You're welcome! :smile:
