Hi there,
I've recently started using pyODK to interact with our ODK Central API, for both submissions and entities. Thank you for the useful guidance here, @LN!
I have something specific I would like to do with our entities dataset.
I would like to return a list of all the entities and change some of the entries based on a filter condition. For example, we would like to add the correct spellings of villages for over 16,000 entities.
Rather than create an ODK form that does this manually, I thought I could do this with pyODK.
I'm using a Jupyter notebook and got this far. I'm not sure if this is the correct way to do it. I was trying to return a dataframe and then modify a column based on a condition. The first 3 lines of code work, but the rest do not.
Any help would be much appreciated.
Charlie
import pandas as pd
from pyodk.client import Client
client = Client()
data = client.entities.list(project_id='x1', entity_list_name='entity_name')
# following lines throw the error: TypeError: list indices must be integers or slices, not str
df = pd.json_normalize(data=data['value'], sep='/')
df.head(3)
Hey @CharlieKeyes! I wrote the following code for this. It should give you an idea of how it can be done.
Python Code:
from pyodk.client import Client

########################### Parameters ###############################
config_path = "config.toml"
cache_path = "cache.toml"
project_id = 1
name_of_entity_list = "my_entity_list"
column_name_in_entity_list_containing_incorrect_village_name = "village_name"
incorrect_village_name = "Bearville"
correct_village_name = "Foxville"
######################################################################

# Properties to overwrite on every matching entity.
data = {column_name_in_entity_list_containing_incorrect_village_name: correct_village_name}

with Client(config_path=config_path, cache_path=cache_path) as client:
    # get_table returns the OData document as a dict; the rows are under "value".
    submissions = client.entities.get_table(
        entity_list_name=name_of_entity_list, project_id=project_id
    )["value"]
    for submission in submissions:
        if submission[column_name_in_entity_list_containing_incorrect_village_name] == incorrect_village_name:
            client.entities.update(
                uuid=submission["__id"],
                entity_list_name=name_of_entity_list,
                project_id=project_id,
                data=data,
                force=True,  # update without supplying a base_version
            )
            print("Successfully updated " + submission["__id"])
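On the TypeError in the question: as far as I can tell, client.entities.list returns a plain Python list, so indexing it with 'value' fails, whereas get_table returns the OData document as a dictionary whose "value" key holds the rows. Here is a minimal sketch of picking out the rows to change, with a hard-coded sample standing in for the get_table result. The sample rows and the helper name rows_to_fix are made up for illustration and are not part of pyODK.

```python
def rows_to_fix(table, column, wrong_value):
    """Return the rows of an OData-style table whose `column` equals `wrong_value`."""
    return [row for row in table["value"] if row.get(column) == wrong_value]


# Made-up sample, shaped like the dictionary used in the code above.
sample_table = {
    "value": [
        {"__id": "uuid-1", "village_name": "Bearville"},
        {"__id": "uuid-2", "village_name": "Foxville"},
        {"__id": "uuid-3", "village_name": "Bearville"},
    ]
}

matches = rows_to_fix(sample_table, "village_name", "Bearville")
print([row["__id"] for row in matches])
```

Listing the matches first, without calling update, is also a cheap way to do a dry run before touching the real entity list.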
Hope you find it helpful!
Thank you, that is useful. I also wanted to know how to do this with multiple incorrect villages for multiple entities. For example, I have the following patient IDs and I would like to change their village name from 'other' to a correct village.
corrected village names
pid, correct_village
1, xyz
2, zsw
3, cvf
4, afc
entity data to be changed
pid, village
1, unknown
2, unknown
3, unknown
4, unknown
I think my Python skills are not quite there yet (I mainly use R and don't think this is possible with ruODK yet). But I think I have to create an array, merge it and loop over it?
Any code example would be helpful!
Thank you
Hey @CharlieKeyes, I have written the code for this!
- I am assuming that the pid in the corrected village names table (a CSV file with only two columns) is unique for each record and is not repeated.
- I am also using a slightly more advanced approach here (hashing, via a dictionary lookup) so that the code stays fast, since you mentioned that there might be 16,000 records to update.
- Please make sure to test the code thoroughly in a test environment before using it on the official entity_list!
- Since there may be 16,000 entries to update, the code will make up to 16,000 requests to the server within a short span of time (it runs quite fast). So make sure that the server does not rate-limit requests; AWS, for example, often blocks this many requests made within a short duration as protection against DDoS attacks. If that triggers, the next best approach would be to temporarily disable that protection on AWS (or your provider), to update the code to add a delay after every 1,000 requests, or to use proxies.
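The first two points can be checked before any request is sent. Below is a rough sketch, using only the standard library, that builds the pid-to-village dictionary and stops if a pid is repeated. The function name load_corrections and the inline sample CSV are assumptions for illustration. Keys are kept as strings, since Central (as far as I know) returns entity property values as strings.

```python
import csv
import io


def load_corrections(csv_file, key_column="pid", value_column="correct_village"):
    """Build {pid: correct_village}, raising if a pid appears more than once."""
    corrections = {}
    # skipinitialspace tolerates "pid, correct_village" style spacing after commas.
    for row in csv.DictReader(csv_file, skipinitialspace=True):
        key = row[key_column].strip()
        if key in corrections:
            raise ValueError(f"Duplicate {key_column} in CSV: {key}")
        corrections[key] = row[value_column].strip()
    return corrections


# Inline sample mirroring the table from the question; a real run would
# pass an open file instead, e.g. open("my_csv.csv", newline="").
sample_csv = io.StringIO("pid, correct_village\n1, xyz\n2, zsw\n3, cvf\n4, afc\n")
corrections = load_corrections(sample_csv)
print(corrections)
```

A dictionary lookup takes roughly constant time per entity, which is why it scales well to 16,000 rows compared with scanning the CSV for every entity.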
Python Code:
import pandas as pd
from pyodk.client import Client

######################################### Parameters ############################################
path_to_csv_file_containing_unique_pid_and_the_corresponding_correct_village_name = "my_csv.csv"
column_name_for_pid_in_csv = "pid"
column_name_for_corresponding_correct_village_in_csv = "correct_village"
column_name_for_pid_in_entity_list = "pid"
column_name_for_corresponding_incorrect_village_in_entity_list = "village"
config_path = "config.toml"
cache_path = "cache.toml"
project_id = 1
name_of_entity_list = "my_entity_list"
#################################################################################################

# Read every column as text so that a numeric pid such as 1 becomes "1" and
# matches the entity properties, which Central returns as strings.
df = pd.read_csv(
    path_to_csv_file_containing_unique_pid_and_the_corresponding_correct_village_name,
    dtype=str,
)
# Lookup table (hash map): pid -> correct village name.
correct_village_dict = df.set_index(column_name_for_pid_in_csv)[column_name_for_corresponding_correct_village_in_csv].to_dict()

with Client(config_path=config_path, cache_path=cache_path) as client:
    # get_table returns the OData document as a dict; the rows are under "value".
    submissions = client.entities.get_table(
        entity_list_name=name_of_entity_list, project_id=project_id
    )["value"]
    for submission in submissions:
        if submission[column_name_for_pid_in_entity_list] in correct_village_dict:
            data = {
                column_name_for_corresponding_incorrect_village_in_entity_list: correct_village_dict[submission[column_name_for_pid_in_entity_list]]
            }
            client.entities.update(
                uuid=submission["__id"],
                entity_list_name=name_of_entity_list,
                project_id=project_id,
                data=data,
                force=True,  # update without supplying a base_version
            )
            print(
                "Successfully Updated: "
                + submission["__id"]
                + " (from "
                + submission[column_name_for_corresponding_incorrect_village_in_entity_list]
                + " to "
                + correct_village_dict[submission[column_name_for_pid_in_entity_list]]
                + " for "
                + submission[column_name_for_pid_in_entity_list]
                + ")"
            )
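If the server does throttle requests, as cautioned above, one option is to pause after every batch of updates. This is only a sketch: apply_in_batches is a hypothetical helper, not something pyODK provides, and the batch size and pause length are placeholder values that would need tuning for a given server. The sleep function is passed in so the demo runs instantly.

```python
import time


def apply_in_batches(items, action, batch_size=1000, pause_seconds=5.0, sleep=time.sleep):
    """Call action(item) for each item, pausing after every full batch."""
    done = 0
    for item in items:
        action(item)
        done += 1
        if done % batch_size == 0:
            sleep(pause_seconds)
    return done


# Demo with a stand-in action and a fake sleep that only records the pauses.
calls = []
pauses = []
total = apply_in_batches(
    range(5), calls.append, batch_size=2, pause_seconds=0.1, sleep=pauses.append
)
print(total, calls, pauses)
```

In the script above, the action would be a small function that wraps the client.entities.update call for one entity.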
Hope you find it helpful!
The next ruODK release will support entities as well as create/delete actions. Coming soon!
Thank you, this is perfect. I had some missing values which threw an error, so I updated it to the following:
import pandas as pd
from pyodk.client import Client

######################################### Parameters ############################################
path_to_csv_file_containing_unique_pid_and_the_corresponding_correct_village_name = "my_csv.csv"
column_name_for_pid_in_csv = "pid"
column_name_for_corresponding_correct_village_in_csv = "correct_village"
column_name_for_pid_in_entity_list = "pid"
column_name_for_corresponding_incorrect_village_in_entity_list = "village"
config_path = "config.toml"
cache_path = "cache.toml"
project_id = 1
name_of_entity_list = "my_entity_list"
#################################################################################################

# Read every column as text so that a numeric pid such as 1 becomes "1" and
# matches the entity properties, which Central returns as strings.
df = pd.read_csv(
    path_to_csv_file_containing_unique_pid_and_the_corresponding_correct_village_name,
    dtype=str,
)
correct_village_dict = df.set_index(column_name_for_pid_in_csv)[column_name_for_corresponding_correct_village_in_csv].to_dict()

with Client(config_path=config_path, cache_path=cache_path) as client:
    submissions = client.entities.get_table(
        entity_list_name=name_of_entity_list, project_id=project_id
    )["value"]
    for submission in submissions:
        if submission[column_name_for_pid_in_entity_list] in correct_village_dict:
            data = {
                column_name_for_corresponding_incorrect_village_in_entity_list: correct_village_dict[submission[column_name_for_pid_in_entity_list]]
            }
            # Report a failed entity and carry on with the rest instead of stopping.
            try:
                client.entities.update(
                    uuid=submission["__id"],
                    entity_list_name=name_of_entity_list,
                    project_id=project_id,
                    data=data,
                    force=True,
                )
                print(
                    "Successfully Updated: "
                    + submission["__id"]
                    + " (from "
                    + submission[column_name_for_corresponding_incorrect_village_in_entity_list]
                    + " to "
                    + correct_village_dict[submission[column_name_for_pid_in_entity_list]]
                    + " for "
                    + submission[column_name_for_pid_in_entity_list]
                    + ")"
                )
            except Exception as e:
                print(f"Failed to update {submission['__id']}: {e}")
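It isn't stated which line raised the error for the missing values. One possibility (an assumption on my part) is the string concatenation in the print call, which raises a TypeError when one of the values is None rather than a string. A small sketch of a message builder that tolerates that, plus a check for usable values, is below. Both helper names are hypothetical.

```python
def describe_update(entity_id, old_value, new_value, pid):
    """Build the log line with an f-string, which accepts None where '+' would not."""
    old_text = "<missing>" if old_value is None else old_value
    return f"Successfully Updated: {entity_id} (from {old_text} to {new_value} for {pid})"


def has_value(value):
    """True for non-empty strings; False for None, NaN and blank strings."""
    return isinstance(value, str) and value.strip() != ""


print(describe_update("uuid-1", None, "xyz", "1"))
print(has_value(float("nan")), has_value(" "), has_value("xyz"))
```

Skipping rows where has_value is False for the corrected village would also avoid overwriting a village name with an empty value.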