Here's a revised and more concise version of your message:
Hello,
I have a Python script that I use to extract data for my workflow. I've noticed that the columns KEY and PARENT_KEY are missing in the output after running the script. I tried tracing if these columns were being pulled from the data and if any subsequent lines of code were removing them, but that doesn't seem to be the case. When I print the raw data returned from Submissions = client.submissions.get_table(form_id=form_id, table_name=table_name), these columns are missing.
Is there a way to ensure KEY and/or PARENT_KEY are included in the data? Below is the relevant part of my code:
for index, row in project_repeat_data.iterrows():
form_id = row['form_id']
table_name = row['table_name']
form_status = row['status'] # Assuming the status column name is 'status'
try:
# Check if the form's status is "active"
if form_status.lower() == 'active':
# Retrieve data and process
Submissions = client.submissions.get_table(form_id=form_id, table_name=table_name)
print(f"Raw data for form_id {form_id}: {Submissions['value']}")
obs_df = pd.json_normalize(data=Submissions['value'], sep='/')
Based on my observation, pyODK does not return "KEY" and "PARENT_KEY" through the API when returning the json file, as when downloading the data as CSV on ODK Central.
Instead, it provides "__id" as a unique identifier for each observation in the base part of the form, and "__Submissions-id" as a unique identifier for each observation within repeat groups, if they exist.
You can utilize "__id" as a replacement for "KEY" and "__Submissions-id" as a replacement for "PARENT_KEY" in your data processing."
This should clarify how to handle identifiers in your workflow using pyODK.
Thanks, @Joseph_Muganga; unfortunately, this doesn't address my problem. I've always used the columns instanceID from the parent table and __Submissions-id from the repeat table, but when you alter a record from central, the __Submissions-id changes, removing the link that was previously there. The only columns that may be used to link the two tables after updating/editing the data are KEY and PARENT_KEY.
Hi @Chimwemwe, sorry for bringing something new but I'm using Pandas to manipulate Central data in Python without any problems even after many edits. The code is something like this:
import pandas as pd
# main form id on Central
FORM_ID = 'MyFormID'
# you have to stablish a connection to Central using pyODK and your credentials as shown in the tutorial.
# getting submissions list for the main form
subm_list_master = client.submissions.get_table(form_id=FORM_ID, filter="__system/reviewState ne 'rejected'")['value']
# the submission list is a list of dictionaries so we can pop system and meta data like this
[d.pop('__system') for d in subm_list_master]
[d.pop('meta') for d in subm_list_master]
# pandas dataframe for the main form data
df_master = pd.DataFrame(pd.json_normalize(subm_list_master))
# submission list for the detail data, usually a table created by a repeat structure
subm_list_detail = client.submissions.get_table(form_id=FORM_ID, table_name='Submissions.MyRepeat')['value']
# pandas dataframe for the detail (repeat) data
df_detail = pd.DataFrame(pd.json_normalize(subm_list_detail))
# now we can merge both dataframes using Pandas pd.merge()
df_all_data = pd.merge(left=df_master, right=df_detail, left_on='__id', right_on='__Submissions-id', how='inner')