ODK Central and R / Python automation for complex data management projects

Thalie · March 23, 2021, 5:43pm

For those who would like to use Python instead of R (e.g. to use pandas and scikit-learn on your data) but have not started playing with the API yet, you can easily request data from ODK Central by building on the (very helpful) example and get_session_token function provided by @yanokwa here. I am using a very slightly modified version of get_session_token, to which I pass the ODK Central parameters as arguments, so as not to rely on global variables.
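
For readers who have not seen the linked example, here is a minimal sketch of what such a parameterised get_session_token could look like. It assumes ODK Central's `POST /v1/sessions` endpoint, which takes an email and password and returns a JSON body containing a `token`. The `timeout` value and the injectable `post` argument (handy for trying the function without a server) are illustrative additions, not part of the original function.

```python
import requests


def get_session_token(central_url, central_email, central_password, post=requests.post):
    """Return a session token from ODK Central, or None if the login fails."""
    # POST /v1/sessions exchanges an email/password pair for a bearer token
    response = post(
        central_url + "/v1/sessions",
        json={"email": central_email, "password": central_password},
        timeout=30,  # assumed value, so an unreachable server does not hang the script
    )
    if response.status_code == 200:
        return response.json()["token"]
    return None
```

Returning None on failure matches the `if session_token:` check used in the main script further down.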

Function to export data to a `pandas` dataframe relying on OData

See API documentation

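import pandas as pd
import requests

def data_document(central_url, session_token, central_project_id, central_form_id):
    # Query the OData Submissions feed of the form
    data_response = requests.get(
        central_url + "/v1/projects/" + str(central_project_id) + "/forms/" + str(central_form_id) + ".svc/Submissions",
        headers={"Authorization": "Bearer " + session_token},
    )

    if data_response.status_code == 200:
        # The OData payload stores the submissions under the "value" key
        d = data_response.json()['value']
        return pd.DataFrame(d)

    # Make the failure visible instead of silently returning nothing
    print("Error getting data, HTTP status:", data_response.status_code)
    return None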
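
One thing to watch for: in the OData JSON, questions that sit inside a group come back as nested objects, so `pd.DataFrame(d)` leaves them as dict-valued columns. A possible way around this is `pd.json_normalize`, which expands nested dicts into one column per leaf. The sketch below uses made-up submissions (the `patient` group and its fields are hypothetical) rather than a real form.

```python
import pandas as pd

# Made-up payload shaped like the "value" list of an OData response
records = [
    {"__id": "uuid:1", "meta": {"instanceID": "uuid:1"}, "patient": {"age": 3, "sex": "F"}},
    {"__id": "uuid:2", "meta": {"instanceID": "uuid:2"}, "patient": {"age": 5, "sex": "M"}},
]

# Expand nested dicts into flat columns named "group-field"
df = pd.json_normalize(records, sep="-")
print(df.columns.tolist())
```

In data_document, this would mean returning `pd.json_normalize(d, sep="-")` instead of `pd.DataFrame(d)`, if flat columns are what you need.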

Main

import json

project_id = 14
form_id = "01-TIMCI-CRF-Facility"

# Load the ODK Central URL and login details from a local JSON file
with open("odk_central_credentials.json") as f:
    credentials = json.load(f)
central_url = credentials["central_url"]
central_email = credentials["central_email"]
central_password = credentials["central_password"]

session_token = get_session_token(central_url, central_email, central_password)
if session_token:
    df = data_document(central_url, session_token, project_id, form_id)
    if df is not None:
        # Display only the first 5 submissions of the dataframe
        print(df.head(5))
        # Other processing :-)
else:
    print("Error getting session token")
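
The script above expects an odk_central_credentials.json file containing the three keys it reads. As a sketch (load_credentials is a hypothetical helper, not part of the original script, and the values in the comment are placeholders), a small loader can check that all three keys are present before any request is made:

```python
import json

REQUIRED_KEYS = ("central_url", "central_email", "central_password")


def load_credentials(path="odk_central_credentials.json"):
    """Read the credentials file and fail early if a key is missing."""
    with open(path) as f:
        credentials = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in credentials]
    if missing:
        raise KeyError("Missing keys in " + path + ": " + ", ".join(missing))
    return credentials


# Example file content (placeholder values):
# {
#     "central_url": "https://central.example.org",
#     "central_email": "me@example.org",
#     "central_password": "not-a-real-password"
# }
```

Keeping the credentials in a separate file like this also makes it easier to keep them out of version control.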

Backups

You can also use Python (or R) to automate your backups if sending your data to Google Drive is not an option. Here is an example of a direct backup, which we developed with one of my Tanzanian colleagues.

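import json
from email.message import Message

import requests

def backup(central_url, session_token, passphrase):
    # Request a direct backup, encrypted with the given passphrase
    backup_response = requests.post(
        central_url + "/v1/backup",
        data = json.dumps({"passphrase": passphrase}),
        headers = {"Authorization": "Bearer " + session_token,
                   "Content-Type": "application/json"},
        stream = True
    )

    if backup_response.status_code == 200:
        # Read the filename from Content-Disposition (the cgi module was removed in Python 3.13)
        msg = Message()
        msg["Content-Disposition"] = backup_response.headers.get("Content-Disposition", "")
        # "central-backup" is only a fallback name if the header carries no filename
        filename = (msg.get_filename() or "central-backup").replace(":", "-")
        # Write the stream in chunks so a large backup is not held in memory
        with open(filename, "wb") as file:
            for chunk in backup_response.iter_content(chunk_size=8192):
                file.write(chunk)
        print("Backup successful")
    else:
        print("Backup failed, HTTP status:", backup_response.status_code)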
