Automating Mobile Data Integration from KoboToolbox to PostgreSQL with Pentaho Data Integration (PDI)

Background

As part of the monitoring and management of hydrometric stations, a mobile application based on ODK Collect was implemented. This application allows field agents to:

  • Collect water level measurements,
  • Document the status of hydrometric stations,
  • Record the GPS coordinates of the station where observations are made.

Once collected, the data is transmitted from the field to a centralized KoboToolbox server.

The ultimate goal was to quickly store the collected data in a PostgreSQL/PostGIS database, avoiding certain manual import/export steps and eliminating the need for user interaction with the KoboToolbox interface.

To achieve this, an automated dataflow was implemented using Pentaho Data Integration Spoon (aka Kettle). The PDI transformation leverages the KoboToolbox API to extract data in JSON format, perform necessary transformations, and insert the records directly into a dedicated table in the PostgreSQL database.

The goal of the showcase is to explain how to use PDI to harvest Mobile Data stored in Kobo and send them on PostgreSQL.

PDI Workflow and Kobo API Call

The dataflow in PDI Spoon facilitates the extraction of data from the KoboToolbox API, interprets the JSON response, performs standard data transformation steps, and inserts the data automatically into a PostgreSQL database.

The overall workflow is as follows:

pdi_gloal_flow

Below are the most important steps in this process:

  • Calling the KoboToolbox API via [HTTP Client]

This step retrieves the collected data from KoboToolbox using its REST API. The API endpoint used is:

https://kf.kobotoolbox.org/api/v2/assets/[ASSET_UID]/data.json

Kobo API documentation is available here

Login credentials (username and password) are required. In this flow, these parameters are passed as variables from a job.

the parameter [ASSET_UID] is the unique identifier of the Kobo survey, visible in the URL of KoboToolbox when you open your form (nav toolbar).

  • Interpreting and Processing JSON Data [JSON Input]

The JSON data retrieved must be transformed into a tabular format to be usable and ready for insertion into PostgreSQL. This step parses the JSON response and extracts the required fields using JSONPath syntax.

In the [Field] tab, define the paths of the JSON fields to extract, following the structure of the Kobo form.

  • Data Cleaning and Insertion into PostgreSQL

The tabular data extracted is processed and cleaned before being inserted into a PostgreSQL table. Depending on the requirement. To insert data on the DB table, steps like [Table Output] or [Insert/Update] could be used.

Execution Scheduling

In our case, the transformation is executed as part of a global job. The PDI workflow is deployed on a server and connected to a web application, which triggers it using Kitchen. Additionally, the job is scheduled to run automatically at regular intervals using CRON.

I hope these may be useful of some other use.

1 Like