Two years ago, during the third COVID lockdown, I spent some time working on PostgreSQL functions to automatically pull data from Central into a PostgreSQL database.
Central2PG is the result of that work, and it has allowed us to keep running on Central the workflows we had built on Aggregate since 2015.
Every time we publish a new form, we simply set up a scheduled task that uses Central2PG to automatically pull the data into a PostgreSQL database.
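As an illustration, such a scheduled task can be a simple cron entry that calls the pull function through `psql`. This is a sketch only: `refresh_central_data()` stands in for whatever wrapper function you expose around the Central2PG calls, and the database name is made up.

```shell
# m h dom mon dow  command
# Pull new submissions from Central every night at 02:00.
# "refresh_central_data()" is a hypothetical wrapper around
# the Central2PG / pl-pyodk functions.
0 2 * * *  psql -d mydb -c "SELECT refresh_central_data();"
```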
Central2PG works very well, and we have never lost any data.
But it has one limitation: every time we call it, it downloads all the data collected with the form since its creation.
That's a lot of CPU time, energy and bandwidth for a share of new data that shrinks every day.
I explored different ways to fetch only filtered data, for example submissions collected after a given date, and finally settled on calling pyODK functions from within PL/pgSQL functions.
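Filtering on the submission date relies on OData syntax against Central's `__system/submissionDate` field. A minimal sketch of building such a filter clause (the helper name is mine, not part of pl-pyodk; with pyODK the resulting string would be passed as the `filter` argument of `client.submissions.get_table(...)`):

```python
from datetime import datetime, timezone

def submission_date_filter(since: datetime) -> str:
    """Build an OData $filter clause selecting submissions
    received after `since` (Central expects ISO-8601 UTC)."""
    iso = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return f"__system/submissionDate gt {iso}"

print(submission_date_filter(datetime(2023, 5, 1, tzinfo=timezone.utc)))
# __system/submissionDate gt 2023-05-01T00:00:00.000Z
```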
While Central2PG required curl to be installed on the server, pl-pyodk requires the plpython3u (PL/Python) extension to be enabled.
The first call fetches the submission data that match the given filter.
When navigation links ("NavigationLink" entries) are found in the response JSON, the function is called recursively on those links.
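The recursion over navigation links can be sketched like this. The `fetch` callable stands in for the authenticated HTTP GET that pl-pyodk performs through pyODK, and the `*@odata.navigationLink` key shape follows the OData JSON convention; both the mock data and the function name are illustrative:

```python
def harvest(url, fetch, out=None):
    """Fetch an OData page and recursively follow any
    *@odata.navigationLink annotations found in its rows."""
    if out is None:
        out = []
    page = fetch(url)
    for row in page.get("value", []):
        out.append(row)
        for key, link in row.items():
            if key.endswith("@odata.navigationLink"):
                harvest(link, fetch, out)
    return out

# Tiny in-memory stand-in for Central's OData endpoint:
pages = {
    "Submissions": {"value": [
        {"id": 1, "children@odata.navigationLink": "Submissions(1)/children"},
    ]},
    "Submissions(1)/children": {"value": [{"name": "a"}]},
}
print(harvest("Submissions", pages.__getitem__))
```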
The same functions we developed for Central2PG continue to create and populate the tables in your database.
When new questions appear in the form, they are automatically added to the tables.
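Keeping the tables in sync with an evolving form boils down to comparing the keys of the incoming JSON with the table's existing columns and issuing an `ALTER TABLE` for the difference. A simplified, testable sketch of that idea (the real logic lives in the SQL functions; names and the `text` typing here are illustrative):

```python
def missing_columns(existing: set[str], row: dict) -> list[str]:
    """Columns present in a submission row but absent from the table."""
    return sorted(k for k in row if k not in existing)

def alter_statements(table: str, existing: set[str], row: dict) -> list[str]:
    # New answers arrive as text; casting to proper types can happen downstream.
    return [f'ALTER TABLE {table} ADD COLUMN "{col}" text;'
            for col in missing_columns(existing, row)]

print(alter_statements("form_data", {"id", "age"},
                       {"id": 1, "age": 30, "new_question": "yes"}))
# ['ALTER TABLE form_data ADD COLUMN "new_question" text;']
```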
The repo explains how to set up pl-pyodk on your database, and it also contains a PostgreSQL Dockerfile and a simple form so you can try pl-pyodk on your own machine with the provided SQL examples.
I will complete the tutorial with an example of image upload and a QGIS visualization.
You can find pl-pyodk here: https://github.com/mathieubossaert/pl-pyodk
We can now ask Central for new data only, at a frequency close to real time.
Please let me know if you use it, or if you face any problems, and feel free to improve it and share your modifications.
If you find a bug, please file an issue on GitHub: https://github.com/mathieubossaert/pl-pyodk/issues/new