Some discussion over the forum or with users in France about "how to integrate Central's data to my own database' encourage me to share my recent tries.
Central is a big step forward in projects/users/groups and user rights management. Using it will save a lot of time and gymnastics. Its standard API will help a lot of users to create data analysis, dashboards... I plan to try creating dashboards for specific forms (with all data produced by Collect and stored in Central) using Central's API.
But, in order to transfer Central's data to our main database, I would like to continue as I do quite simply with Aggregate and not to have to rebuild too much "working like a charm" procedures.
So last week I started to explore SQL access to Central's data as we do with Aggregate (which will continue to manage approx 10 existing forms).
I think the method I'll expose here is not the encouraged way to access data (as it was not for Aggregate), but in our context, it is an easy way to use PostgreSQL's functionalities as our information system is already build over PostgreSQL.
PostgreSQL come with some practical capabilities, to connect to a foreign server and process xml data (xpath/xml functions, Foreign Data Wrappers, triggers...).
Some colleagues do that data transfer from Central's database to their main one with an ETL. The kettle example on the forum is a good one : Automating Data Delivery using the OData Endpoint in ODK Central
But I wanted to try to keep the data work-flow as "simple" as possible such as two databases talking together.
We will work through a Foreign Data Wrapper over ODK Central's database with an unprivileged user (SELECT only on Central's tables).
The form we use for this test is the one described here
The form contains 2 nested repeats (the first for locations (localite), the nested one for observations)
CREATE EXTENSION postgres_fdw;
CREATE SERVER odk_central
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'aaa.bbb.ccc.ddd', port '1234', dbname 'odk');
CREATE USER MAPPING FOR local_user SERVER odk_central
OPTIONS ("user" 'odk_reader', password 'odk_pwd');
/* odk_user exists on Central's server, with only connect to the database, usage to the schema and select rights on odk tables */
CREATE SCHEMA odk_central
AUTHORIZATION dba;
IMPORT FOREIGN SCHEMA public LIMIT TO (forms, form_defs, form_fields, submission_defs, submission, submission_attachments)
FROM SERVER odk_central INTO odk_central;
SELECT * from odk_central.forms /* to try ;-) */
Now we can query the listed tables.
Let's investigate how Centr'al stores our submission data :
Forms structure is very well described in the database :
SELECT path, form_fields.name, type, "binary", "order"
FROM public.forms JOIN public.form_defs ON "formId" = forms.id
JOIN public.form_fields ON "formDefId" = form_defs.id
WHERE forms.name = 'Sicen2020' AND version = '14'
ORDER BY "order"
Will return each field of the form, numbered from 0 to the last one appearing in the form.
Repeats are typed as 'repeat' (type column) and groups are typed as 'structure'.
Every other form element is typed (string, decimal, integer, datetime, geopoint, geotrace, geoshape...)
And its path from the beginning of the xml entity is mentioned in the path column. So we can now try to generate tables from xml, as in aggregate, the root one (xxxx_CORE) and its child, sometimes nested.
We will use the xmltable() function, available since PostgreSQL 10.
CREATE VIEW odk_central.obs_sicen2020 AS
WITH xmldata AS (
SELECT submission_defs.xml::xml AS xml
FROM odk_central.submission_defs
), core AS (
SELECT "xmltable".phonenumber,
"xmltable".custom_setting,
"xmltable".astuce,
"xmltable".email_utilisateur,
"xmltable".username,
"xmltable".nom_observateur,
"xmltable".mail_observateur,
"xmltable".now,
"xmltable".date_time,
"xmltable".id_etude,
"xmltable".id_protocole,
"xmltable".emplacement,
"xmltable".obs,
"xmltable".ajout_acompagnateur1,
"xmltable".acompagnateur1,
"xmltable".ajout_acompagnateur2,
"xmltable".acompagnateur2,
"xmltable".ajout_acompagnateur3,
"xmltable".acompagnateur3,
"xmltable".ajout_acompagnateur4,
"xmltable".acompagnateur4,
"xmltable".ajout_acompagnateur5,
"xmltable".acompagnateur5,
"xmltable".ajout_acompagnateur6,
"xmltable".acompagnateur6,
"xmltable".ajout_acompagnateur7,
"xmltable".acompagnateur7,
"xmltable".ajout_acompagnateur8,
"xmltable".acompagnateur8,
"xmltable".ajout_acompagnateur9,
"xmltable".acompagnateur9,
"xmltable".ajout_acompagnateur10,
"xmltable".acompagnateur10,
"xmltable".instanceid,
"xmltable".instancename
FROM xmldata,
LATERAL XMLTABLE(('//data'::text) PASSING (xmldata.xml) COLUMNS phonenumber text PATH ('phonenumber'::text), custom_setting text PATH ('custom_setting'::text), astuce text PATH ('utilisateur/astuce'::text), email_utilisateur text PATH ('utilisateur/email_utilisateur'::text), username text PATH ('utilisateur/username'::text), nom_observateur text PATH ('utilisateur/nom_observateur'::text), mail_observateur text PATH ('utilisateur/mail_observateur'::text), now text PATH ('utilisateur/now'::text), date_time date PATH ('utilisateur/date_time'::text), id_etude text PATH ('protocole_etude/id_etude'::text), id_protocole text PATH ('protocole_etude/id_protocole'::text), emplacement xml PATH ('emplacement'::text), obs xml PATH ('emplacement/observation'::text), ajout_acompagnateur1 text PATH ('accompagnateurs/ajout_acompagnateur1'::text), acompagnateur1 text PATH ('accompagnateurs/acompagnateur1'::text), ajout_acompagnateur2 text PATH ('accompagnateurs/ajout_acompagnateur2'::text), acompagnateur2 text PATH ('accompagnateurs/acompagnateur2'::text), ajout_acompagnateur3 text PATH ('accompagnateurs/ajout_acompagnateur3'::text), acompagnateur3 text PATH ('accompagnateurs/acompagnateur3'::text), ajout_acompagnateur4 text PATH ('accompagnateurs/ajout_acompagnateur4'::text), acompagnateur4 text PATH ('accompagnateurs/acompagnateur4'::text), ajout_acompagnateur5 text PATH ('accompagnateurs/ajout_acompagnateur5'::text), acompagnateur5 text PATH ('accompagnateurs/acompagnateur5'::text), ajout_acompagnateur6 text PATH ('accompagnateurs/ajout_acompagnateur6'::text), acompagnateur6 text PATH ('accompagnateurs/acompagnateur6'::text), ajout_acompagnateur7 text PATH ('accompagnateurs/ajout_acompagnateur7'::text), acompagnateur7 text PATH ('accompagnateurs/acompagnateur7'::text), ajout_acompagnateur8 text PATH ('accompagnateurs/ajout_acompagnateur8'::text), acompagnateur8 text PATH ('accompagnateurs/acompagnateur8'::text), ajout_acompagnateur9 text PATH ('accompagnateurs/ajout_acompagnateur9'::text), acompagnateur9 text PATH ('accompagnateurs/acompagnateur9'::text), ajout_acompagnateur10 text PATH ('accompagnateurs/ajout_acompagnateur10'::text), acompagnateur10 text PATH ('accompagnateurs/acompagnateur10'::text), instanceid text PATH ('meta/instanceID'::text), instancename text PATH ('meta/instanceName'::text))
), emplacements AS (
SELECT core_1.instanceid,
concat('<data>', core_1.emplacement, '</data>')::xml AS localite,
core_1.obs
FROM core core_1
), localite AS (
SELECT emplacements.instanceid,
"xmltable".localite_id,
"xmltable".methode_geo,
"xmltable".longitude,
"xmltable".latitude,
"xmltable".point,
"xmltable".trace,
"xmltable".lenght_trace,
"xmltable".rounded_lenght_trace,
"xmltable".shape,
"xmltable".shape_area,
"xmltable".rounded_shape_area,
"xmltable".observations
FROM emplacements,
LATERAL XMLTABLE(('//data/emplacement'::text) PASSING (emplacements.localite) COLUMNS localite_id FOR ORDINALITY, methode_geo text PATH ('localisation/methode_geo'::text), longitude numeric PATH ('localisation/longitude'::text), latitude numeric PATH ('localisation/latitude'::text), point text PATH ('localisation/point'::text), trace text PATH ('localisation/trace'::text), lenght_trace text PATH ('localisation/lenght_trace'::text), rounded_lenght_trace text PATH ('localisation/rounded_lenght_trace'::text), shape text PATH ('localisation/shape'::text), shape_area text PATH ('localisation/shape_area'::text), rounded_shape_area text PATH ('localisation/rounded_shape_area'::text), observations xml PATH ('observation'::text))
), observations AS (
SELECT localite_1.instanceid,
localite_1.localite_id,
concat('<data>', localite_1.observations, '</data>')::xml AS obs
FROM localite localite_1
), obs AS (
SELECT observations.instanceid,
observations.localite_id,
"xmltable".obs_id,
"xmltable".useregne,
"xmltable".calcul_regne,
"xmltable".type_observation,
"xmltable".nom_pression,
"xmltable".intensite,
"xmltable".atteinte,
"xmltable".menace,
"xmltable".nom_observation_generale,
"xmltable".searchtext_habitat,
"xmltable".search_nom_habitat,
"xmltable".recherche_espece_animale,
"xmltable".lb_nom_animalia,
"xmltable".cd_nom_animalia,
"xmltable".recherche_espece_plante,
"xmltable".lb_nom_plantae,
"xmltable".cd_nom_plantae,
"xmltable".recherche_espece_champi,
"xmltable".lb_nom_fungi,
"xmltable".cd_nom_fungi,
"xmltable".groupe,
"xmltable".adulte_sexe,
"xmltable".adulte_male,
"xmltable".adulte_femelle,
"xmltable".adulte_sexe_indet,
"xmltable".juvenile_sexe,
"xmltable".juvenile_male,
"xmltable".juvenile_femelle,
"xmltable".juvenile_sexe_indet,
"xmltable".effectif_textuel,
"xmltable".adulte,
"xmltable".juvenile,
"xmltable".plantule,
"xmltable".graine,
"xmltable".larve,
"xmltable".oeuf,
"xmltable".age_indet_sexe,
"xmltable".age_indet_male,
"xmltable".age_indet_femelle,
"xmltable".age_indet_sexe_indet,
"xmltable".no_count,
"xmltable".total_individu,
"xmltable".note_indiv,
"xmltable".determination,
"xmltable".comportement,
"xmltable".eff_habitat,
"xmltable".etat_conservation,
"xmltable".gestion,
"xmltable".code_phyto,
"xmltable".remarque,
"xmltable".surface_estimee,
"xmltable".lineaire_estime,
"xmltable".prendre_image
FROM observations,
LATERAL XMLTABLE(('//data/observation'::text) PASSING (observations.obs) COLUMNS obs_id FOR ORDINALITY, useregne text PATH ('type_obs/useregne'::text), calcul_regne text PATH ('type_obs/calcul_regne'::text), type_observation text PATH ('type_obs/type_observation'::text), nom_pression text PATH ('obs/pression/nom_pression'::text), intensite text PATH ('obs/pression/intensite'::text), atteinte text PATH ('obs/pression/atteinte'::text), menace text PATH ('obs/pression/menace'::text), nom_observation_generale text PATH ('obs/observation_generale/nom_observation_generale'::text), searchtext_habitat text PATH ('obs/habitat/searchtext_habitat'::text), search_nom_habitat text PATH ('obs/habitat/search_nom_habitat'::text), recherche_espece_animale text PATH ('obs/animalia/recherche_espece_animale'::text), lb_nom_animalia text PATH ('obs/animalia/lb_nom_animalia'::text), cd_nom_animalia text PATH ('obs/animalia/cd_nom_animalia'::text), recherche_espece_plante text PATH ('obs/plantae/recherche_espece_plante'::text), lb_nom_plantae text PATH ('obs/plantae/lb_nom_plantae'::text), cd_nom_plantae text PATH ('obs/plantae/cd_nom_plantae'::text), recherche_espece_champi text PATH ('obs/fungi/recherche_espece_champi'::text), lb_nom_fungi text PATH ('obs/fungi/lb_nom_fungi'::text), cd_nom_fungi text PATH ('obs/fungi/cd_nom_fungi'::text), groupe text PATH ('obs/groupe'::text), adulte_sexe text PATH ('detail/adulte_sexe'::text), adulte_male integer PATH ('detail/adulte_male'::text), adulte_femelle integer PATH ('detail/adulte_femelle'::text), adulte_sexe_indet integer PATH ('detail/adulte_sexe_indet'::text), juvenile_sexe text PATH ('detail/juvenile_sexe'::text), juvenile_male integer PATH ('detail/juvenile_male'::text), juvenile_femelle integer PATH ('detail/juvenile_femelle'::text), juvenile_sexe_indet integer PATH ('detail/juvenile_sexe_indet'::text), effectif_textuel text PATH ('detail/effectif_textuel'::text), adulte text PATH ('detail/adulte'::text), juvenile text PATH ('detail/juvenile'::text), plantule text PATH ('detail/plantule'::text), graine text PATH ('detail/graine'::text), larve text PATH ('detail/larve'::text), oeuf text PATH ('detail/Oeuf'::text), age_indet_sexe text PATH ('detail/age_indet_sexe'::text), age_indet_male integer PATH ('detail/age_indet_male'::text), age_indet_femelle integer PATH ('detail/age_indet_femelle'::text), age_indet_sexe_indet integer PATH ('detail/age_indet_sexe_indet'::text), no_count text PATH ('detail/no_count'::text), total_individu text PATH ('detail/total_individu'::text), note_indiv text PATH ('detail/note_indiv'::text), determination text PATH ('detail_optionnel/determination'::text), comportement text PATH ('detail_optionnel/comportement'::text), eff_habitat integer PATH ('detail_optionnel/eff_habitat'::text), etat_conservation text PATH ('detail_optionnel/etat_conservation'::text), gestion text PATH ('detail_optionnel/gestion'::text), code_phyto text PATH ('detail_optionnel/code_phyto'::text), remarque text PATH ('detail_optionnel/remarque'::text), surface_estimee text PATH ('detail_optionnel/surface_estimee'::text), lineaire_estime text PATH ('detail_optionnel/lineaire_estime'::text), prendre_image text PATH ('detail_optionnel/prendre_image'::text))
)
SELECT row_number() OVER(ORDER BY core.instanceid, localite.localite_id, obs.obs_id) AS gid, core.phonenumber,
core.custom_setting,
core.astuce,
core.email_utilisateur,
core.username,
core.nom_observateur,
core.mail_observateur,
core.now,
core.date_time,
core.id_etude,
core.id_protocole,
core.ajout_acompagnateur1,
core.acompagnateur1,
core.ajout_acompagnateur2,
core.acompagnateur2,
core.ajout_acompagnateur3,
core.acompagnateur3,
core.ajout_acompagnateur4,
core.acompagnateur4,
core.ajout_acompagnateur5,
core.acompagnateur5,
core.ajout_acompagnateur6,
core.acompagnateur6,
core.ajout_acompagnateur7,
core.acompagnateur7,
core.ajout_acompagnateur8,
core.acompagnateur8,
core.ajout_acompagnateur9,
core.acompagnateur9,
core.ajout_acompagnateur10,
core.acompagnateur10,
core.instanceid,
core.instancename,
localite.localite_id,
localite.methode_geo,
localite.longitude,
localite.latitude,
localite.point,
localite.trace,
localite.lenght_trace,
localite.rounded_lenght_trace,
localite.shape,
localite.shape_area,
localite.rounded_shape_area,
obs.obs_id,
obs.useregne,
obs.calcul_regne,
obs.type_observation,
obs.nom_pression,
obs.intensite,
obs.atteinte,
obs.menace,
obs.nom_observation_generale,
obs.searchtext_habitat,
obs.search_nom_habitat,
obs.recherche_espece_animale,
obs.lb_nom_animalia,
obs.cd_nom_animalia,
obs.recherche_espece_plante,
obs.lb_nom_plantae,
obs.cd_nom_plantae,
obs.recherche_espece_champi,
obs.lb_nom_fungi,
obs.cd_nom_fungi,
obs.groupe,
obs.adulte_sexe,
obs.adulte_male,
obs.adulte_femelle,
obs.adulte_sexe_indet,
obs.juvenile_sexe,
obs.juvenile_male,
obs.juvenile_femelle,
obs.juvenile_sexe_indet,
obs.effectif_textuel,
obs.adulte,
obs.juvenile,
obs.plantule,
obs.graine,
obs.larve,
obs.oeuf,
obs.age_indet_sexe,
obs.age_indet_male,
obs.age_indet_femelle,
obs.age_indet_sexe_indet,
obs.no_count,
obs.total_individu,
obs.note_indiv,
obs.determination,
obs.comportement,
obs.eff_habitat,
obs.etat_conservation,
obs.gestion,
obs.code_phyto,
obs.remarque,
obs.surface_estimee,
obs.lineaire_estime,
obs.prendre_image,
st_transform(odk.geom_from_odk_to_postgis(COALESCE(longitude, split_part(point,' ',2)::numeric), COALESCE(latitude, split_part(point,' ',1)::numeric), trace, shape),2154)
FROM core
JOIN localite ON core.instanceid = localite.instanceid
JOIN obs ON obs.instanceid = localite.instanceid AND obs.localite_id = obs.localite_id;
I can now open my table with QGIS and map my observation, or "move" it to a consolidate observations table
We will have to complete this with binary data and media creation, filter form's name and version... The geom_from_odk_to_postgis() function needs to be improve to use directly the geopoint data.
This is a very first try (no materialization, no indexes, no trigger...), and I maybe missed some subtilities, but it convince myself that Central will be as accessible than Aggregate, with a lot of bonus features.