Geo fields should be stored in wkt format (Well Known Text)

Good evening everybody,

I would like to suggest some improvement of the spatial data storage.
Actually, geopoint are stored into 4 attributes (longitude, latitude, elevation and accuracy), Geotrace and geoshape are stored in a text field. The format of that string is not so simple. The use of Well Known Text format could be a great enhancement. All the databases supported by odk do understand that GIS format. https://fr.wikipedia.org/wiki/Well-known_text

Actually we have to create some fun sql to get data into GIS Eneblad databases such as PostGIS :

CREATE OR REPLACE FUNCTION odk.geom_from_odk_to_postgis(LONG numeric, LAT numeric, GEOTRACE TEXT, GEOSHAPE TEXT)
RETURNS geometry(GEOMETRY,4326) AS
$$
WITH geom_from_odk AS 
(
        SELECT trim(COALESCE(NULLIF(trim(concat(LAT,' ',LONG),';'),' '),
        GEOTRACE,
        GEOSHAPE),';') as geom_odk
),
node_array AS (
        SELECT geom_odk, unnest(string_to_array(btrim(geom_odk::text, ';'::text), ';'::text)) AS unnest,
        array_length(string_to_array(btrim(geom_odk::text, ';'::text), ';'::text), 1) AS nb_points,
        generate_series(1, array_length(string_to_array(btrim(geom_odk::text, ';'::text), ';'::text), 1)) AS ordre,
        split_part(unnest(string_to_array(btrim(geom_odk::text, ';'::text), ';'::text)), ' '::text, 1)::numeric AS LAT,
        split_part(unnest(string_to_array(btrim(geom_odk::text, ';'::text), ';'::text)), ' '::text, 2)::numeric AS long
        FROM geom_from_odk
)--, geom AS (
SELECT 
CASE
WHEN node_array.nb_points = 1 THEN st_union(st_setsrid(st_makepoint(node_array.long::double precision, node_array.lat::double precision), 4326))
WHEN st_isclosed(st_makeline(array_agg(st_setsrid(st_makepoint(node_array.long::double precision, node_array.lat::double precision), 4326) ORDER BY node_array.ordre))) IS FALSE THEN st_makeline(array_agg(st_setsrid(st_makepoint(node_array.long::double precision, node_array.lat::double precision), 4326) ORDER BY node_array.ordre))
WHEN st_isclosed(st_makeline(array_agg(st_setsrid(st_makepoint(node_array.long::double precision, node_array.lat::double precision), 4326) ORDER BY node_array.ordre))) AND node_array.nb_points > 3 THEN st_makepolygon(st_makeline(array_agg(st_setsrid(st_makepoint(node_array.long::double precision, node_array.lat::double precision), 4326) ORDER BY node_array.ordre)))
-- ELSE the polygon is invalid because made of 3 points, 1st and 3rd are the same so we create a linestring
ELSE st_makeline(array_agg(st_setsrid(st_makepoint(node_array.long::double precision, node_array.lat::double precision), 4326) ORDER BY node_array.ordre))
END AS geom
FROM node_array
GROUP BY node_array.nb_points;
$$
  LANGUAGE sql VOLATILE
  COST 100;

Spatial databases do have easy native functions to transform a wkt data into geometry. For example, in PostGIS the simple use of the ST_GeomFromText(geo_column) function should do the job :slight_smile:
http://postgis.net/docs/ST_GeomFromText.html
Mysql do have the same : https://dev.mysql.com/doc/refman/5.7/en/gis-wkt-functions.html#function_st-geomfromtext

ODK Collect could become a great GIS Field tool !

It looks like @martijnr mentioned WKT in the original discussion regarding the format of geo data collected by ODK clients here. I'm not entirely clear from reading the discussion why it wasn't considered, does anyone remember?

I can also see that @Rijvi_Rajib wanted to use WKT in this post.

In principle, producing WKT (or another standard format used by GIS tools) seems like a really good idea. In practice, what would that take? Adding an attribute to the existing geo controls? Adding alternate geo controls? It would probably be important to keep supporting the existing format for continued compatibility.

@mathieubossaert do you have a sense of how you would like to request WKT output when building a form?

We used the original already-existing geopoint format as a basis to define the new geotrace and geoshape formats. So that was the reason we didn't really consider WKT, I think.

what would that take?

To change that (properly) we would have to add 3 new datatypes. I wonder if it's useful to do this on the client side, because it's so easy to convert our existing 3 formats into any other geo format on the server side (an exporter option?). If starting from scratch, I would agree to pick a well-known popular format though.

1 Like

Thanks for the background, @martijnr!

I like what you're saying about adding some conversion functionality on the server side. Clearly some users like @mathieubossaert can already do the conversions themselves but it would be convenient to make that more accessible for everyone. Something like "GIS data format" could be added to export settings both at the server-wide level and for individual forms. What do you think, @mathieubossaert?

Good evening Hélène and Martijn,

sorry for my silence, but I was out of office this week. By the way, on the server side, data should always be store ine a text format. We face a problem due to the small size of the text field generate for geoshape format, but that's a "detail" :https://github.com/opendatakit/opendatakit/issues/1265#issuecomment-307324702

I Think it would be great to have a kind of option at the form level to choose the desired format to send to the server (WKT ot "native" odk format).

This would help a lot of user, who don't want to perform too much SQL queries to use their data in a GIS software.
For example with QGIS you can easily map as csv file with a WKT field.

Thanks a lot for you attention :slight_smile:

@mathieubossaert Would it help you just as much if there was a conversion option in the server you are using (e.g. in ODK Aggregate)? E.g. a pulldown with WKT, geoJSON etc. There could be a similar option to convert dates, datetimes (not sure if that exists already).

2 Likes

@martijnr of course, this would be great ! In fact this question does concern the data administrator more than the form designer.

Thanks for the great conversation on this. A conversion option on the server seems like a fantastic solution and I wanted to follow up to give you a sense of the possible timeline. Aggregate has not yet become a community tool and Mitch from UW has been continuing its maintenance when he has a moment (thanks, @Mitch_S!).

As you may have seen elsewhere on the forum and in the ODK convening notes, there have been some conversations started about how to evolve Aggregate. I think that this WKT feature will need to wait for some broader decisions to be made on that front. There will likely be a roadmap developed as part of those conversations and I imagine this will be on it.

Thanks for your interest on this feature !
I'm sure it would be a big step to spread ODK in GIS community.

We (A colleague and I) made a course for GIS admins (from nature conservation bodies) and one of the difficulties is to explain the transformation process from odk field to PostGIS geom. So in the future we will spend less time on that point and more time on more important things !

Thanks again to the odk community, who makes such tools available.

1 Like

Thanks for this excellent function. I had a bit of a problem with it though. When the data is turned into an array there is a leading space so when split_part is used it selects this for the first coordinate. The solution is to add another btrim around the unnest to remove the leading space, so the relevant lines would read:

split_part(btrim(unnest(string_to_array(btrim(geom_odk::text, ';'::text), ';'::text)), ' '::text), ' '::text, 1)::numeric AS LAT,
split_part(btrim(unnest(string_to_array(btrim(geom_odk::text, ';'::text), ';'::text)), ' '::text), ' '::text, 2)::numeric AS long

Thanks @Phil_Burapha for you interest, but I can't see and reproduce the problem you mention, but maybe I misunderstood the problem. The tree examples below are working fine...

for a geotrace :

WITH geom_from_odk AS
(
SELECT trim(COALESCE(NULLIF(trim(concat(null,' ',null),';'),' '),
'43.608362555356365 3.892917466396426 0.0 0.0;43.60864838739175 3.8930497886624664 0.0 0.0;43.608906859254205 3.8928904418296213 0.0 0.0;43.60870681929299 3.892721462561326 0.0 0.0;',
null),';') as geom_odk
)
SELECT geom_odk, unnest(string_to_array(btrim(geom_odk::text, ';'::text), ';'::text)) AS unnest,
array_length(string_to_array(btrim(geom_odk::text, ';'::text), ';'::text), 1) AS nb_points,
generate_series(1, array_length(string_to_array(btrim(geom_odk::text, ';'::text), ';'::text), 1)) AS ordre,
split_part(unnest(string_to_array(btrim(geom_odk::text, ';'::text), ';'::text)), ' '::text, 1)::numeric AS LAT,
split_part(unnest(string_to_array(btrim(geom_odk::text, ';'::text), ';'::text)), ' '::text, 2)::numeric AS long
FROM geom_from_odk

for a geopoint :

WITH geom_from_odk AS
(
SELECT trim(COALESCE(NULLIF(trim(concat(3.892917466396426,' ',43.608362555356365),';'),' '),
null,
null),';') as geom_odk
)
SELECT geom_odk, unnest(string_to_array(btrim(geom_odk::text, ';'::text), ';'::text)) AS unnest,
array_length(string_to_array(btrim(geom_odk::text, ';'::text), ';'::text), 1) AS nb_points,
generate_series(1, array_length(string_to_array(btrim(geom_odk::text, ';'::text), ';'::text), 1)) AS ordre,
split_part(unnest(string_to_array(btrim(geom_odk::text, ';'::text), ';'::text)), ' '::text, 1)::numeric AS LAT,
split_part(unnest(string_to_array(btrim(geom_odk::text, ';'::text), ';'::text)), ' '::text, 2)::numeric AS long
FROM geom_from_odk

for a geoshape :

WITH geom_from_odk AS
(
SELECT trim(COALESCE(NULLIF(trim(concat(null,' ',null),';'),' '),
null,
'43.608647467555784 3.8931233982374636 0.0 0.0;43.60826940402103 3.8932827624893207 0.0 0.0;43.60789392761811 3.893790823480458 0.0 0.0;43.60862157286206 3.8943042359775006 0.0 0.0;43.60907213881663 3.8937161287686024 0.0 0.0;43.608647467555784 3.8931233982374636 0.0 0.0;'),';') as geom_odk
)
SELECT geom_odk, unnest(string_to_array(btrim(geom_odk::text, ';'::text), ';'::text)) AS unnest,
array_length(string_to_array(btrim(geom_odk::text, ';'::text), ';'::text), 1) AS nb_points,
generate_series(1, array_length(string_to_array(btrim(geom_odk::text, ';'::text), ';'::text), 1)) AS ordre,
split_part(unnest(string_to_array(btrim(geom_odk::text, ';'::text), ';'::text)), ' '::text, 1)::numeric AS LAT,
split_part(unnest(string_to_array(btrim(geom_odk::text, ';'::text), ';'::text)), ' '::text, 2)::numeric AS long
FROM geom_from_odk

As an alternative, following @Xiphware and @detrygeoffroy I will soon play to generate a WKT column withi the form as they did with kml and geojson :

It will be more simple to do because we only need to concatenate edges coordinates.

The below image is some of the data I collected as an array. This is after the first btrim and the string_to_array function have been called, but not unnest. Because you use ' ' as the deliminator when you split_part that leading space, highlighted in blue, will make the first thing that split_part selects null. It will also make the second part the wrong coord.

Here is what the same data looked like after I called split_part

Hi to all,

thanks to @Phil_Burapha I think I found a curious comportment in Aggregate : It seems that when geoshape is a short string, edges are separated by a semicolon and a space "; " and when the geoshape is a long string, the delimiter is only a semicolon.
here are some examples, created the same days with the same form.
geoshp_spaces.txt (3.4 KB)

So I adapted the function to consider a possible space after the semicolon.

CREATE OR REPLACE FUNCTION odk.geom_from_odk_to_postgis(LONG numeric, LAT numeric, GEOTRACE TEXT, GEOSHAPE TEXT)
RETURNS geometry(GEOMETRY,4326) AS
$$
WITH geom_from_odk AS 
(
        SELECT regexp_split_to_array(btrim(COALESCE(NULLIF(trim(concat(LAT,' ',LONG),';'),' '),
        GEOTRACE,
        GEOSHAPE),';'),';\s*') as geom_odk
),
node_array AS (
        SELECT geom_odk, unnest(geom_odk) AS unnest,
        array_length(geom_odk,1) AS nb_points,
        generate_series(1, array_length(geom_odk, 1)) AS ordre,
        split_part(unnest(geom_odk), ' '::text, 1)::numeric AS LAT,
        split_part(unnest(geom_odk), ' '::text, 2)::numeric AS long
        FROM geom_from_odk
)--, geom AS (
SELECT 
CASE
WHEN node_array.nb_points = 1 THEN st_union(st_setsrid(st_makepoint(node_array.long::double precision, node_array.lat::double precision), 4326))
WHEN st_isclosed(st_makeline(array_agg(st_setsrid(st_makepoint(node_array.long::double precision, node_array.lat::double precision), 4326) ORDER BY node_array.ordre))) IS FALSE THEN st_makeline(array_agg(st_setsrid(st_makepoint(node_array.long::double precision, node_array.lat::double precision), 4326) ORDER BY node_array.ordre))
WHEN st_isclosed(st_makeline(array_agg(st_setsrid(st_makepoint(node_array.long::double precision, node_array.lat::double precision), 4326) ORDER BY node_array.ordre))) AND node_array.nb_points > 3 THEN st_makepolygon(st_makeline(array_agg(st_setsrid(st_makepoint(node_array.long::double precision, node_array.lat::double precision), 4326) ORDER BY node_array.ordre)))
-- ELSE the polygon is invalid because made of 3 points, 1st and 3rd are the same so we create a linestring
ELSE st_makeline(array_agg(st_setsrid(st_makepoint(node_array.long::double precision, node_array.lat::double precision), 4326) ORDER BY node_array.ordre))
END AS geom
FROM node_array
GROUP BY node_array.nb_points;
$$
  LANGUAGE sql VOLATILE
  COST 100;
1 Like