First off, I'm doing these conversions/translations in my iXForms app (à la Collect), not in Aggregate, and on a per-submission basis, not as part of a collective database dump/export. Although converting a (Collect) submission from XForm XML to GeoJSON in the frontend vs the backend is of no consequence to the problem at hand, I do think it is probably worth first focusing on what a single (XForm) submission should look like in GeoJSON (eg a "Feature"?), and letting the larger export flow naturally from that (eg a "FeatureCollection"?)...
In my case, iXForms can export a submission to a number of different formats:
- KML, to launch Google Earth to view the submission
- XLS, to launch MS Excel to view it as a spreadsheet
- CSV, to be able to process the data in other tools (more readily than XLS)
- JSON, as a lighter-weight payload than XML
- and finally GeoJSON, as an obvious GIS-targeted alternative to KML.
One of the first issues faced when dealing with KML or GeoJSON is that a Collect XForm form (and hence its resulting submissions) may well contain one, none, or multiple geo-referenced properties. Although in many cases there will be only one, eg the geopoint of a dwelling, there could easily be many (eg the locations of all water sources, as a repeat group, when surveying a village), or none (eg the device has no GPS, or it is a survey of aid recipient demographics for which spatial location is meaningless/useless).
In the case of only a single, primary, mandatory geo-referenced property (probably a geopoint) the GeoJSON representation is fairly obvious: the submission becomes a GeoJSON "Feature", you pull out the (single) geopoint from the XML (however deep it's buried...), which becomes the Feature's "geometry" point, and everything else in the submission XML is put under "properties" as key-value pairs. And because GeoJSON supports nested properties, this property/group hierarchy can pretty much be a direct 1:1 mapping of XML to JSON (sans the geopoint, obviously).
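To make the single-geopoint case concrete, here is a minimal Python sketch (not the iXForms code). It assumes the caller already knows the path of the geopoint element (`geo_path` is a hypothetical parameter), that the submission has no XML namespaces, and that the geopoint text is the usual ODK "lat lon alt accuracy" string. Note that GeoJSON wants longitude first. Returning `None` when the geopoint is missing is just one possible choice.

```python
import xml.etree.ElementTree as ET


def parse_geopoint(text):
    """Turn ODK geopoint text 'lat lon [alt [accuracy]]' into a GeoJSON Point."""
    parts = [float(p) for p in text.split()]
    coords = [parts[1], parts[0]]  # GeoJSON order is [longitude, latitude]
    if len(parts) > 2:
        coords.append(parts[2])  # optional altitude; accuracy is dropped
    return {"type": "Point", "coordinates": coords}


def to_properties(elem, skip):
    """Map child elements to nested key-value pairs, omitting the `skip` element."""
    props = {}
    for child in elem:
        if child is skip:
            continue
        value = to_properties(child, skip) if len(child) else (child.text or "")
        if child.tag in props:
            # Repeated tags (eg repeat groups) are collected into a list.
            if not isinstance(props[child.tag], list):
                props[child.tag] = [props[child.tag]]
            props[child.tag].append(value)
        else:
            props[child.tag] = value
    return props


def submission_to_feature(submission_xml, geo_path):
    """Build a GeoJSON Feature from one submission; geo_path is relative to the root."""
    root = ET.fromstring(submission_xml)
    geo = root.find(geo_path)
    if geo is None or not (geo.text or "").strip():
        return None  # no usable geopoint in this submission
    return {
        "type": "Feature",
        "geometry": parse_geopoint(geo.text),
        "properties": to_properties(root, geo),
    }
```

For example, `submission_to_feature(xml, "household/location")` would lift a geopoint nested inside a `household` group into "geometry" and leave the rest of that group under "properties".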
In the case of the XForm submission containing no georeferenced property, a strong argument can be made that - as a consequence - this has no legitimate GeoJSON representation. Specifically, GeoJSON is defined as:
GeoJSON is a format for encoding a variety of geographic data
structures using JavaScript Object Notation (JSON) [RFC7159]. A
GeoJSON object may represent a region of space (a Geometry), a
spatially bounded entity (a Feature), or a list of Features (a
FeatureCollection). GeoJSON supports the following geometry types:
Point, LineString, Polygon, MultiPoint, MultiLineString,
MultiPolygon, and GeometryCollection. Features in GeoJSON contain a
Geometry object and additional properties, and a FeatureCollection
contains a list of Features.
[emphasis added]
My interpretation of this is that a 'structure' containing no "spatially bounded entity" is not a GeoJSON Feature per se, and therefore neither can it be an element in a FeatureCollection. Whereas XML and JSON (and CSV?) can largely be considered universal data serialization formats, GeoJSON simply isn't JSON [sic]; its meaningful scope is georeferenced objects, and it's not intended to represent more abstract (non-georeferenced) data structures. If we consider GeoJSON as just another export format for an XForm submission, I would argue that XForm submissions lacking any georeferenced properties should return a NULL result (!) - because anything else would be disingenuous - and that if a particular XForm is fully intended to be georeferenced then this must be explicitly enforced by the inclusion of a mandatory geopoint (or similar) property.
When it comes to the case of the XForm actually containing multiple georeferenced properties, things get interesting... First off, what is the 'principal' georeference (eg geopoint) that we want to use to display this submission on a map? Perhaps the easiest is to simply take the first control, under the assumption that the location of the thing you are conducting the survey about is probably one of the first questions that will appear in the form [this is what I did, as a quick-n-dirty solution]. But obviously we'd want a more robust/less ad hoc solution for ODK. To this end I might suggest introducing an ODK-specific attribute on any geopoint/geoshape/geotrace binding to indicate that it is the 'primary' one to use for the overall result (eg orx:geoprimary=yes). The primary georeference would therefore become the top-level "geometry" point. Then the question is what to do with the rest of these geo* properties... Unfortunately, GeoJSON doesn't allow nested GeoJSON objects as properties:
A GeoJSON text is a JSON text and consists of a single GeoJSON object.
[although we could leverage using a FeatureCollection for this purpose, I think this is a stretch - there's no implicit notion of what the 'primary' Feature is in this list - and we probably want to exploit the FeatureCollection for our multi-submission export in any case].
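Picking the 'primary' georeference could be sketched roughly as below. To be clear, `orx:geoprimary` is only my proposed attribute and does not exist in ODK today; the namespace URIs and the fall-back to the first geo binding (my quick-n-dirty approach) are likewise assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

XFORMS_NS = "http://www.w3.org/2002/xforms"
ORX_NS = "http://openrosa.org/xforms"  # assumed namespace for the orx: prefix
GEO_TYPES = {"geopoint", "geotrace", "geoshape"}


def primary_geo_nodeset(form_xml):
    """Return the nodeset of the primary geo binding in a form definition, or None."""
    root = ET.fromstring(form_xml)
    # All geo-typed <bind> elements, in document order.
    geo_binds = [
        b for b in root.iter(f"{{{XFORMS_NS}}}bind") if b.get("type") in GEO_TYPES
    ]
    for bind in geo_binds:
        # Hypothetical attribute: orx:geoprimary="yes" marks the primary one.
        if bind.get(f"{{{ORX_NS}}}geoprimary") == "yes":
            return bind.get("nodeset")
    # Fallback: the first geo binding in the form, if there is one.
    return geo_binds[0].get("nodeset") if geo_binds else None
```

The returned nodeset (eg "/data/dwelling") then tells the exporter which submission value becomes the Feature's "geometry".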
What GeoJSON does allow is to instead represent these sub-features (eg locations of the various wells within the village), as so-called "Foreign Members". The problem here however is that these extended properties may well contain, but will not be interpreted as, geospatial properties. The spec even gives an example:
GeoJSON semantics do not apply to foreign members and their
descendants, regardless of their names and values. For example, in
the (abridged) Feature object below
{
    "type": "Feature",
    "id": "f2",
    "geometry": {...},
    "properties": {...},
    "centerline": {
        "type": "LineString",
        "coordinates": [
            [-170, 10],
            [170, 11]
        ]
    }
}
the "centerline" member is not a GeoJSON Geometry object.
[it might be possible to exploit a GeoJSON "GeometryCollection" somehow, which appears to allow multiple georeferenced values, but it's not clear to me that these can be any more than the actual geopoint/geoshape/geotrace; that is, they couldn't have any additional key-value properties associated with each. But perhaps one of the GeoJSON GIS experts can chime in here?].
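As a rough illustration of the Foreign Members route, the sketch below hangs the secondary georeferences off the Feature under an extra top-level key. The function name and the "wells" key are made up for the example, and, as the spec excerpt says, a generic GeoJSON reader will not treat these values as geometries.

```python
def with_foreign_members(feature, **members):
    """Return a copy of a Feature with extra top-level (foreign) members added."""
    # Member names that RFC 7946 defines for a Feature; foreign members must not clash.
    reserved = {"type", "geometry", "properties", "id", "bbox"}
    clash = reserved & members.keys()
    if clash:
        raise ValueError(f"not foreign members: {sorted(clash)}")
    return {**feature, **members}


village = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [36.8, -1.5]},
    "properties": {"village": "Example"},
}
# Secondary locations ride along, but only our own tools will know what they mean.
exported = with_foreign_members(
    village,
    wells=[{"type": "Point", "coordinates": [36.81, -1.51]}],
)
```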
In conclusion, I do think XForm submissions can be well-defined in GeoJSON, with the following caveats:
- an XForm submission containing no geopoint, geoshape or geotrace has no valid GeoJSON representation [which I expect will elicit some debate...],
- an XForm submission containing a single geopoint, geoshape or geotrace is represented as a single GeoJSON "Feature" whose "geometry" is defined by the geopoint/geoshape/geotrace; all other submission properties are represented as key-value pairs under the GeoJSON "properties".
- an XForm submission containing multiple geopoints, geoshapes or geotraces is represented as a GeoJSON "Feature" whose "geometry" is defined by the primary geopoint/geoshape/geotrace - as determined by (say) the associated binding's orx:geoprimary value. All non-primary georeferenced properties of the XForm submission may be represented via GeoJSON "Foreign Members", but their interpretation (eg as secondary georeferenced features) is entirely implementation dependent.
- multiple XForm submissions, eg an Aggregate export, are represented as a GeoJSON "FeatureCollection", subject to the above.
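The multi-submission export then falls out almost for free. In this small sketch a per-submission result of `None` stands for "no GeoJSON representation" and is simply left out of the collection. Note that RFC 7946 does allow a Feature whose "geometry" is null, so dropping such submissions is the policy I'm arguing for rather than something the spec forces.

```python
def to_feature_collection(features):
    """Wrap per-submission Features in a FeatureCollection, skipping None results."""
    return {
        "type": "FeatureCollection",
        "features": [f for f in features if f is not None],
    }


collection = to_feature_collection([
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [36.8, -1.5]},
        "properties": {"name": "Ann"},
    },
    None,  # a submission with no georeferenced property
])
```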
Alas, I can offer no guarantee my ideas are good, or even above the median. But you can rest assured they'll probably be "provocative"... 