Hi all,
I'm pasting in some ideas from an email discussion on this very topic. I hope this will provide some insights into the technical discussion about file formats!
I'd emphasize MVT as the first priority to allow broad basemaps, with GeoJSON as the second priority to allow editable vectors.
Why?
MVT allows a very small file to provide a very broad, though static, map that is quite efficient and quick to render. Raster MBTiles are fine for viewing a static map, but the file size is orders of magnitude greater for a given area; for example, a detailed MVT map of all of Tanzania (containing all of the OSM data, therefore millions of buildings and most of the visible roads) is about 300MB, while a raster MBTiles file of only the city of Dar es Salaam could easily be 120GB. If deploying enumerators to visit rural areas, raster MBTiles are simply too heavy to contemplate in most cases, but MVT is small enough to fit on the vast majority of phones' storage. So implementing MVT basemaps immediately gets most enumerators an offline basemap of the area they are working in.
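To get a feel for why raster tile sets balloon like this, here's a rough sketch that counts standard XYZ (slippy-map) tiles covering a bounding box at each zoom level. The bounding box is a made-up approximation of the Dar es Salaam area and the function names are mine, not from any library; actual file sizes depend on the imagery and compression, which this doesn't model.

```python
import math

def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) XYZ tile index containing a lon/lat point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

def tiles_in_bbox(west, south, east, north, zoom):
    """Count the tiles needed to cover a bounding box at one zoom level."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # tile y grows southward
    x_max, y_max = lonlat_to_tile(east, south, zoom)
    return (x_max - x_min + 1) * (y_max - y_min + 1)

# Hypothetical bounding box (west, south, east, north), roughly Dar es Salaam.
BBOX = (39.0, -7.0, 39.5, -6.6)

for zoom in range(12, 20):
    print(zoom, tiles_in_bbox(*BBOX, zoom))
```

The count roughly quadruples with every extra zoom level, and a raster tile set has to store an image for each of those tiles.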
The drawback of MVT is that it's not a good format for on-device editing. A few reasons for this:
- Each tile contains only the geometry within the tile itself. At a high zoom level this may be a tiny fragment of a road, or even a portion of a single building. Editing this means trying to keep track of the geometry of the whole feature, which is contained in the protobuf of multiple tiles (adjacent tiles and overlapping tiles of different zoom levels). That's a hassle.
- The tiles often don't carry the full attributes of the feature a geometry belongs to (the MVT spec allows per-feature tags, but basemap tiles are commonly built with only what's needed for rendering), so you need another file/database to keep track of object attributes. While you may have the feature ID, you have to go back to that other file to link the ID to the attributes that you may wish to deal with when editing.
- The geometry within the protobufs encapsulated in the MVT tiles is simplified; it may not (and in most cases probably doesn't) contain all of the nodes of the original feature! During creation of MVT tiles, nodes are discarded to leave only enough to render the relevant parts of the feature as a visibly reasonable facsimile at the particular zoom level. When you extract a feature back out of MVT, you aren't guaranteed to get your original geometry back, though at the highest zoom levels you might (or at least it should be close). So editing would require, at a minimum, always drilling down to the highest-zoom tile, grabbing all of the adjacent tiles that contain relevant bits of the feature you want to edit, converting that to an editable representation (the internal MVT protobuf representation is a crazy Logo Turtle-style language that would be hard to manipulate), and then replacing it in those tiles, as well as re-simplifying it to push it back into the lower-zoom tiles. Not a nice process at all, and even if you pull it off, it's not guaranteed to retain all unedited nodes of the original feature.
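To make the "Logo Turtle" point concrete, here's a minimal sketch of decoding the geometry integer stream of a single MVT feature, following the command encoding in the Mapbox Vector Tile spec (MoveTo = 1, LineTo = 2, ClosePath = 7, zigzag-encoded relative offsets). The function names are my own, and this only handles the geometry array, not the surrounding protobuf or the conversion from tile-local coordinates back to longitude/latitude.

```python
def zigzag_decode(n):
    """Undo the zigzag encoding MVT uses for signed parameter integers."""
    return (n >> 1) ^ -(n & 1)

def decode_geometry(ints):
    """Turn an MVT geometry integer stream into paths of absolute tile coordinates."""
    MOVE_TO, LINE_TO, CLOSE_PATH = 1, 2, 7
    paths, current = [], []
    x = y = 0  # the "turtle" cursor, in tile-local coordinates
    i = 0
    while i < len(ints):
        command = ints[i] & 0x7  # low 3 bits: command id
        count = ints[i] >> 3     # remaining bits: repeat count
        i += 1
        if command == CLOSE_PATH:
            if current:
                current.append(current[0])  # close the ring back to its start
            continue
        for _ in range(count):
            # Parameters are offsets relative to the current cursor position.
            x += zigzag_decode(ints[i])
            y += zigzag_decode(ints[i + 1])
            i += 2
            if command == MOVE_TO:
                if current:
                    paths.append(current)
                current = [(x, y)]
            elif command == LINE_TO:
                current.append((x, y))
    if current:
        paths.append(current)
    return paths

# A three-point line: MoveTo(2,2), then LineTo(+0,+8) and LineTo(+8,+0).
print(decode_geometry([9, 4, 4, 18, 0, 16, 16, 0]))
```

Every coordinate depends on all the offsets before it, so changing one node means re-encoding everything after it, and that is before dealing with the same feature being split across neighbouring tiles.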
This still leaves the question of styling MVT. There are various options for that, but I think the simplest is to have a default styling similar to the rendering done by OsmAnd, Maps.me, or the Mapbox SDK, with an optional sidecar file specifying a custom styling using MapCSS or something like it.
Once MVT is done, we could theoretically implement OpenMapKit-style adding of attributes to polygons right away, just using the feature ID of the polygons in the MVT (no editing of actual geometry, but adding attributes in a form containing the feature ID). However, while it might be nice, I think that's a dead end because it'll never be smooth to access existing attributes or edit geometry. I'd just as soon head straight toward an extensible solution, which is an editable file for particular layers that are targets of the survey.
I could see lots of arguments for Shapefile or GeoPackage, but I think GeoJSON is the best option.
Shapefile is very compact and computationally efficient, but it truncates attribute headers/keys to 10 characters (you'll find a very common column name in GIS data is "descriptio" due to this truncation). This is particularly unfortunate for data like OpenStreetMap that often has quite long keys (e.g. `building:construction`, which gets truncated to `building:c`, indistinguishable from the truncated `building:commercial`). Shapefile is also a bit of a bear to parse due to the arcane ancient spec; there are some good libraries around, but the underlying machinery is pretty hairy.
GeoPackage is lovely, reasonably efficient and compact, and is implemented on top of SQLite, which makes it a cinch on Android, which has solid SQLite support. However, a lot of users will be intimidated by the requirements to create a GeoPackage. Not all data can be translated smoothly to GeoPackage, even for a moderately skilled GIS user (GeoPackage freaks out when confronted with topological errors or non-unique id columns, which unfortunately are often found in useful datasets).
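The Shapefile key truncation mentioned above is easy to demonstrate. This is just a toy illustration of cutting names to 10 characters and seeing which ones collapse together; the helper name is made up, and real Shapefile writers may rename colliding fields rather than merge them, which isn't modelled here.

```python
DBF_FIELD_NAME_LIMIT = 10  # the 10-character limit on Shapefile attribute names

def truncate_keys(keys, limit=DBF_FIELD_NAME_LIMIT):
    """Map each truncated name to the original keys that collapse onto it."""
    collapsed = {}
    for key in keys:
        collapsed.setdefault(key[:limit], []).append(key)
    return collapsed

osm_keys = ["building:construction", "building:commercial", "description", "name"]
for short, originals in truncate_keys(osm_keys).items():
    flag = "  <-- collision" if len(originals) > 1 else ""
    print(f"{short!r}: {originals}{flag}")
```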
GeoJSON is not very compact and not particularly computationally efficient, but it's really clear and straightforward in structure, easy to parse without relying on esoteric libraries, and easy to generate. Lots of Web people are familiar with GeoJSON, and there are tons of tools to generate it that aren't big scary GIS packages. Most data can be translated to GeoJSON without complaint, as the GeoJSON drivers don't care if you have topological errors or weird keys in your data (which would likely trip up a GeoPackage writer and maybe a Shapefile writer). It contains only one representation of any given feature, and can happily contain all attributes and metadata in the same file. As a generalist GIS format, it covers most uses reasonably without offering particularly high performance in any specific use.
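As an illustration of how little machinery GeoJSON needs, here's a small sketch that builds a one-feature file and reads it back using only Python's standard `json` module. The coordinates, the `id`, and the attribute values are invented for the example.

```python
import json

feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": "building-1",  # hypothetical identifier
            "geometry": {
                "type": "Polygon",
                # One closed ring of [longitude, latitude] positions,
                # counterclockwise, first position repeated at the end.
                "coordinates": [[
                    [39.2800, -6.8102],
                    [39.2802, -6.8102],
                    [39.2802, -6.8100],
                    [39.2800, -6.8100],
                    [39.2800, -6.8102],
                ]],
            },
            # Long keys survive intact, unlike in a Shapefile.
            "properties": {
                "building:construction": "brick",
                "building:commercial": "yes",
            },
        }
    ],
}

text = json.dumps(feature_collection, indent=2)
restored = json.loads(text)
print(sorted(restored["features"][0]["properties"]))
```

Geometry and attributes sit together in one readable file, which is the main appeal for an editable survey layer.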
So compared to an MVT (or Shapefile), a GeoJSON is large and inefficient for pure display purposes. You wouldn't want to put all of the OSM data for a city, or some equally large dataset, into it! However, you only need to put the layers you want to edit into the GeoJSON format, which in most cases vastly reduces the amount of data needed in that layer. And it's easy to create, style, edit, add/modify attributes to, and ingest after editing.
So my dream scenario is a three-layer cake, each layer being optional:
- Raster MBTiles on the bottom for satellite imagery or other useful high-res stuff in specific areas
- MVT in the middle, above the rasters, for a broad vector basemap, default styled with an optional sidecar file for styling
- Editable GeoJSON on top for features that will actually be mapped and/or tagged.
  - Editable meaning the ability to:
    - Modify geometry using the GPS position
    - Modify geometry using fingers (long-press or cross-hairs, either way is fine)
    - Trigger a form to populate attributes
    - Trigger a form with existing attributes pre-loaded, to be modified or left as-is
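For what it's worth, the editing operations listed above are fairly simple once the layer is GeoJSON. Here's a rough sketch, assuming a Polygon feature held as a plain dict; `move_vertex`, `apply_form`, the sample feature, and the GPS fix are all hypothetical names and values for illustration, not part of any existing app or library.

```python
import copy

def move_vertex(feature, ring_index, vertex_index, lon, lat):
    """Return a copy of a Polygon feature with one vertex moved to (lon, lat)."""
    edited = copy.deepcopy(feature)
    ring = edited["geometry"]["coordinates"][ring_index]
    last = len(ring) - 1
    ring[vertex_index] = [lon, lat]
    # A polygon ring repeats its first position at the end; keep it closed.
    if vertex_index == 0:
        ring[last] = [lon, lat]
    elif vertex_index == last:
        ring[0] = [lon, lat]
    return edited

def apply_form(feature, answers):
    """Return a copy with form answers merged over the existing attributes."""
    edited = copy.deepcopy(feature)
    edited["properties"] = {**(edited.get("properties") or {}), **answers}
    return edited

feature = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [39.2800, -6.8102], [39.2802, -6.8102],
            [39.2802, -6.8100], [39.2800, -6.8100],
            [39.2800, -6.8102],
        ]],
    },
    "properties": {"building": "yes"},  # existing attributes, pre-loaded in the form
}

gps_fix = (39.28005, -6.81025)  # made-up GPS reading
edited = apply_form(move_vertex(feature, 0, 0, *gps_fix), {"building:levels": "2"})
print(edited["geometry"]["coordinates"][0][0], edited["properties"])
```

Both helpers return copies, so the original feature stays available if the enumerator cancels the edit.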
However, that's my big ambitious dream scenario! I feel that just doing the MVT basemaps already adds a lot of value, which is not lost if the rest is not done (or takes a lot of time).