Adding a field to existing form (with data) and updating it by API - good idea?

1. What is the issue? Please be detailed.
I have an existing dataset on Central which is relatively 'rich' (i.e. lots of fields) and want to integrate a geotrace that has been digitised subsequent to data collection. I thought it would be useful to have the data in a single space rather than having to associate data from different sources.

Enumerators collected GPS data of their route (attached to the form as a GPX file) and that has been edited to generate a better approximation (!) of the route on the ground. So I have WKT and can convert that to ODK-geotrace, and these records are matched to the UUID of the form submissions.
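
For anyone following along, the WKT-to-geotrace step could look something like the minimal Python sketch below. It assumes the WKT is a simple `LINESTRING` in WGS84 with x = longitude and y = latitude, and that an ODK geotrace is a semicolon-separated list of `latitude longitude altitude accuracy` points. The function name is made up for illustration, and the `0.0` accuracy is a placeholder because digitised vertices have no GPS accuracy.

```python
import re


def wkt_linestring_to_geotrace(wkt: str) -> str:
    """Convert a 2D or 3D WKT LINESTRING (lon lat [alt]) to an ODK geotrace string."""
    match = re.fullmatch(
        r"\s*LINESTRING\s*(?:Z\s*)?\((.*)\)\s*", wkt, flags=re.IGNORECASE | re.DOTALL
    )
    if match is None:
        raise ValueError("expected a non-empty WKT LINESTRING")

    points = []
    for vertex in match.group(1).split(","):
        coords = [float(value) for value in vertex.split()]
        if len(coords) < 2:
            raise ValueError(f"vertex needs at least lon and lat: {vertex!r}")
        lon, lat = coords[0], coords[1]
        # Assumption: use 0.0 when the digitised line carries no altitude.
        alt = coords[2] if len(coords) > 2 else 0.0
        # ODK order is latitude first, then longitude; accuracy is a placeholder.
        points.append(f"{lat} {lon} {alt} 0.0")
    return ";".join(points)
```

Note the swap in coordinate order: WKT writes longitude first, while ODK writes latitude first, which is an easy thing to get wrong when converting by hand.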

The data collection phase is nearly complete, but additional records are likely to trickle in over a period of months.

I am wondering about the value and implications of:

  • updating the form definition to add a geotrace column (probably read only so that new records don't come in with a random trace) therefore creating a new version of the form, which would obviously be distributed to enumerators.
  • using the API to update existing records to insert the trace data (and periodically updating new records when they arrive).
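
To make the second bullet concrete, here is a rough Python sketch of preparing an edited copy of one submission's XML with the trace inserted. It is written under several assumptions: the submission uses an un-namespaced `<meta><instanceID>` block, the new field sits at the top level of the form, and the field name `route_trace` is hypothetical. As I understand the Central API documentation, an edited submission needs a fresh `instanceID` plus a `deprecatedID` holding the previous one, and is sent as XML with a `PUT` to `/v1/projects/{projectId}/forms/{xmlFormId}/submissions/{instanceId}`. The sketch deliberately stops before any network call.

```python
import uuid
import xml.etree.ElementTree as ET


def add_trace_to_submission(xml_text: str, field_name: str, geotrace: str) -> str:
    """Return edited submission XML with the trace set and the instance IDs rotated."""
    root = ET.fromstring(xml_text)
    meta = root.find("meta")  # assumption: meta block is not namespaced
    instance_id = meta.find("instanceID") if meta is not None else None
    if instance_id is None or not instance_id.text:
        raise ValueError("submission has no meta/instanceID")

    # Set the trace field, creating it just before <meta> if it is missing.
    field = root.find(field_name)
    if field is None:
        field = ET.Element(field_name)
        root.insert(list(root).index(meta), field)
    field.text = geotrace

    # The old instanceID becomes deprecatedID; the edit gets a new instanceID.
    deprecated = meta.find("deprecatedID")
    if deprecated is None:
        deprecated = ET.SubElement(meta, "deprecatedID")
    deprecated.text = instance_id.text
    instance_id.text = f"uuid:{uuid.uuid4()}"

    return ET.tostring(root, encoding="unicode")
```

Whether the `version` attribute on the root element also needs to change to the new form version is worth checking against the Central docs and testing on a draft or a copy of the form before touching real records.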

In my head this would give the data manager(s) access to the submissions in Central so they can see the trace, make any edits and export the full dataset (or link via OData when I work out how to do that).
At present I am holding one dataset in QGIS (desktop) and one on Central (minus the trace!) - I am sharing a web-map version of the QGIS data, but that can't be edited by a 3rd party.

My support questions are:

  1. Is this a good idea and worth the investment of time?
  2. Would I be better to create a new form (including the geotrace) and populate it via the API - it does have a nested repeat and I don't want to max-out my server space by duplicating all the images.

I know that this is close to Entities concept but we're not able to bulk upload entities yet (I 'only' have 500 records here, but 6000 geolocated images as nested repeats) - these records will probably become entities in a future iteration (this is the audit phase and we then need to think about managing!) - probably separating the records from the images as 2 datasets.

But I'd like to move forwards slightly without breaking anything (haha!) and let the data manager(s) get access to the data in an editable format (via Enketo, obviously) and maybe share data through OData.

Feedback welcome... gulp.

Great question, @seewhy!

Am I understanding this right: You are using Entities already and you have a Dataset (which we're starting to call Entity Lists). This entity list is missing a column of data, and it would be really useful if you could add that new column and populate it with data (which you have elsewhere, like in other submissions), and you're comfortable doing that via the API.

If that's the case, I think your approach sounds good. I have one small hacky suggestion: if you need to add another property to an entity and you don't want to make a new version of your form that gets sent out to everyone, you could make a separate form that adds the new entity property and doesn't get distributed. You would then be able to use the API to update/fill in that new property in any existing entity.

I might be misunderstanding your situation though! (I think my main issue is not being clear on whether or to what degree entities are already involved, and also being curious but not super well-informed about how geodata gets managed in systems outside of ODK such as QGIS. I'd really like to understand, though!)

It also kind of sounds like being able to view and edit traces is important, and that's something we couldn't really do yet with entities (w/o making external API calls like you're talking about) but are actively working on. @LN just made this post outlining some design decisions about updating entities via submissions: Update an Entity from a form submission with server-side conflict detection and resolution

Thanks for your reply and trying hard to understand my situation. I'll see if I can answer, but I think you are being too optimistic about how far along the road I have got :slight_smile:

I am not yet using entities for this dataset / entity list and I don't know if it's ready to make the leap of faith required.
This current dataset has been collected by a dispersed team, and we are going through QA and may want to resolve queries on some records. If I give a limited number of data managers access to ODK Central we may be able to fix any data irregularities or gaps via Enketo rather than by email ping-pong. I can see that translating the whole list to entities could be more powerful in the very near future, but it has some risks (including my competencies) which I am not ready to unleash, and the cost of doing so might outweigh the benefit. Hence me wondering about a modification of the form definition and potentially doing an update of records via API (which is a bit scary itself).

I would rather not duplicate the dataset if possible due to the size and my storage capacity on the server. (Because I can't easily delete records from other forms, things are filling up!)

They are not - we started data collection before entities were a 'stable' feature on Central and the client needed to get underway (a common problem, methinks!) so I thought that it would be possible to translate the entity list to entities at a later date (still not today, though!)

Well, in my case I have a GIS layer which just has Linestring geometry (manually digitised) and a KEY field (Geopackage database stored locally), which I associate with the KEY from the ODK record (CSVs plus images downloaded incrementally and stored locally), so then I can use joins to interrogate the data spatially. But my client's staff do not all have access or competencies to use QGIS.
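
As a rough illustration of that join outside QGIS, the Python sketch below matches exported submission rows to digitised lines on `KEY` and reports which submissions have no trace yet. It assumes both sides are available as CSV text with a `KEY` column, and that the trace export has a `wkt` column. The `wkt` column name and the function name are hypothetical.

```python
import csv
import io


def attach_traces(submissions_csv: str, traces_csv: str):
    """Join submission rows to WKT traces on KEY; return (matched rows, unmatched keys)."""
    # Build a lookup from KEY to WKT geometry.
    traces = {
        row["KEY"]: row["wkt"] for row in csv.DictReader(io.StringIO(traces_csv))
    }

    matched, missing = [], []
    for row in csv.DictReader(io.StringIO(submissions_csv)):
        wkt = traces.get(row["KEY"])
        if wkt is None:
            # Submission has arrived but has not been digitised yet.
            missing.append(row["KEY"])
        else:
            matched.append({**row, "wkt": wkt})
    return matched, missing
```

The list of unmatched keys is handy for the records that trickle in later, since it shows which ones still need a trace.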

View definitely, edit... well that opens a can of worms asking people to edit a trace if they are not familiar with the concepts of how the trace is digitised - and Collect / Enketo are not fully featured GIS tools (rightly, I would say).

In other news, I'm experimenting with another dataset to see if I can translate it to entities on Central during a repeat (second round) of data collection (the existing records were collected using ODK in 2018, and I have a select-from-map using an external CSV, which pre-fills the form with the existing values, some of which will be saved to the entity table!). It's nearly working - a GeoJSON version gave me a headache and I'm not familiar enough with the format to solve it, so I'm trying CSV. This will build a new entity table on Central as each record is revisited. I need this ready for the end of the month, so can't afford to wait for bulk upload (sorry, that's not meant to sound ungrateful or impatient!).

Not sure if that helps answer your queries, but maybe it helps me think through the implications of both tasks a bit more...

A couple of quick thoughts!

  • You might find this thread has interesting ideas. It's about augmenting submissions with audio transcriptions but you'll probably find value in the idea of adding a field that a script then makes relevant. Like I say there, I think augmenting submissions this way is a perfectly valid thing to do if you want Central to be a/the source of truth.

  • If you're comfortable using the API, you can shortcut your way to bulk upload with something like this script. This is basically what Central will do itself but we'll also want to do preview, detection of errors in source files, etc, so that's why it's a bigger endeavor to build in. If you haven't already, please be sure to answer our quick poll about bulk upload.
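
This is not the linked script, but the general shape of a DIY bulk upload might look like the Python sketch below: read a CSV and turn each row into one entity-creation payload. It assumes, based on my reading of the Central API docs, that an entity is created by POSTing JSON with `uuid`, `label` and `data` keys to `/v1/projects/{projectId}/datasets/{name}/entities`, and that every property is already defined on the entity list. The function name and column names are illustrative only, and the actual HTTP calls are left out.

```python
import csv
import io
import uuid


def rows_to_entity_payloads(csv_text: str, label_column: str) -> list:
    """Turn each CSV row into a dict shaped like an entity-creation request body."""
    payloads = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = row.pop(label_column)  # the label is sent separately from properties
        payloads.append(
            {
                "uuid": str(uuid.uuid4()),  # new random ID for each entity
                "label": label,
                "data": row,  # remaining columns become string properties
            }
        )
    return payloads
```

Each payload would then be sent one at a time with an authenticated request, ideally against a test project first, since this sketch does none of the previewing or error detection mentioned above.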
