I like "asset" @Xiphware - it feels a lot more MI5/CIA/<insert NZ equivalent here> (does NZ have a secret service?)
My current thinking is that we can use an external secondary instance to populate the list of available entities. This could then be an arbitrary data set, although it would normally be populated by the server (Central, presumably), which would have a UI that would allow you to nominate form A as the entity source for form B (the report form). The server would then automatically add an external secondary instance to form B. This external secondary instance would reference a CSV file that the server would automatically generate from instances of form A.
Although, what with 30 million sheep and 15 million cows, vastly outnumbering the less than 5 million humans, you can probably guess who our resident spooks consider to be the greatest threat to national security... (thankfully, both lack an opposable thumb, or we'd be totally screwed!)
I think it would be unfortunate to be limited by CSV, though. One of the big reasons the form spec is built around XML is that it can represent arbitrarily complex schemas, and XPath makes querying those straightforward. To give a concrete example, let's say you're collecting information about patients and you ask questions about what allergies each has, when each allergy was developed, how severe it is, etc. In an XML instance generated by an ODK XForms client, you might have a varying number of repeated allergy blocks. Given an XML document with all patient information, you could do things like get all patients with severe allergies, all patients with a particular kind of allergy, all distinct allergens represented (I don't think the ODK XForms spec has support for this last one yet but it could/should), etc. With CSVs, you'd need to have multiple files, cap the number of allergies per patient, have a really wide table, or use some other workaround. This gets at the core of why it's important to support XML external secondary instances and make them performant.
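To make the allergy example a bit more tangible, here is a rough sketch in Python using the standard library's `xml.etree.ElementTree`, which only supports a limited XPath subset. The element names (`patient`, `allergy`, `allergen`, `severity`) and the sample data are made up for illustration, and this is not how an ODK client would evaluate the queries; it only shows that repeated blocks of varying length stay easy to query.

```python
import xml.etree.ElementTree as ET

# Hypothetical patient data: each patient has zero or more allergy repeats.
PATIENTS_XML = """
<root>
  <patient>
    <name>Alice</name>
    <allergy><allergen>peanut</allergen><severity>severe</severity></allergy>
    <allergy><allergen>pollen</allergen><severity>mild</severity></allergy>
  </patient>
  <patient>
    <name>Bob</name>
    <allergy><allergen>pollen</allergen><severity>severe</severity></allergy>
  </patient>
  <patient>
    <name>Carol</name>
  </patient>
</root>
"""

root = ET.fromstring(PATIENTS_XML)


def patients_with_severe_allergies(root):
    """Names of patients with at least one allergy whose severity is 'severe'."""
    return [
        p.findtext("name")
        for p in root.findall("patient")
        if p.find("allergy[severity='severe']") is not None
    ]


def patients_with_allergen(root, allergen):
    """Names of patients with at least one allergy to the given allergen."""
    return [
        p.findtext("name")
        for p in root.findall("patient")
        if any(a.findtext("allergen") == allergen for a in p.findall("allergy"))
    ]


def distinct_allergens(root):
    """Sorted list of every allergen that appears in any allergy repeat."""
    return sorted({a.text for a in root.findall("patient/allergy/allergen")})
```

Doing the same against CSV would need a second file for the allergy repeats plus a join on some patient key, or a fixed number of allergy columns per patient.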
I think the schema of the entity/asset representation will possibly lead to some creative options for performance. For example, introducing a standard element name like entityId that ties records about entities together means clients (Collect, Enketo) could do something like have a database table per entity with entityId as the key. XML blobs could be stored for each form filled about the entity. This would make listing entities, linking to a specific one, and querying a specific one extremely fast. Queries across entities could then use a virtual instance built from pulling all the relevant XML blobs from the database. This could also make synchronization between server and client more efficient: the clients could request updates after a certain date and the server could provide just those blobs that have changed.
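As a very rough sketch of the storage idea above, assuming a single SQLite table indexed by entity id (the table name, column names, sample records, and helper functions are all hypothetical, and nothing here reflects how Collect or Enketo actually store data):

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_store():
    """In-memory store: one row per form filled about an entity (assumed schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE submission ("
        " entity_id TEXT NOT NULL,"
        " updated_at TEXT NOT NULL,"  # ISO 8601 UTC, so string order is time order
        " xml TEXT NOT NULL)"
    )
    db.execute("CREATE INDEX submission_by_entity ON submission (entity_id)")
    return db


def save(db, entity_id, updated_at, xml):
    """Store one XML blob for an entity."""
    db.execute(
        "INSERT INTO submission VALUES (?, ?, ?)", (entity_id, updated_at, xml)
    )


def list_entities(db):
    """Listing entities is an indexed lookup and parses no XML."""
    rows = db.execute(
        "SELECT DISTINCT entity_id FROM submission ORDER BY entity_id"
    )
    return [row[0] for row in rows]


def virtual_instance(db):
    """Build one XML tree from all stored blobs for cross-entity queries."""
    root = ET.Element("root")
    rows = db.execute("SELECT xml FROM submission ORDER BY entity_id, updated_at")
    for (xml,) in rows:
        root.append(ET.fromstring(xml))
    return root


def changed_since(db, timestamp):
    """Blobs a server could send to a client asking for updates after a date."""
    rows = db.execute(
        "SELECT xml FROM submission WHERE updated_at > ? ORDER BY updated_at",
        (timestamp,),
    )
    return [row[0] for row in rows]


db = open_store()
save(db, "sheep-1", "2020-01-05T10:00:00Z",
     "<visit><entityId>sheep-1</entityId><weight>61</weight></visit>")
save(db, "sheep-1", "2020-02-05T10:00:00Z",
     "<visit><entityId>sheep-1</entityId><weight>64</weight></visit>")
save(db, "cow-7", "2020-01-20T09:30:00Z",
     "<visit><entityId>cow-7</entityId><weight>540</weight></visit>")
```

With this layout, `list_entities(db)` never touches the XML, `virtual_instance(db)` is only built when a query spans entities, and `changed_since(db, "2020-01-31T00:00:00Z")` returns just the one blob stored after that date.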
All that to say that if one of the big goals for the performance work that is starting back up in JavaRosa/Collect is to support longitudinal data collection, it might make sense to start getting more concrete about what that means!
If it's going to be auto-generated anyway, I don't see a particular benefit to making it CSV rather than XML, whereas there's definitely a benefit to the latter, both in terms of a more orthogonal implementation and in opening up the data to powerful XPath queries.
Yes, absolutely. I was ducking the details here (plus I had the discussion about converting CSV to XML in the back of my mind).
And yes, absolutely, it's time that we got more concrete about the implementation of case management. I think we should have another round of discussion, primarily focused on requirements rather than implementation, at the next TSC meeting.