1. What is the general goal of the feature?
Thanks to recent ODK Briefcase releases, we have three targeted pull/export features:
1] start-from-date
to resume a pull operation from a specific date
2] start-from-last
to resume a pull operation, picking up from the position of the last pull
3] export-from-date
which runs an export operation from a specific date
What's now logically missing is the export counterpart to start-from-last, i.e.
4] export-from-last
which runs an export operation, picking up from the position of the last export
I can also see a reason to include a further option
5] smart-export
which looks at the target file, then either
- creates the target file if it is not there and exports from the first submission, or
- identifies the meta-instanceID of the last exported submission and then exports from the next submission
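To make the smart-export idea concrete, here is a minimal Python sketch of the resume-point logic. It is not Briefcase code: the function names are hypothetical, and it assumes the target CSV has a meta-instanceID column and that submissions are available in the same order in which they were previously exported.

```python
import csv
from pathlib import Path


def find_resume_id(csv_path, id_column="meta-instanceID"):
    """Return the last non-empty instance ID in the CSV, or None.

    None means there is nothing to resume from: the file is missing
    or contains no data rows, so the export should start from the
    first submission.
    """
    path = Path(csv_path)
    if not path.exists():
        return None
    last_id = None
    with path.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Keep the previous ID if this row has no value in the column.
            last_id = row.get(id_column) or last_id
    return last_id


def submissions_to_export(ordered_ids, last_exported_id):
    """Return the IDs that come after last_exported_id.

    ordered_ids is assumed to be in export order (an assumption of
    this sketch, not something Briefcase is known to guarantee).
    """
    ids = list(ordered_ids)
    if last_exported_id is None:
        return ids
    if last_exported_id not in ids:
        # Refuse to guess: appending everything would create duplicates.
        raise ValueError("last exported submission not found locally")
    return ids[ids.index(last_exported_id) + 1:]
```

Raising an error when the last exported ID cannot be found is a deliberate choice in this sketch: silently falling back to a full export would append duplicates to the existing file.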
2. What are some example use cases for this feature?
A] Export-from-last
When working with long-lived and heavily used forms, export time becomes a significant issue if the export deletes the old CSV and writes all data to a new one.
A daily export of ~750,000 forms to 17 CSVs currently takes about 5-7 hours on our largest project.
An alternative solution we have tried is to use the append behaviour and stitch new submissions from an arbitrary recent date (i.e. within the last 5 days) onto the end of the existing file, but this leaves duplicates that need to be removed in downstream analysis. A basic Unix process such as sort | uniq changes the line order in the resulting file (the header row is also affected), so it is not ideal.
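As a stopgap for the duplicate problem, the de-duplication can be done without sorting. This is a minimal Python sketch (the function name is hypothetical) that assumes each record sits on exactly one line and that duplicates are exact copies of the earlier line; CSV fields containing line breaks would need a real CSV parser instead.

```python
def dedupe_preserving_order(lines):
    """Drop repeated lines, keeping the first occurrence of each.

    The original order is preserved, so a header row on the first
    line stays on the first line.
    """
    seen = set()
    unique = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            unique.append(line)
    return unique
```

This still reads the whole file on every run, so it removes the ordering problem but not the cost of re-exporting overlapping data.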
Using export-from-date only works at the granularity of a day, so if we pulled twice a day we would potentially miss or duplicate some records.
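The granularity problem can be shown with a toy model in Python. This is not how Briefcase is implemented: both functions are hypothetical, and the model assumes the date filter includes the start date itself.

```python
from datetime import date, datetime


def export_from_date(submissions, start):
    """Toy model: select submissions dated on or after start (whole days)."""
    return [s for s in submissions if s.date() >= start]


def export_from_last(submissions, last_exported):
    """Toy model: select submissions strictly after the last exported one."""
    return [s for s in submissions if s > last_exported]


morning = datetime(2019, 3, 1, 9, 0)
afternoon = datetime(2019, 3, 1, 15, 0)

# First run of the day exports the morning submission.
first_run = export_from_date([morning], date(2019, 3, 1))

# Second run on the same day selects the morning submission again.
second_run = export_from_date([morning, afternoon], date(2019, 3, 1))

# A position-based resume selects only what is new.
resumed = export_from_last([morning, afternoon], morning)
```

In this model the second date-based run picks up the morning submission a second time, whereas the position-based run returns only the afternoon one. Moving the start date forward a day to avoid the duplicate would instead drop the afternoon submission.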
An export-from-last option would have a fairly good use case for all export activities, but especially for long-lived forms managed from the CLI.
B] smart-export
When a system failure occurs, or when passing the system over to another operator, the ODK Briefcase database can be recreated quite quickly by copying the ODK XMLs from a backup drive to a new machine, but the start-from-last or export-from-last position flag is lost when this happens. By implementing a system that can look at the target file and identify the appropriate resume point, the full system could be rapidly recreated/replicated after a system failure by
- Copying the xmls folder to the new machine
- Copying the target CSV folder to the new machine
Another use case for smart-export: I set up a system and run it for six months. Then I need to go on a journey and @dr_michaelmarks wants to run the system while I am away. I copy the whole ODK Briefcase directory and the CSV files onto a hard drive and give them to Michael. Michael's copy of Briefcase then just figures everything out (resume points for pull and export) and picks up where I left off, without having to first run long pulls or exports.
I think that the resume data are currently stored somewhere in the system's Java storage, but moving them into the ODK Briefcase folder would allow the resume info to be carried with the folder, meaning it could be passed between two systems.
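One way to picture folder-local resume info is a small state file kept next to the form data. The Python sketch below is only an illustration: the file name resume-state.json, the field names, and both functions are hypothetical, and nothing here reflects how Briefcase actually stores its resume positions.

```python
import json
from pathlib import Path

STATE_FILE = "resume-state.json"  # hypothetical file name


def save_resume_state(folder, form_id, last_pulled, last_exported):
    """Record the resume positions for one form inside the given folder."""
    path = Path(folder) / STATE_FILE
    state = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    state[form_id] = {"last_pulled": last_pulled, "last_exported": last_exported}
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")


def load_resume_state(folder, form_id):
    """Return the saved positions for a form, or None if none are stored."""
    path = Path(folder) / STATE_FILE
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8")).get(form_id)
```

Because the state lives in the folder itself, copying the folder to a hard drive carries the resume points with it.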
3. What can you contribute to making this feature a reality?
Beta testing, discussion