Hello All,
Apologies for the lengthy post (and apologies if this is simply too large for the Features category), but I don't see how to water this down without losing important information.
I'm an MSc Computer Science student at the University of Southampton in the UK. I'm currently working on my dissertation, which is on "crowdsourced" form-fill prediction (more below). I'm looking into implementing some features for ODK and using them to collect usage data. Before I begin investing development time, I'd like to:
- gauge if ODK users are interested in having predictive capabilities
- gauge willingness to contribute usage data for research purposes
- propose several features that would be required to make this happen. I will be contributing the work (probably all of it) toward building these features.
- get some understanding of how this would fit in with ODK and users' expectations
##The Project
The ultimate aim of the project is to assess the impact on users (time spent, error rates, other metrics) of having a system that predicts and potentially pre-selects options for a user.
These pre-selections can be made based on:
- A user's own history on a form
- Histories of other users of the same form (e.g. teammates, colleagues, etc.)
- Answers to questions preceding the one being predicted
###Example
A very simple (and certainly made up!) example: if we're taking down somebody's personal details and their 'title' is 'Dr.', then when the profession question comes up later in the survey, we know that most people with the title 'Dr.' are in fact medical doctors, so we pre-select that option on the form (with the option, of course, to change it).
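To make the example concrete, here is a minimal sketch of how such a conditional prediction might work. It is only an illustration: the class name `ConditionalPredictor`, the `min_confidence` threshold and the sample data are all made up, and the actual models are yet to be determined.

```python
from collections import Counter, defaultdict


class ConditionalPredictor:
    """Counts how often each later answer follows a given earlier answer."""

    def __init__(self, min_confidence=0.6):
        # Hypothetical threshold: only pre-select when the top answer
        # accounts for at least this share of past observations.
        self.min_confidence = min_confidence
        self.counts = defaultdict(Counter)

    def observe(self, earlier_answer, later_answer):
        """Record one completed form, e.g. (title, profession)."""
        self.counts[earlier_answer][later_answer] += 1

    def predict(self, earlier_answer):
        """Return the answer to pre-select, or None if not confident."""
        seen = self.counts.get(earlier_answer)
        if not seen:
            return None
        answer, n = seen.most_common(1)[0]
        if n / sum(seen.values()) >= self.min_confidence:
            return answer
        return None


# Made-up history of (title, profession) pairs from earlier submissions.
predictor = ConditionalPredictor()
for title, profession in [
    ("Dr.", "Medical Doctor"),
    ("Dr.", "Medical Doctor"),
    ("Dr.", "Academic"),
    ("Mr.", "Engineer"),
]:
    predictor.observe(title, profession)

suggestion = predictor.predict("Dr.")  # pre-select this, user can still change it
```

Returning `None` when the evidence is weak or missing means the widget is simply left blank, which is one way to avoid confusing users with poor guesses.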
##The Features
###Predictive technology (collect/aggregate)
- predictive widgets - some are simple (checkboxes/radio buttons can simply be pre-selected), some are more complicated, such as predictive text. These may be implemented in some priority order depending on time available. Care must be taken to avoid confusing a user or disrupting their work/focus.
- referring to models - the form system must be able to extract some data/probabilities from the underlying predictive models.
- implementing baseline models without the crowdsourced element (MFU/MRU, i.e. most frequently used / most recently used) - this allows us to evaluate performance on a level playing field
- control the fields it's enabled on? - do we do this through the form definitions? enable it everywhere? by widget type?
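For reference, the two baselines mentioned above (MFU and MRU) can be sketched in a few lines. This assumes a user's past answers to one question are available as a simple list, oldest first. The function names and the tie-breaking rule are my own choices for illustration, not anything that exists in ODK.

```python
from collections import Counter


def most_frequently_used(history):
    """MFU baseline: the answer chosen most often so far."""
    if not history:
        return None
    counts = Counter(history)
    best = max(counts.values())
    # Assumed tie-break: among equally frequent answers, prefer the most recent.
    for answer in reversed(history):
        if counts[answer] == best:
            return answer


def most_recently_used(history):
    """MRU baseline: the answer chosen last time."""
    return history[-1] if history else None


# Made-up answers one user gave to a "clinic" question, oldest first.
history = ["Clinic A", "Clinic B", "Clinic A", "Clinic C"]
mfu = most_frequently_used(history)
mru = most_recently_used(history)
```

Neither baseline needs data from other users, so both could run entirely on the device.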
###Model transfer (collect/aggregate) / Model creation/update (aggregate)
I group these together because they're two halves of the same process. Basically, models are generated on the Aggregate server (models and modelling systems yet to be determined), then serialized and sent to ODK Collect for the steps above (i.e. referring to models). This is in order to maintain the offline capabilities of ODK Collect.
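As a rough sketch of the transfer step, assuming the model is nothing more than a table of answer counts, the server could serialize it to JSON and the client could load it for offline use. The payload fields (`form_id`, `model_version`, `counts`) and the function names are hypothetical. The real format will depend on which models are chosen.

```python
import json


def serialize_model(form_id, model_version, counts):
    """Server side: pack a count-based model into a JSON string."""
    payload = {
        "form_id": form_id,
        "model_version": model_version,
        "counts": counts,
    }
    return json.dumps(payload, sort_keys=True)


def load_model(blob):
    """Client side: unpack a model that was downloaded earlier."""
    payload = json.loads(blob)
    return payload["form_id"], payload["model_version"], payload["counts"]


def needs_update(local_version, server_version):
    """Client side: fetch a new model only when the server has a newer one."""
    return server_version > local_version


# Made-up model: for each title, how often each profession was chosen.
counts = {"Dr.": {"Medical Doctor": 2, "Academic": 1}}
blob = serialize_model("personal_details", 3, counts)
form_id, version, restored = load_model(blob)
```

A plain-text format like this keeps the client free of any modelling libraries, at the cost of only supporting simple models.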
###Telemetry Collection and Reporting (collect/aggregate)
- Timing overall - I believe this is partially implemented - will have questions on this.
- timing per component/question
- backspacing/corrections
- on whose infrastructure? - do we simply provide it back to a team's Aggregate server and ask them to export and upload data? Or collect it straight from ODK?
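To illustrate the kind of data I have in mind, here is a small sketch of a per-question telemetry recorder. It is not based on any existing ODK code: `FormTelemetry`, `QuestionTelemetry` and the method names are invented for this post, and the clock is injectable purely so the example is repeatable.

```python
import time
from dataclasses import dataclass


@dataclass
class QuestionTelemetry:
    seconds: float = 0.0   # total time spent on the question
    corrections: int = 0   # backspaces / changed answers


class FormTelemetry:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._active = None  # (question_id, start_time) or None
        self.questions = {}

    def enter(self, question_id):
        """Call when a question gains focus."""
        self.leave()
        self.questions.setdefault(question_id, QuestionTelemetry())
        self._active = (question_id, self._clock())

    def leave(self):
        """Call when the current question loses focus."""
        if self._active is None:
            return
        question_id, started = self._active
        self.questions[question_id].seconds += self._clock() - started
        self._active = None

    def record_correction(self):
        """Call on a backspace or a changed selection."""
        if self._active is not None:
            self.questions[self._active[0]].corrections += 1

    def total_seconds(self):
        return sum(q.seconds for q in self.questions.values())


# Usage with a fake clock so the timings are deterministic.
ticks = iter([0.0, 2.5, 2.5, 4.0])
telemetry = FormTelemetry(clock=lambda: next(ticks))
telemetry.enter("name")
telemetry.record_correction()
telemetry.enter("title")
telemetry.leave()
```

Only durations and counts are stored here, never the answers themselves, which should make the anonymity and consent questions below easier to handle.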
###Informed Consent to collection of telemetry
- When to show a consent form to a user? start? end?
- opt-in? opt-out?
- admin options?
- per form? per user?
- Experience Questionnaire/Survey
##What does ODK get out of this?
- Predictive technology, even if unsuccessful on the crowdsourcing front - MFU and MRU alone are an improvement over the baseline/nothing [1]. This equates to productivity gains!
- Telemetry if it does not already exist
- meta survey functionality (or just another ODK form? :P)
##What does the researcher get out of this?
- Anonymous usage data for my dissertation, to evaluate the impact of prediction on the time saved when filling in forms.
- Contribution to OSS
##Other Questions
What do the maintainers/community think? Would such features be considered for inclusion in production ODK components?
Would ODK form administrators and their users be willing to contribute data to such research? (Where could I gauge this interest, if not on the ODK forums?)