Crowdsourced Predictive Capabilities

arahayrabedian · June 28, 2017, 5:24pm

Hello All,

Apologies for the lengthy post (and apologies if this is simply too large for the features category), but I don't see how to water this down without losing important information:

I'm an MSc Computer Science student at the University of Southampton in the UK. I'm currently working on my dissertation, which is on "crowdsourced" form-fill prediction (more below). I'm looking into implementing some features for ODK and using them to collect usage data. Before I begin investing development time, I'd like to:

gauge if ODK users are interested in having predictive capabilities
gauge willingness to contribute usage data for research purposes
propose several features that would be required in order to make this happen - I will be contributing work/efforts (probably in whole) toward building these features.
get some understanding of how this would fit in with ODK and users' expectations

The Project

The ultimate aim of the project is to assess the impact on users (time spent, error rates, other metrics) of having a system that predicts and potentially pre-selects options for a user.

These preselections can be made based on:

A user's own history on a form
histories of other users of the same form (i.e: teammates, colleagues, etc)
Answers to questions preceding the one being predicted

###Example
A very simple (and certainly made-up!) example: if we're taking down somebody's personal details and their 'title' is 'Dr.', then when the profession question comes up later in the survey, we know that most people with the title 'Dr.' are in fact medical doctors, so we pre-select that option on the form (with the option, of course, to change it).
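To make the example concrete, here is a minimal sketch of predicting one answer from a preceding one by simple counting. Everything in it is an assumption for illustration: the field names (`title`, `profession`), the helper names, the made-up history, and the `min_share` threshold below which no pre-selection is offered. It is not tied to any ODK API or to the models the project would eventually use.

```python
from collections import Counter, defaultdict


def build_conditional_counts(submissions, given_field, target_field):
    """Count how often each target answer occurs for each earlier answer."""
    counts = defaultdict(Counter)
    for row in submissions:
        if given_field in row and target_field in row:
            counts[row[given_field]][row[target_field]] += 1
    return counts


def predict(counts, given_value, min_share=0.5):
    """Return the most common target answer, or None if the evidence is weak."""
    seen = counts.get(given_value)
    if not seen:
        return None  # never saw this earlier answer: do not pre-select anything
    answer, hits = seen.most_common(1)[0]
    return answer if hits / sum(seen.values()) >= min_share else None


# Made-up history, for illustration only.
history = [
    {"title": "Dr.", "profession": "Medical Doctor"},
    {"title": "Dr.", "profession": "Medical Doctor"},
    {"title": "Dr.", "profession": "Lecturer"},
    {"title": "Mr.", "profession": "Farmer"},
]
counts = build_conditional_counts(history, "title", "profession")
suggestion = predict(counts, "Dr.")  # 2 of 3 'Dr.' rows agree, above the threshold
```

Returning `None` when the share is too low is one way to avoid pre-selecting an option the user would then have to undo.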

The Features

###Predictive technology (collect/aggregate)

predictive widgets - some are simple (checkboxes/radio buttons can simply be pre-selected), some are more complicated, such as predictive text. These may be implemented in some priority order depending on time available. Care must be taken to avoid confusing a user or disrupting their work/focus.
referring to models - the form system must be able to extract some data/probabilities from the underlying predictive models.
implementing baseline models without crowdsourced-ness (MFU/MRU, i.e. most frequently used / most recently used) - this allows us to evaluate performance on a level playing field
control the fields it's enabled on? - do we do this through the form definitions? enable it everywhere? by widget type?
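As a rough illustration of the two baselines mentioned in the list above, the sketch below keeps a per-question history and predicts either the most frequently used (MFU) or the most recently used (MRU) answer. The class and method names are hypothetical and the answers are made up; a real widget would also need to decide how and when to show the prediction.

```python
from collections import Counter


class MostFrequentlyUsed:
    """Baseline: predict the answer chosen most often so far."""

    def __init__(self):
        self._counts = Counter()

    def record(self, answer):
        self._counts[answer] += 1

    def predict(self):
        if not self._counts:
            return None  # no history yet, so nothing to pre-select
        return self._counts.most_common(1)[0][0]


class MostRecentlyUsed:
    """Baseline: predict whatever was answered last time."""

    def __init__(self):
        self._last = None

    def record(self, answer):
        self._last = answer

    def predict(self):
        return self._last


# Made-up answers to a single question across three submissions.
mfu, mru = MostFrequentlyUsed(), MostRecentlyUsed()
for answer in ["clinic", "clinic", "school"]:
    mfu.record(answer)
    mru.record(answer)
```

With this history the two baselines disagree (`clinic` for MFU, `school` for MRU), which is the kind of difference an evaluation against the crowdsourced models would need to account for.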

###Model transfer (collect/aggregate) / Model creation/update (aggregate)
I group these together because they're two halves of the same process. Basically, models are generated on the Aggregate server (models and modelling systems yet to be determined), serialized, and sent to ODK Collect for the steps above (i.e. referring to models). This is in order to maintain the offline capabilities of ODK Collect.
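Since the models are yet to be determined, the following is only a sketch of the serialize-then-ship idea, assuming the model is a plain table of answer counts and that JSON is an acceptable wire format. The payload layout, the `version` field, and the function names are all assumptions, not an existing Aggregate or Collect interface.

```python
import json


def serialize_model(counts, version):
    """Server side: turn a nested count table into a JSON string."""
    payload = {"version": version, "counts": counts}
    return json.dumps(payload, sort_keys=True)


def load_model(text):
    """Client side: parse the string once, then answer lookups offline."""
    payload = json.loads(text)
    return payload["version"], payload["counts"]


def lookup(counts, given_value):
    """Return the most common answer recorded for given_value, if any."""
    seen = counts.get(given_value, {})
    return max(seen, key=seen.get) if seen else None


# Made-up model: profession counts keyed by title.
server_counts = {"Dr.": {"Medical Doctor": 2, "Lecturer": 1}}
wire_text = serialize_model(server_counts, version=1)

# Later, on the device, with no network connection required.
version, client_counts = load_model(wire_text)
prediction = lookup(client_counts, "Dr.")
```

Carrying a version number in the payload is one simple way for the client to tell whether the model it holds is older than the one the server offers.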

###Telemetry Collection and Reporting (collect/aggregate)

Timing overall - I believe this is partially implemented - will have questions on this.
timing per component/question
backspacing/corrections
on whose infrastructure? - do we simply provide it back to a team's aggregate server and ask them to export and upload data? straight from ODK?
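The per-question timing and correction counts listed above could be gathered along these lines. This is a sketch under assumptions: the class, its method names, and the record layout are hypothetical, and it does not reflect whatever timing support already exists in ODK. The clock is injectable so the example can use fixed timestamps instead of real elapsed time.

```python
import time


class QuestionTelemetry:
    """Accumulate time spent and corrections made, per question."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._started = {}
        self.records = {}

    def _record(self, question_id):
        return self.records.setdefault(
            question_id, {"seconds": 0.0, "corrections": 0}
        )

    def enter(self, question_id):
        self._started[question_id] = self._clock()

    def correction(self, question_id):
        # Called on a backspace or a changed selection.
        self._record(question_id)["corrections"] += 1

    def leave(self, question_id):
        started = self._started.pop(question_id, None)
        if started is not None:
            self._record(question_id)["seconds"] += self._clock() - started

    def total_seconds(self):
        return sum(r["seconds"] for r in self.records.values())


# Fixed timestamps stand in for a real clock in this illustration.
ticks = iter([0.0, 2.5, 10.0, 11.0])
telemetry = QuestionTelemetry(clock=lambda: next(ticks))
telemetry.enter("title")
telemetry.correction("title")
telemetry.leave("title")
telemetry.enter("profession")
telemetry.leave("profession")
```

Summing per-question time, rather than measuring the form as a whole, leaves out any time spent away from a question, which may or may not be what the overall timing metric should capture.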

###Informed Consent to collection of telemetry

When to show a consent form to a user? start? ~~end?~~
opt-in? ~~opt-out?~~
admin options?
per form? per user?
Experience Questionnaire/Survey

##What does ODK get out of this?

Predictive technology, even if unsuccessful on the crowdsourcing front - MFU and MRU alone are an improvement over the baseline/nothing [1]. This equates to productivity gains!
Telemetry if it does not already exist
meta survey functionality (or just another ODK form? :P)

##What does the researcher get out of this?

Anonymous usage data for my dissertation, to evaluate the impact of prediction on time saved in filling forms.
Contribution to OSS

##Other Questions

What do the maintainers/community think? Would such features be considered for inclusion in production ODK components?
Would ODK form administrators and their users be willing to contribute data to such research? (Where could I gauge this interest, if not on the ODK forums?)

LN · July 6, 2017, 8:22pm

Hi @arahayrabedian and apologies for the silence. As you've probably seen around the forum, there's a lot going on right now in the ODK world!

There's definitely a lot of interest in metrics as you can see in the discussion on timing here. I'll update that issue with the current status since all the information isn't in one place.

What I can say personally is that I'm really excited about the potential of what you're describing but unfortunately I'm not sure I'll have time to invest in it in the short term. I'd recommend developing in a fork, making things as modular as possible and writing a note here if you have specific questions.

yanokwa · July 6, 2017, 11:21pm

Hi Ara! Thanks for your detailed post. It's a very compelling idea and I'm excited to see how it turns out!

I've got two high-level concerns about this as a contribution to ODK.

First, the goal of research is to demonstrate a result and that's often hard on its own. You might not want to add the goal of contributing production ready code to both Aggregate and Collect. I don't know your timeline, but it's a lot to take on.

Second, we don't have a process for putting "third-party" research projects into ODK. It doesn't feel appropriate and we'd have to run that through the PMC. I think it'd be difficult to get approval.

I think @LN's suggestion of doing all your work in a fork seems like a great solution for both these concerns. You can work at whatever pace you'd like and take the appropriate shortcuts to get your results (e.g., maybe build a simple server instead of using Aggregate). And since it's in a fork, you can recruit people who can give informed consent. If there are positive results, then we can find a way to bring that into the core.

As far as what would be immediately useful as a contribution to the community (and would give you a good introduction to ODK), some kind of auto-complete on a user's own data (basically everything on their device) would be amazing.

arahayrabedian · July 7, 2017, 1:47pm

Thanks for your responses! I think I (unfortunately) have to agree that this may be too difficult a task to get sorted in the limited time I have. I had started to think about this as I finished setting up the dev environment.

I think on my end what I'm going to do is pull off a bespoke/custom/limited solution and report my findings here after my dissertation is complete. At least that way you can make an informed decision if you decide to pursue predictive systems.

If time permits, I'll try to get around to implementing an MFU/MRU widget on ODK Collect after my MSc's over - I think it'd be a worthwhile contribution. Let's see what happens as my next 2-4 months are still very much undefined.

Thank you all for your time, I'll be around.

Ara