Future of ODK's self-survey functionalities

vlehn · August 10, 2022, 5:17pm

Dear ODK community,

First, thanks for all your great work on ODK! I'm heavily inspired by the vibrant global community and especially by the professionalism and drive of the core development team. It's really great to see ODK grow.

I'm writing this post as I feel the need to share some thoughts that developed over the past months.

Historically, the current ODK suite mostly derives from ODK Collect as survey software. To my understanding, ODK Collect is aimed mainly at enumerator-based survey projects, where an enumerator downloads the app and interviews people (via phone, in person, etc.). I guess ODK is great for that.

In our case, we have been using ODK mostly for self-survey projects: we send out public form links to our recipients and let them answer on their own. The direct integration of Enketo into ODK Central is great for that.

What I am worried about are ODK features that rely on attaching datasets to forms. Entity-based data collection is AFAIK a hot topic and is currently being developed further. However, for self-survey purposes, I don't think these features are a viable option, because they would technically reveal the full dataset of previous entities/enumerations/other sensitive data to each respondent. I once asked for an option to do something similar in a confidential way, but I think it's not possible at the moment.

My question is:
Are there any thoughts on progressing towards self-survey features that take such issues into account and are more data-security aware?

For now, I think there should be clear hints or even warnings in the documentation that attached datasets are technically revealed to self-survey respondents and that this can be an issue for personal or sensitive data.

I don't think ODK has to be the best tool for every use case (and for enumerator-based surveys, I guess it's one of the best). However, in the given case I suggest being clear about the limitation.

Thanks and all the best,
Vitus

LN · August 17, 2022, 1:10am

That's correct! That has been the historical strength of ODK and where we continue to focus most. Our reasoning is:

There aren't many alternatives that do offline effectively and that's critical for many of the contexts our users are in.
A lot of beneficiaries of ODK-facilitated activities have low literacy and low access to technology so self-report is not accessible.
A lot of the types of workflows we focus on are inherently mediated: service delivery (e.g. vaccinate and keep a record of those vaccinations, plant trees and keep a record of those plantings), independent monitoring/evaluation (e.g. observe a service delivery and report on its process), research (e.g. capture all the plant species observed in an area).

That said, ODK is being used in a greater range of contexts, and self-report is becoming more of a reality in many contexts where ODK has been historically popular, so it's certainly something we think about.

I agree that's a concern. In most contexts, it would not be appropriate to use a dataset representing respondents for self-report. However, there may be some contexts where it would be OK, for example where all respondents are already known to each other or where baseline data is not at all sensitive. There may also be other datasets that don't represent respondent data that would be helpful to attach to self-report surveys.

We have a number of ideas and concepts but no timeline yet. One of the things we have done in the last year is become the primary maintainers of Enketo which means we are invested in web forms and their continued integration in the ODK ecosystem.

We do currently work with some projects involving longitudinal self-report. We see approaches like having respondents double-enter a unique code or sending a unique Public Link per respondent.

Depending on how much data needs to be passed on to follow-up forms, you may be able to make use of Enketo's existing query parameter defaults functionality. You would also likely want to script some of the workflow. For example, given a Public Link and a spreadsheet containing a participant id, email, first name, last name and date of birth, you could generate URLs of this structure and email them to each participant:
<base link>?st=<public access token>&d[/data/participant_id]=12345&d[/data/first_name]=Kwame&d[/data/last_name]=Gagnon&d[/data/details/dob]=1976-03-04

In this example, questions at the root of the survey with names participant_id, first_name, last_name would be populated according to their query parameter values. Question dob in the details group would also be populated. Those values could be calculates (not user visible), read-only for confirmation or user-editable so that respondents can update them.
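
As a rough sketch of that scripting step, assuming a CSV export of the spreadsheet with the columns listed above: the base link, access token, and column-to-question mapping below are placeholders for illustration, not values from a real project, and the form layout is assumed to match the example URL. Only the values are percent-encoded here, so the keys keep the `d[/data/...]` shape shown above.

```python
import csv
import io
from urllib.parse import quote

# Placeholder values (assumptions): substitute the parts of your real Public Link.
BASE_LINK = "https://central.example.org/-/single/FORM_ID"
ACCESS_TOKEN = "PUBLIC_ACCESS_TOKEN"

# Spreadsheet column -> path of the question in the form (assumed form layout).
FIELD_PATHS = {
    "participant_id": "/data/participant_id",
    "first_name": "/data/first_name",
    "last_name": "/data/last_name",
    "dob": "/data/details/dob",
}


def build_link(row, base_link=BASE_LINK, token=ACCESS_TOKEN):
    """Build one prefilled link from a dict holding one participant's data."""
    params = [f"st={quote(token, safe='')}"]
    for column, path in FIELD_PATHS.items():
        value = row.get(column, "")
        if value:  # skip empty cells so the form falls back to its own defaults
            params.append(f"d[{path}]={quote(value, safe='')}")
    return f"{base_link}?{'&'.join(params)}"


def links_from_csv(csv_file):
    """Yield (email, link) pairs, one per row of the participant CSV."""
    for row in csv.DictReader(csv_file):
        yield row["email"], build_link(row)


# Inline sample data standing in for the real spreadsheet export.
SAMPLE = """participant_id,email,first_name,last_name,dob
12345,kwame@example.org,Kwame,Gagnon,1976-03-04
"""

if __name__ == "__main__":
    for email, link in links_from_csv(io.StringIO(SAMPLE)):
        print(email, link)
```

Sending the emails is left out on purpose; the pairs could be fed to whatever mail-merge tool you already use. Keep in mind that the prefilled values are readable by anyone who has the link.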

The hardest part of including something like this in Central is the user interface and making sure it’s coherent with all of the other existing functionality. It’s something we do hope to explore.

yanokwa · August 17, 2022, 2:03am

I agree that we could do a better job communicating this risk. Is there a particular place in the documentation where you'd expect to see this?

vlehn · August 18, 2022, 11:37am

Hi Hélène and hi Yaw,

Thanks for your feedback!

I think unique public links with individual tokens could be a great solution and might integrate well into the existing ODK UI (not sure what the exact plans for longitudinal collection currently are). Though I realize that the use case should be well understood in order to come up with a solution here.

That is true. In our case, though, the amount of data to be prefilled was too often too much. But you are right, this can be an approach for some use cases.

To me, it would make sense to describe the technical implication of attaching a dataset on the Form Datasets page, potentially in an info box.

Then on the Data Collector Workflows page, the sections "Encounters with known entities" and "Multiple encounters with the same entity (longitudinal)" appear to be the most likely to involve sensitive data. Here, I could imagine having small warning boxes pointing out the risk with self-survey/Enketo projects.

What do you think?