Chrissy h Roberts - TAB Application - 2020-09-01

chrissyhroberts · September 15, 2020, 4:25pm

Name
Chrissy h Roberts (@chrissyhroberts)

Organization
London School of Hygiene & Tropical Medicine (LSHTM)

What contributions (e.g., issue triage, tech support, documentation, bug fixes) have you made to the ODK community?
Long term focus on implementation of ODK systems in clinical research, epidemiology, emergency global health response and clinical trials.

Contributions to ecosystem development through commissioned work with Enketo (security updates) & Nafundi (audit trail etc.)

How do you believe your contributions have benefited ODK?
My work has raised the profile of ODK in the academic research community, particularly in emergency medicine, clinical trials during outbreaks and global health. My team's work has used opportunities to test innovations in the ODK ecosystem in challenging real-world settings, and to refine and improve the functionality of ODK in complex contexts such as conflict zones and areas affected by high-consequence infectious disease outbreaks.

The ongoing collaboration between my team, Nafundi and Enketo has led to developments which we have detailed in a paper that we recently submitted to the academic journal BMC Medical Informatics and Decision Making (preprint: https://www.researchsquare.com/article/rs-52854/latest.pdf). This paper describes some significant updates and improvements to ODK, including audit trail, fingerprint scanning, scalability and automation.

What do you believe the top priorities for ODK are?
I suppose that the current goals are to progress with the Central roadmap.
Within that, I'd personally identify one potential priority: expanding Central to support longitudinal data collection, sync between Central and Collect, and audited editing of data on device and/or server.

How will you help the ODK community accomplish those priorities?
My position as lead of a mixed methods clinical research team provides me with opportunities to test new developments to the ecosystem in real-world settings. The varied needs of the many clinical trials, research studies, and anthropological and ethnographic studies that my team is involved in will continue to provide links to a sizeable end-user pool with highly specialised and diverse data collection and management needs.

I have professional links and ongoing collaborations with many major health NGOs including the WHO and MSF and my position as a senior academic at one of the world's highest ranking universities provides many opportunities to advocate and raise the profile of ODK among key actors in global health.

How many hours a week can you commit to participating on the TSC?
1-2

What other mobile data collection projects, social good projects, or open source projects are you involved with?
I lead the LSHTM Global Health Analytics Group, which is focussed on leveraging emerging technologies for public health benefit. All of our work is for public and social good. We are currently funded to develop electronic data collection and management systems for vaccine trials during health emergencies through a research grant from the UK Department of Health and Social Care and the National Institute for Health Research.

Please share any links to public resources (e.g., resume, blog, Github) that help support your application.

danbjoseph · September 15, 2020, 9:43pm

Hi @chrissyhroberts
Are there challenges specific to complex contexts such as conflict zones (better data security?) and infectious disease outbreaks (hardware that can be sanitized?) that the ODK community should be looking to solve?

chrissyhroberts · September 16, 2020, 9:55pm

Hi @danbjoseph,

In our experience the security of the data (both from ODK encryption and built-in Android) is great already, and hardware is cheap enough that we can usually afford the inevitable loss of a few devices to adverse events. Having only sporadic Internet connection is obviously a big problem, but the off-grid capability of ODK can keep the boat afloat until satellite links, SIM cards and eventually broadband can come into play.

There's been a bunch of stuff happening during COVID-19 that I think really highlights a big gap in what ODK's ecosystem can do.

Quite early on in the pandemic I was approached by a stakeholder which was a consortium of agencies including charities, government, health service and academics. They wanted a system for tracking displaced people through an emergency support scheme. People would be registered with the system in location 1, given a medical exam in site 2, found somewhere to stay by a wandering field operative (which would be edited if they moved again) and so on. What they really wanted was a participant information management system that was accessible from many different handsets, but essentially didn't do a lot more than create and then allow multiple users to edit a single form that had very limited data complexity (who, what, where, etc).

Going back a year to Cyclone Idai in Mozambique, we had discussions with another stakeholder about their needs for patient management in an emergency medical facility. Their needs were essentially the same as the ones described above, i.e. tracking participants through the hospital from triage, through wards (including isolation) and to discharge. As in the above example, most of the data needs were actually pretty minimal. In clinical trials we have similar needs for longitudinal data follow-ups. The multi-form approach, where you stitch it all together in R later, makes the whole thing bottleneck on the analysts and is very much "near-real-time" at best. In lots of these situations, we ended up looking towards REDCap to fill the needs, as that system has online editing capability (and imho a fundamentally unintuitive front end on the mobile app).

Contact tracing (for Ebola, multi-drug resistant TB, SARS-CoV-2 etc.) and longitudinal surveillance of exposed contacts is also such a huge thing (as we all know right now) but actually hard to do with a system that doesn't allow for sync and edit of data by multiple end-users. For this we ended up using DHIS2, which is a great platform but really hard to deploy.

Really I think that in all the places where we've had really big challenges, the problems have all been about how multiple teams working together can easily access, edit, add to and audit some fairly basic data sets. The off-grid/online issues that editing brings are less of a problem (I think), because you can probably solve this with a bit of ingenuity and a copy of Central served across a LAN to local handsets (or at worst with a lot of pay-as-you-go credit and a SIM card in a Wi-Fi hub acting as a conduit to the internet). This kind of design might be really helpful in places like displaced persons settlements, those emergency med-centres and other semi-permanent emergency response settings.

So, for me, it is all about being able to get data editing functions... ...but I would also like a version of ODK Collect that works on a smartwatch using Google Assistant to fill the form.

seewhy · September 17, 2020, 1:04pm

What I like about your response @chrissyhroberts is the clarity of exploring this from the perspective of simplicity rather than additional functionality (and I understand the two are intrinsically linked!). There are benefits of those features to so many other situations - your simple "who what where", if it could be addressed would open so many opportunities for us dealing with far less important or time critical issues.

I would agree that one of the major challenges is making things easy to deploy - in a rapidly changing and challenging situation reliance on deep technical knowledge to get up and running (or adapt on the hoof) is as much of a bottle-neck as your analysts example. But I guess a balance is needed between 'works out of the box' and flexibility.

And then you go and blow it all with smart watches

chrissyhroberts · September 17, 2020, 9:14pm

Great point about the ease of deployment @seewhy, and I agree that balancing out-of-the-box ease of use with a high level of flexibility is really key. Interestingly though, our experience is not necessarily that people have problems installing and running Central (though some noobs do find it daunting). This tends to be because local IT teams are awesome and helpful. A lot of the problems come from the need for data sharing agreements, contracts and trans-national data law, which can make things really tough to do in an emergency. Unfortunately that's some pretty over-our-heads stuff that we can't really do much about for one another and just have to suffer with alone!

The whole 'out of the box' thing extends in my thinking beyond the installation of Central and set-up of devices. What there's a lot of opportunity for in case studies like those above is generalisable out-of-the-box solutions for specific tasks. For instance, my team recently wrote a set of ODK forms and protocols for an Ebola vaccine trial that we are now remixing for COVID-19 work. The forms and protocols are shareable with others doing similar trials, and we're trying to put together a general use package that includes analysis and reporting pipelines along with data collection stuff (read all about it here).
We'd also love to do something similar with the emergency medical service patient information system described above as this has essentially unlimited re-use value in similar participant/patient management contexts. I kind of start to hope that something our community could look towards doing better is the sharing and co-development of some of these OOB solutions, along with documentation and protocols for use in the field.

The Excel spreadsheetiness of ODK's form design wins the hearts of many in terms of using something familiar to code up the survey/database, but when faced with a more complex data management problem, having some 'templates' could be an interesting starter solution for many newcomers.

mathieubossaert · September 18, 2020, 3:51pm

It seems there is a shared wish to create a more "formal" place to share different working forms with the community. It would be really motivating and helpful for users, and not only new ones.

dicksonsamwel · September 19, 2020, 9:02pm

Hi @chrissyhroberts

Thanks for the detailed application.

In your opinion, what does it take for the ODK tools to be an alternative for this system that you are building? Or is it a highly custom system?

LN · September 20, 2020, 9:47pm

Thanks for the work you do packaging ODK for others to use and for applying to further guide ODK.

What do you see as TAB members’ roles in making something like this happen? (For example, should the TAB mark it as a priority roadmap item for the core team? Assemble a volunteer working group? ...?)

chrissyhroberts · September 21, 2020, 2:24pm

So far I have always focussed on using ODK where it is capable of meeting the needs of the work. For the most part, my feeling is that there's usually a way to use ODK for most purposes; for instance, a series of cross-sectional surveys with one or more forms can substitute for a longitudinal survey form. Where this has not been the case and we've ended up using other systems (either alone or as a hybrid), the problem usually comes from situations where there's a real need for multi-way sync between stakeholders in the data (see the case studies above) or where oversight, data audit and transparency are requirements of the work. For instance, in the management of health data, clinical trials etc. there are very strict rules and laws about what a data system has to provide. A lot of that is about the need for a database that can be edited and where the edits can be audited by a third party using logs.
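
The last requirement, edits that a third party can audit from logs, can be sketched roughly as follows. This is a minimal illustration in Python, not how ODK Central or Collect implement anything; the class `AuditedRecord`, its field names and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditedRecord:
    """A record whose edits are logged rather than silently overwritten."""
    data: dict
    log: list = field(default_factory=list)

    def edit(self, user: str, field_name: str, new_value, reason: str) -> None:
        # Keep who changed what, when and why, including the old value,
        # so that a third party can reconstruct the history from the log.
        self.log.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "field": field_name,
            "old": self.data.get(field_name),
            "new": new_value,
            "reason": reason,
        })
        self.data[field_name] = new_value


# Hypothetical example: correcting a mistyped infusion volume
record = AuditedRecord({"id": "0001", "volume_ml": 100})
record.edit("nurse_a", "volume_ml", 1000, "transcription error")
```

The point of the design is simply that the log is append-only and stored alongside the data, so the current value and its full edit history always travel together.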

chrissyhroberts · September 21, 2020, 2:35pm

@Odil appears to be thinking along the same lines and has suggested a "Library of XLSForm templates", and I think that there's a lot of potential for this kind of group knowledge base becoming a part of the ODK Forum's community activities. For me, the most tantalising part of the forum is the Showcase, but this is rather underused and has only basic indexing.

I feel that the role of the TAB with respect to enabling this might be to think about how to develop the Showcase into a new entity that has some means by which forms and solutions can be shared in a highly structured way, whilst not alienating people who find things like GitHub rather daunting without the support of excellent documentation.

At the same time, I think that the TAB needs to provide guidance in the form of standard operating procedures and possibly development of a screening tool that could check that XLS files submitted to the system are (i) valid and (ii) pass some basic formatting standards (like column headers being in the right order).
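
As a rough idea of what such a screening tool could look like, here is a minimal Python sketch. It assumes the survey sheet has already been read into a list of rows (for example from a spreadsheet library), and it treats `type`, `name`, `label` in that order as the house formatting standard. That ordering rule and the function name `screen_survey_sheet` are assumptions made for illustration, not an existing ODK or XLSForm rule, and the checks are far from a full validation.

```python
# Hypothetical house standard: required survey-sheet columns, in this order
REQUIRED_ORDER = ["type", "name", "label"]


def screen_survey_sheet(rows):
    """Return a list of problems found in a survey sheet given as rows of cells."""
    if not rows:
        return ["survey sheet is empty"]
    problems = []
    # Normalise headers; a translated column such as "label::English" counts as "label"
    header = [str(cell).strip().lower().split("::")[0] for cell in rows[0]]
    missing = [col for col in REQUIRED_ORDER if col not in header]
    problems += [f"missing required column: {col}" for col in missing]
    if missing:
        return problems
    positions = [header.index(col) for col in REQUIRED_ORDER]
    if positions != sorted(positions):
        problems.append("required columns are not in the expected order")
    type_i, name_i = header.index("type"), header.index("name")
    for line_no, row in enumerate(rows[1:], start=2):
        cells = ["" if c is None else str(c).strip() for c in row]
        cells += [""] * (len(header) - len(cells))  # pad short rows
        if not any(cells):
            continue  # skip fully blank rows
        q_type, q_name = cells[type_i], cells[name_i]
        if not q_type:
            problems.append(f"row {line_no}: missing type")
        elif not q_type.startswith("end ") and not q_name:
            # "end repeat" / "end group" rows do not need a name
            problems.append(f"row {line_no}: missing name")
    return problems


good_sheet = [
    ["type", "name", "label"],
    ["text", "plant_number", "Plant number"],
    ["begin repeat", "visit", "Visit"],
    ["text", "status", "Status"],
    ["end repeat", "", ""],
]
print(screen_survey_sheet(good_sheet))  # an empty list means no problems found
```

A submission that fails the screen would get the list of problems back as feedback, which keeps the bar low for people who are not comfortable with developer tooling.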

tomsmyth · September 29, 2020, 8:27pm

Hi @chrissyhroberts! Thanks for the application!

I'm glad to see your enthusiasm for and experience with longitudinal data collection. @LN, @Xiphware, and I recently had a conversation about restarting the effort around that. We noted that adding longitudinal functionality represents a major shift in the conceptual model of ODK. This is daunting to get right—to pivot the app in this major way without it losing its identity. It is tempting to do too much and lose the simplicity that makes ODK great, but also risky to keep things too simple in a way that ends up being too restrictive.

I'm curious if you have a concrete vision for how longitudinal functionality could be done "the ODK way".

chrissyhroberts · September 30, 2020, 9:40am

Hi @tomsmyth

I think that there's a lot of ways to think about what longitudinal really means, but my feeling is that it is really important to make the distinction between longitudinal and relational data.

By definition, longitudinal data really just means making repeated observations of a single entity (a person, a tree, an infinite improbability drive). Whilst it would be tempting to pursue a relational design because we think of things in relational ways (i.e. separate forms from visit one, visit two and visit three relate to one another so there should be a link between three different forms), in reality if there's only one form then there's no need for relationships at all and you just keep adding those observations to the same single form.

I think that in its most basic implementation, a longitudinal data collection system simply allows you to open and edit or add more data to a previously submitted form. This really comes back to a thing I talked about in other posts, which is the concept of participant/entity management (another way of saying longitudinal data collection).

Example 1 : Longitudinal data collection

So let's say we have a project based in a tea plantation and we are monitoring the health of individual tea plants across time.

We visit each plant periodically and follow their health using our ODK form.

Each bush has a little tag on it which has a human readable code and a QR Code on it. This tag allows us to find and open the correct plant's record. With thousands of plants in the plantation, QR becomes very valuable indeed!

Inside the form, if we know how many visits we will be doing we can have a really simple form structure like this

Question	Answer
Plant number	001
Visit 01 status	Healthy
Visit 01 date	2020-01-08
Visit 02 status	Healthy
Visit 02 date	2020-01-17
Visit 03 status	Signs of dieback disease
Visit 03 date	2020-01-26
Visit 04
...

The use of repeat fields (indexed or not) makes it tidier and more flexible

	Question	Answer
	Plant number	001
Begin Repeat
	Status
	Date
End Repeat

So the only big difference we need is actually just the ability to look up and open/edit the form, which is already on the roadmap for Collect and Central.
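
To make the "open, add, save" idea concrete, here is a minimal Python sketch of a single record per plant with a repeat group of visits. The in-memory `records` store and the `add_visit` helper are hypothetical stand-ins for whatever Collect and Central would actually do; the sample values come from the table above.

```python
# Hypothetical in-memory store: one record per plant, keyed by the code on its tag
records = {
    "001": {
        "plant_number": "001",
        "visits": [
            {"status": "Healthy", "date": "2020-01-08"},
            {"status": "Healthy", "date": "2020-01-17"},
        ],
    },
}


def add_visit(store, plant_number, status, date):
    """Open an existing record and append one observation to its repeat group."""
    record = store[plant_number]  # raises KeyError if the tag is unknown
    record["visits"].append({"status": status, "date": date})
    return record


# Scanning the tag gives us "001"; we open that record and add the third visit
add_visit(records, "001", "Signs of dieback disease", "2020-01-26")
```

Note that nothing relational is needed here: there is one record per entity, and each visit is just another row in its repeat.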

Having a nice interface for finding one record among thousands or hundreds of thousands would make it even better

As a bare minimum for editing we'd need to be able to search for a specific text match within an instance_name, but being able to search this using either text or QR code scanning would be much better.

As a bigger bonus, being able to search for records that match text in either a pre-specified field name (i.e. the plant number ID variable in the example) or in any one or more fields of the form (i.e. a true search) would be amazing. In the example, this would allow you to search for all forms related to a specific zone of the plantation. A more complex search might find bushes of a specific variety in a specific zone. Even more complex would be the ability to interrogate repeats and find bushes which had ever had signs of dieback disease.
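
The three levels of search described above can be sketched in a few lines of Python. The record layout, the field names `zone` and `variety`, and the sample values are hypothetical illustrations, not an existing ODK feature or schema.

```python
# Hypothetical sample records; field names and values are made up for illustration
plants = [
    {"instance_name": "plant-001", "zone": "North", "variety": "Assam",
     "visits": [{"status": "Healthy"}, {"status": "Signs of dieback disease"}]},
    {"instance_name": "plant-002", "zone": "North", "variety": "Darjeeling",
     "visits": [{"status": "Healthy"}]},
    {"instance_name": "plant-003", "zone": "South", "variety": "Assam",
     "visits": [{"status": "Healthy"}]},
]


def by_instance_name(records, text):
    """Bare minimum: text match within the instance name (or a scanned QR value)."""
    return [r for r in records if text.lower() in r["instance_name"].lower()]


def by_fields(records, **criteria):
    """Bonus: exact match on one or more named fields, e.g. zone and variety."""
    return [r for r in records
            if all(r.get(key) == value for key, value in criteria.items())]


def ever_had_status(records, text):
    """Most complex: look inside the repeat for any visit whose status matches."""
    return [r for r in records
            if any(text.lower() in v["status"].lower() for v in r["visits"])]


in_zone = by_fields(plants, zone="North", variety="Assam")
diseased = ever_had_status(plants, "dieback")
```

Each level only filters a flat list of records, which is why this still feels like searching a pile of paper forms rather than querying a relational database.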

So far, I think that adding some level of search, open, edit capacity is totally the 'ODK way', because we are talking about something a bit like a digital piece of paper that we open, add data to and save/submit without any need for relational structures.

Example 2 : Much More flexible Longitudinal data collection

Things get a bit more complicated when we start thinking about who needs to see, or should be allowed to see, certain data. Again, there's a temptation to think about this at the level of whole forms and to default to thinking about relational forms.

Let's change the example to a medical clinic, where the entities are now humans and we're keeping track of infusions of a medicine across a few days in a clinical trial. Our form for this looks a bit like this

	Question	Answer
	ID	0001
	Name	Jeff
	Address	1 Infinite Loop, Ca.
	DOB	1978-03-02
	Trial Arm	Placebo
Begin repeat
	Drug given	Wuwuwumab
	Volume	1000 CC
	Date	2020-09-30
End Repeat

Now on this form there are some sensitive fields (name, date of birth, address, trial arm) which only the chief investigator should be able to see. The clinicians delivering the infusion just need to be able to scan the QR code on the patient's wristband (i.e. ID 0001) to open their record and add to the data. For confidentiality we would want the clinician at the bedside to see this:

	Question	Answer
	ID	0001
	Name	hidden
	Address	hidden
	DOB	hidden
	Trial Arm	hidden
Begin repeat
	Vial number	12342398
	Volume	1000 CC
	Date	2020-09-30
End Repeat

It strikes me that ODK Central already has some controls for restricting access at the level of the whole form. The kind of control I am describing here would require control over which users could access data at the level of individual fields, and this would presumably have to be part of the XLSForm definition, e.g. by adding a new column in which you list specific user groups.
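
A minimal Python sketch of that idea follows. The `visible_to` attribute stands in for the proposed new XLSForm column and is purely hypothetical, as are the group names `investigator` and `clinician`; nothing like this exists in XLSForm or Central as far as this post is concerned.

```python
# Hypothetical form definition: each field lists the user groups allowed to see it.
# "visible_to" plays the role of the proposed extra XLSForm column.
FORM_FIELDS = [
    {"name": "id", "visible_to": {"investigator", "clinician"}},
    {"name": "name", "visible_to": {"investigator"}},
    {"name": "address", "visible_to": {"investigator"}},
    {"name": "dob", "visible_to": {"investigator"}},
    {"name": "trial_arm", "visible_to": {"investigator"}},
    {"name": "infusions", "visible_to": {"investigator", "clinician"}},
]

patient = {
    "id": "0001",
    "name": "Jeff",
    "address": "1 Infinite Loop, Ca.",
    "dob": "1978-03-02",
    "trial_arm": "Placebo",
    "infusions": [{"volume": "1000 CC", "date": "2020-09-30"}],
}


def view_for(record, group):
    """Return the record as one user group should see it, masking restricted fields."""
    return {
        f["name"]: record[f["name"]] if group in f["visible_to"] else "hidden"
        for f in FORM_FIELDS
    }


bedside_view = view_for(patient, "clinician")
```

Here `bedside_view` keeps the ID and the infusion repeat but shows `hidden` for the personal details and the trial arm, while `view_for(patient, "investigator")` returns everything. In a real system the masking would need to happen on the server so that restricted values never reach the device at all.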

I can't think of any study I've ever done where there wasn't some degree to which we would need this kind of access control. In any study with humans, we prefer to keep the personal details for top level eyes only and to use pseudonymous ID codes for study staff.

Thinking about how many contexts my teams and colleagues would use this type of granular access control in, this would be a huge priority to me (i.e. it would be priority #2 after form editing) and this still feels pretty much the ODK-way.

I'd love to discuss this further