Hélène Langet - TAB Application - 2022-04

Hélène Langet (@thalie)

Swiss Tropical & Public Health Institute

What contributions have you made to ODK?

My contributions have mainly consisted of providing feedback to the ODK team from a user's point of view (e.g., feedback on application use cases and beta features such as the built-in audio recording in ODK Collect or data management in ODK Central, and reporting and documenting bugs, especially when handling encrypted data). More marginally, I have made direct technical contributions to the ODK ecosystem (export of encrypted data in ruODK, supervision of an MSc student developing a generic R package to help users generate automated reports).

These contributions were based on my experience as data lead in the multi-country research evaluation conducted for the “Tools for Integrated Management of Childhood Illness” (TIMCI) project, in which all research data are collected with Enketo/ODK Collect, stored in ODK Central, and processed through ruODK and R Markdown to generate analyses and operational or quality reports.

I also dedicated a significant amount of my time in the project to coaching collaborators with various backgrounds on the use of ODK Collect/ODK Central and, for those collaborators with the most advanced IT skills, also on ruODK and the ODK apiary (Python/R). Lastly, I have advised collaborators involved in other health data projects on whether ODK is the best data collection tool for their needs.

How do you believe your contributions have benefited ODK?

I think the TIMCI project is a very good showcase of the potential of ODK for complex data collection in challenging environments. To my knowledge, no other tool offers the same range of services in such settings, and existing solutions dedicated to investigational product clinical trials would have lacked the flexibility needed to collect all the data required to meet the project objectives.

Using the ODK ecosystem enables remarkable consistency across all research processes, triangulation of data between studies, and the management of fully reproducible research pipelines. Whenever possible, I have shared with my collaborators and the ODK community the know-how that can be transferred from my experience in this project, especially concrete starting points for implementing its underlying concepts.

I also encouraged ODK users to use more advanced or new features, as many of them remain unknown to those who do not regularly read the forum and the technical documentation. I believe that this exchange has benefited the ODK community and can be further cascaded to new users. In addition, the wide range of data needs encountered in TIMCI (which spans from pragmatic clinical trials with scheduled follow-ups to qualitative interviews, through clinical observation, time-flow and health cost surveys) has allowed me to report practical challenges encountered by ODK users to the development team while, I hope, avoiding the pitfall of being locked into an overly specific use case.

What do you believe the top priorities for ODK should be?

First, it seems essential to me that the ODK solution continues to be as robust, as generic and as modular as possible while addressing the needs of a wide range of users. For me, these are key strengths of ODK and among the main reasons for its success. Adding longitudinal features will definitely open new opportunities for all users. I also really liked the idea (suggested by Florian, if I am not mistaken) of a parent (meta-)form and child forms, which could pave the way for an even more modular approach to designing forms.

Second, it seems equally important to me to continue the excellent work initiated by the team to strengthen the consistency (e.g., between Enketo and ODK Collect) and attractiveness of the ODK ecosystem. In this sense, ODK users would certainly benefit from further development of generic data monitoring/quality services that help them quickly identify major flaws in the data collection. Alternatively, they would benefit from better interconnectivity with data analysis/visualization tools or pipelines that can provide these services without requiring advanced data skills (e.g., exploiting what is available in the metadata and audit log, which are currently underexploited by most users).

Lastly, and probably more specific to data collection involving human subjects, offering more granularity for data access and protection would make it easier to work with these data without compromising on data protection. Usually only a subset of a full dataset requires the highest level of protection, e.g., personally identifiable information or sensitive information such as HIV status, ethnicity, etc.

How will you help ODK accomplish those priorities?

I have a strong innovation background (10 years of research with GE Healthcare and Philips) with core technical expertise in data analysis/visualization/interpretation, and a keen interest in population health, preferably in very remote settings, so I would enjoy any strategic/roadmap discussions for developing ODK further.

I can provide a systems-thinking approach, since I have hands-on experience with the whole data lifecycle in the TIMCI project, from design/collection to analysis for generating public health evidence (the quality of any data analysis depends on the quality of all the upstream components). Obviously this is only one experience among others and must be weighed accordingly.

I also have good experience with data regulations and with the publication of scientific evidence at the technology/application interface to support and guide the development of new technologies. I am also very interested in contributing more actively to knowledge transfer and capacity building, to increase the pool of ODK users and empower existing ODK users.

How many hours a week can you commit to participating on the TAB?

What other data collection projects, social impact projects, or open source projects are you involved with?

Please share any links to public resources (e.g., resume, blog, Github) that help support your application.

  • https://github.com/SwissTPH/timci - The GitHub project that contains the code of the R package used by each research partner for all the data management activities (including de-identification, generation of follow-up logs and automated form publication, and the generation of operational and data summary reports). Notice: it is very project-specific and cannot be reused as such outside of the TIMCI project. It is also of uneven quality, as I am not supposed to be coding at all (...), but some of the ideas and know-how developed here (e.g., on how to parametrize and modularize R Markdown documents, build fancy LaTeX tables, etc.) can definitely be reused in any new project.
  • https://github.com/SwissTPH/repvisforODK - The GitHub project that contains the code of the R package developed by Lucas Silbernagel during his internship to generate a generic HTML data summary from any ODK Central project (R Markdown, plotly).

Do you have a method or approach for evaluating a person's background and deciding where to start with them on their ODK journey?

Hi @Thalie!
I had a couple of follow-ups for you:

From your perspective, what tools are priorities for interactivity/pipelines?

What features do you see people not knowing about/using?

I'm often curious how people in clinical trials view ODK as a tool for data collection, and whether its features are robust enough for the required levels of data protection, data auditing, etc. From your perspective, is it the right tool? Anything missing that would make it an obvious choice? I've been part of some trials locally, and I questioned the poor nurses who showed up at my house to do data collection all about REDCap (what they were using). My brother also does clinical trials and I think he uses REDCap. Just curious if you have thoughts on ODK vs. REDCap or others.

Thanks! Great application!

Hi @danbjoseph, nothing very formal at this stage, but I would indeed love to have something more structured, with well-identified criteria, associated resources and development pathways, so as to accompany users along their ODK journey. I realize that the starting point may matter less than the speed at which you can go through the initial stages and how far you can accompany each user: with some you can move quickly, while with others you will need to spend more time consolidating the initial stage(s) before moving on to more advanced usage.

A few initial thoughts, which are highly biased by my own context, so no claim to universality here.
In general, I would definitely fast-track anybody who has a proactive/problem-solving approach and some prior programming experience. Here it will also depend on the programming maturity of the person, i.e., what programming languages they are most proficient with and their level of coding (imperative vs. functional vs. object-oriented; in the latter case the person probably does not need much guidance to find their way through ODK). But obviously it is a continuum, and you can have users who handle some fairly complex calculations with Excel and may be faster than somebody with very basic R knowledge.

  • Basic Excel user / no previous exposure to logical relationships: start with the design of basic ODK forms (possibly even starting with the drag-and-drop user interface), drafting and publishing forms on ODK Central, data collection in ODK Collect or Enketo, and downloading data from ODK Central

  • Good understanding of logical relationships / basic calculation skills with Excel: can be exposed to most ODK features (targeted to fit their specific needs, e.g., repeats, choice filters, etc.), and to the use of the OData API with Excel / Tableau / Power BI

  • Users with some programming experience / good command of all individual ODK functionalities: design of advanced ODK forms combining several features to create customised functionalities, use of ruODK to retrieve/process data

  • Users with extensive programming experience: XForms, use of the ODK apiary, etc.

More basic users will understand the structure of a form they did not design but will generally need more than high-level instructions to be able to modify its logic. Independent users will be able to implement the change themselves, and advanced users will discuss the merits of a given design/approach and suggest improvements.


Hi @janna,

I am sure you would also have more thoughts on Dan's question, as you have done a lot of training. I forwarded the link to your YouTube channel to some of my colleagues, as it is a great resource for ODK users.

Mostly the latest developments/features that came with the move from ODK Aggregate to Central or with the new releases of ODK Collect, but also features that users have not been directly exposed to, depending on their own personal experience (which sometimes seems to date back several years...). I also went to @aurdipas several times starting with "this is probably not feasible with ODK...", when in fact there was almost always something feasible. In general, the potential of ODK is largely undervalued. I believe the work that has been initiated on different channels to publicise these new developments from a user's perspective, including the resources you shared and posts on social media, is essential and should be further strengthened. I know the main features of ODK Central are described on this page, but I would find it helpful for users to be able to refer to a summary table that groups technical features within the functional/thematic categories users may be interested in, possibly also allowing a direct comparison with what was previously (not) available in Aggregate for former Aggregate users. Easier said than done obviously :slight_smile:


That's a good question. My current experience with ODK comes from a very specific context, so it is not representative of all clinical trials: a multi-country trial of a healthcare system intervention (aka a pragmatic trial), with data being collected mostly at the primary care level and offline, as the network may not be available, especially in very rural areas. We also wanted to record an unknown number of unscheduled visits, which does not really fit within standard trial visit schedules.

I was actually initially advised to use REDCap and started doing some tests with the REDCap Mobile App (since we needed a data collection tool that could work offline). That was two years ago, so the app must have evolved considerably since then, and again my tests were very context-specific, but I was not really convinced... so I went to Aurelio (I think my initial question was something like "ODK is probably not the right tool for clinical trials, but I would like to have the confirmation that it is indeed the case and I do not have a more serious alternative to the REDCap Mobile App?"), who then introduced me to the new ODK Central (v0.9). I still ran a comprehensive assessment/testing of both the REDCap Mobile App and ODK Collect/Central, developed a detailed comparison table to make my decision more objective, and reviewed it with our clinical/statistical colleagues. We agreed that in our context ODK was definitely a much better fit, even in terms of data quality, since the REDCap Mobile App was not as user-friendly as ODK, which would have led to more data entry errors in the field. While I am convinced that ODK was the right tool for this trial, I would not go as far as to say that it is the right tool for all clinical trials. I could definitely have chosen another tool such as REDCap (especially the online version of REDCap) in a completely different context, especially one involving an investigational product for an application to a regulatory body (since in this case you would need a validated system, which ODK is not). What type of context is your brother working in? It would be interesting to compare.

I think at this stage the only "annoying" thing is that we have to work with encrypted data for our main trial and cannot benefit from all the great edit functionalities now offered by ODK Central, although they would be super useful to correct data entry errors reported by research assistants (or sometimes to record a change that happened shortly after the form was finalised). At the moment I maintain an external system of changes in R and use the API and ruODK to interact with ODK Central. Such a system still has the advantage of allowing more context-specific checks and corrections, something which will never be available within ODK, e.g., identifying patient duplicates based on a selected subset of variables.
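To make the idea of an external system of changes more concrete, here is a minimal sketch in Python. It is an illustration only and not the TIMCI code (which is written in R): the record layout, the field names (`instance_id`, `first_name`, `birth_year`, `facility`) and both helper functions are hypothetical, and the records are assumed to have already been exported and decrypted.

```python
from collections import defaultdict


def apply_corrections(records, corrections):
    """Return corrected copies of the records; the originals are left untouched.

    `corrections` is a list of (instance_id, field, new_value) tuples,
    i.e. a hypothetical change log kept outside ODK Central.
    """
    changes = defaultdict(dict)
    for instance_id, field, new_value in corrections:
        changes[instance_id][field] = new_value
    return [{**rec, **changes.get(rec["instance_id"], {})} for rec in records]


def find_duplicates(records, key_fields):
    """Group submissions that share the same normalised values for `key_fields`."""
    groups = defaultdict(list)
    for rec in records:
        # Normalise case and whitespace so that "Amina" and "amina " match.
        key = tuple(str(rec.get(f, "")).strip().lower() for f in key_fields)
        groups[key].append(rec["instance_id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Made-up example data, for illustration only.
records = [
    {"instance_id": "uuid:a", "first_name": "Amina", "birth_year": 2019, "facility": "F01"},
    {"instance_id": "uuid:b", "first_name": "amina ", "birth_year": 2019, "facility": "F01"},
    {"instance_id": "uuid:c", "first_name": "Joseph", "birth_year": 2020, "facility": "F02"},
]
corrections = [("uuid:c", "birth_year", 2021)]

cleaned = apply_corrections(records, corrections)
duplicates = find_duplicates(cleaned, ["first_name", "birth_year", "facility"])
print(duplicates)
```

Keeping the corrections as a separate log, rather than editing the exported data in place, means the raw submissions stay untouched and every change remains traceable.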

I also do not believe that it should be the goal of ODK to become a perfect tool for clinical trials, as it would then lose the flexibility that is its main appeal. But I think that ODK can definitely be an ideal fit, and a perfect trade-off between flexibility and quality, for any trial conducted outside of controlled settings or network service areas, in which the data collection is at the interface between real-world evidence data and a very controlled data collection, or at the interface with epidemiology. I am especially a big fan of the audit trail, which I have not yet had time to mine as much as I would like to.
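As a small illustration of what mining the audit trail could look like, here is a hedged Python sketch that sums the time spent on each question. It assumes the audit log is a CSV with the columns `event`, `node`, `start` and `end`, with timestamps in milliseconds; the sample rows and the `time_per_question` helper are made up for the example.

```python
import csv
import io
from collections import defaultdict


def time_per_question(audit_csv_text):
    """Sum the time (in milliseconds) spent on each question node.

    Assumes the columns event, node, start, end; rows that are not
    "question" events or that have no end timestamp are skipped.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(audit_csv_text)):
        if row["event"] != "question" or not row["end"]:
            continue
        totals[row["node"]] += int(row["end"]) - int(row["start"])
    return dict(totals)


# Made-up audit log, for illustration only.
sample_audit = """event,node,start,end
form start,,1650000000000,
question,/data/age,1650000001000,1650000005000
question,/data/weight,1650000005000,1650000012000
question,/data/age,1650000012000,1650000013000
form exit,,1650000013000,
"""

timings = time_per_question(sample_audit)
print(timings)
```

Aggregating such timings across submissions could, for instance, flag questions that are systematically slow to answer or forms that were filled in implausibly fast.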