Seeking your feedback: ruODK

Hi all,

ruODK is an R client for the ODK Central API. It is currently under review at rOpenSci:

rOpenSci is a non-profit initiative founded in 2011 by Karthik Ram, Scott Chamberlain, and Carl Boettiger to make scientific data retrieval reproducible. Over the past seven years we have developed an ecosystem of open source tools, we run annual unconferences, and review community developed software.

I would love to get any kind of feedback on ruODK from the ODK developers and community:
Are the demonstrated workflows correct?
Are explanations and technical terms correct?
Is the language of the documentation accessible and inclusive? (non-native speaker here)
Did I miss an obvious dad joke anywhere?

Any feedback here or at the ruODK submission would be greatly appreciated.

Update: now with hosted one-click demo https://github.com/dbca-wa/urODK

Hi Florian,

Thanks for the fantastic work on ruODK. Your workflow description for setting up ODK Central and ruODK is great. I have managed to run through the entire process in less than 4 hours.

My greatest challenges along the way:

  1. Generating an XLSForm (not ruODK-related; I'm an inexperienced form developer). I watched this tutorial by Secure Data Kit, which I found very helpful and concise.

  2. R package dependencies, in particular Issue 46. But, again, not ruODK-related.

It would be great to add examples for the two use cases you describe, "smaller" and "larger" projects.

Best,
Lars

Thanks for the feedback, Lars!

I'll add your suggestions to the docs.

All those pesky development dependencies should eventually find their way to CRAN. Maybe adding an install_all_dependencies.R could save some typing? Another approach, already available, is the Binder template for ruODK, which comes with all packages preinstalled.

Fresh off the press as an example for a small use case: https://dbca-wa.github.io/rGeoCBI/index.html
One paper form becomes one ODK form, with the plumbing being delivered as a small R package.

A larger use case is documented at https://dbca-wa.github.io/wastd/index.html

Cheers,
Florian

FYI, the link https://dbca-wa.github.io/ruODK/ doesn't seem to work at the moment; we see a 404 error.

Hi @wu!

Welcome to the ODK forum! If you have a moment, introduce yourself here.

ruODK has just passed peer review, and is being transferred to rOpenSci at https://github.com/ropensci/ruODK. I'm sorting out a few squeaky bits, then we'll announce the release officially.
Let me know if you run into any trouble in the meantime.

Edit: transfer is done, and updated links have been merged into ODK Central.

Follow-up: the dust from the GitHub repo migration has settled.
ruODK is now available at https://docs.ropensci.org/ruODK/ with an all-new logo (HT @yanokwa, cheers for your advice). Special thanks to @Odil @Are_Strom @dickoah and @dmenne for excellent feedback and vigorous testing.

Hello, congrats on the great work achieved here! It really makes ODK Central super attractive. I am especially interested in using R's data handling capabilities, Shiny, and rmarkdown on the data collected with ODK as part of a large-scale RCT plus a few mixed-research studies.

I think the only thing I struggled with when setting up ruODK was that I was working with ODK forms that were still drafts and had not been published yet, and hence I got empty tables when trying to read submissions. It may be worth re-emphasising this (although I see that publishing is indeed mentioned in your section describing the ODK Central setup).

Some further comments with a data management perspective:

  1. It would be absolutely awesome to be able to read the *.csv files generated by the ODK audit log with ruODK, in addition to the server audit log (e.g. as a data manager, I would like to know whether data collectors have modified their answers many times, or when changes were made, in order to monitor my processes).

  2. I was also wondering how the community would envision data cleaning activities within the pipeline. I currently see it as regularly reading the ODK database(s) through ruODK, taking advantage of R's large dataset handling capabilities (https://rpubs.com/msundar/large_data_analysis), so that the whole dataset is processed / cleaned again using the same code, and the SQL / SQLite tables (or *.csv / Excel files) generated as outputs are overwritten. Corrections / edits needed to resolve queries raised by data quality checks would be stored in separate SQLite tables so that they can be easily tracked.
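
To make the idea in point 2 concrete, here is a minimal sketch of "rebuild the cleaned output on every run, keep corrections in their own table". It is written in Python with the standard-library sqlite3 module only so that it is self-contained; the same pattern should carry over to R with DBI/RSQLite. The table names, column names, and sample rows are all hypothetical and are not part of ruODK or ODK Central.

```python
import sqlite3

# Hypothetical column allow-list for the cleaned table (not an ODK schema).
ALLOWED_FIELDS = {"age", "village"}


def rebuild_clean_table(conn, raw_rows):
    """Overwrite the cleaned table from raw rows, then re-apply stored corrections."""
    cur = conn.cursor()
    # Corrections live in their own table and survive every rebuild.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS corrections ("
        "submission_id TEXT, field TEXT, new_value TEXT, reason TEXT)"
    )
    # The cleaned table is derived data: drop it and regenerate it on each run.
    cur.execute("DROP TABLE IF EXISTS submissions_clean")
    cur.execute(
        "CREATE TABLE submissions_clean ("
        "submission_id TEXT PRIMARY KEY, age TEXT, village TEXT)"
    )
    cur.executemany("INSERT INTO submissions_clean VALUES (?, ?, ?)", raw_rows)
    # Re-apply every tracked correction; unknown field names are skipped.
    corrections = cur.execute(
        "SELECT submission_id, field, new_value FROM corrections"
    ).fetchall()
    for submission_id, field, new_value in corrections:
        if field in ALLOWED_FIELDS:
            cur.execute(
                f"UPDATE submissions_clean SET {field} = ? WHERE submission_id = ?",
                (new_value, submission_id),
            )
    conn.commit()


# Usage with made-up rows standing in for submissions read from the server.
conn = sqlite3.connect(":memory:")
raw = [("uuid:1", "34", "Village A"), ("uuid:2", "340", "Village B")]
rebuild_clean_table(conn, raw)

# A data quality check flags uuid:2; the fix is recorded, not applied by hand.
conn.execute(
    "INSERT INTO corrections VALUES (?, ?, ?, ?)",
    ("uuid:2", "age", "34", "typo flagged by range check"),
)
rebuild_clean_table(conn, raw)
cleaned = conn.execute(
    "SELECT * FROM submissions_clean ORDER BY submission_id"
).fetchall()
```

Because the raw rows are never edited and the corrections table carries a reason for each change, every re-run gives the same cleaned output and the edit history stays auditable.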

Hi @Thalie, thanks for the kind words!

Re form drafts: https://github.com/ropensci/ruODK/issues/91
Re audit logs: https://github.com/ropensci/ruODK/issues/41

Re data cleaning:
Server-side data editing for ODK Central (with audit logs!) is on the roadmap: What's coming in Central over the next few years
This would bring QA capabilities to ODK Central.

ruODK's scope is to extract and parse the data from ODK Central. I drew the line there to keep it as small and modular as possible.
Therefore, the docs are sparse on all the wonderful data cleaning / workflow automation options in the R universe. I feel a "showcase" vignette coming up... additions and ideas welcome at https://github.com/ropensci/ruODK/issues/92

Thanks a lot for all the links! Please don't take my question on data management as a request for something to be developed as part of ruODK; I fully agree with your approach of keeping developments modular.

I also fully agree that cleaning / editing directly in ODK is what would make the most sense. I still foresee that it may be interesting to run data quality checks in R (e.g., if you want customised, sophisticated processing such as approximate string matching, bootstrapping, etc.), but I understand this would rather generate data quality reports that data managers would then use to query / edit data through Enketo (which would make perfect sense).

Happy to contribute to ideas while working on my different studies - also happy to share more examples as soon as I have something consistent enough.

That's great, thanks and looking forward to your experiences with ruODK!
I'll work your suggestions into a vignette for sure.

Agree that direct editing in ODK Central could address a good few QA operations.

On the other hand, one use case for ETL downstream of the ODK ecosystem is when you need context from other points of truth, such as longitudinal records that came in through other avenues, or matching ODK Collect usernames to actual names in user profiles. Another use case is when you need some tricked-out visualisations beyond a simple list view of submissions, such as a "list of mildly related records", e.g. fuzzy matches to possible typo duplicates, which we get when over-tired field workers hand-write/type turtle flipper tag IDs.
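
As a rough illustration of the "possible typo duplicates" idea, the sketch below flags pairs of tag IDs that look suspiciously similar. It uses Python's standard-library difflib purely to stay self-contained (in R, base functions such as agrep or adist cover similar ground). The tag ID format, the sample values, and the 0.8 threshold are assumptions made up for this example, not real data or a recommended setting.

```python
from difflib import SequenceMatcher
from itertools import combinations


def possible_duplicates(tag_ids, threshold=0.8):
    """Return (id_a, id_b, ratio) for distinct IDs whose similarity meets the threshold."""
    unique = sorted(set(tag_ids))  # exact repeats are not typos, so collapse them
    pairs = []
    for a, b in combinations(unique, 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs


# Hypothetical tag IDs: one transposition and one letter O typed for a zero.
tags = ["WA1234", "WA1243", "WB9001", "WA1234", "WB9O01"]
matches = possible_duplicates(tags)
for a, b, ratio in matches:
    print(f"{a} ~ {b} (similarity {ratio})")
```

Comparing every pair is quadratic in the number of unique IDs, which is fine for a review list of a few thousand tags; larger sets would need blocking (e.g. comparing only IDs that share a prefix) first.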
