ruODK is an R client for the ODK Central API. It is currently under review at rOpenSci:
rOpenSci is a non-profit initiative founded in 2011 by Karthik Ram, Scott Chamberlain, and Carl Boettiger to make scientific data retrieval reproducible. Over the past seven years we have developed an ecosystem of open source tools, we run annual unconferences, and review community developed software.
I would love to get any kind of feedback on ruODK from the ODK developers and community:
Are the demonstrated workflows correct?
Are the explanations and technical terms correct?
Is the language of the documentation accessible and inclusive? (non-native speaker here)
Did I miss an obvious dad joke anywhere?
Any feedback here or at the ruODK submission would be greatly appreciated.
Thanks for the fantastic work on ruODK. Your workflow description for setting up ODK Central and ruODK is great. I managed to run through the entire process in less than four hours.
My greatest challenges along the way:
Generating an XLSForm (not ruODK-related; inexperienced form developer here). I watched this tutorial by Secure Data Kit, which I found very helpful and concise.
R package dependencies, in particular Issue 46. But again, not ruODK-related.
It would be great to add examples for the two use cases you describe, "smaller" and "larger" projects.
All those pesky development dependencies should eventually find their way to CRAN. Maybe adding an install_all_dependencies.R script could save some typing? Another approach that is already available is the Binder template for ruODK, which comes with all packages preinstalled.
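A minimal sketch of what such an install_all_dependencies.R could look like. The script name comes from the suggestion above, but its contents are hypothetical: the package vector is a placeholder, not ruODK's actual dependency list, and only CRAN packages are handled.

```r
# install_all_dependencies.R: hypothetical helper script, not part of ruODK.
# Replace `pkgs` with the packages your own workflow needs (placeholders).
pkgs <- c("dplyr", "tidyr", "readr")

# Return the subset of `pkgs` that cannot be loaded in this R session.
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Install only what is missing, so re-running the script is cheap.
install_missing <- function(pkgs, ...) {
  todo <- missing_packages(pkgs)
  if (length(todo) > 0) {
    utils::install.packages(todo, ...)
  }
  invisible(todo)
}

# Uncomment to actually install:
# install_missing(pkgs)
```

Checking with requireNamespace() before installing means a second run on an already set-up machine does not re-download anything.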
Fresh off the press, an example of a small use case: https://dbca-wa.github.io/rGeoCBI/index.html
One paper form becomes one ODK form, with the plumbing being delivered as a small R package.
ruODK has just passed peer review, and is being transferred to rOpenSci at https://github.com/ropensci/ruODK. I'm sorting out a few squeaky bits, then we'll announce the release officially.
Let me know if you run into any troubles in the meantime.
Edit: transfer is done, and updated links have been merged into ODK Central.
Hello, and congratulations on the great work here! It really makes ODK Central super attractive. I am especially interested in using R's data handling capabilities, Shiny and rmarkdown on the data collected with ODK as part of a large-scale RCT and a few mixed-methods studies.
I think the only thing I struggled with when setting up ruODK was that I was working with ODK forms that were still drafts and had not been published yet, so I got empty tables when trying to read submissions. It may be worth re-emphasising this (although I see that publishing is indeed mentioned in your section describing the ODK Central setup).
Some further comments from a data management perspective:
It would be absolutely awesome to be able to read the *.csv files generated by the ODK form audit log with ruODK, in addition to the server audit log (e.g. as a data manager, I would like to know whether data collectors have modified their answers many times, or when changes were made, in order to monitor my processes).
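Until something like this exists, a rough base R sketch of counting how often each question's answer was revised. It assumes an audit.csv written by ODK Collect with change tracking enabled (track-changes=true on the audit question), which adds old-value and new-value columns to the usual event, node, start and end columns. How you fetch the file from ODK Central is outside this sketch, and count_answer_revisions() and read_audit() are hypothetical helpers, not ruODK functions.

```r
# Hypothetical helpers, not part of ruODK. Assumed audit.csv layout:
# event, node, start, end, old-value, new-value (track-changes enabled).

# Read every column as text so blank old-values stay "" rather than NA,
# and keep the hyphenated column names unchanged.
read_audit <- function(path) {
  utils::read.csv(path, colClasses = "character", check.names = FALSE)
}

# Count, per question node, how often a non-empty answer was overwritten.
count_answer_revisions <- function(audit) {
  old <- audit[["old-value"]]
  new <- audit[["new-value"]]
  revised <- audit$event == "question" &
    !is.na(old) & old != "" & old != new
  counts <- table(audit$node[revised])
  data.frame(
    node = as.character(names(counts)),
    revisions = as.integer(counts),
    stringsAsFactors = FALSE
  )
}
```

A first answer (empty old-value) is deliberately not counted, so the result reflects answers that were changed after being entered.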
I was also wondering how the community would envision data cleaning within the pipeline. I currently see it as regularly reading the ODK database(s) through ruODK, taking advantage of R's large dataset handling capabilities (https://rpubs.com/msundar/large_data_analysis), so that the whole dataset is processed and cleaned again with the same code each time, and the SQL/SQLite tables (or *.csv/Excel files) generated as outputs are overwritten. The corrections/edits needed to resolve queries raised by data quality checks would be stored in separate SQLite tables so that they can be easily tracked.
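The "separate corrections table" idea can be sketched in base R: the raw download is never edited, and a log of corrections (which could live in an SQLite table) is re-applied after every fresh read, so each re-run reproduces the same cleaned data. apply_corrections() and the record_id/field/new_value column names are hypothetical, and the default id column name is an assumption that depends on how you read the submissions.

```r
# Hypothetical helper: re-apply a log of manual corrections to freshly
# downloaded submissions. `corrections` has one row per edit with the
# (assumed) columns record_id, field and new_value, all stored as text.
apply_corrections <- function(data, corrections, id_col = "id") {
  for (i in seq_len(nrow(corrections))) {
    row <- match(corrections$record_id[i], data[[id_col]])
    field <- corrections$field[i]
    if (is.na(row) || !(field %in% names(data))) {
      warning(sprintf("Skipping correction %d: unknown record or field", i))
      next
    }
    value <- corrections$new_value[i]
    # Convert the text value to the column's type unless the column is text.
    if (!is.character(data[[field]])) {
      value <- utils::type.convert(value, as.is = TRUE)
    }
    data[[field]][row] <- value
  }
  data
}
```

Because the function returns a modified copy, the raw table stays untouched and the corrections log remains the single, auditable record of every manual edit.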
Re data cleaning:
Server-side data editing for ODK Central (with audit logs!) is on the roadmap: What's coming in Central
This would bring QA capabilities to ODK Central.
ruODK's scope is to extract and parse the data from ODK Central. I drew the line there to keep it as small and modular as possible.
Therefore, the docs are sparse on all the wonderful data cleaning and workflow automation options in the R universe. I feel a "showcase" vignette coming up... additions and ideas are welcome at https://github.com/ropensci/ruODK/issues/92
Thanks a lot for all the links! Please don't take my question on data management as a request for something to be developed as part of ruODK; I fully agree with your approach of keeping development modular.
I also fully agree that cleaning/editing directly in ODK is what would make the most sense. I still think it may be worth running data quality checks in R (e.g., if you want customised, sophisticated processing such as approximate string matching or bootstrapping), but I understand this would rather generate data quality reports that data managers would then use to query/edit data through Enketo (which would make perfect sense).
Happy to contribute ideas while working on my different studies, and also happy to share more examples as soon as I have something consistent enough.
That's great, thanks, and I'm looking forward to hearing about your experiences with ruODK!
I'll work your suggestions into a vignette for sure.
Agree that direct editing in ODK Central could address a good few QA operations.
On the other hand, one use case for ETL downstream of the ODK ecosystem is when you need context from other sources of truth, such as longitudinal records that came in through other channels, or matching ODK Collect usernames to actual names in user profiles. Another use case is when you need tricked-out visualisations beyond a simple list view of submissions, such as a "list of mildly related records": for example, fuzzy matches to likely typo duplicates, as we get when over-tired field workers hand-write or type turtle flipper tag IDs.
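The "likely typo duplicates" case can be approximated with plain Levenshtein distance from base R's utils::adist(). This is a generic, swapped-in technique, not something ruODK provides; find_possible_duplicates() and the tag IDs in the example call are made up for illustration.

```r
# Hypothetical helper: list pairs of distinct IDs that are within `max_dist`
# single-character edits (insertions, deletions, substitutions) of each other.
find_possible_duplicates <- function(ids, max_dist = 1) {
  ids <- unique(ids[!is.na(ids)])
  d <- utils::adist(ids)  # pairwise Levenshtein distance matrix
  # upper.tri() keeps each pair once and drops the zero diagonal.
  hits <- which(upper.tri(d) & d <= max_dist, arr.ind = TRUE)
  data.frame(
    id_a = ids[hits[, "row"]],
    id_b = ids[hits[, "col"]],
    distance = d[hits],
    stringsAsFactors = FALSE
  )
}

# One row: WA1234 / WA1284, distance 1
find_possible_duplicates(c("WA1234", "WA1284", "QA9999", "WA1234"))
```

Note that plain Levenshtein distance counts a swapped pair of digits (e.g. 1234 vs 1243) as two edits, so transposition typos need max_dist = 2 or a distance that treats transpositions as a single edit.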
Hi Florian,
Thanks for this awesome package; it makes interfacing with ODK Central so much easier and more straightforward for R users.
I'm trying to use the pp parameter of the submission_export() function (as mentioned in the documentation here). However, once I supply the pp parameter and run the code, it doesn't seem to decrypt and export all the rows of data.
Also, I'm unable to set pp from an environment variable in ru_setup() (see code below). It doesn't seem to work:
Thanks Florian!
I updated the package to version 0.9.8, and ruODK::ru_setup() can now be run with the pp parameter.
However, when I use ruODK::submission_export(pp = Sys.getenv("ODKC_PP")), it downloads the zip file but only exports the unencrypted submissions. The encrypted submissions are missing, despite being listed in the submissions list. Code is below.
@Lal_S: thanks a lot for raising this! You're absolutely right: I tested submission_export.R on a project I had just encrypted, and the function does not decrypt the zip as it should.
@Florian_May: I checked the code and found that I should have cast the encryption ID as a string in the request body. I have sent you a pull request on GitHub, and I still need to work on the rmarkdown vignette I promised months ago.
@Lal_S I've released ruODK v0.9.9 with @Thalie's bugfix and support for skipping media or repeat data.
@Lal_S Does submission_export now decrypt the submissions as you'd expect?
Note: if you've already set up your passphrase through ru_setup(pp = ...), you can omit the pp parameter from submission_export().
@Florian_May thanks so much for fixing the bug so quickly!
This might need testing, but after adding pp to ru_setup() and running ruODK::submission_export(), the export only fully includes the unencrypted rows; for the encrypted rows, the status column in the data frame says "not decrypted".
However, when I pass pp directly in ruODK::submission_export(pp = Sys.getenv("ODKC_PP")), it seems to work perfectly fine. I'm not sure whether it's my setup or something else, but the workaround is fine for my use.
@Lal_S sorry, I was undercaffeinated. If you reinstall the latest ruODK, submission_export() now uses get_default_pp() as the passphrase, which you can set via ru_setup(pp = ...).
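For reference, a minimal setup sketch of the workflow discussed in this thread: set the passphrase once in ru_setup(), then call submission_export() without pp so it falls back to get_default_pp(). It assumes a ruODK version that includes the fix above, and that the form's OData service URL, username, password and passphrase are stored in the environment variables ODKC_SVC, ODKC_UN, ODKC_PW and ODKC_PP (ODKC_PP is the name used in this thread; the other names are assumptions, e.g. set in ~/.Renviron). It needs a reachable ODK Central server, so treat it as a configuration sketch rather than a self-contained example.

```r
library(ruODK)

# Read connection details and the encryption passphrase from environment
# variables (assumed names) so they never appear in the script itself.
ru_setup(
  svc = Sys.getenv("ODKC_SVC"),  # OData service URL of the form
  un = Sys.getenv("ODKC_UN"),
  pw = Sys.getenv("ODKC_PW"),
  pp = Sys.getenv("ODKC_PP")
)

# Sanity check: the passphrase is now available as the package default.
stopifnot(nzchar(get_default_pp()))

# With the fix, pp can be omitted and get_default_pp() is used instead.
submission_export()

# Workaround from this thread for older versions: pass pp explicitly.
# submission_export(pp = Sys.getenv("ODKC_PP"))
```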