Seeking your feedback: ruODK

Florian_May · September 17, 2019, 3:02am

Hi all,

ruODK is an R client for the ODK Central API. It is currently under review at rOpenSci:

rOpenSci is a non-profit initiative founded in 2011 by Karthik Ram, Scott Chamberlain, and Carl Boettiger to make scientific data retrieval reproducible. Over the past seven years we have developed an ecosystem of open source tools, we run annual unconferences, and review community developed software.

I would love to get any kind of feedback on ruODK from the ODK developers and community:
Are the demonstrated workflows correct?
Are explanations and technical terms correct?
Is the language of the documentation accessible and inclusive? (non-native speaker here)
Did I miss an obvious dad joke anywhere?

Any feedback here or at the ruODK submission would be greatly appreciated.

Update: now with hosted one-click demo https://github.com/dbca-wa/urODK

larnsce · January 14, 2020, 12:35pm

Hi Florian,

Thanks for the fantastic work on ruODK. Your workflow description for setting up ODK Central and ruODK is great. I have managed to run through the entire process in less than 4 hours.

My greatest challenges along the way:

Generating an XLSForm (non ruODK related and non-experienced form developer here). I watched this tutorial by Secure Data Kit, which I found very helpful and concise.
R package dependencies. In particular Issue 46. But, again, non ruODK related.

It would be great to add examples for the two use cases you describe, "smaller" and "larger" projects.

Best,
Lars

Florian_May · January 14, 2020, 11:06pm

Thanks for the feedback, Lars!

I'll add your suggestions to the docs.

All those pesky development dependencies should eventually find their way to CRAN. Maybe adding an install_all_dependencies.R could save some typing? One other already available approach is the binder template for ruODK, which already comes with all packages.
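A minimal sketch of what such an install_all_dependencies.R could look like. The helper name `missing_packages` and the package list are illustrative assumptions, not ruODK's actual dependency list:

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs,
                             installed = rownames(utils::installed.packages())) {
  setdiff(pkgs, installed)
}

# Illustrative list only; replace with the packages your workflow needs.
wanted <- c("remotes", "knitr", "rmarkdown")

to_install <- missing_packages(wanted)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install from CRAN:
  # install.packages(to_install)
}
```

The install call is left commented out so that the script only reports what is missing until you decide to run it for real.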

Fresh off the press as an example for a small use case: https://dbca-wa.github.io/rGeoCBI/index.html
One paper form becomes one ODK form, with the plumbing being delivered as a small R package.

A larger use case is documented at https://dbca-wa.github.io/wastd/index.html

Cheers,
Florian

wu · July 20, 2020, 4:05am

FYI the link https://dbca-wa.github.io/ruODK/ doesn't seem to work at the moment, we see a 404 error.

Florian_May · July 21, 2020, 1:50am

Hi @wu!

Welcome to the ODK forum! If you have a moment, introduce yourself here.

ruODK has just passed peer review, and is being transferred to rOpenSci at https://github.com/ropensci/ruODK. I'm sorting out a few squeaky bits, then we'll announce the release officially.
Let me know if you run into any troubles in the meantime.

Edit: transfer is done, and updated links have been merged into ODK Central.

Florian_May · August 20, 2020, 11:14am

Follow up, the GH repo migration dust has settled.
ruODK is now available at https://docs.ropensci.org/ruODK/ with an all new logo (HT @yanokwa, cheers for your advice). Special thanks to @Odil @Are_Strom @dickoah and @dmenne for excellent feedback and vigorous testing.

Thalie · August 25, 2020, 7:17am

Hello, congrats on the great work achieved here! It really makes ODK Central super attractive. I am especially interested in using the R data handling capabilities, Shiny and rmarkdown on the data collected with ODK as part of a large-scale RCT + a few mixed-research studies.

The only thing I struggled with when setting up ruODK was that I was working with ODK forms that were still drafts and had not been published yet, and hence I got empty tables when trying to read submissions. It may be worth re-emphasising this (although I see that publishing is indeed mentioned in your section describing the ODK Central setup).

Some further comments with a data management perspective:

It would be absolutely awesome to be able to read the *.csv files generated by the ODK audit log with ruODK, in addition to the server audit log (e.g. as a data manager, I would like to know whether data collectors have modified their answers many times, or when changes were made, in order to monitor my processes).
I was also wondering how the community would envision data cleaning activities within the pipeline. I currently see it as regularly reading the ODK database(s) through ruODK, taking advantage of R's large dataset handling capabilities (https://rpubs.com/msundar/large_data_analysis), so that the whole dataset is processed / cleaned again with the same code, and the SQL / SQLite tables (or *.csv / Excel files) generated as outputs are overwritten. Corrections / edits needed to resolve queries raised by data quality checks would be stored in separate SQLite tables so that they can be easily tracked.
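To make the idea concrete, here is a rough base R sketch of re-applying a separately stored corrections table on every run. The data, the column names, and the helper `apply_corrections` are all made up for illustration; in a real pipeline the raw data would come from ruODK and the tables would live in SQLite rather than in memory:

```r
# Raw submissions as they might come out of an export (made-up data).
raw <- data.frame(
  id = c("uuid:1", "uuid:2", "uuid:3"),
  age = c(34, 340, 27),
  stringsAsFactors = FALSE
)

# Corrections kept separately so each change stays traceable.
corrections <- data.frame(
  id = "uuid:2",
  field = "age",
  new_value = "34",
  reason = "typo confirmed by data collector",
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply every correction to a copy of the raw data.
apply_corrections <- function(data, corrections) {
  for (i in seq_len(nrow(corrections))) {
    row <- which(data$id == corrections$id[i])
    if (length(row) == 0) next  # correction refers to an unknown submission
    col <- corrections$field[i]
    # Coerce the stored text value to the type of the target column.
    data[row, col] <- methods::as(corrections$new_value[i], class(data[[col]])[1])
  }
  data
}

clean <- apply_corrections(raw, corrections)
```

Because `raw` is never modified, the whole cleaning step can be re-run from scratch whenever new submissions arrive.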

Florian_May · August 25, 2020, 8:52am

Hi @Thalie, thanks for the kind words!

Re form drafts: https://github.com/ropensci/ruODK/issues/91
Re audit logs: https://github.com/ropensci/ruODK/issues/41

Re data cleaning:
Server-side data editing for ODK Central (with audit logs!) is on the roadmap: What's coming in Central
This would bring QA capabilities to ODK Central.

ruODK's scope is to extract and parse the data from ODK Central. I drew the line there to keep it as small and modular as possible.
Therefore, the docs are sparse on all the wonderful data cleaning / workflow automation options in the R universe. I feel a "showcase" vignette coming up... additions and ideas welcome at https://github.com/ropensci/ruODK/issues/92

Thalie · August 25, 2020, 9:38am

Thanks a lot for all the links! Don't take my question on data management as something to be developed as part of ruODK; I fully agree with your approach of keeping developments modular.

I also fully agree that cleaning / editing directly in ODK is what would make the most sense. I still foresee that it may be interesting to run data quality checks in R (e.g., if you want customised, sophisticated processing such as approximate string matching, bootstrapping, etc.), but I understand this would rather generate data quality reports that data managers would then use to query / edit data through Enketo (which would make perfect sense).

Happy to contribute to ideas while working on my different studies - also happy to share more examples as soon as I have something consistent enough.

Florian_May · August 25, 2020, 10:57am

That's great, thanks and looking forward to your experiences with ruODK!
I'll work your suggestions into a vignette for sure.

Agree that direct editing in ODK Central could address a good few QA operations.

On the other hand, a use case for ETL downstream of the ODK ecosystem is when you need context from other points of truth, as in longitudinal records that came in through other avenues, or matching ODK Collect usernames to actual names in user profiles. Another use case could be when you need some tricked out visualisations beyond a simple list view of submissions, such as a "list of mildly related records", e.g. fuzzy matches to possible typo duplicates, as we get when over-tired field workers hand-write/type turtle flipper tag IDs.
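As a rough illustration of the fuzzy matching idea, the sketch below uses base R's `utils::adist()` (Levenshtein edit distance) to list pairs of tag IDs that are suspiciously similar. The tag IDs, the helper name `near_duplicates`, and the threshold of 2 edits are assumptions made up for this example:

```r
# Made-up tag IDs; "WA1234" and "WA1243" differ by two swapped digits.
tags <- c("WA1234", "WA1243", "WB9001", "WA1234 ", "WC5555")

# Hypothetical helper: list pairs of IDs within `max_dist` edits of each other.
near_duplicates <- function(ids, max_dist = 2) {
  ids <- unique(trimws(ids))  # drop exact duplicates and stray whitespace
  d <- utils::adist(ids)      # pairwise Levenshtein distances
  # Keep each pair once by looking only at the upper triangle.
  idx <- which(d <= max_dist & upper.tri(d), arr.ind = TRUE)
  data.frame(
    a = ids[idx[, "row"]],
    b = ids[idx[, "col"]],
    distance = d[idx],
    stringsAsFactors = FALSE
  )
}

near_duplicates(tags)
```

The resulting table is a candidate list for a human to review, not an automatic merge; a sensible threshold depends on how long and how structured the IDs are.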

Lal_S · March 13, 2021, 10:26pm

Hi Florian,
Thanks for this awesome package. It makes interfacing with ODK Central so much easier and more straightforward for R users.

I'm trying to use the pp parameter in the submission_export() function (as mentioned in the documentation here), however, once I supply the pp parameter and run the code it doesn't seem to decrypt and export all the rows of data.

Also, I'm unable to set the pp as an environment variable in the ru_setup (see code below). It doesn't seem to work:

ruODK::ru_setup(
  svc = "https://www.abc.com/v1/projects/18/forms/laLmRDT2.svc", 
  un = Sys.getenv("ODKC_UN"), 
  pw = Sys.getenv("ODKC_PW"),
  tz = "UTC",
  pp = Sys.getenv("123445656"),
  verbose = TRUE
)  

Error in ruODK::ru_setup(svc = "https://www.abc.com/v1/projects/18/forms/laLmRDT2.svc",  : 
  unused argument (pp = Sys.getenv("123445656"))

Any help would be greatly appreciated!

Sham

Florian_May · March 15, 2021, 6:21am

Hi @Lal_S !

Thanks for giving ruODK a whirl!
You found a bug indeed (tracked here) - ruODK docs are ahead of its implementation. Sorry, that one slipped past me.

You should be able to supply the passphrase directly to submission_export:

ruODK::submission_export(pp = Sys.getenv("ODKC_PP_FORM1"))

It's probably still a good idea to pull the passphrase from an env variable.

Edit: fixed and pushed a patch release. This version should work as intended:

remotes::install_github("ropensci/ruODK@main", dependencies = TRUE)

Note that I've named the default env var for the passphrase ODKC_PP.
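A small base R sketch of the env variable approach. `Sys.getenv()` expects the name of the variable, not the passphrase itself. The passphrase value here is a placeholder, and setting it via `Sys.setenv()` is only for demonstration; normally you would put an `ODKC_PP=...` line into `~/.Renviron`:

```r
# Set the passphrase for this session (placeholder value; in practice put
# ODKC_PP=... into ~/.Renviron instead of hard-coding it in a script).
Sys.setenv(ODKC_PP = "my-example-passphrase")

# Sys.getenv() takes the NAME of the variable, not the passphrase itself.
pp <- Sys.getenv("ODKC_PP")

# An unset variable comes back as an empty string, so fail early.
if (!nzchar(pp)) {
  stop("ODKC_PP is not set; no passphrase available for decryption.")
}

# With ruODK installed and configured, the value would then be passed on:
# ruODK::submission_export(pp = pp)
```

Checking for an empty string up front makes a missing or misnamed variable obvious before any export is attempted.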

Lal_S · March 15, 2021, 5:03pm

Thanks Florian!
I updated the package to version 0.9.8 and now the ruODK::ru_setup can be run with the pp parameter.

However, when I use ruODK::submission_export(pp = Sys.getenv("ODKC_PP")), it downloads the zip file but only exports the non-encrypted submissions. The encrypted submissions are missing, despite being listed in the submissions list. Code is below.

Any thoughts would be greatly appreciated.

ruODK::ru_setup(
  svc = Sys.getenv("ODKC_SVC"), 
  un = Sys.getenv("ODKC_UN"), 
  pw = Sys.getenv("ODKC_PW"),
  pp = Sys.getenv("ODKC_PP"),
  tz = Sys.getenv("RU_TIMEZONE"),
  verbose = TRUE
)


ruODK::submission_list()
ruODK::submission_export(pp = Sys.getenv("ODKC_PP"))
ruODK::submission_export(pp = "abc12345?")

Thalie · March 15, 2021, 10:19pm

@Lal_S: thanks a lot for raising this! You're absolutely right, I tested submission_export.R on a project I have just encrypted and the function does not decrypt the zip as it should.

@Florian_May: I have checked the code and I have found that I should have cast the encryption ID as a string in the body request. I have sent you a pull request on GitHub + still need to work on the rmarkdown vignette I had promised months ago.

Florian_May · March 16, 2021, 5:34am

@Lal_S I've released ruODK v0.9.9 with @Thalie's bugfix and support for skipping media or repeat data.

@Lal_S Does submission_export now decrypt the submissions as you'd expect?
Note: if you've already set up your passphrase through ru_setup(pp = ...), you can omit the pp parameter from submission_export().

@Thalie btw no rush on the vignette.

Lal_S · March 16, 2021, 9:46am

@Florian_May thanks so much for addressing the bug fix so quickly!

This might need testing, but after adding pp to ru_setup() and running ruODK::submission_export(), only the unencrypted rows are fully exported; for encrypted rows, the status column in the data frame says "not decrypted".

However, when I pass pp in ruODK::submission_export(pp = Sys.getenv("ODKC_PP")) it seems to work perfectly fine. I'm not sure if it's my set up or otherwise, but the workaround is fine for my usage.

@Thalie @Florian_May thank you both!

Florian_May · March 17, 2021, 1:48am

@Lal_S sorry I was undercaffeinated. If you re-install the latest ruODK, submission_export now uses get_default_pp() as passphrase, which you can set via ru_setup(pp=...).

The updated reference page should be online soon.

Code example 1: Set defaults, e.g. to run multiple commands against one form.

ru_setup(
  svc = "https://xxx.svc",
  un = Sys.getenv("ODKC_UN"),
  pw = Sys.getenv("ODKC_PW"),
  pp = Sys.getenv("PASSPHRASE_PROJECT1_FORM2")
)
svc <- odata_service_get()
...
x <- submission_export()

Code example 2: Supply all parameters directly, e.g. in unit tests:

x <- submission_export(
  pid = ...,
  fid = ...,
  url = ...,
  un = ...,
  pw = ...,
  pp = ...
)

Lal_S · March 18, 2021, 10:00pm

@Florian_May thank you works perfectly!