Run an external script via ODK briefcase

chrissyhroberts · August 1, 2018, 8:50am

Many ODK users are not programmers but need to run downstream analysis on exported CSV files via scripts. It might be nice to include a box in Briefcase that points at the location of a script (R, Python, Bash, etc.) so that downstream analysis can be run at the push of a button.

The benefit of this is avoiding the need for non-programmers to initiate a script via command line, which many people find terrifying.

This could be a great way to automate processes such as turning each line of a CSV into a PDF report, and a simple way to allow the user community to flexibly integrate other systems like R with ODK Briefcase.

yanokwa · August 1, 2018, 7:46pm

Agreed that a post-script would be an interesting addition. Is this a feature someone from LSHTM would be interested in contributing?

chrissyhroberts · August 1, 2018, 9:44pm

we can certainly provide scripts for various simple but useful tasks like making reports in html, pdf or docx format. Currently we have this sort of thing working via R and Pandoc but we could substitute python for R to some degree and add a lot of post-export

I was thinking of this as a generic 'plugin' that users who are not familiar with java could develop for and so contribute to additional functionality

We could have a box in which you add the names of whatever scripts you want to run (i.e. you could add multiple scripts in the order you want them to run). Briefcase would need to be smart enough to recognise key file types (.r, .py, .sh, etc.) and run them correctly cross-platform. Then when you click the go button it runs these on your CSVs/instances in order and dumps the outputs into folders within the ODK Briefcase storage folder.
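To make the "run these in order and dump outputs into folders" idea concrete, here is a minimal sketch in Python. Nothing here is Briefcase code: the function name `run_in_order`, the `log.txt` file and the convention of passing the export folder and an output folder as the last two arguments are all assumptions made for illustration. It also assumes the last item of each command is the script path.

```python
import subprocess
from pathlib import Path


def run_in_order(commands, export_dir, output_root):
    """Run each command in sequence and stop at the first failure.

    commands: list of argument lists, e.g. [["Rscript", "report.R"], ...].
    Each command gets two extra trailing arguments (an assumed convention):
    the export folder and a per-step output folder.
    Returns the output folders of the steps that succeeded.
    """
    finished = []
    for step, command in enumerate(commands, start=1):
        # One folder per step, named after the script, e.g. out/01_report
        out_dir = Path(output_root) / f"{step:02d}_{Path(command[-1]).stem}"
        out_dir.mkdir(parents=True, exist_ok=True)
        result = subprocess.run(
            [*command, str(export_dir), str(out_dir)],
            capture_output=True,
            text=True,
        )
        # Keep whatever the script printed so the user can pass it on
        (out_dir / "log.txt").write_text(result.stdout + result.stderr)
        if result.returncode != 0:
            break  # later scripts probably depend on this one
        finished.append(out_dir)
    return finished
```

Stopping at the first failing script is a design choice rather than a requirement; an alternative would be to carry on and report every failure at the end.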

Of course users would need to install any necessary dependencies, but could be very useful for helping people to run R scripts/python/etc. in a totally headless manner.

Would need some rules about folders, naming etc, but our team is already working on an R package for this kind of thing. Maybe others could comment on useful/desirable post-export processes that they'd like something like this to do.

ggalmazor · August 6, 2018, 11:26am

I have some concerns about this feature.

Is there any user-friendly OSS program to define and run workflows like the one you're describing? My gut feeling is that this problem has already been solved and I'm worried about half-way reinventing the wheel on this one.
I recognize I have no experience in the field, but since the user still needs to know how to install and configure any third party program (R, Pandoc, Python...), and actually write scripts (or, at least, know how to use them), it doesn't seem like we're saving them much of the hassle by making Briefcase run their scripts.

I also see problems with the implementation of this feature, mainly due to the huge variety of environments we would have to consider. Basically, there's a ton of things we need to take into account just to invoke a third-party program on the host. This is a non-comprehensive list of stuff we won't ever know about the host's environment beforehand:

Host operating system
All the regional settings
JDK version and vendor
Location of third-party programs binaries
Environment variables required to run those third-party programs.

Any unexpected variation of these things could make it impossible for Briefcase to even start the scripts. On top of this, we need to consider error handling and user feedback, which can be another challenge in itself.
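One way to soften the "impossible to even start the scripts" case is a pre-flight check that tells the user what is missing before anything runs. The sketch below is only an illustration in Python, not a proposal for Briefcase internals; the function names are hypothetical, and it only covers two items from the list (host OS and whether a program can be found on the PATH).

```python
import platform
import shutil


def missing_programs(required, which=shutil.which):
    """Return the names in `required` that cannot be found on the PATH.

    `which` is injectable so the check can be exercised without the
    programs actually being installed.
    """
    return [name for name in required if which(name) is None]


def preflight_message(required):
    """Build a short, human-readable report for the user."""
    missing = missing_programs(required)
    if not missing:
        return f"All required programs found on {platform.system()}."
    return (
        f"Cannot run scripts on {platform.system()}: "
        f"not found on PATH: {', '.join(missing)}"
    )
```

For example, `preflight_message(["Rscript", "pandoc"])` would name whichever of the two is not installed. It does not address regional settings, JDK differences or environment variables, which remain open problems.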

Honestly, I'm worried about biting off more than we can chew, especially if we can find an OSS alternative that would solve the sequencing of Briefcase and script invocations (which I'd bet already exists).

I would like to propose an alternative path, though.

We could build a Docker image with Briefcase and any other third-party tools like Python, Pandoc, and R built into it and provide simple commands to run pre-established workflows like the ones @chrissyhroberts is describing.

This would not only support the main feature of running scripts after a Briefcase export. It would also save users the need to install and configure any third-party tool on their machines. Since the Docker environment would be known, we could enforce any required rule about output folder locations, etc.

If required, this Docker image could be developed in a separate repo that the community could work on and collaborate to improve the pre-established workflows.

yanokwa · August 6, 2018, 4:40pm

@ggalmazor Thanks for pointing out some of the challenges! Spoken like someone who's been maintaining the Briefcase codebase

I agree with you that since Briefcase has a CLI, it's already easy to sequence commands with cron or whatever. That said, I think there is value in having a "Run script after export" box in Briefcase that runs one and only one script and that script can do whatever. The value is that it reduces the CLI interactions that a novice user has to do. They can just point Briefcase at a script that some advanced user has put together.

Maybe Docker makes what @chrissyhroberts is proposing easier, but I don't know if the core team should be in the business of packaging Briefcase in a container with Python/Pandoc/R. It feels pretty narrow as a use-case. I could be convinced that Briefcase and only Briefcase in a container is a bit broader use-case and others can build on that.

chrissyhroberts · August 9, 2018, 10:50am

Thanks both for these comments. I understand that this seems simple but is potentially harder to do in the real world.

Docker can be a problem in my experience. It seems like such a great idea but noobs don't know how to use it out of the box, so not much help in real world.

I agree with Yaw that the scripts would be written by one person but used by another who potentially has less computer background. Good example is our work in DRC ebola outbreak where we had to help field workers to install software (easy enough as packaged builds available) and then run scripts (much harder for most people). We ended up using RStudio as a GUI, calling odk briefcase via command line with system calls from R. This basically worked fine, except that briefcase is slower via command line and we still have to get users to run a script to get it to work. In real terms that only really meant pushing the source button, but even seeing code on their screen freaks some people out and there is always the risk that they might change the script as it is open on their console.
Briefcase remembers settings between sessions, so using it as a GUI that could control R or whatever would be far superior.

I'm no programmer really and this is probably naïve, but I would assume that this could be done using the following approach

User tells briefcase what target script to run [x.py / x.R / x.sh] via a box in GUI
Briefcase runs an internal script that
i) detects the OS type (linux/osx/windows)
ii) reads the file extension of the target script to determine what kind of script it is (from a small library of formats like perl, python, R, bash, whatever windows has)
iii) runs a system call to the correct programme based on file extension [python x.py / rscript x.R / ./x.sh]
iv) The target script is then responsible for spawning any downstream system calls (which is up to the developer of those scripts)

So the briefcase part only needs to be responsible for kicking off the downstream scripts as system calls. The system commands for mac/linux and pc might differ slightly and I guess linux flavoured scripts can't be run on windows, but that's not a problem as long as briefcase knows which ones are which.
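Steps i) to iii) above could look roughly like the following Python sketch. It is an illustration of the idea only: the extension table, the Windows entries and the name `build_command` are assumptions, and the executable names (`python`, `Rscript`, `perl`, `bash`) are simply expected to be on the PATH, which will not hold on every machine. It uses `bash x.sh` rather than `./x.sh` so the script does not need to be marked executable.

```python
import platform
from pathlib import Path

# Assumed mapping from file extension to the program that runs it
INTERPRETERS = {
    ".py": ["python"],
    ".r": ["Rscript"],
    ".pl": ["perl"],
    ".sh": ["bash"],
}
# Assumed Windows-only script types
WINDOWS_ONLY = {
    ".bat": ["cmd", "/c"],
    ".ps1": ["powershell", "-File"],
}


def build_command(script, os_name=None):
    """Return the argument list that would run `script` on this OS."""
    # i) detect the OS: platform.system() gives "Linux", "Darwin" or "Windows"
    os_name = os_name or platform.system()
    # ii) read the file extension (case-insensitive, so .R and .r both match)
    extension = Path(script).suffix.lower()
    table = dict(INTERPRETERS)
    if os_name == "Windows":
        table.pop(".sh")  # treat shell scripts as not runnable on Windows
        table.update(WINDOWS_ONLY)
    if extension not in table:
        raise ValueError(f"Don't know how to run {extension!r} files on {os_name}")
    # iii) the system call is the interpreter followed by the script path
    return [*table[extension], str(script)]
```

The returned list can be handed to `subprocess.run`; anything the target script does after that (step iv) is up to whoever wrote it.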

Chrissy h

Marcelo_Raul_SCARONE · August 12, 2019, 4:26pm

Hi!
Do you have a simple way to convert the records in a CSV (line by line) to PDF?

chrissyhroberts · September 12, 2019, 12:46pm

Sorry for slow response here.
I posted this a couple of years ago,
It's basic but does the job
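For readers landing here, a rough sketch of the line-by-line idea follows. This is not the script referred to above; it is a separate illustration in Python using only the standard library, and the function name and file naming scheme are made up. It writes one small HTML report per CSV row, which could then be passed to a converter such as Pandoc (mentioned earlier in the thread) to get PDFs; that conversion step is not shown.

```python
import csv
import html
from pathlib import Path


def rows_to_html(csv_path, out_dir):
    """Write one HTML file per data row of the CSV and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # DictReader uses the first line of the CSV as the column names
        for number, row in enumerate(csv.DictReader(handle), start=1):
            cells = "\n".join(
                f"<tr><th>{html.escape(key)}</th>"
                f"<td>{html.escape(value or '')}</td></tr>"
                for key, value in row.items()
                if key is not None  # skip stray cells beyond the header
            )
            page = (
                f"<html><body><h1>Record {number}</h1>\n"
                f"<table>\n{cells}\n</table></body></html>\n"
            )
            target = out_dir / f"record_{number:04d}.html"
            target.write_text(page, encoding="utf-8")
            written.append(target)
    return written
```

Calling `rows_to_html("export.csv", "reports")` would produce `reports/record_0001.html`, `reports/record_0002.html` and so on, one per submission.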