How to Automate ODK Data into Visualization Dashboards

What tools do you usually use for data cleaning? Once I start doing things like merging repeat group CSV files, I find that cleaning data in Excel gets more complicated and less reproducible. Statistical software like R, Stata, or SPSS is sometimes even more helpful for data cleaning than for data analysis. If you happen to be using Stata (which is not free), I've written a Stata program named odkmeta that imports and cleans data exported from Briefcase.
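If you'd rather not use Stata, the repeat group merge itself can be scripted in a few lines. Here is a minimal sketch in Python using only the standard library. It assumes the usual Briefcase layout, where the parent CSV has a KEY column and each repeat CSV has a PARENT_KEY column pointing back to it. The sample data, the column names other than those two, and the function name merge_repeat are all made up for illustration.

```python
import csv
import io

# Hypothetical Briefcase-style exports: one row per submission in the parent
# file, one row per repeat instance in the repeat file.
PARENT_CSV = """KEY,village,starttime
uuid:aaa,North,2018-09-01T09:00:00
uuid:bbb,South,2018-09-01T10:00:00
"""

REPEAT_CSV = """PARENT_KEY,KEY,member_name,member_age
uuid:aaa,uuid:aaa/members[1],Ama,34
uuid:aaa,uuid:aaa/members[2],Kofi,8
uuid:bbb,uuid:bbb/members[1],Esi,51
"""


def merge_repeat(parent_file, repeat_file, parent_key="KEY", child_key="PARENT_KEY"):
    """Return one merged row per repeat instance, with parent columns prefixed."""
    # Index parent submissions by their key for constant-time lookup.
    parents = {row[parent_key]: row for row in csv.DictReader(parent_file)}
    merged = []
    for child in csv.DictReader(repeat_file):
        parent = parents.get(child[child_key])
        if parent is None:
            # A repeat row without a parent usually means a mismatched export.
            raise ValueError(f"orphan repeat row: {child.get('KEY')}")
        # Prefix parent columns so the parent's KEY does not clash with the child's.
        row = {f"parent_{name}": value for name, value in parent.items()}
        row.update(child)
        merged.append(row)
    return merged


if __name__ == "__main__":
    for row in merge_repeat(io.StringIO(PARENT_CSV), io.StringIO(REPEAT_CSV)):
        print(row["parent_village"], row["member_name"], row["member_age"])
```

With real exports you would pass open file handles instead of the io.StringIO objects. Because the script is the whole record of what was done, rerunning it on a fresh export repeats the merge exactly.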

This sort of automation is a very interesting idea, and I think ODK has the potential to enable it, since you can use the form to glean metadata about each field. For example, you can use the form to automatically determine the names of the fields used for start and end times.
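As a rough sketch of the start/end example: in an XForm, the start and end timestamps are typically declared as binds with jr:preload="timestamp" and jr:preloadParams set to "start" or "end". The Python below reads those binds from the form XML. The sample form and the function name find_timestamp_fields are hypothetical, and a real form may need more careful handling (for example, fields nested in groups).

```python
import xml.etree.ElementTree as ET

XFORMS_NS = "http://www.w3.org/2002/xforms"
JR_NS = "http://openrosa.org/javarosa"

# Hypothetical, heavily trimmed form definition for illustration.
SAMPLE_FORM = """<?xml version="1.0"?>
<h:html xmlns="http://www.w3.org/2002/xforms"
        xmlns:h="http://www.w3.org/1999/xhtml"
        xmlns:jr="http://openrosa.org/javarosa">
  <h:head>
    <model>
      <instance>
        <data id="household"><starttime/><endtime/><village/></data>
      </instance>
      <bind nodeset="/data/starttime" type="dateTime"
            jr:preload="timestamp" jr:preloadParams="start"/>
      <bind nodeset="/data/endtime" type="dateTime"
            jr:preload="timestamp" jr:preloadParams="end"/>
      <bind nodeset="/data/village" type="string"/>
    </model>
  </h:head>
  <h:body/>
</h:html>
"""


def find_timestamp_fields(form_xml):
    """Map 'start' / 'end' to the name of the field that records that timestamp."""
    root = ET.fromstring(form_xml)
    fields = {}
    for bind in root.iter(f"{{{XFORMS_NS}}}bind"):
        if bind.get(f"{{{JR_NS}}}preload") != "timestamp":
            continue
        kind = bind.get(f"{{{JR_NS}}}preloadParams")  # "start" or "end"
        # The field name is the last path segment of the nodeset.
        fields[kind] = bind.get("nodeset").rsplit("/", 1)[-1]
    return fields


if __name__ == "__main__":
    print(find_timestamp_fields(SAMPLE_FORM))
```

A cleaning script could then look up these names instead of hard-coding them, so the same script works across forms that name their timestamp fields differently.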

In between basic data cleaning and data analysis, I think there is also a lot of potential for automated data checking. For example, some projects check the quality of their incoming data on a regular basis. I know that Innovations for Poverty Action has a system for implementing high-frequency checks of incoming data, which they describe on GitHub. Again, though, that's a Stata-based workflow.
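To give a flavor of what such checks can look like outside Stata, here is a small Python sketch. It is not IPA's system, just two illustrative checks I picked (duplicate IDs and implausibly short interviews). The field names hhid, starttime, and endtime, the 10-minute threshold, and the sample rows are all assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical submissions, e.g. as read with csv.DictReader from an export.
SUBMISSIONS = [
    {"hhid": "101", "starttime": "2018-09-01T09:00:00", "endtime": "2018-09-01T09:42:00"},
    {"hhid": "102", "starttime": "2018-09-01T10:00:00", "endtime": "2018-09-01T10:03:00"},
    {"hhid": "101", "starttime": "2018-09-02T08:30:00", "endtime": "2018-09-02T09:10:00"},
]


def run_checks(rows, id_field="hhid", start_field="starttime",
               end_field="endtime", min_duration=timedelta(minutes=10)):
    """Return a list of human-readable issues found in the submissions."""
    issues = []

    # Check 1: the same ID submitted more than once.
    counts = Counter(row[id_field] for row in rows)
    for value, n in sorted(counts.items()):
        if n > 1:
            issues.append(f"duplicate {id_field}={value} ({n} submissions)")

    # Check 2: interviews that finished faster than the minimum duration.
    for row in rows:
        start = datetime.fromisoformat(row[start_field])
        end = datetime.fromisoformat(row[end_field])
        if end - start < min_duration:
            issues.append(f"short interview {id_field}={row[id_field]} ({end - start})")

    return issues


if __name__ == "__main__":
    for issue in run_checks(SUBMISSIONS):
        print(issue)
```

Run on a schedule against each new export, a script like this gives you a short list of submissions to follow up on while the team is still in the field.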

Power BI seems to work well for a number of workflows. If you're interested in Power BI, you may be interested in ODK Central, a new ODK 1 server for which we recently released a beta version. ODK Central supports OData out of the box, which means that you can use it immediately with Power BI, Tableau, and other tools.
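The same OData feed can also be read from a script. The Python sketch below rests on several assumptions: that the feed lives at a URL shaped like /v1/projects/{projectId}/forms/{formId}.svc/Submissions (check the Central documentation for your version), that the server accepts HTTP Basic auth, and that responses use the standard OData JSON shape with a "value" array and an optional "@odata.nextLink" for paging. The function names are made up.

```python
import base64
import json
import urllib.request


def odata_url(base_url, project_id, form_id):
    """Build the assumed OData Submissions URL for a form."""
    return f"{base_url}/v1/projects/{project_id}/forms/{form_id}.svc/Submissions"


def make_fetcher(email, password):
    """Return a function that GETs a URL with Basic auth and parses the JSON body."""
    token = base64.b64encode(f"{email}:{password}".encode()).decode()

    def fetch_json(url):
        request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    return fetch_json


def collect_rows(fetch_json, first_url):
    """Follow @odata.nextLink until the feed is exhausted and return all rows."""
    rows, url = [], first_url
    while url:
        page = fetch_json(url)
        rows.extend(page.get("value", []))
        url = page.get("@odata.nextLink")  # absent on the last page
    return rows


# Usage (placeholder server and credentials, not run here):
# fetch = make_fetcher("me@example.org", "secret")
# rows = collect_rows(fetch, odata_url("https://central.example.org", 1, "household"))
```

Passing the fetch function in as an argument keeps the paging logic separate from the network call, so you can try it against canned responses before pointing it at a real server.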

In general, automation gets easier when the server offers an API. For example, one suggestion in the thread you mentioned was to use a combination of an API and Shiny/R. Aggregate has API options, and Central has a REST API.
