Hi fellow ODK-ers,
I'm wondering if this is the right category for this post - hopefully! To start off: I am not a developer, but I want to understand the different possibilities for automating as much of the information/data management cycle as possible. In my experience with ODK, new users start off with one or two forms for one or two surveys, and at that scale it's quite manageable to "humanly" pull data, analyse it in Excel, throw a report together, and share it within your organisation.
BUT - once you get excited by the possibilities that ODK opens up, you want MORE forms. At that point, most organisations don't have automated systems to help them manage all their digital data collection, and loads of data then sits around never being used, because humans simply don't have enough time to manually pull, clean, analyse, and visualise everything coming in. This is especially true for multi-sectoral teams using ODK for daily or regular data collection.
Putting aside the fact that a recommendation could be "don't collect more data than you're going to be able to use" - organisations have to start somewhere. So I was wondering if any of you out there have best practices to share about how you automate some or all of the different parts of ODK data management?
I'm going to share what I know from my own experience first (excuse any incorrect terminology), and would be really happy if others have ideas about this, what tools work best for different steps, or what are "better" ways of doing this:
I'm assuming data is collected with ODK Collect and stored on an ODK Aggregate server hosted on Google App Engine (appspot.com).
1) Automating the data downloads. Included in this: a scheduled job that automatically pulls new data submissions; a script that uses ODK Briefcase to pull data so that encrypted submissions can be decrypted; a way of ensuring non-English text is saved correctly as UTF-8; and saving all files in a usable format (.csv in my case) to a local or cloud (encrypted) server.
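To make step 1 concrete, here's a rough sketch of how I imagine scripting ODK Briefcase from Python so a cron job or Task Scheduler entry can run it. The flag names are the ones I've seen in the Briefcase CLI help (please verify against `java -jar briefcase.jar --help` for your version); the jar path, form id, and URL below are placeholders, not anything real.

```python
# Sketch: drive the ODK Briefcase CLI from Python on a schedule.
# All paths, ids, and URLs below are placeholders; flag names should be
# checked against your Briefcase version's --help output.
import subprocess

BRIEFCASE_JAR = "ODK-Briefcase-v1.13.1.jar"  # placeholder path to the jar


def pull_cmd(form_id, storage_dir, aggregate_url, user, password):
    """Build the command that pulls new submissions from Aggregate."""
    return ["java", "-jar", BRIEFCASE_JAR, "--pull_aggregate",
            "--form_id", form_id,
            "--storage_directory", storage_dir,
            "--aggregate_url", aggregate_url,
            "--odk_username", user,
            "--odk_password", password]


def export_cmd(form_id, storage_dir, export_dir, pem_file=None):
    """Build the command that exports pulled data to CSV (UTF-8),
    decrypting with the private key when a .pem file is given."""
    cmd = ["java", "-jar", BRIEFCASE_JAR, "--export",
           "--form_id", form_id,
           "--storage_directory", storage_dir,
           "--export_directory", export_dir,
           "--export_filename", form_id + ".csv"]
    if pem_file:  # private key for encrypted forms
        cmd += ["--pem_file", pem_file]
    return cmd


def run(cmd):
    """Execute one Briefcase step, raising if it fails."""
    subprocess.run(cmd, check=True)
```

A scheduler would then just call `run(pull_cmd(...))` followed by `run(export_cmd(...))` once a day. Briefcase writes CSV as UTF-8, which covers the non-English alphabet worry as long as downstream tools are told the encoding.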
2) Automating data cleaning and data associations. Included in this: substituting labels for the coded values (1s, 2s, 3s, etc.); associating "repeat" data with their parent keys; translating data results into multiple languages (based on the ODK form); and flagging duplicates.
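The cleaning steps above could be sketched with pandas, for example. This assumes Briefcase-style exports where repeat CSVs carry `KEY`/`PARENT_KEY` columns; the column names and choice mappings below are made up for illustration.

```python
# Sketch of step 2 with pandas: relabel coded values, join repeats to
# parents, flag duplicates. Column names here are illustrative.
import pandas as pd


def relabel(df, choice_maps):
    """Replace coded values (1, 2, 3, ...) with the labels from the
    XLSForm choices sheet; values with no mapping are left as-is."""
    out = df.copy()
    for col, mapping in choice_maps.items():
        out[col] = out[col].map(mapping).fillna(out[col])
    return out


def attach_repeats(parent, repeat):
    """Join repeat-group rows to their parent submission using the
    PARENT_KEY / KEY columns found in Briefcase-style exports."""
    return repeat.merge(parent, left_on="PARENT_KEY", right_on="KEY",
                        suffixes=("", "_parent"))


def flag_duplicates(df, subset):
    """Mark rows whose values in `subset` repeat an earlier row."""
    out = df.copy()
    out["is_duplicate"] = out.duplicated(subset=subset, keep="first")
    return out
```

Translation into multiple languages is the same `relabel` idea, just with one choice map per language pulled from the form's label columns.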
3) Automating data analysis. Included in this: setting up standard indicators to be calculated based on the collected data (such as age categories if collecting age, or "time to complete questionnaire" based on start/end times).
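For step 3, the two example indicators I mention could look something like this in pandas (the age bands and column names are just examples; ODK's `start`/`end` metadata fields are assumed to be enabled in the form):

```python
# Sketch of step 3: derive standard indicators from collected data.
# Bin edges and column names are examples, not a standard.
import pandas as pd


def add_indicators(df):
    out = df.copy()
    # Age categories from a collected "age" column (right=False makes
    # bins like [0, 5), [5, 18), ...)
    out["age_group"] = pd.cut(out["age"], bins=[0, 5, 18, 60, 120],
                              labels=["0-4", "5-17", "18-59", "60+"],
                              right=False)
    # Time to complete the questionnaire, from ODK's start/end metadata
    start = pd.to_datetime(out["start"])
    end = pd.to_datetime(out["end"])
    out["duration_min"] = (end - start).dt.total_seconds() / 60
    return out
```

Run on each fresh export, this gives the analysis layer something stable to point dashboards at.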
4) Automating data visualisation (maybe this is combined with #3 above). Included in this: loading clean data and visualizing indicators set up through data analysis; ideally, this visualization is via a URL that can be accessed by appropriate stakeholders through #5.
5) Automating data dissemination. Included in this: a website that people can log into to view the visualised data, ideally with user management so that different stakeholders can access different dashboards; or a PDF report template that pulls in new data automatically so that reports generate themselves.
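On the report-template side of step 5, the simplest version I can picture is filling an HTML template with the latest indicator values and then converting it to PDF with whatever tool you prefer (wkhtmltopdf, a headless browser, etc.). Everything named below is illustrative:

```python
# Sketch of step 5's templated report: substitute fresh numbers into an
# HTML template. Field names and values are illustrative only.
from string import Template

REPORT = Template("""<html><body>
<h1>Weekly summary: $form_name</h1>
<p>Submissions this week: $n_submissions</p>
<p>Median interview length: $median_minutes minutes</p>
</body></html>""")


def render_report(form_name, n_submissions, median_minutes):
    """Return the filled-in HTML, ready for PDF conversion."""
    return REPORT.substitute(form_name=form_name,
                             n_submissions=n_submissions,
                             median_minutes=median_minutes)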
Previously, I've seen QlikSense Enterprise used to do pretty much all of the above steps - however, it can be a significant investment for smaller organisations. Does anyone have suggestions for doing these automations with free or close-to-free tools?
A couple use cases I'm interested in:
- Is there a way to automate this process to show data in a PHP/MySQL website dashboard?
- Could this be set up with PowerBI dashboards?
- I know next to nothing about Google Fusion Tables, which are mentioned in the documentation. Would this be a good solution?
Some related threads:
https://docs.opendatakit.org/aggregate-data-access/#publish-data
https://forum.getodk.org/search?q=php
https://forum.getodk.org/t/dashboard-recommendations-for-odk-data-and-connecting-to-dhis2
Okay, this is now too long. Thanks for your ideas!
Janna