ODK Central clarifications required

LN · October 2, 2020, 4:11am

We’ve tried to be clear about what we see as the bounds of Central in "What's coming in Central". I think it would be helpful to capture some of this more prominently in the documentation, and I will make sure it gets added.

Central’s future direction can absolutely be influenced and we appreciate all feedback. But ultimately, it’s concrete financial or software development contributions that will have the biggest impact, especially where big changes in approach are desired. We fund the development of ODK tools through a mix of contract work and direct feature sponsorship. If more organizations that use or even sell ODK tools made investments in the platform, we could certainly do more than we can today.

Our primary focus on Central has been to provide rich form and user management features to support secure data collection at scale even on modest server hardware. A big priority has been to handle large volumes of incoming submissions quickly without risk of data corruption.

Our goal for data analysis has been to help users connect to systems that are specifically intended for analyzing data. We do this through fast CSV exports and the OData feed. You say "Central seriously looks handicapped as a dashboard" and indeed, it does not intend to be a dashboard.
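As a rough illustration of connecting an external tool to the OData feed, here is a minimal Python sketch using only the standard library. The server URL, project ID, form ID and credentials are placeholders. It also assumes the `/v1/projects/{projectId}/forms/{xmlFormId}.svc/Submissions` endpoint layout and HTTP Basic auth, so check the Central API documentation for your version before relying on it.

```python
import base64
import json
import urllib.parse
import urllib.request


def submissions_url(server: str, project_id: int, form_id: str) -> str:
    """Build the OData Submissions URL for one form (endpoint layout assumed)."""
    # Form IDs may contain characters that must be percent-encoded in a URL path.
    form = urllib.parse.quote(form_id, safe="")
    return f"{server.rstrip('/')}/v1/projects/{project_id}/forms/{form}.svc/Submissions"


def rows_from_odata(payload: dict) -> list:
    """OData JSON responses carry the records in a top-level "value" array."""
    return list(payload.get("value", []))


def fetch_submissions(server: str, project_id: int, form_id: str,
                      email: str, password: str) -> list:
    """Download one form's submissions using HTTP Basic auth (assumed enabled)."""
    token = base64.b64encode(f"{email}:{password}".encode()).decode()
    request = urllib.request.Request(
        submissions_url(server, project_id, form_id),
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return rows_from_odata(json.load(response))
```

Tools like Excel, Power BI or R read the same feed; the sketch only shows that the feed is plain HTTP plus JSON, so anything that can make an authenticated request can consume it.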

One of the early decisions we made was to store incoming form submissions as XML blobs rather than splitting them into database tables as Aggregate does. This is a decision that we did not make lightly. It has helped our small team make quicker progress and has ensured speed and stability as submissions come in. We learned from Aggregate and other systems that splitting records is a big source of code complexity, bugs, and performance bottlenecks.
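To make the storage tradeoff concrete, here is a toy Python/SQLite sketch of the blob approach. It is not Central's actual schema (Central runs on PostgreSQL with its own tables); the table and function names are hypothetical. The point is that a submission of any shape, including repeat groups, is written in one atomic single-row insert, whereas splitting it into tables would need one table per group and several coordinated inserts per submission.

```python
import sqlite3

# One generic table: every submission is stored whole, whatever the form's shape.
SCHEMA = """
CREATE TABLE submissions (
    instance_id TEXT PRIMARY KEY,  -- the submission's instanceID
    form_id     TEXT NOT NULL,
    xml         TEXT NOT NULL      -- the submission kept as an XML blob
)
"""


def open_store() -> sqlite3.Connection:
    """Create a throwaway in-memory store with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def save_submission(conn: sqlite3.Connection, instance_id: str,
                    form_id: str, xml: str) -> bool:
    """Store one submission as a single row; return False for a duplicate ID."""
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute(
                "INSERT INTO submissions (instance_id, form_id, xml) VALUES (?, ?, ?)",
                (instance_id, form_id, xml),
            )
        return True
    except sqlite3.IntegrityError:
        # Same instanceID received again (e.g. a client retry): keep the first copy.
        return False
```

Because the write path never has to understand the form's structure, a form revision with new fields or groups needs no schema migration.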

The tradeoff is that this limits the analysis that can be done directly on Central: the data is not organized for fast operations across the whole dataset. It also means that connecting directly to the Central database for analysis is not practical, and that we don't provide a performant API for open-ended data querying. As I said previously, what we learned from Aggregate is that most people need to rely on external tools for analysis anyway, and the OData feed lets those tools work with live-updating data.
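The cost of open-ended queries on blobs can be sketched with a toy store (a hypothetical SQLite table, not Central's real schema). Even a simple aggregate such as an average over a repeated field has to fetch and parse every submission's XML in application code, because the database has no column it can index or aggregate on. Tools fed from OData or CSV exports receive the data already flattened into columns.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Optional


def build_store(rows) -> sqlite3.Connection:
    """Hypothetical blob store holding (instance_id, form_id, xml) per submission."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE submissions (instance_id TEXT PRIMARY KEY, form_id TEXT, xml TEXT)"
    )
    conn.executemany("INSERT INTO submissions VALUES (?, ?, ?)", rows)
    return conn


def average_member_age(conn: sqlite3.Connection, form_id: str) -> Optional[float]:
    """Average a repeated <member><age> field across all of a form's submissions.

    The database cannot see inside the XML, so every blob is loaded and parsed:
    the cost grows with the total size of the dataset, not with the answer.
    """
    ages = []
    for (xml,) in conn.execute(
        "SELECT xml FROM submissions WHERE form_id = ?", (form_id,)
    ):
        root = ET.fromstring(xml)
        ages.extend(float(el.text) for el in root.iterfind("member/age") if el.text)
    return sum(ages) / len(ages) if ages else None
```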

It sounds like you have been using Aggregate as an entity repository. That is, the data collection you’re involved in is more about building registers of entities than about producing an analysis artifact, and you want to look up specific entities either geographically or by filtering on some criteria. This is a completely valid use case. For folks with that need, we’ve set up Excel or Power BI projects with live-updating views on the data. Another good option, if you have a little R knowledge, is a Shiny app. It sounds like that may not be practical for you, and if Aggregate continues to do what you need, then you may not need to switch!
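For the entity-repository use case, once rows have been pulled from the OData feed into an external tool, a lookup like "active records within 10 km of a point" is short to write. This Python sketch assumes each geopoint arrives as a GeoJSON-style Point (longitude first) in a field named `location`, and that criteria such as `status` are plain top-level fields; both field names are hypothetical. An Excel, Power BI or Shiny setup would do the same filtering with its own tools.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two WGS84 points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def find_entities(rows, center, radius_km, **criteria):
    """Return rows within radius_km of center=(lat, lon) that match every criterion.

    Assumes a hypothetical "location" field shaped like
    {"type": "Point", "coordinates": [lon, lat, ...]}.
    """
    lat0, lon0 = center
    matches = []
    for row in rows:
        point = row.get("location") or {}
        coords = point.get("coordinates")
        if not coords or len(coords) < 2:
            continue  # skip records without a usable geopoint
        lon, lat = coords[0], coords[1]
        if haversine_km(lat0, lon0, lat, lon) > radius_km:
            continue
        if all(row.get(key) == value for key, value in criteria.items()):
            matches.append(row)
    return matches
```

This is a linear scan, which is fine for a register of a few thousand entities refreshed from the feed; larger registers would call for a spatial index in the analysis tool.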

One area we could certainly improve is providing more explicit guides on how to set up common kinds of analysis or querying pipelines. There are some good examples shared in the Showcase, but they aren’t incorporated into the documentation. The development team aims to provide complete documentation, but we have prioritized software development over writing detailed guides because guides are an area where community members can contribute. There’s generally a lot of opportunity for community members to have a high impact on analysis (shoutout to @Florian_May and ruODK).

As we explore more managed workflows for entity-based data collection, it is possible that we will introduce an entity concept that is more richly queryable. However, this is unlikely to become our immediate focus because it is an entirely new area of work. Additionally, what may look like simple functionality can be complex or computationally expensive to do on large datasets. We’d like to first strengthen what can be done with web-based forms (e.g. submission edits), improve and enrich the user and permissions model, and make sure the features we already have are polished.