A JSON Option to XLSForms/ODK Forms

I just got an email about a spec from Don Fizachi that I thought would
be super interesting to the JSON-y folks in the community. Anyone out
there looking to collaborate on a JSON spec for XLSForms?

> At various times since 2014, I was part of, or the lead of, a team that
> carried out one of the largest data surveys using ODK Collect and an
> associated tool. These tools were a godsend. A big thank you to the ODK
> community for this.
>
> During my time leading this team there were a number of issues we faced
> using these survey tools. Some of these issues, given our use case
> scenario, bordered on user friendliness, form integrity, data security,
> data analysis, etc. As a result, I had to look for ways to remedy all
> this. I had a number of options, which included building on top of ODK
> or designing from scratch. I chose the latter for technical reasons.
> Anyway, as part of the development process, a new JSON form definition
> specification was created in the spirit of XLSForm/ODK Forms. I
> published this specification this week. Please find it here:
>
> http://www.codeproject.com/Articles/1102431/Advanced-JSON-Form-Specification-Chapter-Introdu
>
> I don’t know how feasible it is, given the strong community built around
> XLSForm/ODK Forms, but if the community would like to explore a JSON
> option for this, please by all means feel free to appropriate and
> possibly improve on this specification as much as possible.

Hi,

A couple of thoughts

Complex JSON Objects or Arrays can be difficult since you need to go
through multiple layers of objects and arrays until you find the element
you are looking for. JSON Path (http://goessner.net/articles/JsonPath/)
makes this a little easier and would make the spec more like the XML one.
Another observation is that the new XLSXConverter in ODK 2 transforms
directly to JSON, so perhaps a spec is more relevant there than for the
XLSForm conversion. It might be good to clarify, for people building new
tools, the advantages and disadvantages of each. For example, is the
intention that both converters continue going forward, will one be
deprecated in favor of the other, or do they serve different needs so that
both will be actively developed? It's a little unclear, at least to me.
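
To make the JSON Path point concrete, here is a minimal sketch in Python. The form document and the `resolve` helper are hypothetical, invented for this illustration; the helper handles only object keys and integer indexes, which is a small subset of what the JSON Path proposal linked above describes, and it is not that library.

```python
import re

# Hypothetical nested form definition, invented for illustration only.
form = {
    "sections": [
        {
            "name": "household",
            "questions": [
                {"name": "q1", "type": "text"},
                {"name": "q2", "type": "integer"},
            ],
        },
    ]
}


def resolve(document, path):
    """Follow a path such as 'sections[0].questions[1].type'.

    Supports only object keys and integer array indexes.
    """
    current = document
    # Each match is either a key name or a bracketed integer index.
    for key, index in re.findall(r"([^.\[\]]+)|\[(\d+)\]", path):
        current = current[key] if key else current[int(index)]
    return current


# Manual traversal, one layer at a time.
manual = form["sections"][0]["questions"][1]["type"]
# The same lookup expressed as a single path string.
by_path = resolve(form, "sections[0].questions[1].type")
print(manual, by_path)
```

The path string can be stored in a spec or a config file, which is harder to do with hand-written chains of subscripts.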

Regards

On Thu, Jul 21, 2016 at 12:35 AM, Yaw Anokwa wrote:

> [...]

--
You received this message because you are subscribed to the Google Groups
"ODK Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to opendatakit-developers+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hello Ian Lawrence and Alex Dorey,

Many thanks for the comments.

@Ian Lawrence
Actually, JSON (through JSON Schema) provides a means of achieving the XPath-style referencing mechanism with the “$ref” keyword. I discussed how this keyword works here: http://www.codeproject.com/Articles/890353/JSON-version-4-to-Csharp-Objects-and-Back-Part-2

@Ian Lawrence, @Alex Dorey

It is important to understand that the intention of this JSON specification is not to replace XLSForm/ODK XForms but to make a JSON option available, if it is ever required. As such, on a technical level, the two specifications are not going to be compatible.
To really appreciate the motivation behind this JSON specification, one has to look at the whole supply chain of data collection, transmission, storage and analysis from the perspective of data integrity. I’ll attempt to do this in the next couple of paragraphs, based on my experience using existing ODK tools or spin-offs for large-scale data surveys.

On Data Collection

We noticed during each survey exercise that a significant amount of the data streaming in from the field contained unexpected values. When we investigated, we found some of the reasons for this.
For example, when the form contained a compulsory text prompt, we found that lazy enumerators were able to bypass the screen by simply typing in a blank space. By then the damage had already been done, but putting a RegEx constraint on text input prompts was enough to remedy this. However, requiring a RegEx for every text input was a barrier for our non-technical form designers.
With JSON we were able to make sure that blanks were invalid for compulsory text input prompts. Before a completed form instance is saved, it is run against the JSON schema for the widget. If the instance fails the test, it is not saved. I do know that some variants of ODK Collect have this capability built into the application, but we got this feature for free from the JSON schema.
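
As a rough illustration of the blank-input check described above, here is a minimal Python sketch. The schema fragment uses the standard JSON Schema keywords `type`, `required` and `pattern`, but the field name `respondent_name` and the `validate` function are hypothetical: this is a toy checker covering only those three keywords, not a conforming JSON Schema validator and not the schema used by the Advanced JSON Form specification.

```python
import re

# Hypothetical schema fragment; the field name is invented for this example.
schema = {
    "type": "object",
    "required": ["respondent_name"],
    "properties": {
        # "\S" demands at least one non-whitespace character,
        # so "" and "   " are both rejected.
        "respondent_name": {"type": "string", "pattern": r"\S"},
    },
}


def validate(instance, schema):
    """Return a list of error messages; an empty list means valid.

    Toy checker covering only required, string type and pattern.
    """
    errors = []
    for name in schema.get("required", []):
        if name not in instance:
            errors.append(f"{name}: missing")
    for name, rules in schema.get("properties", {}).items():
        if name not in instance:
            continue
        value = instance[name]
        if rules.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{name}: not a string")
        elif (
            "pattern" in rules
            and isinstance(value, str)
            and not re.search(rules["pattern"], value)
        ):
            errors.append(f"{name}: does not match {rules['pattern']}")
    return errors


# An instance is saved only if the error list is empty.
print(validate({"respondent_name": " "}, schema))
print(validate({"respondent_name": "Ama"}, schema))
```

The same function could be run on the device before saving and again on the server for every incoming instance, since the schema is just data shared by both sides.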

On Data Transmission

ODK Collect is such that the XML form on the device can be modified using a text editor. This makes it easy to remove constraints and enter invalid data into the form. The completed instance itself can also be modified after it is saved.
To solve this problem, we ran every incoming instance against its JSON schema on the server to ensure that modified instances were not accepted. Again, this did not require extra work on our part, as the JSON parser took care of it. The next section makes clear how important this server-side validation was to us with regard to analysis.

On Data Analysis

One of the features we wanted moving forward was to be able to query and analyze/summarize data stored in the backend without the need for a middle layer. Given the fact that most DBMSs of note now provide more tools to query JSON data than they do for XML, it made sense to store our data in the JSON format. For example, we are able to carry out mathematical analysis such as the computation of standard deviations, variance, correlation, regression etc. in-situ against our data in the database using SQL commands.
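
A minimal sketch of this kind of in-database analysis, in Python. The post does not name the DBMS, so SQLite and its `json_extract` function are used here purely as a stand-in; the `submissions` table and the `age` field are hypothetical. It assumes the SQLite build bundled with Python includes the JSON functions, which is the common case.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Each submitted instance is stored as one JSON document.
conn.execute("CREATE TABLE submissions (doc TEXT)")
ages = [20, 30, 40]
conn.executemany(
    "INSERT INTO submissions (doc) VALUES (?)",
    [(json.dumps({"age": a}),) for a in ages],
)

# Mean and population variance computed entirely in SQL,
# using the identity Var(x) = E[x^2] - E[x]^2.
mean, variance = conn.execute(
    """
    SELECT AVG(x), AVG(x * x) - AVG(x) * AVG(x)
    FROM (SELECT json_extract(doc, '$.age') AS x FROM submissions)
    """
).fetchone()
print(mean, variance)
```

No middle layer parses the documents here: the database reads the field out of the stored JSON and aggregates it directly.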

Now, with regard to the data validation discussed in the previous section, consider a scenario where the variance of a particular numerical form field needs to be computed. Assume that there are a million instances of this form in the database, and that one of these instances was modified on the device so that this numerical field contains a string. The analytical computation will fail, and in our experience this failure is not graceful.
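
The failure mode can be reproduced in a few lines of Python. This is only a sketch: the instances, the `weight` field and the helper functions are hypothetical, and the type check stands in for the schema validation that the post says happens on the server.

```python
# Hypothetical submitted instances; the third was tampered with on the device.
instances = [
    {"weight": 12},
    {"weight": 15},
    {"weight": "n/a"},
    {"weight": 11},
]


def population_variance(values):
    """Plain population variance; fails if any value is not numeric."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# Without validation, one bad instance breaks the whole computation.
try:
    population_variance([i["weight"] for i in instances])
    failed = False
except TypeError:
    failed = True

# With validation at the door, the bad instance never reaches the data set.
accepted = [i["weight"] for i in instances if is_number(i["weight"])]
result = population_variance(accepted)
print(failed, accepted, result)
```

Rejecting the instance on arrival is cheaper than discovering it in the middle of an aggregate over a million rows.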

We also wanted a feature where analysis could be carried out across multiple forms that share one or more similar fields. For this to be possible, the fields in question not only had to have the same name but also had to be of the same type. We were able to achieve this simply by establishing JSON field equivalence in forms to be queried.
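
The post does not define "field equivalence" precisely. One possible reading, sketched below in Python with hypothetical form definitions, is that a field can be queried across two forms only when both declare it with the same name and the same type.

```python
# Hypothetical form definitions: field name -> declared type.
form_a = {"fields": {"age": "integer", "village": "string", "weight": "number"}}
form_b = {"fields": {"age": "integer", "village": "string", "weight": "string"}}


def equivalent_fields(a, b):
    """Names present in both forms with the same declared type."""
    return sorted(
        name
        for name, declared in a["fields"].items()
        if b["fields"].get(name) == declared
    )


# "weight" is excluded because the two forms disagree on its type.
shared = equivalent_fields(form_a, form_b)
print(shared)
```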

In a Nutshell

Using the built-in mechanisms found in JSON helped to assure data integrity and allowed proper analysis.
Putting JSON at the center of our work means that we are able to use the same platform-agnostic validation techniques of JSON as we launch new data collection services on disparate devices, e.g., the Raspberry Pi.


Clarification

I spoke about XPath and JSON “$ref” in my last post from the perspective of the schema that defines the form, not for querying the data values in a JSON document. For querying JSON data, we use the SQL statements provided by the DBMS. Outside of the DBMS, JSON Path should suffice, although I have never used it.

Interesting. I put up a discussion in the XLSForm repo
https://github.com/XLSForm/xlsform.github.io/issues/57 several months
back about this kind of thing but nothing much came of it.

We've started using a JSON format internally in the KoBo tools that mimics
XLSForms wherever possible.

The initial difference that I see between what we're using and the linked
spec is that this new format stores all the settings in the root of the
JSON structure and the questions are in a list called "screenDisplayArray".
From skimming it, it seems like there are a number of changes that would
take some getting used to.

I also don't see a way to convert an "Advanced-JSON-Form" to/from XLSForms
(if it's possible), which I think would be important for understanding and
adopting a new format.

By contrast, in the format we're using a simple survey starts out like
this--

{
  "survey": [
    {
      "type": "text",
      "name": "q1",
      "label": "Q1 Label"
    }
  ]
}
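
As a rough idea of how a document like this could map back to an XLSForm, here is a minimal Python sketch that writes the "survey" list out as rows of a survey sheet in CSV form. The `type`, `name` and `label` columns are standard XLSForm survey columns; the `to_survey_csv` function is hypothetical and is not how the KoBo tools actually do the conversion.

```python
import csv
import io
import json

# The simple survey shown above.
document = json.loads(
    '{"survey": [{"type": "text", "name": "q1", "label": "Q1 Label"}]}'
)

# Columns of the XLSForm survey sheet covered by this sketch.
COLUMNS = ["type", "name", "label"]


def to_survey_csv(doc):
    """Render the "survey" list as CSV rows, one question per row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for question in doc["survey"]:
        # Missing keys become empty cells, as in a spreadsheet.
        writer.writerow({c: question.get(c, "") for c in COLUMNS})
    return buffer.getvalue()


sheet = to_survey_csv(document)
print(sheet)
```

Because every question is a flat object whose keys match the sheet's column names, the mapping in this direction is close to one row per list item.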

In KoBo, we've since developed it a bit and have a way to describe
translations and constraints/relevants that compiles to XLSForms and
XPaths. I'd be happy to write up more on the xlsform.github.io repo
https://github.com/XLSForm/xlsform.github.io/issues.

cheers,
-Alex

On Fri, Jul 22, 2016 at 9:08 AM, Ian Lawrence wrote:

> [...]

--
Alex Dorey
415.886.7537

To clarify the differences between the 1.x and 2.0 ODK tools:

The ODK 2.0 tools (e.g., XLSXConverter) are all based upon JSON,
Javascript, CSS and HTML.

They are distinct from and independent of the XML-based 1.x tools. We
determined that the limitations of document-based XML were too restrictive
for the user-directed-navigation, data-revision and data-reporting use
cases we are aiming to address with the ODK 2.0 tools.

In ODK 2.0, the submission data is stored in SQLite databases and shared
across devices, with advanced SQL queries available for retrieving data in
a presentation layer using HTML and Javascript. As such, the 2.0 tools do
not use XPath for field referencing. They also do not have repeat groups,
but use linked tables to represent nested (or peer) relationships.
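
To illustrate the general idea of linked tables standing in for a repeat group, here is a minimal Python sketch using SQLite. The `household` and `member` tables, their columns and the sample rows are all hypothetical; this is not the actual ODK 2.0 schema, only the relational pattern the paragraph above describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE household (
        id TEXT PRIMARY KEY,
        village TEXT
    );
    -- Each member row points at its parent household, which plays
    -- the role a repeat group would play inside a single XML document.
    CREATE TABLE member (
        id TEXT PRIMARY KEY,
        household_id TEXT REFERENCES household(id),
        name TEXT
    );
    """
)
conn.execute("INSERT INTO household VALUES ('h1', 'Kpando')")
conn.executemany(
    "INSERT INTO member VALUES (?, ?, ?)",
    [("m1", "h1", "Ama"), ("m2", "h1", "Kofi")],
)

# A join reassembles the nested view when it is needed for display.
rows = conn.execute(
    """
    SELECT h.village, m.name
    FROM household h JOIN member m ON m.household_id = h.id
    ORDER BY m.name
    """
).fetchall()
print(rows)
```

Each member can be revised or queried as its own row, without rewriting the parent record.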

-------------

It would seem that Enketo would be the natural reference implementation for anyone developing a JSON or Javascript-based layering on top of the ODK 1.x XML.

W.r.t. limits of ODK Collect and its ability to restrict valid input, this,
again, depends upon the community's willingness, abilities, and interest in
contributing to the common code base.

I can't really fault you on this, as this is exactly why we designed ODK
2.0 to use HTML and Javascript - so it would be easier for survey creators
to customize the prompts and behaviors. If anything, it confirms that this
design decision is the correct one for the ODK 2.0 tools.

On Sat, Jul 23, 2016 at 12:30 AM, don.fizachi@gmail.com wrote:

> [...]

--
Mitch Sundt
Software Engineer
University of Washington
mitchellsundt@gmail.com