External Data proposal (XForm Spec)

Hi ODK XForm enthusiasts,

I would like to propose tweaking the way external data is added to ODK
XForms and the way dynamic choice lists are generated from external data.

As far as I can see, these changes don’t need to break existing forms in ODK
Collect. They build on the valuable work done by SurveyCTO to provide this
very useful functionality.

  1. Adding External Data

In the currently supported method, the XForm doesn’t specify the source of
the external data, and that data cannot be queried using XPath, the logic
language of our beloved form format.

We may want to store (large) external data in a database, use the CSV
format, and query the client database directly for performance reasons.
This proposal allows a client to keep doing that. It doesn’t describe the
underlying implementation, just the XForm format.

This proposal is also nothing new. It simply adds a CSV source to this
existing spec by Dimagi:
https://bitbucket.org/javarosa/javarosa/wiki/externalinstances, because
this specification is elegant, extensible and in line with existing ODK
XForm functionality.

Adding a CSV file to a form could be done by simply adding a secondary
instance element with a src attribute:

<instance id="fruits" src="jr://file-csv/fruits.csv" />

The jr://file-csv/ prefix indicates that:

  1. the location of the external resource is listed in the xformsManifest
    (similar to the currently used jr://images prefix for media files)
  2. the data has the CSV format
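As a rough illustration of the prefix rule, the Python sketch below shows how a client might turn a secondary instance's src attribute into a data format and a local file. It assumes the xformsManifest has already been downloaded and reduced to a dict from filename to local path; `resolve_external_src` and the `FORMAT_BY_SCHEME` table are hypothetical names, not JavaRosa or Collect APIs.

```python
from typing import Dict, Tuple

# Hypothetical scheme -> format table; the proposal only defines file-csv.
FORMAT_BY_SCHEME = {"file-csv": "csv"}


def resolve_external_src(src: str, manifest: Dict[str, str]) -> Tuple[str, str]:
    """Return (data_format, local_path) for a secondary instance's src.

    `manifest` is assumed to map each filename listed in the xformsManifest
    to the local path where the client stored the downloaded file.
    """
    prefix = "jr://"
    if not src.startswith(prefix):
        raise ValueError(f"not a jr:// URI: {src!r}")
    # "jr://file-csv/fruits.csv" -> scheme "file-csv", filename "fruits.csv"
    scheme, _, filename = src[len(prefix):].partition("/")
    if scheme not in FORMAT_BY_SCHEME:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    if not filename or filename not in manifest:
        raise LookupError(f"{filename!r} is not listed in the manifest")
    return FORMAT_BY_SCHEME[scheme], manifest[filename]
```

Keeping the scheme-to-format lookup in one table is what would let other formats be added later without touching the resolution logic.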

It is assumed that this (virtually) creates an XML instance in the client.
E.g. jr://file-csv/fruits.csv is interpreted as:

<item>
  <value>mango</value>
  <label>Mango</label>
</item>
…
<item>
  <value>papaya</value>
  <label>Papaya</label>
</item>

If a 'sortby' column is present, the items will be sorted by it, as is
currently the case.
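The "virtual instance" idea can be sketched in a few lines of Python with the standard csv and ElementTree modules. Assumptions, none of which the proposal fixes: the first CSV row is the header, the instance's root element is called `root`, every column (including sortby) becomes a child of `<item>`, and sortby values are compared as numbers.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_instance(csv_text: str, root_tag: str = "root") -> ET.Element:
    """Build a virtual secondary instance from CSV text.

    Every data row becomes an <item>; every column becomes a child element
    named after its header. If a 'sortby' column exists, items are ordered
    by it, compared as numbers (an assumption; the proposal doesn't say).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    if "sortby" in (reader.fieldnames or []):
        rows.sort(key=lambda row: float(row["sortby"]))
    root = ET.Element(root_tag)
    for row in rows:
        item = ET.SubElement(root, "item")
        for column, value in row.items():
            ET.SubElement(item, column).text = value
    return root


fruits = csv_to_instance("value,label,sortby\npapaya,Papaya,2\nmango,Mango,1\n")
```

Without a sortby column, the sketch simply keeps the CSV row order.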

The instance can be queried with standard XPath, like:

instance('fruits')/item[value='papaya']/label => Papaya

or with the existing pulldata() function:

pulldata('fruits', 'label', 'value', 'papaya') => Papaya
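For a client that materializes the instance as XML, both queries resolve to the same lookup. This sketch uses ElementTree's limited XPath support, which does handle the `[tag='text']` predicate used above; the `instances` registry and this `pulldata` helper are hypothetical stand-ins for whatever a real client uses, and returning an empty string on no match is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical registry of secondary instances, keyed by instance id.
FRUITS_XML = """<root>
  <item><value>mango</value><label>Mango</label></item>
  <item><value>papaya</value><label>Papaya</label></item>
</root>"""
instances: Dict[str, ET.Element] = {"fruits": ET.fromstring(FRUITS_XML)}


def pulldata(instance_id: str, return_column: str,
             key_column: str, key_value: str) -> str:
    """Emulate pulldata() against a secondary instance.

    Same result as instance(instance_id)/item[key_column=key_value]/return_column
    for the first matching item; "" when nothing matches (assumed behaviour).
    """
    for item in instances[instance_id].findall("item"):
        if item.findtext(key_column) == key_value:
            return item.findtext(return_column, default="")
    return ""


# The XPath form, via ElementTree's [tag='text'] predicate support:
xpath_label = instances["fruits"].findtext("item[value='papaya']/label")
```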

Changes in XLSForm:

We’d need a way to output the external source. Some options:

the type "select_one external fruits.csv" outputs the instance with
src="jr://file-csv/fruits.csv" and id="fruits" in the XForm. It looks at
the .csv extension to pick the correct src prefix (because in the future we
may want to add other formats). @Mitch, is this perhaps the hint at the end
of this blog post
(http://opendatakit.org/2014/08/odk-v1-4-4-tools-now-available/)?

or, add an 'external data' column, e.g. in the settings sheet, that can
contain multiple sources (i.e. filenames with the .csv extension in this
case).
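The first XLSForm option might look roughly like the Python sketch below. It is not pyxform code: `instance_for_type` and the extension table are hypothetical. It derives the instance id from the filename without its extension and picks the src prefix from the extension, as described in that option.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical extension -> src prefix table; only CSV is proposed so far.
SRC_PREFIX_BY_EXTENSION = {".csv": "jr://file-csv/"}


def instance_for_type(type_cell: str) -> ET.Element:
    """Turn the XLSForm type 'select_one external fruits.csv' into a
    secondary <instance id="fruits" src="jr://file-csv/fruits.csv"/>."""
    parts = type_cell.split()
    if len(parts) != 3 or parts[:2] != ["select_one", "external"]:
        raise ValueError(f"not an external select_one type: {type_cell!r}")
    filename = parts[2]
    stem, extension = os.path.splitext(filename)
    prefix = SRC_PREFIX_BY_EXTENSION.get(extension.lower())
    if prefix is None:
        raise ValueError(f"unsupported external data format: {extension!r}")
    # The id is the filename stem; the src keeps the full filename.
    return ET.Element("instance", id=stem, src=prefix + filename)
```

Serializing the result with `ET.tostring(..., encoding="unicode")` would give the instance element shown earlier in this post.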

Existing forms using pulldata() can remain fully functional. It’s up to the
client to deal with the missing secondary instance in old, non-compliant
XForms for backwards compatibility’s sake. Adopting this new approach simply
commits us to requiring the new method for new XForms, and to making new
XForms work as described in the new specification.
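One way a client could stay backwards compatible is sketched below: prefer a declared secondary instance, and for old XForms that lack one, fall back to looking for `<id>.csv` among the form's media files, roughly how pulldata() finds its file today. The function and both dictionaries are hypothetical.

```python
from typing import Dict, Optional


def find_data_source(instance_id: str,
                     declared_instances: Dict[str, str],
                     media_files: Dict[str, str]) -> Optional[str]:
    """Return the local path of the data backing instance_id, or None.

    declared_instances: instance id -> resolved path of <instance src="...">.
    media_files: filename -> local path for files attached to the form.
    """
    if instance_id in declared_instances:
        # New-style XForm: the form itself declares the source.
        return declared_instances[instance_id]
    # Old-style XForm: no secondary instance, so look for "<id>.csv".
    return media_files.get(f"{instance_id}.csv")
```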

The one thing you cannot do with the new method is mix static and external
choices.

  2. Creating dynamic select choice lists from external data

Implementing #1 will also mean that users will be able to use the existing
choice-filter (in XLSForm terminology) to create nodesets from external
data. It’s up to the client whether to convert this internally to a db
query or to handle it in pure XPath on XML. For users, this means the
difference between external and internal choice lists is minimal: cascading
selects, for example, are created in exactly the same way for both.
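To show that the client is free to choose either strategy, this sketch evaluates the same cascading filter, roughly `instance('cities')/item[state=/data/state]`, once by walking an XML instance and once as a parameterized SQLite query. The cities data and the `/data/state` answer path are made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import List

# Made-up cascade data: each city carries the state it belongs to.
CITIES_XML = """<root>
  <item><value>sf</value><label>San Francisco</label><state>ca</state></item>
  <item><value>la</value><label>Los Angeles</label><state>ca</state></item>
  <item><value>sea</value><label>Seattle</label><state>wa</state></item>
</root>"""


def filter_xml(root: ET.Element, state: str) -> List[str]:
    """XPath-on-XML style: values of item[state=/data/state]."""
    return [item.findtext("value") for item in root.findall("item")
            if item.findtext("state") == state]


def filter_sqlite(conn: sqlite3.Connection, state: str) -> List[str]:
    """Database style: the same predicate as a parameterized query."""
    rows = conn.execute(
        "SELECT value FROM cities WHERE state = ? ORDER BY rowid", (state,))
    return [value for (value,) in rows]


# Load the same items into SQLite so both strategies can be compared.
root = ET.fromstring(CITIES_XML)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (value TEXT, label TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [(item.findtext("value"), item.findtext("label"), item.findtext("state"))
     for item in root.findall("item")])
```

From the form author's point of view, only the predicate matters; which of the two functions runs is a client implementation detail.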

For historical reasons, I think we could use the current search() function
inside a predicate (choice-filter), where the first parameter would become
obsolete.

However, I’d much prefer using the following separate (and much simplified,
more user-friendly) XPath functions instead (two of these, starts-with()
and contains(), are native XPath 1.0 functions; ends-with() is not):

instance('fruits')/item[starts-with(value, 'p')]

instance('fruits')/item[ends-with(value, 'a')]

instance('fruits')/item[contains(value, 'y')]
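The three predicates can be mimicked over sample items in plain Python to show what each one selects. The sample list (including "banana") is made up, and the mapping from XPath function name to Python string test is only an illustration, not how a client would have to implement them.

```python
from typing import Callable, Dict, List, Tuple

Item = Tuple[str, str]  # (value, label)

# Made-up sample data in the shape of the fruits instance.
FRUITS: List[Item] = [("mango", "Mango"), ("papaya", "Papaya"), ("banana", "Banana")]

# Each entry mirrors one predicate, e.g. item[starts-with(value, needle)].
PREDICATES: Dict[str, Callable[[str, str], bool]] = {
    "starts-with": str.startswith,
    "ends-with": str.endswith,
    "contains": lambda haystack, needle: needle in haystack,
}


def filter_items(items: List[Item], function: str, needle: str) -> List[str]:
    """Return the values of items whose value passes the named predicate."""
    test = PREDICATES[function]
    return [value for value, _label in items if test(value, needle)]
```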

In XLSForm:

What do you think?

Martijn

Hi Martijn,

Pretty sure this already exists with fast/external itemsets. Have you
taken a look at that feature? If so, it'd be good to understand the
benefit of your approach.

https://code.google.com/p/opendatakit/source/detail?r=03976f9012b55556861a0a28b555a65b7ba60f3f&repo=collect

https://github.com/SEL-Columbia/pyxform/pull/120

Yaw

--
Need ODK services? http://nafundi.com provides form design, server setup, professional support, and software development for ODK.

On Fri, Sep 5, 2014 at 12:01 PM, Martijn van de Rijdt martijn@enketo.org wrote:

--
You received this message because you are subscribed to the Google Groups "ODK Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to opendatakit-developers+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hi Martijn,

This looks neat. Thanks for putting in the work on it.

As Yaw suggests, it sounds similar to Nafundi's "fast itemset"
contribution. We hadn't made use of that ourselves because our users author
in Excel or Google Drive, and because we lacked the python expertise to
make the relatively significant pyxform changes necessary to make it
accessible. We try to minimize changes in Collect, JavaRosa, and pyxform,
but our efforts to minimize changes to pyxform are particularly extreme
given that we're a Java development team. Nathan's pyxform contribution,
though, would seem to bring fast itemsets into the pyxform fold, which is
fantastic (this is the contribution linked by Yaw in this thread).

My primary concerns with this spec are practical in nature:

1. Performance. JavaRosa's handling of XForm-based choice lists (and
filtering) performs such that forms on fast, modern devices cannot
practically exceed 1,000-2,000 choices in total. Since a great many of our
users exceed those limits, we needed something that would perform
significantly better. Obviously, Nafundi's contribution grew from the same
need, and they hewed much closer to the XForm spirit (and maybe even the
specification). Still, I wonder if meeting all of the elements of your
proposed specification (namely the ability to reference and filter via
XPath expressions) would lead to an in-memory representation that would
suffer the same practical limits that created the need for something
faster in the first place. Perhaps some kind of SQLite-based (as opposed to memory-based)
method of storing and accessing items can be devised that meets the various
requirements you propose ("artificial instances" in the Dimagi spec's
language) -- but I'm guessing that it would be pretty hard. (I could be
wrong, though. Yaw? Maybe Nafundi's fast-itemset implementation already
meets the key requirements while still performing well with many thousands
of choice items?)

2. Cost. Socially, it's important that the benefits of various approaches
exceed the costs; and practically, it's important that somebody (or some
people) be willing to bear the cost of implementation. If Nafundi's
fast-itemset implementation is already very close to meeting these proposed
requirements, then the cost of implementing your proposal may be relatively
low. That would be great. If the cost becomes high, though, would somebody
fundraise to support the implementation? (Please forgive my
open-source ignorance here.)
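On the performance question, the SQLite-based (rather than memory-based) direction Chris mentions can at least be sketched: stream the CSV into an indexed table once, then fetch only the rows that match the current answer, so the full choice list never has to sit in memory. Table, column and index names are assumptions, and column names taken from the CSV header are assumed trusted because SQL identifiers cannot be bound as parameters. Whether this fits JavaRosa's model is exactly the open question.

```python
import csv
import io
import sqlite3
from typing import Iterable, List, Tuple


def load_choices(conn: sqlite3.Connection, csv_file: Iterable[str],
                 filter_column: str) -> None:
    """Stream CSV rows into a 'choices' table and index the filter column."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f"CREATE TABLE choices ({columns})")
    placeholders = ", ".join("?" for _ in header)
    # executemany consumes the reader row by row instead of building a list.
    conn.executemany(f"INSERT INTO choices VALUES ({placeholders})", reader)
    conn.execute(f'CREATE INDEX choices_filter ON choices ("{filter_column}")')


def choices_for(conn: sqlite3.Connection, filter_column: str,
                answer: str, limit: int = 50) -> List[Tuple[str, str]]:
    """Fetch at most `limit` (value, label) rows matching the current answer."""
    sql = (f'SELECT "value", "label" FROM choices WHERE "{filter_column}" = ? '
           "ORDER BY rowid LIMIT ?")
    return conn.execute(sql, (answer, limit)).fetchall()


conn = sqlite3.connect(":memory:")
load_choices(conn, io.StringIO(
    "value,label,state\nsf,San Francisco,ca\nla,Los Angeles,ca\nsea,Seattle,wa\n"),
    "state")
```

With a real file, `open("cities.csv", newline="")` would take the place of the `io.StringIO` used here.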

My secondary concerns regard the user experience:

  1. It would be a bit of a shame to lose the ability to mix static and
    dynamic choices. Lots of SurveyCTO users use this, and the obvious
    work-around (include all static choices in the dynamic .csv file) is
    awkward for a great many cases. For example, say you pull a list of people
    from a .csv file and then want to include a "Don't know" or "Not listed"
    option: you could obviously add those static options to the .csv, but it
    would mean that you couldn't just directly dump your list of people from a
    database or from another form... so the experience for the user would
    become much more complicated. I dare say that the potential user-experience
    pay-off might warrant devising a method to mix static and dynamic options.

  2. It's not clear to me how to easily match or beat pulldata()'s
    ease-of-use (such as it is!) in the pyxform context. Mitch has proposed
    various extensions to the (perhaps over-)simplistic ${fieldname}
    referencing scheme, and it seems that we'd need to work referencing of
    external instances into one of those extensions. That should be possible,
    but it might involve (much) greater surgery to pyxform. Somehow, it needs
    to be easy for spreadsheet-based users to grab a cell from one of these
    external data sources. (Or pulldata() remains the go-to method for
    spreadsheet-based users... in which case it ought to be documented as part
    of the spec.)
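For the first secondary concern, a client could in principle merge form-defined static options with CSV-loaded dynamic ones, so an exported list of people never needs editing. In this hypothetical sketch, static options are appended after the dynamic ones and skipped if their value already appears in the CSV; both behaviours are assumptions, not part of any spec.

```python
import csv
import io
from typing import List, Tuple

Choice = Tuple[str, str]  # (value, label)


def merged_choices(csv_text: str, static: List[Choice]) -> List[Choice]:
    """Dynamic choices from the CSV, followed by static choices from the form.

    Static options whose value collides with a CSV value are skipped so the
    saved answer stays unambiguous.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    dynamic = [(row["value"], row["label"]) for row in reader]
    seen = {value for value, _label in dynamic}
    return dynamic + [choice for choice in static if choice[0] not in seen]
```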

Thanks again,

Chris





(I tried to post this before, but it probably got caught in the spam filter.)

Thanks Yaw,

Those links are quite interesting and totally new to me (they explain the
note in the last ODK blog post). I'm a bit puzzled about what's going on in
that pyxform pull request because of the markup it creates. I see that the
select_one external example form
(https://github.com/SEL-Columbia/pyxform/blob/master/pyxform/tests/example_xls/select_one_external.xlsx)
produces this output (in Ona). Note the brand-new query attribute, and the
input element with type=string instead of a select1:

<input query="instance('counties')/root/item[state=/select_one_external/state]" ref="/select_one_external/county">

If anybody can shed light on what's going on here (preferably before this
becomes functional), that would be great.

The proposal I posted aims to address these technical XForm issues with the
current external data functionality using csv files:

  1. the data source is not defined in the form format anywhere
  2. the external data cannot be queried/accessed with XPath like other
    data in our XForm
  3. pulldata() is not an XPath function; it queries data that is not
    present in the XForm.
  4. search() is not an XPath function
  5. search() is used as an appearance
  6. the way search() works with defining the column names as item/label
    and item/value (textContent) is too much of a hack
  7. the search() function's impressive flexibility makes it quite
    complex; I think it’s better to split this up into separate functions (this
    choice was probably related to using appearance)
  8. creating choice lists from external and internal data is done in a
    completely different way - one uses a search() function inside an
    appearance, the other a choice-filter (XPath predicate)

Please let me know if any of these issues are incorrect or if anybody
thinks there is a better way to address them to ensure the long-term health
of our beloved XForm format.

Cheers, Martijn


Martijn,

Chris' point about performance and cost is why external itemsets are built
the way they are. JavaRosa requires that select one options be loaded into
memory. Further, you can't save a select one option that did not come from
one of those loaded in memory.

These constraints are why external itemsets overload text input (
https://github.com/SEL-Columbia/pyxform/pull/120) and why autocomplete was
removed from Collect (
https://code.google.com/p/opendatakit/issues/detail?id=289).

Agreed that none of this is ideal. The solution is to rewrite the core, but
that will require a lot of effort without a lot of visible benefit to the
user.

Yaw

--
Need ODK services? http://nafundi.com provides form design, server setup,
professional support, and software development for ODK.

On Mon, Sep 8, 2014 at 11:02 AM, Christopher Robert crobert@surveycto.com wrote:

Hi Martijn,

This looks neat. Thanks for putting in the work on it.

As Yaw suggests, it sounds similar to Nafundi's "fast itemset"
contribution. We hadn't made use of that ourselves because our users author
in Excel or Google Drive, and because we lacked the Python expertise to
make the relatively significant pyxform changes necessary to make it
accessible. We try to minimize changes in Collect, JavaRosa, and pyxform,
but our efforts to minimize changes to pyxform are particularly extreme
given that we're a Java development team. Nathan's pyxform contribution,
though, would seem to bring fast itemsets into the pyxform fold, which is
fantastic (this is the contribution linked by Yaw in this thread).

My primary concerns with this spec are practical in nature:

1. Performance. JavaRosa's handling of XForm-based choice lists (and
filtering) performs such that forms on fast, modern devices cannot
practically exceed 1,000-2,000 choices in total. Since a great many of our
users exceed those limits, we needed something that would perform
significantly better. Obviously, Nafundi's contribution grew from the same
need, and they hewed much closer to the XForm spirit (and maybe even the
specification). Still, I wonder if meeting all of the elements of your
proposed specification (namely the ability to reference and filter via
XPath expressions) would lead to an in-memory representation that would
suffer the same practical limits that created the need for something
faster. Perhaps some kind of SQLite-based (as opposed to memory-based)
method of storing and accessing items can be devised that meets the various
requirements you propose ("artificial instances" in the Dimagi spec's
language) -- but I'm guessing that it would be pretty hard. (I could be
wrong, though. Yaw? Maybe Nafundi's fast-itemset implementation already
meets the key requirements while still performing well with many thousands
of choice items?)

2. Cost. Socially, it's important that the benefits of various
approaches exceed the costs; and practically, it's important that somebody
(or some people) be willing to bear the cost of implementation. If
Nafundi's fast-itemset implementation is already very close to meeting
these proposed requirements, then the cost of implementing your proposal
may be relatively low. That would be great. If the cost becomes high,
though, would somebody then need to fundraise to support the implementation?
(Please forgive my open-source ignorance here.)

My secondary concerns regard the user experience:

  1. It would be a bit of a shame to lose the ability to mix static and
    dynamic choices. Lots of SurveyCTO users use this, and the obvious
    work-around (include all static choices in the dynamic .csv file) is
    awkward for a great many cases. For example, say you pull a list of people
    from a .csv file and then want to include a "Don't know" or "Not listed"
    option: you could obviously add those static options to the .csv, but it
    would mean that you couldn't just directly dump your list of people from a
    database or from another form... so the experience for the user would
    become much more complicated. I dare say that the potential user-experience
    pay-off might warrant devising a method to mix static and dynamic options.

  2. It's not clear to me how to easily match or beat pulldata()'s
    ease-of-use (such as it is!) in the pyxform context. Mitch has proposed
    various extensions to the (perhaps over-)simplistic ${fieldname}
    referencing scheme, and it seems that we'd need to work referencing of
    external instances into one of those extensions. That should be possible,
    but it might involve (much) greater surgery to pyxform. Somehow, it needs
    to be easy for spreadsheet-based users to grab a cell from one of these
    external data sources. (Or pulldata() remains the go-to method for
    spreadsheet-based users... in which case it ought to be documented as part
    of the spec.)

Thanks again,

Chris

On Fri, Sep 5, 2014 at 12:01 PM, Martijn van de Rijdt martijn@enketo.org wrote:

Hi ODK XForm enthusiasts,

I would like to propose to tweak the way external data is added to ODK
XForms and the way dynamic choice lists are generated from external data.

These changes will not need to break existing forms in ODK Collect as far
as I can see. They build on the valuable work done by SurveyCTO to provide
this very useful functionality.

  1. Adding External Data

In the currently supported method, the XForm doesn’t provide the source
of the external data and it cannot be queried using XPath, the logic
language of our beloved form format.

We may want to store (large) external data in a database, use the CSV
format, and query the client database directly for performance reasons.
This proposal allows a client to keep doing that. It doesn’t describe the
underlying implementation, just the XForm format.

This proposal is also nothing new. It simply adds a CSV source to this
existing spec by Dimagi,
https://bitbucket.org/javarosa/javarosa/wiki/externalinstances, because
this specification is elegant, extensible and in line with existing ODK
XForm functionality.

Adding a CSV file to a form could be done by simply adding a secondary
instance element with a src attribute:

    <instance id="fruits" src="jr://file-csv/fruits.csv"/>

The jr://file-csv/ prefix indicates that:

  • the location of the external resource is listed in the xformsManifest
    (similar to the currently used jr://images prefix for media files)
  • the data has the CSV format
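A minimal sketch of how a client might resolve such a src value, assuming the xformsManifest has already been parsed into a dict that maps each media filename to a download location. The names `resolve_external_src` and `FORMAT_BY_HOST`, and the example URL, are hypothetical; the proposal only fixes the meaning of the prefix.

```python
from urllib.parse import urlparse

# Hypothetical mapping from the jr:// "host" part to a data format.
# Only file-csv is proposed; other formats could be added later.
FORMAT_BY_HOST = {"file-csv": "csv"}


def resolve_external_src(src, manifest):
    """Return (data_format, location) for a secondary instance src.

    `manifest` is assumed to map media filenames listed in the
    xformsManifest to the location the client can fetch them from.
    """
    parsed = urlparse(src)  # jr://file-csv/fruits.csv -> netloc 'file-csv', path '/fruits.csv'
    if parsed.scheme != "jr":
        raise ValueError(f"not a jr:// URI: {src}")
    data_format = FORMAT_BY_HOST.get(parsed.netloc)
    if data_format is None:
        raise ValueError(f"unsupported external data prefix: {parsed.netloc}")
    filename = parsed.path.lstrip("/")
    if filename not in manifest:
        raise LookupError(f"{filename} is not listed in the manifest")
    return data_format, manifest[filename]


manifest = {"fruits.csv": "https://example.org/media/fruits.csv"}
print(resolve_external_src("jr://file-csv/fruits.csv", manifest))
# ('csv', 'https://example.org/media/fruits.csv')
```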

It is assumed that this (virtually) creates an XML instance in the
client. E.g. jr://file-csv/fruits.csv

is interpreted as:

    <item>
        <value>mango</value>
        <label>Mango</label>
    </item>
    …
    <item>
        <value>papaya</value>
        <label>Papaya</label>
    </item>

If a ‘sortby’ column is present, the items will be sorted, as is the case
currently.
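A sketch of that (virtual) CSV-to-instance mapping in Python, assuming each CSV column becomes a child element of `<item>` named after its header, the instance root is called `root` (the proposal doesn't name it), and `sortby` holds numeric values. `csv_to_instance` is a hypothetical helper, not an existing ODK function.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_instance(csv_text, root_tag="root"):
    """Build the (virtual) XML instance for a CSV file.

    Each row becomes an <item>; each column becomes a child element named
    after the column header. If a 'sortby' column exists, rows are ordered
    by it (assumed numeric in this sketch; the sort is stable).
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if rows and "sortby" in rows[0]:
        rows.sort(key=lambda row: float(row["sortby"]))
    root = ET.Element(root_tag)
    for row in rows:
        item = ET.SubElement(root, "item")
        for column, text in row.items():
            ET.SubElement(item, column).text = text
    return root


fruits_csv = "value,label,sortby\npapaya,Papaya,2\nmango,Mango,1\n"
print(ET.tostring(csv_to_instance(fruits_csv), encoding="unicode"))
# <root><item><value>mango</value><label>Mango</label><sortby>1</sortby></item><item><value>papaya</value><label>Papaya</label><sortby>2</sortby></item></root>
```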

The instance can be queried with normal, supported XPath, e.g.:

    instance('fruits')/item[value='papaya']/label => Papaya

or with the existing pulldata() function:

    pulldata('fruits', 'label', 'value', 'papaya') => Papaya
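To illustrate that the two forms are equivalent, here is a rough sketch using Python's ElementTree, whose limited XPath happens to support the `[child='text']` predicate used above. `instance()` and `pulldata()` here are hypothetical stand-ins for the XForm functions, and returning an empty string when nothing matches is an assumption.

```python
import xml.etree.ElementTree as ET

# Hand-written copy of the virtual 'fruits' instance described above.
FRUITS = ET.fromstring(
    "<root>"
    "<item><value>mango</value><label>Mango</label></item>"
    "<item><value>papaya</value><label>Papaya</label></item>"
    "</root>"
)
INSTANCES = {"fruits": FRUITS}


def instance(instance_id):
    """Stand-in for XForms instance(): return the instance's root element."""
    return INSTANCES[instance_id]


def pulldata(instance_id, column, key_column, key_value):
    """pulldata() expressed on top of the instance: the `column` text of the
    first item whose `key_column` equals `key_value`, or '' if none does."""
    for item in instance(instance_id).findall("item"):
        if item.findtext(key_column) == key_value:
            return item.findtext(column, default="")
    return ""


print(instance("fruits").find("item[value='papaya']/label").text)  # Papaya
print(pulldata("fruits", "label", "value", "papaya"))               # Papaya
```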

Changes in XLSForm:

We’d need a way to output the external source. Some options:

  • the type "select_one external fruits.csv" outputs the instance with
    src="jr://file-csv/fruits.csv" and id="fruits" in the XForm. It looks at
    the csv extension to give the correct src prefix (because in the future we
    may want to add other formats). @Mitch, is this perhaps what you were
    hinting at near the end of this blog post?
    http://opendatakit.org/2014/08/odk-v1-4-4-tools-now-available/
  • or, add an 'external data' column, e.g. in the settings sheet, that can
    contain multiple sources (i.e. filenames with the .csv extension in this
    case).
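The first option could work roughly like this; a sketch assuming the type column always has the shape `select_one external <file>` and that the extension-to-prefix table only knows `.csv` for now. `external_instance_for` and `SRC_PREFIX_BY_EXTENSION` are hypothetical names, not existing pyxform code.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical extension -> jr:// prefix table; only CSV is proposed so far.
SRC_PREFIX_BY_EXTENSION = {".csv": "jr://file-csv/"}


def external_instance_for(type_column):
    """Turn an XLSForm type such as 'select_one external fruits.csv' into
    the secondary <instance> element the XForm would contain."""
    parts = type_column.split()
    if len(parts) != 3 or parts[1] != "external":
        raise ValueError(f"not an external select: {type_column!r}")
    filename = parts[2]
    instance_id, extension = os.path.splitext(filename)  # 'fruits', '.csv'
    prefix = SRC_PREFIX_BY_EXTENSION.get(extension.lower())
    if prefix is None:
        raise ValueError(f"no src prefix known for {extension!r} files")
    return ET.Element("instance", {"id": instance_id, "src": prefix + filename})


element = external_instance_for("select_one external fruits.csv")
print(ET.tostring(element, encoding="unicode"))
# <instance id="fruits" src="jr://file-csv/fruits.csv" />
```

Keying the prefix off the extension keeps the door open for other formats without changing the XLSForm syntax.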

Existing forms using pulldata() can remain fully functional. It’s up to
the client to deal with the missing instance declaration in old,
non-compliant XForms for backwards compatibility’s sake. The commitment with
adopting this new approach is simply to start requiring the new method for
new XForms, and for new XForms to work as described in the new specification.
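One way a client could deal with old forms, sketched under the assumption that it keeps the current pulldata() convention, where `pulldata('fruits', ...)` reads a media file called fruits.csv. The helper scans expressions for pulldata() sources that have no declared instance and infers one; `implied_instances` and the regex are hypothetical, and a real client would parse the XPath rather than pattern-match it.

```python
import re

# Matches the first (source) argument of pulldata('name', ...) or pulldata("name", ...).
PULLDATA_SOURCE = re.compile(r"""pulldata\(\s*['"]([^'"]+)['"]""")


def implied_instances(expressions, declared_ids):
    """For a legacy form, map each pulldata() source that has no declared
    secondary instance to the src it would have under the new approach."""
    implied = {}
    for expression in expressions:
        for source in PULLDATA_SOURCE.findall(expression):
            if source not in declared_ids:
                implied[source] = f"jr://file-csv/{source}.csv"
    return implied


legacy = ["pulldata('fruits', 'label', 'value', ${fruit})"]
print(implied_instances(legacy, declared_ids=set()))
# {'fruits': 'jr://file-csv/fruits.csv'}
```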

The one thing you cannot do with the new method is mix static and
external choices.

  2. Creating dynamic select choice lists from external data

Implementing #1 will also mean that users will be able to use the
existing choice-filter (in XLSForm terminology) to create nodesets from
external data. It’s up to the client whether to convert this internally to
a db query or to handle it in pure XPath on XML. For users, this will
mean that the difference between external and internal choice lists is
minimal. Creating e.g. cascade logic will be done in exactly the same
way for both.
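To make the "db query or pure XPath" choice concrete, here is a sketch that evaluates the same cascading choice filter both ways. The cities data and its `country` column are hypothetical, SQLite (via Python's sqlite3) is only one possible store, and the XPath predicate is built with string formatting purely for brevity.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical external file for a cascading select: cities filtered by country.
CITIES_CSV = """value,label,country
ams,Amsterdam,nl
rtm,Rotterdam,nl
nbo,Nairobi,ke
"""


def filter_with_xpath(csv_text, country):
    """Evaluate item[country='<country>'] on the virtual XML instance."""
    root = ET.Element("root")
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = ET.SubElement(root, "item")
        for column, text in row.items():
            ET.SubElement(item, column).text = text
    return [item.findtext("value") for item in root.findall(f"item[country='{country}']")]


def filter_with_sqlite(csv_text, country):
    """The same choice filter, translated by the client into a SQL query."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    db = sqlite3.connect(":memory:")
    try:
        db.execute("CREATE TABLE cities (value TEXT, label TEXT, country TEXT)")
        db.executemany("INSERT INTO cities VALUES (:value, :label, :country)", rows)
        query = "SELECT value FROM cities WHERE country = ? ORDER BY rowid"
        return [value for (value,) in db.execute(query, (country,))]
    finally:
        db.close()


print(filter_with_xpath(CITIES_CSV, "nl"))   # ['ams', 'rtm']
print(filter_with_sqlite(CITIES_CSV, "nl"))  # ['ams', 'rtm']
```

Either way the form author writes the same filter; only the client's evaluation strategy differs.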

For historical reasons, I think we could use the current search()
function inside a predicate (choice-filter), where the first parameter
would become obsolete.

However, I’d much prefer using the following separate (and much
simplified, more user-friendly) XPath functions instead (starts-with() and
contains() are native XPath 1.0 functions; ends-with() is not):

    instance('fruits')/item[starts-with(value, 'p')]

    instance('fruits')/item[ends-with(value, 'a')]

    instance('fruits')/item[contains(value, 'y')]
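The intended semantics of these three predicates can be sketched with plain Python string methods; the keys mirror the XPath function names (starts-with() and contains() come from XPath 1.0, ends-with() from XPath 2.0), while `filter_items` and the sample list are hypothetical.

```python
# XPath string predicates from the proposal, expressed as Python predicates.
PREDICATES = {
    "starts-with": lambda value, arg: value.startswith(arg),
    "ends-with": lambda value, arg: value.endswith(arg),
    "contains": lambda value, arg: arg in value,
}


def filter_items(items, function, arg):
    """Return the items whose 'value' satisfies function(value, arg)."""
    test = PREDICATES[function]
    return [item for item in items if test(item["value"], arg)]


fruits = [{"value": v} for v in ("mango", "papaya", "pear", "banana")]
print([i["value"] for i in filter_items(fruits, "starts-with", "p")])  # ['papaya', 'pear']
print([i["value"] for i in filter_items(fruits, "ends-with", "a")])    # ['papaya', 'banana']
print([i["value"] for i in filter_items(fruits, "contains", "y")])     # ['papaya']
```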

In XLSForm:

What do you think?

Martijn

--
You received this message because you are subscribed to the Google Groups
"ODK Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to opendatakit-developers+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Thanks Chris, Yaw,

I see some real opportunity here to agree on a common and solid approach
for external data. I'd like to jump in on the performance issue first.

After the explanation, I imagine fast itemsets work something like this
(please correct me if I'm wrong):

if select(1) with itemset
    parse as regular itemset
if input with type=text and query attribute
    parse as fast itemset

If the above is more or less the case, would it be possible to change this
to:

if select(1) with itemset
    if external data from csv
        parse as fast itemset
    else
        parse as regular itemset
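The proposed routing could be sketched as follows, assuming a simplified, hypothetical `Control` record that knows its element name and the src of the instance its itemset points to (neither name comes from JavaRosa or Collect).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Control:
    """Hypothetical, simplified view of a parsed form control."""
    tag: str                                     # e.g. "select1", "select", "input"
    itemset_instance_src: Optional[str] = None   # src of the instance its itemset reads


def itemset_strategy(control):
    """Proposed routing: selects stay selects, and only the itemset's data
    source decides between the fast (db-backed) and regular path."""
    if control.tag not in ("select", "select1"):
        return None  # not a select: no itemset handling at all
    src = control.itemset_instance_src
    if src is not None and src.startswith("jr://file-csv/"):
        return "fast"
    return "regular"


print(itemset_strategy(Control("select1", "jr://file-csv/fruits.csv")))  # fast
print(itemset_strategy(Control("select1")))                              # regular
```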

Advantages of the latter:

  • XForms syntax remains clean and correct, i.e. a select and select1 remain
    what they are, and so do itemsets
  • No need for additional query attribute

Cheers,
Martijn

P.S. If any Dimagi devs see this, it would be very useful if you could
share your experience on making large itemsets faster if you have done this.

On Thu, Sep 11, 2014 at 4:55 PM, Yaw Anokwa wrote:

--
You received this message because you are subscribed to a topic in the
Google Groups "ODK Developers" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/opendatakit-developers/9_VZoe7crVY/unsubscribe
.
To unsubscribe from this group and all its topics, send an email to
opendatakit-developers+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
Did you know that Enketo Smart Paper has now become the #1 tool for data
collection? Don't fall behind. Use it!

Enketo https://enketo.org
LinkedIn http://www.linkedin.com/company/enketo-llc
GitHub https://github.com/MartijnR
Twitter https://twitter.com/enketo

Hey Martijn,

You're close, but the handling of external vs. regular itemsets is more like this:

if select(1) with itemset
    let javarosa figure out what to display, then display Collect's select1
    widget with data from javarosa
else if input with type==text and query attribute
    don't let javarosa do anything. Display an external-itemset widget in
    Collect, handle all of the db lookups, etc., and only use javarosa to save
    the selected answer to the data model in memory.

The reason we used text instead of select(1) is that javarosa doesn't allow
saving values in a select1 node that are not included in that select(1)'s
list of values. Since we're doing all of the processing externally, the
select(1)'s list is empty, and will never save anything.

I believe javarosa's implementation is called a "closed" select1, and there
was some talk of adding the option of "open" select1s at some point, but I
don't think anything happened beyond discussion.
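A toy model of the "closed" vs. "open" select1 distinction described here. `Select1Answer` is hypothetical and does not reflect JavaRosa's actual classes; it only shows why a closed select1 whose in-memory list is empty can never save an answer, while an open one (like a text input) can.

```python
class Select1Answer:
    """Hypothetical model of a select1 node.

    closed=True mirrors the behaviour described above: only values from
    the choices loaded in memory can be saved. closed=False models the
    'open' select1 that was discussed but never implemented.
    """

    def __init__(self, choices, closed=True):
        self.choices = set(choices)
        self.closed = closed
        self.value = None

    def save(self, value):
        if self.closed and value not in self.choices:
            raise ValueError(f"{value!r} is not one of the loaded choices")
        self.value = value


# External itemset: all processing happens outside, so no in-memory choices.
closed_select = Select1Answer(choices=[])
try:
    closed_select.save("papaya")
except ValueError as err:
    print(err)  # 'papaya' is not one of the loaded choices

open_select = Select1Answer(choices=[], closed=False)
open_select.save("papaya")
print(open_select.value)  # papaya
```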

Hope that helps,
-Carl

On Friday, September 12, 2014 8:01:40 AM UTC-7, Martijn van de Rijdt wrote:

Hi all,

And the SurveyCTO-contributed "dynamic search-and-select" feature acts as a
kind of hybrid. It loads a dynamic list into memory so that JavaRosa will
respect it. But it only loads a filtered subset of the on-disk DB into
JavaRosa, for all of the performance reasons already discussed. If you end
up with an extremely long list presented to the user, it will still slow
down (and eat loads of memory), unlike the Nafundi-contributed "fast
itemset" implementation.

Best,

Chris

On Mon, Sep 15, 2014 at 11:48 AM, chartung wrote:

Hey Martijn,

You're close, but the external vs. regular itemsets is more like this:

if select(1) with itemset
    let javarosa figure out what to display, then display Collect's select1
    widget with data from javarosa
else if input with type==text and query attribute
    don't let javarosa do anything. Display an external-Itemset widget in
    Collect, handle all of the db lookups, etc... and only use javarosa to save
    the selected answer to the data model in memory.

The reason we used text instead of select(1) is that javarosa doesn't
allow saving values in a select1 node that are not included in that
select(1)'s list of values. Since we're doing all of the processing
externally, the select(1)'s list is empty, and will never save anything.

I believe javarosa's implementation is called a "closed" select1, and
there was some talk of adding the option of "open" select1s at some point,
but I don't think anything happened beyond discussion.
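To make the "closed" versus "open" select1 distinction concrete, here is a minimal Python sketch. It does not reflect JavaRosa's real classes: SelectOne, set_answer and is_open are hypothetical names that only model the behavior described above, where a closed select1 rejects any value missing from its in-memory choice list.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SelectOne:
    """Hypothetical model of a select1 question; not JavaRosa's actual API."""

    choices: list[str] = field(default_factory=list)  # values held in memory
    is_open: bool = False  # an "open" select1 also accepts unlisted values
    answer: str | None = None

    def set_answer(self, value: str) -> bool:
        """Store `value` and return True, or reject it and return False."""
        if not self.is_open and value not in self.choices:
            return False  # closed select1: only listed values can be saved
        self.answer = value
        return True


# An external itemset leaves the in-memory choice list empty, so a closed
# select1 can never save an answer...
closed = SelectOne(choices=[])
print(closed.set_answer("papaya"))  # False

# ...whereas an open select1 would accept the value picked from the database.
opened = SelectOne(choices=[], is_open=True)
print(opened.set_answer("papaya"))  # True
```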

Hope that helps,
-Carl

On Friday, September 12, 2014 8:01:40 AM UTC-7, Martijn van de Rijdt wrote:

Thanks Chris, Yaw,

I see some real opportunity here to agree on a common and solid approach
for external data. I'd like to jump in on the performance issue first.

After the explanation, I imagine fast itemsets work something like this
(please correct me if I'm wrong):

if select(1) with itemset
    parse as regular itemset
if input with type=text and query attribute
    parse as fast itemset

If the above is more or less the case, would it be possible to change
this to:

if select(1) with itemset
    if external data from csv
        parse as fast itemset
    else
        parse as regular itemset

Advantages of the latter:

  • XForms syntax remains clean and correct, i.e. a select and select1
    remain what they are, and so do itemsets
  • No need for additional query attribute
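The proposed rule could look roughly like the Python sketch below. It assumes, for illustration only, that a client chooses between the two code paths by checking whether an itemset's nodeset starts with instance('id') naming a secondary instance whose src begins with jr://file-csv/. The model fragment, csv_instance_ids and itemset_strategy are hypothetical, and the regex only inspects the first instance() call.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET

XF = "{http://www.w3.org/2002/xforms}"  # XForms namespace, ElementTree style

# Abbreviated, hypothetical model: one CSV-backed and one inline secondary instance.
MODEL = """
<model xmlns="http://www.w3.org/2002/xforms">
  <instance><data id="demo"><fruit/></data></instance>
  <instance id="fruits" src="jr://file-csv/fruits.csv"/>
  <instance id="colors"><root><item><value>red</value></item></root></instance>
</model>
""".strip()

INSTANCE_CALL = re.compile(r"""\s*instance\(\s*['"]([^'"]+)['"]\s*\)""")


def csv_instance_ids(model: ET.Element) -> set[str]:
    """Ids of secondary instances whose src uses the jr://file-csv/ prefix."""
    return {
        inst.get("id")
        for inst in model.iter(XF + "instance")
        if inst.get("src", "").startswith("jr://file-csv/")
    }


def itemset_strategy(nodeset: str, csv_ids: set[str]) -> str:
    """'fast' if the itemset queries a CSV-backed instance, otherwise 'regular'."""
    match = INSTANCE_CALL.match(nodeset)
    return "fast" if match and match.group(1) in csv_ids else "regular"


ids = csv_instance_ids(ET.fromstring(MODEL))
print(itemset_strategy("instance('fruits')/item[starts-with(value, 'p')]", ids))  # fast
print(itemset_strategy("instance('colors')/item", ids))  # regular
```

The select and select1 elements stay untouched; only the source of the itemset decides which path is taken.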

Cheers,
Martijn

P.S. If any Dimagi devs see this, it would be very useful if you could
share your experience on making large itemsets faster if you have done this.

On Thu, Sep 11, 2014 at 4:55 PM, Yaw Anokwa yanokwa@nafundi.com wrote:

Martijn,

Chris' point about performance and cost is why external itemsets are
built the way they are. JavaRosa requires that select one options be loaded
into memory. Further, you can't save a select one option that did not come
from one of those loaded in memory.

These constraints are why external itemsets overload text input
(https://github.com/SEL-Columbia/pyxform/pull/120) and why autocomplete
was removed from Collect
(https://code.google.com/p/opendatakit/issues/detail?id=289).

Agreed that none of this is ideal. The solution is to rewrite the core,
but that will require a lot of effort without a lot of visible benefit to
the user.

Yaw

Need ODK services? http://nafundi.com provides form design, server
setup, professional support, and software development for ODK.

On Mon, Sep 8, 2014 at 11:02 AM, Christopher Robert <crobert@surveycto.com> wrote:

Hi Martijn,

This looks neat. Thanks for putting in the work on it.

As Yaw suggests, it sounds similar to Nafundi's "fast itemset"
contribution. We hadn't made use of that ourselves because our users author
in Excel or Google Drive, and because we lacked the Python expertise to
make the relatively significant pyxform changes necessary to make it
accessible. We try to minimize changes in Collect, JavaRosa, and pyxform,
but our efforts to minimize changes to pyxform are particularly extreme
given that we're a Java development team. Nathan's pyxform contribution,
though, would seem to bring fast itemsets into the pyxform fold, which is
fantastic (this is the contribution linked by Yaw in this thread).

My primary concerns with this spec are practical in nature:

1. Performance. JavaRosa's handling of XForm-based choice lists (and
filtering) performs such that forms on fast, modern devices cannot
practically exceed 1,000-2,000 choices in total. Since a great many of our
users exceed those limits, we needed something that would perform
significantly better. Obviously, Nafundi's contribution grew from the same
need, and they hewed much closer to the XForm spirit (and maybe even the
specification). Still, I wonder if meeting all of the elements of your
proposed specification (namely the ability to reference and filter via
XPath expressions) would lead to an in-memory representation that would
suffer the same practical limits that created the need for something
faster. Perhaps some kind of SQLite-based (as opposed to memory-based)
method of storing and accessing items can be devised that meets the various
requirements you propose ("artificial instances" in the Dimagi spec's
language) -- but I'm guessing that it would be pretty hard. (I could be
wrong, though. Yaw? Maybe Nafundi's fast-itemset implementation already
meets the key requirements while still performing well with many thousands
of choice items?)
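A rough Python sketch of the SQLite-backed idea, using the standard sqlite3 module: the choice list stays in a database table and each filter is answered by an indexed query, so only matching rows are loaded into memory. The table layout, the region column and filtered_choices are hypothetical, and translating arbitrary XPath predicates into SQL (the hard part anticipated above) is not attempted.

```python
from __future__ import annotations

import sqlite3

# Hypothetical table holding a (potentially very large) external choice list.
# ":memory:" keeps the demo self-contained; a client would use a file on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruits (value TEXT PRIMARY KEY, label TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO fruits (value, label, region) VALUES (?, ?, ?)",
    [
        ("mango", "Mango", "tropical"),
        ("papaya", "Papaya", "tropical"),
        ("apple", "Apple", "temperate"),
    ],
)
conn.execute("CREATE INDEX fruits_by_region ON fruits (region)")


def filtered_choices(conn: sqlite3.Connection, region: str) -> list[tuple[str, str]]:
    """Answer the equivalent of instance('fruits')/item[region = /data/region]
    with an indexed query, so only matching rows are loaded into memory."""
    cursor = conn.execute(
        "SELECT value, label FROM fruits WHERE region = ? ORDER BY value",
        (region,),
    )
    return cursor.fetchall()


print(filtered_choices(conn, "tropical"))  # [('mango', 'Mango'), ('papaya', 'Papaya')]
```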

2. Cost. Socially, it's important that the benefits of various
approaches exceed the costs; and practically, it's important that somebody
(or some people) be willing to bear the cost of implementation. If
Nafundi's fast-itemset implementation is already very close to meeting
these proposed requirements, then the cost of implementing your proposal
may be relatively low. That would be great. If the cost becomes high,
though, would somebody then need to raise funds to support the implementation?
(Please forgive my open-source ignorance here.)

My secondary concerns regard the user experience:

  1. It would be a bit of a shame to lose the ability to mix static and
    dynamic choices. Lots of SurveyCTO users use this, and the obvious
    work-around (include all static choices in the dynamic .csv file) is
    awkward for a great many cases. For example, say you pull a list of people
    from a .csv file and then want to include a "Don't know" or "Not listed"
    option: you could obviously add those static options to the .csv, but it
    would mean that you couldn't just directly dump your list of people from a
    database or from another form... so the experience for the user would
    become much more complicated. I dare say that the potential user-experience
    pay-off might warrant devising a method to mix static and dynamic options.

  2. It's not clear to me how to easily match or beat pulldata()'s
    ease-of-use (such as it is!) in the pyxform context. Mitch has proposed
    various extensions to the (perhaps over-)simplistic ${fieldname}
    referencing scheme, and it seems that we'd need to work referencing of
    external instances into one of those extensions. That should be possible,
    but it might involve (much) greater surgery to pyxform. Somehow, it needs
    to be easy for spreadsheet-based users to grab a cell from one of these
    external data sources. (Or pulldata() remains the go-to method for
    spreadsheet-based users... in which case it ought to be documented as part
    of the spec.)
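To illustrate what mixing could mean on the client side, here is a small Python sketch in which the CSV remains a raw dump of people and the static options live in the form. The file contents, the name/label column names and mixed_choices are hypothetical; appending the static options after the dynamic rows is just one possible ordering.

```python
import csv
import io

# Hypothetical people.csv dumped straight from a database: real people only.
PEOPLE_CSV = "name,label\np1,Alice\np2,Bob\n"

# Hypothetical static options defined in the form itself.
STATIC_CHOICES = [("dont_know", "Don't know"), ("not_listed", "Not listed")]


def mixed_choices(csv_text, static_choices):
    """Return the dynamic (CSV) choices followed by the static ones."""
    reader = csv.DictReader(io.StringIO(csv_text))
    dynamic = [(row["name"], row["label"]) for row in reader]
    return dynamic + list(static_choices)


for value, label in mixed_choices(PEOPLE_CSV, STATIC_CHOICES):
    print(value, label)
```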

Thanks again,

Chris

On Fri, Sep 5, 2014 at 12:01 PM, Martijn van de Rijdt <martijn@enketo.org> wrote:

Hi ODK XForm enthusiasts,

I would like to propose to tweak the way external data is added to ODK
XForms and the way dynamic choice lists are generated from external data.

These changes will not need to break existing forms in ODK Collect as
far as I can see. They build on the valuable work done by SurveyCTO to
provide this very useful functionality.

  1. Adding External Data

In the currently supported method, the XForm doesn’t provide the
source of the external data and it cannot be queried using XPath, the logic
language of our beloved form format.

We may want to store (large) external data in a database, use the CSV
format, and query the client database directly for performance reasons.
This proposal allows a client to keep doing that. It doesn’t describe the
underlying implementation, just the XForm format.

This proposal is also nothing new. It simply adds a CSV source to this
existing spec by Dimagi, because that specification is elegant, extensible
and in line with existing ODK XForm functionality:
https://bitbucket.org/javarosa/javarosa/wiki/externalinstances

Adding a CSV file to a form could be done by simply adding a secondary
instance element with a src attribute:

<instance id="fruits" src="jr://file-csv/fruits.csv" />

The jr://file-csv/ prefix indicates that:

  • the location of the external resource is listed in the xformsManifest
    (similar to the currently used jr://images prefix for media files)
  • the data has the CSV format
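As a rough illustration of the first point, a client might resolve such a src against the files listed in the form's manifest, much as it already does for jr://images. This is a Python sketch only: resolve_csv_src, the dict-shaped manifest and the local path are hypothetical, and the proposal itself deliberately leaves client storage unspecified.

```python
CSV_PREFIX = "jr://file-csv/"


def resolve_csv_src(src, manifest):
    """Map a jr://file-csv/ URI to the media file listed in the form's manifest.

    `manifest` is a hypothetical {filename: local_path} dict standing in for
    whatever a client builds from the xformsManifest it downloaded.
    """
    if not src.startswith(CSV_PREFIX):
        raise ValueError(f"not a jr://file-csv/ source: {src!r}")
    filename = src[len(CSV_PREFIX):]
    try:
        return manifest[filename]
    except KeyError:
        raise LookupError(f"{filename!r} is not listed in the manifest") from None


manifest = {"fruits.csv": "forms/fruits-media/fruits.csv"}
print(resolve_csv_src("jr://file-csv/fruits.csv", manifest))  # forms/fruits-media/fruits.csv
```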

It is assumed that this (virtually) creates an XML instance in the
client. E.g. jr://file-csv/fruits.csv is interpreted as:

<root>
  <item>
    <value>mango</value>
    <label>Mango</label>
  </item>
  …
  <item>
    <value>papaya</value>
    <label>Papaya</label>
  </item>
</root>

If a ‘sortby’ column is present, the items will be sorted, as is the
case currently.
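The CSV-to-instance mapping described above can be sketched in Python with the standard csv and xml.etree.ElementTree modules. Assumptions: the first CSV row holds column names, every column name is a valid XML element name, the root element is called root, and sortby values are numeric; csv_to_instance is a hypothetical helper, not part of any existing client.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_instance(csv_text):
    """Build the virtual <root><item>...</item></root> tree for a CSV file.

    Each row becomes an <item>, each column a child element named after the
    column header. Rows are ordered by a 'sortby' column when present
    (compared numerically, which is an assumption of this sketch).
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if rows and "sortby" in rows[0]:
        rows.sort(key=lambda row: float(row["sortby"]))
    root = ET.Element("root")
    for row in rows:
        item = ET.SubElement(root, "item")
        for column, text in row.items():
            ET.SubElement(item, column).text = text
    return root


FRUITS_CSV = "value,label,sortby\npapaya,Papaya,2\nmango,Mango,1\n"
print(ET.tostring(csv_to_instance(FRUITS_CSV), encoding="unicode"))
```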

The instance can be queried in normal supported XPath, like:

instance('fruits')/item[value='papaya']/label => Papaya

or with the existing pulldata() function:

pulldata('fruits', 'label', 'value', 'papaya') => Papaya
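For comparison, the two lookups can be mimicked in Python against the virtual instance. ElementTree's limited XPath support happens to include the [child='text'] predicate used in the first expression. The pulldata function below is a hypothetical re-implementation following the argument order shown above (instance, column to return, lookup column, lookup value); returning an empty string when nothing matches is an assumption.

```python
import xml.etree.ElementTree as ET

# The virtual 'fruits' instance, written out by hand for the demo.
INSTANCES = {
    "fruits": ET.fromstring(
        "<root>"
        "<item><value>mango</value><label>Mango</label></item>"
        "<item><value>papaya</value><label>Papaya</label></item>"
        "</root>"
    )
}


def xpath_label(instance_id, value):
    """Mirror instance(instance_id)/item[value=value]/label (None if absent)."""
    return INSTANCES[instance_id].findtext(f"item[value='{value}']/label")


def pulldata(instance_id, column, lookup_column, lookup_value):
    """Hypothetical pulldata(): `column` of the first row where
    `lookup_column` equals `lookup_value`, or '' when nothing matches."""
    for item in INSTANCES[instance_id].iter("item"):
        if item.findtext(lookup_column) == lookup_value:
            return item.findtext(column, default="")
    return ""


print(xpath_label("fruits", "papaya"))                 # Papaya
print(pulldata("fruits", "label", "value", "papaya"))  # Papaya
```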

Changes in XLSForm:

We’d need a way to output the external source. Some options:

  • the type "select_one external fruits.csv" outputs the instance with
    src="jr://file-csv/fruits.csv" and id="fruits" in the XForm. It looks at
    the csv extension to give the correct src prefix (because in the future
    we may want to add other formats). @Mitch, is this perhaps what you were
    hinting at at the end of this blog post?
    http://opendatakit.org/2014/08/odk-v1-4-4-tools-now-available/
  • or, add an ‘external data’ column, e.g. in the settings sheet, that can
    contain multiple sources (i.e. filenames with the .csv extension in this
    case).
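The first option could be sketched as follows: a hypothetical XLSForm converter step (not pyxform's actual code) that splits the type, picks the src prefix from the file extension and emits the secondary instance. The SRC_PREFIXES table, parse_type and external_instance are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# Hypothetical extension-to-prefix table; only CSV is proposed so far, and
# other formats could be added here later.
SRC_PREFIXES = {".csv": "jr://file-csv/"}


def parse_type(xlsform_type):
    """Split 'select_one external fruits.csv' into (question type, filename)."""
    question_type, keyword, filename = xlsform_type.split()
    if keyword != "external":
        raise ValueError(f"not an external select: {xlsform_type!r}")
    return question_type, filename


def external_instance(filename):
    """Build the secondary <instance> element for an external data file."""
    path = PurePosixPath(filename)
    prefix = SRC_PREFIXES.get(path.suffix.lower())
    if prefix is None:
        raise ValueError(f"unsupported external data format: {filename!r}")
    return ET.Element("instance", id=path.stem, src=prefix + path.name)


qtype, filename = parse_type("select_one external fruits.csv")
print(qtype, ET.tostring(external_instance(filename), encoding="unicode"))
```

The settings-sheet option would reuse external_instance unchanged, once per listed file.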

Existing forms using pulldata() can remain fully functional. It’s up
to the client to deal with the missing secondary <instance> in old,
non-compliant XForms for backwards compatibility’s sake. The commitment in
adopting this new approach is simply to start requiring the new method for
new XForms, and for new XForms to work as described in the new
specification.

The one thing you cannot do with the new method is mix static and
external choices.

  2. Creating dynamic select choice lists from external data

Implementing #1 will also mean that users will be able to use the
existing choice-filter (in XLSForm terminology) to create nodesets from
external data. It’s up to the client whether to convert this internally to
a db query or to handle it in pure XPath on XML. For users this will
mean that the difference between external and internal choice-lists is
minimal. Creating e.g. cascade logic will be done in exactly the same
way for both.

For historical reasons, I think we could use the current search()
function inside a predicate (choice-filter), where the first parameter
would become obsolete.

However, I’d much prefer using the following separate (and much
simplified, more user-friendly) XPath functions instead (two of these,
starts-with() and contains(), are native XPath 1.0 functions):

  • instance('fruits')/item[starts-with(value, 'p')]
  • instance('fruits')/item[ends-with(value, 'a')]
  • instance('fruits')/item[contains(value, 'y')]
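Because these expressions simply select item nodes, a client handling them in pure XPath on XML only needs the three string tests. A minimal Python sketch with a hypothetical item list and helper; a database-backed client would translate the same tests into its query language instead.

```python
# Hypothetical client-side implementations of the three proposed predicates.
# starts-with() and contains() follow XPath 1.0; ends-with() is given the
# obvious suffix semantics.
PREDICATES = {
    "starts-with": lambda text, part: text.startswith(part),
    "ends-with": lambda text, part: text.endswith(part),
    "contains": lambda text, part: part in text,
}

ITEMS = [("mango", "Mango"), ("papaya", "Papaya"), ("pear", "Pear")]


def filter_items(items, function, argument):
    """Values of the items for which function(value, argument) is true,
    i.e. instance('fruits')/item[function(value, argument)]/value."""
    test = PREDICATES[function]
    return [value for value, _label in items if test(value, argument)]


print(filter_items(ITEMS, "starts-with", "p"))  # ['papaya', 'pear']
print(filter_items(ITEMS, "ends-with", "a"))    # ['papaya']
print(filter_items(ITEMS, "contains", "y"))     # ['papaya']
```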

In XLSForm:

What do you think?

Martijn

--
You received this message because you are subscribed to the Google
Groups "ODK Developers" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to opendatakit-developers+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
Did you know that Enketo Smart Paper has now become the #1 tool for data
collection? Don't fall behind. Use it!

Enketo https://enketo.org | LinkedIn http://www.linkedin.com/company/enketo-llc |
GitHub https://github.com/MartijnR | Twitter https://twitter.com/enketo

Thanks for this Carl. I think I understand now and I don't envy you for
having to make these maneuvers to avoid touching JavaRosa.

If you (Nafundi, ODK, SurveyCTO) are interested in doing fast itemsets /
external data using a clean XForm syntax instead, what would it take on the
JavaRosa side? Would an external contribution to make select and select1
'open' be welcome, and would it be sufficient? Anything else?

Martijn

Short answer: NO

This is touching the very inner workings of the XPath expression evaluation
code. That is very delicate code. Rather than modifying JavaRosa, it would
likely be better to start with a fully validated, fully tested, fully
vetted XPath expression evaluator, and then add hooks to that to get
behaviors that replicate the functionality of javarosa -- i.e., scrap the
entire JR codebase.

I say this because there are two implementations of that innermost section
of Java code, one that Dimagi uses, and an earlier branch that ODK uses.

Last time I tried to merge up to the Dimagi branch, I had to back out the
changes because they altered the outcomes of the random forms I was testing
against.

I don't know which codebase is correct, or perhaps both are broken?

The only way we could move forward is to expend a lot of effort setting up
automated tests to confirm the current behavior of the system.

Once we have those tests in place, then you could make dramatic internal
changes with some assurance that they do not affect outcomes.
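The approach described here resembles characterization ("golden master") testing: record what the current code does for many inputs, then flag any outcome that changes. A toy Python sketch; branch_a and branch_b are hypothetical stand-ins, not the ODK or Dimagi evaluators, and real tests would run XPath expressions against whole forms.

```python
import json
import math


# Hypothetical stand-ins for two evaluator branches; each implements only
# number() applied to one field of a toy instance.
def branch_a(field, instance):
    try:
        return float(instance.get(field, ""))
    except ValueError:
        return math.nan  # XPath 1.0: number('') is NaN


def branch_b(field, instance):
    raw = instance.get(field, "")
    return 0.0 if raw == "" else branch_a(field, instance)  # behavior change


CASES = [("age", {"age": "42"}), ("age", {"age": ""}), ("age", {})]


def record(evaluate, cases):
    """Capture evaluate()'s outcome for every case under a stable key."""
    return {
        f"{field} @ {json.dumps(instance, sort_keys=True)}": repr(evaluate(field, instance))
        for field, instance in cases
    }


def changed(golden, current):
    """Cases whose outcome differs from the recorded (golden) run."""
    return sorted(key for key in golden if golden[key] != current.get(key))


golden = record(branch_a, CASES)  # freeze today's behavior first
print(changed(golden, record(branch_b, CASES)))
```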


Mitch

On Tue, Sep 16, 2014 at 9:59 AM, Martijn van de Rijdt martijn@enketo.org wrote:

Thanks for this Carl. I think I understand now and I don't envy you for
having to make these maneuvers to avoid touching JavaRosa.

If you (Nafundi, ODK, SurveyCTO) are interested in doing fast itemsets /
external data using a clean XForm syntax instead, what would it take on the
JavaRosa side? Would an external contribution to make select and select1
'open' be welcome, and would it be sufficient? Anything else?

Martijn

On Monday, September 15, 2014 9:59:45 AM UTC-6, Christopher Robert wrote:

Hi all,

And the SurveyCTO-contributed "dynamic search-and-select" feature acts as
a kind of hybrid. It loads a dynamic list into memory so that JavaRosa will
respect it. But it only loads a filtered subset of the on-disk DB into
JavaRosa, for all of the performance reasons already discussed. If you end
up with an extremely long list presented to the user, it will still slow
down (and eat loads of memory), unlike the Nafundi-contributed "fast
itemset" implementation.

Best,

Chris

On Mon, Sep 15, 2014 at 11:48 AM, chartung char...@nafundi.com wrote:

Hey Martijn,

You're close, but the external vs. regular itemsets is more like this:

if select(1) with itemset
    let javarosa figure out what to display, then display Collect's select1
    widget with data from javarosa
else if input with type==text and query attribute
    don't let javarosa do anything. Display an external-Itemset widget in
    Collect, handle all of the db lookups, etc... and only use javarosa to save
    the selected answer to the data model in memory.

The reason we used text instead of select(1) is that javarosa doesn't
allow saving values in a select1 node that are not included in that
select(1)'s list of values. Since we're doing all of the processing
externally, the select(1)'s list is empty, and will never save anything.

I believe javarosa's implementation is called a "closed" select1, and
there was some talk of adding the option of "open" select1s at some point,
but I don't think anything happened beyond discussion.
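To make the closed/open distinction concrete, here is a minimal Python sketch of the answer-saving rule Carl describes. The `Select1` class and its `is_open` flag are hypothetical illustrations, not JavaRosa's API; the point is only that a closed select1 with an empty choice list can never store an answer.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Select1:
    """Hypothetical model of how a select1 answer is validated before saving."""

    choices: List[str] = field(default_factory=list)
    is_open: bool = False  # False = "closed", the behavior described above
    answer: Optional[str] = None

    def save(self, value: str) -> bool:
        """Store value if it is allowed; return whether it was stored."""
        if self.is_open or value in self.choices:
            self.answer = value
            return True
        return False  # closed select1: values outside the list are rejected


# An external itemset does its lookups outside JavaRosa, so the list stays empty.
closed = Select1(choices=[])
print(closed.save("papaya"))       # False: nothing can ever be saved

open_select = Select1(choices=[], is_open=True)
print(open_select.save("papaya"))  # True
```

This is why overloading a text input was the practical workaround: a text node accepts any value, just like the hypothetical open variant here.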

Hope that helps,
-Carl

On Friday, September 12, 2014 8:01:40 AM UTC-7, Martijn van de Rijdt wrote:

Thanks Chris, Yaw,

I see some real opportunity here to agree on a common and solid
approach for external data. I'd like to jump in on the performance issue
first.

After the explanation, I imagine fast itemsets work something like this
(please correct me if I'm wrong):

if select(1) with itemset
    parse as regular itemset
if input with type=text and query attribute
    parse as fast itemset

If the above is more or less the case, would it be possible to change
this to:

if select(1) with itemset
    if external data from csv
        parse as fast itemset
    else
        parse as regular itemset

Advantages of the latter:

  • XForms syntax remains clean and correct, i.e. a select and select1
    remain what they are, and so do itemsets
  • No need for additional query attribute
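The proposed dispatch could be decided purely from the XForm itself. Below is a minimal Python sketch, assuming that an itemset counts as "external" when its nodeset calls instance('…') on a secondary instance whose src starts with jr://file-csv/. The sample form and the `itemset_strategy` helper are hypothetical and only cover this one rule.

```python
import re
import xml.etree.ElementTree as ET

XF = "{http://www.w3.org/2002/xforms}"

FORM = """<h:html xmlns="http://www.w3.org/2002/xforms"
        xmlns:h="http://www.w3.org/1999/xhtml">
  <h:head>
    <model>
      <instance><data id="demo"><fruit/></data></instance>
      <instance id="fruits" src="jr://file-csv/fruits.csv"/>
    </model>
  </h:head>
  <h:body>
    <select1 ref="/data/fruit">
      <itemset nodeset="instance('fruits')/item">
        <value ref="value"/>
        <label ref="label"/>
      </itemset>
    </select1>
  </h:body>
</h:html>"""

INSTANCE_CALL = re.compile(r"instance\(\s*['\"]([^'\"]+)['\"]\s*\)")


def itemset_strategy(form_xml: str) -> dict:
    """Map each select/select1 ref to 'fast' or 'regular' (hypothetical rule)."""
    root = ET.fromstring(form_xml)
    # ids of secondary instances whose data comes from an external CSV file
    csv_ids = {
        inst.get("id")
        for inst in root.iter(XF + "instance")
        if (inst.get("src") or "").startswith("jr://file-csv/")
    }
    strategies = {}
    for tag in ("select", "select1"):
        for control in root.iter(XF + tag):
            itemset = control.find(XF + "itemset")
            if itemset is None:
                continue  # static <item> choices: nothing to decide
            match = INSTANCE_CALL.search(itemset.get("nodeset", ""))
            external = match is not None and match.group(1) in csv_ids
            strategies[control.get("ref")] = "fast" if external else "regular"
    return strategies


print(itemset_strategy(FORM))  # {'/data/fruit': 'fast'}
```

Because the decision only reads the instance declaration and the itemset, the select1 and its itemset stay standard XForms, which is the advantage listed above.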

Cheers,
Martijn

P.S. If any Dimagi devs see this, it would be very useful if you could
share your experience on making large itemsets faster if you have done this.

On Thu, Sep 11, 2014 at 4:55 PM, Yaw Anokwa yan...@nafundi.com wrote:

Martijn,

Chris' point about performance and cost is why external itemsets are
built the way they are. JavaRosa requires that select one options be loaded
into memory. Further, you can't save a select one option that did not come
from one of those loaded in memory.

These constraints are why external itemsets overload text input (
https://github.com/SEL-Columbia/pyxform/pull/120) and why
autocomplete was removed from Collect
(https://code.google.com/p/opendatakit/issues/detail?id=289).

Agreed that none of this is ideal. The solution is to rewrite the
core, but that will require a lot of effort without a lot of visible
benefit to the user.

Yaw

Need ODK services? http://nafundi.com provides form design, server
setup, professional support, and software development for ODK.

On Mon, Sep 8, 2014 at 11:02 AM, Christopher Robert <cro...@surveycto.com> wrote:

Hi Martijn,

This looks neat. Thanks for putting in the work on it.

As Yaw suggests, it sounds similar to Nafundi's "fast itemset"
contribution. We hadn't made use of that ourselves because our users author
in Excel or Google Drive, and because we lacked the python expertise to
make the relatively significant pyxform changes necessary to make it
accessible. We try to minimize changes in Collect, JavaRosa, and pyxform,
but our efforts to minimize changes to pyxform are particularly extreme
given that we're a Java development team. Nathan's pyxform contribution,
though, would seem to bring fast itemsets into the pyxform fold, which is
fantastic (this is the contribution linked by Yaw in this thread).

My primary concerns with this spec are practical in nature:

1. Performance. JavaRosa's handling of XForm-based choice lists
(and filtering) performs such that forms on fast, modern devices cannot
practically exceed 1,000-2,000 choices in total. Since a great many of our
users exceed those limits, we needed something that would perform
significantly better. Obviously, Nafundi's contribution grew from the same
need, and they hewed much closer to the XForm spirit (and maybe even the
specification). Still, I wonder if meeting all of the elements of your
proposed specification (namely the ability to reference and filter via
XPath expressions) would lead to an in-memory representation that would
suffer from the same practical limits that created the need for something
faster. Perhaps some kind of SQLite-based (as opposed to memory-based)
method of storing and accessing items can be devised that meets the various
requirements you propose ("artificial instances" in the Dimagi spec's
language) -- but I'm guessing that it would be pretty hard. (I could be
wrong, though. Yaw? Maybe Nafundi's fast-itemset implementation already
meets the key requirements while still performing well with many thousands
of choice items?)

2. Cost. Socially, it's important that the benefits of various
approaches exceed the costs; and practically, it's important that somebody
(or some people) be willing to bear the cost of implementation. If
Nafundi's fast-itemset implementation is already very close to meeting
these proposed requirements, then the cost of implementing your proposal
may be relatively low. That would be great. If the cost becomes high,
though, would somebody need to fundraise to support the implementation?
(Please forgive my open-source ignorance here.)

My secondary concerns regard the user experience:

  1. It would be a bit of a shame to lose the ability to mix static and
    dynamic choices. Lots of SurveyCTO users use this, and the obvious
    work-around (include all static choices in the dynamic .csv file) is
    awkward for a great many cases. For example, say you pull a list of people
    from a .csv file and then want to include a "Don't know" or "Not listed"
    option: you could obviously add those static options to the .csv, but it
    would mean that you couldn't just directly dump your list of people from a
    database or from another form... so the experience for the user would
    become much more complicated. I dare say that the potential user-experience
    pay-off might warrant devising a method to mix static and dynamic options.

  2. It's not clear to me how to easily match or beat pulldata()'s
    ease-of-use (such as it is!) in the pyxform context. Mitch has proposed
    various extensions to the (perhaps over-)simplistic ${fieldname}
    referencing scheme, and it seems that we'd need to work referencing of
    external instances into one of those extensions. That should be possible,
    but it might involve (much) greater surgery to pyxform. Somehow, it needs
    to be easy for spreadsheet-based users to grab a cell from one of these
    external data sources. (Or pulldata() remains the go-to method for
    spreadsheet-based users... in which case it ought to be documented as part
    of the spec.)
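On the first user-experience point: a client could in principle merge form-defined static choices with CSV rows at display time, so that the CSV can stay a raw database dump. The Python sketch below is hypothetical and not part of the proposal (which explicitly does not support mixing). It assumes static choices are appended after the dynamic ones and that a value appearing in both keeps its first label.

```python
import csv
import io

# Hypothetical people.csv dumped straight from a database, with no static rows.
PEOPLE_CSV = """value,label
p1,Ana
p2,Ben
"""

# Hypothetical static choices kept in the form definition itself.
STATIC_CHOICES = [("dk", "Don't know"), ("nl", "Not listed")]


def merged_choices(csv_text, static_choices):
    """Dynamic rows first, then static choices; a repeated value keeps its first label."""
    dynamic = [(row["value"], row["label"]) for row in csv.DictReader(io.StringIO(csv_text))]
    seen = set()
    merged = []
    for value, label in dynamic + list(static_choices):
        if value not in seen:
            seen.add(value)
            merged.append((value, label))
    return merged


print(merged_choices(PEOPLE_CSV, STATIC_CHOICES))
```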

Thanks again,

Chris

On Fri, Sep 5, 2014 at 12:01 PM, Martijn van de Rijdt <mar...@enketo.org> wrote:

Hi ODK XForm enthusiasts,

I would like to propose to tweak the way external data is added to
ODK XForms and the way dynamic choice lists are generated from external
data.

These changes will not need to break existing forms in ODK Collect
as far as I can see. They build on the valuable work done by SurveyCTO to
provide this very useful functionality.

  1. Adding External Data

In the currently supported method, the XForm doesn’t provide the
source of the external data and it cannot be queried using XPath, the logic
language of our beloved form format.

We may want to store (large) external data in a database, use the
CSV format, and query the client database directly for performance reasons.
This proposal allows a client to keep doing that. It doesn’t describe the
underlying implementation, just the XForm format.

This proposal is also nothing new. It simply adds a csv source to
this existing spec by Dimagi:
https://bitbucket.org/javarosa/javarosa/wiki/externalinstances
because this specification is elegant, extensible and in line with
existing ODK XForm functionality.

Adding a CSV file to a form could be done by simply adding a
secondary instance element with a src attribute:

<instance id="fruits" src="jr://file-csv/fruits.csv"/>

The jr://file-csv/ prefix indicates that:

  • the location of the external resource is listed in the
    xformsManifest (similar to the currently used jr://images prefix for
    media files)
  • the data has the CSV format
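A client might resolve such a src roughly as follows. This Python sketch assumes an OpenRosa-style manifest with mediaFile/filename/downloadUrl elements; the manifest contents and the example.org URL are placeholders, and `resolve_csv_src` is a hypothetical helper, not an existing ODK function.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://openrosa.org/xforms/xformsManifest"}
PREFIX = "jr://file-csv/"

# Placeholder manifest; the download URL is illustrative only.
MANIFEST = """<manifest xmlns="http://openrosa.org/xforms/xformsManifest">
  <mediaFile>
    <filename>fruits.csv</filename>
    <downloadUrl>https://example.org/media/fruits.csv</downloadUrl>
  </mediaFile>
</manifest>"""


def resolve_csv_src(src: str, manifest_xml: str) -> str:
    """Look up the download URL for a jr://file-csv/ src in the form's manifest."""
    if not src.startswith(PREFIX):
        raise ValueError(f"not a CSV instance src: {src}")
    filename = src[len(PREFIX):]
    root = ET.fromstring(manifest_xml)
    for media in root.findall("m:mediaFile", NS):
        if media.findtext("m:filename", namespaces=NS) == filename:
            return media.findtext("m:downloadUrl", namespaces=NS)
    raise LookupError(f"{filename} is not listed in the manifest")


print(resolve_csv_src("jr://file-csv/fruits.csv", MANIFEST))
```

The prefix carries the format (CSV) and the manifest carries the location, so the XForm itself never needs a server-specific URL.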

It is assumed that this (virtually) creates an XML instance in the
client. E.g. jr://file-csv/fruits.csv is interpreted as:

<item>
  <value>mango</value>
  <label>Mango</label>
</item>
…
<item>
  <value>papaya</value>
  <label>Papaya</label>
</item>

If a ‘sortby’ column is present, the items will be sorted, as is the
case currently.
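The CSV-to-instance interpretation described above, including the sortby rule, could look like this in a client. This is a Python sketch under two stated assumptions: the root element name ("root") is hypothetical, and sortby values are treated as numbers. Every CSV column becomes a child element of its item.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical contents of fruits.csv, including the optional sortby column.
FRUITS_CSV = """value,label,sortby
papaya,Papaya,2
mango,Mango,1
"""


def csv_to_instance(csv_text: str) -> ET.Element:
    """Build a virtual secondary instance with one <item> per CSV row.

    The root element name ("root") is an assumption; instance('fruits')
    would resolve to this element.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if rows and "sortby" in rows[0]:
        # Assumption: sortby values are numeric, so sort numerically.
        rows.sort(key=lambda row: float(row["sortby"]))
    root = ET.Element("root")
    for row in rows:
        item = ET.SubElement(root, "item")
        for column, text in row.items():
            ET.SubElement(item, column).text = text
    return root


print(ET.tostring(csv_to_instance(FRUITS_CSV), encoding="unicode"))
```

Without a sortby column the rows keep their file order.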

The instance can be queried in normal supported XPath, like:

instance('fruits')/item[value='papaya']/label => Papaya

or with the existing pulldata() function:

pulldata('fruits', 'label', 'value', 'papaya') => Papaya
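Both lookups above return the same cell, which a short Python sketch can show. It uses ElementTree's limited XPath support (which handles the simple [value='papaya'] predicate) on a hand-built copy of the fruits instance. The `pulldata` function here is a hypothetical re-implementation, and returning '' when no row matches is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical virtual instance for fruits.csv (root element name assumed).
FRUITS = ET.fromstring(
    "<root>"
    "<item><value>mango</value><label>Mango</label></item>"
    "<item><value>papaya</value><label>Papaya</label></item>"
    "</root>"
)
INSTANCES = {"fruits": FRUITS}


def pulldata(instance_id, column, key_column, key_value):
    """Sketch of pulldata(): the first matching row's column, or '' if none."""
    for item in INSTANCES[instance_id].findall("item"):
        if item.findtext(key_column) == key_value:
            return item.findtext(column, default="")
    return ""


# ElementTree's limited XPath handles this simple predicate directly.
label = INSTANCES["fruits"].find("item[value='papaya']/label").text
print(label)                                            # Papaya
print(pulldata("fruits", "label", "value", "papaya"))  # Papaya
```

In other words, pulldata() becomes syntactic sugar over an XPath lookup on the declared instance.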

Changes in XLSForm:

We’d need a way to output the external source. Some options:

  • the type “select_one external fruits.csv” outputs the instance
    with src="jr://file-csv/fruits.csv" and id="fruits" in the XForm. It
    looks at the csv extension to give the correct src prefix (because in
    the future we may want to add other formats). @Mitch, is this perhaps
    what you were hinting at near the end of this blog post:
    http://opendatakit.org/2014/08/odk-v1-4-4-tools-now-available/?

  • or, add an ‘external data’ column, e.g. in the settings sheet,
    that can contain multiple sources (i.e. filenames with the .csv
    extension in this case).
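For the first XLSForm option, the conversion step could be as small as the Python sketch below. The `external_instance` helper and the extension-to-prefix table are hypothetical, not existing pyxform code. Only .csv is mapped, and the table is where other formats could be added later.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical extension-to-prefix table; only CSV is part of the proposal.
SRC_PREFIXES = {".csv": "jr://file-csv/"}


def external_instance(xlsform_type: str) -> str:
    """Turn 'select_one external fruits.csv' into a secondary <instance> element."""
    parts = xlsform_type.split()
    if len(parts) != 3 or parts[:2] != ["select_one", "external"]:
        raise ValueError(f"unsupported type: {xlsform_type!r}")
    filename = parts[2]
    stem, ext = os.path.splitext(filename)
    prefix = SRC_PREFIXES.get(ext.lower())
    if prefix is None:
        raise ValueError(f"no src prefix known for {ext!r} files")
    element = ET.Element("instance", id=stem, src=prefix + filename)
    return ET.tostring(element, encoding="unicode")


print(external_instance("select_one external fruits.csv"))
```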

Existing forms using pulldata() can remain fully functional. It’s
up to the client to deal with the missing secondary instance in old
non-compliant XForms for backwards compatibility's sake. The commitment in
adopting this new approach is simply to start requiring the new method for
new XForms, and for new XForms to work as described in the new
specification.
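One way a client might handle those old forms is to scan for pulldata() sources that have no declared instance and treat each as an implicit CSV instance. The Python sketch below is hypothetical. It assumes the first pulldata() argument names a CSV file without its extension, and it only recognizes quoted literal source names.

```python
import re

# Matches the first (source) argument of pulldata('name', ...) calls.
PULLDATA_SOURCE = re.compile(r"pulldata\(\s*['\"]([^'\"]+)['\"]")


def missing_instances(expressions, declared_ids):
    """Return pulldata() sources that have no declared secondary instance."""
    used = set()
    for expression in expressions:
        used.update(PULLDATA_SOURCE.findall(expression))
    return sorted(used - set(declared_ids))


def synthesize(instance_ids):
    """Hypothetical fallback: declare each missing source as a CSV instance."""
    return [f'<instance id="{i}" src="jr://file-csv/{i}.csv"/>' for i in instance_ids]


old_form_calculations = [
    "pulldata('fruits', 'label', 'value', /data/fruit)",
    "pulldata('households', 'head', 'hhid', /data/hhid)",
]
missing = missing_instances(old_form_calculations, declared_ids=["households"])
print(missing)  # ['fruits']
print(synthesize(missing))
```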

The one thing you cannot do with the new method is mix static and
external choices.

  2. Creating dynamic select choice lists from external data

Implementing #1 will also mean that users will be able to use the
existing choice-filter (in XLSForms terminology) to create nodesets from
external data. It’s up to the client whether to convert this internally to
a db query or to handle it in pure XPath on XML. For users this
will mean that the difference between external and internal choice-lists is
minimal. Creating cascading selects, for example, will be done in exactly
the same way for both.
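As an illustration of the "db query" route, here is a Python sketch using sqlite3 that loads a CSV into a table and answers an equality choice-filter such as state = /data/state with SQL instead of XPath. The cities.csv data and both helper functions are hypothetical. Table and column names are interpolated as SQL identifiers, so they are assumed to come from a trusted CSV header.

```python
import csv
import io
import sqlite3

# Hypothetical external data: a cities.csv file used for a cascading select.
CITIES_CSV = """value,label,state
sea,Seattle,wa
spo,Spokane,wa
pdx,Portland,or
"""


def load_csv(conn, table, csv_text):
    """Copy a CSV file into a SQLite table (all columns stored as TEXT)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)


def filtered_choices(conn, table, column, value):
    """DB equivalent of instance(table)/item[column = value], in file order."""
    sql = f'SELECT value, label FROM "{table}" WHERE "{column}" = ? ORDER BY rowid'
    return conn.execute(sql, (value,)).fetchall()


conn = sqlite3.connect(":memory:")
load_csv(conn, "cities", CITIES_CSV)
# choice-filter state=/data/state, with /data/state answered as 'wa'
print(filtered_choices(conn, "cities", "state", "wa"))
```

The form author writes the same predicate either way; only the client decides whether it runs as XPath over XML or as a query like this one.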

For historical reasons, I think we could use the current search()
function inside a predicate (choice-filter), where the first parameter
would become obsolete.

However, I’d much prefer using the following separate (and much
simplified, more user-friendly) XPath functions instead (2 of these are
native XPath 1.0 functions):

  • instance('fruits')/item[starts-with(value, 'p')]
  • instance('fruits')/item[ends-with(value, 'a')]
  • instance('fruits')/item[contains(value, 'y')]
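starts-with() and contains() are native XPath 1.0, while ends-with() comes from XPath 2.0, so an XPath 1.0 evaluator would need it as an extension or could rewrite it as substring(value, string-length(value) - string-length('a') + 1) = 'a'. The Python sketch below mirrors the three predicates on a hypothetical choice list. The ends_with helper follows that substring() rewrite for integer positions only.

```python
# Hypothetical in-memory choice list standing in for instance('fruits')/item.
ITEMS = ["mango", "papaya", "pear", "banana"]


def starts_with(s, prefix):
    """Native XPath 1.0 starts-with()."""
    return s.startswith(prefix)


def contains(s, needle):
    """Native XPath 1.0 contains()."""
    return needle in s


def ends_with(s, suffix):
    """ends-with() via XPath 1.0 substring(s, len(s) - len(suffix) + 1) = suffix.

    XPath positions are 1-based; a start position below 1 keeps the whole string.
    """
    start = len(s) - len(suffix) + 1
    return s[max(start, 1) - 1:] == suffix


def filter_items(predicate, arg):
    return [value for value in ITEMS if predicate(value, arg)]


print(filter_items(starts_with, "p"))  # ['papaya', 'pear']
print(filter_items(ends_with, "a"))    # ['papaya', 'banana']
print(filter_items(contains, "y"))     # ['papaya']
```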

In XLSForm:

What do you think?

Martijn

--
You received this message because you are subscribed to the Google
Groups "ODK Developers" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to opendatakit-developers+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
Did you know that Enketo Smart Paper has now become the #1 tool for
data collection? Don't fall behind. Use it!

Enketo https://enketo.org | LinkedIn http://www.linkedin.com/company/enketo-llc |
GitHub https://github.com/MartijnR | Twitter https://twitter.com/enketo

--
Mitch Sundt
Software Engineer
University of Washington
mitchellsundt@gmail.com

Thanks Mitch,

I understand. Hopefully that can be sorted out eventually as I am worried
about the future of the ODK XForm format due to the amount of technical debt
that seems to only be increasing - it is after all the core we all revolve
around. I believe Dimagi's branch has full XPath predicate support as well
and does some kind of static analysis of XPath search queries to address
performance of itemsets (though the latter may not be in JavaRosa - not
sure).

When it's time to add external data support to Enketo, I'll have to go for
the third solution, probably as proposed above with a different XLSForm
syntax. As mentioned, on the XForm side this is basically nothing more than
introducing the src attribute on a secondary instance. The potential of
external data is rather promising for users, especially for a web-based
application like Enketo where you could potentially pass custom data
sources to a URL. Therefore, I'll have to make sure it's completely solid
from the beginning and doesn't leave us with three completely
different ways to create itemsets.

Cheers,
Martijn


Hi Martijn,
We intend to develop this external data feature for the web form (Enketo).
Can you guide us on how and where to start? We could also collaborate on
the development.

Many thanks,
Trung.

On Thursday, September 18, 2014 at 9:44:14 PM UTC+7, Martijn van de Rijdt wrote:
> Thanks Mitch,
> ...

Very cool!

On my side, I have some funding to implement this feature (as described in
the OP) in Enketo Express this month. I consider this the first phase, which
will not yet be performance-optimized. The second phase would be performance
optimization, e.g. storing the data in a browser storage solution and
converting XPath queries into queries against that database. I have no
funding for the latter.
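A minimal sketch of that second phase, assuming SQLite (Python's built-in sqlite3) as a stand-in for a browser storage solution such as IndexedDB. The CSV rows are loaded into a table named after the instance id, and a deliberately narrow, hypothetical subset of the XPath used in the OP (`item[field='text']`, `starts-with()`, `contains()`) is translated into a parameterised SQL query. The names `load_csv`, `xpath_to_sql` and `evaluate` are illustrative only; a real client would need a proper XPath parser.

```python
import csv
import io
import re
import sqlite3

# Hypothetical mini-grammar: instance('id')/item[PREDICATE]/column, where PREDICATE is
# field='text', starts-with(field, 'text') or contains(field, 'text').
_PATH = re.compile(r"^instance\('(\w+)'\)/item\[(.+)\]/(\w+)$")
_EQ = re.compile(r"^(\w+)\s*=\s*'([^']*)'$")
_FN = re.compile(r"^(starts-with|contains)\(\s*(\w+)\s*,\s*'([^']*)'\s*\)$")


def load_csv(conn, instance_id, csv_text):
    """Store CSV text (first row = column names) in a table named after the instance id."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{instance_id}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{instance_id}" VALUES ({placeholders})', body)


def xpath_to_sql(expr):
    """Translate one supported XPath expression into (sql, params); raise ValueError otherwise."""
    path = _PATH.match(expr.strip())
    if not path:
        raise ValueError(f"unsupported expression: {expr}")
    table, predicate, column = path.groups()
    predicate = predicate.strip()
    eq = _EQ.match(predicate)
    fn = _FN.match(predicate)
    if eq:
        where, params = f'"{eq.group(1)}" = ?', [eq.group(2)]
    elif fn and fn.group(1) == "starts-with":
        # substr() comparison is case-sensitive, like XPath starts-with().
        where, params = f'substr("{fn.group(2)}", 1, length(?)) = ?', [fn.group(3)] * 2
    elif fn:
        # instr() > 0 is a case-sensitive substring test, like XPath contains().
        where, params = f'instr("{fn.group(2)}", ?) > 0', [fn.group(3)]
    else:
        raise ValueError(f"unsupported predicate: {predicate}")
    # ORDER BY rowid keeps CSV row order, mirroring XPath document order.
    return f'SELECT "{column}" FROM "{table}" WHERE {where} ORDER BY rowid', params


def evaluate(conn, expr):
    """Run a translated expression and return the matching values as a list."""
    sql, params = xpath_to_sql(expr)
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
load_csv(conn, "fruits", "value,label\nmango,Mango\npapaya,Papaya\n")
print(evaluate(conn, "instance('fruits')/item[value='papaya']/label"))  # ['Papaya']
print(evaluate(conn, "instance('fruits')/item[starts-with(value, 'p')]/label"))  # ['Papaya']
```

Anything outside this small grammar (e.g. `ends-with()`, which is not an XPath 1.0 function) raises `ValueError`, so a client could fall back to evaluating the XPath on an in-memory XML instance in that case.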

There are no plans to include the two other itemset creation methods (nor
any interest in a PR for them, as they would make the form format and form
engine needlessly complex). There are also no plans to add support to Enketo
Legacy.

What feature are you referring to?
(a) The new CommCare-derived feature as proposed in the OP,
(b) the "SurveyCTO way" of creating itemsets from external csv data, or
(c) the "Nafundi way" of creating itemsets from external csv data?

Cheers,
Martijn

On Friday, February 6, 2015 at 8:18:21 AM UTC-7, Trung wrote:
> Hi Martijn,
> We intend to develop this external data feature for the web form (enketo).
> ...

--

Revolutionizing data collection since 2012.

Enketo https://enketo.org/ | LinkedIn http://www.linkedin.com/company/enketo-llc
| GitHub https://github.com/enketo | Twitter https://twitter.com/enketo
| Blog http://blog.enketo.org/

Thanks Martijn,
I have made some progress, which I can describe a bit here. We ended up
rewriting the pulldata() function to point the query at an external CSV file
or .db file, which is dynamically created by our custom ODK Collect build
(we call it RTA Survey:
https://play.google.com/store/apps/details?id=vn.rta.survey.android).

I'm not sure if this falls into one of the 3 ways you mentioned, but so far
it seems to solve our problem.


Hi Trung,

Sorry, I thought you were referring to Enketo. There is no shared codebase
between Enketo and ODK Collect.

On Tue, Feb 10, 2015 at 8:29 AM, Trung wrote:

Thanks Martijn,
I have made some progress, which I can describe a bit here. We ended up
rewriting the pulldata() function to point the query at an external CSV file
or .db file, which is dynamically created by our custom ODK Collect build
(we call it RTA Survey:
https://play.google.com/store/apps/details?id=vn.rta.survey.android).

I'm not sure if this falls into one of the 3 ways you mentioned, but so
far it seems to solve our problem.

On Monday, February 9, 2015 at 10:06:37 PM UTC+7, Martijn van de Rijdt wrote:

Very cool!

On my side, I have some funding to implement this feature (as described
in the OP) in Enketo Express this month. I consider this the first phase,
which would not be performance-optimized yet. The second phase would be
performance optimization, e.g. by storing the data in a browser storage
solution and converting XPath queries into queries on this db. For the
latter I have no funding.

There are no plans to include the additional 2 itemset creation methods
(also no interest in a PR for this as it would make the form format and
form engine needlessly complex). Also no plans to add support in Enketo
Legacy.

What feature are you referring to?
(a) The new CommCare-derived feature as proposed in the OP,
(b) the "SurveyCTO way" of creating itemsets from external csv data, or
(c) the "Nafundi way" of creating itemsets from external csv data?

Cheers,
Martijn

On Friday, February 6, 2015 at 8:18:21 AM UTC-7, Trung wrote:

Hi Martijn,
We intend to develop this external data feature for the web form
(Enketo). Can you guide us on how and where to start? We could also
collaborate on the development.

Many thanks,
Trung.

On Thursday, September 18, 2014 at 9:44:14 PM UTC+7, Martijn van de Rijdt wrote:

Thanks Mitch,

I understand. Hopefully that can be sorted out eventually, as I am worried
about the future of the ODK XForm format due to the amount of technical debt
that seems to only be increasing; it is, after all, the core we all revolve
around. I believe Dimagi's branch has full XPath predicate support as well
and does some kind of static analysis of XPath search queries to address
performance of itemsets (though the latter may not be in JavaRosa, I'm not
sure).

When it's time to add external data support to Enketo, I'll have to go
for the third solution, probably as proposed above with a different XLSForm
syntax. As mentioned, on the XForm side this is basically nothing more than
introducing the src attribute on a secondary instance. The potential of
external data is rather promising for users, especially for a web-based
application like Enketo where you could potentially pass custom data
sources to a URL. Therefore, I'll have to make sure it's completely solid
from the beginning and doesn't involve ending up with three completely
different ways to create itemsets.

Cheers,
Martijn

On Wednesday, September 17, 2014 11:25:05 AM UTC-6, Mitch wrote:

Short answer: NO

This is touching the very inner workings of the XPath expression
evaluation code. That is very delicate code. Rather than modifying
Javarosa, it would likely be better to start with a fully validated, fully
tested, fully vetted XPath expression evaluator, and then add hooks to that
to get behaviors that replicate the functionality of javarosa -- i.e.,
scrap the entire JR codebase.

I say this because there are two implementations of that innermost
section of Java code, one that Dimagi uses, and an earlier branch that ODK
uses.

Last time I tried to merge up to the Dimagi branch, I had to back out the
changes because they altered the outcomes of the random forms I was testing
against.

I don't know which codebase is correct, or perhaps both are broken?


The only way we could move forward is to expend a lot of effort setting
up automated tests to confirm the current behavior of the system.

Once we have those tests in place, then you could make dramatic internal
changes with some assurance that they do not affect outcomes.
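The testing approach Mitch describes is usually called characterization (or "golden master") testing: record what the current engine produces for a corpus of inputs, then flag every case where a replacement engine disagrees. The sketch below is a toy that assumes each engine can be wrapped as a plain function; the two rounding functions are hypothetical stand-ins, chosen because a common approximation of XPath 1.0's round() (half toward positive infinity) genuinely differs from Python's round-half-to-even.

```python
import math


# Hypothetical stand-ins for two expression engines.
def current_round(x: float) -> int:
    """Approximates XPath 1.0 round(): halves go toward positive infinity."""
    return math.floor(x + 0.5)


def candidate_round(x: float) -> int:
    """Python's built-in round() uses round-half-to-even instead."""
    return round(x)


def record_baseline(evaluate, cases):
    """Capture the current engine's output for every test case."""
    return {case: evaluate(case) for case in cases}


def find_regressions(baseline, evaluate):
    """Return {case: (expected, actual)} wherever the candidate disagrees."""
    regressions = {}
    for case, expected in baseline.items():
        actual = evaluate(case)
        if actual != expected:
            regressions[case] = (expected, actual)
    return regressions


cases = [1.2, 2.5, -2.5, 3.7]
baseline = record_baseline(current_round, cases)
print(find_regressions(baseline, candidate_round))  # {2.5: (3, 2)}
```

Once such a baseline exists for many real forms, internal changes can be checked against it automatically.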


Mitch

On Tue, Sep 16, 2014 at 9:59 AM, Martijn van de Rijdt mar...@enketo.org wrote:

Thanks for this Carl. I think I understand now and I don't envy you for
having to make these maneuvers to avoid touching JavaRosa.

If you (Nafundi, ODK, SurveyCTO) are interested in doing fast itemsets /
external data using a clean XForm syntax instead, what would it take on the
JavaRosa side? Would an external contribution to make select and select1
'open' be welcomed and be sufficient? Anything else?

Martijn

On Monday, September 15, 2014 9:59:45 AM UTC-6, Christopher Robert wrote:

Hi all,

And the SurveyCTO-contributed "dynamic search-and-select" feature acts as
a kind of hybrid. It loads a dynamic list into memory so that JavaRosa will
respect it. But it only loads a filtered subset of the on-disk DB into
JavaRosa for all of the performance reasons already discussed. If you end
up with an extremely long list presented to the user, it will still slow
down (and eat loads of memory), unlike the Nafundi-contributed "fast
itemset" implementation.

Best,

Chris

On Mon, Sep 15, 2014 at 11:48 AM, chartung char...@nafundi.com wrote:

Hey Martijn,

You're close, but the distinction between external and regular itemsets is
more like this:

if select(1) with itemset
    let javarosa figure out what to display, then display Collect's select1
    widget with data from javarosa
else if input with type==text and query attribute
    don't let javarosa do anything. Display an external-itemset widget in
    Collect, handle all of the db lookups, etc., and only use javarosa to save
    the selected answer to the data model in memory.
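The dispatch above can be restated as a small function. This is only a sketch: the `Control` class, the widget names and the placeholder query value are hypothetical and do not correspond to Collect's or JavaRosa's actual classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Control:
    """Hypothetical, simplified view of a parsed form control."""
    tag: str                         # "select", "select1" or "input"
    has_itemset: bool = False
    data_type: Optional[str] = None  # e.g. "text" for a text input
    query: Optional[str] = None      # the extra query attribute, if any


def choose_widget(control: Control) -> str:
    """Mirror the current split between regular and external itemsets."""
    if control.tag in ("select", "select1") and control.has_itemset:
        # JavaRosa builds the choice list in memory; Collect renders it.
        return "javarosa-select-widget"
    if control.tag == "input" and control.data_type == "text" and control.query:
        # Collect performs the DB lookups itself and only asks JavaRosa
        # to store the chosen value in the data model.
        return "external-itemset-widget"
    return "default-widget"


print(choose_widget(Control("select1", has_itemset=True)))
print(choose_widget(Control("input", data_type="text", query="fruits")))
```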

The reason we used text instead of select(1) is that javarosa doesn't
allow saving values in a select1 node that are not included in that
select(1)'s list of values. Since we're doing all of the processing
externally, the select(1)'s list is empty, and will never save anything.

I believe javarosa's implementation is called a "closed" select1, and
there was some talk of adding the option of "open" select1s at some point,
but I don't think anything happened beyond discussion.
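The difference between a "closed" and an "open" select1 can be modelled in a few lines. This is a hypothetical illustration of the constraint Carl describes, not JavaRosa's API: with external processing the in-memory list is empty, so a closed select1 rejects every answer.

```python
class ClosedSelectError(ValueError):
    """Raised when a closed select1 receives a value it did not load."""


def save_select1_answer(value, loaded_choices, is_open=False):
    """Hypothetical model of storing a select1 answer.

    A closed select1 only accepts values present in the choice list loaded
    into memory; an open select1 would accept any value.
    """
    if is_open or value in loaded_choices:
        return value
    raise ClosedSelectError(f"{value!r} is not among the loaded choices")


# External itemsets leave the in-memory list empty, so a closed select1
# can never store the answer the user picked.
try:
    save_select1_answer("papaya", loaded_choices=[])
except ClosedSelectError as err:
    print("rejected:", err)

print(save_select1_answer("papaya", loaded_choices=[], is_open=True))  # papaya
```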

Hope that helps,
-Carl

On Friday, September 12, 2014 8:01:40 AM UTC-7, Martijn van de Rijdt wrote:

Thanks Chris, Yaw,

I see some real opportunity here to agree on a common and solid approach
for external data. I'd like to jump in on the performance issue first.

After the explanation, I imagine fast itemsets work something like this
(please correct me if I'm wrong):

if select(1) with itemset
    parse as regular itemset
if input with type=text and query attribute
    parse as fast itemset

If the above is more or less the case, would it be possible to change
this to:

if select(1) with itemset
    if external data from csv
        parse as fast itemset
    else
        parse as regular itemset

Advantages of the latter:

  • XForms syntax remains clean and correct, i.e. a select and select1
    remain what they are, and so do itemsets
  • No need for additional query attribute
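The proposed dispatch only needs to know where an itemset's nodeset points. A minimal sketch, assuming the nodeset starts with an instance('id') call and that the client already knows each secondary instance's src attribute; the helper name and the "colors" instance are hypothetical.

```python
import re

# Matches a nodeset that starts with instance('id') or instance("id").
_INSTANCE_CALL = re.compile(r"""\s*instance\(\s*['"]([^'"]+)['"]\s*\)""")


def itemset_strategy(nodeset: str, instance_srcs: dict) -> str:
    """Choose how to build the itemset of a select/select1.

    instance_srcs maps secondary-instance ids to their src attribute
    (None for instances defined inline in the form).
    """
    match = _INSTANCE_CALL.match(nodeset)
    if match:
        src = instance_srcs.get(match.group(1)) or ""
        if src.startswith("jr://file-csv/"):
            return "fast"     # external CSV: the client may query its own DB
    return "regular"          # inline data: normal in-memory itemset


srcs = {"fruits": "jr://file-csv/fruits.csv", "colors": None}
print(itemset_strategy("instance('fruits')/item[starts-with(value, 'p')]", srcs))  # fast
print(itemset_strategy("instance('colors')/item", srcs))                          # regular
```

The control stays a plain select or select1 either way; only the evaluation strategy changes.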

Cheers,
Martijn

P.S. If any Dimagi devs see this, it would be very useful if you could
share your experience on making large itemsets faster if you have done this.

On Thu, Sep 11, 2014 at 4:55 PM, Yaw Anokwa yan...@nafundi.com wrote:

Martijn,

Chris' point about performance and cost is why external itemsets are
built the way they are. JavaRosa requires that select one options be loaded
into memory. Further, you can't save a select one option that did not come
from one of those loaded in memory.

These constraints are why external itemsets overload text input (
https://github.com/SEL-Columbia/pyxform/pull/120) and why autocomplete
was removed from Collect
(https://code.google.com/p/opendatakit/issues/detail?id=289).

Agreed that none of this is ideal. The solution is to rewrite the core,
but that will require a lot of effort without a lot of visible benefit to
the user.

Yaw

Need ODK services? http://nafundi.com provides form design, server
setup, professional support, and software development for ODK.

On Mon, Sep 8, 2014 at 11:02 AM, Christopher Robert cro...@surveycto.com wrote:

Hi Martijn,

This looks neat. Thanks for putting in the work on it.

As Yaw suggests, it sounds similar to Nafundi's "fast itemset"
contribution. We hadn't made use of that ourselves because our users author
in Excel or Google Drive, and because we lacked the Python expertise to
make the relatively significant pyxform changes necessary to make it
accessible. We try to minimize changes in Collect, JavaRosa, and pyxform,
but our efforts to minimize changes to pyxform are particularly extreme
given that we're a Java development team. Nathan's pyxform contribution,
though, would seem to bring fast itemsets into the pyxform fold, which is
fantastic (this is the contribution linked by Yaw in this thread).

My primary concerns with this spec are practical in nature:

1. Performance. JavaRosa's handling of XForm-based choice lists (and
filtering) performs such that forms on fast, modern devices cannot
practically exceed 1,000-2,000 choices in total. Since a great many of our
users exceed those limits, we needed something that would perform
significantly better. Obviously, Nafundi's contribution grew from the same
need, and they hewed much closer to the XForm spirit (and maybe even the
specification). Still, I wonder if meeting all of the elements of your
proposed specification (namely the ability to reference and filter via
XPath expressions) would lead to an in-memory representation that would
suffer from the same practical limits that created the need for something
faster. Perhaps some kind of SQLite-based (as opposed to memory-based)
method of storing and accessing items can be devised that meets the various
requirements you propose ("artificial instances" in the Dimagi spec's
language) -- but I'm guessing that it would be pretty hard. (I could be
wrong, though. Yaw? Maybe Nafundi's fast-itemset implementation already
meets the key requirements while still performing well with many thousands
of choice items?)
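The SQLite idea can be made concrete with a small sketch: CSV rows go into a SQLite table (in memory here only for the demo; a client would presumably use an on-disk file) and an equality predicate becomes an indexed SQL query. The function names and the synthetic 50,000-row data set are illustrative assumptions. This does not show how Collect or the fast-itemset code actually stores data, and it does not address translating general XPath predicates.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_text, index_col):
    """Copy CSV rows into a SQLite table instead of an in-memory XML tree."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    # An index lets equality filters avoid scanning every row.
    conn.execute(f'CREATE INDEX "{table}_{index_col}" ON "{table}" ("{index_col}")')


def lookup(conn, table, return_col, match_col, match_value):
    """Equivalent of instance(table)/item[match_col = match_value]/return_col.

    Table and column names come from the form definition; the matched
    value is passed as a bound parameter.
    """
    row = conn.execute(
        f'SELECT "{return_col}" FROM "{table}" WHERE "{match_col}" = ? LIMIT 1',
        (match_value,),
    ).fetchone()
    return row[0] if row else ""  # empty string, like an empty XPath result


csv_text = "\n".join(["value,label"] + [f"item{i},Item {i}" for i in range(50_000)])
conn = sqlite3.connect(":memory:")
load_csv(conn, "items", csv_text, index_col="value")
print(lookup(conn, "items", "label", "value", "item42"))  # Item 42
```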

2. Cost. Socially, it's important that the benefits of various
approaches exceed the costs; and practically, it's important that somebody
(or some people) be willing to bear the cost of implementation. If
Nafundi's fast-itemset implementation is already very close to meeting
these proposed requirements, then the cost of implementing your proposal
may be relatively low. That would be great. If the cost becomes high,
though, would somebody then fundraise to support the implementation?
(Please forgive my open-source ignorance here.)

My secondary concerns regard the user experience:

  1. It would be a bit of a shame to lose the ability to mix static and
    dynamic choices. Lots of SurveyCTO users use this, and the obvious
    work-around (include all static choices in the dynamic .csv file) is
    awkward for a great many cases. For example, say you pull a list of people
    from a .csv file and then want to include a "Don't know" or "Not listed"
    option: you could obviously add those static options to the .csv, but it
    would mean that you couldn't just directly dump your list of people from a
    database or from another form... so the experience for the user would
    become much more complicated. I dare say that the potential user-experience
    pay-off might warrant devising a method to mix static and dynamic options.

  2. It's not clear to me how to easily match or beat pulldata()'s
    ease-of-use (such as it is!) in the pyxform context. Mitch has proposed
    various extensions to the (perhaps over-)simplistic ${fieldname}
    referencing scheme, and it seems that we'd need to work referencing of
    external instances into one of those extensions. That should be possible,
    but it might involve (much) greater surgery to pyxform. Somehow, it needs
    to be easy for spreadsheet-based users to grab a cell from one of these
    external data sources. (Or pulldata() remains the go-to method for
    spreadsheet-based users... in which case it ought to be documented as part
    of the spec.)
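One way to keep pulldata()'s spreadsheet-friendly syntax while adopting instance() queries would be for the form converter to treat pulldata() as shorthand. The sketch below is purely illustrative: both helper functions are hypothetical (they are not part of pyxform), and the /data root path is assumed.

```python
import re


def expand_field_refs(expr, field_paths):
    """Replace XLSForm-style ${name} references with absolute XPath paths.

    Simplified: only names made of letters, digits and underscores.
    """
    return re.sub(r"\$\{(\w+)\}", lambda m: field_paths[m.group(1)], expr)


def pulldata_as_xpath(instance_id, return_col, match_col, match_expr):
    """Rewrite pulldata(id, return_col, match_col, expr) as an instance() query."""
    return f"instance('{instance_id}')/item[{match_col} = {match_expr}]/{return_col}"


paths = {"fruit": "/data/fruit"}  # hypothetical form with root element <data>
print(pulldata_as_xpath("fruits", "label", "value", expand_field_refs("${fruit}", paths)))
# instance('fruits')/item[value = /data/fruit]/label
```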

Thanks again,

Chris

On Fri, Sep 5, 2014 at 12:01 PM, Martijn van de Rijdt mar...@enketo.org wrote:

Hi ODK XForm enthusiasts,

I would like to propose to tweak the way external data is added to ODK
XForms and the way dynamic choice lists are generated from external data.

These changes should not break existing forms in ODK Collect, as far
as I can see. They build on the valuable work done by SurveyCTO to provide
this very useful functionality.

  1. Adding External Data

In the currently supported method, the XForm doesn’t provide the source
of the external data and it cannot be queried using XPath, the logic
language of our beloved form format.

We may want to store (large) external data in a database, use the CSV
format, and query the client database directly for performance reasons.
This proposal allows a client to keep doing that. It doesn’t describe the
underlying implementation, just the XForm format.

This proposal is also nothing new. It simply adds a csv source to this
existing spec: https://bitbucket.org/javarosa/javarosa/wiki/externalinstances
by Dimagi, because this specification is elegant, extensible and in line
with existing ODK XForm functionality.

Adding a CSV file to a form could be done by simply adding a secondary
instance element with a src attribute:

The jr://file-csv/ prefix indicates that:

  • the location of the external resource is listed in the xformsManifest
    (similar to the currently used jr://images prefix for media files)
  • the data has the CSV format
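On the client side, recognizing these external instances is mostly a matter of reading the src attribute. A minimal sketch, assuming the secondary instance takes the form `<instance id="fruits" src="jr://file-csv/fruits.csv"/>` (the shape given in the XLSForm discussion of this message); the function name and the sample form are hypothetical.

```python
import xml.etree.ElementTree as ET

XFORMS_NS = "http://www.w3.org/2002/xforms"
CSV_PREFIX = "jr://file-csv/"


def external_csv_sources(xform_xml: str) -> dict:
    """Map secondary-instance ids to the CSV media file they reference."""
    root = ET.fromstring(xform_xml)
    sources = {}
    for inst in root.iter(f"{{{XFORMS_NS}}}instance"):
        src = inst.get("src", "")
        if src.startswith(CSV_PREFIX):
            # The remainder is a filename listed in the form's manifest.
            sources[inst.get("id")] = src[len(CSV_PREFIX):]
    return sources


FORM = """<h:html xmlns="http://www.w3.org/2002/xforms"
        xmlns:h="http://www.w3.org/1999/xhtml">
  <h:head>
    <model>
      <instance><data id="fruit-form"><fruit/></data></instance>
      <instance id="fruits" src="jr://file-csv/fruits.csv"/>
    </model>
  </h:head>
</h:html>"""

print(external_csv_sources(FORM))  # {'fruits': 'fruits.csv'}
```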

It is assumed that this (virtually) creates an XML instance in the
client. For example, jr://file-csv/fruits.csv is interpreted as:

    <item>
      <value>mango</value>
      <label>Mango</label>
    </item>
    …
    <item>
      <value>papaya</value>
      <label>Papaya</label>
    </item>

If a ‘sortby’ column is present, the items will be sorted, as is the case
currently.
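The CSV-to-instance mapping, including the sortby rule, can be sketched in a few lines. Assumptions: each CSV row becomes an `<item>` whose children are named after the column headers, the root element name (`root` here) is a placeholder, and sortby values are compared as numbers.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_instance(csv_text: str, root_tag: str = "root") -> ET.Element:
    """Build a virtual XML instance: one <item> per CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    if "sortby" in (reader.fieldnames or []):
        # Assumption: sortby holds numbers; Python's sort is stable for ties.
        rows.sort(key=lambda row: float(row["sortby"]))
    root = ET.Element(root_tag)
    for row in rows:
        item = ET.SubElement(root, "item")
        for column, value in row.items():
            ET.SubElement(item, column).text = value
    return root


FRUITS_CSV = "value,label,sortby\npapaya,Papaya,2\nmango,Mango,1\n"
instance = csv_to_instance(FRUITS_CSV)
print([item.findtext("value") for item in instance.findall("item")])  # ['mango', 'papaya']
print(ET.tostring(instance.find("item"), encoding="unicode"))
# <item><value>mango</value><label>Mango</label><sortby>1</sortby></item>
```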

The instance can be queried with normal, supported XPath, e.g.:

instance('fruits')/item[value='papaya']/label => Papaya

or with the existing pulldata() function:

pulldata('fruits', 'label', 'value', 'papaya') => Papaya
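Both queries can be mimicked with Python's standard library, as a rough sketch. ElementTree's limited XPath subset happens to support [child='text'] predicates, and the pulldata function below is a hypothetical re-implementation whose first argument is the instance element itself rather than its id.

```python
import xml.etree.ElementTree as ET

# A hand-written stand-in for the virtual 'fruits' instance.
fruits = ET.fromstring(
    "<root>"
    "<item><value>mango</value><label>Mango</label></item>"
    "<item><value>papaya</value><label>Papaya</label></item>"
    "</root>"
)

# Mirrors instance('fruits')/item[value='papaya']/label
print(fruits.find("item[value='papaya']/label").text)  # Papaya


def pulldata(instance: ET.Element, column: str, key_column: str, key: str) -> str:
    """Sketch of pulldata() semantics: return `column` of the first item whose
    `key_column` equals `key`, or '' when nothing matches."""
    for item in instance.findall("item"):
        if item.findtext(key_column) == key:
            return item.findtext(column, default="")
    return ""


print(pulldata(fruits, "label", "value", "papaya"))  # Papaya
```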

Changes in XLSForm:

We’d need a way to output the external source. Some options:

  • the type "select_one external fruits.csv" outputs the instance with
    src="jr://file-csv/fruits.csv" and id="fruits" in the XForm. It looks at
    the .csv extension to give the correct src prefix (because in the future
    we may want to add other formats). @Mitch, is this perhaps what you were
    hinting at at the end of this blog post?
    http://opendatakit.org/2014/08/odk-v1-4-4-tools-now-available/
  • or, add an 'external data' column, e.g. in the settings sheet, that can
    contain multiple sources (i.e. filenames with the .csv extension in this
    case).
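The first option could work roughly like this Python sketch. The extension-to-prefix table and the external_instance helper are assumptions for illustration; this is not how pyxform is actually implemented.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# Assumed mapping from file extension to src prefix; only CSV is proposed so far.
SRC_PREFIXES = {".csv": "jr://file-csv/"}


def external_instance(select_type: str) -> ET.Element:
    """Turn e.g. 'select_one external fruits.csv' into a secondary instance."""
    parts = select_type.split()
    if len(parts) != 3 or parts[1] != "external":
        raise ValueError(f"not an external select type: {select_type!r}")
    filename = PurePosixPath(parts[2])
    prefix = SRC_PREFIXES.get(filename.suffix)
    if prefix is None:
        raise ValueError(f"unsupported external format: {filename.suffix}")
    # The instance id is the filename without its extension.
    return ET.Element("instance", id=filename.stem, src=prefix + filename.name)


print(ET.tostring(external_instance("select_one external fruits.csv"), encoding="unicode"))
# <instance id="fruits" src="jr://file-csv/fruits.csv" />
```

Keying the prefix on the extension keeps room for other formats later without changing the XLSForm syntax.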

Existing forms using pulldata() can remain fully functional. It’s up to
the client to deal with the missing secondary instance in old,
non-compliant XForms for backwards compatibility's sake. Adopting this new
approach simply commits us to requiring the new method for new XForms, and
to making new XForms work as described in the new specification.

The one thing you cannot do with the new method is mix static and
external choices.

  2. Creating dynamic select choice lists from external data

Implementing #1 will also mean that users will be able to use the
existing choice-filter (in XLSForm terminology) to create nodesets from
external data. It’s up to the client whether to convert this internally to
a db query or to handle it in pure XPath on XML. For users, this will
mean that the difference between external and internal choice lists is
minimal. Cascading selects, for example, will be created in exactly the
same way for both.
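Whether the data is internal or external, a cascading select boils down to the same filter. A small Python sketch, assuming a hypothetical color column and a hypothetical earlier question at /data/color:

```python
import xml.etree.ElementTree as ET

# Hypothetical 'fruits' instance with an extra 'color' column to filter on.
fruits = ET.fromstring(
    "<root>"
    "<item><value>mango</value><label>Mango</label><color>orange</color></item>"
    "<item><value>papaya</value><label>Papaya</label><color>orange</color></item>"
    "<item><value>lime</value><label>Lime</label><color>green</color></item>"
    "</root>"
)


def choices_for(instance: ET.Element, column: str, earlier_answer: str) -> list:
    """Mirror an itemset nodeset such as instance('fruits')/item[color = /data/color]:
    keep only the items whose `column` equals the answer to an earlier question."""
    return [
        item.findtext("value")
        for item in instance.findall("item")
        if item.findtext(column) == earlier_answer
    ]


print(choices_for(fruits, "color", "orange"))  # ['mango', 'papaya']
print(choices_for(fruits, "color", "green"))   # ['lime']
```

A database-backed client could run the equivalent WHERE color = ? query instead; the form author would not see the difference.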

For historical reasons, I think we could use the current search()
function inside a predicate (choice-filter), where the first parameter
would become obsolete.

However, I’d much prefer using the following separate (much simpler and
more user-friendly) XPath functions instead (two of these, starts-with()
and contains(), are native XPath 1.0 functions):

instance('fruits')/item[starts-with(value, 'p')]

instance('fruits')/item[ends-with(value, 'a')]

instance('fruits')/item[contains(value, 'y')]
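The three predicates can be emulated in Python to show what each selects. The extra banana and cherry items are illustrative only, and matching_values is a hypothetical helper, not an engine API.

```python
import xml.etree.ElementTree as ET

fruits = ET.fromstring(
    "<root>"
    "<item><value>mango</value></item>"
    "<item><value>papaya</value></item>"
    "<item><value>banana</value></item>"
    "<item><value>cherry</value></item>"
    "</root>"
)

# Python stand-ins for the three predicate functions. starts-with() and
# contains() exist in XPath 1.0; ends-with() does not (it comes from XPath 2.0),
# so an XForms engine would have to add it as an extension.
STRING_TESTS = {
    "starts-with": lambda text, arg: text.startswith(arg),
    "ends-with": lambda text, arg: text.endswith(arg),
    "contains": lambda text, arg: arg in text,
}


def matching_values(instance: ET.Element, function: str, arg: str) -> list:
    """Evaluate instance(...)/item[<function>(value, arg)] and return the values."""
    test = STRING_TESTS[function]
    return [
        item.findtext("value", default="")
        for item in instance.findall("item")
        if test(item.findtext("value", default=""), arg)
    ]


print(matching_values(fruits, "starts-with", "p"))  # ['papaya']
print(matching_values(fruits, "ends-with", "a"))    # ['papaya', 'banana']
print(matching_values(fruits, "contains", "y"))     # ['papaya', 'cherry']
```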

In XLSForm:

What do you think?

Martijn

--
You received this message because you are subscribed to the Google Groups
"ODK Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to opendatakit-developers+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Did you know that Enketo Smart Paper has now become the #1 tool for data
collection? Don't fall behind. Use it!

Enketo https://enketo.org | LinkedIn
http://www.linkedin.com/company/enketo-llc | GitHub
https://github.com/MartijnR |

...


--

Revolutionizing data collection since 2012.

Enketo https://enketo.org/ | LinkedIn
http://www.linkedin.com/company/enketo-llc | GitHub
https://github.com/enketo | Twitter https://twitter.com/enketo
| Blog http://blog.enketo.org/

Yes,
I'm actually referring to Enketo, although we need to make it work for
both the web and the mobile app.

Trung

On Feb 10, 2015 11:36 PM, "Martijn van de Rijdt" wrote:

Hi Trung,

Sorry, I thought you were referring to Enketo. There is no shared codebase
between Enketo and ODK Collect.

On Tue, Feb 10, 2015 at 8:29 AM, Trung trungdangle@gmail.com wrote:

Thanks Martijn,
I have made some progress, which I can describe a bit here. We ended up
rewriting the pulldata function to point the query to an external CSV file
or .db file, which is dynamically created by our custom ODK Collect build
(we call it RTA Survey:
https://play.google.com/store/apps/details?id=vn.rta.survey.android).

I'm not sure if this falls into one of the 3 ways you mentioned. But so
far it seems to solve our problem.

On Monday, February 9, 2015 at 10:06:37 PM UTC+7, Martijn van de Rijdt wrote:

Very cool!

On my side, I have some funding to implement this feature (as described
in the OP) in Enketo Express this month. I consider this the first phase,
which would not be performance-optimized yet. The second phase would be
performance optimization, e.g. by storing the data in a browser storage
solution and converting XPath queries into queries against this db. For
the latter I have no funding.
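As a rough illustration of that second phase, here is a Python sketch that translates one narrow XPath shape into a parameterized SQL query. SQLite stands in for browser storage (Enketo itself would use a JavaScript storage API), and the regex-based xpath_to_sql helper is hypothetical; a real implementation would need a proper XPath parser.

```python
import re
import sqlite3

# Hypothetical translator for one narrow query shape:
#   instance('<id>')/item[<column>='<value>']/<column>
PATTERN = re.compile(r"instance\('(\w+)'\)/item\[(\w+)\s*=\s*'([^']*)'\]/(\w+)")


def xpath_to_sql(expr: str) -> tuple:
    """Return (sql, params) for a supported expression, else raise ValueError."""
    match = PATTERN.fullmatch(expr.strip())
    if match is None:
        raise ValueError(f"unsupported expression: {expr}")
    table, key_col, key, result_col = match.groups()
    # Identifiers are limited to \w+ by the regex; the value is a bound parameter.
    sql = f'SELECT "{result_col}" FROM "{table}" WHERE "{key_col}" = ? LIMIT 1'
    return sql, (key,)


db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE "fruits" ("value" TEXT, "label" TEXT)')
db.executemany('INSERT INTO "fruits" VALUES (?, ?)', [("mango", "Mango"), ("papaya", "Papaya")])
sql, params = xpath_to_sql("instance('fruits')/item[value='papaya']/label")
print(db.execute(sql, params).fetchone()[0])  # Papaya
```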

There are no plans to include the additional 2 itemset creation methods
(and no interest in a PR for this either, as it would make the form format
and form engine needlessly complex). There are also no plans to add support
in Enketo Legacy.

What feature are you referring to?
(a) The new CommCare-derived feature as proposed in the OP,
(b) the "SurveyCTO way" of creating itemsets from external csv data, or
(c) the "Nafundi way" of creating itemsets from external csv data?

Cheers,
Martijn

On Friday, February 6, 2015 at 8:18:21 AM UTC-7, Trung wrote:

Hi Martijn,
We intend to develop this external data feature for the web form
(Enketo). Can you guide us on how and where to start? We could also
collaborate on the development.

Many thanks,
Trung.

On Thursday, September 18, 2014 at 9:44:14 PM UTC+7, Martijn van de Rijdt wrote:

Thanks Mitch,

I understand. Hopefully that can be sorted out eventually, as I am
worried about the future of the ODK XForm format due to the amount of
technical debt, which only seems to be increasing; it is, after all, the core
we all revolve around. I believe Dimagi's branch has full XPath predicate
support as well and does some kind of static analysis of XPath search
queries to address performance of itemsets (though the latter may not be in
JavaRosa - not sure).

When it's time to add external data support to Enketo, I'll have to go
for the third solution, probably as proposed above with a different XLSForm
syntax. As mentioned, on the XForm side this is basically nothing more than
introducing the src attribute on a secondary instance. The potential of
external data is rather promising for users, especially for a web-based
application like Enketo where you could potentially pass custom data
sources to a URL. Therefore, I'll have to make sure it's completely solid
from the beginning and doesn't involve ending up with three completely
different ways to create itemsets.

Cheers,
Martijn

On Wednesday, September 17, 2014 11:25:05 AM UTC-6, Mitch wrote:

Short answer: NO

This is touching the very inner workings of the XPath expression
evaluation code. That is very delicate code. Rather than modifying
Javarosa, it would likely be better to start with a fully validated, fully
tested, fully vetted XPath expression evaluator, and then add hooks to that
to get behaviors that replicate the functionality of javarosa -- i.e.,
scrap the entire JR codebase.

I say this because there are two implementations of that innermost
section of Java code, one that Dimagi uses, and an earlier branch that ODK
uses.

Last time I tried to merge up to the Dimagi branch, I had to back out
the changes because they altered the outcomes of the random forms I was
testing against.

I don't know which codebase is correct, or perhaps both are broken?


The only way we could move forward is to expend a lot of effort setting
up automated tests to confirm the current behavior of the system.

Once we have those tests in place, then you could make dramatic internal
changes with some assurance that they do not affect outcomes.


Mitch

On Tue, Sep 16, 2014 at 9:59 AM, Martijn van de Rijdt mar...@enketo.org wrote:

Thanks for this Carl. I think I understand now and I don't envy you for
having to make these maneuvers to avoid touching JavaRosa.

If you (Nafundi, ODK, SurveyCTO) are interested in doing fast itemsets /
external data using a clean XForm syntax instead, what would it take on the
JavaRosa side? Would an external contribution to make select and select1
'open' be welcome, and would it be sufficient? Anything else?

Martijn

On Monday, September 15, 2014 9:59:45 AM UTC-6, Christopher Robert wrote:

Hi all,

And the SurveyCTO-contributed "dynamic search-and-select" feature acts
as a kind of hybrid. It loads a dynamic list into memory so that JavaRosa
will respect it, but it only loads a filtered subset of the on-disk DB into
JavaRosa, for all of the performance reasons already discussed. If you end
up with an extremely long list presented to the user, it will still slow
down (and eat loads of memory), unlike the Nafundi-contributed "fast
itemset" implementation.

Best,

Chris

On Mon, Sep 15, 2014 at 11:48 AM, chartung char...@nafundi.com wrote:

Hey Martijn,

You're close, but the logic for external vs. regular itemsets is more
like this:

if select(1) with itemset
    let javarosa figure out what to display, then display Collect's
    select1 widget with data from javarosa
else if input with type==text and query attribute
    don't let javarosa do anything. Display an external-itemset widget in
    Collect, handle all of the db lookups, etc., and only use javarosa to
    save the selected answer to the data model in memory.

The reason we used text instead of select(1) is that javarosa doesn't
allow saving values in a select1 node that are not included in that
select(1)'s list of values. Since we're doing all of the processing
externally, the select(1)'s list is empty, and will never save anything.

I believe javarosa's implementation is called a "closed" select1, and
there was some talk of adding the option of "open" select1s at some point,
but I don't think anything happened beyond discussion.
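The "closed" select1 constraint can be modelled in a few lines of Python. This is a toy model for illustration only; Select1Question and its fields are hypothetical and do not correspond to JavaRosa's actual classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Select1Question:
    """Toy model of the constraint described above (not JavaRosa's real classes).

    A closed select1 only accepts values from the choices loaded into memory;
    an open select1 would accept any value.
    """
    choices: List[str] = field(default_factory=list)
    is_open: bool = False
    answer: Optional[str] = None

    def save(self, value: str) -> bool:
        if not self.is_open and value not in self.choices:
            return False  # rejected: the value was never loaded as a choice
        self.answer = value
        return True


# With fast itemsets the lookups happen outside the engine, so its list is empty.
closed = Select1Question()
print(closed.save("papaya"), closed.answer)  # False None
open_question = Select1Question(is_open=True)
print(open_question.save("papaya"), open_question.answer)  # True papaya
```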

Hope that helps,
-Carl

On Friday, September 12, 2014 8:01:40 AM UTC-7, Martijn van de Rijdt wrote:

Thanks Chris, Yaw,

I see some real opportunity here to agree on a common and solid approach
for external data. I'd like to jump in on the performance issue first.

After the explanation, I imagine fast itemsets work something like this
(please correct me if I'm wrong):

if select(1) with itemset
    parse as regular itemset
if input with type=text and query attribute
    parse as fast itemset

If the above is more or less the case, would it be possible to change
this to:

if select(1) with itemset
    if external data from csv
        parse as fast itemset
    else
        parse as regular itemset

Advantages of the latter:

  • XForms syntax remains clean and correct, i.e. a select and select1
    remain what they are, and so do itemsets
  • No need for additional query attribute
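The proposed rule could be sketched in Python as follows, assuming a hypothetical Control record that notes the control's element name, whether it has an itemset, and the src of the instance that itemset reads from (None for internal data). The point is that the decision uses only standard XForm markup, with no query attribute.

```python
from dataclasses import dataclass
from typing import Optional

EXTERNAL_CSV_PREFIX = "jr://file-csv/"


@dataclass
class Control:
    """Hypothetical summary of a form control as a client might see it."""
    tag: str                            # "select", "select1", "input", ...
    has_itemset: bool = False
    instance_src: Optional[str] = None  # src of the instance the itemset reads; None if internal


def itemset_strategy(control: Control) -> str:
    """Pick an itemset code path using only standard XForm markup."""
    if control.tag in ("select", "select1") and control.has_itemset:
        if control.instance_src is not None and control.instance_src.startswith(EXTERNAL_CSV_PREFIX):
            return "fast"     # external CSV data: e.g. query a local database
        return "regular"      # internal data: evaluate the nodeset as usual
    return "none"             # not an itemset-driven select


print(itemset_strategy(Control("select1", True, "jr://file-csv/fruits.csv")))  # fast
print(itemset_strategy(Control("select1", True)))                              # regular
print(itemset_strategy(Control("input")))                                      # none
```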

Cheers,
Martijn

P.S. If any Dimagi devs see this, it would be very useful if you could
share your experience on making large itemsets faster if you have done this.

On Thu, Sep 11, 2014 at 4:55 PM, Yaw Anokwa yan...@nafundi.com wrote:

Martijn,

Chris' point about performance and cost is why external itemsets are
built the way they are. JavaRosa requires that select one options be loaded
into memory. Further, you can't save a select one option that did not come
from one of those loaded in memory.

These constraints are why external itemsets overload text input
(https://github.com/SEL-Columbia/pyxform/pull/120) and why autocomplete
was removed from Collect
(https://code.google.com/p/opendatakit/issues/detail?id=289).

Agreed that none of this is ideal. The solution is to rewrite the core,
but that will require a lot of effort without a lot of visible benefit to
the user.

Yaw

Need ODK services? http://nafundi.com provides form design, server
setup, professional support, and software development for ODK.

On Mon, Sep 8, 2014 at 11:02 AM, Christopher Robert cro...@surveycto.com wrote:

Hi Martijn,

This looks neat. Thanks for putting in the work on it.

As Yaw suggests, it sounds similar to Nafundi's "fast itemset"
contribution. We hadn't made use of that ourselves because our users author
in Excel or Google Drive, and because we lacked the Python expertise to
make the relatively significant pyxform changes necessary to make it
accessible. We try to minimize changes in Collect, JavaRosa, and pyxform,
but our efforts to minimize changes to pyxform are particularly extreme
given that we're a Java development team. Nathan's pyxform contribution,
though, would seem to bring fast itemsets into the pyxform fold, which is
fantastic (this is the contribution linked by Yaw in this thread).

My primary concerns with this spec are practical in nature:

1. Performance. JavaRosa's handling of XForm-based choice lists (and
filtering) performs such that forms on fast, modern devices cannot
practically exceed 1,000-2,000 choices in total. Since a great many of our
users exceed those limits, we needed something that would perform
significantly better. Obviously, Nafundi's contribution grew from the same
need, and they hewed much closer to the XForm spirit (and maybe even the
specification). Still, I wonder if meeting all of the elements of your
proposed specification (namely the ability to reference and filter via
XPath expressions) would lead to an in-memory representation that would
suffer the same practical limits that created the need for something
faster in the first place. Perhaps some kind of SQLite-based (as opposed to
memory-based) method of storing and accessing items can be devised that
meets the various requirements you propose ("artificial instances" in the
Dimagi spec's language), but I'm guessing that it would be pretty hard. (I
could be wrong, though. Yaw? Maybe Nafundi's fast-itemset implementation
already meets the key requirements while still performing well with many
thousands of choice items?)
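To make the SQLite-based idea concrete, here is a minimal Python sketch in which the choices live in an indexed table and only the filtered subset is ever materialized. The region column, the districts table and both helper functions are hypothetical, and the sketch does not attempt to evaluate arbitrary XPath.

```python
import csv
import io
import sqlite3


def load_choices(db: sqlite3.Connection, table: str, csv_text: str, filter_column: str) -> None:
    """Copy a choices CSV into an indexed SQLite table instead of holding it in memory.

    Table and column names are interpolated as quoted identifiers, so they are
    assumed to come from the (trusted) form definition, not from user input.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    db.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    db.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    # An index keeps filtered lookups fast even with many thousands of rows.
    db.execute(f'CREATE INDEX "{table}_{filter_column}" ON "{table}" ("{filter_column}")')


def filtered_choices(db: sqlite3.Connection, table: str, filter_column: str,
                     filter_value: str, limit: int = 500) -> list:
    """Fetch only the matching (value, label) pairs, in file order, up to `limit`."""
    return db.execute(
        f'SELECT "value", "label" FROM "{table}" WHERE "{filter_column}" = ? '
        "ORDER BY rowid LIMIT ?",
        (filter_value, limit),
    ).fetchall()


db = sqlite3.connect(":memory:")
districts_csv = "value,label,region\nn1,North 1,north\nn2,North 2,north\ns1,South 1,south\n"
load_choices(db, "districts", districts_csv, "region")
print(filtered_choices(db, "districts", "region", "north"))
# [('n1', 'North 1'), ('n2', 'North 2')]
```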

2. Cost. Socially, it's important that the benefits of various
approaches exceed the costs; and practically, it's important that somebody
(or some people) be willing to bear the cost of implementation. If
Nafundi's fast-itemset implementation is already very close to meeting
these proposed requirements, then the cost of implementing your proposal
may be relatively low. That would be great. If the cost becomes high,
though, would somebody need to fundraise to support the implementation?
(Please forgive my open-source ignorance here.)

My secondary concerns regard the user experience:

  1. It would be a bit of a shame to lose the ability to mix static and
    dynamic choices. Lots of SurveyCTO users use this, and the obvious
    work-around (include all static choices in the dynamic .csv file) is
    awkward for a great many cases. For example, say you pull a list of people
    from a .csv file and then want to include a "Don't know" or "Not listed"
    option: you could obviously add those static options to the .csv, but it
    would mean that you couldn't just directly dump your list of people from a
    database or from another form... so the experience for the user would
    become much more complicated. I dare say that the potential user-experience
    pay-off might warrant devising a method to mix static and dynamic options.

  2. It's not clear to me how to easily match or beat pulldata()'s
    ease-of-use (such as it is!) in the pyxform context. Mitch has proposed
    various extensions to the (perhaps over-)simplistic ${fieldname}
    referencing scheme, and it seems that we'd need to work referencing of
    external instances into one of those extensions. That should be possible,
    but it might involve (much) greater surgery to pyxform. Somehow, it needs
    to be easy for spreadsheet-based users to grab a cell from one of these
    external data sources. (Or pulldata() remains the go-to method for
    spreadsheet-based users... in which case it ought to be documented as part
    of the spec.)
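One way a client could support mixing static and dynamic choices, sketched in Python under stated assumptions: the CSV has value and label columns, and the static options come from the form itself. merged_choices is a hypothetical helper, not an existing SurveyCTO or ODK feature.

```python
import csv
import io


def merged_choices(csv_text: str, static_choices: list) -> list:
    """Return (value, label) pairs from the CSV followed by static extras.

    Static options such as "Don't know" are appended unless the CSV already
    contains a choice with the same value, so an exported list of people can
    be used as-is.
    """
    dynamic = [(row["value"], row["label"]) for row in csv.DictReader(io.StringIO(csv_text))]
    seen = {value for value, _ in dynamic}
    return dynamic + [choice for choice in static_choices if choice[0] not in seen]


people_csv = "value,label\np1,Ana\np2,Ben\n"
static = [("dk", "Don't know"), ("other", "Not listed")]
print(merged_choices(people_csv, static))
# [('p1', 'Ana'), ('p2', 'Ben'), ('dk', "Don't know"), ('other', 'Not listed')]
```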

Thanks again,

Chris


Cool, so in your version of ODK Collect you've introduced external data
that is added to an XForm like this:

And are itemsets built the same way as they are when the data is
'internal', in a secondary instance? That would be wonderful news.

Cheers,
Martijn

··· On Tuesday, February 10, 2015 at 9:39:35 AM UTC-7, Trung wrote: > > Yes, > I'm actually referring to Enketo, although for us we need to make it work > for both the web and mobile app. > > Trung > On Feb 10, 2015 11:36 PM, "Martijn van de Rijdt" <mar...@enketo.org > wrote: > > Hi Trung, > > Sorry, I thought you were referring to Enketo. There is no shared codebase > between Enketo and ODK Collect. > > On Tue, Feb 10, 2015 at 8:29 AM, Trung <trung...@gmail.com > wrote: > > Thanks Martijn, > I have made some progress, which I can describe a bit here. We ended up > rewriting the pulldata function to point the query to an external csv file > or .db file, which is dynamically created by our custom ODK Collect Build > (we call it RTA Survey-- > https://play.google.com/store/apps/details?id=vn.rta.survey.android). > > I'm not sure if this falls into one of the 3 ways you mentioned. But so > far it seems to solve our problem. > > > On Monday, February 9, 2015 at 10:06:37 PM UTC+7, Martijn van de Rijdt wrote: > > Very cool! > > On my side, I have some funding to implement this feature (as described in > the OP) in Enketo Express this month. I consider this the first phase that > would not be performance optimized yet. The second phase would be > performance optimization, e.g. by storing the data in a browser storage > solution and convert XPath queries to query this db. For the latter I have > no funding. > > There are no plans to include the additional 2 itemset creation methods > (also no interest in a PR for this as it would make the form format and > form engine needlessly complex). Also no plans to add support in Enketo > Legacy. > > What feature are you referring to? > (a) The new CommCare-derived feature as proposed in the OP, > (b) the "SurveyCTO way" of creating itemsets from external csv data, or > (c) the "Nafundi way" of creating itemsets from external csv data? 
> > Cheers, > Martijn > > > On Friday, February 6, 2015 at 8:18:21 AM UTC-7, Trung wrote: > > Hi Martijn, > We intend to develop this external data feature for the web form (enketo). > Can you guide us how and where to start, or we can collaborate in > developing as well. > > Many thanks, > Trung. > > On Thursday, September 18, 2014 at 9:44:14 PM UTC+7, Martijn van de Rijdt wrote: > > Thanks Mitch, > > I understand. Hopefully that can be sorted out eventually as I am worried > about the future of the ODK XForm format due the amount of technical debt > that seems to only be increasing - it is after all the core we all revolve > around. I believe Dimagi's branch has full XPath predicate support as well > and does some kind of static analysis of XPath search queries to address > performance of itemsets (though the latter may not be in JavaRosa - not > sure). > > When it's time to add external data support to Enketo, I'll have to go for > the third solution, probably as proposed above with a different XLSForm > syntax. As mentioned, on the XForm side this is basically nothing more than > introducing the src attribute on a secondary instance. The potential of > external data is rather promising for users, especially for a web-based > application like Enketo where you could potentially pass custom data > sources to a URL. Therefore, I'll have to make sure it's completely solid > from the beginning and doesn't involve ending up with three completely > different ways to create itemsets. > > Cheers, > Martijn > > On Wednesday, September 17, 2014 11:25:05 AM UTC-6, Mitch wrote: > > Short answer: NO > > This is touching the very inner workings of the XPath expression > evaluation code. That is very delicate code. 
Rather than modifying > Javarosa, it would likely be better to start with a fully validated, fully > tested, fully vetted XPath expression evaluator, and then add hooks to that > to get behaviors that replicate the functionality of javarosa -- i.e., > scrap the entire JR codebase. > > I say this because there are two implementations of that innermost section > of Java code, one that Dimagi uses, and an earlier branch that ODK uses. > > Last time I tried to merge up to the Dimagi branch, I had to back out the > changes because they altered the outcomes of the random forms I was testing > against. > > I don't know which codebase is correct, or perhaps both are broken? > > ---------- > The only way we could move forward is to expend a lot of effort setting up > automated tests to confirm the current behavior of the system. > > Once we have those tests in place, then you could make dramatic internal > changes with some assurance that they do not affect outcomes. > > ------- > Mitch > > > On Tue, Sep 16, 2014 at 9:59 AM, Martijn van de Rijdt wrote: > > Thanks for this Carl. I think I understand now and I don't envy you for > having to make these maneuvers to avoid touching JavaRosa. > > If you (Nafundi, ODK, SurveyCTO) are interested in doing fast itemsets / > external data using a clean XForm syntax instead, what would it take on the > JavaRosa side? Would an external contribution to make select a select1 > 'open', be welcomed and be sufficient? Anything else? > > Martijn > > On Monday, September 15, 2014 9:59:45 AM UTC-6, Christopher Robert wrote: > > Hi all, > > And the SurveyCTO-contributed "dynamic search-and-select" feature acts as > a kind of hybrid. It loads a dynamic list into memory so that JavaRosa will > respect it. But it only loads a filtered subset of the on-desk DB into > JavaRosa for all of the performance reasons already discussed. 
If you end > up with a extremely long list presented to the user, it will still slow > down (and eat loads of memory), unlike the Nafundi-contributed "fast > itemset" implementation. > > Best, > > Chris > > > On Mon, Sep 15, 2014 at 11:48 AM, chartung wrote: > > Hey Martijn, > > You're close, but the external vs. regular itemsets is more like this: > > if select(1) with itemset > let javarosa figure out what to display, then display Collect's select1 > widget with data from javarosa > else if input with type==text and query attribute > don't let javarosa do anything. Display an external-Itemset widget in > Collect, handle all of the db lookups, etc... and only use javarosa to save > the selected answer to the data model in memory. > > The reason we used text instead of select(1) is that javarosa doesn't > allow saving values in a select1 node that are not included in that > select(1)'s list of values. Since we're doing all of the processing > externally, the select(1)'s list is empty, and will never save anything. > > I believe javarosa's implementation is called a "closed" select1, and > there was some talk of adding the option of "open" select1s at some point, > but I don't think anything happened beyond discussion. > > Hope that helps, > -Carl > > > On Friday, September 12, 2014 8:01:40 AM UTC-7, Martijn van de Rijdt wrote: > > Thanks Chris, Yaw, > > I see some real opportunity here to agree on a common and solid approach > for external data. I'd like to jump in on the performance issue first. 
> > After the explanation, I imagine fast itemsets work something like this > (please correct me if I'm wrong): > > if select(1) with itemset > parse as *regular* itemset > if input with type=text and query attribute > parse as *fast* itemset > > If the above is more or less the case, would it be possible to change this > to: > > if select(1) with itemset > if external data from csv > parse as *fast* itemset > else > parse as *regular *itemset > > Advantages of the latter: > - XForms syntax remains clean and correct, i.e. a select and select1 > remain what they are, and so do itemsets > - No need for additional query attribute > > Cheers, > Martijn > > P.S. If any Dimagi devs see this, it would be very useful if you could > share your experience on making large itemsets faster if you have done this. > > On Thu, Sep 11, 2014 at 4:55 PM, Yaw Anokwa wrote: > > Martijn, > > Chris' point about performance and cost is why external itemsets are built > the way they are. JavaRosa requires that select one options be loaded into > memory. Further, you can't save a select one option that did not come from > one of those loaded in memory. > > These constraints are why external itemsets overload text input ( > https://github.com/SEL-Columbia/pyxform/pull/120) and why autocomplete > was removed from Collect (https://code.google.com/p/ope > ndatakit/issues/detail?id=289). > > Agreed that none of this is ideal. The solution is to rewrite the core, > but that will require a lot of effort without a lot of visible benefit to > the user. > > Yaw > -- > Need ODK services? http://nafundi.com provides form design, server setup, > professional support, and software development for ODK. > > On Mon, Sep 8, 2014 at 11:02 AM, Christopher Robert wrote: > > Hi Martijn, > > This looks neat. Thanks for putting in the work on it. > > As Yaw suggests, it sounds similar to Nafundi's "fast itemset" > contribution. 
> We hadn't made use of that ourselves because our users author
> in Excel or Google Drive, and because we lacked the python expertise to
> make the relatively significant pyxform changes necessary to make it
> accessible. We try to minimize changes in Collect, JavaRosa, and pyxform,
> but our efforts to minimize changes to pyxform are particularly extreme
> given that we're a Java development team. Nathan's pyxform contribution,
> though, would seem to bring fast itemsets into the pyxform fold, which is
> fantastic (this is the contribution linked by Yaw in this thread).
>
> My primary concerns with this spec are practical in nature:
>
> *1. Performance.* JavaRosa's handling of XForm-based choice lists (and
> filtering) performs such that forms on fast, modern devices cannot
> practically exceed 1,000-2,000 choices in total. Since a great many of our
> users exceed those limits, we needed something that would perform
> significantly better. Obviously, Nafundi's contribution grew from the same
> need, and they hewed much closer to the XForm spirit (and maybe even the
> specification). Still, I wonder if meeting all of the elements of your
> proposed specification (namely the ability to reference and filter via
> XPath expressions) would lead to an in-memory representation that would
> suffer the same practical limits as originated the need for something
> faster. Perhaps some kind of SQLite-based (as opposed to memory-based)
> method of storing and accessing items can be devised that meets the various
> requirements you propose ("artificial instances" in the Dimagi spec's
> language) -- but I'm guessing that it would be pretty hard. (I could be
> wrong, though. Yaw? Maybe Nafundi's fast-itemset implementation already
> meets the key requirements while still performing well with many thousands
> of choice items?)
>
> *2. Cost.* Socially, it's important that the benefits of various
> approaches exceed the costs; and practically, it's important that somebody
> (or some people) be willing to bear the cost of implementation. If
> Nafundi's fast-itemset implementation is already very close to meeting
> these proposed requirements, then the cost of implementing your proposal
> may be relatively low. That would be great. If the cost becomes high,
> though, then somebody would fundraise to support the implementation?
> (Please forgive my open-source ignorance here.)
>
> My secondary concerns regard the user experience:
>
> 1. It would be a bit of a shame to lose the ability to mix static and
> dynamic choices. Lots of SurveyCTO users use this, and the obvious
> work-around (include all static choices in the dynamic .csv file) is
> awkward for a great many cases. For example, say you pull a list of people
> from a .csv file and then want to include a "Don't know" or "Not listed"
> option: you could obviously add those static options to the .csv, but it
> would mean that you couldn't just directly dump your list of people from a
> database or from another form... so the experience for the user would
> become much more complicated. I dare say that the potential user-experience
> pay-off might warrant devising a method to mix static and dynamic options.
>
> 2. It's not clear to me how to easily match or beat pulldata()'s
> ease-of-use (such as it is!) in the pyxform context. Mitch has proposed
> various extensions to the (perhaps over-)simplistic ${fieldname}
> referencing scheme, and it seems that we'd need to work referencing of
> external instances into one of those extensions. That should be possible,
> but it might involve (much) greater surgery to pyxform. Somehow, it needs
> to be easy for spreadsheet-based users to grab a cell from one of these
> external data sources. (Or pulldata() remains the go-to method for
> spreadsheet-based users...
> in which case it ought to be documented as part of the spec.)
>
> Thanks again,
>
> Chris
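
The two dispatch rules quoted in this thread (Carl's description of what Collect does today, and Martijn's suggested alternative) can be sketched in Python with the standard library's xml.etree. This is a minimal illustration under stated assumptions, not Collect's or JavaRosa's actual code: the helper names and the small demo form are hypothetical, the `query` attribute and the `jr://file-csv/` prefix are taken from this thread, and the check on the input's bind type (text) is left out.

```python
import re
import xml.etree.ElementTree as ET

# Prefix from the proposal; assumed here to mark a CSV-backed secondary instance.
CSV_PREFIX = "jr://file-csv/"


def local(tag):
    """Drop the namespace: '{http://www.w3.org/2002/xforms}select1' -> 'select1'."""
    return tag.rsplit("}", 1)[-1]


def has_itemset(control):
    return any(local(child.tag) == "itemset" for child in control)


def classify_current(control):
    """Hypothetical helper following Carl's description (bind type not checked)."""
    name = local(control.tag)
    if name in ("select", "select1") and has_itemset(control):
        return "regular"  # JavaRosa loads the whole choice list into memory
    if name == "input" and control.get("query") is not None:
        return "fast"  # Collect's external-itemset widget does the lookups
    return None


def csv_instance_ids(model):
    """ids of secondary instances whose src uses the jr://file-csv/ prefix."""
    return {
        inst.get("id")
        for inst in model.iter()
        if local(inst.tag) == "instance"
        and (inst.get("src") or "").startswith(CSV_PREFIX)
    }


def classify_proposed(control, csv_ids):
    """Hypothetical helper for Martijn's alternative: keep select/select1 + itemset,
    and pick the fast path only when the itemset reads from a CSV-backed instance."""
    if local(control.tag) not in ("select", "select1") or not has_itemset(control):
        return None
    itemset = next(c for c in control if local(c.tag) == "itemset")
    refs = re.findall(r"instance\(\s*['\"]([^'\"]+)['\"]\s*\)", itemset.get("nodeset", ""))
    return "fast" if any(r in csv_ids for r in refs) else "regular"


# Made-up demo form: one CSV-backed select1, one inline select1, one query input.
FORM = """<h:html xmlns="http://www.w3.org/2002/xforms" xmlns:h="http://www.w3.org/1999/xhtml">
 <h:head><model>
  <instance><data id="demo"><fruit/><color/><fruit_text/></data></instance>
  <instance id="fruits" src="jr://file-csv/fruits.csv"/>
  <instance id="colors"><root><item><value>red</value><label>Red</label></item></root></instance>
 </model></h:head>
 <h:body>
  <select1 ref="/data/fruit"><itemset nodeset="instance('fruits')/item"><value ref="value"/><label ref="label"/></itemset></select1>
  <select1 ref="/data/color"><itemset nodeset="instance('colors')/item"><value ref="value"/><label ref="label"/></itemset></select1>
  <input ref="/data/fruit_text" query="instance('fruits')/item"/>
 </h:body>
</h:html>"""

root = ET.fromstring(FORM)
model = next(e for e in root.iter() if local(e.tag) == "model")
body = next(e for e in root.iter() if local(e.tag) == "body")
ids = csv_instance_ids(model)
current = [classify_current(c) for c in body]
proposed = [classify_proposed(c, ids) for c in body]
print(current)   # ['regular', 'regular', 'fast']
print(proposed)  # ['fast', 'regular', None]
```

In the proposed variant the widget choice depends only on where the itemset's data comes from, which is the advantage Martijn lists: no extra `query` attribute, and a select1 stays a select1.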

--

Revolutionizing data collection since 2012.

Enketo https://enketo.org/
LinkedIn http://www.linkedin.com/company/enketo-llc
GitHub https://github.com/enketo
Twitter https://twitter.com/enketo
Blog http://blog.enketo.org/
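
Chris's question of whether a SQLite-based (rather than memory-based) store could still answer the lookups and filters the proposal asks for can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch under assumptions, not how Collect, JavaRosa or Nafundi's fast itemsets work: the helper names are hypothetical, the CSV is assumed to have `value` and `label` columns like the fruits.csv example, and column names are trusted. It loads the CSV into a table once, answers a pulldata()-style lookup and a `starts-with()` choice filter with SQL, and appends static choices such as "Don't know" at query time. That last step is one possible way to mix static and dynamic options without editing the CSV.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_text):
    """Copy a CSV (header row first) into a SQLite table with TEXT columns."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
    # Assumes a "value" column; an index keeps exact-match lookups fast on large lists.
    conn.execute(f'CREATE INDEX "{table}_value_idx" ON "{table}" ("value")')


def pulldata(conn, table, column, key_column, key):
    """Hypothetical SQL counterpart of pulldata(table, column, key_column, key).

    Returns the first matching cell in file order, or '' when nothing matches
    (assumed behavior).
    """
    row = conn.execute(
        f'SELECT "{column}" FROM "{table}" WHERE "{key_column}" = ? ORDER BY rowid LIMIT 1',
        (key,),
    ).fetchone()
    return row[0] if row else ""


def choices_starting_with(conn, table, prefix, static_choices=()):
    """Rows matching item[starts-with(value, prefix)] in file order, then static extras."""
    # substr() keeps the match case-sensitive like XPath starts-with();
    # SQLite's LIKE would ignore ASCII case.
    rows = conn.execute(
        f'SELECT "value", "label" FROM "{table}" '
        'WHERE substr("value", 1, ?) = ? ORDER BY rowid',
        (len(prefix), prefix),
    ).fetchall()
    return rows + list(static_choices)


FRUITS_CSV = "value,label\nmango,Mango\npapaya,Papaya\npear,Pear\n"

# ":memory:" keeps the demo self-contained; on a device this would be a file path.
conn = sqlite3.connect(":memory:")
load_csv(conn, "fruits", FRUITS_CSV)

label = pulldata(conn, "fruits", "label", "value", "papaya")
choices = choices_starting_with(conn, "fruits", "p", static_choices=[("dk", "Don't know")])
print(label)    # Papaya
print(choices)  # [('papaya', 'Papaya'), ('pear', 'Pear'), ('dk', "Don't know")]
```

Because the static options are appended after the query, the CSV can still be dumped directly from a database or another form, which is the workflow Chris wants to keep. Whether a client stores external data this way remains an implementation choice; the proposal only describes the XForm format.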