I'm not sure if this falls into one of the 3 ways you mentioned. But so
far it seems to solve our problem.
Very cool!
On my side, I have some funding to implement this feature (as described
in the OP) in Enketo Express this month. I consider this the first phase
that would not be performance optimized yet. The second phase would be
performance optimization, e.g. by storing the data in a browser storage
solution and converting XPath queries into queries against that db. For the latter I have
no funding.
There are no plans to include the additional 2 itemset creation methods
(also no interest in a PR for this as it would make the form format and
form engine needlessly complex). Also no plans to add support in Enketo
Legacy.
What feature are you referring to?
(a) The new CommCare-derived feature as proposed in the OP,
(b) the "SurveyCTO way" of creating itemsets from external csv data, or
(c) the "Nafundi way" of creating itemsets from external csv data?
Cheers,
Martijn
On Friday, February 6, 2015 at 8:18:21 AM UTC-7, Trung wrote:
Hi Martijn,
We intend to develop this external data feature for the web form
(Enketo). Could you guide us on how and where to start? We would also be
glad to collaborate on the development.
Many thanks,
Trung.
On Thursday, September 18, 2014 at 9:44:14 PM UTC+7, Martijn van de Rijdt wrote:
Thanks Mitch,
I understand. Hopefully that can be sorted out eventually as I am
worried about the future of the ODK XForm format due to the amount of
technical debt that seems to only be increasing - it is after all the core
we all revolve around. I believe Dimagi's branch has full XPath predicate
support as well and does some kind of static analysis of XPath search
queries to address performance of itemsets (though the latter may not be in
JavaRosa - not sure).
When it's time to add external data support to Enketo, I'll have to go
for the third solution, probably as proposed above with a different XLSForm
syntax. As mentioned, on the XForm side this is basically nothing more than
introducing the src attribute on a secondary instance. The potential of
external data is rather promising for users, especially for a web-based
application like Enketo where you could potentially pass custom data
sources to a URL. Therefore, I'll have to make sure it's completely solid
from the beginning and doesn't involve ending up with three completely
different ways to create itemsets.
Cheers,
Martijn
On Wednesday, September 17, 2014 11:25:05 AM UTC-6, Mitch wrote:
Short answer: NO
This is touching the very inner workings of the XPath expression
evaluation code. That is very delicate code. Rather than modifying
Javarosa, it would likely be better to start with a fully validated, fully
tested, fully vetted XPath expression evaluator, and then add hooks to that
to get behaviors that replicate the functionality of javarosa -- i.e.,
scrap the entire JR codebase.
I say this because there are two implementations of that innermost
section of Java code, one that Dimagi uses, and an earlier branch that ODK
uses.
Last time I tried to merge up to the Dimagi branch, I had to back out
the changes because they altered the outcomes of the random forms I was
testing against.
I don't know which codebase is correct, or perhaps both are broken?
The only way we could move forward is to expend a lot of effort setting
up automated tests to confirm the current behavior of the system.
Once we have those tests in place, then you could make dramatic internal
changes with some assurance that they do not affect outcomes.
Mitch
On Tue, Sep 16, 2014 at 9:59 AM, Martijn van de Rijdt <mar...@enketo.org> wrote:
Thanks for this Carl. I think I understand now and I don't envy you for
having to make these maneuvers to avoid touching JavaRosa.
If you (Nafundi, ODK, SurveyCTO) are interested in doing fast itemsets /
external data using a clean XForm syntax instead, what would it take on the
JavaRosa side? Would an external contribution to make select and select1
'open' be welcomed and be sufficient? Anything else?
Martijn
On Monday, September 15, 2014 9:59:45 AM UTC-6, Christopher Robert wrote:
Hi all,
And the SurveyCTO-contributed "dynamic search-and-select" feature acts
as a kind of hybrid. It loads a dynamic list into memory so that JavaRosa
will respect it. But it only loads a filtered subset of the on-disk DB into
JavaRosa for all of the performance reasons already discussed. If you end
up with an extremely long list presented to the user, it will still slow
down (and eat loads of memory), unlike the Nafundi-contributed "fast
itemset" implementation.
Best,
Chris
On Mon, Sep 15, 2014 at 11:48 AM, chartung char...@nafundi.com wrote:
Hey Martijn,
You're close, but the external vs. regular itemsets is more like this:
if select(1) with itemset
    let javarosa figure out what to display, then display Collect's
    select1 widget with data from javarosa
else if input with type==text and query attribute
    don't let javarosa do anything. Display an external-Itemset widget in
    Collect, handle all of the db lookups, etc... and only use javarosa to
    save the selected answer to the data model in memory.
The reason we used text instead of select(1) is that javarosa doesn't
allow saving values in a select1 node that are not included in that
select(1)'s list of values. Since we're doing all of the processing
externally, the select(1)'s list is empty, and will never save anything.
I believe javarosa's implementation is called a "closed" select1, and
there was some talk of adding the option of "open" select1s at some point,
but I don't think anything happened beyond discussion.
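The dispatch described above, and the "closed" select1 constraint behind it, can be sketched roughly as follows. This is illustrative Python only, not Collect or JavaRosa code: the function names choose_widget and save_select1, the dict shape used for a control, and the open_select flag are all hypothetical.

```python
def choose_widget(control):
    """Pick a widget for a form control given as a plain dict (hypothetical shape)."""
    if control.get("type") == "select1" and control.get("itemset"):
        # The form engine evaluates the itemset; the client only renders the result.
        return "select1"
    if (control.get("type") == "input"
            and control.get("datatype") == "text"
            and control.get("query")):
        # The form engine is bypassed; the client performs the db lookups itself.
        return "external-itemset"
    return "default"


def save_select1(value, loaded_choices, open_select=False):
    """A 'closed' select1 rejects any value that is not in its loaded choice list."""
    if not open_select and value not in loaded_choices:
        raise ValueError(f"{value!r} is not in the select1's choice list")
    return value
```

With an empty loaded list, a closed select1 can never store an answer, which is why the text input was used instead; an "open" select1 (open_select=True in this sketch) would accept the externally chosen value.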
Hope that helps,
-Carl
On Friday, September 12, 2014 8:01:40 AM UTC-7, Martijn van de Rijdt wrote:
Thanks Chris, Yaw,
I see some real opportunity here to agree on a common and solid approach
for external data. I'd like to jump in on the performance issue first.
After the explanation, I imagine fast itemsets work something like this
(please correct me if I'm wrong):
if select(1) with itemset
    parse as regular itemset
if input with type=text and query attribute
    parse as fast itemset
If the above is more or less the case, would it be possible to change
this to:
if select(1) with itemset
    if external data from csv
        parse as fast itemset
    else
        parse as regular itemset
Advantages of the latter:
- XForms syntax remains clean and correct, i.e. a select and select1
remain what they are, and so do itemsets
- No need for additional query attribute
Cheers,
Martijn
P.S. If any Dimagi devs see this, it would be very useful if you could
share your experience on making large itemsets faster if you have done this.
On Thu, Sep 11, 2014 at 4:55 PM, Yaw Anokwa yan...@nafundi.com wrote:
Martijn,
Chris' point about performance and cost is why external itemsets are
built the way they are. JavaRosa requires that select one options be loaded
into memory. Further, you can't save a select one option that did not come
from one of those loaded in memory.
These constraints are why external itemsets overload text input
(https://github.com/SEL-Columbia/pyxform/pull/120) and why autocomplete
was removed from Collect
(https://code.google.com/p/opendatakit/issues/detail?id=289).
Agreed that none of this is ideal. The solution is to rewrite the core,
but that will require a lot of effort without a lot of visible benefit to
the user.
Yaw
On Mon, Sep 8, 2014 at 11:02 AM, Christopher Robert <cro...@surveycto.com> wrote:
Hi Martijn,
This looks neat. Thanks for putting in the work on it.
As Yaw suggests, it sounds similar to Nafundi's "fast itemset"
contribution. We hadn't made use of that ourselves because our users author
in Excel or Google Drive, and because we lacked the python expertise to
make the relatively significant pyxform changes necessary to make it
accessible. We try to minimize changes in Collect, JavaRosa, and pyxform,
but our efforts to minimize changes to pyxform are particularly extreme
given that we're a Java development team. Nathan's pyxform contribution,
though, would seem to bring fast itemsets into the pyxform fold, which is
fantastic (this is the contribution linked by Yaw in this thread).
My primary concerns with this spec are practical in nature:
1. Performance. JavaRosa's handling of XForm-based choice lists (and
filtering) performs such that forms on fast, modern devices cannot
practically exceed 1,000-2,000 choices in total. Since a great many of our
users exceed those limits, we needed something that would perform
significantly better. Obviously, Nafundi's contribution grew from the same
need, and they hewed much closer to the XForm spirit (and maybe even the
specification). Still, I wonder if meeting all of the elements of your
proposed specification (namely the ability to reference and filter via
XPath expressions) would lead to an in-memory representation that would
suffer the same practical limits that gave rise to the need for something
faster. Perhaps some kind of SQLite-based (as opposed to memory-based)
method of storing and accessing items can be devised that meets the various
requirements you propose ("artificial instances" in the Dimagi spec's
language) -- but I'm guessing that it would be pretty hard. (I could be
wrong, though. Yaw? Maybe Nafundi's fast-itemset implementation already
meets the key requirements while still performing well with many thousands
of choice items?)
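As a rough illustration of the SQLite-based idea, the sketch below stores choice rows in a table and pulls only a filtered, capped subset into memory. It is a minimal Python sketch under stated assumptions, not how Collect or any fast-itemset implementation actually works: the table name choices, the value/label columns, and the function names are hypothetical.

```python
import csv
import io
import sqlite3


def load_choices(conn, csv_text):
    """Store CSV choice rows (value and label columns assumed) in a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS choices (value TEXT PRIMARY KEY, label TEXT)"
    )
    rows = [(r["value"], r["label"]) for r in csv.DictReader(io.StringIO(csv_text))]
    conn.executemany("INSERT INTO choices (value, label) VALUES (?, ?)", rows)
    conn.commit()


def search_choices(conn, prefix, limit=50):
    """Return only the choices whose label starts with prefix, capped at limit."""
    # Escape LIKE wildcards so the prefix is matched literally.
    escaped = prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    cur = conn.execute(
        "SELECT value, label FROM choices "
        "WHERE label LIKE ? ESCAPE '\\' ORDER BY label LIMIT ?",
        (escaped + "%", limit),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # a file path would be used on a real device
load_choices(conn, "value,label\nmango,Mango\npapaya,Papaya\npear,Pear\n")
matches = search_choices(conn, "P")
```

Only the rows matching the user's input ever leave the database, so memory use depends on the limit rather than on the size of the full list. Supporting arbitrary XPath predicates on top of such a table is the harder part and is not attempted here.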
2. Cost. Socially, it's important that the benefits of various
approaches exceed the costs; and practically, it's important that somebody
(or some people) be willing to bear the cost of implementation. If
Nafundi's fast-itemset implementation is already very close to meeting
these proposed requirements, then the cost of implementing your proposal
may be relatively low. That would be great. If the cost becomes high,
though, would somebody then fundraise to support the implementation?
(Please forgive my open-source ignorance here.)
My secondary concerns regard the user experience:
- It would be a bit of a shame to lose the ability to mix static and
dynamic choices. Lots of SurveyCTO users use this, and the obvious
work-around (include all static choices in the dynamic .csv file) is
awkward for a great many cases. For example, say you pull a list of people
from a .csv file and then want to include a "Don't know" or "Not listed"
option: you could obviously add those static options to the .csv, but it
would mean that you couldn't just directly dump your list of people from a
database or from another form... so the experience for the user would
become much more complicated. I dare say that the potential user-experience
pay-off might warrant devising a method to mix static and dynamic options.
- It's not clear to me how to easily match or beat pulldata()'s
ease-of-use (such as it is!) in the pyxform context. Mitch has proposed
various extensions to the (perhaps over-)simplistic ${fieldname}
referencing scheme, and it seems that we'd need to work referencing of
external instances into one of those extensions. That should be possible,
but it might involve (much) greater surgery to pyxform. Somehow, it needs
to be easy for spreadsheet-based users to grab a cell from one of these
external data sources. (Or pulldata() remains the go-to method for
spreadsheet-based users... in which case it ought to be documented as part
of the spec.)
Thanks again,
Chris
On Fri, Sep 5, 2014 at 12:01 PM, Martijn van de Rijdt <mar...@enketo.org> wrote:
Hi ODK XForm enthusiasts,
I would like to propose to tweak the way external data is added to ODK
XForms and the way dynamic choice lists are generated from external data.
These changes will not need to break existing forms in ODK Collect as
far as I can see. They build on the valuable work done by SurveyCTO to
provide this very useful functionality.
1. Adding External Data
In the currently supported method, the XForm doesn’t provide the source
of the external data and it cannot be queried using XPath, the logic
language of our beloved form format.
We may want to store (large) external data in a database, use the CSV
format, and query the client database directly for performance reasons.
This proposal allows a client to keep doing that. It doesn’t describe the
underlying implementation, just the XForm format.
This proposal is also nothing new. It simply adds a csv source to this
existing spec by Dimagi,
https://bitbucket.org/javarosa/javarosa/wiki/externalinstances,
because this specification is elegant, extensible and in line with
existing ODK XForm functionality.
Adding a CSV file to a form could be done by simply adding a secondary
instance element with a src attribute:
<instance id="fruits" src="jr://file-csv/fruits.csv"/>
The jr://file-csv/ prefix indicates that:
- the location of the external resource is listed in the xformsManifest
  (similar to the currently used jr://images prefix for media files)
- the data has the CSV format
It is assumed that this (virtually) creates an XML instance in the
client. E.g. jr://file-csv/fruits.csv
is interpreted as:
<item>
  <value>mango</value>
  <label>Mango</label>
</item>
…
<item>
  <value>papaya</value>
  <label>Papaya</label>
</item>
If a ‘sortby’ column is present, the items will be sorted, as is the
case currently.
The instance can be queried using normally supported XPath, like:
instance('fruits')/item[value='papaya']/label => Papaya
or with the existing pulldata() function:
pulldata('fruits', 'label', 'value', 'papaya') => Papaya
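To make the CSV-to-instance idea concrete, here is a minimal Python sketch that builds such a virtual instance from CSV text and runs both kinds of lookup against it. It is an illustration only: the wrapper element name "root" is an assumption (the proposal does not name it), csv_to_instance is a hypothetical helper, and this pulldata takes the parsed instance rather than an instance name as the real function does.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_instance(csv_text, root_tag="root"):
    """Build an XML tree with one <item> per CSV row and one child per column."""
    root = ET.Element(root_tag)  # root_tag is an assumption, not part of the proposal
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = ET.SubElement(root, "item")
        for column, cell in row.items():
            ET.SubElement(item, column).text = cell
    return root


def pulldata(instance, return_column, lookup_column, lookup_value):
    """Return return_column from the first item whose lookup_column matches."""
    for item in instance.findall("item"):
        if item.findtext(lookup_column) == lookup_value:
            return item.findtext(return_column)
    return None


fruits = csv_to_instance("value,label\nmango,Mango\npapaya,Papaya\n")
# ElementTree supports a small XPath subset, which is enough for this predicate.
label_via_xpath = fruits.findtext("item[value='papaya']/label")
label_via_pulldata = pulldata(fruits, "label", "value", "papaya")
```

Both lookups return the same label, which is the point of the proposal: pulldata() becomes a convenience over data that is also reachable through plain XPath.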
Changes in XLSForm:
We’d need a way to output the external source. Some options:
- the type “select_one external fruits.csv” outputs the instance with
  src="jr://file-csv/fruits.csv" and id="fruits" in the XForm. It looks at
  the csv extension to give the correct src prefix (because in the future we
  may want to add other formats). @Mitch, is this perhaps what you are
  hinting at at the end of this blog post:
  http://opendatakit.org/2014/08/odk-v1-4-4-tools-now-available/ ?
- or, add an ‘external data’ column, e.g. in the settings sheet, that
  can contain multiple sources (i.e. filenames with the .csv extension in
  this case).
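The first option could be sketched as below. This is a hypothetical Python illustration, not pyxform code: the function name external_instance and the PREFIXES table are made up for the example, and only the "select_one external <file>" form and the .csv extension mentioned above are handled.

```python
import os

# Maps a file extension to the src prefix; only .csv is proposed so far.
PREFIXES = {".csv": "jr://file-csv/"}


def external_instance(type_cell):
    """Turn a type cell such as 'select_one external fruits.csv' into an instance element."""
    parts = type_cell.split()
    if len(parts) != 3 or parts[0] != "select_one" or parts[1] != "external":
        return None  # not an external select; leave it to the normal handling
    filename = parts[2]
    stem, extension = os.path.splitext(filename)
    prefix = PREFIXES.get(extension.lower())
    if prefix is None:
        raise ValueError(f"unsupported external data format: {extension!r}")
    # The instance id is derived from the filename without its extension.
    return f'<instance id="{stem}" src="{prefix}{filename}"/>'


element = external_instance("select_one external fruits.csv")
```

Adding another format later would only mean adding an entry to the prefix table, which is the reason given above for keying on the extension.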
Existing forms using pulldata() can remain fully functional. It’s up to
the client to deal with the missing instance element in old non-compliant
XForms for backwards compatibility's sake. The commitment with adopting this new
approach is simply to start requiring the new method for new XForms and for
new XForms to work as described in the new specification.
The one thing you cannot do with the new method is mix static and
external choices.
2. Creating dynamic select choice lists from external data
Implementing #1 will also mean that users will be able to use the
existing choice-filter (in XLSForms terminology) to create nodesets from
external data. It’s up to the client whether to convert this internally to
a db query or to handle it in pure XPath on XML. For users this will
mean that the difference between external and internal choice-lists is
minimal. Creating e.g. cascade logic will be done in exactly the same
way for both.
For historical reasons, I think we could use the current search()
function inside a predicate (choice-filter), where the first parameter
would become obsolete.
However, I’d much prefer using the following separate (and much
simplified, more user-friendly) XPath functions instead (2 of these are
native XPath 1.0 functions):
instance('fruits')/item[starts-with(value, 'p')]
instance('fruits')/item[ends-with(value, 'a')]
instance('fruits')/item[contains(value, 'y')]
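For a client that does not evaluate these predicates in an XPath engine, the three filters reduce to simple string tests. The Python sketch below shows that reduction over rows held as dicts; filter_items and the row shape are hypothetical. Note that ends-with() is the one function of the three that is not part of XPath 1.0, so an XPath 1.0 engine would need it added as an extension.

```python
# Each predicate name maps to the equivalent Python string test.
TESTS = {
    "starts-with": str.startswith,
    "ends-with": str.endswith,
    "contains": str.__contains__,
}


def filter_items(items, column, test, needle):
    """Keep the items whose column value passes the named string test."""
    check = TESTS[test]
    return [item for item in items if check(item[column], needle)]


fruits = [
    {"value": "mango", "label": "Mango"},
    {"value": "papaya", "label": "Papaya"},
]
starts_with_p = filter_items(fruits, "value", "starts-with", "p")
ends_with_a = filter_items(fruits, "value", "ends-with", "a")
contains_y = filter_items(fruits, "value", "contains", "y")
```

The same three tests map directly onto database queries (for example SQL LIKE patterns), which is what would let a client answer them without loading the whole list.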
In XLSForm:
What do you think?
Martijn
--
You received this message because you are subscribed to the Google
Groups "ODK Developers" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to opendatakit-developers+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.