Allowing easy upload of revised forms

Hi ODK Developers,

I know that there are work-arounds and that it's not a high priority, but
for our project it is a fairly frequent event that a label's translation is
tweaked, a constraint is tightened, or a form has to otherwise be updated
without actually affecting the form database schema in any way. I suspect
the same is true for many projects. In cases like this, we really want to
be able to just upload the revised form definition and have it quietly
overwrite the existing Aggregate definition -- maintaining the existing
form ID, submission data, etc. (We would then manually download the new
form definitions into Collect.)

I have this on my list to develop for my custom-extended Aggregate version,
but I actually think that this is one of those features that really should
be in the core product. Thus, if I can get it to work, I'd like to share
the patch for possible inclusion in a future release.

That said, this also seems a particularly complex and sensitive bit of code
for somebody not intimately familiar with the database structure, etc. I
wonder if any of you, with more experience, have any quick suggestions
and/or warnings? For example, there may be some sanity-check code somewhere
that does a reasonably good job of comparing schemas, so if you alert me to
that it may help me with the schema-comparison. Or there may be particular
dangers with updating the definition that are not readily apparent. Since
clearly you guys have thought about this form-replacement business and
passed on it numerous times, I can appreciate that there must be gotchas in
there.

Any advice would be greatly appreciated. And again, if I can get it to work
then I am happy to share the patch. If it can be implemented safely then I
think that it could be very useful to many users.

Thanks in advance,

Chris

Chris,

Your enthusiasm is great and we would be willing to add code from
someone into Aggregate and the rest of the ODK tools that verifies the
form has not changed significantly enough to effect the data
structure. The ODK core team would be happy to integrate a piece of
java code that verifies there are not major differences between an
older revision of the form and a new revision, and if we had this
check we would allow overwrites of the forms. In fact there is already
a versioning scheme in the APIs to handle the different copies;
however, no one has written the java verification code to verify that
the form has not changed significantly.

Therefore in regards to coding efforts, I would not necessarily worry
about learning Aggregate's data storage as I would be happy to add
your code into the right places. What we have not had time to do is to
write the code that verifies that nothing substantially has changed.
If something does change that affects the database it could lead to
all kinds of problems in the pipeline or painful merges. If you (or
anyone) could provide the java code that verifies nothing
significantly has changed I will happy to work with the other ODK core
developers to get it integrated in the tools.

So then the question becomes what does this java code to check for
major need to do to make sure that nothing in the underlying xform
structure has changed that would cause a change to the database.
Here is a list off the top of my head (I will investigate more if
someone takes this up).

  1. The instance has not changed in anyway
    (http://opendatakit.org/help/form-design/#instance)
  2. The binding has not changed in a way that affects the database
    structure (http://opendatakit.org/help/form-design/binding/)

Verifying #1 is easier than verifying #2 as you want to be able to
allow some attributes in the bindings to change such as 'calculate',
'relevant, and 'constraint' but not bindings that effect the database.
For example, 'type' cannot change because that is used to select the
database column definition, or 'odk:length' as that is the size of the
varchar in the database. Therefore I think the easiest thing to do to
make this extensible for future changes to the binding attributes (yes
more are coming in future revisions to ODK) is to write the java
verifier so that binding verification part ensures all attributes are
exactly the same except for a list of attributes that ODK developers
specify is acceptable to change. That way the ODK core team can easily
update the verifies list of attributes that do not cause a change to
the database structure (e.g. 'calculate', 'relevant', 'constraint')
allowing for easy future expansion. NOTE: The question area can change
and it doesn't affect the database.

Basically if you write the java code that verifies there are no major
changes in the xform by verifying both the instance or the
non-approved attributes in the bindings have not changed the ODK core
team will add yours (or someone else's) verification code into the
proper places in the ODK tools. We have not had the cycles to write
such a verification system. If someone in the community could provide
this code in java and provide some tests that show their
implementation is correct, I will personally work to get it added to
the core of the tools. In the past we have leaned towards keeping ODK
tools error free by keeping things simple as allowing forms to change
could introduce lots of errors because it's not obvious to people what
changes they make to their xform will break parts of the tools and we
don't want people to lose data because of errors. Therefore any xform
modification verification code needs to be well tested and extensible
to avoid problems.

Hopefully someone in the community will code this because I agree with
Chris this could be a nice addition to make ODK more user friendly.
Happy to provide more details if someone wants to take up the job.

Cheers,
Waylon

··· On Thu, May 10, 2012 at 9:47 PM, Christopher Robert wrote: > Hi ODK Developers, > > I know that there are work-arounds and that it's not a high priority, but > for our project it is a fairly frequent event that a label's translation is > tweaked, a constraint is tightened, or a form has to otherwise be updated > without actually affecting the form database schema in any way. I suspect > the same is true for many projects. In cases like this, we really want to be > able to just upload the revised form definition and have it quietly > overwrite the existing Aggregate definition -- maintaining the existing form > ID, submission data, etc. (We would then manually download the new form > definitions into Collect.) > > I have this on my list to develop for my custom-extended Aggregate version, > but I actually think that this is one of those features that really should > be in the core product. Thus, if I can get it to work, I'd like to share the > patch for possible inclusion in a future release. > > That said, this also seems a particularly complex and sensitive bit of code > for somebody not intimately familiar with the database structure, etc. I > wonder if any of you, with more experience, have any quick suggestions > and/or warnings? For example, there may be some sanity-check code somewhere > that does a reasonably good job of comparing schemas, so if you alert me to > that it may help me with the schema-comparison. Or there may be particular > dangers with updating the definition that are not readily apparent. Since > clearly you guys have thought about this form-replacement business and > passed on it numerous times, I can appreciate that there must be gotchas in > there. > > Any advice would be greatly appreciated. And again, if I can get it to work > then I am happy to share the patch. If it can be implemented safely then I > think that it could be very useful to many users. > > Thanks in advance, > > Chris >

Hi Waylon,

Thanks so much for the extremely helpful reply. This sounds excellent and I
am game to take a shot at it.

Your approach sounds like a good one: I'll compare the instance(s) and
bindings and only allow for an explicit list of allowable differences.
Worst case, as new bindings are added, changes to those new bindings will
be judged disallowed until and unless somebody goes in and explicitly adds
them to the list of allowable differences. That should be the safest way to
go.

Judging by code in the likes of createOrFetchFormId() and isSameForm(),
I'll very naturally have an existing Form/IForm on the one hand, and a new
"XFormParameters rootElementDefn" on the other -- and in principle that
seems plenty to be able to systematically compare the instances and
bindings. I should be able to figure it out. (And Form.isSameForm() already
allows re-upload of identical form definitions. I'll just be loosening that
a bit.)

Thanks again, and I'll report back once I have something working,

Chris

··· On Friday, May 11, 2012, W. Brunette wrote:

Chris,

Your enthusiasm is great and we would be willing to add code from
someone into Aggregate and the rest of the ODK tools that verifies the
form has not changed significantly enough to effect the data
structure. The ODK core team would be happy to integrate a piece of
java code that verifies there are not major differences between an
older revision of the form and a new revision, and if we had this
check we would allow overwrites of the forms. In fact there is already
a versioning scheme in the APIs to handle the different copies;
however, no one has written the java verification code to verify that
the form has not changed significantly.

Therefore in regards to coding efforts, I would not necessarily worry
about learning Aggregate's data storage as I would be happy to add
your code into the right places. What we have not had time to do is to
write the code that verifies that nothing substantially has changed.
If something does change that affects the database it could lead to
all kinds of problems in the pipeline or painful merges. If you (or
anyone) could provide the java code that verifies nothing
significantly has changed I will happy to work with the other ODK core
developers to get it integrated in the tools.

So then the question becomes what does this java code to check for
major need to do to make sure that nothing in the underlying xform
structure has changed that would cause a change to the database.
Here is a list off the top of my head (I will investigate more if
someone takes this up).

  1. The instance has not changed in anyway
    (http://opendatakit.org/help/form-design/#instance)
  2. The binding has not changed in a way that affects the database
    structure (http://opendatakit.org/help/form-design/binding/)

Verifying #1 is easier than verifying #2 as you want to be able to
allow some attributes in the bindings to change such as 'calculate',
'relevant, and 'constraint' but not bindings that effect the database.
For example, 'type' cannot change because that is used to select the
database column definition, or 'odk:length' as that is the size of the
varchar in the database. Therefore I think the easiest thing to do to
make this extensible for future changes to the binding attributes (yes
more are coming in future revisions to ODK) is to write the java
verifier so that binding verification part ensures all attributes are
exactly the same except for a list of attributes that ODK developers
specify is acceptable to change. That way the ODK core team can easily
update the verifies list of attributes that do not cause a change to
the database structure (e.g. 'calculate', 'relevant', 'constraint')
allowing for easy future expansion. NOTE: The question area can change
and it doesn't affect the database.

Basically if you write the java code that verifies there are no major
changes in the xform by verifying both the instance or the
non-approved attributes in the bindings have not changed the ODK core
team will add yours (or someone else's) verification code into the
proper places in the ODK tools. We have not had the cycles to write
such a verification system. If someone in the community could provide
this code in java and provide some tests that show their
implementation is correct, I will personally work to get it added to
the core of the tools. In the past we have leaned towards keeping ODK
tools error free by keeping things simple as allowing forms to change
could introduce lots of errors because it's not obvious to people what
changes they make to their xform will break parts of the tools and we
don't want people to lose data because of errors. Therefore any xform
modification verification code needs to be well tested and extensible
to avoid problems.

Hopefully someone in the community will code this because I agree with
Chris this could be a nice addition to make ODK more user friendly.
Happy to provide more details if someone wants to take up the job.

Cheers,
Waylon

On Thu, May 10, 2012 at 9:47 PM, Christopher Robert <chrislrobert@gmail.com <javascript:;>> wrote:

Hi ODK Developers,

I know that there are work-arounds and that it's not a high priority, but
for our project it is a fairly frequent event that a label's translation
is
tweaked, a constraint is tightened, or a form has to otherwise be updated
without actually affecting the form database schema in any way. I suspect
the same is true for many projects. In cases like this, we really want
to be
able to just upload the revised form definition and have it quietly
overwrite the existing Aggregate definition -- maintaining the existing
form
ID, submission data, etc. (We would then manually download the new form
definitions into Collect.)

I have this on my list to develop for my custom-extended Aggregate
version,
but I actually think that this is one of those features that really
should
be in the core product. Thus, if I can get it to work, I'd like to share
the
patch for possible inclusion in a future release.

That said, this also seems a particularly complex and sensitive bit of
code
for somebody not intimately familiar with the database structure, etc. I
wonder if any of you, with more experience, have any quick suggestions
and/or warnings? For example, there may be some sanity-check code
somewhere
that does a reasonably good job of comparing schemas, so if you alert me
to
that it may help me with the schema-comparison. Or there may be
particular
dangers with updating the definition that are not readily apparent. Since
clearly you guys have thought about this form-replacement business and
passed on it numerous times, I can appreciate that there must be gotchas
in
there.

Any advice would be greatly appreciated. And again, if I can get it to
work
then I am happy to share the patch. If it can be implemented safely then
I
think that it could be very useful to many users.

Thanks in advance,

Chris

Yes, basically, within Form, you need to replace the comparison of the md5
hash of the already-present form definition file and the new file here:

http://code.google.com/p/opendatakit/source/browse/src/main/java/org/opendatakit/aggregate/form/Form.java?repo=aggregate#646

With a more embellished comparison of the two javarosa-parsed form
definition files.

The incoming form definition file is in xmlBytes
The existing form definition file is in byte Form.getFormXml(cc)

Javarosa is now capable of having two active form models, so you should be
able to parse the two files and stepwise compare the element trees.

The FormParserForJavarosa.constructDataModel() steps through the javarosa
parse tree (TreeElement) to construct the internal data structures, so it
should be fairly straightforward to take that code, change it to do a
side-by-side step-through and node-by-node comparison.

http://code.google.com/p/opendatakit/source/browse/src/main/java/org/opendatakit/aggregate/parser/FormParserForJavaRosa.java?repo=aggregate#1047

Mitch

··· On Fri, May 11, 2012 at 12:57 AM, Christopher Robert <chrislrobert@gmail.com wrote:

Hi Waylon,

Thanks so much for the extremely helpful reply. This sounds excellent and
I am game to take a shot at it.

Your approach sounds like a good one: I'll compare the instance(s) and
bindings and only allow for an explicit list of allowable differences.
Worst case, as new bindings are added, changes to those new bindings will
be judged disallowed until and unless somebody goes in and explicitly adds
them to the list of allowable differences. That should be the safest way to
go.

Judging by code in the likes of createOrFetchFormId() and isSameForm(),
I'll very naturally have an existing Form/IForm on the one hand, and a new
"XFormParameters rootElementDefn" on the other -- and in principle that
seems plenty to be able to systematically compare the instances and
bindings. I should be able to figure it out. (And Form.isSameForm() already
allows re-upload of identical form definitions. I'll just be loosening that
a bit.)

Thanks again, and I'll report back once I have something working,

Chris

On Friday, May 11, 2012, W. Brunette wrote:

Chris,

Your enthusiasm is great and we would be willing to add code from
someone into Aggregate and the rest of the ODK tools that verifies the
form has not changed significantly enough to effect the data
structure. The ODK core team would be happy to integrate a piece of
java code that verifies there are not major differences between an
older revision of the form and a new revision, and if we had this
check we would allow overwrites of the forms. In fact there is already
a versioning scheme in the APIs to handle the different copies;
however, no one has written the java verification code to verify that
the form has not changed significantly.

Therefore in regards to coding efforts, I would not necessarily worry
about learning Aggregate's data storage as I would be happy to add
your code into the right places. What we have not had time to do is to
write the code that verifies that nothing substantially has changed.
If something does change that affects the database it could lead to
all kinds of problems in the pipeline or painful merges. If you (or
anyone) could provide the java code that verifies nothing
significantly has changed I will happy to work with the other ODK core
developers to get it integrated in the tools.

So then the question becomes what does this java code to check for
major need to do to make sure that nothing in the underlying xform
structure has changed that would cause a change to the database.
Here is a list off the top of my head (I will investigate more if
someone takes this up).

  1. The instance has not changed in anyway
    (http://opendatakit.org/help/form-design/#instance)
  2. The binding has not changed in a way that affects the database
    structure (http://opendatakit.org/help/form-design/binding/)

Verifying #1 is easier than verifying #2 as you want to be able to
allow some attributes in the bindings to change such as 'calculate',
'relevant, and 'constraint' but not bindings that effect the database.
For example, 'type' cannot change because that is used to select the
database column definition, or 'odk:length' as that is the size of the
varchar in the database. Therefore I think the easiest thing to do to
make this extensible for future changes to the binding attributes (yes
more are coming in future revisions to ODK) is to write the java
verifier so that binding verification part ensures all attributes are
exactly the same except for a list of attributes that ODK developers
specify is acceptable to change. That way the ODK core team can easily
update the verifies list of attributes that do not cause a change to
the database structure (e.g. 'calculate', 'relevant', 'constraint')
allowing for easy future expansion. NOTE: The question area can change
and it doesn't affect the database.

Basically if you write the java code that verifies there are no major
changes in the xform by verifying both the instance or the
non-approved attributes in the bindings have not changed the ODK core
team will add yours (or someone else's) verification code into the
proper places in the ODK tools. We have not had the cycles to write
such a verification system. If someone in the community could provide
this code in java and provide some tests that show their
implementation is correct, I will personally work to get it added to
the core of the tools. In the past we have leaned towards keeping ODK
tools error free by keeping things simple as allowing forms to change
could introduce lots of errors because it's not obvious to people what
changes they make to their xform will break parts of the tools and we
don't want people to lose data because of errors. Therefore any xform
modification verification code needs to be well tested and extensible
to avoid problems.

Hopefully someone in the community will code this because I agree with
Chris this could be a nice addition to make ODK more user friendly.
Happy to provide more details if someone wants to take up the job.

Cheers,
Waylon

On Thu, May 10, 2012 at 9:47 PM, Christopher Robert chrislrobert@gmail.com wrote:

Hi ODK Developers,

I know that there are work-arounds and that it's not a high priority,
but
for our project it is a fairly frequent event that a label's
translation is
tweaked, a constraint is tightened, or a form has to otherwise be
updated
without actually affecting the form database schema in any way. I
suspect
the same is true for many projects. In cases like this, we really want
to be
able to just upload the revised form definition and have it quietly
overwrite the existing Aggregate definition -- maintaining the existing
form
ID, submission data, etc. (We would then manually download the new form
definitions into Collect.)

I have this on my list to develop for my custom-extended Aggregate
version,
but I actually think that this is one of those features that really
should
be in the core product. Thus, if I can get it to work, I'd like to
share the
patch for possible inclusion in a future release.

That said, this also seems a particularly complex and sensitive bit of
code
for somebody not intimately familiar with the database structure, etc. I
wonder if any of you, with more experience, have any quick suggestions
and/or warnings? For example, there may be some sanity-check code
somewhere
that does a reasonably good job of comparing schemas, so if you alert
me to
that it may help me with the schema-comparison. Or there may be
particular
dangers with updating the definition that are not readily apparent.
Since
clearly you guys have thought about this form-replacement business and
passed on it numerous times, I can appreciate that there must be
gotchas in
there.

Any advice would be greatly appreciated. And again, if I can get it to
work
then I am happy to share the patch. If it can be implemented safely
then I
think that it could be very useful to many users.

Thanks in advance,

Chris

--
Mitch Sundt
Software Engineer
University of Washington
mitchellsundt@gmail.com

Hi Mitch,

Thanks so much for the suggestions. I got into it today and it does seem
quite doable. Testing thoroughly will probably be the hardest part.

Thanks again,

Chris

··· On May 11, 2012 6:22 PM, "Mitch S" wrote:

Yes, basically, within Form, you need to replace the comparison of the md5
hash of the already-present form definition file and the new file here:

http://code.google.com/p/opendatakit/source/browse/src/main/java/org/opendatakit/aggregate/form/Form.java?repo=aggregate#646

With a more embellished comparison of the two javarosa-parsed form
definition files.

The incoming form definition file is in xmlBytes
The existing form definition file is in byte Form.getFormXml(cc)

Javarosa is now capable of having two active form models, so you should be
able to parse the two files and stepwise compare the element trees.

The FormParserForJavarosa.constructDataModel() steps through the javarosa
parse tree (TreeElement) to construct the internal data structures, so it
should be fairly straightforward to take that code, change it to do a
side-by-side step-through and node-by-node comparison.

http://code.google.com/p/opendatakit/source/browse/src/main/java/org/opendatakit/aggregate/parser/FormParserForJavaRosa.java?repo=aggregate#1047

Mitch

On Fri, May 11, 2012 at 12:57 AM, Christopher Robert < chrislrobert@gmail.com> wrote:

Hi Waylon,

Thanks so much for the extremely helpful reply. This sounds excellent and
I am game to take a shot at it.

Your approach sounds like a good one: I'll compare the instance(s) and
bindings and only allow for an explicit list of allowable differences.
Worst case, as new bindings are added, changes to those new bindings will
be judged disallowed until and unless somebody goes in and explicitly adds
them to the list of allowable differences. That should be the safest way to
go.

Judging by code in the likes of createOrFetchFormId() and isSameForm(),
I'll very naturally have an existing Form/IForm on the one hand, and a new
"XFormParameters rootElementDefn" on the other -- and in principle that
seems plenty to be able to systematically compare the instances and
bindings. I should be able to figure it out. (And Form.isSameForm() already
allows re-upload of identical form definitions. I'll just be loosening that
a bit.)

Thanks again, and I'll report back once I have something working,

Chris

On Friday, May 11, 2012, W. Brunette wrote:

Chris,

Your enthusiasm is great and we would be willing to add code from
someone into Aggregate and the rest of the ODK tools that verifies the
form has not changed significantly enough to effect the data
structure. The ODK core team would be happy to integrate a piece of
java code that verifies there are not major differences between an
older revision of the form and a new revision, and if we had this
check we would allow overwrites of the forms. In fact there is already
a versioning scheme in the APIs to handle the different copies;
however, no one has written the java verification code to verify that
the form has not changed significantly.

Therefore in regards to coding efforts, I would not necessarily worry
about learning Aggregate's data storage as I would be happy to add
your code into the right places. What we have not had time to do is to
write the code that verifies that nothing substantially has changed.
If something does change that affects the database it could lead to
all kinds of problems in the pipeline or painful merges. If you (or
anyone) could provide the java code that verifies nothing
significantly has changed I will happy to work with the other ODK core
developers to get it integrated in the tools.

So then the question becomes what does this java code to check for
major need to do to make sure that nothing in the underlying xform
structure has changed that would cause a change to the database.
Here is a list off the top of my head (I will investigate more if
someone takes this up).

  1. The instance has not changed in anyway
    (http://opendatakit.org/help/form-design/#instance)
  2. The binding has not changed in a way that affects the database
    structure (http://opendatakit.org/help/form-design/binding/)

Verifying #1 is easier than verifying #2 as you want to be able to
allow some attributes in the bindings to change such as 'calculate',
'relevant, and 'constraint' but not bindings that effect the database.
For example, 'type' cannot change because that is used to select the
database column definition, or 'odk:length' as that is the size of the
varchar in the database. Therefore I think the easiest thing to do to
make this extensible for future changes to the binding attributes (yes
more are coming in future revisions to ODK) is to write the java
verifier so that binding verification part ensures all attributes are
exactly the same except for a list of attributes that ODK developers
specify is acceptable to change. That way the ODK core team can easily
update the verifies list of attributes that do not cause a change to
the database structure (e.g. 'calculate', 'relevant', 'constraint')
allowing for easy future expansion. NOTE: The question area can change
and it doesn't affect the database.

Basically if you write the java code that verifies there are no major
changes in the xform by verifying both the instance or the
non-approved attributes in the bindings have not changed the ODK core
team will add yours (or someone else's) verification code into the
proper places in the ODK tools. We have not had the cycles to write
such a verification system. If someone in the community could provide
this code in java and provide some tests that show their
implementation is correct, I will personally work to get it added to
the core of the tools. In the past we have leaned towards keeping ODK
tools error free by keeping things simple as allowing forms to change
could introduce lots of errors because it's not obvious to people what
changes they make to their xform will break parts of the tools and we
don't want people to lose data because of errors. Therefore any xform
modification verification code needs to be well tested and extensible
to avoid problems.

Hopefully someone in the community will code this because I agree with
Chris this could be a nice addition to make ODK more user friendly.
Happy to provide more details if someone wants to take up the job.

Cheers,
Waylon

On Thu, May 10, 2012 at 9:47 PM, Christopher Robert chrislrobert@gmail.com wrote:

Hi ODK Developers,

I know that there are work-arounds and that it's not a high priority,
but
for our project it is a fairly frequent event that a label's
translation is
tweaked, a constraint is tightened, or a form has to otherwise be
updated
without actually affecting the form database schema in any way. I
suspect
the same is true for many projects. In cases like this, we really want
to be
able to just upload the revised form definition and have it quietly
overwrite the existing Aggregate definition -- maintaining the
existing form
ID, submission data, etc. (We would then manually download the new form
definitions into Collect.)

I have this on my list to develop for my custom-extended Aggregate
version,
but I actually think that this is one of those features that really
should
be in the core product. Thus, if I can get it to work, I'd like to
share the
patch for possible inclusion in a future release.

That said, this also seems a particularly complex and sensitive bit of
code
for somebody not intimately familiar with the database structure, etc.
I
wonder if any of you, with more experience, have any quick suggestions
and/or warnings? For example, there may be some sanity-check code
somewhere
that does a reasonably good job of comparing schemas, so if you alert
me to
that it may help me with the schema-comparison. Or there may be
particular
dangers with updating the definition that are not readily apparent.
Since
clearly you guys have thought about this form-replacement business and
passed on it numerous times, I can appreciate that there must be
gotchas in
there.

Any advice would be greatly appreciated. And again, if I can get it to
work
then I am happy to share the patch. If it can be implemented safely
then I
think that it could be very useful to many users.

Thanks in advance,

Chris

--
Mitch Sundt
Software Engineer
University of Washington
mitchellsundt@gmail.com

Hi Mitch, Waylon,

Okay, so I've implemented and tested a maximally-conservative comparison
function. Accordingly, I updated Form.isSameForm(...) to
call FormParserForJavaRosa.compareXml(this.getFormXml(cc).getBytes("UTF-8"),xmlBytes)
and, based on the result, call xform.setValueFromByteArray(...) to update
the form's XML. Basically, my comparison function indicates whether the XML
is entirely identical and, if not, whether changes are too major to allow
an update. The following bound attribute changes are explicitly allowed on
existing fields only:

relevant
constraint
readonly
required
calculate
appearance
jr:constraintmessage
jr:preload
jr:preloadparams

My impression is that these attributes only affect rendering in Collect.
Also, any and all changes are allowed in the body portion of the XML.

The following changes are explicitly disallowed:

changes in the id
changes in the title
changes in the public key
changes in fields or grouping of fields (i.e., any change to the instance
structure)

The following changes are implicitly disallowed:

changes in type
changes in odk:length
changes in any other attribute, bound or otherwise, not explicitly allowed

This latter set of properties -- the implicit disallowing of most changes
-- was somewhat difficult to implement. This is because bound attributes
are not stored in TreeElements' attribute lists, as one might expect. In
fact, they are not stored at all. Rather, parsed bindings lead JavaRosa to
set TreeElement.required, TreeElement.relevant, etc. I could have easily
walked through the parsed TreeElements and compared such things, but then
when new bindings/features were added the code would default to accepting
changes to new attributes rather than default to rejecting them. This would
not have been desirable. Thus, I performed some very minor surgery
on FormParserForJavaRosa to have it record all raw bindings as they come
through parseBind(...). Then, in addition to walking through TreeElements
to discover structural instance changes and changes to directly-specified
attributes (like id), I also walk through all of the raw bindings. Only
changes to attributes listed in FormParserForJavaRosa.NonSchemaAttributes
are allowed; the whole approach errs on the side of over-rejecting rather
than over-accepting.

I now have a few questions:

  1. To carry this through, I need xform.setValueFromByteArray(...) (or some
    cousin) to update the actual XML when I have decided that such an update
    would be okay. Any ideas?

  2. In total I have proposed updates to 4 source files (though the changes
    to Form.java are still highly provisional), plus 13 test files (a base
    case, an all-allowable-updates case, and then 11 cases that deserve
    rejection). How should I communicate these? (I'm new to Mercurial and the
    whole open-source thing more generally.)

Please just let me know how best to move forward.

Thanks,

Chris

··· On Friday, May 11, 2012, Mitch S wrote:

Yes, basically, within Form, you need to replace the comparison of the md5
hash of the already-present form definition file and the new file here:

http://code.google.com/p/opendatakit/source/browse/src/main/java/org/opendatakit/aggregate/form/Form.java?repo=aggregate#646

With a more embellished comparison of the two javarosa-parsed form
definition files.

The incoming form definition file is in xmlBytes
The existing form definition file is in byte Form.getFormXml(cc)

Javarosa is now capable of having two active form models, so you should be
able to parse the two files and stepwise compare the element trees.

The FormParserForJavarosa.constructDataModel() steps through the javarosa
parse tree (TreeElement) to construct the internal data structures, so it
should be fairly straightforward to take that code, change it to do a
side-by-side step-through and node-by-node comparison.

http://code.google.com/p/opendatakit/source/browse/src/main/java/org/opendatakit/aggregate/parser/FormParserForJavaRosa.java?repo=aggregate#1047

Mitch

On Fri, May 11, 2012 at 12:57 AM, Christopher Robert < chrislrobert@gmail.com> wrote:

Hi Waylon,

Thanks so much for the extremely helpful reply. This sounds excellent and
I am game to take a shot at it.

Your approach sounds like a good one: I'll compare the instance(s) and
bindings and only allow for an explicit list of allowable differences.
Worst case, as new bindings are added, changes to those new bindings will
be judged disallowed until and unless somebody goes in and explicitly adds
them to the list of allowable differences. That should be the safest way to
go.

Judging by code in the likes of createOrFetchFormId() and isSameForm(),
I'll very naturally have an existing Form/IForm on the one hand, and a new
"XFormParameters rootElementDefn" on the other -- and in principle that
seems plenty to be able to systematically compare the instances and
bindings. I should be able to figure it out. (And Form.isSameForm() already
allows re-upload of identical form definitions. I'll just be loosening that
a bit.)

Thanks again, and I'll report back once I have something working,

Chris

On Friday, May 11, 2012, W. Brunette wrote:

Chris,

Your enthusiasm is great and we would be willing to add code from
someone into Aggregate and the rest of the ODK tools that verifies the
form has not changed significantly enough to effect the data
structure. The ODK core team would be happy to integrate a piece of
java code that verifies there are not major differences between an
older revision of the form and a new revision, and if we had this
check we would allow overwrites of the forms. In fact there is already
a versioning scheme in the APIs to handle the different copies;
however, no one has written the java verification code to verify that
the form has not changed significantly.

Therefore in regards to coding efforts, I would not necessarily worry
about learning Aggregate's data storage as I would be happy to add
your code into the right places. What we have not had time to do is to
write the code that verifies that nothing substantially has changed.
If something does change that affects the database it could lead to
all kinds of problems in the pipeline or painful merges. If you (or
anyone) could provide the java code that verifies nothing
significantly has changed I will happy to work with the other ODK core
developers to get it integrated in the tools.

So then the question becomes what does this java code to check for
major need to do to make sure that nothing in the underlying xform
structure has changed that would cause a change to the database.
Here is a list off the top of my head (I will investigate more if
someone takes this up).

  1. The instance has not changed in anyway
    (http://opendatakit.org/help/form-design/#instance)
  2. The binding has not changed in a way that affects the database
    structure (http://opendatakit.org/help/form-design/binding/)

Verifying #1 is easier than verifying #2 as you want to be able to
allow some attributes in the bindings to change such as 'calculate',
'relevant, and 'constraint' but not bindings that effect the database.
For example, 'type' cannot change because that is used to select the
database column definition, or 'odk:length' as that is the size of the
varchar in the database. Therefore I think the easiest thing to do to
make this extensible for future changes to the binding attributes (yes
more are coming in future revisions to ODK) is to write the java
verifier so that binding verification part ensures all attributes are
exactly the same except for a list of attributes that ODK developers
specify is acceptable to change. That way the ODK core team can easily
update the verifies list of attributes that do not cause a change to
the database structure (e.g. 'calculate', 'relevant', 'constraint')
allowing for easy future expansion. NOTE: The question area can change
and it doesn't affect the database.

Basically if you write the java code that verifies there are no major
changes in the xform by verifying both the instance or the
non-approved attributes in the bindings have not changed the ODK core
team will add yours (or someone else

--
Mitch Sundt
Software Engineer
University of Washington
mitchellsundt@gmail.com

Hi Chris,

(1) What do you mean by "changes in any other attribute, bound or
otherwise, not explicitly allowed." I think you can relax this and avoid
adding more complexity in FormParserForJavarosa.

Is this done in an abundance of caution, or do you have specific examples
in mind where this is necessary? As long as odk:length is not allowed to
change, we should be OK to allow other attribute changes. Attributes are
generally ignored -- ODK Aggregate only examines a few attributes on the
top-level tag in a submission, so allowing changes to attributes in the
instance definition, body or bind statements should be OK. Were you
thinking of specific examples where this might be a concern?

(2) We can also allow changes between type="string" and type="select1"
(DATATYPE_CHOICE) (either from string to select1 or from select1 to
string). Note that type="select" (DATATYPE_CHOICE_LIST) cannot be
similarly exchanged as that is a multiple-select field.

(3) We can also allow the title to change. If it does, the update would be
applied in the FormInfo table.

If you can make the above revisions, it is probably easiest to send me a
zip directly ( mitchellsundt@gmail.com ) of your changed files and test
forms.

I don't think it is feasible for you complete the changes needed to update
the various data structures yourself. I think there is some missing code
and perhaps some assumptions later on in the processing that will need to
be examined and adjusted by me or Waylon.

For other future changes, if you were able to produce a full working patch,
the mechanism would be to create your own server-side clone off of ODK
Aggregate, submit your files to that, then share that clone with Waylon and
me. We'd then fold those changes into the main tree, or grant you write
access to the main tree and allow you to merge your changes back into it.

Mitch

··· On Wed, May 16, 2012 at 8:43 AM, Christopher Robert wrote:

Hi Mitch, Waylon,

Okay, so I've implemented and tested a maximally-conservative comparison
function. Accordingly, I updated Form.isSameForm(...) to
call FormParserForJavaRosa.compareXml(this.getFormXml(cc).getBytes("UTF-8"),xmlBytes)
and, based on the result, call xform.setValueFromByteArray(...) to update
the form's XML. Basically, my comparison function indicates whether the XML
is entirely identical and, if not, whether changes are too major to allow
an update. The following bound attribute changes are explicitly allowed on
existing fields only:

relevant
constraint
readonly
required
calculate
appearance
jr:constraintmessage
jr:preload
jr:preloadparams

My impression is that these attributes only affect rendering in Collect.
Also, any and all changes are allowed in the body portion of the XML.

The following changes are explicitly disallowed:

changes in the id
changes in the title
changes in the public key
changes in fields or grouping of fields (i.e., any change to the instance
structure)

The following changes are implicitly disallowed:

changes in type
changes in odk:length
changes in any other attribute, bound or otherwise, not explicitly allowed

This latter set of properties -- the implicit disallowing of most changes
-- was somewhat difficult to implement. This is because bound attributes
are not stored in TreeElements' attribute lists, as one might expect. In
fact, they are not stored at all. Rather, parsed bindings lead JavaRosa to
set TreeElement.required, TreeElement.relevant, etc. I could have easily
walked through the parsed TreeElements and compared such things, but then
when new bindings/features were added the code would default to accepting
changes to new attributes rather than default to rejecting them. This would
not have been desirable. Thus, I performed some very minor surgery
on FormParserForJavaRosa to have it record all raw bindings as they come
through parseBind(...). Then, in addition to walking through TreeElements
to discover structural instance changes and changes to directly-specified
attributes (like id), I also walk through all of the raw bindings. Only
changes to attributes listed in FormParserForJavaRosa.NonSchemaAttributes
are allowed; the whole approach errs on the side of over-rejecting rather
than over-accepting.

I now have a few questions:

  1. To carry this through, I need xform.setValueFromByteArray(...) (or some
    cousin) to update the actual XML when I have decided that such an update
    would be okay. Any ideas?

  2. In total I have proposed updates to 4 source files (though the changes
    to Form.java are still highly provisional), plus 13 test files (a base
    case, an all-allowable-updates case, and then 11 cases that deserve
    rejection). How should I communicate these? (I'm new to Mercurial and the
    whole open-source thing more generally.)

Please just let me know how best to move forward.

Thanks,

Chris

On Friday, May 11, 2012, Mitch S wrote:

Yes, basically, within Form, you need to replace the comparison of the
md5 hash of the already-present form definition file and the new file here:

http://code.google.com/p/opendatakit/source/browse/src/main/java/org/opendatakit/aggregate/form/Form.java?repo=aggregate#646

With a more embellished comparison of the two javarosa-parsed form
definition files.

The incoming form definition file is in xmlBytes
The existing form definition file is in byte Form.getFormXml(cc)

Javarosa is now capable of having two active form models, so you should
be able to parse the two files and stepwise compare the element trees.

The FormParserForJavarosa.constructDataModel() steps through the javarosa
parse tree (TreeElement) to construct the internal data structures, so it
should be fairly straightforward to take that code, change it to do a
side-by-side step-through and node-by-node comparison.

http://code.google.com/p/opendatakit/source/browse/src/main/java/org/opendatakit/aggregate/parser/FormParserForJavaRosa.java?repo=aggregate#1047

Mitch

On Fri, May 11, 2012 at 12:57 AM, Christopher Robert < chrislrobert@gmail.com> wrote:

Hi Waylon,

Thanks so much for the extremely helpful reply. This sounds excellent and
I am game to take a shot at it.

Your approach sounds like a good one: I'll compare the instance(s) and
bindings and only allow for an explicit list of allowable differences.
Worst case, as new bindings are added, changes to those new bindings will
be judged disallowed until and unless somebody goes in and explicitly adds
them to the list of allowable differences. That should be the safest way to
go.

Judging by code in the likes of createOrFetchFormId() and isSameForm(),
I'll very naturally have an existing Form/IForm on the one hand, and a new
"XFormParameters rootElementDefn" on the other -- and in principle that
seems plenty to be able to systematically compare the instances and
bindings. I should be able to figure it out. (And Form.isSameForm() already
allows re-upload of identical form definitions. I'll just be loosening that
a bit.)

Thanks again, and I'll report back once I have something working,

Chris

On Friday, May 11, 2012, W. Brunette wrote:

Chris,

Your enthusiasm is great and we would be willing to add code from
someone into Aggregate and the rest of the ODK tools that verifies the
form has not changed significantly enough to effect the data
structure. The ODK core team would be happy to integrate a piece of
java code that verifies there are not major differences between an
older revision of the form and a new revision, and if we had this
check we would allow overwrites of the forms. In fact there is already
a versioning scheme in the APIs to handle the different copies;
however, no one has written the java verification code to verify that
the form has not changed significantly.

Therefore in regards to coding efforts, I would not necessarily worry
about learning Aggregate's data storage as I would be happy to add
your code into the right places. What we have not had time to do is to
write the code that verifies that nothing substantially has changed.
If something does change that affects the database it could lead to
all kinds of problems in the pipeline or painful merges. If you (or
anyone) could provide the java code that verifies nothing
significantly has changed I will happy to work with the other ODK core
developers to get it integrated in the tools.

So then the question becomes what does this java code to check for
major need to do to make sure that nothing in the underlying xform
structure has changed that would cause a change to the database.
Here is a list off the top of my head (I will investigate more if
someone takes this up).

  1. The instance has not changed in anyway
    (http://opendatakit.org/help/form-design/#instance)
  2. The binding has not changed in a way that affects the database
    structure (http://opendatakit.org/help/form-design/binding/)

Verifying #1 is easier than verifying #2 as you want to be able to
allow some attributes in the bindings to change such as 'calculate',
'relevant, and 'constraint' but not bindings that effect the database.
For example, 'type' cannot change because that is used to select the
database column definition, or 'odk:length' as that is the size of the
varchar in the database. Therefore I think the easiest thing to do to
make this extensible for future changes to the binding attributes (yes
more are coming in future revisions to ODK) is to write the java
verifier so that binding verification part ensures all attributes are
exactly the same except for a list of attributes that ODK developers
specify is acceptable to change. That way the ODK core team can easily
update the verifies list of attributes that do not cause a change to
the database structure (e.g. 'calculate', 'relevant', 'constraint')
allowing for easy future expansion. NOTE: The question area can change
and it doesn't affect the database.

Basically if you write the java code that verifies there are no major
changes in the xform by verifying both the instance or the
non-approved attributes in the bindings have not changed the ODK core
team will add yours (or someone else

--
Mitch Sundt
Software Engineer
University of Washington
mitchellsundt@gmail.com

--
Mitch Sundt
Software Engineer
University of Washington
mitchellsundt@gmail.com

Hi Mitch,

Thanks for the quick reply.

It seemed that Waylon was suggesting that new bind attributes were coming
and that the code should default to disallowing changes in unknown bind
attributes -- just to be safe. One option would be to leave it this way for
the attributes on tags, but loosen it up on attributes in the
instance definition itself. We could disallow changes in ID separately,
then essentially ignore attributes within .... Would
that seem reasonable?

As for allowing title changes and string<->select1 changes, that's no
problem.

Thx,

Chris

··· On Wednesday, May 16, 2012, Mitch S wrote:

Hi Chris,

(1) What do you mean by "changes in any other attribute, bound or
otherwise, not explicitly allowed." I think you can relax this and avoid
adding more complexity in FormParserForJavarosa.

Is this done in an abundance of caution, or do you have specific examples
in mind where this is necessary? As long as odk:length is not allowed to
change, we should be OK to allow other attribute changes. Attributes are
generally ignored -- ODK Aggregate only examines a few attributes on the
top-level tag in a submission, so allowing changes to attributes in the
instance definition, body or bind statements should be OK. Were you
thinking of specific examples where this might be a concern?

(2) We can also allow changes between type="string" and type="select1"
(DATATYPE_CHOICE) (either from string to select1 or from select1 to
string). Note that type="select" (DATATYPE_CHOICE_LIST) cannot be
similarly exchanged as that is a multiple-select field.

(3) We can also allow the title to change. If it does, the update would be
applied in the FormInfo table.

If you can make the above revisions, it is probably easiest to send me a
zip directly ( mitchellsundt@gmail.com <javascript:_e({}, 'cvml',
'mitchellsundt@gmail.com');> ) of your changed files and test forms.

I don't think it is feasible for you complete the changes needed to update
the various data structures yourself. I think there is some missing code
and perhaps some assumptions later on in the processing that will need to
be examined and adjusted by me or Waylon.

For other future changes, if you were able to produce a full working
patch, the mechanism would be to create your own server-side clone off of
ODK Aggregate, submit your files to that, then share that clone with Waylon
and me. We'd then fold those changes into the main tree, or grant you
write access to the main tree and allow you to merge your changes back into
it.

Mitch

On Wed, May 16, 2012 at 8:43 AM, Christopher Robert < chrislrobert@gmail.com> wrote:

Hi Mitch, Waylon,

Okay, so I've implemented and tested a maximally-conservative comparison
function. Accordingly, I updated Form.isSameForm(...) to
call FormParserForJavaRosa.compareXml(this.getFormXml(cc).getBytes("UTF-8"),xmlBytes)
and, based on the result, call xform.setValueFromByteArray(...) to update
the form's XML. Basically, my comparison function indicates whether the XML
is entirely identical and, if not, whether changes are too major to allow
an update. The following bound attribute changes are explicitly allowed on
existing fields only:

relevant
constraint
readonly
required
calculate
appearance
jr:constraintmessage
jr:preload
jr:preloadparams

My impression is that these attributes only affect rendering in Collect.
Also, any and all changes are allowed in the body portion of the XML.

The following changes are explicitly disallowed:

changes in the id
changes in the title
changes in the public key
changes in fields or grouping of fields (i.e., any change to the instance
structure)

The following changes are implicitly disallowed:

changes in type
changes in odk:length
changes in any other attribute, bound or otherwise, not explicitly allowed

This latter set of properties -- the implicit disallowing of most changes
-- was somewhat difficult to implement. This is because bound attributes
are not stored in TreeElements' attribute lists, as one might expect. In
fact, they are not stored at all. Rather, parsed bindings lead JavaRosa to
set TreeElement.required, TreeElement.relevant, etc. I could have easily
walked through the parsed TreeElements and compared such things, but then
when new bindings/features were added the code would default to accepting
changes to new attributes rather than default to rejecting them. This would
not have been desirable. Thus, I performed some very minor surgery
on FormParserForJavaRosa to have it record all raw bindings as they come
through parseBind(...). Then, in addition to walking through TreeElements
to discover structural instance changes and changes to directly-specified
attributes (like id), I also walk through all of the raw bindings. Only
changes to attributes listed in FormParserForJavaRosa.NonSchemaAttributes
are allowed; the whole approach errs on the side of over-rejecting rather
than over-accepting.

I now have a few questions:

  1. To carry this through, I need xform.setValueFromByteArray(...) (or some
    cousin) to update the actual XML when I have decided that such an update
    would be okay. Any ideas?

  2. In total I have proposed updates to 4 source files (though the changes
    to Form.java are still highly provisional), plus 13 test files (a base
    case, an all-allowable-updates case, and then 11 cases that deserve
    rejection). How should I communicate these? (I'm new to Mercurial and the
    whole open-source thing more generally.)

Please just let me know how best to move forward.

Thanks,

Chris

On Friday, May 11, 2012, Mitch S wrote:

Yes, basically, within Form, you need to replace the comparison of the md5
hash of the already-present form definition file and the new file here:

http://code.google.com/p/opendatakit/source/browse/src/main/java/org/opendatakit/aggregate/form/Form.java?repo=aggregate#646

With a more embellished comparison of the two javarosa-parsed form
definition files.

The incoming form definition file is in xmlBytes
The existing form definition file is in byte Form.getFormXml(cc)

Javarosa is now capable of having two active form models, so you should be
able to parse the two files and stepwise compare the element trees.

The FormParserForJavarosa.constructDataModel() steps through t

OK. This sounds reasonable. Keep it for but don't worry about it
elsewhere. Hopefully that is fairly self-contained.

Changes to the form id (or version) would flag the form as entirely
different, so no worries there.

Mitch

··· On Wed, May 16, 2012 at 11:03 AM, Christopher Robert <chrislrobert@gmail.com wrote:

Hi Mitch,

Thanks for the quick reply.

It seemed that Waylon was suggesting that new bind attributes were coming
and that the code should default to disallowing changes in unknown bind
attributes -- just to be safe. One option would be to leave it this way for
the attributes on tags, but loosen it up on attributes in the
instance definition itself. We could disallow changes in ID separately,
then essentially ignore attributes within .... Would
that seem reasonable?

As for allowing title changes and string<->select1 changes, that's no
problem.

Thx,

Chris

On Wednesday, May 16, 2012, Mitch S wrote:

Hi Chris,

(1) What do you mean by "changes in any other attribute, bound or
otherwise, not explicitly allowed." I think you can relax this and avoid
adding more complexity in FormParserForJavarosa.

Is this done in an abundance of caution, or do you have specific examples
in mind where this is necessary? As long as odk:length is not allowed to
change, we should be OK to allow other attribute changes. Attributes are
generally ignored -- ODK Aggregate only examines a few attributes on the
top-level tag in a submission, so allowing changes to attributes in the
instance definition, body or bind statements should be OK. Were you
thinking of specific examples where this might be a concern?

(2) We can also allow changes between type="string" and type="select1"
(DATATYPE_CHOICE) (either from string to select1 or from select1 to
string). Note that type="select" (DATATYPE_CHOICE_LIST) cannot be
similarly exchanged as that is a multiple-select field.

(3) We can also allow the title to change. If it does, the update would
be applied in the FormInfo table.

If you can make the above revisions, it is probably easiest to send me a
zip directly ( mitchellsundt@gmail.com ) of your changed files and test
forms.

I don't think it is feasible for you complete the changes needed to
update the various data structures yourself. I think there is some missing
code and perhaps some assumptions later on in the processing that will need
to be examined and adjusted by me or Waylon.

For other future changes, if you were able to produce a full working
patch, the mechanism would be to create your own server-side clone off of
ODK Aggregate, submit your files to that, then share that clone with Waylon
and me. We'd then fold those changes into the main tree, or grant you
write access to the main tree and allow you to merge your changes back into
it.

Mitch

On Wed, May 16, 2012 at 8:43 AM, Christopher Robert < chrislrobert@gmail.com> wrote:

Hi Mitch, Waylon,

Okay, so I've implemented and tested a maximally-conservative comparison
function. Accordingly, I updated Form.isSameForm(...) to
call FormParserForJavaRosa.compareXml(this.getFormXml(cc).getBytes("UTF-8"),xmlBytes)
and, based on the result, call xform.setValueFromByteArray(...) to
update the form's XML. Basically, my comparison function indicates whether
the XML is entirely identical and, if not, whether changes are too major to
allow an update. The following bound attribute changes are explicitly
allowed on existing fields only:

relevant
constraint
readonly
required
calculate
appearance
jr:constraintmessage
jr:preload
jr:preloadparams

My impression is that these attributes only affect rendering in Collect.
Also, any and all changes are allowed in the body portion of the XML.

The following changes are explicitly disallowed:

changes in the id
changes in the title
changes in the public key
changes in fields or grouping of fields (i.e., any change to the instance
structure)

The following changes are implicitly disallowed:

changes in type
changes in odk:length
changes in any other attribute, bound or otherwise, not explicitly allowed

This latter set of properties -- the implicit disallowing of most changes
-- was somewhat difficult to implement. This is because bound attributes
are not stored in TreeElements' attribute lists, as one might expect. In
fact, they are not stored at all. Rather, parsed bindings lead JavaRosa to
set TreeElement.required, TreeElement.relevant, etc. I could have easily
walked through the parsed TreeElements and compared such things, but then
when new bindings/features were added the code would default to accepting
changes to new attributes rather than default to rejecting them. This would
not have been desirable. Thus, I performed some very minor surgery
on FormParserForJavaRosa to have it record all raw bindings as they come
through parseBind(...). Then, in addition to walking through TreeElements
to discover structural instance changes and changes to directly-specified
attributes (like id), I also walk through all of the raw bindings. Only
changes to attributes listed in FormParserForJavaRosa.NonSchemaAttributes
are allowed; the whole approach errs on the side of over-rejecting rather
than over-accepting.

I now have a few questions:

  1. To carry this through, I need xform.setValueFromByteArray(...) (or
    some cousin) to update the actual XML when I have decided that such an
    update would be okay. Any ideas?

  2. In total I have proposed updates to 4 source files (though the changes
    to Form.java are still highly provisional), plus 13 test files (a base
    case, an all-allowable-updates case, and then 11 cases that deserve
    rejection). How should I communicate these? (I'm new to Mercurial and the
    whole open-source thing more generally.)

Please just let me know how best to move forward.

Thanks,

Chris

On Friday, May 11, 2012, Mitch S wrote:

Yes, basically, within Form, you need to replace the comparison of the
md5 hash of the already-present form definition file and the new file here:

http://code.google.com/p/opendatakit/source/browse/src/main/java/org/opendatakit/aggregate/form/Form.java?repo=aggregate#646

With a more embellished comparison of the two javarosa-parsed form
definition files.

The incoming form definition file is in xmlBytes
The existing form definition file is in byte Form.getFormXml(cc)

Javarosa is now capable of having two active form models, so you should
be able to parse the two files and stepwise compare the element trees.

The FormParserForJavarosa.constructDataModel() steps through t

--
Mitch Sundt
Software Engineer
University of Washington
mitchellsundt@gmail.com

To make sure we are all on the same page. I just chatted with Mitch
about this in person and we both agreed that it is better to only
allow changes if the attribute is specifically listed as okay to
change. Basically in the bind stick to disallowing changes in non
approved attributes.

This is in response to
"It seemed that Waylon was suggesting that new bind attributes were
coming and that the code should default to disallowing changes in
unknown bind attributes -- just to be safe"

··· On Wed, May 16, 2012 at 11:51 AM, Mitch S wrote: > OK. This sounds reasonable. Keep it for but don't worry about it > elsewhere. Hopefully that is fairly self-contained. > > Changes to the form id (or version) would flag the form as entirely > different, so no worries there. > > Mitch > > > On Wed, May 16, 2012 at 11:03 AM, Christopher Robert wrote: >> >> Hi Mitch, >> >> Thanks for the quick reply. >> >> It seemed that Waylon was suggesting that new bind attributes were coming >> and that the code should default to disallowing changes in unknown bind >> attributes -- just to be safe. One option would be to leave it this way for >> the attributes on tags, but loosen it up on attributes in the >> instance definition itself. We could disallow changes in ID separately, then >> essentially ignore attributes within .... Would that >> seem reasonable? >> >> As for allowing title changes and string<->select1 changes, that's no >> problem. >> >> Thx, >> >> Chris >> >> >> On Wednesday, May 16, 2012, Mitch S wrote: >>> >>> Hi Chris, >>> >>> (1) What do you mean by "changes in any other attribute, bound or >>> otherwise, not explicitly allowed." I think you can relax this and avoid >>> adding more complexity in FormParserForJavarosa. >>> >>> Is this done in an abundance of caution, or do you have specific examples >>> in mind where this is necessary? As long as odk:length is not allowed to >>> change, we should be OK to allow other attribute changes. Attributes are >>> generally ignored -- ODK Aggregate only examines a few attributes on the >>> top-level tag in a submission, so allowing changes to attributes in the >>> instance definition, body or bind statements should be OK. Were you thinking >>> of specific examples where this might be a concern? >>> >>> (2) We can also allow changes between type="string" and type="select1" >>> (DATATYPE_CHOICE) (either from string to select1 or from select1 to >>> string). Note that type="select" (DATATYPE_CHOICE_LIST) cannot be similarly >>> exchanged as that is a multiple-select field. >>> >>> (3) We can also allow the title to change. If it does, the update would >>> be applied in the FormInfo table. >>> >>> If you can make the above revisions, it is probably easiest to send me a >>> zip directly ( mitchellsundt@gmail.com ) of your changed files and test >>> forms. >>> >>> I don't think it is feasible for you complete the changes needed to >>> update the various data structures yourself. I think there is some missing >>> code and perhaps some assumptions later on in the processing that will need >>> to be examined and adjusted by me or Waylon. >>> >>> For other future changes, if you were able to produce a full working >>> patch, the mechanism would be to create your own server-side clone off of >>> ODK Aggregate, submit your files to that, then share that clone with Waylon >>> and me. We'd then fold those changes into the main tree, or grant you write >>> access to the main tree and allow you to merge your changes back into it. >>> >>> Mitch >>> >>> On Wed, May 16, 2012 at 8:43 AM, Christopher Robert wrote: >>> >>> Hi Mitch, Waylon, >>> >>> Okay, so I've implemented and tested a maximally-conservative comparison >>> function. Accordingly, I updated Form.isSameForm(...) to >>> call FormParserForJavaRosa.compareXml(this.getFormXml(cc).getBytes("UTF-8"),xmlBytes) >>> and, based on the result, call xform.setValueFromByteArray(...) to update >>> the form's XML. Basically, my comparison function indicates whether the XML >>> is entirely identical and, if not, whether changes are too major to allow an >>> update. The following bound attribute changes are explicitly allowed on >>> existing fields only: >>> >>> relevant >>> constraint >>> readonly >>> required >>> calculate >>> appearance >>> jr:constraintmessage >>> jr:preload >>> jr:preloadparams >>> >>> My impression is that these attributes only affect rendering in Collect. >>> Also, any and all changes are allowed in the body portion of the XML. >>> >>> The following changes are explicitly disallowed: >>> >>> changes in the id >>> changes in the title >>> changes in the public key >>> changes in fields or grouping of fields (i.e., any change to the instance >>> structure) >>> >>> >>> The following changes are implicitly disallowed: >>> >>> changes in type >>> changes in odk:length >>> changes in any other attribute, bound or otherwise, not explicitly >>> allowed >>> >>> >>> This latter set of properties -- the implicit disallowing of most changes >>> -- was somewhat difficult to implement. This is because bound attributes are >>> not stored in TreeElements' attribute lists, as one might expect. In fact, >>> they are not stored at all. Rather, parsed bindings lead JavaRosa to set >>> TreeElement.required, TreeElement.relevant, etc. I could have easily walked >>> through the parsed TreeElements and compared such things, but then when new >>> bindings/features were added the code would default to accepting changes to >>> new attributes rather than default to rejecting them. This would not have >>> been desirable. Thus, I performed some very minor surgery >>> on FormParserForJavaRosa to have it record all raw bindings as they come >>> through parseBind(...). Then, in addition to walking through TreeElements to >>> discover structural instance changes and changes to directly-specified >>> attributes (like id), I also walk through all of the raw bindings. Only >>> changes to attributes listed in FormParserForJavaRosa.NonSchemaAttributes >>> are allowed; the whole approach errs on the side of over-rejecting rather >>> than over-accepting. >>> >>> I now have a few questions: >>> >>> 1. To carry this through, I need xform.setValueFromByteArray(...) (or >>> some cousin) to update the actual XML when I have decided that such an >>> update would be okay. Any ideas? >>> >>> 2. In total I have proposed updates to 4 source files (though the changes >>> to Form.java are still highly provisional), plus 13 test files (a base case, >>> an all-allowable-updates case, and then 11 cases that deserve rejection). >>> How should I communicate these? (I'm new to Mercurial and the whole >>> open-source thing more generally.) >>> >>> Please just let me know how best to move forward. >>> >>> Thanks, >>> >>> Chris >>> >>> On Friday, May 11, 2012, Mitch S wrote: >>> >>> Yes, basically, within Form, you need to replace the comparison of the >>> md5 hash of the already-present form definition file and the new file here: >>> >>> >>> http://code.google.com/p/opendatakit/source/browse/src/main/java/org/opendatakit/aggregate/form/Form.java?repo=aggregate#646 >>> >>> With a more embellished comparison of the two javarosa-parsed form >>> definition files. >>> >>> The incoming form definition file is in xmlBytes[] >>> The existing form definition file is in byte[] Form.getFormXml(cc) >>> >>> Javarosa is now capable of having two active form models, so you should >>> be able to parse the two files and stepwise compare the element trees. >>> >>> The FormParserForJavarosa.constructDataModel() steps through t > > > > > -- > Mitch Sundt > Software Engineer > University of Washington > mitchellsundt@gmail.com

Sounds good. So I made the following changes:

  1. Allow form title to change; at present this causes errors down the line,
    but those can be corrected in the form-updating code.

  2. Allow changes between type string and type select1; this is just a
    hard-wired special-case. No other type changes are allowed.

  3. Allow unknown attributes in the core instance definition to change
    arbitrarily; unknown attributes in s are still disallowed by default.
    In the end, to take advantage of the code I'd already put in and tested, I
    went with the following approach. There is a list of
    "NonchangeableInstanceAttributes" that governs which core-instance
    attributes are explicitly not allowed to change; by default, this includes
    only the "id" attribute, so id attributes can't be changed but all other
    attributes are treated as changeable by default. There is also a
    "ChangeableBindAttributes" list that governs which bind attributes are
    explicitly allowed to change; by default, this includes attributes like
    relevant, constraint, etc. Any changes in unlisted bind attributes will be
    flagged as a major change and a form-update will be disallowed. Overall,
    this sounds like the right balance and the right way to make the code
    easily maintainable.

Mitch, I'll send you the zipped files now.

Thanks again to you both,

Chris

··· On Wednesday, May 16, 2012, W. Brunette wrote:

To make sure we are all on the same page. I just chatted with Mitch
about this in person and we both agreed that it is better to only
allow changes if the attribute is specifically listed as okay to
change. Basically in the bind stick to disallowing changes in non
approved attributes.

This is in response to
"It seemed that Waylon was suggesting that new bind attributes were
coming and that the code should default to disallowing changes in
unknown bind attributes -- just to be safe"

On Wed, May 16, 2012 at 11:51 AM, Mitch S mitchellsundt@gmail.com wrote:

OK. This sounds reasonable. Keep it for but don't worry about it
elsewhere. Hopefully that is fairly self-contained.

Changes to the form id (or version) would flag the form as entirely
different, so no worries there.

Mitch

On Wed, May 16, 2012 at 11:03 AM, Christopher Robert chrislrobert@gmail.com wrote:

Hi Mitch,

Thanks for the quick reply.

It seemed that Waylon was suggesting that new bind attributes were
coming
and that the code should default to disallowing changes in unknown bind
attributes -- just to be safe. One option would be to leave it this way
for
the attributes on tags, but loosen it up on attributes in the
instance definition itself. We could disallow changes in ID separately,
then
essentially ignore attributes within .... Would
that
seem reasonable?

As for allowing title changes and string<->select1 changes, that's no
problem.

Thx,

Chris

On Wednesday, May 16, 2012, Mitch S wrote:

Hi Chris,

(1) What do you mean by "changes in any other attribute, bound or
otherwise, not explicitly allowed." I think you can relax this and
avoid
adding more complexity in FormParserForJavarosa.

Is this done in an abundance of caution, or do you have specific
examples
in mind where this is necessary? As long as odk:length is not allowed
to
change, we should be OK to allow other attribute changes. Attributes
are
generally ignored -- ODK Aggregate only examines a few attributes on
the
top-level tag in a submission, so allowing changes to attributes in the
instance definition, body or bind statements should be OK. Were you
thinking
of specific examples where this might be a concern?

(2) We can also allow changes between type="string" and type="select1"
(DATATYPE_CHOICE) (either from string to select1 or from select1 to
string). Note that type="select" (DATATYPE_CHOICE_LIST) cannot be
similarly
exchanged as that is a multiple-select field.

(3) We can also allow the title to change. If it does, the update would
be applied in the FormInfo table.

If you can make the above revisions, it is probably easiest to send me
a
zip directly ( mitchellsundt@gmail.com ) of your changed files and
test
forms.

I don't think it is feasible for you complete the changes needed to
update the various data structures yourself. I think there is some
missing
code and perhaps some assumptions later on in the processing that will
need
to be examined and adjusted by me or Waylon.

For other future changes, if you were able to produce a full working
patch, the mechanism would be to create your own server-side clone off
of
ODK Aggregate, submit your files to that, then share that clone with
Waylon
and me. We'd then fold those changes into the main tree, or grant you
write
access to the main tree and allow you to merge your changes back into
it.

Mitch

On Wed, May 16, 2012 at 8:43 AM, Christopher Robert chrislrobert@gmail.com wrote:

Hi Mitch, Waylon,

Okay, so I've implemented and tested a maximally-conservative
comparison
function. Accordingly, I updated Form.isSameForm(...) to

call FormParserForJavaRosa.compareXml(this.getFormXml(cc).getBytes("UTF-8"),xmlBytes)

and, based on the result, call xform.setValueFr