Aggregate: retrying failed submissions creates duplicate records

Hi everyone,

Greetings! This is my first time posting to the list, so my apologies if I
make a mistake or have missed something obvious.

I am trying to diagnose a problem where re-submission of failed forms
creates duplicates. We have seen this a lot in production, due to
intermittent network connectivity interrupting submissions.

Our configuration:

  • ODK Aggregate 1.4.7
  • Production on Oracle Java 1.7.0_80-b15
  • MySQL 5.5.44
  • ODK Collect 1.4.5 (1048)

The scenario:

  • Forms are submitted from Collect (using InstanceUploaderActivity)
  • The submission is processed successfully by Aggregate
  • Collect times out trying to read the response, leaving the finalized
    forms intact
  • The failed forms are immediately resubmitted (never leaving the
    InstanceUploaderActivity)

What we expected to happen:

  • The second submission would be successful, but would not create
    another record in MySQL

The behavior we are observing:

  • Both submissions create records with the same information (differing
    by _URI)
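
To make the expected and observed behavior concrete, here is a toy model of a
submission store. It is only a sketch and not Aggregate's code: the
SubmissionStore class, its submit method, and the sample payload are
hypothetical, and the fallback to a random key is an assumption about what
happens when no instanceID is available to the server.

```python
import uuid


class SubmissionStore:
    """Toy stand-in for a server-side table keyed by a record URI."""

    def __init__(self):
        self.records = {}

    def submit(self, payload, instance_id=None):
        # Assumption: a known instanceID is used as the record key;
        # otherwise a fresh random key is generated for every request.
        key = instance_id or "uuid:" + str(uuid.uuid4())
        if key in self.records:
            return key, False  # retry of a stored submission: no new record
        self.records[key] = payload
        return key, True


iid = "uuid:acd4e53d-cdec-45fb-bcb9-68c041654026"

# Expected: the retry carries the same instanceID, so nothing new is stored.
expected = SubmissionStore()
expected.submit({"name": "site A"}, instance_id=iid)
expected.submit({"name": "site A"}, instance_id=iid)
print(len(expected.records))  # 1

# Observed: each request gets its own key, so the retry becomes a duplicate.
observed = SubmissionStore()
observed.submit({"name": "site A"})
observed.submit({"name": "site A"})
print(len(observed.records))  # 2
```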

After searching this group's archives for a solution, I found these:

Duplicate Form Submissions (network latency?)
https://groups.google.com/d/msg/opendatakit/2RPM_lN1T6k/Iwhw1SlEq9sJ

Generating a Unique ID for a Form Submission
https://groups.google.com/d/msg/opendatakit/93v4lCzf4YI/qGcfIt2SSs4J

I checked that the form definition includes an instanceID, and it does:

<h:html xmlns="http://www.w3.org/2002/xforms"
xmlns:h="http://www.w3.org/1999/xhtml"
xmlns:jr="http://openrosa.org/javarosa">
<h:head>
<h:title>Location</h:title>



<jr:meta>
<jr:instanceID/>
</jr:meta>
....


...

...

I also confirmed that Collect is generating the appropriate instanceIDs by
inspecting the instances on disk:

<n0:instanceID>uuid:acd4e53d-cdec-45fb-bcb9-68c041654026</n0:instanceID>
</n0:meta>
....

Lastly, I found that the duplicate records in LOCATION_CORE have a
META_INSTANCE_ID value matching the instanceID above:
"uuid:acd4e53d-cdec-45fb-bcb9-68c041654026".
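
For anyone wanting to check their own data, a query along these lines should
list the instanceIDs that were stored more than once. This is a sketch under
assumptions: an in-memory SQLite table stands in for the MySQL one, only the
two columns mentioned in this thread are modelled, and the _URI values and
the second instanceID are made up for the demonstration.

```python
import sqlite3

# Assumption: SQLite replaces MySQL here so the example runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LOCATION_CORE (_URI TEXT PRIMARY KEY, META_INSTANCE_ID TEXT)"
)
rows = [
    # Two records with different (hypothetical) _URI values, same instanceID.
    ("uuid:00000000-0000-0000-0000-000000000001",
     "uuid:acd4e53d-cdec-45fb-bcb9-68c041654026"),
    ("uuid:00000000-0000-0000-0000-000000000002",
     "uuid:acd4e53d-cdec-45fb-bcb9-68c041654026"),
    # One unrelated (hypothetical) submission that was stored only once.
    ("uuid:00000000-0000-0000-0000-000000000003",
     "uuid:11111111-2222-3333-4444-555555555555"),
]
conn.executemany("INSERT INTO LOCATION_CORE VALUES (?, ?)", rows)

DUPLICATES_SQL = """
SELECT META_INSTANCE_ID, COUNT(*) AS copies
FROM LOCATION_CORE
GROUP BY META_INSTANCE_ID
HAVING COUNT(*) > 1
"""


def find_duplicates(connection):
    """Return (instanceID, copies) for every instanceID stored more than once."""
    return connection.execute(DUPLICATES_SQL).fetchall()


print(find_duplicates(conn))
```

Only the instanceID quoted above is reported, with a count of 2.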

By attaching a debugger to Aggregate, I discovered that the query used by
SubmissionParser.java:383 is actually trying to find the instance using the
_URI value. This _URI value is distinct for each submission and does not
equal the instanceID, so the lookup never finds the earlier record.

Is this expected?

Any help would be greatly appreciated. I'm happy to provide more details if
it helps.

Cheers,
Brent Atkinson

The jr namespace is incorrect. It should be http://openrosa.org/xforms

The spec is here:
https://bitbucket.org/javarosa/javarosa/wiki/OpenRosaMetaDataSchema

Because you used an unrecognized namespace, ODK Aggregate did not extract
and use the jr:instanceID as the _URI, causing the duplicates.
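
A quick way to check a form for this problem is to parse it and look at which
namespace the instanceID element actually resolves to. The following is a
minimal sketch, not an ODK tool: the instance_id_namespaces helper and the
SAMPLE form (including the model, instance, and Location wrapper elements) are
hypothetical, written only to exercise the check.

```python
import xml.etree.ElementTree as ET

OPENROSA_XFORMS_NS = "http://openrosa.org/xforms"


def instance_id_namespaces(form_xml):
    """Return the set of namespace URIs of every instanceID element found."""
    root = ET.fromstring(form_xml)
    found = set()
    for elem in root.iter():
        # ElementTree spells qualified tags as "{namespace-uri}localname".
        if elem.tag.startswith("{"):
            uri, local = elem.tag[1:].split("}", 1)
        else:
            uri, local = "", elem.tag
        if local == "instanceID":
            found.add(uri)
    return found


# Hypothetical form skeleton; {ns} is filled in with the jr namespace to test.
SAMPLE = """<h:html xmlns="http://www.w3.org/2002/xforms"
        xmlns:h="http://www.w3.org/1999/xhtml"
        xmlns:jr="{ns}">
  <h:head>
    <h:title>Location</h:title>
    <model>
      <instance>
        <Location id="location">
          <jr:meta>
            <jr:instanceID/>
          </jr:meta>
        </Location>
      </instance>
    </model>
  </h:head>
</h:html>"""

for ns in ("http://openrosa.org/javarosa", OPENROSA_XFORMS_NS):
    found = instance_id_namespaces(SAMPLE.format(ns=ns))
    verdict = "ok" if found == {OPENROSA_XFORMS_NS} else "unexpected namespace"
    print(ns, "->", verdict)
```

The first namespace (the one from the original form) is flagged, and the
corrected one passes.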

--

Post: opendatakit@googlegroups.com
Unsubscribe: opendatakit+unsubscribe@googlegroups.com
Options: http://groups.google.com/group/opendatakit?hl=en


You received this message because you are subscribed to the Google Groups
"ODK Community" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to opendatakit+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
Mitch Sundt
Software Engineer
University of Washington
mitchellsundt@gmail.com

Hi Mitch,

I'll give that a try. Thank you for your help!

Brent
