Duplicate Form Submissions (network latency?)

Hi Everyone,
Well, we've finally gone LIVE for our survey (3000 sample size) with 20
Android phones in Tanzania. The data collectors in the field are submitting
good data, but alas form submissions sometimes choke due to bad/slow
network (allegedly 3G). :slight_smile:
Anyhow, I've trained the team to continue doing surveys AND submit once the
network in their area is a little better -- maybe even in the evening. :slight_smile:

Just curious about duplicate form submissions though...
I'm always in Aggregate (web browser interface) and I'll see the EXACT same
form entry: same pic, same GPS coordinates & same data for every other field,
including end time. On the odd occasion {just a few}, though, the END TIME is
different -- off by a few hours for one particular submission. I simply
DELETE the duplicate and all is well.

Any ideas why this happens though? Is it simply the network? Oh, we're
using the latest version of KoboCollect. Should the app recognize that a
particular form HAS already been submitted? Like I said, the MAJORITY of
submissions are good...just a few duplicates.

A quick note:

  • pics are supposed to be 320X240px but I suspect 1 or 2 data collectors
    were playing with the devices and left the camera setting at 1600X1200px
    (which made the pic 587KB as opposed to 60KB). But then I called them ASAP
    & told them to change the resolution. :slight_smile:
  • I tell the guys "If you can't get to Google in the web browser, then the
    mobile network in your area is most likely the problem so don't worry about
    it."
  • I also tell the guys to submit 1 or 2 forms at a time and NOT to "Toggle
    all" followed by "Send Selected." :slight_smile:

Our Setup:
a) Using a paid AppEngine {$4 daily budget for an F1 setup -- just in case
:slight_smile: }

b) Recent AppEngine Dashboard view

Errors (last 17 hrs)
URI                  Count / % Errors
/submission          42327%
/formList            9100%
/view/binaryData     40.3%
/local_login.html    350%
/Aggregate.html      112%
/crossdomain.xml     1

Instances
Number of Instances  Average QPS  Average Latency  Average Memory
3 total              0.000        Unknown ms       92.6 MBytes

Billing Status: Charge Issued: $3.00 (Daily budget: $4.00)
Quotas reset every 24 hours. Next reset: 7 hrs

Resource                      Usage                 Billable  Price                   Cost
Frontend Instance Hours       20.11 Instance Hours  0.00      $0.08 / Hour            $0.00
Backend Instance Hours        13.37 Instance Hours  4.37      $0.08 / Hour            $0.35
Datastore Stored Data         0.32 GBytes           0.00      $0.008 / GByte-day      $0.00
Logs Stored Data              0.07 GBytes           0.00      $0.008 / GByte-day      $0.00
Task Queue Stored Task Bytes  0.00 GBytes           0.00      $0.008 / GByte-day      $0.00
Blobstore Stored Data         0.00 GBytes           0.00      $0.0043 / GByte-day     $0.00
Datastore Write Operations    0.16 Million Ops      0.11      $1.00 / Million Ops     $0.11
Datastore Read Operations     0.38 Million Ops      0.33      $0.70 / Million Ops     $0.23
Datastore Small Operations    0.00 Million Ops      0.00      $0.10 / Million Ops     $0.00
Outgoing Bandwidth            0.03 GBytes           0.00      $0.12 / GByte           $0.00
Recipients Emailed            0 of 100 (0%)         0         $0.01 / 100 Recipients  $0.00
Stanzas Sent                  0                     0         $0.10 / 100K Stanzas    $0.00
Channels Created              0 of 95,040 (0%)      0         $0.01 / 100 Opens       $0.00

Estimated cost for the last 17 hours: $0.68 / $4.00 (17%)

Ok, gotta jet from this hotel! Bus leaves Mwanza by Lake Victoria at
4:30am...heading back to Dar. :slight_smile:

Cheers.

~DataMax

Oops, here's a peek of my AppEngine dashboard since the copy & paste didn't
show everything.
I'm outta here...will peek replies on my celly while in transit. :slight_smile:

Cheers!

The critical question is whether you defined an instanceId in your form.

http://groups.google.com/group/opendatakit/browse_thread/thread/f77bf8942cdfe182/12e7e33119a83172?lnk=gst&q=instanceId#12e7e33119a83172

If you did not define an instanceId, ODK Aggregate cannot de-duplicate your
data.
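
To make the de-duplication point concrete, here is a minimal Python sketch of a server that keys stored submissions on their instanceID. It is only an illustration of the idea, not ODK Aggregate's actual code: the `SubmissionStore` class, its `receive` method and the sample field values are all made up for this example. The assumption is that a submission arriving without an instanceID gets a fresh server-side id each time, so a resend cannot be matched to the original.

```python
import uuid


class SubmissionStore:
    """Hypothetical in-memory store; NOT ODK Aggregate's implementation."""

    def __init__(self):
        self.records = {}  # instance id -> submitted field values

    def receive(self, fields, instance_id=None):
        """Store a submission and return the id it was stored under."""
        if instance_id is None:
            # No instanceID in the form: the server has to invent one,
            # so a resend of the same data looks like a brand-new record.
            instance_id = "uuid:" + str(uuid.uuid4())
        # With a client-supplied instanceID, a resend lands on the same key
        # and replaces the identical record instead of adding a copy.
        self.records[instance_id] = fields
        return instance_id


# Made-up sample data for illustration.
survey = {"start": "2012-05-30T11:45:44", "end": "2012-05-30T11:49:00"}

without_id = SubmissionStore()
for _ in range(3):  # the phone resends the same form three times
    without_id.receive(survey)

with_id = SubmissionStore()
form_id = "uuid:" + str(uuid.uuid4())  # generated once, when the form is filled in
for _ in range(3):
    with_id.receive(survey, instance_id=form_id)

print(len(without_id.records), len(with_id.records))  # prints "3 1"
```

The only point of the sketch is that the id has to be created once on the phone and travel with every resend; an id assigned on arrival cannot tell a resend from a new form.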

I just noticed that the Opendatakit.org form design pages don't mention
this important aspect of form design. I'll update them next week.

Mitch

--
Post: opendatakit@googlegroups.com
Unsubscribe: opendatakit+unsubscribe@googlegroups.com
Options: http://groups.google.com/group/opendatakit?hl=en

--
Mitch Sundt
Software Engineer
University of Washington
mitchellsundt@gmail.com

Hi Mitch! Thanks for your reply.
Well, after looking at the form design, no instanceid field is
defined...only these 3 autofilled fields:

Here are the accompanying bind details for the above 3:


<bind id="end" nodeset="/EAsurvey/end" type="xsd:dateTime"
jr:preload="timestamp" jr:preloadParams="end"/>

I took a peek at the link you provided
(https://groups.google.com/forum/?fromgroups#!topic/opendatakit/93v4lCzf4YI).
I'm still wondering...if ODK Collect (or KoboCollect in our case) has
*successfully* submitted a form AND its status has changed from FINISHED to
SUBMITTED, then why would it even re-submit said instance -- regardless of
whether an instanceid field is defined or not? Just curious.

SOLUTION?
You stated the following here
(https://groups.google.com/forum/?fromgroups#!topic/opendatakit/93v4lCzf4YI):
For ODK Aggregate, you don't need to specify the namespace, just have a
group in your form.
With ODK Collect 1.1.7 and later, the bind for this element that replicates
the instanceID that would otherwise be generated by Aggregate would be:

You can construct your own instanceID expressions. However, you should
avoid symbols and punctuation other than colons and dashes since the
parsing logic within Aggregate is likely fragile if you go wild with
punctuation (and that is used later on when retrieving images, repeat
groups, etc.).

So, you're saying that I should edit the xml file and insert the following
code in the appropriate section?


This will prevent submitting duplicates in my case? Will the concat('uuid:',
uuid()) formula be enough for my purposes or do I need to add something
extra? Like I said, it seems to only happen with a bad network.

Thanks again!

~DataMax

Here's a triplicate submission. :slight_smile: I deleted 2 of them...all same data.
Wed May 30 12:06:38 UTC 2012   Wed May 30 12:13:53 UTC 2012
Wed May 30 12:06:38 UTC 2012   Wed May 30 12:13:53 UTC 2012
Wed May 30 12:06:38 UTC 2012   Wed May 30 12:13:53 UTC 2012
Same date and time! It's gotta be the network while the data collectors are
probably "toggling" and pressing the "Submit Selected Forms" button. I'll
check!

And this one had 4 duplicate submissions! Same date & time! At least I can
spot them right away since the pic is the same...just deleted 3 of them! :slight_smile:

Wed May 30 11:45:44 UTC 2012   Wed May 30 11:49:00 UTC 2012
Wed May 30 11:45:44 UTC 2012   Wed May 30 11:49:00 UTC 2012
Wed May 30 11:45:44 UTC 2012   Wed May 30 11:49:00 UTC 2012
Wed May 30 11:45:44 UTC 2012   Wed May 30 11:49:00 UTC 2012
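
One way to flag these without eyeballing the pics is to group the exported rows on their start and end timestamps. The Python sketch below assumes the data has been exported into rows with `start` and `end` columns; those column names, the `row` numbers and the `find_duplicate_groups` helper are placeholders for illustration, not anything Aggregate provides. Two different surveys could in principle share timestamps to the second, so matches are candidates to review rather than certain duplicates.

```python
from collections import defaultdict

# Sample rows modelled on the submissions listed above.
# The "row" numbers and column names are placeholders for illustration.
rows = [
    {"row": 1, "start": "Wed May 30 11:45:44 UTC 2012", "end": "Wed May 30 11:49:00 UTC 2012"},
    {"row": 2, "start": "Wed May 30 11:45:44 UTC 2012", "end": "Wed May 30 11:49:00 UTC 2012"},
    {"row": 3, "start": "Wed May 30 11:45:44 UTC 2012", "end": "Wed May 30 11:49:00 UTC 2012"},
    {"row": 4, "start": "Wed May 30 11:45:44 UTC 2012", "end": "Wed May 30 11:49:00 UTC 2012"},
    {"row": 5, "start": "Wed May 30 12:06:38 UTC 2012", "end": "Wed May 30 12:13:53 UTC 2012"},
]


def find_duplicate_groups(rows, key_fields=("start", "end")):
    """Return lists of rows that share the same values for key_fields."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[f] for f in key_fields)].append(row)
    return [group for group in groups.values() if len(group) > 1]


for group in find_duplicate_groups(rows):
    keep, extras = group[0], group[1:]
    print("keep row", keep["row"], "- review rows", [r["row"] for r in extras])
```

Adding more columns to `key_fields` (GPS coordinates, for instance) narrows the match further before anything gets deleted.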

It would be interesting to see if you can add the uuid section. That should
give you a clue where the problem is happening.

There are two situations where you might see multiple records:

(1) long network delays during submissions. If the phone times out the
socket connection to the server, or if the network connection drops (e.g.,
you leave a wifi hotspot) after the submission is sent to the server but
before the phone can receive the server's "the filled-in form data has been
successfully recorded" response, then the phone will leave the submission
as finalized but not-yet-sent, and the user will then see it and attempt to
upload it again, causing a duplicate if the request actually ran to
completion on the server.

This is the most likely case. This can also happen if you click cancel
during the transmission. Depending upon where that is handled, it may not
record the outcome of the last transmission before aborting.

(2) media attachments -- if you have over 10MB of media attachments or over
100 media files in a submission, the submission will be split across
multiple uploads. You would end up with multiple submissions, but each
would only contain a portion of the captured media attachments.
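
Case (2) is easy to rule in or out with a little arithmetic. The Python sketch below groups attachment sizes into uploads under the two limits given above (10MB and 100 files). It is a simple greedy grouping written for illustration, not the splitting logic ODK Collect actually uses, and `split_into_uploads` is a made-up name. The photo sizes are the 60KB and 587KB figures mentioned earlier in the thread.

```python
MAX_BYTES = 10 * 1024 * 1024  # the 10MB limit mentioned above
MAX_FILES = 100               # the 100-file limit mentioned above


def split_into_uploads(file_sizes):
    """Greedily group attachment sizes (in bytes) into uploads under both limits.

    Illustration only; ODK Collect's real splitting may differ.
    """
    uploads, current, current_bytes = [], [], 0
    for size in file_sizes:
        too_big = current_bytes + size > MAX_BYTES
        too_many = len(current) >= MAX_FILES
        if current and (too_big or too_many):
            uploads.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        uploads.append(current)
    return uploads


one_small_photo = [60 * 1024]           # one 320x240 pic of about 60KB
many_large_photos = [587 * 1024] * 20   # twenty 1600x1200 pics of about 587KB

print(len(split_into_uploads(one_small_photo)))    # prints 1
print(len(split_into_uploads(many_large_photos)))  # prints 2
```

A form with a single photo, even at the larger resolution, stays far below both limits, so this case would not explain duplicates that carry identical data and the same pic.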

Mitch

This can also happen if you click cancel during the transmission.
Depending upon where that is handled, it may not record the outcome of the
last transmission before aborting.

Oh damn!! I noticed that, sometimes, after pressing SEND SELECTED, Kobo
would keep 'spinning away' for a few minutes AND then if I press CANCEL,
the process stops BUT the queue is EMPTY so I assumed all was well because
the form's status would ALWAYS change to SUBMITTED. I thought I found a
little trick to get out of those little situations when the app would be
caught up in a loop trying to submit a form. So you're saying that even
though it shows up as SUBMITTED that it could be sent (submitted) again?
Arghhh! :slight_smile:

And I "taught" that trick to the 20 data collectors out in the field. But
like I said, usually the network is fine & Kobo is able to submit with that
"SUCCESS" message at the end.

Okay, then what do we do WHEN Kobo goes into a spinning loop trying to
submit a form?? Will it eventually stop? Do we wait for it to stop? To a
typical end user, "something's wrong!" and the urge to press CANCEL is
great. I'm talking of that thing spinning around for minutes -- probably
using up precious battery cycles. Perhaps ODK Collect (and KoboCollect) can
have a max timeout setting so that we don't have to hit CANCEL?

Should I tell the guys in the field to stop hitting CANCEL after it takes a
long time to submit? Like I said, the form does have its status changed
from FINISHED to SUBMITTED so I always thought all was okay.

Thanks for your reply, Mitch!

ODK Collect should not report anything as Sent (and empty the queue) unless
it has received confirmation that the server has accepted the submission.
If you ever notice any such data loss, let us know.

The issue I raised was that the cancel in ODK Collect may have sent data to
the server and cancelled without recording a successful response from the
server, leaving the submission in ODK Collect still flagged as finalized
and not-yet-sent -- and therefore showing when the user tries to Send data
to the server later on. I.e., we only mark the submission as sent after we
receive a successful response from the server, and never at the initiation
of the transmission. Cancelling early can therefore lead to
double-submissions of the data to the server.
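
The sequence can be sketched in a few lines of Python. This is a toy model under stated assumptions, not ODK Collect's code: the `Server` and `Phone` classes, the `cancel_before_reply` flag and the form name "survey-17" are all hypothetical. The model only captures the rule described above, that a form is marked sent after a successful reply is recorded and never when transmission starts.

```python
class Server:
    """Hypothetical server that stores every request it receives."""

    def __init__(self):
        self.stored = []

    def submit(self, form_name):
        self.stored.append(form_name)
        return "OK"


class Phone:
    """Hypothetical client: a form is marked sent only after the reply is recorded."""

    def __init__(self, server):
        self.server = server
        self.status = {}  # form name -> "finalized" or "sent"

    def finalize(self, form_name):
        self.status[form_name] = "finalized"

    def send(self, form_name, cancel_before_reply=False):
        reply = self.server.submit(form_name)  # the data reaches the server
        if cancel_before_reply:
            return  # cancelled or timed out: the reply is never recorded
        if reply == "OK":
            self.status[form_name] = "sent"

    def unsent(self):
        return [name for name, state in self.status.items() if state == "finalized"]


server = Server()
phone = Phone(server)
phone.finalize("survey-17")

phone.send("survey-17", cancel_before_reply=True)  # user hits CANCEL mid-transmission
print(phone.unsent())                      # the form is still queued on the phone
phone.send("survey-17")                    # user sends it again later
print(phone.unsent(), len(server.stored))  # queue is empty, server holds two copies
```

In this model the phone never loses data, but the server ends up with two copies unless it has a stable id to match them on.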

Mitch
