Sending batch info in submission header

Hi ODK devs,

I'm responding to a request to make it easier for a server to find out how
many batches a (large) submission is divided into. This allows an OpenRosa
server to determine when a record can be considered complete, without
having to analyze the XForm and XML submission to identify any missing
files and keep track of these.

My solution is simply to add 2 headers to each submission:

X-Enketo-Batch-Index: 0 //--> 0-based index
X-Enketo-Batch-Total: 4 //--> number of batches

This particular OpenRosa server does not use ODK Collect, but I wanted to
check with you whether this is a requirement you've run into or would like to
support in ODK Collect for other reasons. If so, we could perhaps name the
headers X-OpenRosa-Batch-Index and X-OpenRosa-Batch-Total. Any alternative
solutions would be very welcome too, of course!

Cheers,
Martijn
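
As an illustration of how a server might use the two proposed headers, here is a minimal sketch, not an existing implementation. The class name BatchTracker, the method recordBatch and the idea of keying batches by an instance ID string are assumptions made for the example; it also assumes every batch of a record reports the same total.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical server-side helper: tracks which batches of each record have arrived. */
class BatchTracker {
    // Received batch indices per record, keyed by an assumed instance ID string.
    private final Map<String, BitSet> received = new HashMap<>();

    /**
     * Records one batch and reports whether the record is now complete.
     * indexHeader and totalHeader are the raw values of the two proposed headers.
     */
    boolean recordBatch(String instanceId, String indexHeader, String totalHeader) {
        int index = Integer.parseInt(indexHeader.trim());
        int total = Integer.parseInt(totalHeader.trim());
        if (total < 1 || index < 0 || index >= total) {
            throw new IllegalArgumentException(
                    "batch index " + index + " out of range for total " + total);
        }
        BitSet seen = received.computeIfAbsent(instanceId, id -> new BitSet(total));
        seen.set(index);
        // Complete once every index 0..total-1 has been seen, in any order.
        return seen.cardinality() == total;
    }

    public static void main(String[] args) {
        BatchTracker tracker = new BatchTracker();
        System.out.println(tracker.recordBatch("uuid:1234", "0", "2")); // first of two batches
        System.out.println(tracker.recordBatch("uuid:1234", "1", "2")); // second of two batches
    }
}
```

Because arrival is tracked per index, the check does not depend on the order in which batches arrive.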

ODK Collect already supports this by adding an "isIncomplete" form-part
to the submission:

StringBody sb = new StringBody("yes",
        ContentType.TEXT_PLAIN.withCharset(Charset.forName("UTF-8")));
builder.addPart("isIncomplete", sb);

The presence of an "isIncomplete" string form part indicates that
there will be another request to follow.

I don't recall why this was not made part of the OpenRosa spec.
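
To make the convention concrete, here is a small library-free sketch of the logic on both ends. It is an illustration, not Collect's or any server's actual code: the class and method names are hypothetical, and that the final request omits the part is my reading of "another request to follow".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of the "isIncomplete" convention. */
class IsIncompleteFlag {
    static final String PART_NAME = "isIncomplete";

    /** Client side: for requestCount sequential requests, true where the part would be attached. */
    static List<Boolean> plan(int requestCount) {
        List<Boolean> flags = new ArrayList<>();
        for (int i = 0; i < requestCount; i++) {
            flags.add(i < requestCount - 1); // every request except the last one
        }
        return flags;
    }

    /** Server side: another request is expected while the part is present. */
    static boolean expectsMore(Set<String> partNames) {
        return partNames.contains(PART_NAME);
    }
}
```

Note that the flag only says "more to come"; it carries no count or position.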

(In reply to Martijn van de Rijdt's message of Mon, May 9, 2016 at 2:48 PM.)

--
You received this message because you are subscribed to the Google Groups
"ODK Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to opendatakit-developers+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
Mitch Sundt
Software Engineer
University of Washington
mitchellsundt@gmail.com

Thanks for that, Mitch.

I'll discuss whether that meets their requirement and if so adopt that. I
think it may not, as it doesn't enable the server to check for missing
0...n-1 batches.
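
To illustrate the kind of check meant here, a sketch under the assumption that the server knows the batch total and the set of indices received so far (the class and method names are made up for the example):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: lists which batches of a record have not arrived yet. */
class MissingBatches {
    /** Returns the 0-based indices in 0..total-1 that are absent from received, ascending. */
    static List<Integer> find(Set<Integer> received, int total) {
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            if (!received.contains(i)) {
                missing.add(i);
            }
        }
        return missing;
    }
}
```

With only a "more to follow" flag, a lost request in the middle of the sequence could not be identified this way.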

A slight deviation from the topic: does ODK Aggregate require this form part
to be added if a record is divided into batches? It seems that it doesn't.
However, I did experience a critical issue with submission batches in
Aggregate, which I thought had to do with not sending batches sequentially
(this issue https://github.com/opendatakit/opendatakit/issues/1195). Now
I'm wondering whether I also need to add this form part to avoid any issues.

(In reply to Mitch Sundt's message of Mon, May 9, 2016 at 7:35 PM.)

--
You received this message because you are subscribed to a topic in the
Google Groups "ODK Developers" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/opendatakit-developers/vsBlXvoa9Xk/unsubscribe
.
To unsubscribe from this group and all its topics, send an email to
opendatakit-developers+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--

Revolutionizing data collection since 2012.

Enketo https://enketo.org/ | LinkedIn
http://www.linkedin.com/company/enketo-llc | GitHub
https://github.com/enketo | Twitter https://twitter.com/enketo
| Blog http://blog.enketo.org/

Correct. ODK Aggregate does not pay attention to this field. It evaluates
completeness by inspecting the content of the submission.

In general, when submitting data from a device, it does not make sense to
flood your communication channel with all N parts of a submission - or to
flood it with M different submissions.
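
As a rough sketch of the alternative, sending one part at a time could look like the following. The class name and the postBatch callback are hypothetical stand-ins for whatever performs one blocking HTTP request, and treating any 2xx status as success is an assumption.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical client-side helper that sends batches strictly one after another. */
class SequentialSender {
    /**
     * Sends batches 0..total-1 one at a time. postBatch is an assumed callback that
     * performs one blocking request for the given index and returns its HTTP status.
     * Returns the number of batches sent successfully before stopping.
     */
    static int sendAll(int total, IntUnaryOperator postBatch) {
        for (int index = 0; index < total; index++) {
            int status = postBatch.applyAsInt(index); // blocks until the response arrives
            if (status < 200 || status >= 300) {
                return index; // stop at the first failure; later batches are not sent
            }
        }
        return total;
    }
}
```

Only one request is in flight at any time, so each part gets the whole channel to itself.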

The limiting factor for mobile devices is always bandwidth. And this is
especially the case with 2G, 3G and satellite connections.

While flooding the channel with many parts and/or submissions does improve
overall bandwidth for data transmission, it does not help at the server --
ODK Aggregate has only 60 seconds to process a single request before Google
AppEngine forcibly terminates it (and Tomcat and other web hosting
environments have similar overall-request-duration timeouts). This limit
prevents DDoS attacks where many connections are opened to the server and
bytes are slowly dribbled in over many minutes; the clock starts upon
receipt of the first byte.

If each part takes 1 unit of time to send over the channel, and you flood
that channel with N interleaved parts, it will take ~N units of time to
complete the transmission of a part that would otherwise have taken 1 unit
of time. Those additional (N-1) units of time cut significantly into the
60-second budget.
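
The arithmetic can be worked through with made-up numbers. The sketch below assumes equal-sized parts that share the channel evenly, and the figures of 20 seconds per part and 4 parts are illustrative only; the names are hypothetical.

```java
/** Hypothetical worked example of the interleaving cost described above. */
class InterleavingCost {
    /** Approximate duration of each request when n equal parts share the channel evenly. */
    static double interleavedSeconds(double secondsPerPartAlone, int n) {
        return secondsPerPartAlone * n;
    }

    /** True if a request of the given duration finishes within the server's time budget. */
    static boolean fitsBudget(double requestSeconds, double budgetSeconds) {
        return requestSeconds <= budgetSeconds;
    }

    public static void main(String[] args) {
        double alone = 20.0;  // assumed seconds to send one part on an otherwise idle channel
        int parts = 4;        // assumed number of parts sent at once
        double budget = 60.0; // the 60-second request limit mentioned above
        System.out.println("sequential fits:  " + fitsBudget(alone, budget));
        System.out.println("interleaved fits: "
                + fitsBudget(interleavedSeconds(alone, parts), budget));
    }
}
```

Under these assumptions each sequential request lasts about 20 seconds, while each interleaved request lasts about 80 seconds and would exceed the limit.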

The 201 and data corruption are disturbing. There is a mutex that should
prevent the second request from advancing. This explains some failures I
have seen.

(In reply to Martijn van de Rijdt's message of Tue, May 10, 2016 at 9:12 AM.)

Thanks,

Yes, I agree that sending sequentially makes the most sense. Sending them
simultaneously wasn't deliberate (I think). Good to know about the 60-second
request timeout in Aggregate.

Best regards,
Martijn

(In reply to Mitch Sundt's message of Tue, May 10, 2016 at 10:31 AM.)