Google AppEngine limits requests to a max size of 32 MB


This limitation affects Collect users who add individual binary attachments (image, audio, video, file) larger than 32 MB. Sending those submissions to an Aggregate instance hosted on App Engine fails with a Collect upload error (write error: ssl=0x9d03b600: I/O error during system call, broken pipe) and a stack trace in the log:

11-26 13:03:25.507 6169-6308/ E/HttpClientConnection: Write error: ssl=0xa93ec580: I/O error during system call, Broken pipe
        at Method)
        at org.opendatakit.httpclientandroidlib.entity.mime.content.FileBody.writeTo(
        at org.opendatakit.httpclientandroidlib.entity.mime.AbstractMultipartForm.doWriteTo(
        at org.opendatakit.httpclientandroidlib.entity.mime.AbstractMultipartForm.writeTo(
        at org.opendatakit.httpclientandroidlib.entity.mime.MultipartFormEntity.writeTo(
        at org.opendatakit.httpclientandroidlib.impl.DefaultBHttpClientConnection.sendRequestEntity(
        at org.opendatakit.httpclientandroidlib.impl.conn.CPoolProxy.sendRequestEntity(
        at org.opendatakit.httpclientandroidlib.protocol.HttpRequestExecutor.doSendRequest(
        at org.opendatakit.httpclientandroidlib.protocol.HttpRequestExecutor.execute(
        at org.opendatakit.httpclientandroidlib.impl.execchain.MainClientExec.execute(
        at org.opendatakit.httpclientandroidlib.impl.execchain.ProtocolExec.execute(
        at org.opendatakit.httpclientandroidlib.impl.execchain.RetryExec.execute(
        at org.opendatakit.httpclientandroidlib.impl.execchain.RedirectExec.execute(
        at org.opendatakit.httpclientandroidlib.impl.client.InternalHttpClient.doExecute(
        at org.opendatakit.httpclientandroidlib.impl.client.CloseableHttpClient.execute(
        at org.opendatakit.httpclientandroidlib.impl.client.CloseableHttpClient.execute(
        at android.os.AsyncTask$
        at android.os.AsyncTask$SerialExecutor$
        at java.util.concurrent.ThreadPoolExecutor.runWorker(
        at java.util.concurrent.ThreadPoolExecutor$

(source Collect upload error: write error: ssl=0x9d03b600: I/O error during system call, broken pipe)

Some insights into this issue:

  • This could also happen to submissions without attachments, but a text-only submission larger than 32 MB seems very unlikely.
  • Aggregate can tell Collect how big requests should be.
    • A user could set this to any value (e.g. 1GB) supported by the infrastructure where Aggregate is deployed
    • Setting this to a value higher than 32MB while serving Aggregate in AppEngine wouldn't make any sense since the infrastructure will enforce a 32MB limit.
  • Collect divides a submission into several smaller requests. If the server doesn't say how big requests should be, Collect defaults to 10 MB requests. An example:
    • Let's say there's a submission with four videos of 4 MB, 5 MB, 6 MB, and 5 MB respectively
    • Collect will make 3 requests:
      1. submission XML + videos #1 & #2 for a total of ~9MB
      2. submission XML + video #3 for a total of ~6MB
      3. submission XML + video #4 for a total of ~5MB
  • Collect always sends complete attachments (it won't split them into smaller chunks). This means that if an attachment is 50 MB and the server's max request size is 32 MB, Collect can't send a first chunk of 32 MB and a second chunk of 18 MB to work around the limitation.
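
The batching described in the list above can be sketched roughly as follows. This is only an illustration of the behavior as described in this post, not Collect's actual code: the function name `plan_requests`, its parameters, and the 10 KB XML size are all made up for the example.

```python
from typing import List, Tuple

MB = 1024 * 1024
# Default from this post: used when the server doesn't say how big requests should be.
DEFAULT_MAX_REQUEST_SIZE = 10 * MB


def plan_requests(
    xml_size: int,
    attachments: List[Tuple[str, int]],
    max_request_size: int = DEFAULT_MAX_REQUEST_SIZE,
) -> List[List[str]]:
    """Group whole attachments into requests; every request also carries the XML.

    Hypothetical helper: names and signature are assumptions for illustration.
    """
    requests: List[List[str]] = []
    current: List[str] = []
    current_size = xml_size
    for name, size in attachments:
        # Start a new request when this attachment would push the current one over the limit.
        if current and current_size + size > max_request_size:
            requests.append(current)
            current, current_size = [], xml_size
        # Attachments are never split, so an oversized one still goes out whole.
        current.append(name)
        current_size += size
    # Always emit at least one request, even if it only carries the XML.
    if current or not requests:
        requests.append(current)
    return requests


videos = [("video1", 4 * MB), ("video2", 5 * MB), ("video3", 6 * MB), ("video4", 5 * MB)]
plan = plan_requests(xml_size=10 * 1024, attachments=videos)
print(plan)  # [['video1', 'video2'], ['video3'], ['video4']]

# A 50 MB attachment with a 32 MB limit still ends up in a single, too-large request.
oversized = plan_requests(xml_size=10 * 1024, attachments=[("big", 50 * MB)], max_request_size=32 * MB)
```

The last call shows why the 32 MB limit can't be worked around by batching alone: the only available unit is the whole attachment.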

As discussed in the @TSC, we want to start an open discussion about this and explore the solution space for this issue. This is a non-comprehensive list of topics we think we should discuss:

  • Workarounds:
    • How to prevent users from adding too large binary attachments
    • How to design forms that will prevent this situation
  • Alternative hosting providers for Aggregate

So, I'll start with some stuff we talked about in the last TSC meeting:

How to prevent users from adding too large binary attachments

Enketo already does that by preemptively querying the server's limit before letting the user add attachments, and by preventing the user from adding attachments that are too big.

Tradeoffs and issues of this approach are:

  • Collect can (and will) run in scenarios without connectivity, which would make it impossible to check server limitations.
  • Once a blank form is pulled from a server, Collect's workflow can be 100% offline (combined with Briefcase). In this scenario, Collect might apply limitations that are not required because submissions won't ever be sent to any server.
  • Telling users that they can't add an attachment can be frustrating, especially if they can't solve the situation. For example, can users trim or compress video and audio files on their devices?
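
To make the offline tradeoff concrete, here's a rough sketch of such a check with an explicit "unknown" outcome for when the server's limit can't be learned. Everything here is hypothetical: the helper names are invented, and the `X-OpenRosa-Accept-Content-Length` header name is an assumption taken from the OpenRosa submission API, not something confirmed in this thread.

```python
from typing import Mapping, Optional

# Assumed header name (OpenRosa submission API); not confirmed in this thread.
SIZE_HEADER = "x-openrosa-accept-content-length"


def server_limit(headers: Optional[Mapping[str, str]]) -> Optional[int]:
    """Return the advertised limit in bytes, or None when it is unknown (e.g. offline)."""
    if headers is None:
        return None
    for key, value in headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if key.lower() == SIZE_HEADER:
            try:
                return int(value)
            except ValueError:
                return None
    return None


def check_attachment(size: int, headers: Optional[Mapping[str, str]]) -> str:
    """Classify an attachment as 'ok', 'too_large', or 'unknown'.

    The size of the submission XML is ignored here to keep the sketch short.
    """
    limit = server_limit(headers)
    if limit is None:
        return "unknown"
    return "ok" if size <= limit else "too_large"


MB = 1024 * 1024
online = {"X-OpenRosa-Accept-Content-Length": str(32 * MB)}
print(check_attachment(50 * MB, online))  # too_large
print(check_attachment(50 * MB, None))    # unknown (no connectivity)
```

What the UI should do in the "unknown" case (allow, warn, or block) is exactly the open question in the bullets above.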

How to design forms that will prevent this situation

If we end up suggesting attaching smaller attachments, instead of having one binary field, forms could be designed with repeat groups of one binary field to let users attach a sequence of audios, videos, photos...

Alternative hosting providers for Aggregate

  • We're already releasing a virtual machine OVA file. How can this be leveraged in hosting providers?
  • We also have Aggregate Docker images and Docker Compose setups. Is there any way to reuse them?
  • AWS (Amazon)

Short term

  1. We update the docs and the installer and warn folks that attachments sent to App Engine must be smaller than 32 MB. In the docs, we suggest using multiple questions if appropriate.

  2. We update Collect's media/file widgets so that if we see an appspot URL (in the settings or in the submission_url) and a user adds an attachment greater than 32 MB, we add a small warning to the widget.

    This attachment is x MB. Attachments greater than 32 MB will be rejected by Google App Engine.

    The above change might miss some GAE servers with custom domains, but I'd expect that number to be small. Moreover, those folks could transparently change their backend if necessary.

  3. We update Collect's error messages to provide a more useful error message if that submission is sent. Worst case scenario, users can get that data using Briefcase.
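
A minimal sketch of the check in item 2, just to show the idea. It's written in Python for brevity rather than as Collect code, and the function names are made up. As noted above, it only matches appspot.com hosts, so App Engine servers behind custom domains would be missed.

```python
from typing import Optional
from urllib.parse import urlparse

MB = 1024 * 1024
GAE_MAX_REQUEST_SIZE = 32 * MB  # App Engine request limit discussed in this thread


def is_appspot_url(url: str) -> bool:
    """True when the URL's host is appspot.com or a subdomain of it.

    The URL needs a scheme (https://...), otherwise urlparse finds no host.
    """
    host = urlparse(url).hostname or ""
    return host == "appspot.com" or host.endswith(".appspot.com")


def attachment_warning(server_url: str, size: int) -> Optional[str]:
    """Return the warning text for the widget, or None when no warning applies."""
    if is_appspot_url(server_url) and size > GAE_MAX_REQUEST_SIZE:
        return (
            f"This attachment is {size / MB:.1f} MB. "
            "Attachments greater than 32 MB will be rejected by Google App Engine."
        )
    return None


print(attachment_warning("https://example.appspot.com/submission", 50 * MB))
print(attachment_warning("https://odk.example.org/submission", 50 * MB))  # None
```

The same function would need to be called with both the server URL from the settings and the form's submission_url, if it has one.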

Long term

After 10 years of making changes to support App Engine, I'm mostly allergic to making any more.

  1. I think we should help people get turnkey setups on Linode/DigitalOcean/Vultr and doing that through Docker seems like the easiest option.
  2. For AWS users, a CloudFormation template seems better in that regard. For both of these, we'd include an SSL setup and default to PostgreSQL.

I'm not as interested in AMIs because it's yet another thing we have to manage.