Problems Uploading Form with Large Number of Images

1. What is the problem? Be very detailed.
We are having problems uploading forms with many (100+, 8-10 MB each) image files. We've tried uploading remotely over wifi, as we do for most of our forms, but the submission fails or times out at some point. This isn't totally unexpected, as we are looking at instances of 1 GB+. Pulling the data using Briefcase works fine, but Briefcase throws an error when I try to push the data back to the Aggregate server (see below). It's not ideal that we can't upload these forms remotely, but I at least need Briefcase to be able to handle it.

2. What app or server are you using and on what device and operating system? Include version numbers.
Collect 1.28
Aggregate 1.4.15
Briefcase 1.18

3. What have you tried to fix the problem?
Pushing data with Briefcase, manually uploading submission in Aggregate (nothing happens after clicking to upload the submission)

4. What steps can we take to reproduce the problem?

5. Anything else we should know or have? If you have a test form or screenshots or logs, attach below.
2020-12-01 10:38:55,037 [ForkJoinPool-5-worker-1] INFO XFormParser - Creating FormDef from parsed XML finished in 5.216 ms
2020-12-01 10:39:02,478 [ForkJoinPool-6-worker-1] INFO o.o.b.p.a.PushToAggregateTracker - Push Cultural Heritage Form - Start pushing form and submissions
2020-12-01 10:39:02,489 [ForkJoinPool-6-worker-1] INFO o.o.b.p.a.PushToAggregateTracker - Push Cultural Heritage Form - Form already exists in Aggregate
2020-12-01 10:39:02,495 [ForkJoinPool-6-worker-3] INFO o.o.b.p.a.PushToAggregateTracker - Push Cultural Heritage Form - Sending submission 1 of 4 (1/42)
2020-12-01 10:39:02,495 [ForkJoinPool-6-worker-1] INFO o.o.b.p.a.PushToAggregateTracker - Push Cultural Heritage Form - Sending submission 2 of 4 (1/44)
2020-12-01 10:39:02,511 [ForkJoinPool-6-worker-2] INFO o.o.b.p.a.PushToAggregateTracker - Push Cultural Heritage Form - Sending submission 3 of 4 (1/141)
2020-12-01 10:39:02,513 [ForkJoinPool-6-worker-4] INFO o.o.b.p.a.PushToAggregateTracker - Push Cultural Heritage Form - Sending submission 4 of 4 (1/131)
2020-12-01 10:39:03,158 [ForkJoinPool-6-worker-1] ERROR o.o.briefcase.reused.job.JobsRunner - Error running Job
java.lang.OutOfMemoryError: null
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at java.util.concurrent.ForkJoinTask.getThrowableException(Unknown Source)
at java.util.concurrent.ForkJoinTask.reportException(Unknown Source)
at java.util.concurrent.ForkJoinTask.invoke(Unknown Source)
at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(Unknown Source)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(Unknown Source)
at java.util.stream.AbstractPipeline.evaluate(Unknown Source)
at java.util.stream.ReferencePipeline.forEach(Unknown Source)
at java.util.stream.ReferencePipeline$Head.forEach(Unknown Source)
at org.opendatakit.briefcase.push.aggregate.PushToAggregate.lambda$push$7(PushToAggregate.java:95)
at org.opendatakit.briefcase.reused.job.Job.lambda$thenAccept$8(Job.java:134)
at org.opendatakit.briefcase.reused.job.Job.lambda$thenRun$6(Job.java:109)
at org.opendatakit.briefcase.reused.job.JobsRunner.lambda$launchAsync$1(JobsRunner.java:65)
at java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(Unknown Source)
at java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(Unknown Source)
at java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
at java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
Caused by: java.lang.OutOfMemoryError: Java heap space

In general, most people don't need 8-10 MB images. Use the max-pixels parameter to scale down images on the device.
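
For reference, here's a minimal sketch of what that looks like in an XLSForm survey sheet, assuming a recent XLSForm release where max-pixels goes in the parameters column (the question name and label are just placeholders):

```
type    name         label               parameters
image   site_photo   Photo of the site   max-pixels=1024
```

With max-pixels=1024, Collect scales the longest edge of the captured photo down to 1024 pixels before saving, which usually keeps each image well under 1 MB.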

Next look through the Aggregate logs to see if there are any hints as to why those large images are failing.

As to the issue at hand, as the logs suggest, Briefcase is out of RAM. How much RAM does your computer have?

Thanks @yanokwa, you got me on the right track. My computer has 32 GB of RAM so it shouldn't have been running out. Turns out I had the 32-bit version of Java, so it couldn't access it all. Briefcase is working now that I've installed the 64-bit version.
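
In case it helps someone else who lands here: a quick way to check which Java you have, and to give Briefcase a bigger heap explicitly (the jar filename is just an example, adjust it to your Briefcase release):

```
# A 64-bit build reports something like "64-Bit Server VM" here
java -version

# Launch Briefcase with a 4 GB maximum heap (needs a 64-bit JVM)
java -Xmx4g -jar ODK-Briefcase-v1.18.0.jar
```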


I worked on a similar problem with a huge rollout where I did not put any image restriction in place and ended up with 600 GB of data, with each submission around 30+ MB. I had to work on a lot of fronts to handle the whole ecosystem: upgrading the server massively (AWS cloud), patching MySQL, raising PHP and Tomcat timeouts, moving the MySQL data files to a disk separate from the OS, and the list went on and on. But the good thing in the end was that none of the components gave up. ODK Aggregate, Tomcat, MySQL, and the infrastructure all stood up firmly to handle that amount of data. All my data backup routines got messed up (you don't take a 600 GB backup via Briefcase, etc.), but still everything stood its ground!

That was the hard lesson I learnt in that project: never leave the image size field uncontrolled :slight_smile:

Cheers,
Saad


Thanks for the encouragement, Saad! Our database is now 200 GB+ so I'm dealing with many of the same issues you had, I'm sure. As a person with no IT background, it's certainly been a learning experience.

If you have the option of taking a break in the fieldwork, I would recommend upgrading to ODK Central. Among other benefits, the biggest one is that it stores the images as files rather than in the database as BLOBs. This will save you considerable disk space and improve database/dashboard speed as well as backup routines.

Secondly, for backups, use the Briefcase CLI from the command line on the same server, and store the images on the local disk. For a database of 200 GB, I think Briefcase will first take around 30+ minutes just getting the form list from the server before it moves on. Just open a screen session (if Linux) and leave it running there.
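
Roughly what that looks like, based on the Briefcase 1.18 CLI flags as I remember them (URL, credentials, form id and storage directory below are placeholders; run the jar with --help to confirm the exact flag names for your version):

```
# Pull all submissions for one form straight onto the server's local disk
java -jar ODK-Briefcase-v1.18.0.jar --pull_aggregate \
  --storage_directory /data/briefcase \
  --form_id cultural_heritage_form \
  --odk_url http://localhost:8080/ODKAggregate \
  --odk_username backup_user \
  --odk_password secret
```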

Let me know if I can help any more in any way.

Cheers,
Saad

Thanks for sharing your lessons learned and offering to help, @Saad! You should write up a Showcase piece about your rollout so we have it all in one place.

Central does store images in the database as BLOBs. What we don't do is split the submissions into key/values the way that Aggregate does.

If you are on Linux, you can also run it with cron so it runs as a scheduled task. Oh, and no need to leave the screen open. Use Bash's job control to background/foreground the job or tmux to keep the job running even after you close the window.
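
A minimal sketch of both approaches, with the schedule, paths and flags as placeholders (reuse the pull flags from the example above):

```
# Option 1: tmux -- keep an interactive pull alive after you disconnect
tmux new -s briefcase        # start a named session and run the pull inside it
# detach with Ctrl-b d, reattach later with:
tmux attach -t briefcase

# Option 2: cron -- scheduled nightly pull at 02:00, added via `crontab -e`
0 2 * * * java -jar /opt/briefcase/ODK-Briefcase-v1.18.0.jar --pull_aggregate --storage_directory /data/briefcase --form_id cultural_heritage_form --odk_url http://localhost:8080/ODKAggregate --odk_username backup_user --odk_password secret >> /var/log/briefcase-pull.log 2>&1
```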