Thundering herd of devices trying to upload images when team gets to office

I am planning to use ODK Collect to collect a large number of responses, including one photo, in areas with no connectivity. The tablets will then be taken back to the office where they will sync over a very inconsistent and fragile internet connection to a server in the capital.

If the image resizing change 1210 isn't ready in time or doesn't go small enough, I plan to roll a custom version of ODK Collect to get those images down to around 8 KB each. Each tablet will need to upload around 35 records a night, each with an image. There will be around 35 tablets. This will happen 6 days a week for a period of around three months. (This three-month push will happen about 10 times over three to four years in different parts of the country.)

I am concerned about the Thundering Herd problem - every time the connection returns, all tablets will presumably start trying to upload, potentially flooding the available bandwidth and knocking each other's connections out. I think there are a few solutions - from random retry delays to the server allocating unique retry windows. Is this something that has been encountered before?
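To make the "server allocating unique retry windows" idea concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical: RetryWindowAllocator, its methods, and the slot sizes are illustration only and do not exist in Collect or in any ODK server. It assumes the server hands each tablet a slot the first time it sees that tablet, and that time is divided into a repeating cycle of equal-length slots.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: not part of ODK Collect or any ODK server.
final class RetryWindowAllocator {
    private final int slotCount;      // e.g. one slot per tablet
    private final long slotLengthMs;  // how long each tablet's window stays open
    private final Map<String, Integer> slots = new LinkedHashMap<>();

    RetryWindowAllocator(int slotCount, long slotLengthMs) {
        this.slotCount = slotCount;
        this.slotLengthMs = slotLengthMs;
    }

    // Returns the slot for a device, assigning the next one on first contact.
    // If there are more devices than slots, slots wrap around and are shared.
    synchronized int slotFor(String deviceId) {
        Integer existing = slots.get(deviceId);
        if (existing != null) {
            return existing;
        }
        int slot = slots.size() % slotCount;
        slots.put(deviceId, slot);
        return slot;
    }

    // Milliseconds the device should wait, counted from nowMs, until its
    // window is open. Returns 0 if nowMs already falls inside the window.
    long delayUntilWindowMs(String deviceId, long nowMs) {
        long cycleMs = slotCount * slotLengthMs;
        long windowStart = slotFor(deviceId) * slotLengthMs;
        long position = Math.floorMod(nowMs, cycleMs);
        if (position >= windowStart && position < windowStart + slotLengthMs) {
            return 0;
        }
        return Math.floorMod(windowStart - position, cycleMs);
    }

    public static void main(String[] args) {
        // 35 tablets, 10-second windows: a full cycle takes 350 seconds.
        RetryWindowAllocator allocator = new RetryWindowAllocator(35, 10_000);
        long now = System.currentTimeMillis();
        for (String id : new String[] {"tablet-01", "tablet-02", "tablet-03"}) {
            System.out.println(id + " waits " + allocator.delayUntilWindowMs(id, now) + " ms");
        }
    }
}
```

The trade-off against random delays is that this needs the tablet to reach the server at least once to learn its slot, and the clocks on the tablets and the server need to agree roughly.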

From a very ignorant look at receivers/NetworkReceiver.java and tasks/InstanceServerUploader.java it looks like the upload is attempted on detecting connectivity but not retried? (I've never worked with Android and haven't worked with Java for decades so any tips very welcome and no tip is too condescending.) Adding a retry delay would mean something here and something in the Rosa server code?

Huge thanks

NetworkReceiver link (I was forbidden from posting more than two links).

Has anything else been tried around this? I read about a Nafundi fork that copied data between devices using NFC - is this an alternative? Is it possible that there is a glitch in connectivity that will break the transmission but not trigger a NetworkReceiver.onReceive?

If you are using an Android device there are multiple ways you can do secure file transfers. I did something similar to this with secure data from a survey. We used SFTP to transmit the data securely. I set up the devices to transfer the data sets to the secure server only when they were in good WiFi coverage. Because of the size of the data we didn't want to run up our data usage cost, which is why we used WiFi only. This can be user controlled or it can be automated. If you want more information on the process, let me know.

Thanks,
Brad

Thanks Brad, that's reassuring. I hadn't seen any of this. How did you automate your SFTP? Did you have some other process on the device accessing ODK Collect's folders of XML and media?

It isn't the security I'm worried about so much as making sure the devices manage to upload reliably every day, over a very fragile shared connection, in as automated a way as possible, as we'll have many tablets and backlogs will cause problems. Specifically, I'm worried that the tablets will all flood any connection in concert and then fail.

Hi Chris, I haven't encountered this issue before, but I think it might be an issue for you.

First, I'd confirm that the behavior you expect will actually happen. So for example, put a few devices with huge submissions on a slow network, turn off the connection, turn it back on and see what happens. My guess is that they won't all recognize the connection at the same time.

I quickly looked at the source and I think the behavior of auto-send is that if you finalize the form and there is a connection, it'll try to send. If there is no connection, it won't send, but it will try to send again when the connection comes back.

It seems like the cleanest solution would be adding a random delay when auto-send happens (say between 0 and 5000 ms). Another thing to add would be a retry with exponential backoff (plus a random delay on each retry). Both features would be PRs that would be welcome in Collect!
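As a rough sketch of the two delays described above, in plain Java with no Android dependencies: UploadBackoff and its method names are made up for illustration and are not existing Collect code, and the base and maximum delays are assumptions to be tuned. The retry delay here is drawn at random from between zero and an exponentially growing ceiling, which is one common way to combine backoff with randomness.

```java
import java.util.Random;

// Hypothetical helper: not part of ODK Collect.
final class UploadBackoff {
    private final Random random;
    private final long baseDelayMs; // ceiling for the first retry
    private final long maxDelayMs;  // ceiling never grows beyond this

    UploadBackoff(Random random, long baseDelayMs, long maxDelayMs) {
        this.random = random;
        this.baseDelayMs = baseDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    // Random delay in [0, 5000] ms before the first auto-send attempt.
    long initialDelayMs() {
        return random.nextInt(5001);
    }

    // Delay before retry number `attempt` (0-based). The ceiling doubles with
    // each attempt up to maxDelayMs; the result is uniform in [0, ceiling].
    long retryDelayMs(int attempt) {
        int shift = Math.min(Math.max(attempt, 0), 20); // cap the exponent
        long ceiling = Math.min(maxDelayMs, baseDelayMs * (1L << shift));
        return (long) (random.nextDouble() * (ceiling + 1));
    }

    public static void main(String[] args) {
        // Assumed values: 1 s base ceiling, 5 min maximum ceiling.
        UploadBackoff backoff = new UploadBackoff(new Random(), 1_000, 300_000);
        System.out.println("initial delay: " + backoff.initialDelayMs() + " ms");
        for (int attempt = 0; attempt < 6; attempt++) {
            System.out.println("retry " + attempt + ": " + backoff.retryDelayMs(attempt) + " ms");
        }
    }
}
```

In Collect itself the delay would have to be scheduled through whatever mechanism triggers the upload rather than by sleeping on a thread; that part is left out here.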

Thanks very much Yaw for your input.

I may also need a self-restarting service to trigger the sync: if the tablets are waiting hours for connectivity to come back, the app may be killed by the OS.

I will post back here with some results and then look at a PR. Thanks again.

Brad, anything you can provide me on this process would be really useful, thanks. Did you SFTP the raw data files in sdcard/odk/instances? How did you then import that? How did you set up the devices to wait for good WiFi?

This could be really handy, as here in Ethiopia the internet is occasionally disconnected for weeks or months, so we'd ideally be able to back up and transport SD cards rather than have to swap whole tablets.