ODK Briefcase CLI - Not pulling newest submissions on first try

ODK Briefcase v1.12 CLI

I'm using the Briefcase CLI to pull data with the script below and am seeing some strange behaviour. It seems to have started when I moved to v1.12.0; this wasn't happening when I was previously using v1.10.1.

When I run the script, Briefcase does not pull the newest submissions, i.e. those sent on the current calendar date.

For instance:
Number of submissions yesterday: n = 16,001
Pulled today using the script: n = 16,001
Pulled again straight after: n = 16,147

So 146 records were pulled only on the second run.

In the form's CSV output there are 146 records with today's submission date, but their submission times precede the first pull by several hours, so they are not new submissions sent to the server after the first pull.

I wonder if anyone can shed light on this.

#!/usr/bin/env bash

# Clean up output from the previous run (-f: don't fail if a file is already gone)
rm -f "/Users/$USER/tasks/FOO.log"
rm -rf "/Users/$USER/FOO/output"
rm -f /Users/"$USER"/FOO.csv.files.tar.*

## declare an array of the form IDs to pull and export
declare -a arr=(
"FORMID1"
"FORMID2"
"FORMID3"
"FORMID4"
)

## now loop through the above array
for i in "${arr[@]}"
do
        j="/Users/$USER/FOO/output/$i.csv"
        # Repeat the pull + export until the CSV for this form exists
        until test -e "$j"
        do
                echo "$j"
                java -jar odkbriefcase_1.12.jar -plla -pp -e --form_id "$i" -f "$i".csv --storage_directory "/Users/$USER/FOO" --aggregate_url https://FOO.server.xx/FOO --odk_username admin --odk_password PASSWORD -ed "/Users/$USER/FOO/output" -pf FOO.PRIVATE.KEY.pem -oc
        done
done
# Record when the export finished, then archive all CSVs
date +"%Y_%m_%d_%H_%M" > "/Users/$USER/FOO/output/000_timestamp.txt"
tar -cvzf "/Users/$USER/FOO.csv.files.tar.gz" "/Users/$USER/FOO/output"
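To compare counts like the ones above before and after each run without relying on the CSV (a row can span several lines when an answer contains line breaks), one option is to count the submission folders in the Briefcase storage directory. This is a minimal sketch that assumes Briefcase keeps each submission in its own folder under `ODK Briefcase Storage/forms/<form name>/instances/`; the helper name `count_instances` and the form name in the usage comment are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count the submission folders Briefcase has stored for one form.
# ASSUMPTION: layout is <storage>/ODK Briefcase Storage/forms/<form name>/instances/<one folder per submission>
count_instances() {
  local storage_dir="$1" form_name="$2"
  local instances_dir="$storage_dir/ODK Briefcase Storage/forms/$form_name/instances"
  if [ ! -d "$instances_dir" ]; then
    echo 0
    return
  fi
  # Only direct children of instances/ are counted; tr strips the padding macOS wc adds
  find "$instances_dir" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

# Usage (form name is hypothetical):
#   before=$(count_instances "/Users/$USER/FOO" "My Form")
#   ... run the script ...
#   after=$(count_instances "/Users/$USER/FOO" "My Form")
#   echo "new submissions: $((after - before))"   # e.g. 16147 - 16001 = 146
```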

Hi @chrissyhroberts!

Some questions:

  • Are you saying that there are exactly 16001 submissions in Aggregate and you're getting 146 extra?
  • Are those 146 extra duplicates of other submissions?
  • Do you get 146 extra forms consistently if you reproduce the experiment?
  • Is this happening in all your forms?

No, sorry, I didn't articulate this well.

  1. I have a bunch of submissions in the ODK Briefcase storage folder
  2. I run the script
  3. The submissions in the ODK Briefcase storage folder stay the same, i.e. nothing is added
  4. I run the script again
  5. There are now new submissions in the folder

So the actual numbers change depending on how many submissions have been sent since the last pull, but the basic behaviour is that on the first run, nothing seems to happen.

There are no problems with duplicates.

Thanks for the clarification. I thought I was going crazy until I realized that you're actually pulling and exporting in a single Briefcase execution. I didn't realize you could do that! 🙂
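One way to check whether the combined pull-and-export run is involved is to split it into two Briefcase executions per form, exporting only after the pull has finished. The sketch below reuses only flags that already appear in the script in the first post; whether Briefcase 1.12 accepts exactly these subsets on their own is an assumption, and the function names `pull_form` and `export_form` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run pull and export as two separate Briefcase executions.
# ASSUMPTION: each flag behaves as it does in the combined command from the original script.
BRIEFCASE_JAR="odkbriefcase_1.12.jar"
STORAGE_DIR="/Users/$USER/FOO"
EXPORT_DIR="/Users/$USER/FOO/output"

pull_form() {
  # Pull only (same pull flags as the original command, without -e)
  java -jar "$BRIEFCASE_JAR" -plla -pp \
    --form_id "$1" \
    --storage_directory "$STORAGE_DIR" \
    --aggregate_url https://FOO.server.xx/FOO \
    --odk_username admin --odk_password PASSWORD
}

export_form() {
  # Export only, from whatever is already in the storage directory
  java -jar "$BRIEFCASE_JAR" -e \
    --form_id "$1" -f "$1.csv" \
    --storage_directory "$STORAGE_DIR" \
    -ed "$EXPORT_DIR" -pf FOO.PRIVATE.KEY.pem -oc
}

# Usage:
#   for i in FORMID1 FORMID2 FORMID3 FORMID4; do pull_form "$i" && export_form "$i"; done
```

If the first run's export then contains the newest submissions, that would point at the interaction between pulling and exporting in one execution rather than at the pull itself.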


I'm going to review how the pull process works. In the meantime, I have some questions:

  • Could you tell me which time zone you're running Briefcase in?
  • If you run the script again (provided no one has sent new submissions), do you get the same number of exported submissions?
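For answering the time zone question, the local zone and offset of the machine running Briefcase can be printed next to UTC with plain `date`; the `TZ` override is standard behaviour. This is only a reporting aid: whether time zones play any part in which submissions get pulled is a hypothesis, and `report_time_zones` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print the local time (with zone name and UTC offset) next to UTC for comparison.
report_time_zones() {
  echo "local: $(date +'%Y-%m-%d %H:%M:%S %Z (%z)')"
  echo "utc:   $(TZ=UTC date +'%Y-%m-%d %H:%M:%S %Z (%z)')"
}

# Usage:
#   report_time_zones
```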

Thanks @ggalmazor,

I'm in the UTC/GMT time zone.
Running the script a third time doesn't add any more files, as far as I can tell from previous runs.

I can't test this right now, as I'm currently having separate issues with I/O errors and the server not responding:

2018-11-05 13:53:42,575 [briefcase-pull-1-thread-12] INFO o.a.http.impl.execchain.RetryExec - I/O exception (org.apache.http.NoHttpResponseException) caught when processing request to {s}->https://xxx.ac.uk:443: The target server failed to respond

I think these issues may be happening because we now have so many submissions on the server (>25,000), and both problems may have the same root cause.

Do you think Briefcase might be querying the database too often or too quickly?
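To put a number on how often the server stops responding, the `NoHttpResponseException` retries can be counted in the log, along with how many distinct pull threads hit them. This sketch assumes the log lines look like the `RetryExec` line above; the log path is whatever your setup writes to, and both helper names are hypothetical. Comparing these counts with a run that drops `-pp` (which, as far as I know, makes Briefcase pull in parallel) could help test the "too many requests at once" idea.

```bash
#!/usr/bin/env bash
# Count "target server failed to respond" retries in a Briefcase log file.
# ASSUMPTION: retries are logged like the RetryExec line quoted above.
count_no_response() {
  local log_file="$1"
  # grep -c prints 0 but exits non-zero when nothing matches; keep the output, ignore the status
  grep -c 'NoHttpResponseException' "$log_file" || true
}

# Count distinct pull threads (names like briefcase-pull-1-thread-12) that hit the error
count_failing_threads() {
  grep 'NoHttpResponseException' "$1" \
    | grep -o 'briefcase-pull-[0-9]*-thread-[0-9]*' \
    | sort -u | wc -l | tr -d ' '
}

# Usage (log path is hypothetical):
#   count_no_response "/Users/$USER/tasks/FOO.log"
#   count_failing_threads "/Users/$USER/tasks/FOO.log"
```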

I've tried to reproduce the error with a 50k-submission test form (export only, no pulling), and I think we can rule out the export step as the cause of this issue.

I think we might have to focus on the server, but I need to check how Briefcase and Aggregate interact when there are lots of submissions.

3 posts were split to a new topic: Dealing with 25000 submissions on Aggregate

Hi again!

I've tried with a 50k-submission form and can't reproduce this issue. I'll need more info about your setup:

  • Aggregate version, Tomcat version, and whether you use MySQL or PostgreSQL
  • Maybe the blank form, so that I can reproduce the same environment here
  • Is Briefcase running on the same machine as Aggregate? Is it connecting over a fast local network, or over the internet?
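For the last question, a rough way to characterise the link between the Briefcase machine and Aggregate is to time a plain HTTP request with `curl`'s `--write-out` variables. The URL in the usage comment is the placeholder from the original script, and `probe_aggregate` is a hypothetical helper; a single request says nothing about behaviour under load, so this is only a first measurement.

```bash
#!/usr/bin/env bash
# Time one HTTP request to the Aggregate server (hypothetical helper).
probe_aggregate() {
  local url="$1"
  # http_code, time_connect and time_total are standard curl --write-out variables
  curl --silent --output /dev/null --max-time 30 \
    --write-out 'status=%{http_code} connect=%{time_connect}s total=%{time_total}s\n' \
    "$url"
}

# Usage: run it a few times from the machine that runs Briefcase
#   for n in 1 2 3 4 5; do probe_aggregate https://FOO.server.xx/FOO; done
```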

Thanks for trying so hard to solve this. Much appreciated.
I'm looking into this and will get back to you asap with details.
Our IT team thinks this might all have something to do with the load balancer (LB). At present they are trying to set up a version of the server that bypasses the LB, and if this solves the problem then I think we'll have nailed the cause, if not the actual mechanism.
Watch this space.
Thanks again
C.

Further info.

We're setting up another copy of Aggregate that queries the same underlying database as the main Aggregate instance. This one is only reachable from our local network, so it can be queried directly over the LAN without the firewall and load balancer in the way. It should be running within the next 48 hours, and I'll post an update once it's tested.
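Once the LAN-only instance is up, one way to test the load balancer theory is to pull the same form through both URLs into two separate storage directories and list the submissions that only one of them received. The sketch below covers only the comparison step; it assumes Briefcase keeps each submission in its own folder under `ODK Briefcase Storage/forms/<form name>/instances/`, and the helper names and directories are hypothetical.

```bash
#!/usr/bin/env bash
# List submissions present in one Briefcase storage copy but missing from another.
# ASSUMPTION: each submission is a folder under <storage>/ODK Briefcase Storage/forms/<form>/instances/
list_instance_ids() {
  local dir="$1/ODK Briefcase Storage/forms/$2/instances"
  [ -d "$dir" ] || return 0
  # Byte-order sort so comm sees both lists in the same order
  find "$dir" -mindepth 1 -maxdepth 1 -type d -exec basename {} \; | LC_ALL=C sort
}

# Print instance folder names that are in storage A but not in storage B
missing_from_b() {
  local storage_a="$1" storage_b="$2" form_name="$3"
  LC_ALL=C comm -23 <(list_instance_ids "$storage_a" "$form_name") \
                    <(list_instance_ids "$storage_b" "$form_name")
}

# Usage (directories are hypothetical, one per Aggregate URL):
#   missing_from_b "/Users/$USER/FOO_LAN" "/Users/$USER/FOO_LB" "My Form"
```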