Tips for pulling (relatively) large projects using Briefcase?

Hi all --

I've just started working with a program which deployed a registration form
which required a photo of every person in the household. They went forth
and collected registration data from over 60,000 households, and now (given
poor internet) are unable to access their data. General best practice tips
aside (such as regularly pulling and reviewing instances) - are there any
solutions which would all a pull of all this data without requiring a
continuous connection to the internet?

I'm on a solid connection (in the US) but only able to pull about 15-25
forms a minute (using Briefcase). At this rate, I'd need about 50 hours to
download the entire project. While this is doable, I'm wondering if it's
possible to pause a 'pull' and restart it (such as at the field level
when internet is not stable).

Thanks for any tips.

~Lloyd

apologies for the typo:

...solutions which would allow a pull of all this data without
requiring...

....
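
The 50-hour figure in the question can be checked with a quick calculation. This is only a sketch: the 60,000 total and the 15-25 forms per minute range come from the post, while the function name `hours_to_pull` and the midpoint rate of 20 are made up for illustration.

```python
def hours_to_pull(total_forms: int, forms_per_minute: float) -> float:
    """Hours needed to pull total_forms at a steady forms_per_minute rate."""
    return total_forms / forms_per_minute / 60


total = 60_000  # "over 60,000 households" from the post
for rate in (15, 20, 25):  # observed range plus its midpoint (assumed)
    print(f"{rate} forms/min -> {hours_to_pull(total, rate):.1f} hours")
```

At the midpoint rate this gives 50 hours, so the estimate holds; at the slow end of the range it is closer to 67 hours.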


Hi Lloyd,

Is this on a Google App Engine instance, or is it self-hosted?
If you're self-hosted (and thus have direct database access), it's possible
to write scripts that read the images from the database, reduce their size,
and save them for download through some other direct means (not through
Briefcase). It takes some technical and database knowledge, though.

Regards,
Andrew


Thanks Andrew. I should have noted this is hosted on Google AppEngine.

~lb


Briefcase is certainly restartable.

For larger datasets, there is a performance gain by, after Pulling, issuing
a Push to the same server (which is a no-op on the server, but updates some
state within ODK Briefcase to make the next Pull a bit more efficient).

--
--
Post: opendatakit@googlegroups.com
Unsubscribe: opendatakit+unsubscribe@googlegroups.com
Options: http://groups.google.com/group/opendatakit?hl=en


You received this message because you are subscribed to the Google Groups
"ODK Community" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to opendatakit+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
Mitch Sundt
Software Engineer
University of Washington
mitchellsundt@gmail.com
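
To illustrate what "restartable" means in general, here is a minimal sketch of a pull loop that skips work finished in an earlier run and retries dropped connections. It is not Briefcase's implementation: the `pull_all` function, the `fetch` callback and the `.complete` marker file are all hypothetical names chosen for this example.

```python
import time
from pathlib import Path
from typing import Callable, Iterable, List

DONE_MARKER = ".complete"  # hypothetical marker file, not a Briefcase convention


def pull_all(
    instance_ids: Iterable[str],
    fetch: Callable[[str, Path], None],
    dest: Path,
    retries: int = 3,
    wait_seconds: float = 0.0,
) -> List[str]:
    """Fetch every instance not already marked complete.

    Returns the IDs that still failed after all retries.
    """
    failed = []
    for instance_id in instance_ids:
        instance_dir = dest / instance_id
        if (instance_dir / DONE_MARKER).exists():
            continue  # finished in an earlier run, nothing to do
        instance_dir.mkdir(parents=True, exist_ok=True)
        for attempt in range(1, retries + 1):
            try:
                fetch(instance_id, instance_dir)
                # Written only after a successful fetch, so an interrupted
                # download is retried on the next run.
                (instance_dir / DONE_MARKER).touch()
                break
            except OSError:  # includes ConnectionError and timeouts
                time.sleep(wait_seconds * attempt)  # simple linear back-off
        else:
            failed.append(instance_id)
    return failed
```

Because the marker is written last, stopping the script at any point and running it again only repeats the instance that was in flight.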

Thanks Mitch. The hurdle I kept running into was that each time I restarted,
I waited hours for Briefcase to resolve already-downloaded instances.

I am still facing a 'FAILED' message (see attached screenshot). I
mistakenly stated earlier in this thread that I was pulling from Google App
Engine (probably because 95% of the time I am), but in this case I'm
pulling from a survey deployed on Kobo (I'm cross-posting this on the Kobo
forum as well). Based on the Kobo dashboard, I know the deployment has only
62,920 instances. Briefcase's status window (before the 'FAILED' notice)
reports that it has fetched 62,920 instances.

Despite the 'FAILED' notice, I have tried to export, but I get a
second error message caused by an expected .jpg that isn't found,
which lets me export fewer than 200 cases.

Any insights, tips, or thoughts on obtaining this data would be very welcome.
[side note of relevance: I'm unable to download the data directly from
Kobo; I get a perpetual 'pending...click to refresh' message when
attempting to download any format of the data].

~Lloyd

image


That's a question for Kobo.

ODK Aggregate will not make a submission available for download until it
has all of its image attachments or until you edit the submission and
destroy the links to the missing attachments (via the Forms Management /
Submission Admin sub-tab).

Sounds like Kobo does not have that protective step. ODK Briefcase expects
all referenced attachments to be available.

Also sounds like Kobo might not return the responses that ODK Briefcase
expects when there are no more submissions to be downloaded.

--
Mitch Sundt
Software Engineer
University of Washington
mitchellsundt@gmail.com
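
Since Briefcase expects every referenced attachment to be present, one way to find the submissions that block an export is to scan the pulled instances for photo filenames that have no matching file. The sketch below makes two assumptions that should be checked against the local setup: that each instance sits in its own folder containing a `submission.xml` with its attachments alongside, and that any element whose text ends in an image extension is an attachment reference. The function name `missing_attachments` is made up for this example.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, List

MEDIA_SUFFIXES = (".jpg", ".jpeg", ".png")  # assumption: photos only


def missing_attachments(instances_dir: Path) -> Dict[str, List[str]]:
    """Map each instance folder name to the media files it references but lacks."""
    problems = {}
    for xml_path in sorted(instances_dir.glob("*/submission.xml")):
        root = ET.parse(xml_path).getroot()
        # Heuristic: treat element text ending in an image suffix as a filename.
        referenced = {
            el.text.strip()
            for el in root.iter()
            if el.text and el.text.strip().lower().endswith(MEDIA_SUFFIXES)
        }
        missing = sorted(
            name for name in referenced if not (xml_path.parent / name).is_file()
        )
        if missing:
            problems[xml_path.parent.name] = missing
    return problems
```

Calling `missing_attachments(Path(".../instances"))` on the form's instances folder returns a dictionary of the affected instances, which can then be re-pulled or raised with the server host.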

Hi Lloyd,
As I wrote on the KoBo thread
https://groups.google.com/forum/#!topic/kobo-users/iArUmvnOGDw,
downloading all instances with Briefcase is not a problem and works just
like in Aggregate (attachments and all). It's taking a long time simply
because of the huge number of attachments, and the download is slowed
further by AWS serving as the file storage. I just tried resuming a regular
project with 10k submissions but no attachments, and it only took a few
seconds to confirm that all instances had already been downloaded.

In short, all good on the Briefcase front.

(Thanks, Mitch!)

Best,
Tino
