ODK Build 'looped' fields not exporting properly from Aggregate

I have built several forms in ODK Build where I have a group with the
'looped' option selected (XML below).
The form renders fine in ODK Collect (i.e., I can 'Add New Group', etc.) and
uploads fine to Aggregate.

However, when I export the results to CSV (i.e., go to Form List, hit Export,
choose CSV), the fields in the looped data are displayed as what I think
is a URL. Hopefully this renders correctly:

intro_group:date      Tue Mar 06 00:00:00 UTC 2012
intro_group:id        Test
intro_group:initials  Cd
R1_readonly           null
R1_group              https://kemrishamba.appspot.com/view/formMultipleValue?formId=build_R-ARV-PILL-COUNT_1330909646[%40version%3Dnull+and+%40uiVersion%3Dnull]%2Fdata[%40key%3Duuid%3A3b0461b3-ad60-448b-85ed-1fb41fec5e06]%2FR1_group

This is an issue for every form that I have built with looped content.
Is there some kind of issue with Aggregate whereby it does not handle
repeat data? Or did I do something wrong in the form building?
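
For anyone puzzling over the exported value: it is an ordinary percent-encoded link, and decoding its query string shows that it points at the R1_group repeat of one specific submission. The sketch below uses only Python's standard library; the helper name decode_form_id is made up for illustration and is not part of any ODK tool.

```python
from urllib.parse import urlsplit, parse_qs

# The link exactly as it appears in the exported CSV cell.
url = ("https://kemrishamba.appspot.com/view/formMultipleValue"
       "?formId=build_R-ARV-PILL-COUNT_1330909646"
       "[%40version%3Dnull+and+%40uiVersion%3Dnull]"
       "%2Fdata[%40key%3Duuid%3A3b0461b3-ad60-448b-85ed-1fb41fec5e06]"
       "%2FR1_group")


def decode_form_id(link):
    """Return the percent-decoded formId query parameter of an export link."""
    query = urlsplit(link).query
    return parse_qs(query)["formId"][0]


form_id = decode_form_id(url)
print(form_id)
```

The decoded value ends in /R1_group and carries the uuid key of the submission row, which suggests the cell is a pointer to the repeat data rather than a rendering fault.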

Thanks in advance
Colin

XML

<h:title>R_ARV_PILL_COUNT</h:title>
<model>
  <instance>
    <data id="build_R-ARV-PILL-COUNT_1330909646">
      <intro_group>
        <date/>
        <id/>
        <initials/>
      </intro_group>
      <R1_readonly/>
      <R1_group jr:template="">
        <R1_1_group>
          <R1/>
          <R1_1/>
        </R1_1_group>
        <R1_2/>
        <R1_3/>
        <R1_4/>
        <R1_5/>
        <R1_6/>
      </R1_group>
    </data>
  </instance>

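...

  <bind nodeset="/data/intro_group/id" type="string" required="true()" constraint="(regex(., &quot;^.{4,4}$&quot;))"/>
  <bind nodeset="/data/intro_group/initials" type="string" required="true()"/>
  <bind nodeset="/data/R1_readonly" type="string" readonly="true()"/>
  <bind nodeset="/data/R1_group/R1_1_group/R1" type="select1" required="true()"/>
  <bind nodeset="/data/R1_group/R1_1_group/R1_1" type="string"/>
  <bind nodeset="/data/R1_group/R1_2" type="int" required="true()"/>
  <bind nodeset="/data/R1_group/R1_3" type="int" required="true()"/>
  <bind nodeset="/data/R1_group/R1_4" type="date"/>
  <bind nodeset="/data/R1_group/R1_5" type="int" required="true()"/>
  <bind nodeset="/data/R1_group/R1_6" type="int" required="true()"/>

...

      <label ref="jr:itext('/data/R1_group/R1_1_group:label')"/>
      <select1 ref="/data/R1_group/R1_1_group/R1">
        <label ref="jr:itext('/data/R1_group/R1_1_group/R1:label')"/>
        <hint ref="jr:itext('/data/R1_group/R1_1_group/R1:hint')"/>
        <item>
          <label ref="jr:itext('/data/R1_group/R1_1_group/R1:option0')"/>
          <value>1</value>
        </item>

...

--
Colin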

Use ODK Briefcase to download the data from ODK Aggregate and generate a
csv file for the top-level group and separate csv files for each repeat
group underneath.

ODK Aggregate's Export to CSV works as designed.
The URL will display the repeat group data for this row.

We punted on generating multiple CSV files (as ODK Briefcase does)
because Google App Engine does not allow writing data to a temp directory,
which makes constructing the zip we would otherwise want to produce very
difficult. In the future, we might implement a "flattened" CSV file
with all the data, but that has its own design and technical issues.
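
To make the "one CSV for the top level, one per repeat group" layout concrete, here is a minimal sketch of joining the two files back together. It assumes the top-level file has a KEY column and the repeat file a PARENT_KEY column that refers to it; those column names, and the R1_2 / R1_3 values, are assumptions for illustration, so check them against the headers in your own export.

```python
import csv
import io

# Illustrative stand-ins for the two exported files. The KEY / PARENT_KEY
# column names and the R1_2 / R1_3 values are assumptions, not real data.
parent_csv = """intro_group:id,intro_group:initials,KEY
Test,Cd,uuid:3b0461b3-ad60-448b-85ed-1fb41fec5e06
"""
repeat_csv = """R1_2,R1_3,PARENT_KEY
10,2,uuid:3b0461b3-ad60-448b-85ed-1fb41fec5e06
30,5,uuid:3b0461b3-ad60-448b-85ed-1fb41fec5e06
"""


def join_repeat_to_parent(parent_text, repeat_text):
    """Return one merged dict per repeat row, with its parent's columns added."""
    parents = {row["KEY"]: row for row in csv.DictReader(io.StringIO(parent_text))}
    merged = []
    for child in csv.DictReader(io.StringIO(repeat_text)):
        # A repeat row whose parent is missing keeps only its own columns.
        parent = parents.get(child["PARENT_KEY"], {})
        merged.append({**parent, **child})
    return merged


rows = join_repeat_to_parent(parent_csv, repeat_csv)
for row in rows:
    print(row["intro_group:id"], row["R1_2"], row["R1_3"])
```

This produces a "long" table with one line per repeat instance and the top-level answers copied onto each line.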

Mitch

--
Mitch Sundt
Software Engineer
University of Washington
mitchellsundt@gmail.com

Hi Mitch,

On Tue, 6 Mar 2012, Mitch S wrote:

> We punted generating multiple csv files (such as is done by ODK
> Briefcase) because Google AppEngine does not allow writing data to a
> temp directory, so the resulting zip we would otherwise want to
> construct becomes very difficult to do. In the future, we might
> implement a "flattened" csv file with all the data, but that has its own
> design and technical issues.

Isn't it possible to stream out a ZIP file with each CSV file in turn,
without writing anything to disk, using JarOutputStream and ZipEntry?

http://rita.logscluster.org/browser/rita/src/org/wfp/rita/web/controller/ExecutableJarServlet.java
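
The idea can be sketched in a few lines. This is a Python analogue using the standard zipfile module, not the Java JarOutputStream/ZipEntry code from the link above, and the file names and contents are hypothetical. Note that each file's complete content has to be available at the moment its entry is written.

```python
import io
import zipfile


def write_csvs_as_zip(out, csv_files):
    """Write each (name, text) pair as its own entry of a ZIP archive on `out`."""
    with zipfile.ZipFile(out, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, text in csv_files:
            archive.writestr(name, text.encode("utf-8"))


# Hypothetical file names and contents, standing in for real export data.
files = [
    ("R_ARV_PILL_COUNT.csv", "intro_group:id,intro_group:initials\nTest,Cd\n"),
    ("R_ARV_PILL_COUNT_R1_group.csv", "R1_2,R1_3\n10,2\n"),
]

response_body = io.BytesIO()  # stands in for a servlet response stream
write_csvs_as_zip(response_body, files)

# Re-open the bytes to confirm both entries made it into the archive.
with zipfile.ZipFile(io.BytesIO(response_body.getvalue())) as check:
    print(check.namelist())
```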

Cheers, Chris.

Aptivate | http://www.aptivate.org | Phone: +44 1223 760887
The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES

Aptivate is a not-for-profit company registered in England and Wales
with company number 04980791.

Hi Mitch,
I tried using ODK Briefcase to download data from ODK Aggregate, but I did not get any CSV files at all. These are the steps I took:

  • Installed ODK Briefcase and created the ODK Briefcase storage folder.
  • Pulled 1195 instances of a form from the ODK Aggregate server.
  • Chose the export directory and clicked the Export button.

The directory I created to hold the exported data contains only an empty media folder and an empty file with no extension (0 KB).
Have I missed a step? Please assist.

Yes. However, the Jar writing libraries assume you are writing one file at
a time.
To generate the data, ODK Aggregate would traverse the dataset once,
appending data to multiple file streams (the top-level CSV and a CSV for
each repeat group nested underneath). Then it would need to iterate through
those, appending them onto the JarOutputStream.

The crux of the problem is that ODK Aggregate lacks a facility equivalent
to writing files to a temp directory. The alternative, which we currently
do, is to hold all the data in memory, but that limits the size of the
dataset that can be exported (I estimate somewhere below 6000 records).
Adding repeat group jar support would reduce that size by at least 50%
because of the need to maintain a double copy of the data (once as CSV(s),
once in the constructed JAR) before writing it as complete blobs to the
datastore.
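
The memory cost described above can be seen in a small sketch: one pass over the data fills a separate in-memory buffer per CSV, and only afterwards are the buffers copied into the archive, so both copies exist at once. This is Python rather than Aggregate's Java, and the records, field subset, and file names are hypothetical.

```python
import csv
import io
import zipfile

# Hypothetical submissions: top-level answers plus nested repeat rows.
submissions = [
    {"id": "Test", "initials": "Cd", "R1_group": [{"R1_2": 10}, {"R1_2": 30}]},
    {"id": "Tst2", "initials": "Ab", "R1_group": [{"R1_2": 7}]},
]


def export_in_memory(records):
    """Traverse records once, filling one CSV buffer per file, then zip them."""
    top, repeat = io.StringIO(), io.StringIO()
    top_writer, repeat_writer = csv.writer(top), csv.writer(repeat)
    top_writer.writerow(["id", "initials"])
    repeat_writer.writerow(["parent_id", "R1_2"])
    for rec in records:  # the single pass appends to both streams
        top_writer.writerow([rec["id"], rec["initials"]])
        for item in rec["R1_group"]:
            repeat_writer.writerow([rec["id"], item["R1_2"]])

    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_STORED) as zf:
        # Both CSV buffers are still alive while the archive is built,
        # so the data is held twice at this point.
        zf.writestr("top.csv", top.getvalue())
        zf.writestr("R1_group.csv", repeat.getvalue())
    csv_bytes = len(top.getvalue().encode()) + len(repeat.getvalue().encode())
    return archive.getvalue(), csv_bytes


zip_bytes, csv_size = export_in_memory(submissions)
print(csv_size, len(zip_bytes))
```

With uncompressed (stored) entries the archive is slightly larger than the CSV text it wraps, so peak memory is roughly twice the size of the data.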

If we had a stream implementation that sent the data into the datastore
(making it act like a temp directory), we could do the repeat groups and
JAR construction, but that is a chunk of work.
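
As a rough illustration of what such a stream might look like, here is a write-only file object that spills fixed-size chunks into a key/value store instead of keeping everything in memory. It is only a sketch in Python: a plain dict stands in for the datastore, and the class name, chunk size, and key scheme are all hypothetical rather than anything that exists in ODK Aggregate.

```python
import io


class DatastoreStream(io.RawIOBase):
    """Write-only stream that flushes fixed-size chunks into a key/value store.

    `store` is a plain dict standing in for a datastore; the chunk size and
    the (name, index) key scheme are hypothetical.
    """

    def __init__(self, store, name, chunk_size=8):
        super().__init__()
        self._store, self._name, self._chunk_size = store, name, chunk_size
        self._pending = bytearray()
        self._count = 0

    def writable(self):
        return True

    def write(self, data):
        self._pending.extend(data)
        # Persist every full chunk so only a partial chunk stays in memory.
        while len(self._pending) >= self._chunk_size:
            self._flush_chunk(self._chunk_size)
        return len(data)

    def _flush_chunk(self, size):
        self._store[(self._name, self._count)] = bytes(self._pending[:size])
        del self._pending[:size]
        self._count += 1

    def close(self):
        # Write out the final partial chunk exactly once.
        if not self.closed and self._pending:
            self._flush_chunk(len(self._pending))
        super().close()


store = {}
stream = DatastoreStream(store, "R1_group.csv")
stream.write(b"R1_2,R1_3\n")
stream.write(b"10,2\n")
stream.close()

# Reading the chunks back in key order reconstructs the file.
data = b"".join(store[key] for key in sorted(store))
print(sorted(store), data)
```

One such stream per CSV would let the single traversal append to every file without holding more than one chunk of each in memory.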

Mitch

To follow up with a bit of context: Google App Engine (GAE) is
restrictive about what you can do. When the project started 2 years ago
it was extremely restrictive, and generating separate CSV files was not
possible. Over the last 2 years Google has continued to add features
that solve some of these issues. Separate CSV files are now feasible,
but would require a bunch of coding to fit into the GAE model. The
question then becomes: will GAE release something in 3 months that
makes 1 month of development spent working around the nuances of
generating multiple CSVs at once obsolete? The unfortunate answer is
probably.

The ODK team has limited resources to add features, and this has been
deemed lower priority until more Google App Engine features are added,
especially since ODK Briefcase already gives you the ability to get
what you want. Does it require an extra step? Unfortunately yes, but
our impression from the community as a whole is that they would rather
have more features added to ODK this month and wait for Google App
Engine improvements than have us spend a bunch of time on code that
might become useless, when there are known workarounds.

If this is not the case and the community deems it high enough
priority, we can pull development off other features. However, people
have seemed to indicate they want encryption, multi-selects that can
be queried from a database, etc. before a feature we already
have a workaround for.

However this is an open source project so feel free to submit a patch
to fix the problem as we would love to have this solved! Alternatively
you could convince the rest of the community this issue is more
important than other issues so the limited ODK developer resources
will be shifted to fixing this.

Personally, I am not against fixing the problem, but given the amount of
resources a fix would take compared to the other features that could be
added (especially since there is an alternative solution), it has been
deemed lower priority than other requested features.

Cheers,
Waylon

We are working on a fairly complex project (20+ forms, between 1 and
15 pages each), which makes it pretty unwieldy to begin with -
additional csvs for each loop just makes it more so. However, after
playing with Briefcase for a while (and generally cutting down on the
loops), I am comfortable saying that the solutions proposed here are
workable. I'm just glad I'm not doing the data analysis :) I would
be interested in hearing how other users have approached
wrangling the various CSVs into a more usable form (analysis with
SPSS, in this case). Are there responses/threads about this?
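
One possible approach, sketched here for illustration only, is to spread each submission's repeat rows into numbered columns so that a statistics package sees one row per submission. The KEY / PARENT_KEY column names and the sample values are assumptions, and flatten_wide is a made-up helper rather than part of any ODK tool.

```python
import csv
import io


def flatten_wide(parents, children, repeat_fields):
    """Return (header, rows) with each parent's repeat rows in numbered columns."""
    grouped = {}
    for child in children:
        grouped.setdefault(child["PARENT_KEY"], []).append(child)
    width = max((len(group) for group in grouped.values()), default=0)

    parent_fields = [name for name in parents[0] if name != "KEY"]
    header = parent_fields + [
        f"{field}_{i}" for i in range(1, width + 1) for field in repeat_fields
    ]
    rows = []
    for parent in parents:
        row = [parent[name] for name in parent_fields]
        for child in grouped.get(parent["KEY"], []):
            row.extend(child[field] for field in repeat_fields)
        row.extend([""] * (len(header) - len(row)))  # pad shorter submissions
        rows.append(row)
    return header, rows


# Hypothetical sample data: two submissions with two and one repeat rows.
parents = [{"id": "Test", "KEY": "uuid:a"}, {"id": "Tst2", "KEY": "uuid:b"}]
children = [
    {"PARENT_KEY": "uuid:a", "R1_2": "10", "R1_3": "2"},
    {"PARENT_KEY": "uuid:a", "R1_2": "30", "R1_3": "5"},
    {"PARENT_KEY": "uuid:b", "R1_2": "7", "R1_3": "1"},
]

header, rows = flatten_wide(parents, children, ["R1_2", "R1_3"])
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(header)
writer.writerows(rows)
print(out.getvalue())
```

The trade-off is that the column count depends on the largest number of repeats in the data, which is part of why a single "flattened" export is awkward to design.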

This was our first time using ODK, and we have been generally very
happy with most of the coding choices made, and have been very
impressed with the community, level of support, etc. We look forward
to using the ODK suite again in the future.

Thanks
Colin

Can you confirm that you are seeing the downloaded instances in your "ODK
briefcase storage" directory?
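
If it helps, a quick way to check is to count the submission files under the storage folder. The sketch below assumes that each pulled instance is saved as a file named submission.xml in its own sub-folder; that layout, and the folder names used in the demo, are assumptions, so adjust the file name if your storage directory looks different.

```python
import tempfile
from pathlib import Path


def count_submissions(storage_dir):
    """Count submission.xml files anywhere below the given storage directory."""
    return sum(1 for _ in Path(storage_dir).rglob("submission.xml"))


# Self-contained demo on a throwaway folder with a hypothetical layout.
with tempfile.TemporaryDirectory() as tmp:
    for n in range(3):
        instance = Path(tmp) / "forms" / "R_ARV_PILL_COUNT" / "instances" / f"uuid{n}"
        instance.mkdir(parents=True)
        (instance / "submission.xml").write_text("<data/>")
    found = count_submissions(tmp)

print(found)
```

Pointing count_submissions at the real storage folder should report a number close to the 1195 instances that were pulled.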

☞§※☼:airplane::open_umbrella::slight_smile:
~Neil

··· On Thu, Jun 27, 2013 at 5:36 AM, wrote:

Hi Mitch
I tried using ODK Briefcase to download data from ODK Aggregate but i did
not get any CSV files at all. These are the steps i took;

  • installed ODK briefcase and created the ODK Briefcase storage folder.
  • I pulled 1195 instances of a form from the ODK aggregate server.
  • i chose the export directory and clicked the export button
  • But the directory which i created to hold the exported data, has an
    empty media folder and an empty file with no extension and 0kb file
    size.
    Have i missed a step? Please assist.

On Wednesday, March 7, 2012 3:25:32 AM UTC+3, Mitch wrote:

Use ODK Briefcase to download the data from ODK Aggregate and generate a
csv file for the top-level group and separate csv files for each repeat
group underneath.

ODK Aggregate's Export to CSV works as designed.

The URL will display the repeat group data for this row.

We punted generating multiple csv files (such as is done by ODK
Briefcase) because Google AppEngine does not allow writing data to a temp
directory, so the resulting zip we would otherwise want to construct
becomes very difficult to do. In the future, we might implement a
"flattened" csv file with all the data, but that has its own design and
technical issues.

Mitch

On Tue, Mar 6, 2012 at 3:00 PM, Colin McCann colind...@gmail.com wrote:

I have built several forms in ODK Build where I have a group with the
'looped' option selected - xml below.
Form is rendered fine on ODK Collect (ie, I can 'Add New Group' etc) and
uploads fine to Aggregate.

However, when I export the results to csv (ie go to Form List, hit
Export, choose csv), the fields in the looped data are displayed as (what I
think is) a URL... hopefully this renders correctly

                  intro_group:date
                  intro_group:id
                  intro_group:initials
                  R1_readonly
                  R1_group


                  Tue Mar 06 00:00:00 UTC 2012
                  Test
                  Cd
                  null

https://kemrishamba.appspot.com/view/formMultipleValue?formId=build_R-ARV-PILL-COUNT_1330909646[%40version%3Dnull+and+%40uiVersion%3Dnull]%2Fdata[%40key%3Duuid%3A3b0461b3-ad60-448b-85ed-1fb41fec5e06]%2FR1_group

This is an issue for every form that I have built with looped content.
Is there some kind of issue with Aggregate whereby it does not handle
repeat data? Or did I do something wrong in the form building?

Thanks in advance
Colin

XML

<h:title>R_ARV_PILL_COUNT</h:title>
<model>
  <instance>
    <data id="build_R-ARV-PILL-COUNT_1330909646">
      <intro_group>
        <date/>
        <id/>
        <initials/>
      </intro_group>
      <R1_readonly/>
      <R1_group jr:template="">
        <R1_1_group>
          <R1/>
          <R1_1/>
        </R1_1_group>
        <R1_2/>
        <R1_3/>
        <R1_4/>
        <R1_5/>
        <R1_6/>
      </R1_group>
    </data>
  </instance>

...

  <bind nodeset="/data/intro_group/id" type="string" required="true()" constraint="(regex(., &quot;^.{4,4}$&quot;))"/>
  <bind nodeset="/data/intro_group/initials" type="string" required="true()"/>
  <bind nodeset="/data/R1_readonly" type="string" readonly="true()"/>
  <bind nodeset="/data/R1_group/R1_1_group/R1" type="select1" required="true()"/>
  <bind nodeset="/data/R1_group/R1_1_group/R1_1" type="string"/>
  <bind nodeset="/data/R1_group/R1_2" type="int" required="true()"/>
  <bind nodeset="/data/R1_group/R1_3" type="int" required="true()"/>
  <bind nodeset="/data/R1_group/R1_4" type="date"/>
  <bind nodeset="/data/R1_group/R1_5" type="int" required="true()"/>
  <bind nodeset="/data/R1_group/R1_6" type="int" required="true()"/>

...

      <label ref="jr:itext('/data/R1_group/R1_1_group:label')"/>
      <select1 ref="/data/R1_group/R1_1_group/R1">
        <label ref="jr:itext('/data/R1_group/R1_1_group/R1:label')"/>
        <hint ref="jr:itext('/data/R1_group/R1_1_group/R1:hint')"/>
        <item>
          <label ref="jr:itext('/data/R1_group/R1_1_group/R1:option0')"/>
          <value>1</value>
        </item>

...

--
Colin

--
Mitch Sundt
Software Engineer
University of Washington
mitche...@gmail.com


I mentioned this in another thread and have put in a feature request for
the issue tracker, but I suggest that the data be exportable in JSON format
as well as CSV. CSV is fine for something that is basically just a bunch
of questions, but isn't very useful for anything that loops, especially if
it is a cross reference of CSV files.
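
To illustrate the point, here is a sketch of how one submission of the form from this thread might look as JSON, with the looped group as a nested list. The structure is only a guess at what such an export could produce, and the repeat values are made up.

```python
import json

# Field names come from the form in this thread; the repeat values are
# hypothetical and the overall shape is an assumption, not an ODK format.
submission = {
    "intro_group": {
        "date": "Tue Mar 06 00:00:00 UTC 2012",
        "id": "Test",
        "initials": "Cd",
    },
    "R1_readonly": None,
    "R1_group": [
        {"R1_1_group": {"R1": "1", "R1_1": ""}, "R1_2": 10, "R1_3": 2},
        {"R1_1_group": {"R1": "1", "R1_1": ""}, "R1_2": 30, "R1_3": 5},
    ],
}

text = json.dumps(submission, indent=2)
print(text)

# Parsing the text gives back the same nested structure, repeats included.
restored = json.loads(text)
```

The repeats travel with their parent record in one document, so no cross-referencing between files is needed.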

I suggested JSON since it is used quite a bit these days and there are
parsers for the format in almost every language out there. But it is
probably not a supported format for older software packages like SPSS or
Matlab. Actually, having never used something like SPSS, what formats do
work for it?

Sim

I've always imported as CSV, but I know SPSS also takes a bunch of the
MS Office products (Excel and Access for sure, maybe Word?).

Also, Sim, I think JSON export is a great idea, given current levels
of uptake (at least in the areas I'm working in). I'm not sure what
parsers exist that would meet requirements, but I find it way easier
to work with than CSV.

··· On Apr 9, 3:25 pm, Sim Harbert wrote: > I mentioned this in another thread and have put in a feature request for > the issue tracker, but I suggest that the data be exportable in JSON format > as well as CSV. CSV is fine for something that is basically just a bunch > of questions, but isn't very useful for anything that loops, especially if > it is a cross reference of CSV files. > > I suggested JSON since it is used quite a bit these days and there are > parsers for the format in almost every language out there. But it is > probably not a supported format for older software packages like SPSS or > Matlab. Actually, having never used something like SPSS, what formats do > work for it? > > Sim > > > > > > > > On Mon, Apr 9, 2012 at 2:17 PM, Colin wrote: > > We are working on a fairly complex project (20+ forms, between 1 and > > 15 pages each), which makes it pretty unwieldy to begin with - > > additional csvs for each loop just makes it more so. However, after > > playing with Briefcase for a while (and generally cutting down on the > > loops), I am comfortable saying that the solutions proposed here are > > workable. I'm just glad I'm not doing the data analysis :) I would > > be interested in hearing from how other users have approached > > wrangling the various csvs into a more usable form (analysis with > > SPSS, in this case). Are there responses/threads about this? > > > This was our first time using ODK, and we have been generally very > > happy with most of the coding choices made, and have been very > > impressed with the community, level of support, etc. We look forward > > to using the ODK suite again in the future. > > > Thanks > > Colin > > > On Mar 7, 5:42 pm, "W. Brunette" wrote: > > > To follow up a bit to provide context, Google App Engine is > > > restrictive on what you can do. When the project started 2 years ago > > > it was extremely restrictive and it was not possible. 
> > > Over the course of the last 2 years they continue to add features
> > > that solve some issues. Separate CSV files while now feasible
> > > (previously wasn't even feasible) would require a bunch of coding to
> > > fit into Google App Engine(GAE) model. The question then just becomes
> > > will GAE just release something in 3 months that makes 1 month of
> > > development working around the nuances to generate multiple CSV at
> > > once obsolete, and the unfortunate answer is probably.
> > >
> > > The ODK team has limited resources to add features and this has been
> > > deemed lower priority until more Google App Engine features are added,
> > > especially since ODK Briefcase already gives you the ability to have
> > > what you want. Does it require an extra step? Unfortunately yes, but
> > > our impression from the community as a whole is they would rather have
> > > more features added to ODK this month and wait for Google App Engine
> > > improvement than to spend a bunch of time on code that might become
> > > useless when they are known work arounds.
> > >
> > > If this is not the case and the community deems it high enough
> > > priority we can pull development off other features. However, people
> > > have seemed to indicate they want encryption, multi-selects that can
> > > be queried from a database, etc before adding a feature we already
> > > have a work around for.
> > >
> > > However this is an open source project so feel free to submit a patch
> > > to fix the problem as we would love to have this solved! Alternatively
> > > you could convince the rest of the community this issue is more
> > > important than other issues so the limited ODK developer resources
> > > will be shifted to fixing this.
> > >
> > > Personally, I am not against fixing the problem, but the amount of
> > > resources it will take to fix compared to what other features can be
> > > added (especially since there is an alternative solution), it has been
> > > deemed to be lower priority than other requested features.
> > >
> > > Cheers,
> > > Waylon
> > >
> > > On Wed, Mar 7, 2012 at 10:51 AM, Mitch S wrote:
> > > > Yes. However, the Jar writing libraries assume you are writing one
> > > > file at a time.
> > > > To generate the data, ODK Aggregate would traverse the dataset once,
> > > > appending data to multiple file streams (the top-level CSV and a CSV
> > > > for each repeat group nested underneath). Then it would need to
> > > > iterate through those, appending them onto the JarOutputStream.
> > > >
> > > > The crux of the problem is that ODK Aggregate lacks a facility
> > > > equivalent to writing files to a temp directory. The alternative,
> > > > which we currently do, is to hold all the data in memory, but that
> > > > limits the size of the dataset that can be exported (I estimate
> > > > somewhere below 6000 records). Adding repeat group jar support would
> > > > reduce that size by at least 50% because of the need to maintain a
> > > > double copy of the data (once as CSV(s), once in the constructed
> > > > JAR) before writing it as complete blobs to the datastore.
> > > >
> > > > If we had a stream implementation that sent the data into the
> > > > datastore (making it act like a temp directory), we could do the
> > > > repeat groups and JAR construction, but that is a chunk of work.
> > > >
> > > > Mitch
> > > >
> > > > On Wed, Mar 7, 2012 at 9:59 AM, Chris Wilson wrote:
> > > > > Hi Mitch,
> > > > >
> > > > > On Tue, 6 Mar 2012, Mitch S wrote:
> > > > >
> > > > > > We punted generating multiple csv files (such as is done by ODK
> > > > > > Briefcase) because Google AppEngine does not allow writing data
> > > > > > to a temp directory, so the resulting zip we would otherwise
> > > > > > want to construct becomes very difficult to do. In the future,
> > > > > > we might implement a "flattened" csv file with all the data, but
> > > > > > that has its own design and technical issues.
> > > > >
> > > > > Isn't it possible to stream out a ZIP file with each CSV file in
> > > > > turn, without writing anything to disk, using JarOutputStream and
> > > > > ZipEntry?
> > > > >
> > > > > http://rita.logscluster.org/browser/rita/src/org/wfp/rita/web/control...
> > > > >
> > > > > Cheers, Chris.
> > > > > --
> > > > > Aptivate | http://www.aptivate.org | Phone: +44 1223 760887
> > > > > The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES
> > > > >
> > > > > Aptivate is a not-for-profit company registered in England and Wales
> > > > > with company number 04980791.
> > > >
> > > > --
> > > > Mitch Sundt
> > > > Software Engineer
> > > > University of Washington
> > > > mitchellsu...@gmail.com
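
For readers following the quoted exchange between Chris and Mitch, here is a minimal sketch of the approach being discussed: one pass over the submissions appends rows to several in-memory CSV buffers (the top-level form plus one per repeat group), and the buffers are then written into a ZIP one entry at a time. This is not ODK Aggregate code. The class name, method names, file names, the `parent_key` column, and the rows are all hypothetical placeholders, and plain `ZipOutputStream` is used here in place of its subclass `JarOutputStream`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class RepeatGroupZipSketch {

    // Append text to an in-memory buffer (stands in for a temp file).
    static void append(ByteArrayOutputStream buf, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        buf.write(bytes, 0, bytes.length);
    }

    // Single pass over the data, appending to one buffer per output CSV.
    public static Map<String, ByteArrayOutputStream> buildCsvBuffers() {
        ByteArrayOutputStream top = new ByteArrayOutputStream();
        ByteArrayOutputStream repeat = new ByteArrayOutputStream();
        append(top, "key,intro_group:id\n");
        append(repeat, "parent_key,R1_2\n");

        // Placeholder submissions, not real data: {key, id, repeat values...}
        String[][] submissions = {
            {"uuid:a", "id-1", "x", "y"},
            {"uuid:b", "id-2", "z"},
        };
        for (String[] s : submissions) {
            append(top, s[0] + "," + s[1] + "\n");
            for (int i = 2; i < s.length; i++) {
                // Each repeat row carries its parent's key for later joining.
                append(repeat, s[0] + "," + s[i] + "\n");
            }
        }

        Map<String, ByteArrayOutputStream> buffers = new LinkedHashMap<>();
        buffers.put("top_level.csv", top);
        buffers.put("R1_group.csv", repeat);
        return buffers;
    }

    // ZIP entries are written strictly one after another, which is why the
    // CSVs must be complete (buffered somewhere) before this step starts.
    public static void writeZip(Map<String, ByteArrayOutputStream> buffers,
                                OutputStream out) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Map.Entry<String, ByteArrayOutputStream> e : buffers.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                e.getValue().writeTo(zip);
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream zipBytes = new ByteArrayOutputStream();
        writeZip(buildCsvBuffers(), zipBytes);
        System.out.println("ZIP size in bytes: " + zipBytes.size());
    }
}
```

In this sketch both the CSV buffers and the finished ZIP sit in memory at the same time, which corresponds to the double copy Mitch describes. Replacing the `ByteArrayOutputStream` buffers with a stream backed by the datastore would be the "chunk of work" he mentions.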