Aggregate Capacity

Hello Community,

What is the maximum number of records that "Aggregate" can handle? We want
to do a survey covering a population of about 2 million people and roughly
500,000 households.
We're thinking that we may need to use 8 forms per person;
that's 8 x 2 million = 16 million forms. Can ODK handle that?

Regards,
Makhate Makhate

See: https://opendatakit.org/use/aggregate/deployment-planning/
for some guidance.

With large datasets, the big question is:
Where and/or how are you going to analyze the results?

Once datasets get large, the visualization tools in ODK Aggregate will stop
working, the file-export functionality will stop working (both require
holding all the data in memory), and the submissions list will cease to be
operationally useful.

When that happens, the two options for working with the data are either
(1) publishing the data to an analysis server (e.g., Fusion Tables, Google
Spreadsheets, etc.) or (2) using ODK Briefcase to download the dataset to
your computer, generate the CSVs locally, and do the analysis on those
CSVs.
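
If you go the Briefcase route, the exported CSVs for a survey of this size
will be too large to open comfortably in a spreadsheet, but they can be
processed in chunks. Here is a minimal sketch in Python/pandas -- the file
name and the "district" column are placeholders for whatever your form
actually exports:

    # Sketch: summarize a large Briefcase CSV export without loading it all
    # into memory. "household_survey.csv" and the "district" column are
    # hypothetical; substitute your actual export file and columns.
    import pandas as pd

    counts = {}
    for chunk in pd.read_csv("household_survey.csv", chunksize=100000):
        # Tally submissions per district, chunk by chunk.
        for district, n in chunk["district"].value_counts().items():
            counts[district] = counts.get(district, 0) + n

    for district, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        print(district, n)

The same chunked approach works for any per-column summary; anything more
involved is usually easier once the data has been loaded into a proper
analysis database or statistics package.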

-----------

For smaller datasets, ODK Aggregate can be used as a datastore-of-record
for your survey efforts. I.e., it can hold all collected records. As
datasets get larger and as the analysis and visualization become too
complex for ODK Aggregate, it may be more appropriate to view ODK Aggregate
as a waypoint in the flow of these records into a more functional data
analysis server.

In that usage scenario, the "Purge Submissions" functionality found on the
Forms Management / Submission Admin sub-tab may be useful. You can remove
older submissions once they have successfully been moved onto the data
analysis server.


The very short answer to your question is that there is no limitation in
the software. It will continue to operate until:
(1) it runs out of memory, or
(2) it runs too slowly to complete a form submission or other interaction
within 60 seconds (at which point ODK Collect will be unable to submit data
into the system).

Avoiding visualizations and file-export actions and handling the data via
ODK Briefcase or via publishers places very little demand on memory (making
(1) not a concern); but loading a server with hundreds of forms does -- each
form definition is held in memory for performance reasons. The only remedy,
as you grow to hundreds of forms, is to move to a larger machine and a
larger JVM heap size.

Slow operations may begin to appear as the queries that filter the data
down to a specific set of rows take longer and longer to execute. There is
very little that can be done on AppEngine to speed that up. If you are
running MySQL or PostgreSQL, there are many things a good DBA can do, and
purging the older collected data will also address this issue.
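
As one illustration of the kind of tuning a DBA might apply on a
self-hosted PostgreSQL backend: adding an index on a column your filters
hit frequently. This is only a sketch -- Aggregate generates its table
layout from the form definition, so the connection settings, table name,
and column name below are hypothetical; inspect the actual schema first.

    # Sketch only: index a frequently-filtered column on a self-hosted
    # PostgreSQL backend. Connection settings, table name, and column name
    # are hypothetical placeholders.
    import psycopg2

    conn = psycopg2.connect(dbname="odk_prod", user="odk_db",
                            password="changeme", host="localhost")
    conn.autocommit = True  # CREATE INDEX CONCURRENTLY cannot run in a transaction
    with conn.cursor() as cur:
        cur.execute(
            'CREATE INDEX CONCURRENTLY idx_submission_date '
            'ON "HOUSEHOLD_SURVEY_CORE" ("_SUBMISSION_DATE")'
        )
    conn.close()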


--
Mitch Sundt
Software Engineer
University of Washington
mitchellsundt@gmail.com

Thank you very much for the response. It really helped.