Applying indexes to odk_prod tables

Ayub · March 25, 2015, 5:56am

Hi ODK Devs,

Recently we have collected large amount of data (text, images) and ran into
performance issues while some data exported as CSV file(s). Since indexes
on columns are used to optimize SELECT query, my question is that if
certain indexes are set on the database columns while survey is conducted
(data from ODK Collect) does it effect ODK Aggregate or Data Collection in
general in any way possible?

It would be really great if recommendations to enhance the performance of
SELECT queries on odk_prod (MySql) are shared as well especially images
retrieval (xyz_blb).

Thanks for all your support,

Ayub

Mitch_S · March 25, 2015, 3:17pm

With MySQL and PostgreSQL, you can define indices on the data tables and
those database systems will automatically utilize them when searching your
data tables.

The biggest issue with MySQL is that the data will often be stored across
many tables, so if you need to filter the query on a column in the 3rd
table in order to access the 1st table efficiently, you will not be able to
do that. ODK Aggregate pulls the 1st table first, then pulls each
subsequent table using a column that has an index (so the rest of the
access should be very fast).

If you are Exporting to CSV or KML from within ODK Aggregate, the greatest
performance hit will be in Java garbage collections. You will likely get
greater speed-up by substantially increasing your JVM size than by creating
indices on the database server.

···

On Tue, Mar 24, 2015 at 10:56 PM, Ayub wrote:

Hi ODK Devs,

Recently we have collected large amount of data (text, images) and ran
into performance issues while some data exported as CSV file(s). Since
indexes on columns are used to optimize SELECT query, my question is that
if certain indexes are set on the database columns while survey is
conducted (data from ODK Collect) does it effect ODK Aggregate or Data
Collection in general in any way possible?

It would be really great if recommendations to enhance the performance of
SELECT queries on odk_prod (MySql) are shared as well especially images
retrieval (xyz_blb).

Thanks for all your support,

Ayub

--
You received this message because you are subscribed to the Google Groups
"ODK Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to opendatakit-developers+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
Mitch Sundt
Software Engineer
University of Washington
mitchellsundt@gmail.com

Ayub · March 26, 2015, 6:38am

Mitch,

I am using php script that Joins multiple tables e.g. core, core1, core2
... (using _URI from core and _TOP_LEVEL_AURI in core1, core 2 ...). So
defining index on _TOP_LEVEL_AURI should speed things up and have no effect
on overall data collection process right?

Mitch_S · March 27, 2015, 8:59pm

Correct.

Note that you should declare this index as a HASH or UNORDERED index, as
that column contains UUIDs and creating an ordered index on those can
adversely affect performance as your dataset increases.

Also, you might consider declaring a VIEW instead of putting all the code
to do the joins into php. If you do declare a view, you will probably need
to increase your MySQL communications buffer size to accommodate the wider
data row widths.

···

On Wed, Mar 25, 2015 at 11:38 PM, Ayub wrote:

Mitch,

I am using php script that Joins multiple tables e.g. core, core1, core2
... (using _URI from core and _TOP_LEVEL_AURI in core1, core 2 ...). So
defining index on _TOP_LEVEL_AURI should speed things up and have no effect
on overall data collection process right?

--
You received this message because you are subscribed to the Google Groups
"ODK Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to opendatakit-developers+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
Mitch Sundt
Software Engineer
University of Washington
mitchellsundt@gmail.com

Ayub · April 1, 2015, 5:37am

Thanks Mitch, this significantly improved performance.

···

>