Queries filter locally and hence time out

Hi,

I'm looking for help to get my Aggregate queries to run.

I'm trying to execute a date range query that I've implemented.

It seems that QueryImpl.java has the fetch limit disabled, which causes my
queries to time out and fail since I have 20,000 form instances.

Setting "filterLocally" to false caused a DatastoreNeedIndexException,
which is baffling because it seems that GAE should autogenerate the
index.

There's a comment in QueryImpl.java that says something about multi-value
queries not being supported. Is that the reason?
http://code.google.com/p/opendatakit/source/browse/src/main/java/org/opendatakit/common/persistence/engine/gae/QueryImpl.java?repo=aggregate&name=uiexperiment

    /**
     * GAE 1.4.2 has changed the way it handles indices so that the actual
     * query construction (prepareQuery) no longer throws a
     * DatastoreNeedIndexException, but, rather, that exception is thrown
     * at the point where the cursor is accessed.
     *
     * For now, just skip all multi-value querying and do the
     * filtering and sorting locally against the dataset returned
     * by the first filter condition.
     */

Do you have any insight into this issue? Is it possible to support
multi-value queries somehow?

Thanks!

The workaround is to modify QueryImpl to change filterAndSortLocally
to false.

When you run the query, GAE will generate an error, but the error
message will suggest what index you need to add.

Paste the suggested datastore-index into datastore-indexes.xml,
redeploy, and then wait until the index is built before trying again.
(Look at the Database Indexes link on your GAE management page.)

This seemed to work.
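
For anyone following the same steps, an entry in datastore-indexes.xml generally has the shape sketched below. The kind and property names are made-up placeholders, not ODK Aggregate's real table or column names; the entry to paste is the one GAE prints in its error message.

```xml
<?xml version="1.0" encoding="utf-8"?>
<datastore-indexes autoGenerate="true">
  <!-- Hypothetical kind and property names; copy the real entry from the GAE error message. -->
  <datastore-index kind="MY_FORM_TABLE" ancestor="false">
    <property name="FORM_ID" direction="asc"/>
    <property name="SUBMISSION_DATE" direction="asc"/>
  </datastore-index>
</datastore-indexes>
```

Each multi-column filter or sort combination that GAE complains about needs its own datastore-index element.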


Note that setting this flag to false will require adding a number of indices
for the data tables used internally by ODK Aggregate. You should exercise
the export, publish and permissions settings features to ensure that you are
discovering all the multi-column filters you need. Also, if a user
dynamically constructs a filter, it may fail if you set this flag to false
unless there is an index that has all the columns used within the filter.

This highlights an extremely frustrating gap in Google's datastore API.

Specifically, there is no support for defining and managing table indices
through API calls. The only way to define a multi-column index is through
the XML configuration file specified when you first deploy your application
to your appspot instance. Each form's data is stored in its own table, and
the names of the columns that are significant to a user are different and
specific to each form. At the time we deploy to Google AppEngine we have
not yet seen the forms the user will upload to ODK Aggregate or the queries
the user will perform against those forms, so we cannot construct an XML
configuration file that will be appropriate for those tables and queries.

For now, ODK Aggregate filters only by the first value in the query, and
does the rest of the filtering locally (on the server). Beta 4 improves
this somewhat by applying all the filter and sorting clauses that it can on
that first value. We still need to further improve the efficiency by
returning only the first N ordered matches back to the browser and reworking
the deepest layer of the query construction to work around the underlying
problems caused by the missing functionality.
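
As a rough illustration of that approach (not the actual QueryImpl code), the sketch below applies one condition as if it were the datastore filter, then applies the remaining conditions, the sort, and a first-N limit in memory. The class, the Row type, and its fields are hypothetical names chosen for this example, and a plain list stands in for the datastore.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

public class LocalFilterSketch {

    /** Hypothetical stand-in for one submission row. */
    public static final class Row {
        public final String formId;
        public final long submittedMillis;

        public Row(String formId, long submittedMillis) {
            this.formId = formId;
            this.submittedMillis = submittedMillis;
        }
    }

    /**
     * Applies firstFilter as a stand-in for the single condition sent to the
     * datastore, then applies the remaining filters, the ordering, and the
     * limit locally against the rows that first condition returned.
     */
    public static List<Row> query(List<Row> table,
                                  Predicate<Row> firstFilter,
                                  List<Predicate<Row>> remainingFilters,
                                  Comparator<Row> order,
                                  int limit) {
        // Step 1: the only condition the datastore would evaluate.
        List<Row> fetched = new ArrayList<>();
        for (Row row : table) {
            if (firstFilter.test(row)) {
                fetched.add(row);
            }
        }

        // Step 2: every other condition is checked in memory.
        List<Row> matched = new ArrayList<>();
        for (Row row : fetched) {
            boolean keep = true;
            for (Predicate<Row> filter : remainingFilters) {
                if (!filter.test(row)) {
                    keep = false;
                    break;
                }
            }
            if (keep) {
                matched.add(row);
            }
        }

        // Step 3: sort locally, then keep only the first N ordered matches.
        matched.sort(order);
        int end = Math.max(0, Math.min(limit, matched.size()));
        return new ArrayList<>(matched.subList(0, end));
    }
}
```

The cost that causes the timeouts is visible in step 1: every row matching the first condition is fetched before anything else is applied, so a weakly selective first condition pulls most of the table into memory.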

Mitch

--
Mitch Sundt
Software Engineer
http://www.OpenDataKit.org
University of Washington
mitchellsundt@gmail.com