Aggregate 1.0 Alpha 5 - MySQL schema for binary objects?

Is there a plan for documenting the MySQL tables? I know the ODK team
are good designers, but querying the tables for binary data seems a
little un-straightforward.

For instance, it would be nice to know why, for 1 instance of a form
with 1 image captured by the camera, I would have 13 records for the
same _TOP_LEVEL_AURI in the following tables:

_blb
_ref

when the following tables have the expected 1 record each:

_bn
_vbn

My guess here is the design accommodates very large blobs by splitting
them into parts. But I am not having any success digging in the code
base to figure out how the parts are queried or put back together.

Would love some detailed documentation here, as my goal is to write a
very form-specific reporting feature for non-Aggregate users to easily
visualize (yes, with embedded images).

Hi Stuart,

Yes, we are actually going to implement a Java manipulator class for
handling binary objects.

The short answer to your question is as follows:

The tables are chained together downward, from top to bottom.

_bin -> _vbn -> _ref -> _blb

This set of 4 tables stores one binary data field in the form. Other binary
data fields are stored in their own unique set of 4 tables. So you don't
need to worry about figuring out what field the data corresponds to --
that's determined from the table name.

_bin
vbn
ref
_blb

The naming of the table refers to what it describes:

_bn -- BINARY CONTENT
the unrooted filename of the binary data, if any.

··· On Mon, Feb 7, 2011 at 2:43 PM, Stuart Moffatt wrote:

Is there a plan for documenting the MySQL tables? I know the ODK team
are good designers, but querying the tables for binary data seems a
little un-straightforward.

For instance, it would be nice to know why, for 1 instance of a form
with 1 image captured by the camera, I would have 13 records for the
same _TOP_LEVEL_AURI in the following tables:

_blb
_ref

when the following tables have the expected 1 record each:

_bn
_vbn

My guess here is the design accommodates very large blobs by splitting
them into parts. But I am not having any success digging in the code
base to figure out how the parts are queried or put back together.

Would love some detailed documentation here, as my goal is to write a
very form-specific reporting feature for non-Aggregate users to easily
visualize (yes, with embedded images).

--
Mitch Sundt
Software Engineer

University of Washington
mitchellsundt@gmail.com

ARGH. Continuing where I left off:

_bin -- Binary content (BinaryContent.java) -- holds the filename (if any)
of the binary data and the version code for the current version of the
binary data.

Binary content fields within an xform can have only one binary attachment
per field, but this table allows multiple binary attachments within a single
field. This is currently used for multimedia forms, where the files in the
media directory are stored in one binary content field as separate records
(distinguished by their ordinals).

_vbn -- Versioned binary content (VersionedBinaryContent.java) -- holds the
version code, content length, type and content hash for a specific version
of a binary attachment.

Versioned binary content allows multiple versions of an attachment to be
saved, and allows for atomic switching across versions by updating the _bin
record with the new active version. The versioned binary content record with
the version code that matches that in the _bin table is active content for
that field.

_ref -- Versioned binary content ref Blob
(VersionedBinaryContentRefBlob.java) -- for a given binary attachment,
maintains the ordering of the segments of the _blb table that are used to
reconstruct the blob. (_DOM_AURI == PK into _vbn; _SUB_AURI == PK into
_blb; PART = 1..n the ordering of the _blb record in the reconstruction of
the blob).

_blb -- Ref Blob (RefBlob.java) -- a segment of the binary data (VALUE).

··· ===== Mitch

On Mon, Feb 7, 2011 at 2:59 PM, Mitch Sundt msundt@cs.washington.eduwrote:

Hi Stuart,

Yes, we are actually going to implement a Java manipulator class for
handling binary objects.

The short answer to your question is as follows:

The tables are chained together downward, from top to bottom.

_bin -> _vbn -> _ref -> _blb

This set of 4 tables stores one binary data field in the form. Other
binary data fields are stored in their own unique set of 4 tables. So you
don't need to worry about figuring out what field the data corresponds to --
that's determined from the table name.

_bin
vbn
ref
_blb

The naming of the table refers to what it describes:

_bn -- BINARY CONTENT
the unrooted filename of the binary data, if any.

On Mon, Feb 7, 2011 at 2:43 PM, Stuart Moffatt stuartmoffatt@gmail.comwrote:

Is there a plan for documenting the MySQL tables? I know the ODK team
are good designers, but querying the tables for binary data seems a
little un-straightforward.

For instance, it would be nice to know why, for 1 instance of a form
with 1 image captured by the camera, I would have 13 records for the
same _TOP_LEVEL_AURI in the following tables:

_blb
_ref

when the following tables have the expected 1 record each:

_bn
_vbn

My guess here is the design accommodates very large blobs by splitting
them into parts. But I am not having any success digging in the code
base to figure out how the parts are queried or put back together.

Would love some detailed documentation here, as my goal is to write a
very form-specific reporting feature for non-Aggregate users to easily
visualize (yes, with embedded images).

--
Mitch Sundt
Software Engineer
http://www.OpenDataKit.org
University of Washington
mitchellsundt@gmail.com

--
Mitch Sundt
Software Engineer
http://www.OpenDataKit.org
University of Washington
mitchellsundt@gmail.com

Oh, and to loop back to your question about the complexity -- if we
supported only one binary attachment per binary content field, and if we
didn't support multiple versions of the binary attachment, we could collapse
this into just the _ref and _blb tables. It was an early design choice (by
me) to add that complexity, and that may or may not have been warranted. A
year from now it should be clear whether it was a dead end or has become
something useful.

Mitch

··· On Mon, Feb 7, 2011 at 3:17 PM, Mitch Sundt wrote:

ARGH. Continuing where I left off:

_bin -- Binary content (BinaryContent.java) -- holds the filename (if any)
of the binary data and the version code for the current version of the
binary data.

Binary content fields within an xform can have only one binary attachment
per field, but this table allows multiple binary attachments within a single
field. This is currently used for multimedia forms, where the files in the
media directory are stored in one binary content field as separate records
(distinguished by their ordinals).

_vbn -- Versioned binary content (VersionedBinaryContent.java) -- holds the
version code, content length, type and content hash for a specific version
of a binary attachment.

Versioned binary content allows multiple versions of an attachment to be
saved, and allows for atomic switching across versions by updating the _bin
record with the new active version. The versioned binary content record with
the version code that matches that in the _bin table is active content for
that field.

_ref -- Versioned binary content ref Blob
(VersionedBinaryContentRefBlob.java) -- for a given binary attachment,
maintains the ordering of the segments of the _blb table that are used to
reconstruct the blob. (_DOM_AURI == PK into _vbn; _SUB_AURI == PK into
_blb; PART = 1..n the ordering of the _blb record in the reconstruction of
the blob).

_blb -- Ref Blob (RefBlob.java) -- a segment of the binary data (VALUE).

=====
Mitch

On Mon, Feb 7, 2011 at 2:59 PM, Mitch Sundt msundt@cs.washington.eduwrote:

Hi Stuart,

Yes, we are actually going to implement a Java manipulator class for
handling binary objects.

The short answer to your question is as follows:

The tables are chained together downward, from top to bottom.

_bin -> _vbn -> _ref -> _blb

This set of 4 tables stores one binary data field in the form. Other
binary data fields are stored in their own unique set of 4 tables. So you
don't need to worry about figuring out what field the data corresponds to --
that's determined from the table name.

_bin
vbn
ref
_blb

The naming of the table refers to what it describes:

_bn -- BINARY CONTENT
the unrooted filename of the binary data, if any.

On Mon, Feb 7, 2011 at 2:43 PM, Stuart Moffatt stuartmoffatt@gmail.comwrote:

Is there a plan for documenting the MySQL tables? I know the ODK team
are good designers, but querying the tables for binary data seems a
little un-straightforward.

For instance, it would be nice to know why, for 1 instance of a form
with 1 image captured by the camera, I would have 13 records for the
same _TOP_LEVEL_AURI in the following tables:

_blb
_ref

when the following tables have the expected 1 record each:

_bn
_vbn

My guess here is the design accommodates very large blobs by splitting
them into parts. But I am not having any success digging in the code
base to figure out how the parts are queried or put back together.

Would love some detailed documentation here, as my goal is to write a
very form-specific reporting feature for non-Aggregate users to easily
visualize (yes, with embedded images).

--
Mitch Sundt
Software Engineer
http://www.OpenDataKit.org
University of Washington
mitchellsundt@gmail.com

--
Mitch Sundt
Software Engineer
http://www.OpenDataKit.org
University of Washington
mitchellsundt@gmail.com

--
Mitch Sundt
Software Engineer
http://www.OpenDataKit.org
University of Washington
mitchellsundt@gmail.com

Mitch,

Thanks for the details on the schema. As I test out some raw SQL, I may have
follow-up questions.

Actually, I can think of one right now: I deleted a form from Aggregate,
which deleted the tables * tables. Then I uploaded a
new form with similar questions, and the following tables were created:

_blb
bn
vbn2
_ref2

Any idea how those *2 names got in there? It really did appear as if the
former form was completed deleted, but perhaps there was an index/reference
to the same formid in another core Aggregate table that caused the *2
version?

sfm

··· On Mon, Feb 7, 2011 at 4:26 PM, Mitch Sundt wrote:

Oh, and to loop back to your question about the complexity -- if we
supported only one binary attachment per binary content field, and if we
didn't support multiple versions of the binary attachment, we could collapse
this into just the _ref and _blb tables. It was an early design choice (by
me) to add that complexity, and that may or may not have been warranted. A
year from now it should be clear whether it was a dead end or has become
something useful.

Mitch

On Mon, Feb 7, 2011 at 3:17 PM, Mitch Sundt msundt@cs.washington.eduwrote:

ARGH. Continuing where I left off:

_bin -- Binary content (BinaryContent.java) -- holds the filename (if any)
of the binary data and the version code for the current version of the
binary data.

Binary content fields within an xform can have only one binary attachment
per field, but this table allows multiple binary attachments within a single
field. This is currently used for multimedia forms, where the files in the
media directory are stored in one binary content field as separate records
(distinguished by their ordinals).

_vbn -- Versioned binary content (VersionedBinaryContent.java) -- holds
the version code, content length, type and content hash for a specific
version of a binary attachment.

Versioned binary content allows multiple versions of an attachment to be
saved, and allows for atomic switching across versions by updating the _bin
record with the new active version. The versioned binary content record with
the version code that matches that in the _bin table is active content for
that field.

_ref -- Versioned binary content ref Blob
(VersionedBinaryContentRefBlob.java) -- for a given binary attachment,
maintains the ordering of the segments of the _blb table that are used to
reconstruct the blob. (_DOM_AURI == PK into _vbn; _SUB_AURI == PK into
_blb; PART = 1..n the ordering of the _blb record in the reconstruction of
the blob).

_blb -- Ref Blob (RefBlob.java) -- a segment of the binary data (VALUE).

=====
Mitch

On Mon, Feb 7, 2011 at 2:59 PM, Mitch Sundt msundt@cs.washington.eduwrote:

Hi Stuart,

Yes, we are actually going to implement a Java manipulator class for
handling binary objects.

The short answer to your question is as follows:

The tables are chained together downward, from top to bottom.

_bin -> _vbn -> _ref -> _blb

This set of 4 tables stores one binary data field in the form. Other
binary data fields are stored in their own unique set of 4 tables. So you
don't need to worry about figuring out what field the data corresponds to --
that's determined from the table name.

_bin
vbn
ref
_blb

The naming of the table refers to what it describes:

_bn -- BINARY CONTENT
the unrooted filename of the binary data, if any.

On Mon, Feb 7, 2011 at 2:43 PM, Stuart Moffatt stuartmoffatt@gmail.comwrote:

Is there a plan for documenting the MySQL tables? I know the ODK team
are good designers, but querying the tables for binary data seems a
little un-straightforward.

For instance, it would be nice to know why, for 1 instance of a form
with 1 image captured by the camera, I would have 13 records for the
same _TOP_LEVEL_AURI in the following tables:

_blb
_ref

when the following tables have the expected 1 record each:

_bn
_vbn

My guess here is the design accommodates very large blobs by splitting
them into parts. But I am not having any success digging in the code
base to figure out how the parts are queried or put back together.

Would love some detailed documentation here, as my goal is to write a
very form-specific reporting feature for non-Aggregate users to easily
visualize (yes, with embedded images).

--
Mitch Sundt
Software Engineer
http://www.OpenDataKit.org
University of Washington
mitchellsundt@gmail.com

--
Mitch Sundt
Software Engineer
http://www.OpenDataKit.org
University of Washington
mitchellsundt@gmail.com

--
Mitch Sundt
Software Engineer
http://www.OpenDataKit.org
University of Washington
mitchellsundt@gmail.com

This is one of the issues I've been banging my head against. It is as if
the Spring JDBC connection management layer is caching or retaining a stale
view of the current state of the database. If I recall correctly, it even
persists across bounces of tomcat (but you might want to double-check that
-- it has been a while). Any ideas?

The naming is determined dynamically (so there is never a collision across
multiple forms). What you are seeing is the jdbc layer saying that the
database still has a table named __vbn, so the code
starts tacking on a number N (=2), and incrementing it until the jdbc layer
says that table name doesn't exist.

Mitch

··· On Mon, Feb 7, 2011 at 5:14 PM, Stuart Moffatt wrote:

Mitch,

Thanks for the details on the schema. As I test out some raw SQL, I may
have follow-up questions.

Actually, I can think of one right now: I deleted a form from Aggregate,
which deleted the tables * tables. Then I uploaded a
new form with similar questions, and the following tables were created:

_blb
bn
vbn2
_ref2

Any idea how those *2 names got in there? It really did appear as if the
former form was completed deleted, but perhaps there was an index/reference
to the same formid in another core Aggregate table that caused the *2
version?

sfm

On Mon, Feb 7, 2011 at 4:26 PM, Mitch Sundt msundt@cs.washington.eduwrote:

Oh, and to loop back to your question about the complexity -- if we
supported only one binary attachment per binary content field, and if we
didn't support multiple versions of the binary attachment, we could collapse
this into just the _ref and _blb tables. It was an early design choice (by
me) to add that complexity, and that may or may not have been warranted. A
year from now it should be clear whether it was a dead end or has become
something useful.

Mitch

On Mon, Feb 7, 2011 at 3:17 PM, Mitch Sundt msundt@cs.washington.eduwrote:

ARGH. Continuing where I left off:

_bin -- Binary content (BinaryContent.java) -- holds the filename (if
any) of the binary data and the version code for the current version of the
binary data.

Binary content fields within an xform can have only one binary attachment
per field, but this table allows multiple binary attachments within a single
field. This is currently used for multimedia forms, where the files in the
media directory are stored in one binary content field as separate records
(distinguished by their ordinals).

_vbn -- Versioned binary content (VersionedBinaryContent.java) -- holds
the version code, content length, type and content hash for a specific
version of a binary attachment.

Versioned binary content allows multiple versions of an attachment to be
saved, and allows for atomic switching across versions by updating the _bin
record with the new active version. The versioned binary content record with
the version code that matches that in the _bin table is active content for
that field.

_ref -- Versioned binary content ref Blob
(VersionedBinaryContentRefBlob.java) -- for a given binary attachment,
maintains the ordering of the segments of the _blb table that are used to
reconstruct the blob. (_DOM_AURI == PK into _vbn; _SUB_AURI == PK into
_blb; PART = 1..n the ordering of the _blb record in the reconstruction of
the blob).

_blb -- Ref Blob (RefBlob.java) -- a segment of the binary data (VALUE).

=====
Mitch

On Mon, Feb 7, 2011 at 2:59 PM, Mitch Sundt msundt@cs.washington.eduwrote:

Hi Stuart,

Yes, we are actually going to implement a Java manipulator class for
handling binary objects.

The short answer to your question is as follows:

The tables are chained together downward, from top to bottom.

_bin -> _vbn -> _ref -> _blb

This set of 4 tables stores one binary data field in the form. Other
binary data fields are stored in their own unique set of 4 tables. So you
don't need to worry about figuring out what field the data corresponds to --
that's determined from the table name.

_bin
vbn
ref
_blb

The naming of the table refers to what it describes:

_bn -- BINARY CONTENT
the unrooted filename of the binary data, if any.

On Mon, Feb 7, 2011 at 2:43 PM, Stuart Moffatt <stuartmoffatt@gmail.com wrote:

Is there a plan for documenting the MySQL tables? I know the ODK team
are good designers, but querying the tables for binary data seems a
little un-straightforward.

For instance, it would be nice to know why, for 1 instance of a form
with 1 image captured by the camera, I would have 13 records for the
same _TOP_LEVEL_AURI in the following tables:

_blb
_ref

when the following tables have the expected 1 record each:

_bn
_vbn

My guess here is the design accommodates very large blobs by splitting
them into parts. But I am not having any success digging in the code
base to figure out how the parts are queried or put back together.

Would love some detailed documentation here, as my goal is to write a
very form-specific reporting feature for non-Aggregate users to easily
visualize (yes, with embedded images).

--
Mitch Sundt
Software Engineer
http://www.OpenDataKit.org
University of Washington
mitchellsundt@gmail.com

--
Mitch Sundt
Software Engineer
http://www.OpenDataKit.org
University of Washington
mitchellsundt@gmail.com

--
Mitch Sundt
Software Engineer
http://www.OpenDataKit.org
University of Washington
mitchellsundt@gmail.com

--
Mitch Sundt
Software Engineer
http://www.OpenDataKit.org
University of Washington
mitchellsundt@gmail.com