The maximum field width you set does not affect storage efficiency all that
much (less than 2% overhead beyond what is required to represent the actual
data value).
I believe pretty much all modern databases, when they store string fields,
store them as length-delimited fields, so they use only a few bytes more
than the number of bytes required to store the string itself.
That said, there are some key limitations to understand:
(1) Google AppEngine's BigTable only stores strings up to 500 unicode
characters long. If you specify a string of 16000 UTF-8 characters, we
create a shadow Text datatype to store the entire string (for up to 1
megabyte of text). This is all transparent to the user, but impacts
efficiency. In fact, this 500-character limit was increased from its
original size, so we actually do this dual-storage if the maximum length of
the string field is defined to be longer than 255 characters (so even if
you store a 1-character string in a field defined to be 256 characters
long, we will store it twice, once under the string, once in a Text field
(which is stored separately from the row contents)).
(2) Google AppEngine's BigTable datastore has a 1,000,000 byte limit to the
maximum length of any row in any data table.
(3) MySQL databases have a 65536-byte limit to the maximum length of any
row in any data table.
(4) PostgreSQL databases have a 1-megabyte limit to the maximum length of
any row in any data table.
Because of (2), (3), and (4), when ODK Aggregate creates database tables in
which to store a form, it needs to count the maximum number of characters
and bytes that may be needed to store the value of each field (column in
the table), and use that maximum length to split the fields across as many
backing database tables as required by these limitations.
Furthermore, each database table has a few columns of metadata -- a maximum
of about 800 bytes.
For MySQL, the impact of this is dramatic: If you defined every string
field to be 16,000 bytes long, that means that MySQL databases would only
be able to store the answers to 4 string fields per database table (because
of the 65,536 byte limit on each row). So if you had 100 questions, you
would end up having 25 database tables to store all those answers. If you
only had 255 bytes per string field, then you would only need 1 table.
(Google AppEngine and PostgreSQL databases would require 2 tables vs 1
table).
And the more tables you have, the more queries need to be issued across all
those tables to retrieve your data.
Finally, because filtering logic across tables is nearly impossible to do
with Google AppEngine (you have to do all the join logic within your
application layer), there is a limitation within ODK Aggregate that when we
define value filters, only the filters on the 1st backing database table
are applied.
Mitch
···
On Tue, Feb 19, 2013 at 1:39 AM, Joseph wrote:
Hi Mitch,
I have question regarding worst data storage efficiency within data store,
if I increase string length more than 255 char. Could you please explain a
bit about what you mean by storage efficiency? Why can't you assign 16000
UTF-8 characters as default, this certainly helps surveys based on
qualitative (narrative) data?
Thank you
Joseph
On Wednesday, February 13, 2013 12:58:55 AM UTC+5:30, Mitch wrote:
Wrong link -- see: http://opendatakit.org/help/**form-design/guidelines/http://opendatakit.org/help/form-design/guidelines/
Scroll down to Datastore String Length
The 255-character limit that can be changed.
Databases need to know the maximum amount of space to reserve for data.
We chose 255 as it is about a screenful of text (on a phone) and because
it is also the default used by most database software.
Mitch
On Tue, Feb 12, 2013 at 3:24 AM, ゴー・ニコライ nikol...@gmail.com wrote:
I think there's a 255-character limit default per field, then it gets
truncated. You may use length binding to set a bigger size.
http://opendatakit.org/help/**form-design/binding/http://opendatakit.org/help/form-design/binding/
/r
Nik
On Tuesday, February 12, 2013, Joseph wrote:
Hello all,
I was wondering about word limit for Text fields in ODK collect. So far
I noticed that maximum 45 words as limit and what ever the data which was
entered there after, was just vanished.
Is it normal that text fields has a limit in words? Any kind of help is
appreciated.
Thanks in advance
Joseph
--
Post: opendatakit@googlegroups.com
Unsubscribe: opendatakit+unsubscribe@**googlegroups.com
Options: http://groups.google.com/**group/opendatakit?hl=enhttp://groups.google.com/group/opendatakit?hl=en
You received this message because you are subscribed to the Google
Groups "ODK Community" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to opendatakit+unsubscribe@**googlegroups.com.
For more options, visit https://groups.google.com/**groups/opt_outhttps://groups.google.com/groups/opt_out
.
--
Post: opend...@googlegroups.com
Unsubscribe: opendatakit...@**googlegroups.com
Options: http://groups.google.com/**group/opendatakit?hl=enhttp://groups.google.com/group/opendatakit?hl=en
You received this message because you are subscribed to the Google
Groups "ODK Community" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to opendatakit...@**googlegroups.com.
For more options, visit https://groups.google.com/**groups/opt_outhttps://groups.google.com/groups/opt_out
.
--
Mitch Sundt
Software Engineer
University of Washington
mitche...@gmail.com
--
Post: opendatakit@googlegroups.com
Unsubscribe: opendatakit+unsubscribe@googlegroups.com
Options: http://groups.google.com/group/opendatakit?hl=en
You received this message because you are subscribed to the Google Groups
"ODK Community" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to opendatakit+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
--
Mitch Sundt
Software Engineer
University of Washington
mitchellsundt@gmail.com