Issue: Stuttering record writing

ODeeKers,

Having just completed data collection (4,500 surveys), we're cleaning the
dataset and have found a funky error.

About 0.2% of the time, ODK writes data into the wrong node of the instance.

It is a rare issue, but I thought I should bring it to light. You will only
notice it if you are running a large enough survey and carefully scour your
data for errors.

We found this error by looking for outliers: an answer in a node that is
unlike all the other answers in that node. For example, if question 99 is a
Yes/No question, then in the instance the data for that node should be
either Yes or No.

The error shows up as the data from the adjacent question being written into
that node.

If the adjacent question, q98, is "What is a lucky number?", you may see the
answer 7 written into the node for question 99.

It sticks out because it is an impossible answer for that node.
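A minimal sketch of this outlier check in Python, assuming the instances have already been exported into one dict per submission keyed by question name. The `ALLOWED` table and the sample records are hypothetical, and only questions with a closed set of answers can be checked this way.

```python
# Hypothetical map of question name -> answers that are actually possible.
# Free-text and open-ended numeric questions are left out: any value is "possible" there.
ALLOWED = {
    "q99": {"Yes", "No"},
}


def find_impossible(records, allowed=ALLOWED):
    """Return (record index, question, value) for each answer outside its allowed set."""
    hits = []
    for i, record in enumerate(records):
        for question, values in allowed.items():
            value = record.get(question)
            # Skip unanswered questions; they are missing, not misplaced.
            if value in (None, ""):
                continue
            if value not in values:
                hits.append((i, question, value))
    return hits


records = [
    {"q98": "3", "q99": "Yes"},
    {"q98": "7", "q99": "7"},   # q98's answer showing up in q99 (made-up example)
    {"q98": "12", "q99": "No"},
]
print(find_impossible(records))  # [(1, 'q99', '7')]
```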

I found the error in 10 of the 4,500 instances.
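As a quick check on the 0.2% figure:

```latex
\frac{10}{4500} \approx 0.0022 = 0.22\%
```

Because a misplaced answer between questions with similar answer sets goes unnoticed, this is best read as a lower bound on the true rate.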

If adjacent questions have similar values, such as two scale questions next
to each other that both use a 1-5 scale, this error is invisible.

It seems to self-correct within the same record: after the error, nodes and
data are realigned. I had feared it was like buttoning your shirt, where if
you miss the bottom button all the buttons are wrong... but it isn't like
that.
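One way to tell the two patterns apart in a cleaned dataset: a "buttoning" error would tend to push later answers into the wrong nodes, so later checkable questions would also fail, whereas an isolated error leaves them valid. A rough Python heuristic, assuming the question order and allowed answers are known. All names and sample data here are hypothetical.

```python
def classify(order, record, allowed):
    """Label a record 'clean', 'isolated' or 'shifted' from which checkable answers are impossible."""

    def impossible(q):
        value = record.get(q)
        return q in allowed and value not in (None, "") and value not in allowed[q]

    bad = [q for q in order if impossible(q)]
    if not bad:
        return "clean"
    first = order.index(bad[0])
    later = [q for q in order[first + 1:] if q in allowed]
    # This heuristic calls a record 'shifted' only if every later checkable answer is impossible.
    # If nothing checkable follows, the two cases cannot be told apart; report 'isolated'.
    if later and all(q in bad for q in later):
        return "shifted"
    return "isolated"


ORDER = ["q97", "q98", "q99", "q100"]
ALLOWED = {
    "q97": {"1", "2", "3", "4", "5"},
    "q99": {"Yes", "No"},
    "q100": {"1", "2", "3", "4", "5"},
}
print(classify(ORDER, {"q97": "2", "q98": "7", "q99": "7", "q100": "4"}, ALLOWED))    # isolated
print(classify(ORDER, {"q97": "2", "q98": "7", "q99": "7", "q100": "Yes"}, ALLOWED))  # shifted
```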

I am not sure whether this is due to running out of memory. It looks like a
quick stutter in the data write: it doesn't spoil the whole record, just a
piece of it. I see the error on both the MyTouch 32a and 32b, which have
different amounts of internal memory.

Your thoughts on this issue?

From Sajj House restaurant, Monrovia, best pizza in town~

Neil

Hey Neil,

Is it always the adjacent question, or does it look random?

Also, which version were you using?

In Collect 1.1.4 there is a bug that sometimes occurs when using the 'jump
to' feature. Occasionally when you jump from Question 1 to Question 5, the
answer to Question 1 will get saved into Question 5.

The issue should be resolved in versions 1.1.5 and beyond.
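If your export records which Collect version produced each submission (an assumption; the `collect_version` field name below is hypothetical), a sketch like this can separate submissions that could have hit the jump-to bug. It treats everything older than 1.1.5 as suspect, which is broader than the thread, since only 1.1.4 is named. The sample version strings are illustrative.

```python
def parse_version(text):
    """Turn a dotted version string like '1.1.4' into a comparable tuple (1, 1, 4)."""
    return tuple(int(part) for part in text.split("."))


def may_have_jump_to_bug(submission):
    # Per the thread: the bug is in 1.1.4 and should be fixed from 1.1.5 onward.
    return parse_version(submission["collect_version"]) < (1, 1, 5)


submissions = [
    {"id": "a", "collect_version": "1.1.4"},
    {"id": "b", "collect_version": "1.1.5"},
    {"id": "c", "collect_version": "1.2.0"},
]
print([s["id"] for s in submissions if may_have_jump_to_bug(s)])  # ['a']
```

Comparing integer tuples rather than raw strings keeps a version such as 1.1.10 from sorting before 1.1.5.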

However, it seems unlikely that people would often 'jump to' an adjacent
question, so if it's only/always the adjacent question we may need to look a
little deeper.
-Carl

(Replying to Neil Hendrick's message of Mon, Dec 20, 2010 at 10:17 AM.)

--
Post: opendatakit@googlegroups.com
Unsubscribe: opendatakit+unsubscribe@googlegroups.com
Options: http://groups.google.com/group/opendatakit?hl=en

We had a similar issue come up. Our form was fairly long (410+ questions),
and we collected roughly 6,000 submissions (that order of magnitude,
anyway). About 180 of those submissions had errors related to this issue.
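Anton's approximate figures work out to:

```latex
\frac{180}{6000} = 0.03 = 3\%
```

That is more than ten times the 0.2% Neil reports per submission, though both counts are approximate.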

We found that the misplaced answer could come from anywhere else in the
form. The corrupted questions (and the origin questions the corrupt answers
came from) were more or less random, e.g. q25 corrupted with the answer
from q3, or q321 corrupted with the answer from q47. Sometimes one
submission had multiple corrupt answers.
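Since the origin can be anywhere in the form, a rough way to trace a flagged answer back is to look for other questions in the same submission that hold the identical value. This Python sketch assumes the origin question still holds its own answer, and it only helps for distinctive values, since a "Yes" will match many questions. The record below is made up.

```python
def candidate_sources(record, corrupt_question):
    """List the other questions in this submission whose answer equals the suspicious one."""
    value = record[corrupt_question]
    return [q for q, v in record.items() if q != corrupt_question and v == value]


record = {"q3": "Blue", "q25": "Blue", "q47": "2", "q321": "5"}
print(candidate_sources(record, "q25"))  # ['q3']
```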

You're right that it's very hard to tell whether an answer is corrupt for
regular text-input questions unless you analyze each submission by hand,
so the true error rate was likely higher than what we detected.

In our case all 40+ phones were HTC Tattoos. The bulk of the corrupt
submissions came from one user/phone, with a few from a handful of other
phones.
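A per-device breakdown makes that kind of concentration easy to check. Normalizing by each phone's total submissions matters: a phone that simply collected more surveys will also tend to collect more flagged ones. A minimal Python sketch; the `deviceid` and `instance_id` field names and the sample data are assumptions about how the export is laid out.

```python
from collections import Counter


def flag_rate_by_device(submissions, flagged_ids):
    """Return (device, flagged fraction) pairs, highest fraction first."""
    totals = Counter(s["deviceid"] for s in submissions)
    flagged = Counter(s["deviceid"] for s in submissions if s["instance_id"] in flagged_ids)
    # Counter returns 0 for devices with no flagged submissions.
    rates = {device: flagged[device] / totals[device] for device in totals}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


submissions = [
    {"deviceid": "phone-A", "instance_id": 1},
    {"deviceid": "phone-A", "instance_id": 2},
    {"deviceid": "phone-B", "instance_id": 3},
    {"deviceid": "phone-B", "instance_id": 4},
]
print(flag_rate_by_device(submissions, flagged_ids={1, 2, 3}))
# [('phone-A', 1.0), ('phone-B', 0.5)]
```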

That last point lends credence to the possibility that it was the jump-to
bug (or some other user-induced phenomenon), and that that one user was
making heavy use of the feature. I'm not 100% convinced the problem has
been fixed, though. When I was analyzing the data I couldn't reproduce the
bug, so it would be hard to verify that the issue is fixed. Everyone should
probably keep a close eye on their data for any weird answers, just to be
safe.

Cheers,
Anton

(Replying to Carl Hartung's message of Dec 20, 5:52 am.)