Language encoding issues

Hi all,

I've been trying to create a form which includes Amharic (Ethiopian)
characters. The uploaded form (as far as I can tell) is encoded
correctly using UTF-8, at least, it displays the characters correctly
when I open the file in my text editor (gedit on Ubuntu).

However, after uploading the form, if I view it (on the Form XML viewer
page) or download it via ODKAggregate, it doesn't recognise the Amharic
characters. You can see the form at: https://hew-datacollect.appspot.com/formXml?formId=Amharic_test.

I saw another post regarding issues with Cyrillic scripts
(http://groups.google.com/group/opendatakit/browse_thread/thread/d5620877b4e9cf05/d727d6e415c5695f?lnk=gst&q=language+encoding#d727d6e415c5695f),
so I assume this is the same issue. The HTTP headers when I download
the form show that UTF-8 is set correctly, and the XML header sets it
to UTF-8 too. For info, the HTTP headers when I download the form are:

https://hew-datacollect.appspot.com/formXml?formId=Amharic_test

GET /formXml?formId=Amharic_test HTTP/1.1
Host: hew-datacollect.appspot.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:5.0) Gecko/20100101 Firefox/5.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
DNT: 1
Connection: keep-alive
Cookie: JSESSIONID=3h591gjuahT7_6WeZqfSlQ

HTTP/1.1 200 OK
Content-Type: text/xml; charset=utf-8
Content-Disposition: attachment; filename="TestingAmharic.xml";
Content-Encoding: gzip
Date: Tue, 02 Aug 2011 22:20:11 GMT
Server: Google Frontend
Cache-Control: private
Content-Length: 487
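
As a side note on the headers above: a declared charset only says how the bytes are supposed to be read, not that the stored bytes are right. Below is a minimal Python sketch (Python is used purely for illustration; ODK Aggregate itself is Java) that pulls the charset out of a Content-Type value and out of an XML declaration so the two can be compared. The function names and the sample XML prolog are hypothetical and are not taken from the actual form.

```python
import re
from email.message import Message
from typing import Optional


def charset_from_content_type(value: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, lowercased."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


def charset_from_xml_prolog(raw: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, if there is one."""
    match = re.match(rb"<\?xml[^>]*encoding=[\"']([A-Za-z0-9._-]+)[\"']", raw)
    return match.group(1).decode("ascii").lower() if match else None


# Header value as reported in the post; the XML prolog is an assumed example.
http_charset = charset_from_content_type("text/xml; charset=utf-8")
xml_charset = charset_from_xml_prolog(
    b'<?xml version="1.0" encoding="UTF-8"?><h:html/>'
)
print(http_charset, xml_charset)
```

If both values agree and the characters are still wrong, the bytes were most likely damaged before they were stored or served, which is what the rest of the thread goes on to investigate.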

On the other side, I can submit a form response (via ODKCollect) which
includes Amharic script and these are displayed fine when I view the
form responses on ODKAggregate. So to me, it seems there is maybe an
issue with the encoding that the form uploader accepts?
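
To make the suspected failure concrete, here is a small Python sketch of how UTF-8 text gets garbled when some step reads the bytes with a single-byte charset. That the upload path does this is only an assumption for the sake of illustration; the thread does not establish the actual cause. The sample word is hypothetical.

```python
# Hypothetical sample label: an Amharic greeting.
original = "ሰላም"
utf8_bytes = original.encode("utf-8")  # three bytes per character here

# Assumed faulty step: the bytes are read as ISO-8859-1 instead of UTF-8.
garbled = utf8_bytes.decode("iso-8859-1")

# Serving the garbled text as UTF-8 still yields a valid UTF-8 response,
# so a "charset=utf-8" header would look correct on the wire.
served = garbled.encode("utf-8")

print(len(original), len(utf8_bytes), len(garbled))
print(served.decode("utf-8") == original)

# The damage is reversible as long as no bytes were dropped on the way.
repaired = served.decode("utf-8").encode("iso-8859-1").decode("utf-8")
print(repaired == original)
```

This would be consistent with seeing correct headers but wrong characters, though other causes are possible.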

I'd really like to get a fix for this, but not sure where to start
looking in the ODKAggregate code, if someone can give me some
pointers, am happy to see if I can figure out where the problem is.

Cheers,

Alex


As a follow-up: if I put the form on my locally hosted ODKAggregate
server then it downloads fine with the correct character encoding,
so it seems this may be an issue with ODKAggregate hosted on GAE?

(both my locally hosted server and GAE versions have been installed in
the past couple of days using the same download of ODK Aggregate)

Alex


Hi Alex,

If you are seeing a difference between a Tomcat-hosted aggregate and a
GAE-hosted aggregate, it is likely an issue with the GAE infrastructure.
There might be an issue already opened against GAE on that; if you can find
it, please update the ticket Yaw opened (
http://code.google.com/p/opendatakit/issues/detail?id=285 ). I'll take a
look at this later today. I definitely see the issue on GAE 1.0 alpha 3; I
haven't confirmed the difference in behavior on a local instance; there
might be an issue with the upload-form code and its handling of attachment
character sets.
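
On the point about attachment character sets: a hedged Python sketch (Aggregate's upload code is Java, and nothing here is taken from FormUploadServlet) contrasting two ways of reading an uploaded form. Handing the raw bytes to the XML parser lets it honour the encoding declaration, whereas decoding them first with an assumed wrong charset loses the text. The form content and function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical uploaded form, as raw bytes off the wire.
uploaded = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<form><label>ሰላም</label></form>"
).encode("utf-8")


def read_label(raw: bytes) -> str:
    """Parse the raw bytes so the parser applies the declared encoding itself."""
    root = ET.fromstring(raw)
    return root.findtext("label", default="")


# Correct: the bytes go straight to the parser.
print(read_label(uploaded) == "ሰላም")

# Assumed bug: decoding with a single-byte charset before any parsing.
wrongly_decoded = uploaded.decode("iso-8859-1")
print("ሰላም" in wrongly_decoded)
```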

The downloaded form is retrieved and returned by:

org.opendatakit.aggregate.servlet.FormXmlServlet

This same code is used for the human-readable display (
http://opendatakit.appspot.com/www/formXml?readable=true&formId=Miramare )
and the XML downloaded by ODK Collect (
http://opendatakit.appspot.com/formXml?formId=Miramare )

The form upload page is rendered and handled by:

org.opendatakit.aggregate.servlet.FormUploadServlet

Mitch


--
Post: opendatakit@googlegroups.com
Unsubscribe: opendatakit+unsubscribe@googlegroups.com
Options: http://groups.google.com/group/opendatakit?hl=en

--
Mitch Sundt
Software Engineer
http://www.OpenDataKit.org
University of Washington
mitchellsundt@gmail.com

Thanks Mitch,

Seem to be a couple of reports about incorrect encoding (though these
may not be totally relevant):
http://code.google.com/p/googleappengine/issues/detail?id=2749 (seems
to be more related to python - but may shed some light on what the
issue is in java)
and
http://code.google.com/p/googleappengine/issues/detail?id=4265

Will have a look further.

Alex


Hi,

I've just been testing with the latest version of ODKAggregate (v1.0
rev3 utf-8) and this certainly fixes the issue of being unable to
download UTF-8 encoded forms from ODKAggregate running on GAE.

However, I'm still getting an issue when trying to display the forms
on my ODKCollect client: the character encoding only seems to be
partially working. I can confirm that the form is downloaded and stored
on the phone correctly (looking at the downloaded file in sdcard/odk/forms;
for info, the form I'm using is: https://hew-datacollect.appspot.com/formXml?formId=Amharic_3).

When I go through the questions on the form, the questions with
Amharic characters do not display (just the 'missing character
square'). However, if I save a form and return to it via
'continue saved forms', I can view the summary of the form
questions and data entered so far (the page which asks if I want to go
to the start or end of the form). On this page the Amharic
characters in the question display fine, but not when I return to
the proper data entry page.

So it seems that this is very nearly working! Possibly just an encoding
setting needed on the question display page? I'll take a look at the
ODKCollect code and see if I can figure anything out.
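
A possible reading of the 'missing character square' is that the text is decoded correctly but the widget's font has no glyph for it; that is an assumption, not something confirmed in this thread. Under that assumption, here is a quick Python sketch for listing which characters of a label fall in the Unicode Ethiopic block (U+1200 to U+137F), i.e. the ones a font would need to cover. The label text and function name are hypothetical.

```python
ETHIOPIC_START, ETHIOPIC_END = 0x1200, 0x137F  # Unicode "Ethiopic" block


def ethiopic_chars(text: str) -> list:
    """Return the characters of text that lie in the Ethiopic block, in order."""
    return [ch for ch in text if ETHIOPIC_START <= ord(ch) <= ETHIOPIC_END]


label = "Name / ስም"  # hypothetical question label
for ch in ethiopic_chars(label):
    print(f"U+{ord(ch):04X}")
```

If the same code points show correctly on one screen and as squares on another, the stored text is probably intact and the difference lies in how each screen renders it.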

Cheers,
Alex


alex,

this is most certainly a collect bug. it's been filed at
http://code.google.com/p/opendatakit/issues/detail?id=303. i've got a
pretty good idea what the problem is so shouldn't be too hard to fix.

yaw


Thanks Yaw - if/when you get an updated version that you'd like
testing, just let me know :)

Cheers,
Alex
