Dari and Pashto in ODK?

todd,

the official android rom can display most unicode character sets, but
there are two caveats. first, the font on the device must support
those characters, and second, the renderer must be able to draw
connected glyphs. the font is a relatively easy fix, but the rendering
problem is a long-standing bug in android (see
http://code.google.com/p/android/issues/detail?id=5597). we build odk
for the official roms, so it has to be fixed there first.

that said, if you are using the open source cyanogenmod rom, they have
been making big steps forward on a lot of languages. i know they now
have a few of the complex languages working. for example, they have
arabic fonts, an arabic keyboard, and right to left support in most
applications. i don't think they have dari or pashto, but that is
something that the afghan community can contribute. once it's in the
rom, it's very easy for us to support it in odk. go to
http://forum.cyanogenmod.com/ to get started on that work.

as a stopgap, odk collect v1.1.5 supports using images, audio and
video as question text. if you can render your question as an image
and accompany it with an audio clip, then that gives you complete
language support without waiting for the android fixes. unfortunately,
this doesn't help with the keyboard problem, but we like to fight one
battle at a time...
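
a minimal sketch of what such a media-backed question label could look like, assuming the itext media convention from the ODK XForms documentation (a value with form="image" or form="audio" pointing at a jr:// URI). whether v1.1.5 expects exactly this syntax is an assumption, and the helper name, text id and file names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def media_text(text_id: str, image: str, audio: str) -> str:
    """Build one itext <text> entry that points at an image and an audio clip.

    Assumption: media files are referenced as jr://images/... and
    jr://audio/... URIs, as in the ODK XForms documentation.
    """
    text = ET.Element("text", id=text_id)
    ET.SubElement(text, "value", form="image").text = f"jr://images/{image}"
    ET.SubElement(text, "value", form="audio").text = f"jr://audio/{audio}"
    return ET.tostring(text, encoding="unicode")


# Hypothetical question id and media file names.
print(media_text("/survey/q1:label", "q1_dari.png", "q1_dari.mp3"))
```

the image carries the rendered dari or pashto text, so nothing in the form itself depends on the device font or renderer.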

hope that helps.

yaw

On Mon, Nov 1, 2010 at 11:57, Todd Huffman wrote:
> Hello Yaw
>
> I'm working on another ODK survey in Afghanistan, this one will have
> some non-english readers. I'll need to have questions in Dari and
> Pashto script. From what I can tell in the docs, ODK can display
> questions in any character sets supported by unicode, is that correct?
>
> I'd also like the pop up keyboard to display the Dari or Pashto
> alphabet, is this possible?
>
> The documentation around Android / ODK for these languages is sparse /
> non-existent, so I am hoping you can clue me in to any major issues
> and possible pathways before I start.
>
> Thanks,
>
> -.- .--- -.... .--- --.- --.-
>
> Todd Huffman
>
> HuffmanTM@gmail.com
> Office: (765) 633-2691
> Twitter: @toddhuffman

Hi Todd,

We're using ODK in India and had the same issue with Android being
unable to display Indian fonts.

We found that transliterating the language into English letters has been
working fine for our enumerator staff, even though a few of them don't
speak English. We had them read a transliterated Kannada question as
part of the interview process (though almost all struggled with it in
the interview context - we just wanted to see if they could at least
start to sound the words out). Then it took a few extra days of
training, some one-on-one help, and sending them home with the survey to
study, but they've all, two weeks later, become very proficient at
reading the transliterated questions. However, this assumes that they
have at least learned English letters at some point, even if it was many
years ago. If the enumerators can at least sound out a word or two at
the start and you have some extra days of training, it should be enough
for them to be able to pick it up.

Cheers,
Emily


Thanks for the info Yaw,

Since this appears to be on a boundary of utility right now I'd be
willing to hire someone to help figure out exactly what the best
strategy is. If anyone is interested in doing some side work ping me
off-list.

Background: I used ODK on 10 phones during the recent Afghan
elections to bring in reports of election fraud, but only with English
speakers. I want to expand out to non-English speaking Afghans for
obvious reasons, but Dari / Pashto support isn't well documented /
supported. The phones I've been able to get locally in Kabul are the
Sony Ericsson Xperia X10 Mini Pro (they're coming in through Dubai).

I'm working with several groups in Afghanistan interested in using
ODK, including Small World News, Free and Fair Election Foundation of
Afghanistan, and Pajhwok News, so better language support would help
out numerous projects.

Cheers,

-.- .--- -.... .--- --.- --.-

Todd Huffman

HuffmanTM@gmail.com
Office: (765) 633-2691
Twitter: @toddhuffman


hi,

Maybe a bit late, but I have something new to add to this topic. I've been
doing some tests to use Farsi in ODK, and yesterday I managed to
display some Farsi characters within ODK-Collect. See attached image.

The tools I used:

.- Cyanogenmod 4.2 installed on an Android HTC Dream (G1), running Android 1.6
.- ODK-Collect 1.1.4
.- Arabic & Farsi fonts manually installed to the device.
.- some farsi keyboard from the market (I was using this one:
https://market.android.com/details?id=kasiltech.keyboard but I guess
any would work)

I don't even know what I typed in the form, since I don't speak any
Farsi, but it comes from some specs which are supposed to be in Farsi
;)

The main problem, and this is a question for the list (although it's not
100% odk, sorry), was getting the characters into a form the XML parser
accepts. I used native2ascii to convert the Farsi text, but Odk
(javarosa?) is not able to read the \uXXXX escapes that native2ascii
produces, so I had to manually convert them to an xml-readable format
(e.g. from "\u067e" to "&#x067e;", which displays as "پ").

it would be great if someone knows a better way to convert farsi to
some utf-8 xml readable format..

I'm aware I'm not using the latest Cyanogenmod, but I'm sure it has to
be possible with newer versions.

On the other hand, according to Google, Arabic is now supported
(as of two weeks ago):

http://developer.android.com/intl/de/sdk/android-2.3.3.html#locs

but apparently there are already some issues (and it's not farsi..):

https://code.google.com/p/android/issues/detail?id=14654

cheers,

pau.

[attached image: Farsi characters displayed in ODK-Collect]


hey pau,

the problem is that native2ascii doesn't provide xml-safe characters.
not much javarosa or odk can do about that.

http://rishida.net/tools/conversion/ is an online tool, but it should
give you the hex ncrs you need.

another option is to write a script that takes the native2ascii output
and makes it xml safe. i haven't thoroughly tested this, but it should
give you the general idea.

# create ascii version of in.txt and write it to out.txt
native2ascii -encoding utf8 in.txt out.txt

# replace \u with &#x. put ; after the four char code.
# (-i '' is the bsd/mac form; gnu sed takes -i with no argument)
sed -i '' -e 's/\\u/\&#x/g' -e 's/\(x[0-9a-z]\{4\}\)/\1\;/g' out.txt
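
the same idea can be sketched in python, assuming python 3 is available. this is only an illustration, not something odk or javarosa ships, and the function names are made up. the first function rewrites the \uXXXX escapes in one pass, so an ordinary word that happens to contain an "x" followed by four letters is left alone. the second skips native2ascii entirely and turns each non-ascii character straight into a hex ncr.

```python
import re

# Matches the \uXXXX escapes that native2ascii writes (exactly four hex digits).
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def escapes_to_ncr(text: str) -> str:
    """Turn native2ascii-style \\uXXXX escapes into XML hex character references."""
    return _ESCAPE.sub(r"&#x\1;", text)


def text_to_ncr(text: str) -> str:
    """Replace every non-ASCII character with an XML hex character reference."""
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):04x};" for ch in text
    )


# Both routes give the same reference for the Persian letter peh (U+067E).
print(escapes_to_ncr("\\u067e"))
print(text_to_ncr("\u067e"))
```

to convert a whole file, read it with open("in.txt", encoding="utf-8"), pass the contents through text_to_ncr, and write the result out. the file names here are placeholders.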


thanks yaw,

I'll take a look at the conversion tool and the sed script you sent
me. The script option was always there, but I wanted to be sure I
wasn't missing anything... now you've confirmed that native2ascii is
the problem..

most important for me was to let the community know it's possible to
use some combination of ODK-Collect and Farsi characters, something
that, as far as I know, no one tested before..

best,

pau.


Hello,

I am working on the Dari and Pashto translations. I was sick for some time and was not able to continue this excellent task. I hope to get well soon and start translating the remaining part.

Thank you,
Akbarzai