thanks yaw,
I'll take a look at the conversion tool and the sed script you sent
me. The script option was always there, but I wanted to be sure I
wasn't missing anything... now you've confirmed that native2ascii is
the problem.
The most important thing for me was to let the community know that
ODK-Collect can be used with Farsi characters, something that, as far
as I know, no one had tested before.
best,
pau.
2011/2/25 Yaw Anokwa:
> hey pau,
>
> the problem is that native2ascii writes java-style \uXXXX escapes,
> which an xml parser treats as literal text, so they aren't xml-safe.
> not much javarosa or odk can do about that.
>
> http://rishida.net/tools/conversion/ is an online tool, and it should
> give you the hex ncrs you need.
>
> another option is to write a script that takes the native2ascii output
> and makes it xml safe. i haven't thoroughly tested this, but it should
> give you the general idea.
>
> # create ascii version of in.txt and write it to out.txt
> native2ascii -encoding utf8 in.txt out.txt
>
> # replace each \uXXXX escape with the xml hex reference &#xXXXX;
> # (-i '' is the bsd/macos form; gnu sed takes plain -i)
> sed -i '' -e 's/\\u\([0-9a-fA-F]\{4\}\)/\&#x\1;/g' out.txt
>
>
> On Fri, Feb 25, 2011 at 01:44, Pau Varela wrote:
>> Main problem, and this is a question for the list (although it's not
>> 100% odk, sorry), was converting the Farsi characters into something
>> odk can read. I used native2ascii, which produces \uXXXX escapes
>> rather than utf-8. Odk (javarosa?) is not able to read the output
>> produced by native2ascii and I had to manually convert it to some
>> xml-readable format.. (e.g: from "\u067e"
>> to "&#x067e;")
>
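
A rough Python sketch of the same conversion, for cases where sed is not handy. The function names are made up for illustration and are not part of ODK, JavaRosa, or native2ascii. The first function assumes the input contains only four-digit \uXXXX escapes (characters outside the Basic Multilingual Plane, which native2ascii writes as two escapes, are not handled). The second skips native2ascii entirely and works on the original text.

```python
import re

# One native2ascii escape: a backslash, "u", and exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def escapes_to_ncrs(text):
    """Turn \\uXXXX escapes into XML hex character references (&#xXXXX;)."""
    return _ESCAPE.sub(r"&#x\1;", text)


def text_to_ncrs(text):
    """Encode every non-ASCII character as a hex character reference."""
    return "".join(c if ord(c) < 128 else "&#x%04x;" % ord(c) for c in text)
```

For example, escapes_to_ncrs(r"\u067e") and text_to_ncrs("پ") should both give "&#x067e;", while plain ASCII text passes through unchanged.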