Setting up ODK for a household survey in urban India

Hi Implementers,

I documented our experience using ODK for our survey in southern
India. It's below, or here:

Hopefully these notes will help others looking to use ODK in thinking
through all of the steps.


------------------

Setting up Open Data Kit (ODK) for use in a survey of households in
urban India

By Emily Kumpel

For the University of California, Berkeley study "Impact Evaluation of
Intermittent versus Continuous Water Supply in Hubli-Dharwad, India"

We are conducting an impact evaluation of the health, economics, and
water quality effects on households of a shift from intermittent to
continuous water supply in a small town in south India. I wanted to
write a narrative of sorts about our experience using Open Data Kit
(ODK) for our study. It is a
great tool, and a lot of how we went about the process of incorporating
ODK into what is normally a paper-based survey system was discovered
through trial and error. I thought I'd outline the steps we went through
and some comments on what I feel we would've done differently if we were
to start over, if it helps others get through the process of using ODK
for a study.

We had heard about the idea of using mobile phones/PDAs for collecting
survey data from friends, who had experience with mostly commercial
software for similar household surveys. The most frequent advantage
cited was the instantaneous availability of data, allowing the research
teams to continuously monitor what was happening, fix issues with
data collection while it was still occurring, and present
data very soon after completion of the study. After our experience
running significantly (by an order of magnitude) smaller-scale studies,
also in India, and having to deal with pile-ups of paper and data entry
woes, we decided to at least consider an electronic method.

  1. Design your study. I'm presuming this is all being done on a
    separate track from ODK. The relevant things you need to know about your
    study before deciding to use ODK: location/context (is there an easily
    accessible source of power for charging the phones frequently enough? What are
    conditions for enumerators going to be like? Is there technical
    knowledge/ability among the field supervisor staff?), study size and
    length of time (relevant for hardware selection and budgeting).

  2. Decide whether ODK is for you. We looked into several other
    software/hardware options, including Windows Mobile, Nokia, and iPhone
    -- MobileActive's guide Comparing Mobile Data Collection Tools
    has a great listing of the options available (this didn't exist when we
    were making our decision). We downloaded samples, including ODK, to try
    it out, and looked at the following factors that were most relevant for us:

a. Budget -- this was our biggest consideration, since we are operating
on a very tight budget. The quick estimate we got for commercial
software (Windows Mobile, iPhone) was at least US$7,000. After pricing
out the costs of phones and software, ODK was a clear winner. After our
sample size doubled from 2000 to 4000 households a few months into setup
(after we had already decided to use ODK), we looked at our budget and
discovered that the cost of the hardware (Android phones) was less than
what the photocopying/data entry costs associated with a paper-based
system would have been.

b. Language support -- Our study is in the state of Karnataka, and our
survey is administered in the language Kannada. Currently, Android does
not support many complex language fonts, including various Indian
languages (see thread in ODK Implementers list).
However, we're in an urban setting and found that most of our enumerator
staff would have at least basic knowledge of English letters. Also, many
people in India are familiar with transliteration -- turning Kannada
sounds into English letters -- as it's common in text messages and store
signs. We talked with many Kannadigas and decided transliteration of
Kannada would work for our enumerators (part of our interview process
for hiring involved reading a bit of transliterated Kannada).

c. Usability for research team -- I was the 'programmer' (by programmer,
I mean I used to change up the html on geocities pages back in '97, have
taken classes that involved FORTRAN and MATLAB, and am reasonably adept
at looking at existing code and using cut-copy-paste). While we were
still deciding on whether to use ODK, I tried out the demo forms, made
adjustments and used the basic features, loaded it onto a borrowed
Android phone, and set up Aggregate. I figured if I could get through
this in a day, combined with the helpful team on the ODK listservs and
some friends in Berkeley's CS department, I could manage setting up ODK.

d. Features -- Our most important requirement was skip logic -- in fact,
that was one of the main reasons we wanted electronic data collection.
Our survey is extremely complicated, as we're trying to document things
like the various sources of water that people use and many other
features of such a supply. This translates into >8 different options for
a source of water, with 4-20 questions after they say 'yes' to any one
of them. This level of complexity would have been impossible in paper
form, so the main feature requirements for us were skip logic and the
ability to input numbers/decimals/text; everything else (constraints,
GPS, etc) was icing.

  3. Make the basics work. Once we made the decision, I needed to teach
    myself how to use ODK, and also be able to let my other team members
    know what was and was not possible. To get set up with the basics, I set
    up a 10 question survey that used all of the types of questions we
    intended to use. The steps I went through:

a. Find an Android phone to borrow or use the emulator (I followed these
instructions to
the letter and it worked).

b. Download ODK Collect on the phone/emulated phone through the
marketplace or the website directly (often the marketplace didn't work
on my emulator, so I just navigated to the website on the phone's
browser and downloaded) using these instructions.

c. Try out the sample forms on the ODK appspot server. Download them to
your device, fill them out, send them back to the server, view the data
on the server. Get a feel for how they work.

d. Set up ODK Aggregate (optional). This wasn't necessary, but I was in
the US when I got started on this, had plentiful internet, and thought
I'd be using it, so I set it up and tested it on Google appspot (we
never even ended up using anything that involves the internet for our
actual study).

e. Download one of the sample forms. Upload it to your Appspot site by
using the 'Upload form' option. Change the server on your ODK Collect
(phone or emulator) to your own appspot site address (press 'menu' when
you're on the main screen), find your form, and bring it to your device.
Try out the form, send data back, view it on your appspot. Congrats! The
groundwork for everything you'll do is set.

f. Read up on the XForm tutorial
and follow the examples. Try your own forms through the whole system.
There's now ODK Build, but I knew we'd be using very complicated logic
which is not supported in Build at this time, so I decided to code from
scratch.

g. Edit your sample forms, incrementally. Since I didn't really know
what I was doing, I found it was best to change just one question at a
time, constraint, question type, etc, so when something went wrong
(which it, of course, did nearly every step of the way as I made mistake
after mistake), I knew exactly where to find the problem. Don't worry,
there will be lots of crashes and problems, but that's learning! Just
take it slowly and be patient. And use the Validator.

h. Make a 10-question form that includes all of your survey question
types. I went through the pilot questionnaire (at this point, the final
one was not yet finished by my colleagues) and chose 10 questions that
incorporated all the types of questions and constraints I was likely to
have (multiple choice, text/integer/decimal entry, GPS, skip logics with
4 relevancies, various constraints etc.). I worked on getting a form to
work that incorporated all of these. From then on, I just copied and
pasted from this form to make the full survey.

  4. Decide on your hardware. I looked through the relevant question in
    the FAQs and at pricing, and decided, based on cost and on others'
    experiences with ODK in the field, to go with MyTouch phones. We
    ended up needing more
    phones when we were in India, so we got one of the 3 Android options
    readily available in our town -- the Sony Ericsson Xperia x8. Again,
    cost was our biggest concern. It's worked fine, though recently we're
    having some trouble with the GPS.

  5. Decide how data will be transferred between Androids and a
    computer. This is dependent on the design of your field logistics. Are
    the enumerators bringing the phones back to a central point daily?
    Weekly? Will you have a computer at that point? How often do you want
    data transferred? Also, are there privacy concerns for your data? Who
    will be there when something goes wrong with whatever system you have
    set up?

a. No internet?: As a team we decided on logistics: that the phones
would be handed in by the enumerators each evening and given back each
morning. This meant that there was no reason for us to even have SIM
cards in the phones or use the internet for our transfer of data at all.

b. Aggregate: Aggregate is not (yet) secure on its own, so we would've
needed to add security to it ourselves and figure out how to put it on a
computer. I wasn't sure I was easily capable of this, so I'd need to
call on a friend to help. While possible, this was not ideal -- since I
was the one in the field, I wanted to be capable of handling any issue
that came up.

c. KoboPostProcessor: I discovered the best solution for us was
KoboPostProcessor. This takes the forms produced by ODK Collect and
transcribes them into a .csv file. I made the decision to use this and
shelved actually setting it up and figuring it out for later (of course,
it would have been better to at least have gone through the process of
getting it to work a few times before deciding to use it!).

6. Convince your survey writers to give you the 'final final' survey
questionnaire. Our survey went through an extensive 2-3 months of piloting --
all in paper form -- before it even got to a 'final form'. Our survey is
extremely long, and divided into 10 sections, and some sections were
finished before others. I required at least 95% completion of sections
before I started programming, as I knew it would be a lot more work for
me to make changes later. It's tempting when things are not printed out
in paper to keep making changes until the last minute, but I think it's
important to set deadlines and constraints about how much can be changed
after a version is given to the 'programmer'.

7. Code your survey.

a. Decide how you'll handle language issues. Since the survey was
still undergoing piloting and the translation was not yet done, and I
don't speak Kannada, everything was coded initially in English. We
decided that our survey was incredibly long, and since others had
reported running into crashes with long surveys, we wanted to keep the
English and Kannada versions entirely separate. Also, we knew almost all
of our surveys would be in Kannada, so there likely wasn't going to be a
need for a separate English survey. ODK has great functionality for
changing languages within the form, but we didn't use any of that. I
completed everything in English and copied and pasted the Kannada
language in over the English later.

b. Keep things simple. We wanted the screens to be as uncluttered as
possible, and the questions to be as simple and plain as possible. A lot
of this is survey craft, but relevant for ODK:

  • i. Question numbering. All of the questions in the paper survey
    were numbered, since it's necessary for paper-based skip logic. However,
    we left the numbering out of the ODK form because there was really no
    need for it -- the skip logic would be taking care of itself, and it
    just added more text to the screen that there was no reason for the
    enumerators to need to see or keep track of.

  • ii. Groupings. While for ourselves, we had sections like 'Child
    Health' or 'Water Treatment,' and our first version of the ODKized
    survey had these as 'groups' that then showed at the top of the screen,
    we realized there was really no reason for the enumerators to need this
    information and it just added more things that cluttered up the screen.
    It was useful for us, early on in the programming and testing phase, but
    we got rid of it for the final form.

  • iii. Hints. Many of our questions had hints like 'Enter 99 if don't
    know' or 'DON'T read the options'. The first version was full of long,
    convoluted variations of these, so we standardized the 4-5 basic
    messages that appeared throughout the survey and made them as short as
    possible.

  • iv. Constraint messages. While it's great that you can put in
    specific constraint messages, we felt they flashed for too short a time
    for our enumerators to read and process (especially considering they
    would have been in transliterated Kannada, which is slower for reading).
    We started out with specific constraint messages but ended up
    simplifying and just leaving the default there for everything except one
    question where we wanted to be specific.

c. Annotate your survey document. This is likely a personal
preference and I don't know if this will work for everyone, but it
certainly did for me. Our survey was being written and piloted by
printing it out in Word. I took the Word document and started making my
own 'coding' annotations to it. In almost all questions, my colleagues
had put in the skip logic and constraints. Skip logic in a paper
questionnaire is generally backwards from the way ODK thinks about it.
In a paper questionnaire, the answer to question 4.1.1 might be: Yes (if
yes, skip to question 4.1.3). ODK works backwards, and instead you need
to append to question 4.1.3 that it is only a relevant question if
4.1.1=yes. Annotating in Word was necessary for me to keep all of this
straight. Before each question in the Word doc, I'd write (in a
different font) notes for myself: the name of the variable (CostWater),
the relevant parameters (int, constraint <10000; relevant if PayWater=1,
hint: '99 if don't know'), and the values that would be given for each
multi-choice. We also knew we'd be teaching the enumerators that any
'don't know' in a numerical input question would be '99', so I made sure
our constraints accounted for this.
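To make the inversion concrete, here is a minimal Python sketch (plain Python, not ODK form code) of how a paper-style "skip to" rule becomes a relevance check attached to the later question, and how a constraint can still admit the '99 = don't know' code. CostWater and PayWater come from the annotation example above; the function names, the lower bound of 0, and treating 1 as 'yes' are assumptions made for illustration.

```python
# Sketch only: models "relevant" and "constraint" checks in plain Python.
DONT_KNOW = 99  # code enumerators enter for "don't know"


def is_relevant(question, answers):
    """Decide per question whether it should be shown at all.

    On paper, PayWater would say "if no, skip CostWater"; here the rule
    lives on CostWater instead: it is shown only if PayWater == 1.
    """
    if question == "CostWater":
        return answers.get("PayWater") == 1
    return True


def within_constraint(value, upper):
    """Accept 0 <= value < upper, and always accept the don't-know code."""
    return value == DONT_KNOW or 0 <= value < upper


def questions_to_show(answers):
    """Walk the questions in order and keep only the relevant ones."""
    return [q for q in ("PayWater", "CostWater") if is_relevant(q, answers)]
```

With an upper limit of 10000 the value 99 passes anyway, but for a question with a tighter limit (say, under 30) the explicit don't-know exception is what keeps 99 from being rejected.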

In retrospect, while annotating was a great idea, doing so in Word and
doing so while the survey was still being amended was not. It made
changes hard (if my colleagues wanted to make a change to a question,
they'd make it on their version, do 'track changes' and then I'd have to
go back to my version and make the same change; vice versa for when I
found mistakes in their skip logic, etc.). I couldn't annotate directly
to their document, since it was in a format useful for printing it
out daily and using it in the field for piloting. I think what I would
recommend is using either Excel or at least a table in Word to keep
track of it. I haven't thought this through, but in the next month I'll
be programming an updated survey document for Round 2 and I'll
append this if I find a better system.

In the end, it was great we had this annotated document, and I made sure
that everything that ended up in the coding of the form was also
reflected in the document. This made pulling out the information to make
a codebook easy.
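As a rough illustration of keeping the annotations in a table and pulling a codebook out of it, here is a small Python sketch. The column layout and the PayWater value labels are hypothetical; only the CostWater annotation (int, constraint <10000, relevant if PayWater=1, hint '99 if don't know') is taken from the example above.

```python
import csv
import io

# Hypothetical annotation table: one row per question.
ANNOTATIONS = [
    {"name": "PayWater", "type": "select1", "relevant": "",
     "constraint": "", "hint": "", "values": "1=yes; 0=no"},
    {"name": "CostWater", "type": "int", "relevant": "PayWater=1",
     "constraint": "<10000", "hint": "99 if don't know", "values": ""},
]

FIELDS = ["name", "type", "relevant", "constraint", "hint", "values"]


def codebook_csv(rows):
    """Write the annotation rows out as CSV text, one line per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The same table could be opened in Excel and shared with the survey writers, which is the kind of single source that would avoid keeping two Word versions in step.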

d. Adjust all questions to work with ODK. Through the annotations
process, you'll find some questions whose format or skip logic
won't really work with ODK.

  • i. Range answers. One that came up many times in our survey was
    'range' answers. For example, a question like 'how many days ago did you
    last collect borewell water' might have an answer of 3, 4, or 3-4 days.
    We would have to do this using a text input, so the enumerators would
    have to switch from the text to the numerical keyboard to type "3-4".
    Adding symbols complicates matters too, and since most of the answers to
    this question were not a range, we didn't want to complicate things by
    having them switch between keyboards. We came up with a few solutions:
    1) Turn it into a multi-select (with the values of 1, 2, 3, 4, etc.) and
    instruct the enumerators to select both '3' and '4' if the answer was
    3-4; 2) Make single-select 'buckets' like 'less than 2 days', '2-4 days',
    etc.; or 3) Use a decimal input and instruct enumerators to use '3.5' in
    the event a respondent reported 3-4 days. The solution we used varied
    depending on the question and the resolution of answers that we got.
  • ii. Tables. One of our questions in the paper format involved a
    table, where the enumerators would fill in information about water
    containers, including sizes, materials, shape, etc. A table worked great
    on paper, but is hard to translate into ODK. I first started with a
    repeating group, where each of the questions (size, material, shape,
    etc) was asked in turn. This worked well until I discovered that
    KoboPostProcessor would not work with repeats (see my comment above that
    I should've tried it out much, much earlier). I then made it into a
    forced repeat (rather than using a loop, just asking the questions again
    and again). But since we were allowing for up to 15 different storage
    container combinations, this grew quickly into an extra 75 questions,
    which led to worries about crashes due to survey length (we were also
    using the previous version of ODK Collect -- maybe the new one could
    handle this?). We ended up deciding that 1) we didn't need this
    information from all 4000 households; 2) it was much, much easier to
    fill this information out in table form; and 3) we were nervous about
    the enumerators holding the phones over the water storage containers as
    they measured the tops and sides of them. So we decided to have them
    fill out this information in a 1-page paper format at only their first
    house of the day. Some of our other info for houses is on paper (name
    and address information and consent scripts (an ethics committee
    protocol), sheets that household IDs are checked off of, etc.), so the
    enumerators were used to having some paper out during the survey, and we
    were set up for a small amount of data entry.
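For the range answers in item i, the three encodings end up in different shapes in the data, so at analysis time they need to be brought back to a single number. A minimal Python sketch of one way to do that follows; it assumes a multi-select answer arrives as space-separated values (e.g. "3 4"), and the bucket codes and midpoints are hypothetical.

```python
# Hypothetical codes for the single-select 'buckets' and the value used for each.
BUCKET_MIDPOINTS = {"lt2": 1.0, "2to4": 3.0}


def days_from_multiselect(selected):
    """Option 1: '3 4' (both boxes ticked) becomes the midpoint 3.5."""
    values = [int(v) for v in selected.split()]
    return sum(values) / len(values)


def days_from_bucket(code):
    """Option 2: look up the assumed midpoint for a bucket code."""
    return BUCKET_MIDPOINTS[code]


def days_from_decimal(value):
    """Option 3: enumerators typed 3.5 directly, so just convert it."""
    return float(value)
```

Options 1 and 3 give the same number for a '3-4 days' answer, while option 2 trades resolution for a simpler screen.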

e. Work on small sections first. Luckily our survey was already
broken into smaller sections, so I coded forms individually by section.
It made it much easier to debug than a huge long form would have been.
If there were relevancies that depended on previous sections, I left a
commented note to do this when I integrated them. I also named each with
different groupings (Section 5, Section 6, etc) to make it easy to
navigate. For our complicated issues with relevancy and constraints, I
mostly borrowed from the great example of the icmi form (on the 'example
forms' page on the ODK site). Some of the things I tried worked, some
didn't; when I came upon something that I wasn't sure was possible, or
that left me feeling like it was a huge hassle/danger of leading to
fatal crashes, I asked my friends to change up the survey to accommodate
easier logic (adding a new question, breaking up one question into
several, etc).

f. Extensively test sections. Our survey includes complicated skip
logics. I enlisted the help of my colleagues and everyone else who was
around to test every possible permutation. When there were crashes
(there always were), I'd go to the code of the screen that it had
crashed near and try to find the problem.
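One way to make sure no permutation is missed is to list them mechanically and tick them off while testing on the phone. The following Python sketch generates every yes/no combination for a set of water sources together with the screens that should appear; the source names and screen labels are hypothetical, and the real survey had more than 8 sources (2^8 = 256 combinations for the yes/no answers alone).

```python
from itertools import product

# Hypothetical subset of the water sources asked about.
SOURCES = ["tap", "borewell", "tanker"]


def expected_screens(answers):
    """Screens that should appear for one set of yes/no answers."""
    screens = []
    for source in SOURCES:
        screens.append(f"uses_{source}")
        if answers[source] == "yes":
            # Follow-up questions are relevant only after a 'yes'.
            screens.append(f"{source}_followups")
    return screens


def all_test_cases():
    """Yield every yes/no combination with its expected screen list."""
    for combo in product(["yes", "no"], repeat=len(SOURCES)):
        answers = dict(zip(SOURCES, combo))
        yield answers, expected_screens(answers)
```

Printing the output of `all_test_cases()` gives a checklist that testers can split among themselves.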

g. Slowly integrate sectional forms together. I started to bring
sections into one 'master' form and tested extensively at each
step of the way. No, testing again and again is not fun, but every time
I'd get ambitious and confident, I'd end up with a bug I couldn't trace.

h. Get your translation right. This isn't an ODK issue, but we
wished we had had this advice beforehand to make sure we put the time in
our schedule. We sent our survey out for help with translation to some
students who had done some translation work for us in the past. We also
got it independently back-translated. When our field supervisor came on
board, he tore it apart and we went question by question with him making
sure the survey questions were asking what we wanted them to ask (the
answer 'buried' for the question 'where do you put
your child's feces' had been translated into the Kannada for 'on your
ancestor's grave'). While sending it off and not being a part of the
process is a good first cut, make sure you account for the copious
one-on-one time needed with someone who speaks both languages and
understands what it is you're trying to get at with your questions.

i. Extensively test final form. My colleagues made a flow chart of
the especially tricky skip logic sections and went through step by step
to make sure everything checked out with ODK.

8. Set up data transfer. We used KoboPostProcessor. First I hooked
up the phone, navigated to the ODK>forms folder on the phone, and copied
the forms to a folder on my computer that I called "raw data". I then set
KoboPostProcessor to Transcribe from the "raw data" folder to a
"transcribed data" folder.

a. I didn't get the Sync feature of KoboPP to work, so a friend wrote
a simple Python script which could pull data from multiple phones at
once to the computer.
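A script along these lines might look like the sketch below. It is not the actual script; it assumes adb is installed and on the PATH, and the folder on the phone (REMOTE_DIR) and the per-phone subfolder layout under "raw data" are assumptions that would need to match your own setup.

```python
import subprocess

# Assumption: the folder on the phone's SD card that holds the filled-in forms.
REMOTE_DIR = "/sdcard/odk/instances"


def parse_adb_devices(output):
    """Return serial numbers of phones that `adb devices` reports as ready."""
    serials = []
    for line in output.splitlines():
        parts = line.split()
        # Ready phones appear as "<serial>\tdevice"; headers and offline
        # or unauthorized phones do not match this shape.
        if len(parts) == 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def pull_commands(serials, local_root="raw data"):
    """Build one `adb pull` command per phone, each into its own subfolder."""
    return [["adb", "-s", serial, "pull", REMOTE_DIR, f"{local_root}/{serial}"]
            for serial in serials]


def pull_all():
    """List the connected phones and copy the forms from each in turn."""
    listing = subprocess.run(["adb", "devices"], capture_output=True,
                             text=True, check=True).stdout
    for command in pull_commands(parse_adb_devices(listing)):
        subprocess.run(command, check=True)
```

Keeping the parsing and command-building separate from `pull_all()` makes it possible to check the logic without any phones attached.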

b. KoboPP does a great job pulling the data from the XML forms to a
.csv; however, for some reason it rearranges all the columns in a strange
order. To work around this, I added numbers in front of all of the
variables, e.g. A01, A02, B01 etc. (a fair amount of work, but I think
better to do after the form was programmed, in case there were changes
to the survey and re-numbering would have been worse). We would then
just re-sort the columns in Excel.
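The re-sort can also be scripted instead of done by hand in Excel. A minimal Python sketch, assuming every variable name carries a zero-padded prefix like A01 or B01 so that plain alphabetical order matches survey order (the variable names in the usage example are hypothetical):

```python
import csv
import io


def sort_columns(csv_text):
    """Return the same CSV with columns reordered alphabetically by name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    ordered = sorted(reader.fieldnames)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=ordered, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

For example, a file with the header `B01_CostWater,A02_Age,A01_HouseID` comes back with the header `A01_HouseID,A02_Age,B01_CostWater` and each row's values moved to match.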

c. We also found that KoboPP on my computer (Windows XP) would break
the answers of multi-select questions into separate columns (with
associated '1's and '0's), while our field supervisor's computer (Windows 7)
would not. My only guess is that this is a Windows issue, since we tried
it on several other computers and came up with the same results. We
ended up having to re-transcribe the first week's worth of data, since
we had switched the computer doing this operation a week into the
survey.
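If the two kinds of output ever need to be combined rather than re-transcribed, the unsplit form can be expanded into the 1/0 columns after the fact. This Python sketch assumes the unsplit column holds the selected values separated by spaces, which is how ODK stores multi-select answers in the form XML; whether KoboPP writes them to the .csv that way is an assumption, and the choice names are hypothetical.

```python
def expand_multiselect(value, choices):
    """Turn 'tap tanker' into {'tap': 1, 'borewell': 0, 'tanker': 1}."""
    selected = set(value.split())
    return {choice: 1 if choice in selected else 0 for choice in choices}
```

Applying this to each row gives one indicator column per choice, matching the split layout.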

d. KoboPP doesn't work with repeats in the forms, so we got rid of the few
of these we had.

9. Prepare the phones. We went through the phones and put ODK
Collect on all of them, changed settings (simple background, not
allowing a rotating screen, hiding all apps on the home page, putting
ODK Collect on every single page, airplane mode, etc). We then put our
forms on all of them by hooking them up to a computer and manually
putting the form into the ODK>forms folder. I've found that sometimes
the form doesn't entirely get onto the phone properly the first time --
I have no idea why, but sometimes, after putting a form onto the SD card
and trying to open the form in ODK, it would crash. So whenever we
loaded a new form to the SD cards of the phones, we'd run through each
survey at least once and save it to make sure it was working OK before
handing them to the enumerators.

10. Train the enumerators. Refer to Neil's excellent guide. We followed
this exactly, though with the first few days set aside for teaching the
enumerators to read transliterated Kannada.

11. Set up a daily system. The procedure for the end of each day: 1)
connect 4 phones through USB hub; 2) test that they're all there using
the android debug bridge (adb); 3) run the script to pull the data; 4)
count that the number of forms in the folder matches what was expected
(our field supervisor has a paper from the enumerator teams each day that
states the number of households each visited); 5) repeat for other 4
phones, and check numbers; 6) transcribe data and re-sort it; 7) check that
household IDs in the transcribed .csv match what was expected, and check
through data for errors; 8) charge phone batteries and delete data from
the phones; 9) password protect all data (complies with our CPHS

12. Biweekly checks. Every 2 weeks the enumerator teams finish a 'ward'
(geographically specified area). At this time we go through the data to
check that things are making sense and have enumerators return to
households to fix issues (the wonderful thing about electronic data
collection!), check the accuracy of GPS (enumerators re-take coordinates
if accuracy is >20m), and make maps of coordinates to make sure they
stayed within boundaries (and re-do households if they haven't).
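The GPS checks can be sketched in a few lines of Python. The 20 m threshold is from the text above; treating a ward as a simple latitude/longitude rectangle is a simplification assumed here (real ward boundaries would need a proper polygon test), and the record fields and coordinates in any example are hypothetical.

```python
def needs_gps_retake(accuracy_m, threshold_m=20.0):
    """Flag a reading whose reported accuracy is worse than the threshold."""
    return accuracy_m > threshold_m


def inside_bounding_box(lat, lon, box):
    """box is (south, west, north, east) in decimal degrees."""
    south, west, north, east = box
    return south <= lat <= north and west <= lon <= east


def flag_households(records, box):
    """Return {household id: [reasons]} for households that need a revisit."""
    flags = {}
    for record in records:
        reasons = []
        if needs_gps_retake(record["accuracy_m"]):
            reasons.append("retake GPS")
        if not inside_bounding_box(record["lat"], record["lon"], box):
            reasons.append("outside ward")
        if reasons:
            flags[record["id"]] = reasons
    return flags
```

The resulting list can be handed to the enumerator teams as the set of households to return to for that ward.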

Questions? Email me at ekumpel (at) berkeley (dot) edu

thanks emily! this really helps other implementers understand what it
takes to deploy odk.

i just blogged about this at and added
the link to the list of training materials at

as always, if there is something our team can help with, let us know!

(In reply to Emily Kumpel's post of Tue, Feb 15, 2011 at 03:10.)

Great writeup!

Just wanted to mention that the XForm building guide has moved to:


(In reply to Yaw Anokwa's message of Tue, Feb 15, 2011 at 7:24 AM.)