Early on, when the internet came to your house on an AOL CD through snail
mail, when the Spam Wars raged in every inbox, when dragging your 11-pound
laptop into a Guatemalan cyber cafe and asking to plug a Cat5 cable into
the network hub was both bizarre and obscene, I tried to change the
nomenclature of the day. You may recall that in the mid-90s no one quite
knew what to call this thing. Luddite news anchors would snark through
reports of new software, websites, scams and schemes of the nascent network
now ubiquitously called "The Internet". They called it "The Web, The Net,
The InterWeb, The World Wide Web, the WebNet" and many other abbreviations
and portmanteaus. As an early adopter, I thought it would be great to have
a cool way to talk about sending a message over the WebNet, and I didn't
like the word "E-mail". So, I started asking people to "Zap me with a
Zoltar." "Here's my address, firstname.lastname@example.org," I would say, "Zap me with a
Zoltar tomorrow, and we'll talk." This was still at the time when people
would ask you if there were any spaces in your email address, or which
letters were capitalized. They would tell you to go to a web site at
telling you the address. At the time, everything was so up in the air and
so dumb that calling emails *Zoltars* didn't seem all that crazy. Now, it
sounds stupid, I know, but there are still a few choice friends and family
members who will humor me to this day by zapping my inbox with a friendly
Zoltar.
My point here is that not all ideas are good ideas, even if they sound good
at the time. It's the spin cycle of heavy use that centrifuges out the
fluff and nonsense and leaves you with something you can use and depend on.
If you looked at all of the ODK users, I'm sure you would see a spectrum:
some are early adopters, and many just want something that
works. Me, I am a reckless early adopter, first in line for whatever
personal jet-pack, sub-dermal cerebral stimulator, or gadget that just
rolled off of the bespoke Akihabara electronics production line. And so it
was that I produced my first digital data collection system in 2006, to
support research on war refugees in Uganda
(http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1448368). This thing
had "early adopter" written all over it. We used smartphones that probably
cost $800 apiece once we had jacked them up with super-size batteries and
memory cards. They ran WindowsME (?!?!), and I built the survey in a sort
of chopped-down version of Visual Basic called Visual CE.
(I'm coming around to talking about ODK pretty soon here, so stick with me.)
This Uganda survey was a monster. On paper it was 25 pages long, and it had
its share of skip logic and data constraints. The software I used to build
it had a sort of weird conceit where every element you added (a label, a
list of options, a text entry box) was given an X,Y coordinate. That is,
every single item had to be plotted on a Cartesian plane. So, I had to
imagine the screen dimensions, place the elements for each question in that
"screen", and then when the user had completed the question, move their
view to the coordinates of the next "screen". As I added questions, the
Cartesian plane grew and grew and grew. In the end, the entire survey
covered a virtual surface that was as large as the side of a barn. It
pushed the hard limit of a 32,000-pixel maximum dimension. It was as if the
phone was a little window that the user looked through, and as they clicked
buttons, I would slide this barn sized survey around behind the window.
When you wanted to edit the thing, you had to drag around the Cartesian
plane until you found the one label or drop-down list you needed to change.
I still have nightmares about this thing.
Despite all of the trouble, the survey turned out great. We could query the
database after every data sync, and look for errors that could be fixed
before data collection was complete. This, at a time when people doing
research in developing countries were shipping hundreds of pounds of paper
back to their universities where graduate students chained in basements
performed double data entry for months. Principal Investigators would get
their first look at data collected over the summer while leaves turned
orange in the crisp autumn air.
This proof of concept was an incredible risk, an expensive and difficult
gamble that only in hindsight could be shown to save a ton of money and
time while providing faster and higher quality data. Way to go, Us! This
was when I worked at Tulane with Phuong Pham and Patrick Vinck, who now run
the KoBo Project (http://www.kobotoolbox.org/) at the Harvard Humanitarian
Initiative (http://hhi.harvard.edu/), and you have to give them credit for
foresight and stone-cold nerve in taking the risk to prove the point that
paper was outdated, inaccurate, and expensive.
The result was a bit of funding to really get going with digital data
collection, and to come up with a better system than the pixel counting
horror movie that I first used. I was charged with figuring it out, and my
first thought was that it had to be better and easier to work with. It had
to be open source, and it had to be adaptable to a variety of surveys. Some
poking around led me to Open Data Kit, and a few conversations with Yaw
sealed the deal. I wrote my next survey in XML, again for Uganda, and it
was many times easier. Even then, this was a risk. There weren't a ton of
people using ODK, but it had a university program behind it, so there
wasn't a danger of the old bait-and-switch to having to pay for it (looking
at you, Magpi). If you wanted to add something to it, you could edit the
code, and it had a charismatic leader in Gaetano Borriello.
I met Gaetano when we were both on a panel at the "Soul of the New
Machine: Human Rights, Technology & New Media" conference at UC Berkeley.
He embodied a virtue that I aspire to, Problem Solving Through Risk. As
previously mentioned in this discussion series, it was quite risky to
imagine that smartphones were the best solution to data collection in
developing countries at a time when only nerds and trust fundees had
smartphones, and everyone else thought that we should dumb down to collect
data on those little flip-phones. Gaetano knew that you aimed up at better
technology, not down at cheaper tech. I begged him to let me present
first, so that I could be the opening act instead of having to follow him
and the other luminaries on the panel. Afterward, we had this deep
troubleshooting conversation in which I presented a haunting ODK-related
issue. He leaned into it the same way I lean into a seemingly unsolvable
problem, joyfully grinding through the symptoms, ruling out unknowable
variables, and finally concluding it could only be a simple but unavoidable
hardware issue specific to that brand of phone. Few people can have this
kind of conversation; most people see a technical issue as a problem to
sweat through, not an opportunity to make things better.
Now, when you take up ODK Collect to do your data collection, it's as
smooth as silk. There is a huge community asking and answering questions,
and a ton of ancillary support mechanisms like KoboForm, ODK Aggregate, and
custom versions of Collect. My point is, because of people like Gaetano,
the road is smooth enough that data collection is democratized and open to
people who don't want to edit 5000 lines of XML by hand. They can just jump
in and get to work.
There are, however, always those persons who seek the limits of the field.
They will ask "Can it do this?" and "Can it do that?" until someone says
No. There was a lot of that early on in ODK and a lot of features were
added and extra tools developed to accommodate the needs of researchers.
Now, ODK Collect can do almost everything you can imagine.
But what if you want to query a database in the middle of a survey to
populate a list of choices? What if you want to modify the layout and look
of the question screen? What if you want to collect longitudinal data,
linking data collected last year with data you collect this year? ODK
Collect stores all of its collected data in discrete XML files; you can't
query them or do anything really clever. There is no real database behind
ODK Collect while you are working on the phone. If you want to take things
to the next level, Open Data Kit has your back! It's time to look at ODK
2.0.
I've been working with ODK 2.0 (https://opendatakit.org/use/2_0_tools/) for
a year now, and it has that seat-of-your-pants feeling of experimental
excitement that only comes with Problem Solving Through Risk, with trying
something new. I'm happy to say that the results of
using this in a very large and demanding field survey are very positive,
and I hope I can encourage more people to pick it up, and even to
contribute to its further development.
In brief, an NGO called PATH (http://www.path.org/) has a malaria
elimination program (http://sites.path.org/macepa/) whose data requirements
exceed the capabilities of ODK Collect. In a move whose sheer nerve I'm not
sure everyone fully understood, they decided to go with data
collection using ODK 2.0. They needed to be able to do things like query a
CSV file mid-survey, and they needed to record a ton of data about every
member of a household, and then later on do things like populate a list of
choices with "All Female members of the household between the ages of 12
and 49 who tested Negative for Malaria". This kind of complex work
requires a database behind the survey, and the flexibility to push the
boundaries of the survey's capabilities.
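For readers wondering what "a database behind the survey" buys you, here is a rough sketch in Python using a plain in-memory SQLite database. It is not ODK 2.0's API or schema: the table `household_members`, its columns, the helper `eligible_choices`, and the sample rows are all hypothetical, invented for illustration. It only shows how a filter like the one PATH needed becomes a single query once household data lives in a database rather than in separate XML files.

```python
import sqlite3

# Hypothetical schema: one row per household member. Table and column names
# are invented for this sketch; they are NOT the actual ODK 2.0 schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE household_members (
           household_id TEXT,
           name TEXT,
           sex TEXT,
           age INTEGER,
           malaria_test TEXT
       )"""
)

# Illustrative sample rows only, not real survey data.
conn.executemany(
    "INSERT INTO household_members VALUES (?, ?, ?, ?, ?)",
    [
        ("HH-001", "Member A", "female", 34, "negative"),
        ("HH-001", "Member B", "male", 36, "negative"),
        ("HH-001", "Member C", "female", 15, "positive"),
        ("HH-001", "Member D", "female", 9, "negative"),
        ("HH-001", "Member E", "female", 49, "negative"),
    ],
)


def eligible_choices(conn, household_id):
    """Hypothetical helper: names for a choice list of female members of one
    household, aged 12 to 49 inclusive, who tested negative for malaria."""
    rows = conn.execute(
        """SELECT name FROM household_members
           WHERE household_id = ?
             AND sex = 'female'
             AND age BETWEEN 12 AND 49  -- BETWEEN is inclusive in SQL
             AND malaria_test = 'negative'
           ORDER BY name""",
        (household_id,),
    )
    return [name for (name,) in rows]


print(eligible_choices(conn, "HH-001"))
```

With these sample rows the call returns `['Member A', 'Member E']`: Member B is male, Member C tested positive, and Member D is 9. In a real deployment the result would populate the choices for a survey question instead of being printed.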
Not only did PATH decide to go with ODK 2.0, they went so far as to develop
a super cool front end on top of ODK Survey that adds capabilities for
epidemiological sampling and navigation. You can even try it out yourself,
it's in the Google Play store (Yay, Open Source!) and it's called Episample.
(The developer is a wonderful Ethiopian gentleman named Belendia Serda, whom
you should all know as one of the single most talented ODK developers out
there.)
Taking the risk to develop custom software on top of ODK 2.0, and to deploy
it in the field in something as critical as a mass drug trial in a public
health program shows the kind of forward thinking and enthusiastic adoption
of open-source technology that has made ODK great in the last ten years.
The results have been very positive, and the data collected is able to
demonstrate the clear success of the malaria elimination program.
Now, I have a lot to say about the actual implementation, the pitfalls,
late-night struggles, hilarious gaffes and mistakes I made and fixed in the
service of bringing an ODK 2.0 project to shimmering life against
incredible odds with all the chips on the table at the 11th hour and
without a net, but I may have written more than I intended about the
philosophical idea behind such a scheme, and I think I'd be just as
well off leaving it open to questions from the peanut gallery. Since this
is a discussion series, let's discuss ODK 2.0:
- How do you decide if you should stick with ODK Collect, or go on to ODK 2.0?
- What are the best features of ODK 2.0 tools?
- What areas of development are most needed for wider adoption?
I would love to hear your questions and thoughts, so please Zap me with a
Zoltar and I'll answer with more directness and brevity than I have brought
to bear on my opening comments.
Best from Washington DC,