Enable Case Management/Preloading

Hello All :slight_smile:
we are doing case management in a complicated way because pulldata is always called again after the data has been changed.

Some ideas to solve it:

  • allow variables in the "Default" field, so you can pulldata into a variable and not into the field itself
  • allow an if without an else, or an else that is linked to the actual field
  • switch off the second call of the calculation field with a command in the calculation

What do you think about this?

Hello ODK Team,
just wondering if you think it's possible to implement...

Especially the if-condition without an else.
Case management could then be done by:

  • preloading the data in the label field
  • if the text-input field is empty (length==0) -> pulldata into the text field; otherwise no command is run and no calculation is done

This way data would only be pulled if the field has not been changed. This would be a great feature that would enable case management.
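
To make the rule concrete, here is a minimal Python sketch of "pull only when the field is still empty". It is not XLSForm syntax, and `preload_if_empty` and the `cases` data are made-up names for illustration only.

```python
def preload_if_empty(current_value, lookup, key):
    """Return the preloaded value only when the field is still empty."""
    if current_value is None or len(current_value) == 0:
        return lookup.get(key, "")
    # No "else" action: keep whatever the enumerator typed.
    return current_value


# Hypothetical preload data standing in for a CSV used with pulldata.
cases = {"case-001": "Family A"}

# First evaluation: the field is empty, so the value is pulled.
first_pass = preload_if_empty("", cases, "case-001")

# The enumerator edits the value; the second evaluation leaves it alone.
edited = "Family A (moved)"
second_pass = preload_if_empty(edited, cases, "case-001")
```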

Hi @Mic, I'm afraid I don't fully understand your specific technical suggestions. Case management can mean different things to different people and it would be great to get some specific examples from you and others of what you are trying to do. Perhaps we can start by putting the "how" aside and focus on what you could do in an ideal world.

In particular, it would be helpful to know:

  • the domain of the data you are collecting (health, tree conservation, ...)
  • do you need to fill the same form about the same entity (person, tree, etc) or different forms?
  • what is the workflow you use to assign specific forms or entities to enumerators?
  • at what frequency is new data collected?
  • does more than one enumerator collect information about the same entity?

If you can attach a form demonstrating the complicated strategy you are currently using, that would be very helpful. For anyone else interested in some kind of more refined preloading, it would be really helpful to get your use cases and answers to the questions above as well.

cc @Vanubhav @Snvssh4a2017 because I think Automatic default selection is similar

Hello Hélène,
Sorry for the late reply, I just returned from my summer holiday.
I implemented ODK for an NGO that supports refugees in Lebanon with various activities such as a school bus for children, teaching in embedded community centers, aid distribution, and case management for families that need special support for medical, educational, financial, or other reasons.

We have 4 production devices and 1 for testing/development, so it's small scale so far. Aggregate runs on Google Appspot (GSuite, the professional account that ensures privacy) and we use device management. We have 2 different GSuite accounts. One has very limited access and is used for administration and for storing private data as FusionTables in Google Drive. The devices are registered on the other account, where only the data required for preloading (.csv files) is stored on the Drive, as anonymized as possible.

There are 3 forms in production so far:

  • Event Reporting: if there is a distribution, an education event, or similar, we record a GPS pin, a description, the sponsor if somebody donated for it, and the type of event. A distribution to a case is also an event. We preload some event types and other select fields to make the form configurable, and we preload a list of cases and settlements. That works well because the preloaded values don't need to be modified.

  • Settlement Assessment: We go through the settlement and collect the names of the families, their needs, and infrastructure such as whether they have a toilet. We can select which aid they need and which projects are possible in this settlement. We preload settlement "names" (P-Codes defined by UNHCR) for data consistency, plus some select fields. We do not preload personal data as it changes too often, so we always make an assessment from a more or less empty form. This is working well.

  • Case Assessment: This consists of questions in the areas of health, WASH (hygiene), shelter, and so on, used to calculate a vulnerability score and to coordinate which aid this case needs. First this is done on an empty form. Then, over the time we support this case (family), we need to do at least 2 follow-ups where we go through the questions and modify the answers if needed, at least on a monthly basis. For this we need a preload that can be modified.
    We want to use the same form for "new cases" and "follow-up" as it is the same data and should be stored in the same place. Steps to solve the case, like distributions or medical support, will be tracked with the "Event Report" form; we can link them afterwards with FusionTables.

Now the problem with pulldata in "calculate" is that it's called twice: first when opening the form, which is absolutely fine, and second when the form is finished. With the second call it overwrites the edited values.
In an ideal world we could call if-functions and pulldata in the "Default" value field in the XLSForm. Another option would be to have a flag that disables the second call of the "calculate" field when the form is finished.

Still, the data can be stored locally and updated manually, so it does not need a direct connection to the database.

Phew, I hope this makes it clearer what is needed. Thanks for reading and thinking!

This has been a long time coming, but the ODK 1 TSC has started work on specifying case management!

@adam.butler presented his vision for this feature in this slide deck and over the next few weeks, he'll be writing up a spec that will guide the implementation.


@aurdipas and I have just been emailing about some of the details of the process, especially de-duping entities, and I figured it would be good to continue the discussion here and get some more ideas and contributions.

A quick summary of the slides:

  • "case management" is taken to mean (a) defining entities and then (b) making multiple temporally distinct reports on those entities
  • this is implemented using two forms, Form A which stores responses as entities, and Form B which requires that an entity is selected from a list before it can be filled; the response to Form B includes the UUID of the relevant entity
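
As a rough illustration of the two-form idea (not the actual spec), the relationship could be modelled like this in Python. All names here (`Entity`, `Report`, `submit_form_a`, `submit_form_b`) are hypothetical, and the in-memory dict and list stand in for whatever storage the server would really use.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Entity:
    """A response to Form A, stored as an entity with its own UUID."""
    data: dict
    entity_id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class Report:
    """A response to Form B, always tied to one entity's UUID."""
    entity_id: str
    data: dict
    submitted_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


entities = {}  # entity_id -> Entity
reports = []   # all Form B responses, in submission order


def submit_form_a(data):
    """Register a new entity."""
    entity = Entity(data=data)
    entities[entity.entity_id] = entity
    return entity


def submit_form_b(entity_id, data):
    """Record a report; an entity must be selected first."""
    if entity_id not in entities:
        raise KeyError(f"unknown entity: {entity_id}")
    report = Report(entity_id=entity_id, data=data)
    reports.append(report)
    return report


# One entity, two temporally distinct reports about it.
patient = submit_form_a({"name": "Example Patient"})
first_visit = submit_form_b(patient.entity_id, {"visit": 1})
second_visit = submit_form_b(patient.entity_id, {"visit": 2})
```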

@aurdipas had two good questions:

  1. How do you transfer the already existing entity to the device?
  2. How can you avoid the same entity being captured on a second device (duplication)?

These are the answers that I gave, but I'd love to hear peoples' thoughts:

  1. I think we would use the kind of CSV preloading that is already available for options. It would probably also make sense to extend this so that it uses the same mechanism as the recent form update notifications, so that there is a reasonable guarantee that devices have the complete entity list.

  2. The auto-updates would go some way to resolving the duplication issue, but this is obviously not a satisfactory solution. Probably it would make sense to build some duplication detection and resolution into ODK Central. Ideally, it would only be possible to do data collection on entities that have come from Central, so that they will always have to go through this de-duping, but this is obviously not acceptable if I want to register a patient and then make a case report on them in a totally offline setting. I could see a possible solution using a kind of tombstone for de-duped entities, so that a process might look like this:

  • while offline, I register patient dd6c32a4 using Form A
  • dd6c32a4 is now marked as "pending" on my device, which means I can submit case reports against it, but it's not on Central
  • I then do a case report on dd6c32a4 using Form B
  • when eventually online, I submit both to Central
  • it turns out that patient dd6c32a4 is an exact duplicate of an existing patient, 19f44a40, who already has case reports
  • (more details about how exactly de-duping works here)
  • my case report is switched to refer to the existing patient, 19f44a40
  • patient dd6c32a4 is replaced in Central with a tombstone that refers to 19f44a40
  • all incoming case reports for dd6c32a4 will be switched to refer to 19f44a40
  • once my device has updated its entity list, I will no longer be able to make a case report against dd6c32a4
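
The tombstone steps above could be sketched roughly as follows. This is only an illustration under my own assumptions: the names (`tombstones`, `resolve`, `merge_duplicate`, `device_entity_list`) are invented, and real storage on Central would obviously not be Python dicts.

```python
entities = {"19f44a40": {"name": "existing patient"}}
tombstones = {}  # duplicate entity id -> surviving entity id
reports = []     # each report is a dict with an "entity_id"


def resolve(entity_id):
    """Follow tombstones until a live entity id is reached."""
    seen = set()
    while entity_id in tombstones:
        if entity_id in seen:
            raise ValueError("tombstone cycle")
        seen.add(entity_id)
        entity_id = tombstones[entity_id]
    return entity_id


def submit_report(entity_id, data):
    """Store a report, redirecting it if the entity was de-duped."""
    report = {"entity_id": resolve(entity_id), "data": data}
    reports.append(report)
    return report


def merge_duplicate(duplicate_id, surviving_id):
    """Replace a duplicate with a tombstone and re-point its reports."""
    entities.pop(duplicate_id, None)
    tombstones[duplicate_id] = surviving_id
    for report in reports:
        if report["entity_id"] == duplicate_id:
            report["entity_id"] = surviving_id


def device_entity_list():
    """The list a device would receive: live entities only."""
    return sorted(entities)


# Offline registration and case report, submitted later.
entities["dd6c32a4"] = {"name": "existing patient"}
submit_report("dd6c32a4", {"visit": 1})

# De-duping finds that dd6c32a4 duplicates 19f44a40.
merge_duplicate("dd6c32a4", "19f44a40")

# A report from a device that has not yet updated its entity list.
late = submit_report("dd6c32a4", {"visit": 2})
```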

For the specifics of the de-duping process, I would probably use a combination of approaches. First you need to find possible matches, probably using an n-gram algorithm (or possibly Levenshtein distances) on identifying fields such as name, village, etc. This is then combined with matches on other fields (e.g. date of birth or geopoint) to calculate a similarity score. You can then work out threshold values and say something like "if it's over 95%, just merge them automatically" and "if it's over 80%, flag them as probable dupes", and provide a simple interface that displays the data with yes/no buttons. I've done something like this for de-duping patient lists in DRC and it worked pretty well.
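
Here is one possible shape for such a score, as a sketch only. It uses Jaccard similarity over character trigrams as the n-gram measure, which is just one choice among several, and the field weights are numbers I made up for the example. Only the 95% and 80% thresholds come from the description above.

```python
def trigrams(text):
    """Set of character trigrams, padded so word starts count more."""
    padded = f"  {text.lower().strip()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def ngram_similarity(a, b):
    """Jaccard similarity of the two trigram sets, from 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


# Hypothetical weights; they would need tuning on real data.
WEIGHTS = {"name": 0.5, "village": 0.2, "date_of_birth": 0.3}
AUTO_MERGE = 0.95
PROBABLE_DUPE = 0.80


def similarity_score(a, b):
    """Fuzzy match on identifying fields, exact match on date of birth."""
    score = WEIGHTS["name"] * ngram_similarity(a["name"], b["name"])
    score += WEIGHTS["village"] * ngram_similarity(a["village"], b["village"])
    same_dob = a["date_of_birth"] == b["date_of_birth"]
    score += WEIGHTS["date_of_birth"] * (1.0 if same_dob else 0.0)
    return score


def classify(a, b):
    """Map the score onto merge / manual review / distinct."""
    score = similarity_score(a, b)
    if score > AUTO_MERGE:
        return "merge"
    if score > PROBABLE_DUPE:
        return "review"
    return "distinct"


# Made-up records for illustration.
p1 = {"name": "Marie Kabila", "village": "Goma", "date_of_birth": "1990-01-01"}
p2 = {"name": "Marie Kabilla", "village": "Goma", "date_of_birth": "1990-01-01"}
p3 = {"name": "Jean Mutombo", "village": "Bukavu", "date_of_birth": "1975-06-30"}
```

With these weights, `classify(p1, p2)` lands in the "review" band (a one-letter spelling difference with everything else equal), while `classify(p1, p3)` is "distinct". Records in the "review" band are the ones that would go to the yes/no interface.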

Another thing that @aurdipas suggested is that you could check through a list of entities before registering a new one with Form A, to make sure that the person/village/tree you're about to register doesn't already exist in the database, which is a good idea.

Any thoughts?


Our idea for contributing to the development and implementation of this module
is that ODK Collect sends a POST request to the Aggregate server,
through which the lookup methods are defined (QR code, barcode, or ID).
Selecting any of them gives access to the database and
returns the information you want to look up. You then select the new,
related form, and it is autocompleted with the information required in Form B.

Thanks for this suggestion @Controller_Cercafe!

So just to make sure I've understood you correctly, you're proposing that instead of preloading lists of entities (patients, villages, trees, etc.), we add an endpoint to Aggregate/Central that takes an ID (possibly encoded as QR/bar code) and returns an entity, which can then be directly used in Form B?

This is a good idea, and would solve the problem of having a potentially very large number of entities stored on the device. OTOH it would require that the device is online, which is not always the case. I would propose that we add this in a second phase, once the basic functionality is working - how does that sound?

That sounds very promising; in essence it is what is required. The only "problem" we see is that an internet connection would be necessary, and offline use is required for the task where it will be implemented...

On the other hand, I do not know what you mean by "second phase".

We remain attentive, greetings

I mean that it would be in the "1.1" version of this feature, rather than the "1.0".

I've written up a more detailed (although still incomplete) spec here: https://github.com/opendatakit/roadmap/issues/23

@TAB sorry this is a bit last minute, but it would be helpful to read it through before today's call if you have time

Hi guys, how are you doing? I would like to know what has happened with what was raised in this discussion, whether there is any progress, and whether we can help with something.

We want to move forward, we want to help, we want to proceed, and we need directions to do it.

We'll stay tuned.


Hi all,

this is related to a use case we have:

  • Turtle nesting beaches are surveyed, turtle tracks and nests are recorded by multiple teams using multiple devices. The nesting beaches are in several geographically separated locations.
  • Turtle nests will eventually hatch. We want to follow up and re-visit some of the nests.
  • Nests are marked with a stake carrying a unique ID. The ID is recorded every time we encounter the nest. The data warehouse ingesting the data from ODK Aggregate can filter nests by ID, so we get a full life history for each nest.
  • A user visiting a beach will want to know all recorded turtle nests for only that beach (there are thousands of records on other beaches, possibly too many for one device to download/store).
    The users only want to navigate back to the geolocations of existing nests, or see which existing nests are near their own location.
  • As each encounter with a nest, even if already recorded earlier, is a new record (ODK form), the users want to view, but don't need to edit, the existing records. Viewing existing records doesn't even have to occur in ODK Collect.
  • We would like to have an offline-capable mapping app that follows the user's location, showing some background rasters (aerial image), some user-supplied vectors (e.g. place markers, administrative boundaries, place names etc), and all previously recorded nests at a given location.
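
The "which existing nests are near my own location" part could look roughly like this in Python. This is only a sketch with made-up field names (`nest_id`, `beach`, `lat`, `lon`) and made-up coordinates; it uses the haversine great-circle distance, which is plenty accurate at beach scale.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def nests_near(nests, beach, lat, lon, radius_m):
    """IDs of nests on one beach within radius_m, nearest first."""
    hits = []
    for nest in nests:
        if nest["beach"] != beach:
            continue  # ignore the thousands of records on other beaches
        distance = haversine_m(lat, lon, nest["lat"], nest["lon"])
        if distance <= radius_m:
            hits.append((distance, nest["nest_id"]))
    return [nest_id for _, nest_id in sorted(hits)]


# Made-up records for illustration.
nests = [
    {"nest_id": "N1", "beach": "A", "lat": -20.000, "lon": 115.0},
    {"nest_id": "N2", "beach": "A", "lat": -20.001, "lon": 115.0},
    {"nest_id": "N3", "beach": "A", "lat": -20.100, "lon": 115.0},
    {"nest_id": "N4", "beach": "B", "lat": -20.000, "lon": 115.0},
]

nearby = nests_near(nests, "A", -20.0, 115.0, radius_m=500)
```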

A possible data flow could look like this:

  • All data collection devices upload to one ODK Aggregate server. As we want to access data from multiple devices, ODK-A is the first data container which holds all the required data.
  • There is one form for turtle track/nest encounters.
  • There should be one export of that form per location (filtered to that location) into a format like KML/GeoJSON.
  • Users should have an offline-capable mapping app on their data collection devices (7" tablets or larger).
  • There should be a user-friendly way for users to sync data from ODK Aggregate and the other background data to their devices.
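
The per-location export step could be as simple as the following sketch. The record fields (`location`, `nest_id`, `observed_on`, `lat`, `lon`) are assumptions for the example, not the real form fields; the only fixed convention is that GeoJSON points list longitude before latitude.

```python
import json


def to_geojson(records, location):
    """Build a GeoJSON FeatureCollection for one location only."""
    features = [
        {
            "type": "Feature",
            # GeoJSON coordinate order is [longitude, latitude].
            "geometry": {
                "type": "Point",
                "coordinates": [r["lon"], r["lat"]],
            },
            "properties": {
                "nest_id": r["nest_id"],
                "observed_on": r["observed_on"],
            },
        }
        for r in records
        if r["location"] == location
    ]
    return {"type": "FeatureCollection", "features": features}


# Made-up submissions for illustration.
records = [
    {"nest_id": "N1", "location": "Beach A", "observed_on": "2018-11-02",
     "lat": -20.0, "lon": 115.0},
    {"nest_id": "N9", "location": "Beach B", "observed_on": "2018-11-03",
     "lat": -21.5, "lon": 114.2},
]

beach_a = to_geojson(records, "Beach A")
beach_a_text = json.dumps(beach_a, indent=2)  # ready to save as a .geojson file
```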

I have built a proof of concept using the offline-capable mapping app MapIt and a data warehouse (which ingests ODK Aggregate and offers an API), docs are here. However, this process involves opening a bookmarked API URL behind basicauth (returning GeoJSON), saving the resulting JSON to a local directory, and deleting/recreating a layer in MapIt.

A similar use case is asset management: a ranger visits existing inventory (benches, bbqs, toilets, shelters, displays and signs) in a national park to record its presence, note any maintenance needed (which again causes a follow-up visit by a ranger with a paint bucket searching for the park bench that needs painting), and scan the bar code on the asset label.

This feature is badly needed, especially by research organizations.
Case in point:

  1. You go out in the field as a group to recruit patients for some health conditions and upload this data.
  2. Some turn out to be eligible based on inclusion criteria and thus require follow-up at a later date.
  3. Each member of the team needs to access this collected data later to fill out a follow-up, and linking should be automated so that no one needs to key in the linking key manually.

Thanks in advance.


This feature is extremely valuable for us.

I work in a Latin American NGO called TECHO (currently in 19 countries with a team of 5000 people). We work in slums with volunteers, and our main program is the construction of emergency housing.

In order to do that, we survey families asking about their housing and living situation. We usually need at least two surveys for each family to understand if our program would have a positive impact for that case.

We usually have many more families applying than we can actually help, so we have to decide which families we are going to work with.

This is what we wish we could do:

  • Before going to a community: a volunteer does pre-field work, selecting families that need to have their situation updated (maybe the previous survey was incomplete, has raised doubts, or was done a long time ago). The volunteer downloads all this data to the mobile device.

  • In the community, before talking to the family. The volunteer searches the previous survey (offline), reads it and pre-loads a new survey response with that previous data.

  • Once with the family, in a conversation, all the necessary information is asked again to update or clarify. When all the information is updated the new response is sent.

  • Back from the community: the volunteer should be able to see the updated data for those families. (Extra: can also see previous versions of that survey to understand how it has developed over time).
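
In code terms, the core of this workflow is "find the latest response for a family and start a new one from a copy of it". A minimal Python sketch, with hypothetical field names (`family_id`, `submitted_on`, `answers`) and ISO date strings so that string comparison matches date order:

```python
def latest_response(responses, family_id):
    """Most recent previous survey for a family, or None."""
    matching = [r for r in responses if r["family_id"] == family_id]
    if not matching:
        return None
    # ISO 8601 date strings sort in chronological order.
    return max(matching, key=lambda r: r["submitted_on"])


def start_follow_up(responses, family_id):
    """New draft response pre-loaded with a copy of the previous answers."""
    previous = latest_response(responses, family_id)
    return {
        "family_id": family_id,
        "answers": dict(previous["answers"]) if previous else {},
        "based_on": previous["submitted_on"] if previous else None,
    }


def history(responses, family_id):
    """All versions for a family, oldest first."""
    matching = [r for r in responses if r["family_id"] == family_id]
    return sorted(matching, key=lambda r: r["submitted_on"])


# Made-up data downloaded to the device before the field visit.
responses = [
    {"family_id": "F1", "submitted_on": "2019-03-01",
     "answers": {"household_size": "4", "roof": "plastic"}},
    {"family_id": "F1", "submitted_on": "2019-09-15",
     "answers": {"household_size": "5", "roof": "plastic"}},
    {"family_id": "F2", "submitted_on": "2019-05-20",
     "answers": {"household_size": "3", "roof": "tin"}},
]

draft = start_follow_up(responses, "F1")
draft["answers"]["roof"] = "tin"  # updated during the conversation
```

Because the draft holds a copy of the answers, editing it leaves the earlier response untouched, which is what makes the "previous versions" view possible.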

With all the families' situations updated, we select families to work with for the next construction event.

We've been using kobotoolbox for collecting data. Not having a feature for managing cases has been a pain point in managing all this information. This feature would free our volunteers from typing up data from paper surveys and crunching data in multiple Google Docs, so they can focus on the social work.

Context: Constructions are big events held as a weekend activity (5 times a year), where a large number of volunteers go to a community and join a beneficiary family to help them build their emergency housing, donated by TECHO.


Hi @Marcos_Wolff, and welcome to the ODK forum!

Thank you for describing your process in such detail - this is very useful for us in putting together a specification that will benefit as many ODK users as possible. It's particularly interesting for me to see that your enumerators need to see previous responses - this is a feature that we had decided not to include in the first iteration, but based on your input it looks like we should revisit that decision.

I'm in the process of writing a full spec for this feature, and once that's done we should be in a position to start implementing it, but please be aware that this will take some time - I wouldn't expect anything to be available before next year, so if you want this feature before then, you may have to investigate alternatives and/or workarounds in the meantime.


Hi @adam.butler. Thanks so much for the welcoming message and the quick response!

I wanted to include a thread that we started in the kobotoolbox forum a while ago, in which I propose this as a use case and @Tino_Kreutzer accurately describes a suggested step-by-step workflow to clarify it. He also mentions this feature as instrumental.

TECHO would love to contribute to the process of writing that spec, so let me know if we can be useful somehow: maybe by describing things in greater detail, reviewing or contributing to drafts, or coordinating interviews with the volunteers working in the field to ask questions and test hypotheses.


Just FYI, I had to implement something along these lines for our XForms solution (they're called 're-inspections'), so I'm very interested in what we might come up with to more formally define a specification around the use case.