Cascading sequential numbering with ODK

htuser · May 30, 2025, 11:23pm

As requested by @LN, I'm opening a new thread following this one: Access to ODK Collect from XML data from other Android applications - #9 by LN

The problem
The goal is to use ODK to solve a classic problem in listing applications (used by DHS and MICS surveys, among others). Listings are usually used to build a sampling database. A listing is a multi-level data collection, but a very simple application:
a) Cluster/First level (spatial unit);
b) Building/Second level;
c) Household/Third level (statistical unit);
d) Individuals/Fourth level (statistical unit).
Because the listing database will be used to randomly select households and individuals, it requires a unique sequential number for every statistical and spatial unit.
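To make the requirement concrete, here is a minimal Python sketch of cascading sequential numbering. It is not ODK or CSPro code. It assumes (my assumption, not stated above) that numbering restarts inside each parent unit and that uniqueness comes from the composite identifier. The function name, the labels, and the `unit_id` format are all hypothetical.

```python
from collections import defaultdict


def assign_cascading_numbers(rows):
    """Number buildings, households and individuals in listing order.

    rows: iterable of (cluster, building, household, individual) labels.
    Numbers restart at 1 inside each parent (an assumption of this sketch).
    """
    # Maps a parent key to {child label: sequential number}.
    counters = defaultdict(dict)

    def number_for(parent, label):
        seen = counters[parent]
        if label not in seen:
            seen[label] = len(seen) + 1  # next free number under this parent
        return seen[label]

    numbered = []
    for cluster, building, household, person in rows:
        b = number_for((cluster,), building)
        h = number_for((cluster, building), household)
        i = number_for((cluster, building, household), person)
        numbered.append({
            "cluster": cluster,
            "building_no": b,
            "household_no": h,
            "individual_no": i,
            # Hypothetical composite ID, unique across the whole listing.
            "unit_id": f"{cluster}-{b:03d}-{h:03d}-{i:02d}",
        })
    return numbered


# Made-up listing rows, in the order they were recorded.
example = [
    ("C01", "north block", "family A", "person 1"),
    ("C01", "north block", "family A", "person 2"),
    ("C01", "north block", "family B", "person 1"),
    ("C01", "south block", "family C", "person 1"),
]
numbered = assign_cascading_numbers(example)
```

With these rows, the second building in cluster C01 gets building number 2 and its first household restarts at 1, while the composite `unit_id` stays unique.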

Explanations

How/when are concessions and households identified?

a) "Concession" is the French name (used in some African countries) for a building. Concessions are self-identified in the field (they would be called apartment buildings in North America);
b) Households are defined as in standard statistical/survey methodology.

Does a single person do all work related to a single concession? A single household? Are all households in one concession identified/visited sequentially?

With CSPro, not necessarily, since any enumerator can sync data with another's. A supervisor can also reassign tasks to any enumerator, and so on. CSPro logic offers almost unlimited flexibility in data synchronization and sharing between teams, automatic multi-server synchronization, etc. However, I would rather stick to scenarios that ODK can support.

I'm trying to understand whether or not repeats are appropriate in this context.

In CSPro, I designed such apps as:
a) A case (questionnaire) for one cluster;
b) A roster (or matrix) for building (or concession) records;
c) A roster (or matrix) for household records.
Sometimes the number of buildings (or concessions) is known before the fieldwork because of recent mapping. But the number of households in a specific building (or concession) is only known during the listing fieldwork.
For sampling purposes, all the numbers (concession or building, household, and individual) must be unique. This is what I helped to solve with CSPro logic and/or SQL: https://forum.getodk.org/uploads/short-url/s62lpSjmpbywpoUAVU9evNimUx6.pdf

If you need more explanations, please let me know.
Thanks in advance for your incredible support!

jniles · May 31, 2025, 2:34pm

Hi @htuser,

I'm not familiar with CSPro, but I can try to respond to the ODK and data modeling side of things. It sounds like you are doing a census of all individuals in all houses in all buildings of a cluster. Is that correct?

If so, your workflow will likely need to be different from other tools you may have experience with. ODK Collect's strength is collecting diverse sources of data on offline devices. This also means that it has the limitation that the data recorded is often not linked or referenced to other data collected previously on other devices or even the same device. We only recently got Entities to try to solve some of these shortcomings for longitudinal data collection, but I'm not experienced enough with them to say whether they might work for your use case.

Instead, I would think about this in two steps. ODK will generate a unique ID (UUID) for every form submitted. If you choose to have the form filled out at the household level (third level), then you might have a question that asks "how many people live here?", then use a repeat group to loop through each household member and fill out demographic data for them.

Once you have that data, you'll submit it to ODK Central, download as a CSV file, and then assign your numeric IDs per household and/or individual using Excel. This would also allow you to do data cleaning and quality checks (duplicate submissions for households, checks to see if the GPS coordinates are geographically distinct if applicable, completeness checks, etc).
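If a script is preferable to Excel for this step, the numbering could be sketched in Python as below. This is only an illustration under assumptions: the `cluster` and `building` columns are hypothetical form fields, the inline text stands in for a downloaded export, and the choice to number households per cluster in cluster/building/submission order is mine. Check the column names against your actual export before relying on it.

```python
import csv
import io

# Stand-in for a household-level CSV export. The column names and the
# values are assumptions made for this sketch.
EXPORT = """SubmissionDate,cluster,building,KEY
2025-06-02T09:15:00Z,C01,B2,uuid:bbb
2025-06-01T10:00:00Z,C01,B1,uuid:aaa
2025-06-01T10:00:00Z,C01,B1,uuid:aaa
2025-06-03T08:30:00Z,C02,B1,uuid:ccc
"""


def number_households(csv_text):
    """Return de-duplicated rows with a per-cluster sequential household_no."""
    rows = csv.DictReader(io.StringIO(csv_text))

    # Defensive de-duplication: keep the first row seen for each KEY.
    unique = {}
    for row in rows:
        unique.setdefault(row["KEY"], row)

    # A deterministic order, so re-running the script gives the same numbers.
    ordered = sorted(
        unique.values(),
        key=lambda r: (r["cluster"], r["building"], r["SubmissionDate"], r["KEY"]),
    )

    counters = {}
    for row in ordered:
        cluster = row["cluster"]
        counters[cluster] = counters.get(cluster, 0) + 1
        row["household_no"] = counters[cluster]
    return ordered


households = number_households(EXPORT)
```

Note that numbers assigned this way only stay stable if late submissions are appended after the existing ones rather than re-sorted into the middle, so it is safest to run it once data collection for a cluster is closed.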

This may or may not work for your workflow, depending on how quickly you need to have household numbers assigned to the household and how short the timeframe is for collecting data.

Finally, if you have a listing of the clusters and buildings, you can try to reduce errors in your data collection by pre-populating these fields in a cascading dropdown menu. I documented how I solved this with DHIS2 data, but the process should be similar for any set of nested data represented in a CSV file.
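As a rough illustration of what "nested data in a single CSV file" can look like, the Python sketch below flattens a made-up cluster/building listing into one choices file and then mimics the cascading lookup. The `name` and `label` columns follow the usual choice-list convention. The `level` and `parent` columns, the function names, and the sample data are assumptions of this sketch. In a real form the filtering would be done by the form's choice filter, not by Python.

```python
import csv
import io


def build_choices_csv(listing):
    """Flatten {cluster: [buildings]} into a single CSV of choices."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "label", "level", "parent"])
    for cluster, buildings in listing.items():
        writer.writerow([cluster, cluster, "cluster", ""])
        for building in buildings:
            # Prefix with the cluster so building names are unique file-wide.
            writer.writerow([f"{cluster}_{building}", building, "building", cluster])
    return buf.getvalue()


def options(csv_text, level, parent=""):
    """Mimic a cascading dropdown: choices at one level under one parent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [r["name"] for r in reader if r["level"] == level and r["parent"] == parent]


# Made-up listing of clusters and their buildings.
listing = {"C01": ["B1", "B2"], "C02": ["B1"]}
choices = build_choices_csv(listing)
cluster_options = options(choices, "cluster")
building_options = options(choices, "building", parent="C01")
```

Here the first dropdown would offer the clusters, and picking C01 narrows the second dropdown to that cluster's two buildings.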

Hope this helps!

htuser · June 16, 2025, 4:55pm

Sorry for my delayed response. Thanks a lot for your explanations. However, it would be better if you could post a small demo implementing my request with ODK.

seadowg · July 14, 2025, 12:09pm

This was super interesting! There was one part I didn't understand however:

In this case, the filename is health-facilities.csv (the .csv suffix is implied). This is a file we'll have to provide when we upload the form to ODK Central. I prefer to reuse a single file, since it is easier ensure consistency (i.e. so that I don't make a mistake) by having a single file than to have multiple files for provinces, health zones, health areas, etc.

It wasn't clear to me why having a single file makes it easier to ensure consistency here. Are you able to expand on that?