Generate sequential ID based on pulled data

1. What is the issue? Please be detailed.
We do annual census in which we collect data from individuals that were in in the previous census, here we pull data from the previous census and validate the participant id (PTID), if an individual is new in the household e.g immigrant or new born we add them to the current census and create for them a new PTID.
For example, if a household was censused in village 6 and household 205 and the last created PTID for another individual in that household was 6205045, the new PTID number in this household will be 6205046 for this new individual in this census. ODK must search from the pulled data (csv) check the previous number assigned in that village(6) and that household (205) and assign the next sequential number
Any help will be much appreciated

2. What steps can we take to reproduce this issue?

3. What have you tried to fix the issue?

4. Upload any forms or screenshots you can share publicly below.

Prepare another external dataset in which you only store maximum number value of the PTID for each household. When you want to add new member in household you use this value, increment it by 1, and then use it to assign as PTID to new individual member of the HH.

You also need to keep updating this external dataset based on new daily updates in the main submission dataset.

Thank you @joybindroo , could you be having a sample xls

You can take similar steps for achieving the functionality

  • Prepare the External Dataset:
    Create a CSV file containing only the household number and the highest PTID for each household. This will allow the form to fetch the maximum PTID for the household and assign the next sequential number for new members.

    Example external CSV dataset (ptid_max.csv):

    household, max_ptid
    6205, 6205045
    6206, 6206049
    7101, 7101020
    

    In this CSV file:

    • The household column represents the household number (which includes your village id too).
    • The max_ptid column stores the last PTID in that household so far.

  • Update the External Dataset:
    Regularly update this dataset with new PTIDs after daily census operations or at the end of each census batch. This ensures that newly added members get the correct next PTID.

  • Modify Your XLSForm:
    Use the pulldata() function to fetch the maximum PTID for each household from the external CSV and create logic to assign the next PTID to new members.

    Key elements to include in your XLSForm:

    • Use the pulldata() function to fetch the max_ptid for the household from the external dataset based on the household ID (${HHID}).
    • Use a simple calculation to increment the PTID by 1 for the new household member.

    Example XLSForm rows:
    I assume HHID to be the variable holding HH number, your scenario may differ - so adjust accordingly.

    Survey sheet:

    type name label calculation relevant
    calculate max_ptid pulldata('ptid_max', 'max_ptid', 'household', ${HHID})
    • The pulldata() function pulls the maximum PTID for the entered household (${HHID}) from the external CSV dataset.
      Then create a new calculated new_ptid field , that adds 1 to the max_ptid to generate the next PTID for the new member.

  • Handle New Households:
    In case the household has no previous PTID (e.g., for a new household or missing data), the pulldata() function will return an empty value. You should set a default starting value for such cases. This can be based on the household ID, like starting from <HHID> + 001.

    Example formula for new households:

    if(${max_ptid} != '', ${max_ptid} + 1, concat(${HHID}, '001'))
    

    This formula will assign the next sequential PTID if max_ptid exists, or create a new PTID starting with the household ID for completely new households.


  • Deploy and Test:
    After designing the form, test it using mock data to ensure that new members receive the correct PTIDs.

It is also crucial to ensure that the external dataset is being updated accurately for future rounds of the census.


Thank you, this is really helpful

Instead of using manual updates, consider using Entities which let you share data between forms. You can think of an Entity List as a CSV attachment that's shared between forms and can be updated by form submissions.

To generate sequential ids, you could use a household Entity List that keeps track of the highest participant ID assigned within that household as @joybindroo suggests. Submissions can then automatically update that highest ID. You can use pulldata in the same way as described.

Another option would be to use a participant Entity List that keeps track of each participant, the household that they're in, and their number in the household. You can then use the max function to get the highest participant id used so far and use it as the basis for the next participant id. This would be a good approach if you also want to use participant information such as their name in future follow-up forms.

Let us know if it would be helpful to see a worked example for one or both options.

Currently, Entity updates require getting a form update from the server but soon they will happen offline as well. That means you could generate sequential IDs for two participants in the same household that you visit one after the other without connecting to the server in between.

1 Like

Thank you @LN very useful, would really appreciate a worked example of both

So sorry about the delay in getting back to this!

You can find @joybindroo's example adapted to use Entity Lists here.

To try it out:

  • create and publish the form
  • go to the households Entity List add the "household" property. We need to add this one manually because the form doesn't save to it.
  • upload households.csv (99 Bytes) to the households Entity List (docs)
  • try the form out using a web form or assign the form to an App User to try it in Collect

To see the id update within a single household, you will need to refresh the form in between adding new household members. If you'd like to see updates happen offline, you can try the latest Collect beta with Central v2024.2.0 following the instructions here.

Example with household and participant Entity Lists

You can find a more complex example using a participant Entity List here.

This example demonstrates doing a little bit more participant management by giving the option to archive a participant in addition to adding a new one.

To try it out:

  • if you've been experimenting with the form above, you'll need to create a new project for this one because both have a households Entity List with a different structure
  • create the households Entity List (docs). This one isn't modified by the form so it needs to be created ahead of time.
  • add the following properties to the Entity List you just created: village and household
  • upload households.csv (736 Bytes) to the households Entity List (docs)
  • create and publish the form
  • go to the participants Entity List that was created by the form, add the village and household properties
  • upload the participants.csv (7.1 KB) to the participants Entity List (docs)
  • try the form out using a web form or assign the form to an App User to try it in Collect

:warning: Some things to note:

  • The village and households lists feel like they might not be needed. They're convenient for specifying labels, but maybe the ids would be enough. This is the same issue that @TomJ described in the idea to have a unique/distinct function
  • If you upload the household and participant CSVs as I created them, their order will look backwards in clients. This is related to the idea to make Entity List sortable
  • There's a lot of computation going on in this form! I would want to try some test data with my target devices if I wanted to use this for more than 20,000 participants or so

I know there's a lot going on in these examples! They were a good opportunity to illustrate several different Entities-related ideas. Hopefully you can take what you find interesting and leave the rest behind.

1 Like