Generate sequential ID based on pulled data

1. What is the issue? Please be detailed.
We do annual census in which we collect data from individuals that were in in the previous census, here we pull data from the previous census and validate the participant id (PTID), if an individual is new in the household e.g immigrant or new born we add them to the current census and create for them a new PTID.
For example, if a household was censused in village 6 and household 205 and the last created PTID for another individual in that household was 6205045, the new PTID number in this household will be 6205046 for this new individual in this census. ODK must search from the pulled data (csv) check the previous number assigned in that village(6) and that household (205) and assign the next sequential number
Any help will be much appreciated

2. What steps can we take to reproduce this issue?

3. What have you tried to fix the issue?

4. Upload any forms or screenshots you can share publicly below.

Prepare another external dataset in which you only store maximum number value of the PTID for each household. When you want to add new member in household you use this value, increment it by 1, and then use it to assign as PTID to new individual member of the HH.

You also need to keep updating this external dataset based on new daily updates in the main submission dataset.

Thank you @joybindroo , could you be having a sample xls

You can take similar steps for achieving the functionality

  • Prepare the External Dataset:
    Create a CSV file containing only the household number and the highest PTID for each household. This will allow the form to fetch the maximum PTID for the household and assign the next sequential number for new members.

    Example external CSV dataset (ptid_max.csv):

    household, max_ptid
    6205, 6205045
    6206, 6206049
    7101, 7101020
    

    In this CSV file:

    • The household column represents the household number (which includes your village id too).
    • The max_ptid column stores the last PTID in that household so far.

  • Update the External Dataset:
    Regularly update this dataset with new PTIDs after daily census operations or at the end of each census batch. This ensures that newly added members get the correct next PTID.

  • Modify Your XLSForm:
    Use the pulldata() function to fetch the maximum PTID for each household from the external CSV and create logic to assign the next PTID to new members.

    Key elements to include in your XLSForm:

    • Use the pulldata() function to fetch the max_ptid for the household from the external dataset based on the household ID (${HHID}).
    • Use a simple calculation to increment the PTID by 1 for the new household member.

    Example XLSForm rows:
    I assume HHID to be the variable holding HH number, your scenario may differ - so adjust accordingly.

    Survey sheet:

    type name label calculation relevant
    calculate max_ptid pulldata('ptid_max', 'max_ptid', 'household', ${HHID})
    • The pulldata() function pulls the maximum PTID for the entered household (${HHID}) from the external CSV dataset.
      Then create a new calculated new_ptid field , that adds 1 to the max_ptid to generate the next PTID for the new member.

  • Handle New Households:
    In case the household has no previous PTID (e.g., for a new household or missing data), the pulldata() function will return an empty value. You should set a default starting value for such cases. This can be based on the household ID, like starting from <HHID> + 001.

    Example formula for new households:

    if(${max_ptid} != '', ${max_ptid} + 1, concat(${HHID}, '001'))
    

    This formula will assign the next sequential PTID if max_ptid exists, or create a new PTID starting with the household ID for completely new households.


  • Deploy and Test:
    After designing the form, test it using mock data to ensure that new members receive the correct PTIDs.

It is also crucial to ensure that the external dataset is being updated accurately for future rounds of the census.


Thank you, this is really helpful

Instead of using manual updates, consider using Entities which let you share data between forms. You can think of an Entity List as a CSV attachment that's shared between forms and can be updated by form submissions.

To generate sequential ids, you could use a household Entity List that keeps track of the highest participant ID assigned within that household as @joybindroo suggests. Submissions can then automatically update that highest ID. You can use pulldata in the same way as described.

Another option would be to use a participant Entity List that keeps track of each participant, the household that they're in, and their number in the household. You can then use the max function to get the highest participant id used so far and use it as the basis for the next participant id. This would be a good approach if you also want to use participant information such as their name in future follow-up forms.

Let us know if it would be helpful to see a worked example for one or both options.

Currently, Entity updates require getting a form update from the server but soon they will happen offline as well. That means you could generate sequential IDs for two participants in the same household that you visit one after the other without connecting to the server in between.

1 Like

Thank you @LN very useful, would really appreciate a worked example of both