Preventing data collectors from interviewing the same person twice

During a survey, data collectors interviewed the same person more than once without realizing it.
This mistake happened to many data collectors.
I believe there must be a way to prevent them from interviewing the same person several times.
We use ODK v1.25.2.

Apart from telling them to be careful, we would really like to have some other solution.

Can you include a question at the start to ask the interviewee if they have previously responded to the questionnaire and then end the survey if they respond "yes"?

This is a hard problem. Technically, there are potentially some "workarounds" that you could try, but I can't think of any that would yield a satisfactory result. The data is only on the collection device until submitted, and the data flow with Collect is not bi-directional: devices don't know about the data accumulating on the server.

From a data standpoint, you would need some sort of unique identifier for each person. Collecting such information means you'll be collecting personal data, so you should be concerned with ethical data management processes and data privacy laws. Last name is not enough. First and last name together are also not guaranteed to be unique. Potentially phone number, but there are places where a single SIM is shared among multiple people, or a single person might have multiple SIM cards. Potentially some sort of government ID number, but people might not have one, or might not want to provide it. And with any of those identifiers, a single letter or number typo could result in a failure to identify a duplicate unless you add more complex fuzzy matching or something similar.
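To illustrate the typo problem, here is a minimal Python sketch of fuzzy matching on identifiers. It is not something Collect does on the device; it assumes you run it afterwards on exported data. The function names, the sample IDs, and the 0.8 threshold are all hypothetical choices for illustration.

```python
from difflib import SequenceMatcher


def likely_same_id(a: str, b: str, threshold: float = 0.8) -> bool:
    """Return True when two identifiers are similar enough to review by hand."""
    # Normalize case and surrounding whitespace before comparing.
    a_norm = a.strip().upper()
    b_norm = b.strip().upper()
    return SequenceMatcher(None, a_norm, b_norm).ratio() >= threshold


def find_possible_duplicates(new_id, known_ids, threshold=0.8):
    """List the known identifiers that closely resemble new_id."""
    return [k for k in known_ids if likely_same_id(new_id, k, threshold)]


# Hypothetical IDs: the first differs from the new one by a single digit.
matches = find_possible_duplicates("AB12346", ["AB12345", "ZZ99999"])
```

A match here only flags a record for manual review; two genuinely different people can have similar IDs, so the threshold would need tuning on real data.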

That would work if the interviewee were always the same person. Each household is identified by the name of the head of household and the ID assigned to it, but the respondent is not necessarily the head of household. In the case that went wrong, data collectors interviewed the wife of the head of household yesterday, and today they met the eldest child (the parents being out). This means the interviewee could answer "No" even though the household had already been interviewed.

Yes, I share your view. The data is only on the device until submitted.

To respond to your last paragraph: each household that must be interviewed is assigned an ID (the households don't know it, because they were randomly selected from a database we have). So, could you please explain what you meant about the single letter or number typo?

If the phones all come back to your office each day, and if you have the list of IDs and the households know their ID, at the end of each day you could load all the phones with an updated CSV of the IDs that have been completed, and you could include a question in the survey using something like pulldata() to see if the household ID being interviewed was already completed on a previous day.
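The end-of-day step could be sketched roughly like this in Python: take the IDs completed so far and write them out as the CSV that the form will look up. The column names `household_id` and `done`, the function name, and the sample IDs are assumptions for illustration; use whatever columns your form's pulldata() call refers to.

```python
import csv
import io


def write_listing(completed_ids, out, id_column="household_id", done_column="done"):
    """Write one row per completed ID to a file-like object; return the row count."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([id_column, done_column])
    seen = set()
    for raw in completed_ids:
        hid = str(raw).strip()
        # Skip blanks and IDs already written, so each ID appears only once.
        if hid and hid not in seen:
            seen.add(hid)
            writer.writerow([hid, "yes"])
    return len(seen)


# Hypothetical IDs gathered from the day's submissions.
buf = io.StringIO()
count = write_listing(["H001", "H002", "H001", " H003 "], buf)
```

In practice you would pass an open file instead of the in-memory buffer, for example `open("listing.csv", "w", newline="")`, and then load that file onto each phone.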

Could you please explain in more detail how to prevent duplicate entries, given that we have their ID numbers? I am going to collect data from hospital patients, and each patient has a unique hospital ID number. I want to use that number to prevent duplicate entries, but I have not been able to set this up. Could you please help with this?

It would only work for data that has been collected on previous days and loaded to the survey. In this example the CSV is updated with previously collected IDs. This will not work "live" during a day of data collection - only after the CSV file on the phones is updated.

pulldata.xlsx (9.9 KB)
listing.csv (73 Bytes)

Thanks a lot for your response, I will try that and inform you soon.

Hi danbjoseph, I tried what you advised. However, the form is still accepting a patient ID number that I already entered yesterday. When is the CSV file on the phone updated? Am I expected to upload a CSV file, or to do something other than what I typed in the XLSX?
Thanks in advance.

Are you able to upload your XLSForm here?

DM2interv.xlsx (29.1 KB) Here is the XLSForm. The appearance was previously written as 'number', i.e. missing the 's', in case that is what causes the problem. You can see the issue in rows 8 to 12. Thanks

The ID can be letters and/or numbers. You can adjust how it is recorded.
You will need to create an updated listing.csv file and add it to the phones or update the version on your server and download to the phones that way.

Thanks for your dedicated responses. I need to use only numbers as the patient ID, so what should I do in that case? Also, am I expected to add the listing.csv file after filling in the numbers I have already completed in the survey? How can I convert it to a CSV file, and how do I add it to the phone or update it on the server? Is it the same way we do it for the XLSX form, or something different?

Any spreadsheet program (such as Excel) can save out a file as a CSV.

You can hand-copy the file to the devices. When you first download the form, it shows in the *-media folder as just the listing.csv file. After opening the form once, it changes to a listing.csv.imported and listing.db file. You can delete those two files and copy in a new CSV with updated rows.
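If you do this for many phones, the delete-and-copy step could be scripted. Below is a minimal Python sketch, assuming the form's -media folder is reachable as an ordinary folder from your computer (for example over USB). The function name `refresh_listing` is hypothetical, and the two stale file names follow the listing.csv.imported and listing.db pattern described above.

```python
import shutil
from pathlib import Path


def refresh_listing(media_dir, new_csv, name="listing.csv"):
    """Remove the imported copies of the CSV, then copy in the updated file.

    Returns the names of the stale files that were deleted.
    """
    media = Path(media_dir)
    removed = []
    # After a form is opened once, the CSV is replaced by these two files.
    stale_files = (media / (name + ".imported"), media / (Path(name).stem + ".db"))
    for stale in stale_files:
        if stale.exists():
            stale.unlink()
            removed.append(stale.name)
    # Copy the updated CSV in under the name the form expects.
    shutil.copyfile(new_csv, media / name)
    return removed
```

A call such as `refresh_listing("path/to/form-media", "listing.csv")` would then be repeated for each connected device; the path shown is a placeholder, not a real Collect location.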

I hand-copied listing.csv into the form folder named DM2interv3-media, but I can't open it. What is the next step?
Thanks a lot...

listing.csv (46 Bytes)

pulldata looks for a value in one column and then returns a value from a different column in that same row. So in this example, it looks in the listing.csv file, searches the "patientId" column for a match with the answer given to the "patient_id" question, and returns the value from the "done" column of the matching row.

When a value such as "1234" exists in the CSV, it returns the value (in this case "yes") from the "done" column.

When the value doesn't exist, nothing is returned.

You could also include data in the CSV about who completed the interview, or when.

And you can use the result of the pulldata in a relevant expression to guide your survey.
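Based on the description above, the form calculation would be along the lines of `pulldata('listing', 'done', 'patientId', ${patient_id})`. To check a CSV on your computer before loading it onto the phones, the same lookup can be imitated in Python. This is only an emulation for testing your file, not how Collect implements it, and the sample rows and the helper `already_interviewed` are made up for illustration.

```python
import csv
import io


def pulldata(csv_text, return_column, lookup_column, lookup_value):
    """Return return_column from the first row whose lookup_column matches.

    Returns an empty string when no row matches.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    wanted = str(lookup_value).strip()
    for row in reader:
        if (row.get(lookup_column) or "").strip() == wanted:
            return row.get(return_column) or ""
    return ""


# Hypothetical contents of listing.csv.
LISTING = "patientId,done\n1234,yes\n5678,yes\n"


def already_interviewed(patient_id):
    """Mirror a relevant check: True means show the note and end the survey."""
    return pulldata(LISTING, "done", "patientId", patient_id) == "yes"
```

If a known ID does not come back as "yes" here, the likely culprits are a column header that does not match the name used in the form, or stray characters in the ID column.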

Thank you very much for the detailed explanation and for your invaluable commitment. I will try that and let you know.

I have tried copying my listing.csv into the form's -media folder in ODK Collect, but it is still accepting duplicate entries. Could you please help me further? Let me send you the XLSX file, the CSV, and a zipped copy of the form files from ODK Collect on my phone. Please check whether they look right or point out the faults. Or let me know if there is any constraint on updating and inserting the CSV file in ODK Collect. Thanks in advance.

Here is a simple example showing how to use the result of the lookup and relevant formulas to either start a group, or show a note and end the survey:
lookup-w-relevant.xlsx (14.5 KB) & listing.csv (54 Bytes)

When the lookup finds a match:

When the lookup does not find a match:

Thanks for your response. I will try that. But could you please tell me whether the way I copied the CSV file is correct, if you have already looked at the zipped file (containing the ODK form files from my phone) that I sent here? Thanks again.