Select random individuals from within a roster

1. What is the issue? Please be detailed.
I am trying to make a form that does a two-step process.

Step 1 - Generate an N-long household roster. This is straightforward and I have existing forms for this using indexed repeats etc.

Step 2 - Select X random individuals within that household roster.
If it was just a single person, I could use a calculate to generate a random number between 1 and N and use that to select someone.
But I can't see a solution for selecting more than 1 individual, because if I use random more than once it is possible for the same person to be selected both times.

2. What steps can we take to reproduce this issue?
I took the attached Excel file from a similar example (https://stats4sd.org/resources/398).
This is really nice and it:
a) Makes a household list
b) Identifies people within that list who are eligible
c) Then selects 1 person from that list at random

So effectively I want to be able to do 2 or 3 runs of the calculate in line 17 and the note in line 18 so that it randomly selects more than 1 person from the list.

4. Upload any test forms or screenshots below.
random-list-name.xlsx (14.5 KB)

I gave this a shot by generating a household roster of members in a repeat, then selecting the number to sample, and finally generating a number of repeats based on the sample size required. Each 'random repeat' then gets its information from the household list based on the HH roster's position. However, it was still selecting the same households more than once, and I couldn't get the info needed from the 1st repeat.

I think it needs a de-duplicate() function or distinct-values() function. This is described in a GitHub issue here, and it is possible in SurveyCTO, as described in a post here.

Test_villagesampling.xlsx (12.6 KB)

What I would do is generate the combinations ahead of time and randomly select one of those.

Here is an annotated example to give you a feel for what I mean.

I assumed we were selecting 3 people and that family size would always be between 3 and 6. You could do the same for selecting subsets of any size. You could even have separate lists for choose 2 and choose 3 if that needs to be variable. You could also use an if() expression to account for households with fewer individuals than need to be selected.

I generated combinations for a family size up to 6. That was quick to do by hand. You could extend that and automate generating the combinations (there are 20 choose 3 = 1140 combinations for a household of 20). Pay attention to the combination order -- this works because, for example, all of the combinations for a family of 5 come before the combinations for a family of 6. That enables us to compute a single random number, scale it up to the number of combinations for the family size, and select one combination.
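
If you want to automate that list, here is a rough Python sketch of one way to do it. It is not taken from the attached form: the function names are made up for illustration, and sorting by the largest roster position is just one ordering that keeps the combinations for smaller families ahead of those for larger ones. It also assumes the random number is in the range 0 (inclusive) to 1 (exclusive) and that the family has at least as many members as you need to select.

```python
from itertools import combinations
from math import comb


def ordered_combinations(max_size, k):
    """All k-person subsets of positions 1..max_size, ordered by largest position.

    Sorting on the largest position first means every subset that fits in a
    household of n people appears before any subset that needs person n + 1.
    """
    subsets = combinations(range(1, max_size + 1), k)
    return sorted(subsets, key=lambda s: (s[-1], s))


def pick_subset(subsets, family_size, k, r):
    """Scale one random number r in [0, 1) to the subsets valid for this family."""
    if family_size < k:
        raise ValueError("family is smaller than the number of people to select")
    valid_count = comb(family_size, k)  # e.g. 5 choose 3 = 10
    # Clamp in case r is exactly 1.0 (an edge case this sketch guards against).
    index = min(int(r * valid_count), valid_count - 1)
    return subsets[index]


subsets = ordered_combinations(6, 3)  # 6 choose 3 = 20 rows
print(pick_subset(subsets, 5, 3, 0.5))
```

The sorted list could then be pasted into the form as the lookup list, with the index calculation done in a calculate.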

Please note that for any form with parallel repeats like this, edits to the first repeat may lead to bad behavior. For example, if you enter the first person as Kwame Jones, age 3, answer the second repeat's worth of questions for Kwame and then change the first repeat to Beatrix Jones, age 75, you'll have answers in the second repeat that now look like they're for Beatrix Jones but were originally captured for Kwame. This is important to train on specifically.

Since sampling is complex, I would recommend talking it through with a statistician as well. As usual, it's always a good idea to do sample runs and training!

I've updated the form you attached to make sure the random value is only computed once (see docs). You could also use @Lal_S's approach of a calculation with random() wrapped in once().

Thank you @LN very helpful and I think we can use this in our household surveys.

One limitation could be a situation where we would like to randomly sample a proportion of household members, say 30%, without replacement. The family size would vary, and so would the number of people we select, since we would only want to select 30% each time. I don't think this is possible when pre-specifying the combinations ahead of time.

Glad to hear it’s looking like a promising approach! If you do end up testing and polishing it, please do share your findings.

I think you could do the same. For each household size, you’d calculate how many individuals represent 30% and you’d use that as the number of selections to include in your combinations. You’d have to build separate lists for each (30_perc_of_4, 30_perc_of_5, etc).
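
As a rough sketch of how those lists could be generated in Python (again, the function names are hypothetical, and rounding up with a minimum of one person is an assumption, so you would need to decide on the rounding rule with your statistician):

```python
from itertools import combinations


def selection_count(household_size, percent=30):
    """Number of people to select, rounding up (an assumed rule), minimum 1."""
    # Integer ceiling division avoids floating-point surprises.
    return max(1, -(-percent * household_size // 100))


def build_lists(min_size, max_size, percent=30):
    """One combination list per household size, keyed like '30_perc_of_4'."""
    lists = {}
    for size in range(min_size, max_size + 1):
        k = selection_count(size, percent)
        members = range(1, size + 1)
        lists[f"{percent}_perc_of_{size}"] = list(combinations(members, k))
    return lists


lists = build_lists(4, 6)
for name, rows in lists.items():
    print(name, len(rows))
```

Each list only covers one household size, so the ordering across sizes no longer matters; the form would pick the list that matches the roster length and then choose one row at random.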