Random sampling in ODK Collect

To follow up on my original posting, it turns out accommodating external datasets didn't take too much effort! :man_dancing: I also made a couple of improvements to the original (i.e. internal) random sampling form.

First, I removed the need to have a contiguous index field - 1,2,3...N - for every item in your dataset. Although this remains a strict requirement for the related randomizing question order form, it's not actually required for this random sampling workflow; instead, you just need a field in your dataset containing a unique identifier for each element [Important: this id cannot contain spaces!]. The name of this id field isn't terribly important, but I've kept it as the regular choices 'name'. You can rename this field, perhaps to something that maps directly to your particular dataset, but you will then have to make the corresponding changes to the form, and Validate may throw errors when pyxform can't find the name and label fields that it expects for choice lists. So I'd recommend naming your dataset id field 'name' until you are comfortable with exactly what this form is doing and how.
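If you want to check your own dataset against the id requirement before loading it, here is a minimal Python sketch (not part of the form itself). It assumes the id column is called 'name' as above; the `check_ids` helper and the three sample rows are hypothetical, made up purely for illustration. It flags ids that are missing, contain whitespace, or are duplicated. The whitespace check matters because the form joins the ids into a single space-separated string.

```python
import csv
import io


def check_ids(rows, id_field="name"):
    """Return a list of human-readable problems found in the id column."""
    problems = []
    seen = set()
    # Line 1 of the CSV is the header, so data rows start at line 2.
    for line_no, row in enumerate(rows, start=2):
        value = row.get(id_field)
        if value is None or value == "":
            problems.append(f"line {line_no}: missing {id_field}")
            continue
        if any(ch.isspace() for ch in value):
            problems.append(f"line {line_no}: id {value!r} contains whitespace")
        if value in seen:
            problems.append(f"line {line_no}: duplicate id {value!r}")
        seen.add(value)
    return problems


# Hypothetical sample data; with a real file you would pass
# csv.DictReader(open("dataset.csv", newline="")) instead.
SAMPLE = """name,label,Employees
org_a,Org A,1500
org b,Org B,200
org_a,Org A again,3000
"""

problems = check_ids(csv.DictReader(io.StringIO(SAMPLE)))
for problem in problems:
    print(problem)
```

An empty list back from `check_ids` means the id column should be safe to use with this form.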

Second, I added an optional filter on the overall (external) dataset, in case you want to restrict the sub-sampling to a particular subset of your data. In the example shown here, I've restricted the random sample to only organizations with 1000 or more employees. This requires adding an appropriate choice-list filter in a couple of places; specifically:

once(join(' ', randomize(instance('dataset')/root/item[Employees>=1000]/name)))

and

count(instance('dataset')/root/item[Employees>=1000])

If you don't need to filter and simply want to take a random sample from your entire dataset, just replace "Employees>=1000" with "true()" in both places.
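For anyone who finds the two expressions easier to follow as ordinary code, here is a rough Python analogue of what they compute. It is only a sketch: the `sample_order` and `count_matching` helpers and the four sample rows are hypothetical, and Python's shuffle is not the same algorithm as ODK's randomize(). The default predicate plays the role of true(), and once() has no equivalent here since it simply keeps the first computed value.

```python
import random


def sample_order(items, predicate=lambda item: True, seed=None):
    """Analogue of join(' ', randomize(.../item[filter]/name))."""
    rng = random.Random(seed)
    # Keep only the ids of items that pass the filter.
    names = [item["name"] for item in items if predicate(item)]
    rng.shuffle(names)
    # Ids are joined with spaces, which is why an id may not contain one.
    return " ".join(names)


def count_matching(items, predicate=lambda item: True):
    """Analogue of count(.../item[filter])."""
    return sum(1 for item in items if predicate(item))


# Hypothetical rows standing in for dataset.csv.
ITEMS = [
    {"name": "org_a", "Employees": 1500},
    {"name": "org_b", "Employees": 200},
    {"name": "org_c", "Employees": 1000},
    {"name": "org_d", "Employees": 999},
]


def has_1000_or_more(item):
    # Same condition as the Employees>=1000 filter.
    return item["Employees"] >= 1000


print(sample_order(ITEMS, has_1000_or_more, seed=1))
print(count_matching(ITEMS, has_1000_or_more))
```

The point to notice is that the same filter has to appear in both calls; otherwise the count and the shuffled id list would describe different subsets of the data.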

Almost everything else is the same as in the internal dataset random sampling form, other than using a select_multiple_from_file question at the end instead of the original select_multiple.

Random sampling with external dataset form

The new random sampling form - for external datasets - is below, along with the sample dataset that I used (basically, the same data as previously, but now in an actual external csv file rather than copied into the choices sheet).

randomsample_external.xlsx (10.3 KB)

dataset.csv (13.2 KB)

Result
