Analysing data from multiple select/ODK aggregate

So, this is slightly more low-level than the advice offered previously on
this thread, but in terms of implementing an expansion of select_multiple
variables, my approach is this:

  1. I pad the strings with spaces on either end, e.g. "1 2" >> " 1 2 " (can
    be done via a simple concatenate command in whatever program you're using)
  2. Then, I generate dummy variables for each possible option, e.g. dummy1
    dummy2, and set them equal to 1 if the string contains the space-padded
    number, e.g. search for " 1 ", " 2 ", etc. (Does that make sense? This is
    to prevent an option like "14 11" from showing up in a search for "1").

In Excel, you could use the search function to create these dummies, e.g.

dummy1 >> =IF(ISNUMBER(SEARCH(" 1 ",A1)),1,0)
dummy2 >> =IF(ISNUMBER(SEARCH(" 2 ",A1)),1,0)

(Obviously, you'll need to update the cell reference from A1).

If you use stata, I've written an ado file that automatically does this for
coded answers 1-15, -996, -998, -999 (you could update these ranges as
appropriate to your project and coding strategy); it also takes a defined
value label list and uses that to name the dummy output. Code pasted below.

Would love to hear how other people achieve the same.


ยทยทยท --------------------------

*! Program to expand variables that list multiple choices as a string, into
distinct individual choices. Assumes a string of space-separated number
codes and "other" responses.

program define kexpand
version 12

syntax varlist, valuelabel(string)

foreach v of varlist varlist' { replace v' = subinstr(v', "other", "-996", .) // Replace "other" responses with code replace v' = "" if v'=="." // Replace "." responses with blank string replace v' = " " + v' + " " if v'!="" // Pad string with
spaces on both ends
forval x= 1/15 { // Expand to
dummies using regex match
gen v'_x' = .
replace v'_x' = 0 if v'!="" replace v'_x' = 1 if regexm( v', " x' ") local varname ":label valuelabel' x''"
label var v'_x' "`varname'"

            foreach x in 996 998 999 {               // Expand missing 

values to dummies using regex match
gen v'_x' = .
replace v'_x' = 0 if v'!="" replace v'_x' = 1 if regexm( v', "x'") local varname ":label valuelabel' -x''"
label var v'_x' "`varname'"

foreach v of varlist varlist' { // Drop if variable is totally empty foreach x of varlist v'_* {
qui sum x' if r(mean) == 0 & r(max) == 0 & r(min) == 0 { drop x'
di "`x' dropped because all values were zero or


On Thursday, April 4, 2013 2:17:17 PM UTC-4, wrote:


I wondered if anyone had any tips on analysing data gathered using a
'select multiple' question, with data processed by ODK aggregate

At the moment, for each set of responses it produces a string separated by
spaces in each cell, for example:

1 2
1 3 4
2 4

But the problem is, this makes things tricky for data analysis. Sometimes,
I am interested just in everyone who answered '2' - but it isn't possible
to filter the results to see just the people who answered '2' - because
some people answered '1 2', some '2 4' etc.

I wondered what your tips were on dealing with this kind of data?