Is it possible to link 3 surveys dependent on initial data collected in the first survey? Problems with the repeat function

Good afternoon!

We regularly use ODK surveys from Excel to collect farmer group member data. This works really well except for the repeat function.

For example, one of the questions is, 'how many members of your household are in the same group?' If the answer is 2, then the next question is on repeat and asks for the details of each household member. Some will have no members, some will have 9. This functions well but for me the time-consuming part is then matching up the index numbers with the other household members when the data is downloaded on Excel.

These results are in a separate Excel sheet and one by one, I have to look at who answered ýes' to the initial question, scroll to the far right, fiind the index number and then search for those repeat household members on the other sheet, go back to the previous and match those names with others on the initial sheet. This is to be clear about doublecounting of households.

Now, to take this example one step further, I want to do an agribusiness survey of those group members and a nutrition survey for each woman of reproductive age and child aged 6-23 months in their households. So on the initial survey, the question will be, do you have any women aged between 15-49 in your household. If yes, then the number will be entered. Same for the children. And then the names which will be on repeat, depending on teh previous answer. Which will then link to an extra Excel sheet with index numbers in the results.

Then there are nutrition surveys for each woman and each child. The problem is that I can see no way of linking these surveys, so each time a woman starts a nutrition survey, we have to ask the same questions in the agribusiness survey, name, age, group name, location etc etc. And then the name of their household group member so we can match the 2 together.

If I make the repeat function for those surveys, then I will have the task of matching up the indexes of the agribusiness survey with each of the women's surveys and the answers to the women's surveys will be displayed vertically rather than horizontally on the second sheet, meaning it will be very difficult to extract the data from them.

Is there any way to extend a survey based on what was entered at the start so that the results from that second survey are not linked with an index number and a separate Excel sheet?

Oh, this question is hard to describe! My apologies, and many thanks to anyone who is able to unpack it!

Hi Heather,

Although I am not sure if I understand your issue completely, here are some ideas:

  1. Just checking: are you using INDEX-MATCH to combine the different excel sheets, and if not, would that make things easier?

  2. Another possible solution could be to create a csv file of your agribusiness survey results, attach it to your nutrition survey and pull data from that (see http://xlsform.org/#pre-loading-csv-data). This way you can for example use dynamic cascading selects to identify which household you are doing the survey with (for example ask for region, then village, then select a name), and if you know that you can pull in answers from a previous survey. Not sure if it's clear what I mean, I can elaborate or send you examples if you think it could be a solution.

  3. You could consider using an ODK-based service like SurveyCTO. If you use their software, you can choose to export repeat groups in 'wide' format (so you don't have separate csv's anymore). To my knowledge ODK Briefcase does not have that function. I'm not affiliated to SurveyCTO and don't mean to advertise but this feature of theirs does sound like something you're looking for.

Hope this helps :slight_smile:

1 Like

Thanks so much Lieke!

  1. Index match is not really a useful option here because I need to cross check with the entered data from the household survey, not list the results. Not sure if that is clear? These issues are so difficult to describe!

  2. Is it possible to use this csv file while the survey is being completed? And also will cascading selects work when the previous options were also cascading selects? Let me try to explain..

The intention is for the enumerators to complete the agribusiness survey and then the nutrition survey at the same time. What I am trying to do is link the responses from the agribusiness survey to generate the nutrition survey(s). So if a member of the HH is a woman of reproductive age, the nutrition survey will start after the agribusiness survey. If 2 members of the HH are women of reproductive age, 2 nutrition surveys will start. What I do not want to have is a repeat function where the nutrition survey results are in that second sheet because it's not just a match of person but an entire survey.

In the agribusiness survey, we already have cascading selects. Municipality goes to selection of Admin post, then Suco, then Aldeia (all locations). This is some of the data which would be useful not to have to select both for the agribusiness and nutrition surveys as they will be in the same household. BUT also the entered data- name of group, name of participant also need to replicated from the agribusiness survey to the nutrition survey because they may not be the same person in each survey but in the same household.

Can cascading selects (already selected) from one survey be cascading selects in the other?

And can csv files update while the survey is being completed without internet? So if the agribusiness survey gets the response that there are women of reproductive age, the nutrition survey will be generated complete with the data already selected in the agribusiness survey just a few moments ago?

Thank you so much for your assistance, I'd experiment a lot more if I had decent internet but this is why we use ODK in the first place (-:

Oh, and I forgot to mention that while it may not seem necessary to have the surveys matched by municipality etc as well as group name and person name, the problem here is that many people have the same names and many of the groups have the same name! It's only through all of teh additional data (DOB, location etc) that we are able to differentiate between them! Just to complicate matter....

Ah no, I'm afraid that without internet, data from an attached csv can only go to the survey, not the other way around. So it would only work if you completed your household survey, submitted the data and extracted the results from that into a csv, which you then attach to the nutrition survey.

I think the only way to use answers to earlier questions in other questions without using internet is for them to be in the same survey.

What I don't entirely understand yet is why repeats are not an option: you can set them to repeat as many times as you have women of reproductive age in the HH; and you could use calculate variables to pull in loads of data from the main survey, like municipality, admin post, but also maybe a unique household ID, which will then be in the same sheet as the nutrition results.

Then you state that it's an issue that the women will each have their own row in the separate Excel sheet, but isn't there a way to fix that with some smart use of Excel? I think it's hard for me to grasp what the problem is here because I don't know how you want to use or analyse the data. I would guess that if you put the separate sheets (agribusiness, nutrition) in the same workbook, you can use functions countif(s), sumif, to pull out the Nutrition results for all the women from the same household. But it's probably more complicated than that? :wink:

1 Like

@avansa, like @Lieke said before, with surveycto you will be able to download the final data in a wide format, where you will have the repeat group questions inline with the main data, for example, if there are 3 members in a household, you first ask how many members are in this household? let say the variable name for this question is household_size and the answer is 3, the repeat questions will be like what is the gender of the first household member? lets say male for the first member and the rest female, and the variable name is gender, and lets say the last question in the repeat group is how old is he/she now? lets say 20 for the first member and the rest 35. and the variable name is age, in the wide format, your data will appear in this form, household_size = 3, gender_1 = male, gender_2 = female, gender_3 = female, age_1 = 20, age_2 = 35, age_3 = 35. gender_1 to gender_3 will be a variable name for the first member to the third member in the main dataset without the sub group sheet with the repeat group and sheet, if this is what you are looking for, then you have to use stata to convert the long format you are currently having to wide format

if i get you right, you want to link the agribusiness survey with the nutrition survey such that when a member in the initial survey answer yes to do you have any women aged between 15-49 in your household? and you ask for the number of women in that age range, and you take the details of each women in the initial survey with in a repeat group. and afterwards you then ask the agribusiness questions and that of the nutrition questions, if this is so, you can create the agribusiness survey in a group form in the initial survey and that of the nutrition survey use calculation to call the index of the each woman into the agribusiness or nutrition group.

can you share your data with me and i see if i can use stata to merge them for you, thus the initial data (main data) and the repeat group data. odk does not give the wide format export of data yet but may be possible in the near future but for now it is the long format that odk provides

fabla2020@gmail.com

Thanks again Lieke!

I've been playing with repeat functions and have now 4 surveys linked via repeat. I can't see any other way of doing this even on SurveyCTO (which was however, very useful to look at various formulas to reach my goal- thanks for the link!

SO I think I have one final question for you 'use calculate variables to pull in loads of data from the main survey, like municipality, admin post, but also maybe a unique household ID, which will then be in the same sheet as the nutrition results.'

What do you mean by calculate variables? Will these work if the data is from a cascading selection?

I'd love to have colleagues to work with on these issues but unfortunately I'm on my own so I'm so sorry if these questions are very basic....

By calculate variables I mean the question type "calculate" which is hidden for the user but the results will be stored in your data (https://docs.opendatakit.org/form-logic/#calculations).

If you put ${fieldname} in the calculate column it will store the value of the answer that was given to that earlier question, regardless of whether the answering options came from a choice list or a csv or cascading select.

Alternatively, if you want to store the label instead of the value (in case of a select_one or select_multiple question), you can use "jr:choice-name(${selectonefield}, '${selectonefield}')". See also https://docs.opendatakit.org/form-operators-functions/#accessing-response-values

And lastly you can use "pulldata(source, colname, lookupcolname, lookupval)" to pull in a value from an attached csv file. I use this for example to lookup the ICS code and information on cooperative membership on an attached farmerlist when we interview farmers. So this is an interesting option if you already have some information on the households your are interviewing before you get there.

Don't worry about it, happy to help. It takes a while to get to know this stuff even with a reliable internet connection :slight_smile:

3 Likes

can you share a sample file and explain how you want it, maybe this can help

Thanks again Lieke, I've had no time to work on this but now back! I'm trying to use the calculate to show the data from the initial questions but I don't know where to put the calculate?

  • the initial questions run through -municipality, admin post, suco and aldeia which are cascading. Then group name, name of first respondent, hh type. This is the data that I want to have within the repeating surveys.
  • After this 'are there any women of reproductive age in the household available to be interviewed today' Then name of first woman, 2nd woman, 3rd woman.
  • Are there any children aged 6-23 months? Then name of 1st, 2nd, 3rd.
  • then the agribusiness questions for the first respondent
    -Then there are 2 repeats which are based on the responses to the women and children answers. In each there is an option to select from the names previously entered.

I have tried putting the initial data using calculate as per your response, as notes, each on separate rows within a group, which is useful for the enumerator to check but does not show in the results on the Excel.

How do I get it to automatically enter into the second and third survey results in the separate Excel sheets from each repeat? Is the 'type' wrong? It shouldn't be a note? But then how do I convert the answer from a previous question into a question?

I can't pull this data from previous surveys because the women of reproductive age and children are not previously noted anywhere.

I hope this is somewhat understandable!

Thanks again

Don't worry Lieke, I got it! All calculations wouldn't work because they had no child. All notes won't work because there's no data. So a note + calculates have resolved all my problems.

Thanks SOOOOO much !!!!:grinning:

3 Likes