ODKmeta with Stata

Hema · May 16, 2019, 9:48am

I have tried to create do-files using odkmeta.
I exported the CSV from ODK Briefcase, but because the form has repeat groups and a repeat count, the form data is split into three CSV files. While creating the do-file, Stata asked for all three files. The do-file was created successfully, but when I ran it to check the output, it gave only partial data (only the data from the first CSV).
How can I get the full data?
Should I merge the CSV files before creating the do-file, or will Stata do it?

Matthew_White · May 17, 2019, 12:01am

Hi @Hema!

It's normal for ODK Briefcase to export a separate CSV file for each repeat group in your form. If your form has two repeat groups, then you should have three CSV files, one for the main fields (the fields outside a repeat group), then one each for the two repeat groups.

odkmeta expects this: it expects a separate CSV file for each repeat group. The odkmeta do-file saves a .dta file for each CSV file, so in your case, at the end of the do-file, there should be three .dta files.

The do-file also merges the dataset for each repeat group into its parent dataset. For example, if you have a form with a repeat group named A, which itself contains a repeat group named B, then there will be three datasets: one for the main fields, one for A, and one for B. The dataset for B will be merged into the dataset for A (using the reshape and merge commands in Stata), then the dataset for A will be merged into the dataset for the main fields.
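
As a rough illustration of that reshape-then-merge step, here is a minimal sketch on toy data. The variable names (key, parent_key, child_num, child_name, respondent) and the values are made up for the example, and this is not the code that odkmeta actually generates; it only shows the general idea of turning a repeat group into one row per parent and attaching it to the main dataset.

```stata
* Toy illustration only: all variable names and values are hypothetical,
* and this is not the do-file that odkmeta writes.

* A repeat-group dataset: several rows per parent record
clear
input str4 parent_key byte child_num str10 child_name
"uid1" 1 "Asha"
"uid1" 2 "Ravi"
"uid2" 1 "Meena"
end

* Reshape to one row per parent: child_name1, child_name2, ...
reshape wide child_name, i(parent_key) j(child_num)
rename parent_key key
tempfile children
save `children'

* The main dataset: one row per submission
clear
input str4 key str10 respondent
"uid1" "Hema"
"uid2" "Jess"
"uid3" "Matt"
end

* Attach the wide repeat data to the main dataset
merge 1:1 key using `children'
list, noobs
```

Run this as a do-file rather than line by line, since the temporary file only exists for the duration of the run. Submissions with no repeat rows (uid3 here) stay in the result with missing values in the repeat variables.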

Does the odkmeta do-file run without error? If there is an error, what does the message say?

Jess · May 20, 2019, 6:27am

Hi @Matthew_White (I'm a long-time reader, first-time poster) - I'm glad to see this clarification come up on the forum. I was also super glad to see odkmeta - definitely appreciate that you coded it up and are making it available to all. I was using odkmeta to convert files saved to Google Sheets and wasn't sure how to treat repeat groups as you and Hema are discussing. But, I'm still a bit unclear -

If odkmeta expects multiple, separate CSVs, how do you code it in? The syntax in Stata is:

odkmeta using filename, csv(csvfile) survey(surveyfile, surveyopts)
choices(choicesfile, choicesopts) [options]

Where csv(csvfile) points to the directory with the data stored as CSV files downloaded from Google Sheets. How can you signal that there are multiple CSV files? (I'm using Google Sheets and haven't yet decided if I'll set up ODK Aggregate.)

Background in case this doesn't sound familiar: Similar to ODK Briefcase, uploading the data to Google Sheets generates a separate worksheet for each repeat group. So when you save as a CSV, you have to save each sheet as a separate CSV.

Matthew_White · May 20, 2019, 6:38pm

Hi @Jess, and welcome!

You actually don't need to tell odkmeta that there are multiple CSV files. odkmeta creates the do-file using the survey and choices sheets of the XLSForm, and it determines from the survey sheet how many repeat groups there are and what their names are. Just specify to option csv() the name of the CSV file for the main fields (the fields outside a repeat group). ODK Briefcase uses that filename for each CSV file it exports, appending the repeat group name to the name of the associated CSV file. (There's currently a slight discrepancy between the filenames that Briefcase uses to export and the filenames that odkmeta expects. However, it should be possible to rename the exported files to match what odkmeta expects, or to rename the CSV and .dta filenames in the do-file to match what Briefcase exports.)
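
A minimal sketch of what that call might look like, following the syntax quoted earlier in this thread. All file names here are placeholders, not names that Briefcase or odkmeta require: myform.csv stands for the main-fields CSV, and survey.csv and choices.csv stand for the survey and choices sheets of the XLSForm saved as CSV files.

```stata
* Placeholder file names throughout; adjust to your own files.
* Only the CSV for the main fields is passed to csv(); the repeat-group
* CSVs are not listed here.
odkmeta using "import.do", csv("myform.csv") survey("survey.csv") choices("choices.csv")

* Run the generated do-file to import the data
do "import.do"
```

The repeat-group CSV files would need to sit alongside the main CSV under the filenames the generated do-file looks for, which is where the renaming mentioned above comes in.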

Note that odkmeta does expect the CSV files to match the Briefcase export CSV format. I'm actually not sure whether the Google sheets match that format — I would be interested to hear what you discover there!

Hema · May 21, 2019, 5:17am

Thank you very much for your reply.

Can you please guide me on extracting a very large dataset? I have an ODK form with around 50,000 submissions, and we are still collecting data with that form, but I am unable to extract the data from the ODK Aggregate server or from Briefcase. Please share a way to extract the data other than going directly to the database.

iamnarendrasingh · May 21, 2019, 6:11am

You should use ODK Briefcase to download the data; it is one of the best tools for downloading/extracting your dataset from the server.
How do you use ODK Briefcase? That might be your next question, and the answer is here. You can follow the ODK documentation linked above to learn how to use ODK Briefcase, and I am 100% sure that it will help you.

Best,
@iamnarendrasingh

Matthew_White · May 21, 2019, 2:07pm

I also recommend using ODK Briefcase. (I believe that ODK Aggregate exports data in a slightly different format from Briefcase, and odkmeta expects the Briefcase format.) If you have trouble using Briefcase after visiting the documentation, I recommend creating a separate topic about that.

Hema · May 27, 2019, 9:02am

The export from ODK Briefcase fails. I am unable to extract the data from Briefcase.

Matthew_White · May 27, 2019, 8:19pm

I see that you started another topic about the Briefcase issue here. If you answer Yaw's questions in that topic, it should be possible to make progress toward a solution.

Jess · May 28, 2019, 6:31am

Hi @Matthew_White - Thank you for clarifying. I ended up deciding that my repeat group was more trouble than it was worth -- so I just coded it up as a list and then added relevance conditions depending on how many text boxes the interviewer fills in. I also read elsewhere that Google Sheets as a server does not handle repeat groups as well as Briefcase.

Thanks again!

Dushime · February 6, 2022, 7:42am

Can someone post an example of odkmeta usage here, just the syntax? This is what I ran:
. odkmeta using import.do, csv("C:\Users\dts\Desktop\testing ODK meta\datacheck.csv") survey( "C:\Users\dts\Desktop\testing ODK meta\survey.csv") choices("C:\Users\dts\Desktop\testing ODK meta\choices.csv")
Then Stata gives me these error messages:
column header type not found
invalid type() suboption
invalid survey() option
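
The first message suggests that odkmeta could not find a column headed type in the file passed to survey(). One way to check, sketched below under the assumption that the survey sheet was saved as a file called survey.csv (a placeholder name), is to look at the first row of that file in Stata and see what the column headers actually are.

```stata
* Placeholder file name: point this at the file you pass to survey().
* Read only the first row, as data rather than as variable names, so the
* column headers are shown exactly as they appear in the file.
import delimited using "survey.csv", varnames(nonames) rowrange(1:1) stringcols(_all) clear
list, noobs
```

If no header reads exactly type, the file may not be the survey sheet of the XLSForm, or the header may be spelled differently. The second error message indicates that survey() has a type() suboption; it appears to be the place to name that column when it is not called type, but this is an assumption worth checking against the odkmeta help file (help odkmeta) before relying on it.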